Category Archives: Programming

Poor man’s tail recursion in node.js

If you find yourself recursing over a large enough input in node.js, you may run into this:

node.js:201
        throw e; // process.nextTick error, or 'error' event on first tick
              ^
RangeError: Maximum call stack size exceeded

Oops, I smashed the stack. You may reproduce it with something like this:

var foo = []
 
for (var i = 0; i < 1000000; i++) {
    foo.push(i)
}
 
var recur = function (bar) {
    if (bar.length > 0) {
        var baz = bar.pop()
        // do something with baz
        recur(bar)
    } else {
        // end of recursion, do your stuff
    }
}
 
recur(foo)

“Thanks, that’s very thoughtful. But you’re not helping.” Bear with me. The obvious solution is tail call elimination, but JavaScript doesn’t have that optimization.

However, you can wrap the tail call so that recur() runs on a fresh stack. The adjusted recur() implementation is:

var recur = function (bar) {
    if (bar.length > 0) {
        var baz = bar.pop()
        // do something with baz
        process.nextTick(function () {
            recur(bar)
        })
    } else {
        // end of recursion, do your stuff
    }
}

Warning: please read this carefully. This gets you through recursion over such a large input, but the performance is poor. Using process.nextTick (or a timer function such as setTimeout, which is even slower) is an expensive operation. I didn’t test where the actual bottleneck is (epoll itself under Linux, libuv / libev, etc.).

time node recur.js
node recur.js  1.36s user 0.28s system 101% cpu 1.610 total
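
If you’re curious how much of that is pure scheduling overhead, a quick sketch along these lines shows it (assuming a reasonably recent node; N is arbitrary, and the setTimeout chain is dominated by the roughly 1ms timer granularity, so expect it to take a while):

var N = 10000

// chain N deferred calls through the given scheduling function
var chain = function (schedule, n, done) {
    if (n === 0) {
        done()
    } else {
        schedule(function () {
            chain(schedule, n - 1, done)
        })
    }
}

console.time('process.nextTick')
chain(function (fn) { process.nextTick(fn) }, N, function () {
    console.timeEnd('process.nextTick')
    console.time('setTimeout(fn, 0)')
    chain(function (fn) { setTimeout(fn, 0) }, N, function () {
        console.timeEnd('setTimeout(fn, 0)')
    })
})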

The cost of this method is high, so don’t attempt this in a web application: it kills the event loop. I don’t use node for writing web applications anyway. Writing them in node is a difficult task, and the cost of the event loop itself isn’t as negligible as you may think. Node is useful as long as the CPU time is negligible compared to the time spent doing IO. So please don’t count me among the people who think of node as the hammer for every problem you throw at it.

If you’re wondering why I don’t simply iterate over the object, the answer is simple: because that “do something with baz” involves some async IO that would kill the second data provider. Sequential calls ensure that everybody in the architecture stays happy. Besides, I don’t actually use bar.pop(), but something like bar.splice(0, 5000) to pack more data into fewer remote calls and fewer events (see the sketch below). bar.shift() in a situation like this is as slow as molasses in January. In an async framework, the order of the items in a TODO list is not relevant, so use the fastest method.
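
To make that concrete, the batched variant looks roughly like this (just a sketch; doSomethingAsync() stands in for whatever async IO the real code performs per batch):

var recur = function (bar) {
    if (bar.length > 0) {
        // take a whole batch instead of a single item
        var batch = bar.splice(0, 5000)
        // doSomethingAsync() is a hypothetical placeholder for the real async IO
        doSomethingAsync(batch, function () {
            // schedule the next batch on a fresh stack
            process.nextTick(function () {
                recur(bar)
            })
        })
    } else {
        // end of recursion, do your stuff
    }
}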

If you’re still wondering why I use a solution like this at all: the technique above is part of the start-up cost. The application fetches all the required data into RAM. Having the application block the event loop for 20-30 seconds before hitting the Internet pipe is negligible for a process that runs for hours or days. Only after the application hits the Internet can I say that node is being used for the stuff where it shines. I know, up to this point I listed all the wrong reasons for using node as a tool.

Computing file hashes with node.js

Since node.js has the shiny crypto module, which binds parts of the openssl library, people might be tempted to compute file hashes with node.js. The crypto manual page even shows how to compute the SHA-1 of a given file (mimicking sha1sum). Should people do this? The answer is: NO. Some may say because it blocks the event loop. I say: because it is as slow as molasses in January, at least compared to dedicated tools.

Let’s have a look:

var filename = process.argv[2];
var crypto = require('crypto');
var fs = require('fs');
 
var shasum = crypto.createHash('sha256');
 
var s = fs.ReadStream(filename);
s.on('data', function(d) {
  // feed each chunk into the hash as it is read
  shasum.update(d);
});
 
s.on('end', function() {
  // print the hex digest, sha256sum style
  var d = shasum.digest('hex');
  console.log(d + '  ' + filename);
});

time node hash.js ubuntu-10.04.3-desktop-i386.iso
208fb66dddda345aa264f7c85d011d6aeaa5588075eea6eee645fd5307ef3cac ubuntu-10.04.3-desktop-i386.iso
node hash.js ubuntu-10.04.3-desktop-i386.iso 28.92s user 0.80s system 100% cpu 29.661 total

time sha256sum ubuntu-10.04.3-desktop-i386.iso
208fb66dddda345aa264f7c85d011d6aeaa5588075eea6eee645fd5307ef3cac ubuntu-10.04.3-desktop-i386.iso
sha256sum ubuntu-10.04.3-desktop-i386.iso 4.86s user 0.21s system 99% cpu 5.093 total

time openssl dgst -sha256 ubuntu-10.04.3-desktop-i386.iso
SHA256(ubuntu-10.04.3-desktop-i386.iso)= 208fb66dddda345aa264f7c85d011d6aeaa5588075eea6eee645fd5307ef3cac
openssl dgst -sha256 ubuntu-10.04.3-desktop-i386.iso 4.40s user 0.17s system 100% cpu 4.567 total

Edit: to sum up for those with little patience:

node hash.js – 29.661s
sha256sum – 5.093s
openssl dgst -sha256 – 4.567s

/Edit

That’s a ~6.5X speed boost just from invoking the openssl binary instead of binding to its library. node.js does something terribly wrong somewhere, since the file I/O is not to blame for the slowness:

var filename = process.argv[2];
var fs = require('fs');
 
var s = fs.ReadStream(filename);
s.on('data', function(d) {
  // consume the chunks, do nothing with them
});
 
s.on('end', function() {
  // just reading the file, no hashing
});

time node read.js ubuntu-10.04.3-desktop-i386.iso
node read.js ubuntu-10.04.3-desktop-i386.iso 0.62s user 0.60s system 106% cpu 1.148 total

This little example that I hacked together shows that using child_process.exec works just fine:

var exec = require('child_process').exec;
exec('/usr/bin/env openssl dgst -sha256 ' + process.argv[2], function (err, stdout, stderr) {
	if (err) {
		process.stderr.write(err.message);
	} else {
		console.log(stdout.substr(-65, 64));
	}
});

time node hash2.js ubuntu-10.04.3-desktop-i386.iso
208fb66dddda345aa264f7c85d011d6aeaa5588075eea6eee645fd5307ef3cac
node hash2.js ubuntu-10.04.3-desktop-i386.iso 4.44s user 0.19s system 100% cpu 4.630 total

So you can have your cake and eat it too. The guys with the philosophy got this one right.
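
One more note about the exec() snippet: concatenating process.argv[2] into a shell command breaks on filenames with spaces or shell metacharacters. A sketch of the same idea built on child_process.execFile skips the shell and avoids that:

var execFile = require('child_process').execFile;
 
// pass the filename as a plain argument instead of building a shell command line
execFile('/usr/bin/env', ['openssl', 'dgst', '-sha256', process.argv[2]], function (err, stdout, stderr) {
	if (err) {
		process.stderr.write(err.message);
	} else {
		console.log(stdout.substr(-65, 64));
	}
});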

Will it recur? Part 2: in-depth analysis

The social experiment

This first chapter is not about recursion. One member of the community wrote that certain inflammatory statements I use may upset people. I replied with: “buzz marketing”. Neutral articles with neutral titles, written by nobodies like me, gain zero traction, even when the content is technically sound. Cheap journalism has more success. I even have graphs to prove it now.

The second point is the usefulness of my little experiment. I don’t know about others, but curiosity was the main thing behind the whole benchmark. Just because it isn’t useful for some people doesn’t mean it isn’t useful for others.

The other thing: the lack of tail recursion. I mean, do you need a “DUH” award or something? The whole point of a “bad” algorithm that’s mathematically correct (well, almost; I stated that fibonacci(0) is wrong) is to show how smart specific compilers are about recursion. The rest simply brute-force it.

Patterns that emerge

The numbers say something if you know how to read the page. There are runtimes optimized for doing proper recursion without bothering the programmer about it: C, D, PyPy, V8, LuaJIT, JVM. The rest aren’t: PHP, CPython, Ruby, Perl, Lua. PyPy and V8 could do better. LuaJIT is already close to the speed of unoptimized C and D. V8 isn’t the king of the hill once you take Ruby (MRI / KRI), CPython, the plain Lua VM, and PHP (Zend Engine) out of the equation. This may be another opportunity to get bashed by the node.js benchmark police with “this is irrelevant” statements, although this wasn’t something I set out to prove.

The thing is, for most of the web development I do, I rarely needed to solve purely recursive problems. At most a fairly simple tree. Sometimes even that simple tree didn’t actually require recursion. So I get why some don’t optimize for this specific case, although they keep calling the thing “a general purpose language”.

For the “write better algorithms” crowd … WHY? Doesn’t the difference between C’s 0.6 seconds and Ruby’s 5 minutes ring any bell that something is fundamentally flawed regarding recursion?

As for the edge cases, there are 3rd party libraries that solve this issue without bothering the programmer. For other edge cases, such as applications that do complicated stuff at Google-like scale, there are better tools that most mere mortals won’t use. The fact that some implementations do recursion poorly is indeed irrelevant when the problems you’re trying to solve don’t include it.

In the end

Initially I wanted to try more stuff, such as factorial, Euclid’s GCD, or the Ackermann function, on the runtimes that don’t take longer than the next ice age to return a value. But why bother, except maybe to give the “one true way of doing recursion in functional languages” programmers a reason to bash stuff without producing any useful output. Not even an academic paper. It’s not productive.

But the question is: will it recur? Part 1: fibonacci(40)

A rant, maybe a bad rant due to its babbling about philosophical reasons, made me wonder how the programming languages stack up against this stuff: recursion. Or should I say the runtimes, since the programming language itself is nothing but a bunch of text. I know that the algorithm itself is bad, but that’s the whole point. I know that fibonacci(0) yields a wrong result, but for the sake of laziness I kept the original algorithm.
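
For reference, this is the shape of the benchmarked function in JavaScript (a sketch that matches the Lua version shown further down; the exact sources are in the archive linked below):

var fibonacci = function (n) {
    // deliberately naive double recursion; fibonacci(0) is indeed off by one
    if (n < 2) {
        return 1;
    }
    return fibonacci(n - 2) + fibonacci(n - 1);
};
 
fibonacci(40);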

The source code of all the tests is available here in order to make them reproducible. There wasn’t a high number of runs, particularly for the runtimes that take longer than the next ice age, but the results are pretty consistent for specific runtimes. The relevant system specs are: Ubuntu 10.04 amd64 (up to date), Q9400 CPU.

Now, less talk, more results.

JavaScript (node.js/V8)

node -v: v0.4.12
time node fib.js
node fib.js 6.40s user 0.02s system 99% cpu 6.423 total
node fib.js 6.39s user 0.02s system 99% cpu 6.410 total

It may seem slow, but for a dynamically typed language it puts the rest of the same category to shame. Or most of them. Bear with me.

PHP

php -v: PHP 5.3.8 (cli)
time php fib.php
php fib.php 77.57s user 0.06s system 99% cpu 1:17.66 total
php fib.php 78.05s user 0.07s system 99% cpu 1:18.18 total

Compared to the V8 runtime, PHP seems to take an eternity. As it turns out, PHP isn’t bad at recursion because of the way it uses the stack, but because of the lack of speed of the runtime itself. But we’re not even halfway there. Stay tuned. PHP isn’t the only thing that sucks at recursion.

C

gcc -v: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
make fib
time ./fib
./fib 2.84s user 0.00s system 100% cpu 2.840 total
./fib 2.84s user 0.00s system 100% cpu 2.835 total

It wasn’t a surprise that C came up with this result, which makes the V8 result even more interesting.

Edit:
I forgot about compiler optimization flags. Caught by Jabbles on HN.

gcc -O4 fib.c -o fib
time ./fib
./fib 0.65s user 0.01s system 100% cpu 0.657 total
gcc -O3 fib.c -o fib
time ./fib
./fib 0.66s user 0.00s system 100% cpu 0.657 total
gcc -O2 fib.c -o fib
time ./fib
./fib 1.54s user 0.00s system 100% cpu 1.535 total
gcc -O1 fib.c -o fib
time ./fib
./fib 0.00s user 0.00s system 0% cpu 0.001 total
time ./fib
165580141
./fib 2.06s user 0.00s system 99% cpu 2.060 total

For some reason, the -O1 flag hates this code: the 0.001s run suggests the unused call simply gets eliminated as dead code, and printing fibonacci(40) to force the computation yields a result closer to the one without any -O flag. This brings C past the Java result, but only for -O3 and above.
/End Edit.

Lua

lua -v: Lua 5.1.4
time lua fib.lua
lua fib.lua 28.02s user 0.03s system 99% cpu 28.081 total
lua fib.lua 28.86s user 0.02s system 99% cpu 28.883 total

./luajit -v: LuaJIT 2.0.0-beta8
time ./luajit fib.lua
./luajit fib.lua 10.59s user 0.00s system 99% cpu 10.591 total
./luajit fib.lua 10.58s user 0.01s system 99% cpu 10.610 total

I tested both of the implementations I know of. I guess this article isn’t that funny for the generations of Lua coders who laugh about V8 in somebody’s face. Don’t get me wrong, I like Lua for its simplicity, but on the speed front I still need to run some tests to verify claims that sometimes appear to be overly inflated.

Edit:

With the following Lua script:

local function fibonacci(n)
	if n < 2 then
		return 1
	else
		return fibonacci(n - 2) + fibonacci(n - 1)
	end
end
fibonacci(40)

the results are getting better:

time lua fib.lua
lua fib.lua 24.17s user 0.08s system 99% cpu 24.281 total
lua fib.lua 24.24s user 0.01s system 99% cpu 24.307 total

[with LuaJIT v2.0.0-beta8 GIT HEAD]
time ./luajit fib.lua
./luajit fib.lua 2.02s user 0.00s system 99% cpu 2.026 total
./luajit fib.lua 2.02s user 0.00s system 99% cpu 2.023 total

Now some of the Lua chops can have a lulz about V8. This project is getting more interesting, especially for pairing LuaJIT with luafcgid. I forgot about the local keyword (which spares the global table lookup on every recursive call) since my Lua experience is limited to basic testing. Nice comeback!

/End Edit.

Python

python -V: Python 2.6.5
time python fib.py
python fib.py 59.42s user 0.02s system 99% cpu 59.494 total
python fib.py 59.27s user 0.05s system 99% cpu 59.375 total

./configure
make -j 4
./python -V: Python 2.7.2
time ./python fib.py
./python fib.py 61.29s user 0.03s system 99% cpu 1:01.35 total
./python fib.py 61.38s user 0.06s system 99% cpu 1:01.48 total

./configure
make -j 4
./python -V: Python 3.2.2
./python fib.py 71.23s user 0.08s system 99% cpu 1:11.33 total
./python fib.py 70.31s user 0.06s system 99% cpu 1:10.39 total

./pypy -V
Python 2.7.1 (d8ac7d23d3ec, Aug 17 2011, 11:51:19)
[PyPy 1.6.0 with GCC 4.4.3]
./pypy fib.py 4.61s user 0.07s system 99% cpu 4.708 total
./pypy fib.py 4.81s user 0.01s system 99% cpu 4.853 total

I happen to have 2.6.5 around because Ubuntu says so. But in order to make the potential trolls STFU about not using the latest versions, I made fresh builds of 2.7.2 and 3.2.2. It gets even suckier with recent versions. In fact, the CPython runtime is struggling to catch up with the PHP runtime in the slowness realm. The only Python runtime that is actually impressive at recursion is PyPy. Which brings me back to the first statement: the language is just a bunch of text; the runtime is the piece that sucks or does not suck. PyPy proves that with talented people shepherding the project, the language a runtime is implemented in is quite irrelevant. It is also the first JIT implementation here that passes V8.

Ruby

ruby -v: ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
time ruby fib.rb
ruby fib.rb 233.40s user 66.94s system 99% cpu 5:00.55 total
ruby fib.rb 231.75s user 68.08s system 99% cpu 4:59.99 total

./configure
make -j 4
./miniruby -v: ruby 1.9.3dev (2011-09-23 revision 33323) [x86_64-linux]
time ./miniruby fib.rb
./miniruby fib.rb 36.05s user 0.01s system 99% cpu 36.073 total
./miniruby fib.rb 35.92s user 0.04s system 99% cpu 35.978 total

Every time a Ruby fan says “thou shalt not care about runtime speed”, I laugh so hard I nearly burst into tears. Seriously, I couldn’t imagine that MRI sucks that hard at recursion. I barely had the patience to even run this code. KRI washes away part of the shame though, scoring close to the plain Lua implementation. If you’re asking why I used the miniruby binary, the reason is that the ruby binary complained about a missing rubygems.rb. I am bad at figuring out what’s missing from a Ruby stack, but miniruby made fib.rb work.

Perl

perl -v: This is perl, v5.10.1 (*) built for x86_64-linux-gnu-thread-multi
time perl fib.pl
perl fib.pl 125.60s user 0.17s system 99% cpu 2:05.92 total
perl fib.pl 124.06s user 0.10s system 99% cpu 2:04.20 total

./Configure [accepted all defaults, specifically built without threading]
make
./perl -v: This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux
./perl fib.pl 100.12s user 0.09s system 99% cpu 1:40.29 total
./perl fib.pl 100.38s user 0.05s system 99% cpu 1:40.63 total

At first I didn’t want to bother with Perl, but then I remembered the legions of Perl fans ranting about PHP’s recursion. I know this is an inflammatory statement, but next time, people, please keep up with the facts. I guess you aren’t that smug now.

D

gdc -v: gcc version 4.3.4 (Ubuntu 1:1.046-4.3.4-3ubuntu1)
gdc -o fib fib.c (same source as the C binary)
time ./fib
./fib 2.82s user 0.00s system 100% cpu 2.817 total
./fib 2.82s user 0.00s system 100% cpu 2.814 total

Predictable results from a language in the same family as C/C++. The binary is slightly faster than the C version (despite the same source code), but I guess most people won’t notice.

Java

javac -version: gcj-4.4 (Ubuntu 4.4.3-1ubuntu4.1) 4.4.3
java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.04.2)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
javac fib.java
time java fib
java fib 0.86s user 0.02s system 99% cpu 0.882 total
java fib 0.86s user 0.02s system 100% cpu 0.872 total

I admit that I sometimes used to tell this joke: knock! knock!; who’s there?; [very long pause]; Java. I guess now is a good time to swallow my own words. Not only does Java put the other JIT implementations and the rest of the VMs to shame, it also obliterates the statically compiled C and D binaries (at least before the gcc optimization flags from the edit above) at their own favorite game, aka runtime speed. My first reaction was: WTF, there’s got to be a mistake! Printing some junk to STDIO confirmed the same results between C and Java. Newbie warning: this is my first Java application. No, really! Don’t bash me for not understanding the usage of the static keyword; I don’t know whether it actually helps the runtime. I managed to put the code together by reading how to write a simple HelloWorldApp. Experienced Java chops may explain it though.