Computing file hashes with node.js – part 2

At some point, I wrote this piece about how much computing file hashes in node.js used to suck.

Fast forward for about two and half years. At least under OS X, the situation is dramatically changed:

# node.js implementation
 
time node sha256.js xubuntu-12.04.4-desktop-amd64.iso
b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6  xubuntu-12.04.4-desktop-amd64.iso
node sha256.js xubuntu-12.04.4-desktop-amd64.iso  5.33s user 0.89s system 108% cpu 5.729 total
 
time node sha256.js xubuntu-12.04.4-desktop-amd64.iso
b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6  xubuntu-12.04.4-desktop-amd64.iso
node sha256.js xubuntu-12.04.4-desktop-amd64.iso  4.80s user 0.63s system 108% cpu 4.977 total
# GNU coreutils sha256sum implementation
 
time gsha256sum xubuntu-12.04.4-desktop-amd64.iso
b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6  xubuntu-12.04.4-desktop-amd64.iso
gsha256sum xubuntu-12.04.4-desktop-amd64.iso  6.23s user 0.18s system 99% cpu 6.432 total
 
time gsha256sum xubuntu-12.04.4-desktop-amd64.iso
b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6  xubuntu-12.04.4-desktop-amd64.iso
gsha256sum xubuntu-12.04.4-desktop-amd64.iso  6.28s user 0.17s system 98% cpu 6.529 total
# openssl 0.9.8y implementation
 
time openssl dgst -sha256 xubuntu-12.04.4-desktop-amd64.iso
SHA256(xubuntu-12.04.4-desktop-amd64.iso)= b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6
openssl dgst -sha256 xubuntu-12.04.4-desktop-amd64.iso  6.28s user 0.25s system 99% cpu 6.565 total
 
time openssl dgst -sha256 xubuntu-12.04.4-desktop-amd64.iso
SHA256(xubuntu-12.04.4-desktop-amd64.iso)= b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6
openssl dgst -sha256 xubuntu-12.04.4-desktop-amd64.iso  6.28s user 0.24s system 98% cpu 6.632 total

It is nice to see that it improved so much that it sits on top of the performance list, even though the difference is pretty much negligible now. It even makes use of more than one CPU core.

portspoof trolling

Marius once told me about portspoof. A service to troll those who use various scanners by feeding the scanners with false results. Well, while the idea is good, I’m wary about a service like this as this is the kind of service where you wouldn’t want a buffer overflow.

Giving it a run inside a VM, I noticed something odd when using nmap’s service and version detection probes. This happened on the lower ports (1-50). Then I started to look at something that started to look like a pattern, therefore I increased the port range to include 1-50. portspoof is indeed a tool that trolls baddies and pen testers.

Ran it with:

nmap -sV --version-all -p 1-50
 

Really smooth guys, really smooth. Sometimes you have to see the big picture:
big-picture

Converting a file to a JSON array

For some reason I need that. OK, not any reason. For integrating a CloudInit YAML file into an AWS CloudFormation template. By using this article as reference, I made a simple node.js script for doing just that.

#!/usr/bin/env node
 
var fs = require('fs');
 
fs.readFile(process.argv[2], function (err, file) {
	if (err) {
		console.error(err);
		process.exit(1);
	}
	file = file.toString().split('\n');
	var idx, aux = [];
	for (idx = 0; idx < file.length; idx++) {
		aux.push(file[idx]);
		aux.push('\n');
	}
	file = JSON.stringify(aux);
	console.log(file);
});

Save as something.js, make it an executable, then invoke it with ./something.js /path/to/file.

The end.

Converting a DMG to ISO under OS X

There’s a lot of wrong information floating on the internets. People that usually call no-OS X stuff “lesser operating systems” but with no clue about the different internals of a CDR image and an ISO image. CDR has a native OS X filesystem (HFS+), while ISO carries ISO9660. Just rename the CDR to ISO they say. It will be an ISO they say. However, that’s far from the truth.

The correct hdiutil command for converting a DMG to ISO is this one:

hdiutil makehybrid -iso -joliet -o output.iso input.dmg

file should return something like this:

file output.iso
output.iso: ISO 9660 CD-ROM filesystem data 'LABEL'

Performance breakdown for libxml-to-js

Background

libxml-to-js was born to solve a specific problem: to support my early efforts with aws2js. At the time, the options were fairly limited. xml2js was a carry-over from aws-lib which aws2js initially forked. I was never happy with xml2js for a couple of reasons: performance and error reporting. Therefore I looked for a solution to have a drop-in replacement. Borrowed some code from Brian White, made it fit to the xml2js (v1) formal specifications, then pushed it to GitHub. At some point the project had five watchers and five contributors. I guess it hit a sweet spot. That’s why it’s got support for XPath and CDATA, most of it from external contributions. And only then I started using it for other XML related stuff.

The name was chosen to make a distinction from libxmljs which sits at the core of this library which actually binds to Gnome’s libxml2.

Due to the fact that aws2js gained some popularity and I’m doing a complete rewrite with 0.9, the output of libxml-to-js most probably won’t change beyond the “specs” of xml2js v1.

Performance

The actual reason for why I’m writing this article is the fact that people keep asking about the reason for choosing libxml-to-js over xml2js, therefore next time when this question arrives, I am going to simply link this article.

Even now, two and a half years later, with some crappy benchmark that I pushed together, it is somewhere around 25-30% faster than xml2js under usual circumstances. In only specific cases that don’t apply to the XML returned by AWS, xml2js closes in. The part where it really shines is still the error reporting where besides the fact that’s accurate, it is also screaming fast compared to xml2js. In my tests it came out to be around 27X faster.

The code:

var Benchmark = require('benchmark');
 
var suite = new Benchmark.Suite;
 
var parser1 = require('libxml-to-js');
var parser2 = new require('xml2js').Parser({
    mergeAttrs: true,
    explicitRoot: false,
    explicitArray: false
}).parseString;
 
require('fs').readFile(process.argv[2], function(err, res) {
    if (err) {
        console.error(err);
        return;
    }
    var xml = res.toString();
    // add tests
    suite.add('XML#libxml-to-js', function() {
        parser1(xml, function(err, res) {});
    })
        .add('XML#xml2js', function() {
            parser2(xml, function(err, res) {});
        })
    // add listeners
    .on('cycle', function(event) {
        console.log(String(event.target));
    })
        .on('complete', function() {
            console.log('Fastest is ' + this.filter('fastest').pluck('name'));
        })
    // run async
    .run({
        'async': true
    });
 
});

The results, based onto the XML files from the libxml-to-js unit tests and the package.json for the error speed test:

# package.json
XML#libxml-to-js x 18,533 ops/sec ±3.46% (75 runs sampled)
XML#xml2js x 673 ops/sec ±1.35% (68 runs sampled)
Fastest is XML#libxml-to-js
 
# ec2-describeimages.xml
XML#libxml-to-js x 1,122 ops/sec ±4.59% (74 runs sampled)
XML#xml2js x 818 ops/sec ±7.02% (83 runs sampled)
Fastest is XML#libxml-to-js
 
# ec2-describevolumes-large.xml
XML#libxml-to-js x 65.41 ops/sec ±3.13% (65 runs sampled)
XML#xml2js x 50.88 ops/sec ±2.14% (65 runs sampled)
Fastest is XML#libxml-to-js
 
# element-cdata.xml
XML#libxml-to-js x 14,689 ops/sec ±5.41% (72 runs sampled)
XML#xml2js x 11,551 ops/sec ±2.36% (88 runs sampled)
Fastest is XML#libxml-to-js
 
# namespace.xml
XML#libxml-to-js x 9,702 ops/sec ±5.75% (72 runs sampled)
XML#xml2js x 5,802 ops/sec ±2.41% (81 runs sampled)
Fastest is XML#libxml-to-js
 
# root-cdata.xml
XML#libxml-to-js x 22,983 ops/sec ±7.11% (69 runs sampled)
XML#xml2js x 14,849 ops/sec ±6.01% (87 runs sampled)
Fastest is XML#libxml-to-js
 
# text.xml
XML#libxml-to-js x 2,669 ops/sec ±3.68% (78 runs sampled)
XML#xml2js x 2,617 ops/sec ±2.41% (88 runs sampled)
Fastest is XML#libxml-to-js
 
# wordpress-rss2.xml
XML#libxml-to-js x 2,056 ops/sec ±4.08% (75 runs sampled)
XML#xml2js x 1,226 ops/sec ±2.79% (84 runs sampled)
Fastest is XML#libxml-to-js

The tests ran under node.js v0.10.22 / OS X 10.9 / Intel Core i5-4250U CPU @ 1.30GHz with the latest module versions for both libxml-to-js and xml2js.