printf(" SaltwaterC "); /dev/urandom

Using persistent OpenSSH connections
Wed, 02 Jul 2014 18:45:46 +0000

I found out that using persistent connections greatly improves productivity when working with SSH. However, finding the appropriate configuration turned out to be a complicated task. I wanted it to be as unobtrusive as possible, to restart the connection when the socket is closed, and to work without blocking timeouts.

After reading the ssh_config man page and some articles, here’s the best thing I came up with:

Host *
	ControlPath ~/.ssh/master-%r@%h:%p
	ControlMaster auto
	ControlPersist 4h
	TCPKeepAlive no
	GSSAPIAuthentication no
	ServerAliveInterval 60
	ServerAliveCountMax 2

The only issue with this configuration is with long hosts (e.g. a really long host name), as the resulting socket path hits the UNIX_PATH_MAX limit. Unfortunately, the proper solution to this issue hasn’t been merged upstream.

The OS X users who also use brew may easily include the patch for the path issue by editing the openssh formula for OpenSSH 6.6p1 with “brew edit openssh”:

  patch do
    url ""
    sha1 "31f6df29ff7ce3bc22ba9bad94abba9389896c26"
  end

With this patch, a value like ~/.ssh/master-%m works for ControlPath. %m is replaced by SHA1(lhost(%l) + rhost(%h) + rport(%p) + ruser(%r)) and it keeps things short and sweet.
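To get a feeling for why the hashed token keeps the socket path short, here is a minimal node.js sketch of the idea. It is not the code from the patch: the helper name controlPathToken, the sample host names, and the plain concatenation without separators are all assumptions made for illustration.

```javascript
'use strict';
const crypto = require('crypto');

// Hypothetical helper, not the patch's code: hash the four connection
// parameters into a fixed-length token, the way %m is described above.
// The concatenation order and the lack of separators are assumptions.
function controlPathToken(lhost, rhost, rport, ruser) {
  return crypto
    .createHash('sha1')
    .update(String(lhost) + String(rhost) + String(rport) + String(ruser))
    .digest('hex');
}

// Sample values only; any host name length yields a 40 character token.
const token = controlPathToken('laptop.local', 'a-really-long-host-name.example.com', 22, 'root');
console.log('~/.ssh/master-' + token);
```

Whatever the length of the host name, the token is always 40 hex characters, so the path stays well under the limit.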

Getting a HTTPS certificate information into the shell
Sat, 17 May 2014 10:05:30 +0000

Due to the HeartBleed SNAFU, I needed a quick solution for getting the information from a certificate deployed on a remote machine. As I rarely leave the comfort of my terminal, as always, I simply dumped a new function into the shell’s ~/.*rc file.

Here it is:

Defaults to port 443 if the second argument is unspecified. Example:

        Version: 3 (0x2)
        Serial Number:
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=US, O=Google Inc, CN=Google Internet Authority G2
            Not Before: May  7 12:15:37 2014 GMT
            Not After : Aug  5 00:00:00 2014 GMT
        Subject: C=US, ST=California, L=Mountain View, O=Google Inc, CN=*
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (2048 bit)
                Modulus (2048 bit):
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Subject Alternative Name:
                DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*, DNS:*,,,,,,,,,,
            Authority Information Access:
                CA Issuers - URI:
                OCSP - URI:
            X509v3 Subject Key Identifier:
            X509v3 Basic Constraints: critical
            X509v3 Authority Key Identifier:
            X509v3 Certificate Policies:
            X509v3 CRL Distribution Points:
    Signature Algorithm: sha1WithRSAEncryption
Computing file hashes with node.js – part 2
Fri, 16 May 2014 12:26:09 +0000

At some point, I wrote this piece about how much computing file hashes in node.js used to suck.

Fast forward about two and a half years. At least under OS X, the situation has changed dramatically:

# node.js implementation
time node sha256.js xubuntu-12.04.4-desktop-amd64.iso
b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6  xubuntu-12.04.4-desktop-amd64.iso
node sha256.js xubuntu-12.04.4-desktop-amd64.iso  5.33s user 0.89s system 108% cpu 5.729 total
time node sha256.js xubuntu-12.04.4-desktop-amd64.iso
b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6  xubuntu-12.04.4-desktop-amd64.iso
node sha256.js xubuntu-12.04.4-desktop-amd64.iso  4.80s user 0.63s system 108% cpu 4.977 total
# GNU coreutils sha256sum implementation
time gsha256sum xubuntu-12.04.4-desktop-amd64.iso
b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6  xubuntu-12.04.4-desktop-amd64.iso
gsha256sum xubuntu-12.04.4-desktop-amd64.iso  6.23s user 0.18s system 99% cpu 6.432 total
time gsha256sum xubuntu-12.04.4-desktop-amd64.iso
b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6  xubuntu-12.04.4-desktop-amd64.iso
gsha256sum xubuntu-12.04.4-desktop-amd64.iso  6.28s user 0.17s system 98% cpu 6.529 total
# openssl 0.9.8y implementation
time openssl dgst -sha256 xubuntu-12.04.4-desktop-amd64.iso
SHA256(xubuntu-12.04.4-desktop-amd64.iso)= b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6
openssl dgst -sha256 xubuntu-12.04.4-desktop-amd64.iso  6.28s user 0.25s system 99% cpu 6.565 total
time openssl dgst -sha256 xubuntu-12.04.4-desktop-amd64.iso
SHA256(xubuntu-12.04.4-desktop-amd64.iso)= b952308743f1cce2089e03714a54774070891efaef4e7e537b714ee64295efe6
openssl dgst -sha256 xubuntu-12.04.4-desktop-amd64.iso  6.28s user 0.24s system 98% cpu 6.632 total

It is nice to see that it improved so much that it sits on top of the performance list, even though the difference is pretty much negligible now. It even makes use of more than one CPU core.

portspoof trolling
Tue, 11 Mar 2014 08:28:16 +0000

Marius once told me about portspoof, a service that trolls those who use various scanners by feeding them false results. While the idea is good, I’m wary about a service like this, as it is the kind of service where you wouldn’t want a buffer overflow.

Giving it a run inside a VM, I noticed something odd on the lower ports when using nmap’s service and version detection probes. It started to look like a pattern, therefore I increased the port range to 1-50. portspoof is indeed a tool that trolls baddies and pen testers.

Ran it with:

nmap -sV --version-all -p 1-50
1/tcp  open  smtp    Unrecognized SMTP service (12345 0000000000000000000000000000000000000000000000000000000)
2/tcp  open  smtp    Unrecognized SMTP service (12345 0000000000000000000000000000000000000000000000000000000)
3/tcp  open  smtp    Unrecognized SMTP service (12345 0000000000000000000000000000000000000000000000000000000)
4/tcp  open  smtp    Unrecognized SMTP service (12345 0000000000000000000000000000000000000000000000000000000)
5/tcp  open  smtp    Unrecognized SMTP service (12345 0000000000000000000000000000000000000000000000000000000)
6/tcp  open  smtp    Unrecognized SMTP service (12345 0ffffffffffffffffffffffffffffffffffffffffffffffffffff00)
7/tcp  open  smtp    Unrecognized SMTP service (12345 0fffffffffffff777778887777777777cffffffffffffffffffff00)
8/tcp  open  smtp    Unrecognized SMTP service (12345 0fffffffffff8000000000000000008888887cfcfffffffffffff00)
9/tcp  open  smtp    Unrecognized SMTP service (12345 0ffffffffff80000088808000000888800000008887ffffffffff00)
10/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffff70000088800888800088888800008800007ffffffff00)
11/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffff000088808880000000000000088800000008fffffff00)
12/tcp open  smtp    Unrecognized SMTP service (12345 0ffffffff80008808880000000880000008880088800008ffffff00)
13/tcp open  smtp    Unrecognized SMTP service (12345 0ffffffff000000888000000000800000080000008800007fffff00)
14/tcp open  smtp    Unrecognized SMTP service (12345 0fffffff8000000000008888000000000080000000000007fffff00)
15/tcp open  smtp    Unrecognized SMTP service (12345 0ffffff70000000008cffffffc0000000080000000000008fffff00)
16/tcp open  smtp    Unrecognized SMTP service (12345 0ffffff8000000008ffffff007f8000000007cf7c80000007ffff00)
17/tcp open  smtp    Unrecognized SMTP service (12345 0fffff7880000780f7cffff7800f8000008fffffff80808807fff00)
18/tcp open  smtp    Unrecognized SMTP service (12345 0fff78000878000077800887fc8f80007fffc7778800000880cff00)
19/tcp open  smtp    Unrecognized SMTP service (12345 0ff70008fc77f7000000f80008f8000007f0000000000000888ff00)
20/tcp open  smtp    Unrecognized SMTP service (12345 0ff0008f00008ffc787f70000000000008f000000087fff8088cf00)
21/tcp open  smtp    Unrecognized SMTP service (12345 0f7000f800770008777000000000000000f80008f7f70088000cf00)
22/tcp open  smtp    Unrecognized SMTP service (12345 0f8008c008fff8000000000000780000007f800087708000800ff00)
23/tcp open  smtp    Unrecognized SMTP service (12345 0f8008707ff07ff8000008088ff800000000f7000000f800808ff00)
24/tcp open  smtp    Unrecognized SMTP service (12345 0f7000f888f8007ff7800000770877800000cf780000ff00807ff00)
25/tcp open  smtp    Unrecognized SMTP service (12345 0ff0808800cf0000ffff70000f877f70000c70008008ff8088fff00)
26/tcp open  smtp    Unrecognized SMTP service (12345 0ff70800008ff800f007fff70880000087f70000007fcf7007fff00)
27/tcp open  smtp    Unrecognized SMTP service (12345 0fff70000007fffcf700008ffc778000078000087ff87f700ffff00)
28/tcp open  smtp    Unrecognized SMTP service (12345 0ffffc000000f80fff700007787cfffc7787fffff0788f708ffff00)
29/tcp open  smtp    Unrecognized SMTP service (12345 0fffff7000008f00fffff78f800008f887ff880770778f708ffff00)
30/tcp open  smtp    Unrecognized SMTP service (12345 0ffffff8000007f0780cffff700000c000870008f07fff707ffff00)
31/tcp open  smtp    Unrecognized SMTP service (12345 0ffffcf7000000cfc00008fffff777f7777f777fffffff707ffff00)
32/tcp open  smtp    Unrecognized SMTP service (12345 0cccccff0000000ff000008c8cffffffffffffffffffff807ffff00)
33/tcp open  smtp    Unrecognized SMTP service (12345 0fffffff70000000ff8000c700087fffffffffffffffcf808ffff00)
34/tcp open  smtp    Unrecognized SMTP service (12345 0ffffffff800000007f708f000000c0888ff78f78f777c008ffff00)
35/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffff800000008fff7000008f0000f808f0870cf7008ffff00)
36/tcp open  smtp    Unrecognized SMTP service (12345 0ffffffffff7088808008fff80008f0008c00770f78ff0008ffff00)
37/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffffffc8088888008cffffff7887f87ffffff800000ffff00)
38/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffffffff7088888800008777ccf77fc777800000000ffff00)
39/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffffffffff800888880000000000000000000800800cfff00)
40/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffffffffffff70008878800000000000008878008007fff00)
41/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffffffffffffff700008888800000000088000080007fff00)
42/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffffffffffffffffc800000000000000000088800007fff00)
43/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffffffffffffffffff7800000000000008888000008ffff00)
44/tcp open  smtp    Unrecognized SMTP service (12345 0fffffffffffffffffffffffff7878000000000000000000cffff00)
45/tcp open  smtp    Unrecognized SMTP service (12345 0ffffffffffffffffffffffffffffffc880000000000008ffffff00)
46/tcp open  smtp    Unrecognized SMTP service (12345 0ffffffffffffffffffffffffffffffffff7788888887ffffffff00)
47/tcp open  smtp    Unrecognized SMTP service (12345 0ffffffffffffffffffffffffffffffffffffffffffffffffffff00)
48/tcp open  smtp    Unrecognized SMTP service (12345 0000000000000000000000000000000000000000000000000000000)
49/tcp open  smtp    Unrecognized SMTP service (12345 0000000000000000000000000000000000000000000000000000000)
50/tcp open  smtp    Unrecognized SMTP service (12345 0000000000000000000000000000000000000000000000000000000)

Really smooth guys, really smooth. Sometimes you have to see the big picture:

Converting a file to a JSON array
Thu, 30 Jan 2014 16:56:02 +0000

For some reason I need that. OK, not just any reason: for integrating a CloudInit YAML file into an AWS CloudFormation template. By using this article as reference, I made a simple node.js script for doing just that.

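#!/usr/bin/env node
var fs = require('fs');
fs.readFile(process.argv[2], function (err, file) {
	if (err) {
		console.error(err);
		process.exit(1);
	}
	file = file.toString().split('\n');
	// the loop body is an assumption: collect every line into the array
	var idx, aux = [];
	for (idx = 0; idx < file.length; idx++) {
		aux.push(file[idx]);
	}
	file = JSON.stringify(aux);
	console.log(file);
});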

Save as something.js, make it an executable, then invoke it with ./something.js /path/to/file.

The end.

Converting a DMG to ISO under OS X
Fri, 17 Jan 2014 09:37:36 +0000

There’s a lot of wrong information floating around on the internets, usually from people who call non-OS X stuff “lesser operating systems” but have no clue about the different internals of a CDR image and an ISO image. CDR has a native OS X filesystem (HFS+), while ISO carries ISO9660. Just rename the CDR to ISO they say. It will be an ISO they say. However, that’s far from the truth.

The correct hdiutil command for converting a DMG to ISO is this one:

hdiutil makehybrid -iso -joliet -o output.iso input.dmg

file should return something like this:

file output.iso
output.iso: ISO 9660 CD-ROM filesystem data 'LABEL'
Performance breakdown for libxml-to-js
Sat, 07 Dec 2013 23:39:16 +0000

Background

libxml-to-js was born to solve a specific problem: to support my early efforts with aws2js. At the time, the options were fairly limited. xml2js was a carry-over from aws-lib which aws2js initially forked. I was never happy with xml2js for a couple of reasons: performance and error reporting. Therefore I looked for a solution to have a drop-in replacement. Borrowed some code from Brian White, made it fit to the xml2js (v1) formal specifications, then pushed it to GitHub. At some point the project had five watchers and five contributors. I guess it hit a sweet spot. That’s why it’s got support for XPath and CDATA, most of it from external contributions. And only then I started using it for other XML related stuff.

The name was chosen to make a distinction from libxmljs, which sits at the core of this library and actually binds to Gnome’s libxml2.

Due to the fact that aws2js gained some popularity and I’m doing a complete rewrite with 0.9, the output of libxml-to-js most probably won’t change beyond the “specs” of xml2js v1.


The actual reason for why I’m writing this article is the fact that people keep asking about the reason for choosing libxml-to-js over xml2js, therefore next time when this question arrives, I am going to simply link this article.

Even now, two and a half years later, with some crappy benchmark that I put together, it is somewhere around 25-30% faster than xml2js under usual circumstances. Only in specific cases that don’t apply to the XML returned by AWS does xml2js close in. The part where it really shines is still the error reporting: besides being accurate, it is also screaming fast compared to xml2js. In my tests it came out to be around 27X faster.

The code:

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite();
var parser1 = require('libxml-to-js');
var xml2js = require('xml2js');
var parser2 = new xml2js.Parser({
    mergeAttrs: true,
    explicitRoot: false,
    explicitArray: false
});

require('fs').readFile(process.argv[2], function(err, res) {
    if (err) {
        throw err;
    }

    var xml = res.toString();

    // add tests
    suite.add('XML#libxml-to-js', function() {
        parser1(xml, function(err, res) {});
    })
    .add('XML#xml2js', function() {
        parser2.parseString(xml, function(err, res) {});
    })
    // add listeners
    .on('cycle', function(event) {
        console.log(String(event.target));
    })
    .on('complete', function() {
        console.log('Fastest is ' + this.filter('fastest').pluck('name'));
    })
    // run async
    .run({
        'async': true
    });
});

The results, based on the XML files from the libxml-to-js unit tests and the package.json for the error speed test:

# package.json
XML#libxml-to-js x 18,533 ops/sec ±3.46% (75 runs sampled)
XML#xml2js x 673 ops/sec ±1.35% (68 runs sampled)
Fastest is XML#libxml-to-js
# ec2-describeimages.xml
XML#libxml-to-js x 1,122 ops/sec ±4.59% (74 runs sampled)
XML#xml2js x 818 ops/sec ±7.02% (83 runs sampled)
Fastest is XML#libxml-to-js
# ec2-describevolumes-large.xml
XML#libxml-to-js x 65.41 ops/sec ±3.13% (65 runs sampled)
XML#xml2js x 50.88 ops/sec ±2.14% (65 runs sampled)
Fastest is XML#libxml-to-js
# element-cdata.xml
XML#libxml-to-js x 14,689 ops/sec ±5.41% (72 runs sampled)
XML#xml2js x 11,551 ops/sec ±2.36% (88 runs sampled)
Fastest is XML#libxml-to-js
# namespace.xml
XML#libxml-to-js x 9,702 ops/sec ±5.75% (72 runs sampled)
XML#xml2js x 5,802 ops/sec ±2.41% (81 runs sampled)
Fastest is XML#libxml-to-js
# root-cdata.xml
XML#libxml-to-js x 22,983 ops/sec ±7.11% (69 runs sampled)
XML#xml2js x 14,849 ops/sec ±6.01% (87 runs sampled)
Fastest is XML#libxml-to-js
# text.xml
XML#libxml-to-js x 2,669 ops/sec ±3.68% (78 runs sampled)
XML#xml2js x 2,617 ops/sec ±2.41% (88 runs sampled)
Fastest is XML#libxml-to-js
# wordpress-rss2.xml
XML#libxml-to-js x 2,056 ops/sec ±4.08% (75 runs sampled)
XML#xml2js x 1,226 ops/sec ±2.79% (84 runs sampled)
Fastest is XML#libxml-to-js

The tests ran under node.js v0.10.22 / OS X 10.9 / Intel Core i5-4250U CPU @ 1.30GHz with the latest module versions for both libxml-to-js and xml2js.
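
The speedups quoted earlier follow directly from the ops/sec figures. A small sketch that computes the ratio for each file, with the numbers copied from the results above (the names results and speedup are made up for this example):

```javascript
'use strict';

// ops/sec figures copied from the results above: [libxml-to-js, xml2js]
const results = {
  'package.json': [18533, 673],
  'ec2-describeimages.xml': [1122, 818],
  'ec2-describevolumes-large.xml': [65.41, 50.88],
  'element-cdata.xml': [14689, 11551],
  'namespace.xml': [9702, 5802],
  'root-cdata.xml': [22983, 14849],
  'text.xml': [2669, 2617],
  'wordpress-rss2.xml': [2056, 1226]
};

// How many times faster libxml-to-js is for a given pair of measurements.
function speedup(pair) {
  return pair[0] / pair[1];
}

Object.keys(results).forEach(function (file) {
  console.log(file + ': ' + speedup(results[file]).toFixed(2) + 'x');
});
```

The package.json entry is the error reporting case, which is where the figure of around 27X comes from.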

Fixing the AMD AHCI drivers for SB7xx on Windows 7
Tue, 17 Sep 2013 20:28:47 +0000

I heard a lot of urban legends about the Windows Update service messing up your machine. Of course, I dismissed all of them with the classic “worksforme” as it didn’t happen to me. Until Microsoft delivered a 3rd party driver update via an optional package. You know, like the stuff that comes from the vendor and isn’t properly tested. I had the lack of inspiration to check that too instead of simply ignoring it, like I usually do with Bing Desktop and Silverlight. The next thing was a BSOD at boot.

Had to disable the AHCI in BIOS and revert to using IDE mode for the SATA ports. Which kinda sucks for some reasons. The most important: the SSD performance is hurt under IDE mode, the TRIM command won’t work under IDE mode without 3rd party software since only the MSAHCI driver implements TRIM from Windows 7, and the fact that my HDD array doesn’t support NCQ under IDE mode.

When it comes to drivers, AMD is still a shitty company. Even more, their engineers didn’t grasp the concept of backward compatibility. Uninstalling the driver that broke my installation and installing a driver that works proved to be a non-trivial task. Fortunately I found this post on

For the sake of avoiding the link rot, I’m going to reproduce the essentials for posterity, with the same disclaimer as the original – you’re on your own if you mess up your machine and I’m not taking any responsibility if you follow these:

  • Delete any older version of the amd_ahci driver from here: C:\Windows\System32\DriverStore\FileRepository. The folders with older AMD AHCI drivers are named something like: amd_sata.inf_amd64_neutral_c85cc6046149a413 (i386 on 32-bit and most probably another hash). In order to remove the directory, you need to either elevate your explorer / shell to SYSTEM privileges, or take the ownership of the driver directory, add proper permissions, then delete it.
  • From HKLM/SYSTEM/CurrentControlSet/services delete amd_sata and amd_xsata. There’s no need to remove the entries without the underscore (amdsata and amdxsata).
  • Reboot the computer. Don’t change from IDE to AHCI. The driver that actually worked for my combination, which is AMD 780G / SB700, is this one. Execute the installer, wait until it finishes copying the files to C:\ATI\Support, then cancel the setup when the Catalyst installer starts.
  • Open the Device Manager. Action » Add legacy hardware » Advanced mode » Show All Devices » Have Disk. Browse the extraction path for the above package: C:\ATI\Support\11-12_vista32-64_ahci\Packages\Drivers\SBDrv\SB7xx\AHCI. There’s a couple of directories: LH – for 32-bit and LH64A – for 64-bit. Select “AMD SATA Controller” then continue. Unlike the author of the original material, I didn’t get an error about the device not starting.
  • Reboot the computer. Don’t change from IDE to AHCI. Go to Device Manager. Under IDE ATA/ATAPI controllers there should be at least one entry with a yellow exclamation mark, AMD SATA Controller. Uninstall “AMD SATA Controller” without checking “Delete the driver software for this device”. Reboot the machine.
  • Go to BIOS, enable AHCI. After boot, the OS installs the proper drivers, then prompts for another reboot. Reboot the machine. Done.

In my case, it simply fixed the driver installation from the failed Windows update as the driver that runs on my machine is from 2013 and the driver used in the above steps is from 2011. The drivers from the latest Catalyst, 13.4 failed to install via the “Add legacy hardware” method or via a standard Catalyst setup.


Some benchmarks with a SSD drive under IDE mode:


And some benchmarks under AHCI mode:


I guess the sharp drop was due to TRIM doing its job. Yes, it’s enabled:


Splitting a string every nth char in shell
Sat, 03 Aug 2013 21:55:21 +0000

I needed some reusable stuff that splits a string every nth char. Then I remembered that bash and zsh, the shells that I usually use, support string slicing. Kinda like Python does. Or the other way around. Made a shell function. Dropped it into .bashrc / .zshrc. Enjoy.

function string_split()
{
	local str="$1" count="$2"
	while [ ! -z "$str" ]
	do
		echo "${str:0:$count}"
		str="${str:$count}"
	done
}

string_split abcd 2
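
The same slicing idea ports directly to JavaScript. A minimal sketch, with stringSplit being a made-up name that returns the pieces as an array instead of echoing them:

```javascript
'use strict';

// Walks the string in steps of `count` chars and slices out each piece.
function stringSplit(str, count) {
  const parts = [];
  for (let i = 0; i < str.length; i += count) {
    parts.push(str.slice(i, i + count));
  }
  return parts;
}

console.log(stringSplit('abcd', 2).join('\n'));
```

The last piece is simply shorter when the string length is not a multiple of the count.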
git is distributed, stupid
Sun, 23 Dec 2012 17:01:28 +0000

It’s no news that the popular code hosting services, like GitHub or Bitbucket, pretty often go down when you least expect it. Especially GitHub, or so it seems. From time to time my feed reader gets yet another entry from Hacker News saying that “GitHub is down”.

However, being hit by this problem, I managed to work around it by simply using stuff that’s already part of git itself. No need for going in panic mode for every GitHub hiccup. git is distributed, stupid.

I’ve seen a lot of solutions or proposals, but none of them were KISS compliant (or should I say: blog post title compliant), like using a different remote for pushing to a secondary service, or using hooks. Found out that git supports multiple url entries per remote, but the functionality isn’t exposed into the interface itself. You need to actually edit the config file.

Fortunately, git exposes a config edit shortcut: “git config -e” which opens the repository configuration file “.git/config” with the default editor. Found out that “git config -e” is easier to remember, but YMMV.

A real world example from one of my projects:

[remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*
        url =
        url =

This means that every time I issue a “git push [--tags] [remote branch]” everything is automatically synced in multiple remote repositories, removing the single point of failure.

The ordering of the url entries is important as only the first is used for pulling the changes. If a specific url fails to accept the changes, then the rest of the url entries are ignored. Sure, some things may go out of sync for a while, but “eventually consistent” is the term you’re looking for in this scenario. You may pull changes between team members, but that’s not always applicable, therefore it doesn’t hurt to have some failover option.

I found out that Bitbucket is a little bit more stable than GitHub, so the first url points to it. Used to be the other way though.

Use the cache, Luke, Part 2: don’t put all your eggs into the memcached buck … basket
Mon, 17 Dec 2012 15:16:13 +0000

This is the second part of a series called: Use the cache, Luke. If you missed the first part, here it is: From memcached to Membase memcached buckets. Meanwhile, the AWS ElastiCache service proved to have better network latency than our own rolled out Membase setup, therefore the migration was easily done by simply switching the memcached config. No vendor lock in.

However, it took me a while to write this second part.

Please have a look at the above video. Besides the general common sense guidelines about how to scale your stuff, and the Postgres typical stuff, there’s a general rule: cache, cache, and then cache some more.

However, too much caching in memcache (whatever implementation) may kill the application at some point. The application may not be database dependent, but it is cache dependent. Anything that affects the cache may have the effect of a sledgehammer on your database. Of course, you can always scale that DB instance vertically, or scale horizontally by adding read-only replicas, but the not-so-fun part is that it costs a lot just to have the provisioned resources in order to survive a cache failure.

The second option is to have a short lived failover cache on the application server. Something like five minutes, while the distributed cache from memcache may last for hours. Enough to keep the database from being hit from live traffic, while you don’t have to provision a really large database instance. Of course, it won’t work with stuff that needs some “real time” junk, but it works with data that doesn’t change with each request.

There are a lot of options for a failover cache since there’s no distributed setup to think about. It may be a memcached daemon, something like PHP’s APC API, or, the fastest option: the file based caching. Now you may think that I’m insane, but memcached still has the IPC penalty, especially for TCP communication, while if you’re a PHP user, APC doesn’t perform as expected.

I say file based caching, not disk based caching, as the kernel does a pretty good job at “eating your RAM” with the disk caching stuff. It takes more to implement since the cache management logic must live in the application itself, and you don’t have stuff like LRU, expiration, etc. by default, but for failover reasons, it is good enough to be worth the effort. In fact, it ran for a few days on the failover cache without any measurable impact.
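
To make the idea more concrete, here is a minimal sketch of such a file based failover cache, written in node.js for brevity. It is not the implementation we ran in production: the FileCache name, the one JSON file per key layout, and the demo directory under the system temp dir are all assumptions.

```javascript
'use strict';
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical file based failover cache: one JSON file per key, holding
// the value together with its expiration timestamp.
function FileCache(dir) {
  this.dir = dir;
  fs.mkdirSync(dir, { recursive: true });
}

// Hash the key so that any string maps to a safe file name.
FileCache.prototype.file = function (key) {
  const name = crypto.createHash('sha1').update(key).digest('hex');
  return path.join(this.dir, name + '.json');
};

FileCache.prototype.set = function (key, value, ttlSeconds) {
  const entry = { expires: Date.now() + ttlSeconds * 1000, value: value };
  const tmp = this.file(key) + '.' + process.pid + '.tmp';
  // Write to a temporary file, then rename, so readers never see a partial entry.
  fs.writeFileSync(tmp, JSON.stringify(entry));
  fs.renameSync(tmp, this.file(key));
};

FileCache.prototype.get = function (key) {
  let entry;
  try {
    entry = JSON.parse(fs.readFileSync(this.file(key), 'utf8'));
  } catch (err) {
    return null; // a missing or unreadable entry counts as a miss
  }
  return entry.expires > Date.now() ? entry.value : null;
};

const cache = new FileCache(path.join(os.tmpdir(), 'failover-cache-demo'));
cache.set('homepage', { html: '<h1>hello</h1>' }, 300); // five minutes
console.log(cache.get('homepage'));
```

Expired files are never deleted here, so a real setup would also need some cleanup job. The kernel page cache does the rest for hot entries.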

The next part for not using the same basket for all of your eggs is: cache everywhere you can. For example, by using the nginx FastCGI cache, we could shave off 40% of our CPU load. Nothing experimental about this last part. It has been in production for the last 18 months. If you get it right, then it could be a really valuable addition to a web stack. However, a lot of testing is required before pushing the changes to production. We hit a lot of weird bugs for edge cases. The rule of thumb is: if you get the cache key right, then most of the issues are gone before going live.

In fact, by adding the cache control stuff from the application itself, we could push relatively shortly lived pages to the CDN edges, shaving off a lot of latency for repeated requests as there’s no round trip from the hosting data center to the CDN edge. Yes, it’s the latency, stupid. The dynamic acceleration that CDNs provide is nice. Leveraging the HTTP caching capabilities is nicer. Having the application in a data center closer to the client is desirable, but unless your target market is more distributed than having a bunch of machines into the same geo location, it doesn’t make any sense to deploy into a new data center which adds its fair share of complexity when scaling the data layer.

Reverse dependencies for the installed packages in Debian + friends
Wed, 12 Dec 2012 10:14:08 +0000

Some libraries are more libraries than others. It is one of those moments when you ask yourself if migrating to a newer version of a library fucks up the entire system. But you need that foo library as it implements feature bar. In my case, I wanted libpcre3 8.20+ in order to enable PCRE JIT. Tough luck. Not even Debian sid packages 8.20.

Now I know that there’s apt-cache rdepends, but it lists all the reverse dependencies of a specific package. I needed just the reverse dependencies of the installed packages. With a little bash-fu, here it goes:

function package_rdepends
{
	for package in $(apt-cache rdepends $1 | grep -Ev "^$1$" | grep -v 'Reverse Depends:')
	do
		apt-cache policy $package | grep 'Installed: (none)' > /dev/null 2>&1
		if [ $? -eq 1 ]
		then
			echo $package
		fi
	done
}
package_rdepends $1 | sort -u

Saved as installed-rdepends. Made executable.

./installed-rdepends libpcre3

The above script may be slow for packages with many reverse dependencies due to the fact that each package has an individual lookup. Didn’t have the patience to measure the time it takes to do a lookup for libc6. Some benchmarks for the package lookup:

time apt-cache policy libpcre3 | grep 'Installed: (none)' > /dev/null 2>&1
real	0m0.006s
user	0m0.005s
sys	0m0.003s
time dpkg -L libpcre3 > /dev/null 2>&1
real	0m0.017s
user	0m0.012s
sys	0m0.005s
time dpkg -l libpcre3 > /dev/null 2>&1
real	0m0.667s
user	0m0.600s
sys	0m0.067s
time dpkg -s libpcre3 > /dev/null 2>&1
real	0m0.587s
user	0m0.533s
sys	0m0.054s
time cat /var/lib/dpkg/available | grep -E "Package: libpcre3$" > /dev/null 2>&1
real	0m0.034s
user	0m0.015s
sys	0m0.048s

However, I didn’t reproduce these results on a bare metal installation.

Inlining the PEM encoded files in node.js
Fri, 07 Dec 2012 15:14:42 +0000

Multi line strings in JavaScript are a bitch. At least till ES6. The canonical example for a node.js HTTPS server is:

// curl -k https://localhost:8000/
var https = require('https');
var fs = require('fs');
var options = {
  key: fs.readFileSync('test/fixtures/keys/agent2-key.pem'),
  cert: fs.readFileSync('test/fixtures/keys/agent2-cert.pem')
};

https.createServer(options, function (req, res) {
  res.writeHead(200);
  res.end("hello world\n");
}).listen(8000);

All fine and dandy as the sync operation doesn’t penalize the event loop. It is associated with the server startup cost. However, jslint yells about using sync operations. As the code is part of the boilerplate for testing http-get, refactoring didn’t make enough sense. Making jslint STFU is usually the last option. The content of the files never changes, therefore it doesn’t make any sense to read them from the disk either. Inlining is the obvious option.

Couldn’t find any online tool to play with. Therefore I fired up a PHP REPL, then used my PCRE-fu to solve this one. The solution doesn’t look pretty, but it gets the job done:

php > var_dump(preg_replace('/\n/', '\n\\' . "\n", file_get_contents('server.key')));
string(932) "-----BEGIN RSA PRIVATE KEY-----\n\
-----END RSA PRIVATE KEY-----\n\
php > var_dump(preg_replace('/\n/', '\n\\' . "\n", file_get_contents('server.cert')));
string(892) "-----BEGIN CERTIFICATE-----\n\
-----END CERTIFICATE-----\n\
php >

This gave me usable multi line strings that don’t break the PEM encoding.

Update: shell one liner with Perl

cat certificate.pem | perl -p -e 's/\n/\\n\\\n/'
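
The same replacement can also be done from node.js itself. A minimal sketch, where inlinePem is a made-up name and the sample string is placeholder content rather than a real certificate:

```javascript
'use strict';

// Replaces every newline with a literal "\n", a backslash and a real newline,
// which is the same transformation as the PHP and Perl snippets above.
function inlinePem(pem) {
  return pem.replace(/\n/g, '\\n\\\n');
}

// Placeholder content, not a real certificate.
const sample = '-----BEGIN CERTIFICATE-----\nMIIB...\n-----END CERTIFICATE-----\n';
console.log('var cert = "' + inlinePem(sample) + '";');
```

The trailing backslash on each line is a line continuation inside the string literal, so the pasted result evaluates back to the original PEM content.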
Doing what Dropbox is doing and doing it wrong
Thu, 06 Dec 2012 16:35:30 +0000

Let’s take a couple of examples. Switched from an older machine recently, therefore I need to set up all my stuff. As I don’t like to depend on a single service, for redundancy’s sake, I also keep a backup for Dropbox.

SpiderOak – backs up stuff, uses client side encryption, has optional sync between your machines. So far, so good. In the latest OS X client, at least, the possibility to paste the password is missing. Thanks, I’ll use my password manager instead with services that don’t do such a braindead thing. Seriously, there’s a thing that improves the security of the password authentication. It is called two factor authentication. Dropbox has it. Google has it. In fact, any decent service has it. Disabling the possibility to paste the password, not so much.

Google Drive – you wouldn’t think I’m letting Google off the hook this time. As I don’t trust these sync services with my data, I always do client side encryption. Dropbox doesn’t choke on it, SpiderOak doesn’t choke on it. Google Drive must be a special kind of breed as it chokes on my encrypted files with “Upload Error – An unknown issue has occurred”. Gee, let me fix the error message for you: “your piece of shit encrypted files aren’t of any use for us, there’s no personal info there”. Was it that difficult? Thanks, but the market is full of alternatives. Seriously Google, you could do better than this “not being evil” thing.

Async frameworks “Hello World” showdown
Sat, 12 May 2012 14:05:35 +0000

This is not intended to be a proper comparison between these frameworks. However, since the “Hello World” test is the lowest common denominator, it is a pretty clear indicator that an application can’t exceed these numbers in performance. Also, what Guillermo did not understand from my comment is the fact that 1000 requests at a concurrency of 10 is way too few to get a proper picture of a “Hello World” showdown.

Tested frameworks:

  • node.js – v0.6.17
  • vert.x – v1.0 final + OpenJDK 7 installed from the Ubuntu repository – using the JavaScript bindings
  • luanode – built from the master branch using the Ubuntu provided lua dependencies
  • luvit – built from the master branch
  • react – cloned the master branch

I also wanted to test node.native, but it kept crashing on me. You can see that it is a pretty old issue. I didn’t have the patience to make the v0.1.0 branch work with the previously used code. But I’d like to give it a run for its money.

The system used for the testing is a modest Athlon II X2 240e (2.8GHz) with 4GB of DDR2 800MHz running the latest Kubuntu 12.04 LTS amd64. Since ab pretty much takes a CPU core for itself, the frameworks ran a single process that occupied a single CPU core. I tried running a node.js HTTP server wrapped with the cluster module, and passing -instances 2 to the vertx framework. The results were pretty much the same, therefore using just a single CPU core is a fair comparison.

The ab command that I used to hammer the Hello World! output:

ab -r -k -n 1000000 -c 1000{port_name}/

The command ran at least a couple of times before saving the results. Just to make sure that everything is properly warmed up.
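
For readers who don’t want to open the gist, the kind of server being hammered can be sketched roughly as below. This is a minimal sketch and not the exact code from the gist; the names helloHandler and createHelloServer, as well as port 8080, are assumptions made for illustration.

```javascript
var http = require('http');

// Request handler: always answers with the same plain-text body.
function helloHandler(req, res) {
    var body = 'Hello World!';
    res.writeHead(200, {
        'Content-Type': 'text/plain',
        'Content-Length': Buffer.byteLength(body)
    });
    res.end(body);
}

// Builds the server without binding it to a port.
function createHelloServer() {
    return http.createServer(helloHandler);
}

// To actually serve requests (port 8080 is an arbitrary choice):
// createHelloServer().listen(8080);
```

Sending an explicit Content-Length matters for a test like this, since ab runs with keep-alive (-k) enabled.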

The averages graph:

The test sources and the full ab output are available on this gist. There’s interesting output in the results.txt file for the stats nerds.

PS: I have the impression (but did not test) that vert.x may be a little bit faster, but ab is the actual bottleneck.

Update: added React (node.php) to the comparison. Too lazy to plot another graph. But at 1573.40 req/s, it is hardly a match even for luanode. Used the PHP 5.3.10 from the Ubuntu repositories.

Update: added another React (node.php) run to the comparison, but with a custom build of PHP 5.4.3. This time, it managed to get 3727.49 req/s.

When not to use Amazon’s SimpleDB
Fri, 11 May 2012 15:18:30 +0000

When it turns out that the cost for keeping a few gigabytes of data is too fucking much.

When it turns out that it is not keeping the most basic promises. The AWS marketing machine did it. Again.

When it turns out that the latency is absolutely crap. I mean, SDB vs. RDS, as shown by New Relic: 183 ms vs. 1.6 ms. And I’m only talking about averages. Plotting the whole stuff on a graph along with the standard deviation will drive a statistician insane.

I could go about this all day long. But why bother.

Poor man’s tail recursion in node.js
Fri, 30 Mar 2012 08:14:15 +0000

If you find yourself in the situation of doing recursion over a large-enough input in node.js, you may encounter this:

        throw e; // process.nextTick error, or 'error' event on first tick
RangeError: Maximum call stack size exceeded

Oops, I smashed the stack. You may reproduce it with something like this:

var foo = []
for (var i = 0; i < 1000000; i++) {
    foo.push(i)
}

var recur = function (bar) {
    if (bar.length > 0) {
        var baz = bar.pop()
        // do something with baz
        recur(bar) // plain recursive call: one stack frame per item
    } else {
        // end of recursion, do your stuff
    }
}

recur(foo)

“Thanks, that’s very thoughtful. But you’re not helping.” Bear with me. The solution is the obvious tail call elimination. But JavaScript doesn’t have that optimization.

However, you may wrap the tail call in order to call the above recur() function in a new stack. The proper recur() implementation is:

var recur = function (bar) {
    if (bar.length > 0) {
        var baz = bar.pop()
        // do something with baz
        process.nextTick(function () {
            // the next step starts on a fresh stack
            recur(bar)
        })
    } else {
        // end of recursion, do your stuff
    }
}

Warning: please read this carefully. I gave you the solution for recurring over such a large input, but the performance is poor. Using process.nextTick (or a timer function such as setTimeout for that matter, which is slower BTW) is an expensive operation. I didn’t test where the actual bottleneck is (epoll itself under Linux, libuv | libev, etc).

time node recur.js
node recur.js  1.36s user 0.28s system 101% cpu 1.610 total

The cost of this method is high. Therefore, don’t attempt this in a web application. It kills the event loop. For instance, I don’t use node for writing web applications. It is a difficult task, while the cost of the event loop itself isn’t as negligible as you may think. It is useful as long as the CPU time is negligible compared to the time spent doing IO. Therefore, please, don’t include me in the group of people who think about node as the hammer for all the problems you throw at it.

If you’re wondering why I don’t simply iterate over the object, the answer is simple: because that “do something with baz” involves some async IO that would kill the second data provider. Sequential calls ensure that everybody in the architecture stays happy. Besides, I don’t actually use bar.pop(), but something like bar.splice(0, 5000) for packing more data into fewer remote calls and fewer events. bar.shift() in a situation like this is as slow as molasses in January. In an async framework, the order of the items from a TODO list is not relevant, therefore use the fastest way.
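
The batching idea can be sketched as below. This is a minimal sketch under my own assumptions, not the code from the actual application: drain, handleBatch and schedule are hypothetical names, and handleBatch stands in for the async IO step, which must call next() when it is done.

```javascript
// Hypothetical helper: consumes `items` in batches of `batchSize`.
// `handleBatch(batch, next)` represents the async IO step.
// `schedule` defaults to process.nextTick, as in the recur() example.
function drain(items, batchSize, handleBatch, done, schedule) {
    schedule = schedule || process.nextTick;
    if (items.length === 0) {
        return done();
    }
    // splice(0, n) removes and returns up to n items from the front
    var batch = items.splice(0, batchSize);
    handleBatch(batch, function () {
        // defer the next step so that each batch starts on a fresh stack
        schedule(function () {
            drain(items, batchSize, handleBatch, done, schedule);
        });
    });
}

// Usage with a small made-up TODO list
var todo = [];
for (var i = 0; i < 12; i++) {
    todo.push(i);
}
drain(todo, 5, function (batch, next) {
    console.log('sending ' + batch.length + ' items');
    next();
}, function () {
    console.log('all done');
});
```

The scheduler is a parameter so that a different deferral function, such as setImmediate on node versions that provide it, can be swapped in without touching the rest.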

If you’re still wondering why I use a solution like this, the above technique is part of the start-up cost. The application fetches all the required data into RAM. Having the application kill the event loop for 20-30 seconds before hitting the Internet pipe is negligible for a process that runs for hours or days. Only after the application hits the Internet can I say that node is in use for the stuff where it shines. I know, before this, I listed all the wrong reasons for using node as a tool.

Computing file hashes with node.js
Tue, 01 Nov 2011 10:52:38 +0000

Since node.js has the shiny crypto module which binds some stuff to the openssl library, people might be tempted to compute file hashes with node.js. At least the crypto manual page shows how to do a SHA1 for a given file (mimics sha1sum). Should people do this? The answer is: NO. Some may say because it blocks the event loop. I say: because it is as slow as molasses in January. At least compared to dedicated tools.

Let’s have a look:

var filename = process.argv[2];
var crypto = require('crypto');
var fs = require('fs');
var shasum = crypto.createHash('sha256');
var s = fs.createReadStream(filename);
s.on('data', function(d) {
  // feed every chunk into the hash
  shasum.update(d);
});
s.on('end', function() {
  var d = shasum.digest('hex');
  console.log(d + '  ' + filename);
});

time node hash.js ubuntu-10.04.3-desktop-i386.iso
208fb66dddda345aa264f7c85d011d6aeaa5588075eea6eee645fd5307ef3cac ubuntu-10.04.3-desktop-i386.iso
node hash.js ubuntu-10.04.3-desktop-i386.iso 28.92s user 0.80s system 100% cpu 29.661 total

time sha256sum ubuntu-10.04.3-desktop-i386.iso
208fb66dddda345aa264f7c85d011d6aeaa5588075eea6eee645fd5307ef3cac ubuntu-10.04.3-desktop-i386.iso
sha256sum ubuntu-10.04.3-desktop-i386.iso 4.86s user 0.21s system 99% cpu 5.093 total

time openssl dgst -sha256 ubuntu-10.04.3-desktop-i386.iso
SHA256(ubuntu-10.04.3-desktop-i386.iso)= 208fb66dddda345aa264f7c85d011d6aeaa5588075eea6eee645fd5307ef3cac
openssl dgst -sha256 ubuntu-10.04.3-desktop-i386.iso 4.40s user 0.17s system 100% cpu 4.567 total

Edit: to sum up for those with little patience:

node hash.js – 29.661s
sha256sum – 5.093s
openssl dgst -sha256 – 4.567s
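
The ratios between these wall-clock totals can be checked with a few lines. This is just a sketch of the arithmetic; the slowdown helper is a made-up name, and the numbers are copied from the runs above.

```javascript
// Wall-clock totals in seconds, copied from the runs above.
var timings = {
    'node hash.js': 29.661,
    'sha256sum': 5.093,
    'openssl dgst -sha256': 4.567
};

// How many times slower `slow` is compared to `fast`.
function slowdown(slow, fast) {
    return slow / fast;
}

console.log(slowdown(timings['node hash.js'], timings['openssl dgst -sha256']).toFixed(1)); // 6.5
console.log(slowdown(timings['node hash.js'], timings['sha256sum']).toFixed(1)); // 5.8
```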


That’s a ~6.5X speed boost just by invoking openssl alone instead of binding to its library. node.js does something terribly wrong somewhere since the file I/O is not to blame for the slowness:

var filename = process.argv[2];
var fs = require('fs');
var s = fs.createReadStream(filename);
s.on('data', function(d) {
  // read the chunk and throw it away
});
s.on('end', function() {
  // nothing left to read
});

time node read.js ubuntu-10.04.3-desktop-i386.iso
node read.js ubuntu-10.04.3-desktop-i386.iso 0.62s user 0.60s system 106% cpu 1.148 total

This little example that I hacked together shows that using child_process.exec is pretty fine:

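var exec = require('child_process').exec;
exec('/usr/bin/env openssl dgst -sha256 ' + process.argv[2], function (err, stdout, stderr) {
	if (err) {
		// assumed error handling: pass along whatever openssl complained about
		process.stderr.write(stderr);
	} else {
		// the digest is the last 64 characters before the trailing newline
		console.log(stdout.substr(-65, 64));
	}
});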

time node hash2.js ubuntu-10.04.3-desktop-i386.iso
node hash2.js ubuntu-10.04.3-desktop-i386.iso 4.44s user 0.19s system 100% cpu 4.630 total

So you can have your cake and eat it too. The guys with the philosophy got this one right.
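
A slightly more defensive variant of the same idea can be sketched with child_process.execFile, which doesn’t pass the file name through a shell. This is a minimal sketch, not what I benchmarked above: parseDigest and sha256File are hypothetical names, and it assumes that an openssl binary is available in the PATH.

```javascript
var execFile = require('child_process').execFile;

// Extracts the hex digest from a line like "SHA256(file)= 208f...".
function parseDigest(stdout) {
    var match = /=\s*([0-9a-f]+)\s*$/i.exec(stdout);
    return match ? match[1] : null;
}

// Hypothetical wrapper: hashes `filename` with the openssl binary and
// hands the digest (or the error) to `callback`.
function sha256File(filename, callback) {
    execFile('openssl', ['dgst', '-sha256', filename], function (err, stdout) {
        if (err) {
            return callback(err);
        }
        callback(null, parseDigest(stdout));
    });
}

// Usage:
// sha256File('ubuntu-10.04.3-desktop-i386.iso', function (err, digest) {
//     console.log(err || digest);
// });
```

Parsing what follows the equals sign instead of counting characters from the end keeps the code working if the label in front of the digest changes.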

Will it recur? Part 2: in depth analysis
Wed, 12 Oct 2011 12:54:14 +0000

The social experiment

This first chapter is not about recursion. One member of the community wrote that certain inflammatory statements that I use may upset people. I replied with: “buzz marketing”. Neutral articles, with neutral titles, written by nobodies like me, gain zero traction, although I may write something that’s technically sound. Cheap journalism has more success. I even have graphs to prove it now.

The second thing is the usefulness of my little experiment. I don’t know about others, but curiosity was the main thing behind my whole benchmark. If it isn’t useful for some people, it doesn’t mean that it isn’t useful for others.

The other thing: the lack of tail recursion. I mean, do you need a “DUH” award, or something? The whole point of a “bad” algorithm that’s mathematically correct (well, almost, I stated that fibonacci(0) is wrong) is to prove how smart specific compilers are regarding recursion. The rest simply do brute force.

Patterns that emerge

The numbers say something if you know how to read the page. There are runtimes that are optimized for doing proper recursion without bothering the programmer with it: C, D, PyPy, V8, LuaJIT, JVM. The rest aren’t: PHP, CPython, Ruby, Perl, Lua. PyPy and V8 could do better. LuaJIT is already close to the speed of unoptimized C and D. V8 isn’t the king of the hill if you take Ruby (MRI / KRI), CPython, plain Lua VM, and PHP (Zend Engine) out of the equation. This may be another opportunity to get bashed by the node.js benchmark police with “this is irrelevant” statements, although this wasn’t something that I wanted to prove.

Thing is, for most of my web development work, I rarely needed to actually solve purely recursive problems. At most a fairly simple tree. Sometimes even that simple tree didn’t actually require recursion. Therefore I get why some don’t optimize for this specific case, although they refer to the thing as being “a general purpose language”.

For the “write better algorithms” crowd … WHY? The difference between C’s 0.6 seconds and Ruby’s 5 minutes doesn’t ring any bell that some things are fundamentally flawed regarding recursion?

As for the edge cases, there are 3rd party libraries that solve this issue without bothering the programmer. Or for other edge cases, such as applications that do complicated stuff, operating at Google-like scale, there are better tools that most mere mortals won’t use. The fact that some implementations do poor recursion is indeed irrelevant when the problems you’re trying to solve don’t include this.

In the end

Initially I wanted to try more stuff such as factorial, Euclid’s GCD, or the Ackermann function, for example. Try them on runtimes that don’t take longer than the next ice age to return a value. But why bother, except maybe to give the “one true way of doing recursion in functional languages” programmers a reason to bash stuff without returning any useful output. Not even an academic paper. It’s not productive.

But the question is: will it recur? Part 1: fibonacci(40)
Mon, 10 Oct 2011 14:10:51 +0000

A rant, maybe a bad rant due to babbling about philosophical reasons, made me wonder how the programming languages stack up against this stuff: recursion. Or should I say the runtimes, since the programming language itself is nothing but a bunch of text. I know that the algorithm itself is bad, but that’s the whole point. I know that fibonacci(0) yields a wrong result, but for the sake of laziness, I kept the original algorithm.

The source code of all the tests is available here in order to make the tests reproducible. There wasn’t a high number of runs, particularly for the runtimes that take longer than the next ice age. But the results are pretty consistent for specific runtimes. The relevant system specs are: Ubuntu 10.04 amd64 (up to date), Q9400 CPU.
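
For those who don’t want to fetch the sources, the algorithm looks roughly like the sketch below. This is a minimal sketch that assumes the JavaScript variant mirrors the Lua script quoted in the Lua section; it is not copied from the test sources, and it calls fibonacci(20) instead of fibonacci(40) so that it returns right away.

```javascript
// Naive doubly recursive Fibonacci. Like the benchmarked algorithm, it
// returns 1 for n < 2, which is why fibonacci(0) is off.
function fibonacci(n) {
    if (n < 2) {
        return 1;
    }
    return fibonacci(n - 2) + fibonacci(n - 1);
}

// The benchmark uses 40; 20 keeps this sketch quick.
console.log(fibonacci(20));
```

There’s no memoization and no accumulator on purpose: the number of calls grows exponentially with n, which is exactly what makes it a decent stress test for function calls.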

Now, less talk, more results.

JavaScript (node.js/V8)

node -v: v0.4.12
time node fib.js
node fib.js 6.40s user 0.02s system 99% cpu 6.423 total
node fib.js 6.39s user 0.02s system 99% cpu 6.410 total

It may seem slow, but for a language with dynamic typing, it puts the rest from the same category to shame. Or most of them. Bear with me.

PHP

php -v: PHP 5.3.8 (cli)
time php fib.php
php fib.php 77.57s user 0.06s system 99% cpu 1:17.66 total
php fib.php 78.05s user 0.07s system 99% cpu 1:18.18 total

Compared to the V8 runtime, PHP seems to take an eternity. It happens that PHP isn’t bad at recursion because it uses the stack, but because of the runtime’s lack of speed. But we’re not even halfway there. Stay tuned. PHP isn’t the only thing that sucks at recursion.

C

gcc -v: gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
make fib
time ./fib
./fib 2.84s user 0.00s system 100% cpu 2.840 total
./fib 2.84s user 0.00s system 100% cpu 2.835 total

It wasn’t a surprise that C came up to this result. Which makes the V8 result even more interesting.

Edit: I forgot about the compiler optimization. Caught by Jabbles on HN.

gcc -O4 fib.c -o fib
time ./fib
./fib 0.65s user 0.01s system 100% cpu 0.657 total
gcc -O3 fib.c -o fib
time ./fib
./fib 0.66s user 0.00s system 100% cpu 0.657 total
gcc -O2 fib.c -o fib
time ./fib
./fib 1.54s user 0.00s system 100% cpu 1.535 total
gcc -O1 fib.c -o fib
time ./fib
./fib 0.00s user 0.00s system 0% cpu 0.001 total
time ./fib
./fib 2.06s user 0.00s system 99% cpu 2.060 total

For some reason, the O1 flag hates this code. Printing fibonacci(40) yields a result closer to the result without any O flag. This brings it past the Java result, but only for O3+.
/End Edit.

Lua

lua -v: Lua 5.1.4
time lua fib.lua
lua fib.lua 28.02s user 0.03s system 99% cpu 28.081 total
lua fib.lua 28.86s user 0.02s system 99% cpu 28.883 total

./luajit -v: LuaJIT 2.0.0-beta8
time ./luajit fib.lua
./luajit fib.lua 10.59s user 0.00s system 99% cpu 10.591 total
./luajit fib.lua 10.58s user 0.01s system 99% cpu 10.610 total

Tested both of the implementations that I know of. I guess this article isn’t that funny for the generations of Lua coders that laugh about V8 in somebody’s face. Don’t get me wrong, I like Lua due to its simplicity, but in the speed realm, I still need to do some tests to verify some of those claims that sometimes appear to be overly inflated.


Edit: with the following Lua script:

local function fibonacci(n)
	if n < 2 then
		return 1
	else
		return fibonacci(n - 2) + fibonacci(n - 1)
	end
end
fibonacci(40)

the results are getting better:

time lua fib.lua
lua fib.lua 24.17s user 0.08s system 99% cpu 24.281 total
lua fib.lua 24.24s user 0.01s system 99% cpu 24.307 total

[with LuaJIT v2.0.0-beta8 GIT HEAD]
time ./luajit fib.lua
./luajit fib.lua 2.02s user 0.00s system 99% cpu 2.026 total
./luajit fib.lua 2.02s user 0.00s system 99% cpu 2.023 total

Now, some of the Lua chops can have a lulz about V8. This project is getting more interesting, especially for pairing LuaJIT with luafcgid. I forgot about the local keyword since my Lua experience is limited to basic testing. Nice comeback!

/End Edit.

Python

python -V: Python 2.6.5
time python
python 59.42s user 0.02s system 99% cpu 59.494 total
python 59.27s user 0.05s system 99% cpu 59.375 total

make -j 4
./python -V: Python 2.7.2
time ./python
./python 61.29s user 0.03s system 99% cpu 1:01.35 total
./python 61.38s user 0.06s system 99% cpu 1:01.48 total

make -j 4
./python -V: Python 3.2.2
./python 71.23s user 0.08s system 99% cpu 1:11.33 total
./python 70.31s user 0.06s system 99% cpu 1:10.39 total

./pypy -V
Python 2.7.1 (d8ac7d23d3ec, Aug 17 2011, 11:51:19)
[PyPy 1.6.0 with GCC 4.4.3]
./pypy 4.61s user 0.07s system 99% cpu 4.708 total
./pypy 4.81s user 0.01s system 99% cpu 4.853 total

I happen to have 2.6.5 around because Ubuntu says so. But in order to make the potential trolls STFU about not using the latest versions, I made some fresh builds of 2.7.2 and 3.2.2. It gets even suckier with recent versions. In fact, the CPython runtime is struggling to catch up with the PHP runtime in the slowness realm. The only Python runtime that is actually very impressive about the recursion stuff is PyPy. Which brings me to the first statement: the language is just a bunch of text. The runtime is the piece that sucks or does not suck. PyPy proves that with talented people shepherding the project, the language of the runtime implementation is quite irrelevant. This is the first implementation of a JIT that passes V8 as well.

Ruby

ruby -v: ruby 1.8.7 (2010-01-10 patchlevel 249) [x86_64-linux]
time ruby fib.rb
ruby fib.rb 233.40s user 66.94s system 99% cpu 5:00.55 total
ruby fib.rb 231.75s user 68.08s system 99% cpu 4:59.99 total

make -j 4
./miniruby -v: ruby 1.9.3dev (2011-09-23 revision 33323) [x86_64-linux]
time ./miniruby fib.rb
./miniruby fib.rb 36.05s user 0.01s system 99% cpu 36.073 total
./miniruby fib.rb 35.92s user 0.04s system 99% cpu 35.978 total

Every time a Ruby fan says that “thou shalt not care about the runtime speed”, it makes me laugh so hard that I burst into tears. Seriously, I couldn’t imagine that MRI sucks that hard at recursion. I barely had the patience to even run this code. KRI washes away part of the shame though, as it scores close to the Lua implementation. If you’re asking why I used the miniruby binary, the reason is that the ruby binary complained about not having rubygems.rb. I am bad at figuring out what’s missing from a Ruby stack. But it made fib.rb work.

Perl

perl -v: This is perl, v5.10.1 (*) built for x86_64-linux-gnu-thread-multi
time perl
perl 125.60s user 0.17s system 99% cpu 2:05.92 total
perl 124.06s user 0.10s system 99% cpu 2:04.20 total

./Configure [accepted all defaults, specifically built without threading]
./perl -v: This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux
./perl 100.12s user 0.09s system 99% cpu 1:40.29 total
./perl 100.38s user 0.05s system 99% cpu 1:40.63 total

At first I didn’t want to bother with Perl, but then I remembered the legions of Perl fans ranting about the PHP recursion. I know that this is an inflammatory statement, but next time, people, please keep up with the facts. I guess you aren’t that smug now.

D

gdc -v: gcc version 4.3.4 (Ubuntu 1:1.046-4.3.4-3ubuntu1)
gdc -o fib fib.c (same source as the C binary)
time ./fib
./fib 2.82s user 0.00s system 100% cpu 2.817 total
./fib 2.82s user 0.00s system 100% cpu 2.814 total

Predictable results from a language from the same family as C/C++. Slightly faster binary than the C version (although the same source code), but I guess most people won’t notice.

Java

javac -version: gcj-4.4 (Ubuntu 4.4.3-1ubuntu4.1) 4.4.3
java -version
java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.9) (6b20-1.9.9-0ubuntu1~10.04.2)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
time java fib
java fib 0.86s user 0.02s system 99% cpu 0.882 total
java fib 0.86s user 0.02s system 100% cpu 0.872 total

I admit that I sometimes tell this joke: knock! knock!; who’s there?; [very long pause]; Java. I guess that now is a good time to eat my own words. Not only does Java put the other JIT implementations and the rest of the VMs to shame, it also obliterates the statically compiled C and D binaries at their own favorite game aka the runtime speed. My first reaction was: WTF, there’s got to be a mistake! Printing some junk to STDOUT confirmed the same results between C and Java. Newbie warning: this is my first Java application. No, really! Don’t bash me for the lack of understanding of the usage of the static keyword. I don’t understand if it actually helps the runtime. I managed to put together the code by reading how to write a simple HelloWorldApp. Experienced Java chops may explain it though.

Use the cache, Luke, Part 1: from memcached to Membase memcached buckets
Wed, 21 Sep 2011 13:07:44 +0000

I start with a quote:

Matt Ingenthron said internally at Membase Inc they view Memcached as a rabbit. It is fast, but it is pretty dumb and procreates quickly. Before you know it, it will be running wild all over your system.

But this post isn’t about switching from a volatile cache to a persistent solution. It is about removing the dumb part from the memcached setup.

We started with memcached as this is the first step. The setup had its quirks since AWS EC2 doesn’t provide a fixed addressing method by default, while the memcached client for PHP still has issues with the timeouts. Therefore, the fallback was the plain memcache client.

The fixed addressing issue was resolved by deploying Elastic IPs with a little trick for the internal network, as explained by Eric Hammond. This might be unfeasible for large enough deployments, but it wasn’t our case. Amazon has introduced ElastiCache since then, which removes this limitation, but having a bunch of t1.micros with reservation is still way cheaper. Which makes me wonder why they won’t introduce machine addresses which internally resolve to the internal address. They have this technology for a lot of their services, but it is simply unavailable for plain EC2 instances.

Back to the memcached issues. Having a Membase cluster that provides a memcached bucket is a nice drop-in replacement, if you lower your memory allocation a little bit. Membase over memcached still has some overhead as its services tend to occupy more RAM. The great thing is that the cluster requires fewer machines with fixed addressing. We use a couple for high availability reasons, but this is not the rule. The rest have the EC2 provided dynamic addresses. If a machine happens to go down, another one can take its place.

But there still is the client issue. memcached for PHP is dumb. memcache for PHP is even dumber. None of these can actually speak the Membase goodies. This is the part where Moxi (Memcached Proxy) kicks in. For memcached buckets, Moxi can discover the machines newly added to the Membase cluster without any client configuration, and without any Moxi server configuration, as the config is streamed to the servers via the machines that have the fixed addresses. With plain memcached, every time there was a change, we needed to deploy the application. The memcached cluster was basically nullified till it was refilled. That doesn’t happen with Moxi + Membase. Since there is no “smart client” for PHP which includes the Moxi logic, we use client side Moxi in order to reduce the network round-trips. There still is a local communication over the loopback interface, but the latency is far smaller than doing server-side Moxi. Basically the memcache for PHP client connects to the local address where Moxi lives, then the request hits the appropriate Membase server that holds our cached data. It also uses the binary protocol and SASL authentication, which are unsupported by the memcache for PHP client.

The last of the goodies about the Membase cluster: it actually has an interface. I may not be a UI fan, I live most of my time in /bin/bash, but I am a stats junkie. The Membase web console can give you realtime info about how the cluster is doing. With plain memcached you’re left in the dust with wrapping up your own interface or calling stats over plain TCP. Which is so wrong on so many levels.

PS: v2.0 will be called Couchbase for political reasons. But currently the stable release is still called Membase.

Why sometimes I hate RFCs
Wed, 21 Sep 2011 12:09:37 +0000

Every time there’s a debate about the format of something that floats around the Internets, people go to RFCs in order to figure out who’s right and who’s wrong. Which may be a great thing in theory. In practice, the rocket scientists that wrote those papers might squeeze a lot of confusion into a single page of text, as the G-WAN manual states.

Today’s case was a debate about the Expires header timestamps as defined by the HTTP/1.1 specs (RFC 2616). If you read the 14.21 section regarding the Expires header, you can see the following statement:

The format is an absolute date and time as defined by HTTP-date in section 3.3.1; it MUST be in RFC 1123 date format:

Expires = "Expires" ":" HTTP-date

I made a newb mistake in thinking that all RFC 1123 dates are legal Expires timestamps. Actually, by carefully reading section 3.3.1 of RFC 2616 you may deduce the following: the dates in use by the HTTP/1.1 protocol are not arbitrary dates in the RFC 1123 format; the actual format is a subset of RFC 1123. The debate started around the GMT specification, which in the HTTP/1.1 context is actually UTC, but it must be specified as GMT anyway. Even more, +0000, which is a valid timezone specifier as defined by RFC 1123, is not valid for Expires timestamps. Although some caches accept +0000 as a valid timezone specifier for the HTTP timestamps, some of them don’t.
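
To make the difference concrete, here is a minimal sketch in JavaScript. It relies on Date.prototype.toUTCString(), which produces the fixed-length form ending in GMT; the helper names formatHttpDate and looksLikeHttpDate are made up for illustration, and the regular expression only checks the shape of the string, not whether the date itself is valid.

```javascript
// Formats a Date as an HTTP-date, e.g. "Wed, 21 Sep 2011 12:09:37 GMT".
function formatHttpDate(date) {
    return date.toUTCString();
}

// Shape check: day name, 2-digit day, month name, 4-digit year, time and
// the literal GMT. Anything else, +0000 included, is rejected.
var HTTP_DATE = /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun), \d{2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} \d{2}:\d{2}:\d{2} GMT$/;

function looksLikeHttpDate(value) {
    return HTTP_DATE.test(value);
}

console.log(formatHttpDate(new Date(Date.UTC(2011, 8, 21, 12, 9, 37))));
console.log(looksLikeHttpDate('Wed, 21 Sep 2011 12:09:37 GMT'));   // true
console.log(looksLikeHttpDate('Wed, 21 Sep 2011 12:09:37 +0000')); // false
```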

It isn’t that the RFCs are broken per se, but the language they use can be very confusing sometimes.

How to rotate the MySQL logs on Amazon RDS
Thu, 15 Sep 2011 13:40:44 +0000

One day we enabled MySQL’s slow_log feature as indicated by the RDS FAQ. That’s the (mostly) easy part. I say “mostly” because you need to add your own DB Parameter Group in order to enable the damn thing. Adding a group is easy. Editing it still requires you to use API calls (either via rds-api-tools or your own implementation).

Days started to fly, queries started to fill our log, we started to fix the slow points of the application. The thing that didn’t change is the fact that the mysql.slow_log table kept growing. Then I took some time to apply all my MySQL-fu regarding the cleanup of the mysql.slow_log table. Imagine my surprise when none of it worked. Since the master user of a RDS instance doesn’t have all the privileges, it wasn’t quite unexpected though.

For the first time, the AWS Premium Support was actually useful by sending one email that actually provides a solution. Imagine my surprise. The RDS team implemented a couple of stored procedures that can be used for rotating the slow log and the general log.

CALL mysql.rds_rotate_slow_log;
CALL mysql.rds_rotate_general_log;

Basically they move the content to a *_backup table while the original is replaced by an empty table. The exact quote:

When invoked, these procedures move the contents of the corresponding log to a backup table and clear the contents of the log. For example, invoking rds_rotate_slow_log moves the contents of the slow_log table to a new table called slow_log_backup and then clears the contents of the slow_log table. This is done by renaming tables, so no data is actually copied, making this a very light-weight, non-blocking procedure. Invoking the same procedure twice effectively purges the log from the database.

They have been present since March 22, 2010 but nobody took the time to document them, apparently. All I could find via online searches was utterly useless junk. I hope this saves some time for some poor chop in the same situation as I was.

About dumping errors on the screen
Tue, 13 Sep 2011 12:45:31 +0000

About a month ago I read an article about the possibility of using XSS vectors via the PHP error reporting. Nothing new under the Sun since the internals team loves to dismiss valid concerns with the “bogus” status. Happened before, will happen again.

The thing that sometimes pisses me off is stuff like this article written by somebody who understands little about web security. Writing the title with upper case letters doesn’t bring any value to the argument. So let’s dig a little bit deeper.

a PHP application that has display_errors enabled should never be in production

If you don’t see anything wrong with the quoted sentence, then you’re in the wrong field. Three of the most basic rules of security are:

  1. don’t rely on the defaults
  2. all input is evil
  3. don’t rely on a single layer of security

The application doesn’t have display_errors turned on. The PHP runtime does. Sometimes this may be on a shared host that doesn’t give a crap about your application. Or the sysadmin had a bad day and simply forgot something. Or, even better, somebody defined display_errors = true with php_admin, so the application can’t do anything about its own security context. “Should not | never” is an idiom that should not be used in any text which is security related. In the end, an attacker gains access through that critical piece of code that should not fail.

Which brings us to the next rule: the input validation. People should validate their input, but this doesn’t guarantee that mistakes don’t happen. PHP makes it quite easy to both develop an application fast and shoot yourself in the foot at the same time. Properly developed code takes a lot of effort. If you haven’t developed code that treats notifications as fatal exceptions, then you don’t have a clue what I’m talking about. I am not talking about hello.php applications, but large projects built for scale. A couple of years ago I was working on a project built on top of the Kohana framework that used this specific setup in development mode. The consequence was that many of the errors that usually creep into the production code were discovered during the development phase, while my team learned how to properly initialize all the stuff that floats around the application. Few people have the patience to work this way. Some of the stuff can be automatically handled via proper abstractions, some of the stuff can’t or it’s simply not practical. If you failed the input validation class, there’s the next point.

Finally, the third rule, which says that it’s bad practice to depend on a single layer of security, in this case simply turning off display_errors. If PHP allows you to shoot yourself in the foot by having display_errors = true on a production machine, it doesn’t mean that it should turn a harmless mistake (dumping the error reporting content to the screen) into a more severe one (having an XSS in the application). If you regard XSS as harmless, you have much to learn.

The next quote that also goes into the clueless realm is:

I don’t want to see anything to do with HTML if I’m not doing web-based programming with PHP (such as CLI).

Since PHP exposes the PHP_SAPI constant which tells exactly what SAPI is in use, I guess that the runtime itself is smart enough to know exactly where the output goes: down a network pipe as HTML or into STDOUT / STDERR. A smart enough runtime also knows when to encode its output as appropriate and when not to. The argument holds water maybe just for the embed SAPI which is seldom used. I dare you to find proper docs about using the embed SAPI in your C / C++ application.

I want my error messages to provide me with precise information. I do not want an error message to arbitrarily encode problematic code. If the user provided <s>test</s> then I want to see <s>test</s>. I do not want to see &lt;s&gt;test&lt;/s&gt; since that is not what was submitted and, possibly, not what the problem is.

These sentences enter the realm of self parody. If the user entered <s>test</s> that’s what you should see in an error output. That’s what you can find in an error log. Rendering the HTML markup is a browser feature, therefore the output should play by the browser rules. Would you also like to see <script> tags as HTML instead of “what the user actually entered”?
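
To show what “playing by the browser rules” means, here is a minimal sketch of output encoding. It is written in JavaScript purely for illustration and it is not what the PHP runtime does internally; escapeHtml is a made-up helper, and in PHP land the job is typically done by htmlspecialchars().

```javascript
// Minimal HTML escaping for text that ends up in a browser.
// The ampersand goes first so that the other entities aren't escaped twice.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<s>test</s>')); // &lt;s&gt;test&lt;/s&gt;
```

The browser renders the encoded string as the literal text the user typed, which is the precise information the quoted article asks for.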

Snapshots are not backups
Fri, 27 May 2011 08:07:44 +0000

Some people may slip the idea into your head that by doing snapshots, you’re free from the burden of doing proper backups. While this may sound good in theory, in practice there are a bunch of caveats. There are certain technologies that use the snapshot methodology at the core, but they make sure that your data isn’t corrupted. Some may even provide access to the actual file revisions.

Data corruption is the specific topic that snapshots simply don’t care about, at least in Amazon’s way of doing things. This isn’t exactly Amazon’s fault for EC2. EBS actually stands for Elastic Block Store. They provide you with block storage, and you do whatever you want with it. For RDS they should do a better job though, as it’s a managed service where you don’t have access to the actual instance. The issue is those ‘specialists’ who put emphasis on the ‘easy, cloud-ish way’ of doing backups by using snapshots. If you’re new to the ‘cloud’ stuff, as I used to be, you may actually believe that crap. As I used to believe.

A couple of real life examples:

  • An EBS-backed instance suffered some filesystem level corruption. Since EXT3 is not as smart as ZFS when it comes to silent data corruption, you may never know until it’s too late. Going back through revisions in order to find the last good piece of data is a pain. I could fix the filesystem corruption and I could retrieve the lost data, but I had to work quite a lot for that. Luck is an important skill, but I’d rather not put all my eggs into the luck basket.
  • An RDS instance ran out of space. There wasn’t a notification to tell me: ‘yo dumbass, ya ran out of space’. Statistically it shouldn’t have happened, but a huge data import proved me wrong. I increased the available storage. Problem solved. A day later, somebody accidentally dropped a couple of tables. I had to restore them. How? Take the latest snapshot, spin up a new instance, dig through the data. The latest snapshot contained a couple of databases corrupted by the space issue, one of them being the database I needed to restore. I had to spend a bunch of time repairing the database before the restoration process. Fortunately nothing really bad happened. But it was a signal that the RDS snapshot methodology is broken by design.

Lesson learned. The current way of doing backups puts the data, not the block storage, first. If you’re doing EBS snapshots as the sole method, you may need to rethink your strategy.
