Category Archives: Systems Administration

Using persistent OpenSSH connections

I found out that using persistent connections greatly improves productivity when working with SSH. However, finding the appropriate configuration turned out to be a complicated task. I wanted it to be as unobtrusive as possible, to restart the connection when the socket is closed, and to work without blocking timeouts.

After reading the ssh_config man page and some articles, here’s the best thing I came up with:

Host *
	ControlPath ~/.ssh/master-%r@%h:%p
	ControlMaster auto
	ControlPersist 4h
	TCPKeepAlive no
	GSSAPIAuthentication no
	ServerAliveInterval 60
	ServerAliveCountMax 2

The only issue with this configuration is with long host names: the expanded ControlPath is used as a UNIX socket path, so a really long name makes it hit the UNIX_PATH_MAX limit. Unfortunately, at the time of writing, the proper solution to this issue isn’t merged into upstream.
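A rough way to see whether a given host would run into the limit is to measure the expanded path. This is only a sketch: control_path_fits is a made-up helper, and the default limit of 104 is an assumption based on the smaller of the common sun_path sizes (108 bytes on Linux, 104 on OS X and the BSDs).

```shell
# Hypothetical helper: does an already expanded ControlPath fit in sun_path?
# $1 = expanded path, $2 = limit in bytes (assumed default: 104)
control_path_fits() {
	path="$1"
	limit="${2:-104}"
	# The path needs room for a trailing NUL byte, hence the strict comparison.
	# ${#path} counts characters, which equals bytes for plain ASCII paths.
	if [ "${#path}" -lt "$limit" ]; then
		echo "ok: ${#path} characters"
	else
		echo "too long: ${#path} characters (limit $limit)"
		return 1
	fi
}

# Expand %r@%h:%p by hand for the host you care about
control_path_fits "$HOME/.ssh/master-alice@a-really-long-hostname.example.com:22"
```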

OS X users who use brew can easily include the patch for the path issue by editing the openssh formula (for OpenSSH 6.6p1) with “brew edit openssh”:

  patch do
    url ""
    sha1 "31f6df29ff7ce3bc22ba9bad94abba9389896c26"
  end

With this patch, a value like ~/.ssh/master-%m works for ControlPath. %m is replaced by SHA1(lhost(%l) + rhost(%h) + rport(%p) + ruser(%r)) and it keeps things short and sweet.

portspoof trolling

Marius once told me about portspoof, a service that trolls people who run various scanners by feeding those scanners false results. The idea is good, but I’m wary of a service like this, as it is exactly the kind of service where you wouldn’t want a buffer overflow.

Giving it a run inside a VM, I noticed something odd on the lower ports when using nmap’s service and version detection probes. The results started to look like a pattern, so I widened the port range to 1-50. portspoof is indeed a tool that trolls baddies and pen testers.

Ran it with:

nmap -sV --version-all -p 1-50

Really smooth guys, really smooth. Sometimes you have to see the big picture:

Splitting a string every nth char in shell

I needed something reusable that splits a string every nth char. Then I remembered that bash and zsh, the shells that I usually use, support string slicing. Kinda like Python does. Or the other way around. Made a shell function. Dropped it into .bashrc / .zshrc. Enjoy.

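function string_split()
{
	# $1 = the string to split, $2 = the chunk length
	local str="$1" count="$2"
	while [ ! -z "$str" ]
	do
		echo "${str:0:$count}"
		# drop the chunk that was just printed
		str="${str:$count}"
	done
}

string_split abcd 2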
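For shells without string slicing, a similar result can be had with fold(1), which is part of POSIX. This is just a sketch: string_split_fold is a made-up name, and unlike a slicing loop it prints one empty line for an empty string, and it counts columns rather than characters.

```shell
# Hypothetical POSIX variant of the function above, built on fold(1)
string_split_fold() {
	# $1 = the string to split, $2 = the chunk width in columns
	printf '%s\n' "$1" | fold -w "$2"
}

# Prints "abcd" and "ef" on separate lines
string_split_fold abcdef 4
```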

Use the cache, Luke, Part 2: don’t put all your eggs into the memcached buck … basket

This is the second part of a series called: Use the cache, Luke. If you missed the first part, here it is: From memcached to Membase memcached buckets. Meanwhile, the AWS ElastiCache service proved to have better network latency than our self-managed Membase setup, so the migration was done by simply switching the memcached config. No vendor lock in.

However, it took me a while to write this second part.

[Embedded video]

Please have a look at the above video. Besides the general common sense guidelines about how to scale your stuff, and the typical Postgres stuff, there’s a general rule: cache, cache, and then cache some more.

However, too much caching in memcache (whatever the implementation) may kill the application at some point. The application may not be database dependent, but it is cache dependent. Anything that affects the cache may have the effect of a sledgehammer on your database. Of course, you can always scale that DB instance vertically, or scale horizontally by adding read-only replicas, but the not-so-fun part is that it costs a lot just to keep resources provisioned in order to survive a cache failure.

The second option is to have a short lived failover cache on the application server. Something like five minutes, while the distributed cache from memcache may last for hours. Enough to keep the database from being hit by live traffic, while you don’t have to provision a really large database instance. Of course, it won’t work with stuff that needs some “real time” junk, but it works with data that doesn’t change with each request.

There are a lot of options for a failover cache since there’s no distributed setup to think about. It may be a memcached daemon, something like PHP’s APC API, or the fastest option: file based caching. Now you may think that I’m insane, but memcached still has the IPC penalty, especially for TCP communication, while if you’re a PHP user, APC doesn’t perform as expected.

I say file based caching, not disk based caching, as the kernel does a pretty good job at “eating your RAM” with the disk caching stuff. It takes more work, since the cache management logic must be implemented in the application itself and you don’t get stuff like LRU, expiration, etc. by default, but for failover purposes it is good enough to be worth the effort. In fact, the application ran for a few days on the failover cache without any measurable impact.
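To make the idea concrete, here is a minimal sketch of a file based cache with expiration, written in shell rather than in the application’s language. It is not the code we ran in production. The names cached and CACHE_DIR, and the default directory, are made up for illustration; the key is assumed to be a safe file name, and find -mmin is a GNU/BSD extension rather than POSIX.

```shell
# Assumed location of the cache files; override it before calling cached
CACHE_DIR="${CACHE_DIR:-/tmp/failover-cache}"

# Hypothetical helper: cached <ttl_minutes> <key> <command...>
# Usage: cached 5 report expensive_command --flag
cached() {
	ttl="$1"; key="$2"; shift 2
	file="$CACHE_DIR/$key"
	mkdir -p "$CACHE_DIR" || return 1
	# find prints the path only if the file was modified less than ttl minutes ago
	if [ -n "$(find "$file" -mmin "-$ttl" 2>/dev/null)" ]; then
		cat "$file"
		return 0
	fi
	# Miss or expired: run the expensive command into a temp file
	tmp="$(mktemp "$CACHE_DIR/.tmp.XXXXXX")" || return 1
	if "$@" > "$tmp"; then
		# rename within the same directory, so readers never see a partial file
		mv "$tmp" "$file"
		cat "$file"
	else
		rm -f "$tmp"
		return 1
	fi
}
```

The page cache takes care of keeping hot files in RAM, so a hit is basically a stat and a read. There is no LRU here: expired entries are simply overwritten on the next miss.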

The next part of not using the same basket for all of your eggs is: cache everywhere you can. For example, by using the nginx FastCGI cache, we could shave off 40% of our CPU load. Nothing experimental about this last part. It has been in production for the last 18 months. If you get it right, it can be a really valuable addition to a web stack. However, a lot of testing is required before pushing the changes to production. We hit a lot of weird bugs for edge cases. The rule of thumb is: if you get the cache key right, then most of the issues are gone before going live.

In fact, by adding the cache control stuff from the application itself, we could push relatively short lived pages to the CDN edges, shaving off a lot of latency for repeated requests as there’s no round trip from the hosting data center to the CDN edge. Yes, it’s the latency, stupid. The dynamic acceleration that CDNs provide is nice. Leveraging the HTTP caching capabilities is nicer. Having the application in a data center closer to the client is desirable, but unless your target market is too widely distributed to be served by a bunch of machines in the same geo location, it doesn’t make any sense to deploy into a new data center, which adds its fair share of complexity when scaling the data layer.