Category Archives: Systems Administration

Splitting a string every nth char in shell

I needed something reusable that splits a string every nth char. Then I remembered that bash and zsh, the shells I usually use, support string slicing. Kinda like Python does. Or the other way around. Made a shell function. Dropped it into .bashrc / .zshrc. Enjoy.

function string_split()
{
	# $1 - the string to split, $2 - the chunk length
	local str="$1"
	local count="$2"
	while [ -n "$str" ]
	do
		# print the first $count chars, then drop them from the string
		echo "${str:0:$count}"
		str="${str:$count}"
	done
}

Example:

string_split abcd 2
ab
cd

Use the cache, Luke, Part 2: don’t put all your eggs into the memcached buck … basket

This is the second part of a series called: Use the cache, Luke. If you missed the first part, here it is: From memcached to Membase memcached buckets. Meanwhile, the AWS ElastiCache service proved to have lower network latency than the Membase setup we rolled out ourselves, so the migration was easily done by simply switching the memcached config. No vendor lock-in.

However, it took me a while to write this second part.

[Embedded video]

Please have a look at the video above. Besides the general common-sense guidelines about how to scale your stuff, and the typical Postgres stuff, there's a general rule: cache, cache, and then cache some more.

However, too much caching in memcache (whatever the implementation) may kill the application at some point. The application may not be database-dependent, but it becomes cache-dependent. Anything that affects the cache may have the effect of a sledgehammer on your database. Of course, you can always scale that DB instance vertically, or scale horizontally by adding read-only replicas, but the not-so-fun part is that it costs a lot just to keep the resources provisioned in order to survive a cache failure.

The second option is to have a short-lived failover cache on the application server. Something like five minutes, while the distributed cache from memcache may last for hours. Enough to keep the database from being hit by live traffic, while you don't have to provision a really large database instance. Of course, it won't work for stuff that needs to be "real time", but it works with data that doesn't change with each request.

There are a lot of options for a failover cache since there's no distributed setup to think about. It may be a local memcached daemon, something like PHP's APC, or, the fastest option: file-based caching. Now you may think that I'm insane, but memcached still has the IPC penalty, especially over TCP, and if you're a PHP user, APC doesn't perform as expected.

I say file-based caching, not disk-based caching, as the kernel does a pretty good job at "eating your RAM" with its disk caching. It takes more effort to implement since the cache management logic must live in the application itself; you don't get stuff like LRU or expiration by default. But for failover purposes it is good enough to be worth the effort. In fact, the application ran for a few days on the failover cache alone without any measurable impact.
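To make the idea more concrete, here is a minimal sketch of a file-based failover cache in shell, in the spirit of the other scripts on this blog. The cache directory, the TTL and the some_expensive_backend_call placeholder are made up for illustration; the real logic lives inside the application, not in a shell wrapper.

#!/bin/bash

# minimal file-based failover cache sketch; paths and the backend call
# are placeholders, not the production implementation
CACHE_DIR="/tmp/failover-cache"
TTL_MINUTES=5

function cache_get()
{
	local file="$CACHE_DIR/$1"
	# serve the entry only if it exists and is younger than TTL_MINUTES
	[ -f "$file" ] && [ -n "$(find "$file" -mmin "-$TTL_MINUTES")" ] && cat "$file"
}

function cache_set()
{
	mkdir -p "$CACHE_DIR"
	cat > "$CACHE_DIR/$1"
}

# serve from the failover cache, otherwise hit the backend and store the result
cache_get user_42 || some_expensive_backend_call | tee >(cache_set user_42)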

The next part of not putting all of your eggs into the same basket is: cache everywhere you can. For example, by using the nginx FastCGI cache, we could shave off 40% of our CPU load. Nothing experimental about this last part; it has been in production for the last 18 months. If you get it right, it can be a really valuable addition to a web stack. However, a lot of testing is required before pushing the changes to production, as we hit a lot of weird edge-case bugs. The rule of thumb is: if you get the cache key right, then most of the issues are gone before going live.
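For reference, a minimal sketch of what this looks like in the nginx config. The zone name, cache path, socket path and TTLs below are illustrative, not our production values:

# in the http context: where the cache lives and how big the key zone is
fastcgi_cache_path /var/cache/nginx/fcgi levels=1:2 keys_zone=APPCACHE:128m inactive=60m;

server {
	location ~ \.php$ {
		include fastcgi_params;
		fastcgi_pass unix:/var/run/php-fpm.sock;

		fastcgi_cache APPCACHE;
		# getting the cache key right is most of the battle
		fastcgi_cache_key "$scheme$request_method$host$request_uri";
		fastcgi_cache_valid 200 301 5m;
	}
}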

In fact, by emitting the cache-control headers from the application itself, we could push relatively short-lived pages to the CDN edges, shaving off a lot of latency for repeated requests as there's no round trip from the hosting data center to the CDN edge. Yes, it's the latency, stupid. The dynamic acceleration that CDNs provide is nice. Leveraging the HTTP caching capabilities is nicer. Having the application in a data center closer to the client is desirable, but unless your target market is too distributed to serve from a bunch of machines in the same geo location, it doesn't make any sense to deploy into a new data center, which adds its fair share of complexity when scaling the data layer.
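The "cache-control headers" are nothing more exotic than the standard HTTP caching headers set by the application; something along these lines, with an illustrative TTL:

Cache-Control: public, max-age=300, s-maxage=300

max-age drives the browser cache, while s-maxage is what shared caches such as the CDN edges honor.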

Reverse dependencies for the installed packages in Debian + friends

Some libraries are more libraries than others. It is one of those moments when you ask yourself if migrating to a newer version of a library fucks up the entire system. But you need that foo library as it implements feature bar. In my case, I wanted libpcre3 8.20+ in order to enable PCRE JIT. Tough luck. Not even Debian sid packages 8.20.

Now I know that there's apt-cache rdepends, but it lists all the reverse dependencies of a specific package. I needed just the reverse dependencies among the installed packages. With a little bash-fu, here goes:

#!/bin/bash
 
# print the reverse dependencies of $1 that are actually installed
function package_rdepends()
{
	for package in $(apt-cache rdepends "$1" | grep -Ev "^$1$" | grep -v 'Reverse Depends:')
	do
		# apt-cache policy prints "Installed: (none)" for packages that are
		# known but not installed; installed packages fail this grep
		if ! apt-cache policy "$package" | grep -q 'Installed: (none)'
		then
			echo "$package"
		fi
	done
}
 
package_rdepends "$1" | sort -u

Saved as installed-rdepends. Made executable.

./installed-rdepends libpcre3
grep
libglib2.0-0
libglib2.0-dev
libpcre3-dev
libpcrecpp0

The above script may be slow for packages with many reverse dependencies, since each package requires an individual lookup. I didn't have the patience to measure how long it takes for something like libc6. Some benchmarks for the per-package lookup:

time apt-cache policy libpcre3 | grep 'Installed: (none)' > /dev/null 2>&1
 
real	0m0.006s
user	0m0.005s
sys	0m0.003s
 
time dpkg -L libpcre3 > /dev/null 2>&1
 
real	0m0.017s
user	0m0.012s
sys	0m0.005s
 
time dpkg -l libpcre3 > /dev/null 2>&1
 
real	0m0.667s
user	0m0.600s
sys	0m0.067s
 
time dpkg -s libpcre3 > /dev/null 2>&1
 
real	0m0.587s
user	0m0.533s
sys	0m0.054s
 
time cat /var/lib/dpkg/available | grep -E "Package: libpcre3$" > /dev/null 2>&1
 
real	0m0.034s
user	0m0.015s
sys	0m0.048s

However, I didn’t repeat these benchmarks on a bare-metal installation.

Use the cache, Luke, Part 1: from memcached to Membase memcached buckets

I start with a quote:

Matt Ingenthron said internally at Membase Inc they view Memcached as a rabbit. It is fast, but it is pretty dumb and procreates quickly. Before you know it, it will be running wild all over your system.

But this post isn’t about switching from a volatile cache to a persistent solution. It is about removing the dumb part from the memcached setup.

We started with memcached as this is the natural first step. The setup had its quirks since AWS EC2 doesn’t provide a fixed addressing method by default, while the memcached client for PHP still has issues with timeouts. Therefore, the fallback was the plain memcache client.

The fixed addressing issue was resolved by deploying Elastic IPs with a little trick for the internal network, as explained by Eric Hammond. This might be unfeasible for large enough deployments, but that wasn’t our case. Amazon has introduced ElastiCache since then, which removes this limitation, but having a bunch of reserved t1.micros is still way cheaper. Which makes me wonder why they won’t introduce fixed machine addresses that resolve to the internal address from inside EC2. They have this technology for a lot of their services, but it is simply unavailable for plain EC2 instances.
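The trick boils down to the fact that, when queried from inside EC2, the public DNS name attached to an Elastic IP resolves to the instance’s internal address, so a config can refer to those names and still use the internal network. A hypothetical example, with made-up addresses:

# from an EC2 instance; the same name resolves to the public IP from the outside
dig +short ec2-203-0-113-10.compute-1.amazonaws.com
10.124.8.53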

Back to the memcached issues. Having a Membase cluster that provides a memcached bucket is a nice drop-in replacement, if you lower your memory allocation a little bit. Membase has some overhead over memcached, as its services tend to occupy more RAM. The great thing is that the cluster requires fewer machines with fixed addressing. We use a couple for high-availability reasons, but this is not the rule. The rest use the EC2-provided dynamic addresses. If a machine happens to go down, another one can take its place.

But there still is the client issue. memcached for PHP is dumb. memcache for PHP is even dumber. Neither of them can actually speak the Membase goodies. This is the part where Moxi (Memcached Proxy) kicks in. For memcached buckets, Moxi can discover machines newly added to the Membase cluster without any client configuration. Without any Moxi server configuration either, as the config is streamed to the servers via the machines that have the fixed addresses. With plain memcached, every time there was a change, we needed to redeploy the application, and the memcached cluster was basically nullified till it was refilled. That doesn’t happen with Moxi + Membase. Since there is no “smart client” for PHP which includes the Moxi logic, we use client-side Moxi in order to reduce the network round trips. There is still local communication over the loopback interface, but the latency is far smaller than with server-side Moxi. Basically the memcache for PHP client connects to 127.0.0.1:11211, aka where Moxi lives, then the request hits the appropriate Membase server that holds our cached data. Moxi also speaks the binary protocol with SASL authentication upstream, which the memcache for PHP client doesn’t support.

The last of the goodies about the Membase cluster: it actually has an interface. I may not be a UI fan, since I live most of my time in /bin/bash, but I am a stats junkie. The Membase web console can give you real-time info about how the cluster is doing. With plain memcached you’re left in the dust, wrapping up your own interface or calling stats over plain TCP. Which is wrong on so many levels.
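“Calling stats over plain TCP” means something along these lines against every node, then aggregating the counters yourself (address and port are the usual memcached defaults; the nc timeout is just to keep the connection from hanging):

echo stats | nc -w 1 127.0.0.1 11211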

PS: v2.0 will be called Couchbase for political reasons. But currently the stable release is still called Membase.

How to rotate the MySQL logs on Amazon RDS

One day we enabled MySQL’s slow_log feature as indicated by the RDS FAQ. That was the (mostly) easy part. I say “mostly” because you need to add your own DB Parameter Group in order to enable the damn thing. Adding a group is easy. Editing it still requires you to use API calls (either via rds-api-tools or your own implementation).
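For the record, the edit looked something like the call below with the old RDS command line tools. The parameter group name is made up, and the exact parameter name and syntax may differ between engine versions, so treat it as a sketch rather than a copy-paste recipe:

rds-modify-db-parameter-group mydbparamgroup \
	--parameters "name=slow_query_log, value=1, method=immediate"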

Days started to fly, queries started to fill our log, and we started to fix the slow points of the application. The thing that didn’t change was that the mysql.slow_log table kept growing. Then I took some time to apply all my MySQL-fu regarding the cleanup of the mysql.slow_log table. Imagine my surprise when none of it worked. Since the master user of an RDS instance doesn’t have all the privileges, it wasn’t entirely unexpected though.

For the first time, the AWS Premium Support was actually useful, sending one email that provided a solution. Imagine my surprise. The RDS team implemented a couple of stored procedures that can be used for rotating the slow log and the general log.

CALL mysql.rds_rotate_slow_log;
CALL mysql.rds_rotate_general_log;

Basically they move the content to a *_backup table while the original is replaced by an empty table. The exact quote:

When invoked, these procedures move the contents of the corresponding log to a backup table and clear the contents of the log. For example, invoking rds_rotate_slow_log moves the contents of the slow_log table to a new table called slow_log_backup and then clears the contents of the slow_log table. This is done by renaming tables, so no data is actually copied, making this a very light-weight, non-blocking procedure. Invoking the same procedure twice effectively purges the log from the database.
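On a plain MySQL server, where the privileges aren’t an issue, the rename-based rotation the quote describes boils down to something like this (illustrative only; the actual body of the RDS procedures isn’t published):

-- stop logging to the table while it is being swapped out
SET GLOBAL slow_query_log = 'OFF';
DROP TABLE IF EXISTS mysql.slow_log_backup;
CREATE TABLE mysql.slow_log_new LIKE mysql.slow_log;
-- RENAME TABLE is atomic and copies no data
RENAME TABLE mysql.slow_log TO mysql.slow_log_backup, mysql.slow_log_new TO mysql.slow_log;
SET GLOBAL slow_query_log = 'ON';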

They have been present since March 22, 2010, but nobody took the time to document them, apparently. All I could find via online searches was utterly useless junk. I hope this saves some time for some poor chap in the same situation as I was.