For some reason I need that. OK, not any reason. For integrating a CloudInit YAML file into an AWS CloudFormation template. By using this article as reference, I made a simple node.js script for doing just that.
This is the second part of a series called: Use the cache, Luke. If you missed the first part, here it is: From memcached to Membase memcached buckets. Meanwhile, the AWS ElastiCache service proved to have better network latency than our own rolled out Membase setup, therefore the migration was easily done by simply switching the memcached config. No vendor lock in.
However, it took me a while to write this second part.
Please have a look at the above video. Besides the general common sense guidelines about how to scale your stuff, and the Postgres typical stuff, there’s a general rule: cache, cache, and then cache some more.
However, too much caching in memcache (whatever implementation) may kill the application at some point. The application may not be database dependent, but it is cache dependent. Anything that affects the cache may have the effect of a sledgehammer on your database. Of couse, you can always scale vertically that DB instance, scale horizontally by adding read-only replicas, but the not-so-fun part is that it costs a lot just to have the provisioned resources in order to survive a cache failure.
The second option is to have a short lived failover cache on the application server. Something like five minutes, while the distributed cache from memcache may last for hours. Enough to keep the database from being hit from live traffic, while you don’t have to provision a really large database instance. Of course, it won’t work with stuff that needs some “real time” junk, but it works with data that doesn’t change with each request.
There are a lot of options for a failover cache since there’s no distributed setup to think about. It may be a memcached daemon, something like PHP’s APC API, or, the fastest option: the file based caching. Now you may think that I’m insane, but memcached still has the IPC penalty, especially for TCP communication, while if you’re a PHP user, APC doesn’t perform as expected.
I say file based caching, not disk based caching, as the kernel does a pretty good job at “eating your RAM” with the disk caching stuff. It takes more to implement it since the cache management logic must be implemented into the application itself, you don’t have stuff like LRU, expiration, etc. by default, but for failover reasons, it is good enough to worth the effort. In fact, it ran for a few days on the failover cache without any measurable impact.
The next part for not using the same basket for all of your eggs is: cache everywhere you can. For example, by using the nginx FastCGI cache, we could shave off 40% of our CPU load. Nothing experimental about this last part. It is production for the last 18 months. If you get it right, then it could be a really valuable addition to a web stack. However, a lot of testing is required before pushing the changes to production. We hit a lot of weird bugs for edge cases. The rule of thumb is: if you get the cache key right, then most of the issues are gone before going live.
In fact, by adding the cache control stuff from the application itself, we could push relatively shortly lived pages to the CDN edges, shaving off a lot of latency for repeated requests as there’s no round trip from the hosting data center to the CDN edge. Yes, it’s the latency, stupid. The dynamic acceleration that CDNs provide is nice. Leveraging the HTTP caching capabilities is nicer. Having the application in a data center closer to the client is desirable, but unless your target market is more distributed than having a bunch of machines into the same geo location, it doesn’t make any sense to deploy into a new data center which adds its fair share of complexity when scaling the data layer.
Matt Ingenthron said internally at Membase Inc they view Memcached as a rabbit. It is fast, but it is pretty dumb and procreates quickly. Before you know it, it will be running wild all over your system.
But this post isn’t about switching from a volatile cache to a persistent solution. It is about removing the dumb part from the memcached setup.
We started with memcached as this is the first step. The setup had its quirks since AWS EC2 doesn’t provide by default a fixed addressing method while the memcached client from PHP still has issues with the timeouts. Therefore, the fallback was the plain memcache client.
The fixed addressing issue was resolved by deploying Elastic IPs with a little trick for the internal network, as explained by Eric Hammond. This might be unfeasible for large enough deployments, but it wasn’t our case. Amazon introduced ElastiCache since then which removes this limitation, but having a bunch of t1.micros with reservation is still way much cheaper. Which makes me wonder why they won’t introduce machine addresses which internally resolve as internal address. They have this technology for a lot of their services, but it is simply unavailable for plain EC2 instances.
Back to the memcached issues. Having a Membase cluster that provides a memcached bucket is a nice drop-in replacement, if you lower a little bit your memory allocation. Membase over memcached still has some overhead as its services tend to occupy more RAM. The great thing is that the cluster requires fewer machines with fixed addressing. We use a couple for high availability reasons, but this is not the rule. The rest have the EC2 provided dynamic addresses. If a machine happens to go down, another one can take up its place.
But there still is the client issue. memcached for PHP is dumb. memcache for PHP is even dumber. None of these can actually speak the Membase goodies. This is the part where Moxi (Memcached Proxy) kicks in. For memcached buckets, Moxi can discover the newly added machines to the Membase cluster without doing any client configuration. Without any Moxi server configuration as the config is streamed to the servers via the machines that have the fixed addresses. With plain memcached, every time there was a change, we needed to deploy the application. The memcached cluster was basically nullified till it was refilled. Doesn’t happen with Moxi + Membase. Since there no “smart client” for PHP which includes the Moxi logic, we use client side Moxi in order to reduce the network round-trips. There still is a local communication over the loopback interface, but the latency is far smaller than doing server-side Moxi. Basically the memcache for PHP client connects to 127.0.0.1:11211 aka where Moxi lives, then the request hits the appropriate Membase server that holds our cached data. It also uses the binary protocol and SASL authentication which is unsupported by the memcache for PHP client.
The last of the goodies about the Membase cluster: it actually has an interface. I may not be an UI fan, I live most of my time in /bin/bash, but I am a stats junkie. The Membase web console can give you realtime info about how the cluster is doing. With plain memcached you’re left in the dust with wrapping up your own interface or calling stats over plain TCP. Which is so wrong at so many levels.
PS: v2.0 will be called Couchbase for political reasons. But currently the stable release is still called Membase.
One day we enabled the MySQL’s slow_log feature as indicated by the RDS FAQ. That the (mostly) easy part. I say “mostly” because you need to add your own DB Parameter Group in order to enable the damn thing. Adding a group is easy. Editing it still requires you to use API calls (either via rds-api-tools or your own implementation).
Days started to fly, queries started to fill our log, we started to fix the slow points of the application. The thing that didn’t change is the fact that the mysql.slow_log table kept growing. Then I took some time to apply all my MySQL-fu regarding the cleanup of the mysql.slow_log table. Imagine my surprise when none of it worked. Since the master user of a RDS instance doesn’t have all the privileges, it wasn’t quite unexpected though.
For the first time, the AWS Premium Support was actually useful by sending one email that actually provides a solution. Imagine my surprise. The RDS team implemented a couple of stored procedures that can be used for rotating the slow log and the general log.
Basically they move the content to a *_backup table while the original is replaced by an empty table. The exact quote:
When invoked, these procedures move the contents of the corresponding log to a backup table and clear the contents of the log. For example, invoking rds_rotate_slow_log moves the contents of the slow_log table to a new table called slow_log_backup and then clears the contents of the slow_log table. This is done by renaming tables, so no data is actually copied, making this a very light-weight, non-blocking procedure. Invoking the same procedure twice effectively purges the log from the database.
They are present since March 22, 2010 but nobody took the time to document them, apparently. All I could find via online searches was utterly useless junk. I hope this saves some time for some poor chop into the same situation as I was.
Some people may slip into your head the idea that by doing snapshots, you’re free from the burden of doing proper backups. While this may sound good in theory, in practice there are a bunch of caveats. There are certain technologies that use the snapshot methodology at the core, but they make sure that your data isn’t corrupted. Some may even provide access to the actual file revisions.
The data corruption is the specific topic that snapshots simply don’t care about, at least in Amazon’s way of doing things. This isn’t exactly Amazon’s fault for EC2. EBS actually stands for Elastic Block Storage. They provide you a block storage, you do whatever you want with it. For RDS they should do a better job though as it’s a managed service where you don’t have access to the actual instance. The issue is those ‘specialists’ that put emphasis onto the ‘easy, cloud-ish way’ of doing backups by using snapshots. If you’re new to the ‘cloud’ stuff as I used to be, you may actually believe that crap. As I used to believe.
A couple of real life examples:
An EBS-backed instance suffered some filesystem level corruption. Since EXT3 is not as smart as ZFS if we’re talking about silent data corruption, you may never know until it’s too late. Going back through revisions in order to find the last good piece of data is a pain. I could fix the filesystem corruption, I could retrieve the lost data, but I had to work quite a lot for that. Luck is an important skill, but I’d rather not put all my eggs into the luck basket.
An RDS instance ran out of space. There wasn’t a notification to tell me: ‘yo dumbass, ya ran out of space’. Statistically it wasn’t the case, but a huge data import proved me wrong. I increased the available storage. Problem solved. A day later, somebody dropped by accident a couple of tables. I had to restore them. How? Take the latest snapshot, spin up a new instance, dig through the data. The latest snapshot contained a couple of corrupted databases due to the space issue, one of them being the database I needed to restore. I had to take a bunch of time in order to repair the database before the restoration process. Fortunately nothing really bad happened. But it was a signal that the RDS snapshot methodology is broken by design.
Lesson learned. The current way of doing backups puts the data, not the block storage, first. If you’re doing EBS snapshots as the sole method, you may need to rethink your strategy.