For some reason I need that. OK, not any reason. For integrating a CloudInit YAML file into an AWS CloudFormation template. By using this article as reference, I made a simple node.js script for doing just that.
This is the second part of a series called: Use the cache, Luke. If you missed the first part, here it is: From memcached to Membase memcached buckets. Meanwhile, the AWS ElastiCache service proved to have better network latency than our own rolled out Membase setup, therefore the migration was easily done by simply switching the memcached config. No vendor lock in.
However, it took me a while to write this second part.
Please have a look at the above video. Besides the general common sense guidelines about how to scale your stuff, and the Postgres typical stuff, there’s a general rule: cache, cache, and then cache some more.
However, too much caching in memcache (whatever implementation) may kill the application at some point. The application may not be database dependent, but it is cache dependent. Anything that affects the cache may have the effect of a sledgehammer on your database. Of couse, you can always scale vertically that DB instance, scale horizontally by adding read-only replicas, but the not-so-fun part is that it costs a lot just to have the provisioned resources in order to survive a cache failure.
The second option is to have a short lived failover cache on the application server. Something like five minutes, while the distributed cache from memcache may last for hours. Enough to keep the database from being hit from live traffic, while you don’t have to provision a really large database instance. Of course, it won’t work with stuff that needs some “real time” junk, but it works with data that doesn’t change with each request.
There are a lot of options for a failover cache since there’s no distributed setup to think about. It may be a memcached daemon, something like PHP’s APC API, or, the fastest option: the file based caching. Now you may think that I’m insane, but memcached still has the IPC penalty, especially for TCP communication, while if you’re a PHP user, APC doesn’t perform as expected.
I say file based caching, not disk based caching, as the kernel does a pretty good job at “eating your RAM” with the disk caching stuff. It takes more to implement it since the cache management logic must be implemented into the application itself, you don’t have stuff like LRU, expiration, etc. by default, but for failover reasons, it is good enough to worth the effort. In fact, it ran for a few days on the failover cache without any measurable impact.
The next part for not using the same basket for all of your eggs is: cache everywhere you can. For example, by using the nginx FastCGI cache, we could shave off 40% of our CPU load. Nothing experimental about this last part. It is production for the last 18 months. If you get it right, then it could be a really valuable addition to a web stack. However, a lot of testing is required before pushing the changes to production. We hit a lot of weird bugs for edge cases. The rule of thumb is: if you get the cache key right, then most of the issues are gone before going live.
In fact, by adding the cache control stuff from the application itself, we could push relatively shortly lived pages to the CDN edges, shaving off a lot of latency for repeated requests as there’s no round trip from the hosting data center to the CDN edge. Yes, it’s the latency, stupid. The dynamic acceleration that CDNs provide is nice. Leveraging the HTTP caching capabilities is nicer. Having the application in a data center closer to the client is desirable, but unless your target market is more distributed than having a bunch of machines into the same geo location, it doesn’t make any sense to deploy into a new data center which adds its fair share of complexity when scaling the data layer.
When it turns out that the cost for keeping few gigabytes of data is too fucking much.
When it turns out that it is not keeping the most basic promises. The AWS marketing machine did it. Again.
When it turns out that the latency is absolutely crap. I mean, SDB vs. RDS, as shown by New Relic: 183 ms vs. 1.6 ms. And I’m only talking about averages. Plotting the whole stuff on a graph along with the standard deviation will drive insane a statistician.
I could go about this all day long. But why bother.
Matt Ingenthron said internally at Membase Inc they view Memcached as a rabbit. It is fast, but it is pretty dumb and procreates quickly. Before you know it, it will be running wild all over your system.
But this post isn’t about switching from a volatile cache to a persistent solution. It is about removing the dumb part from the memcached setup.
We started with memcached as this is the first step. The setup had its quirks since AWS EC2 doesn’t provide by default a fixed addressing method while the memcached client from PHP still has issues with the timeouts. Therefore, the fallback was the plain memcache client.
The fixed addressing issue was resolved by deploying Elastic IPs with a little trick for the internal network, as explained by Eric Hammond. This might be unfeasible for large enough deployments, but it wasn’t our case. Amazon introduced ElastiCache since then which removes this limitation, but having a bunch of t1.micros with reservation is still way much cheaper. Which makes me wonder why they won’t introduce machine addresses which internally resolve as internal address. They have this technology for a lot of their services, but it is simply unavailable for plain EC2 instances.
Back to the memcached issues. Having a Membase cluster that provides a memcached bucket is a nice drop-in replacement, if you lower a little bit your memory allocation. Membase over memcached still has some overhead as its services tend to occupy more RAM. The great thing is that the cluster requires fewer machines with fixed addressing. We use a couple for high availability reasons, but this is not the rule. The rest have the EC2 provided dynamic addresses. If a machine happens to go down, another one can take up its place.
But there still is the client issue. memcached for PHP is dumb. memcache for PHP is even dumber. None of these can actually speak the Membase goodies. This is the part where Moxi (Memcached Proxy) kicks in. For memcached buckets, Moxi can discover the newly added machines to the Membase cluster without doing any client configuration. Without any Moxi server configuration as the config is streamed to the servers via the machines that have the fixed addresses. With plain memcached, every time there was a change, we needed to deploy the application. The memcached cluster was basically nullified till it was refilled. Doesn’t happen with Moxi + Membase. Since there no “smart client” for PHP which includes the Moxi logic, we use client side Moxi in order to reduce the network round-trips. There still is a local communication over the loopback interface, but the latency is far smaller than doing server-side Moxi. Basically the memcache for PHP client connects to 127.0.0.1:11211 aka where Moxi lives, then the request hits the appropriate Membase server that holds our cached data. It also uses the binary protocol and SASL authentication which is unsupported by the memcache for PHP client.
The last of the goodies about the Membase cluster: it actually has an interface. I may not be an UI fan, I live most of my time in /bin/bash, but I am a stats junkie. The Membase web console can give you realtime info about how the cluster is doing. With plain memcached you’re left in the dust with wrapping up your own interface or calling stats over plain TCP. Which is so wrong at so many levels.
PS: v2.0 will be called Couchbase for political reasons. But currently the stable release is still called Membase.
One day we enabled the MySQL’s slow_log feature as indicated by the RDS FAQ. That the (mostly) easy part. I say “mostly” because you need to add your own DB Parameter Group in order to enable the damn thing. Adding a group is easy. Editing it still requires you to use API calls (either via rds-api-tools or your own implementation).
Days started to fly, queries started to fill our log, we started to fix the slow points of the application. The thing that didn’t change is the fact that the mysql.slow_log table kept growing. Then I took some time to apply all my MySQL-fu regarding the cleanup of the mysql.slow_log table. Imagine my surprise when none of it worked. Since the master user of a RDS instance doesn’t have all the privileges, it wasn’t quite unexpected though.
For the first time, the AWS Premium Support was actually useful by sending one email that actually provides a solution. Imagine my surprise. The RDS team implemented a couple of stored procedures that can be used for rotating the slow log and the general log.
Basically they move the content to a *_backup table while the original is replaced by an empty table. The exact quote:
When invoked, these procedures move the contents of the corresponding log to a backup table and clear the contents of the log. For example, invoking rds_rotate_slow_log moves the contents of the slow_log table to a new table called slow_log_backup and then clears the contents of the slow_log table. This is done by renaming tables, so no data is actually copied, making this a very light-weight, non-blocking procedure. Invoking the same procedure twice effectively purges the log from the database.
They are present since March 22, 2010 but nobody took the time to document them, apparently. All I could find via online searches was utterly useless junk. I hope this saves some time for some poor chop into the same situation as I was.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.