Category Archives: Cloud

Converting a file to a JSON array

For some reason I need that. OK, not any reason. For integrating a CloudInit YAML file into an AWS CloudFormation template. By using this article as reference, I made a simple node.js script for doing just that.

#!/usr/bin/env node
 
var fs = require('fs');
 
fs.readFile(process.argv[2], function (err, file) {
	if (err) {
		console.error(err);
		process.exit(1);
	}
	file = file.toString().split('\n');
	var idx, aux = [];
	for (idx = 0; idx < file.length; idx++) {
		aux.push(file[idx]);
		aux.push('\n');
	}
	file = JSON.stringify(aux);
	console.log(file);
});

Save as something.js, make it an executable, then invoke it with ./something.js /path/to/file.

The end.

Doing what Dropbox is doing and doing it wrong

Let’s take a couple of examples. Switched from an older machine recently, therefore I need to setup all my stuff. As I don’t like to depend on a single service, for redundancy’s sake, I also keep a backup for Dropbox.

SpiderOak – backs up stuff, uses client side encryption, has optional sync between your machines. So far, so good. In the latest OS X client, at least, the possibility to paste the password is missing. Thanks, I’ll me use my password manager instead with services that don’t do such a braindead thing. Seriously, there’s a thing that improves the security of the password authentication. It is called two factor authentication. Dropbox has it. Google has it. In fact, any decent service has it. Disabling the possibility to paste the password, not so much.

Google Drive – you wouldn’t think I’m letting Google of the hook this time. As I don’t trust with my data these sync services, I always do client side encryption. Dropbox doesn’t choke on it, SpiderOak doesn’t choke on it. Google Drive must be a special kind of breed as it chokes on my encrypted files with “Upload Error – An unknown issue has occurred ”. Gee, let me fix the error message for you: “your piece of shit encrypted files aren’t of any use for us, there’s no personal info there”. Was it that difficult? Thanks, but the market is full of alternatives. Seriously Google, you could do better than this “not being evil” thing.

When no to use Amazon’s SimpleDB

When it turns out that the cost for keeping few gigabytes of data is too fucking much.

When it turns out that it is not keeping the most basic promises. The AWS marketing machine did it. Again.

When it turns out that the latency is absolutely crap. I mean, SDB vs. RDS, as shown by New Relic: 183 ms vs. 1.6 ms. And I’m only talking about averages. Plotting the whole stuff on a graph along with the standard deviation will drive insane a statistician.

I could go about this all day long. But why bother.

How to rotate the MySQL logs on Amazon RDS

One day we enabled the MySQL’s slow_log feature as indicated by the RDS FAQ. That the (mostly) easy part. I say “mostly” because you need to add your own DB Parameter Group in order to enable the damn thing. Adding a group is easy. Editing it still requires you to use API calls (either via rds-api-tools or your own implementation).

Days started to fly, queries started to fill our log, we started to fix the slow points of the application. The thing that didn’t change is the fact that the mysql.slow_log table kept growing. Then I took some time to apply all my MySQL-fu regarding the cleanup of the mysql.slow_log table. Imagine my surprise when none of it worked. Since the master user of a RDS instance doesn’t have all the privileges, it wasn’t quite unexpected though.

For the first time, the AWS Premium Support was actually useful by sending one email that actually provides a solution. Imagine my surprise. The RDS team implemented a couple of stored procedures that can be used for rotating the slow log and the general log.

CALL mysql.rds_rotate_slow_log;
CALL mysql.rds_rotate_general_log;

Basically they move the content to a *_backup table while the original is replaced by an empty table. The exact quote:

When invoked, these procedures move the contents of the corresponding log to a backup table and clear the contents of the log. For example, invoking rds_rotate_slow_log moves the contents of the slow_log table to a new table called slow_log_backup and then clears the contents of the slow_log table. This is done by renaming tables, so no data is actually copied, making this a very light-weight, non-blocking procedure. Invoking the same procedure twice effectively purges the log from the database.

They are present since March 22, 2010 but nobody took the time to document them, apparently. All I could find via online searches was utterly useless junk. I hope this saves some time for some poor chop into the same situation as I was.

Snapshots are not backups

Some people may slip into your head the idea that by doing snapshots, you’re free from the burden of doing proper backups. While this may sound good in theory, in practice there are a bunch of caveats. There are certain technologies that use the snapshot methodology at the core, but they make sure that your data isn’t corrupted. Some may even provide access to the actual file revisions.

The data corruption is the specific topic that snapshots simply don’t care about, at least in Amazon’s way of doing things. This isn’t exactly Amazon’s fault for EC2. EBS actually stands for Elastic Block Storage. They provide you a block storage, you do whatever you want with it. For RDS they should do a better job though as it’s a managed service where you don’t have access to the actual instance. The issue is those ‘specialists’ that put emphasis onto the ‘easy, cloud-ish way’ of doing backups by using snapshots. If you’re new to the ‘cloud’ stuff as I used to be, you may actually believe that crap. As I used to believe.

A couple of real life examples:

  • An EBS-backed instance suffered some filesystem level corruption. Since EXT3 is not as smart as ZFS if we’re talking about silent data corruption, you may never know until it’s too late. Going back through revisions in order to find the last good piece of data is a pain. I could fix the filesystem corruption, I could retrieve the lost data, but I had to work quite a lot for that. Luck is an important skill, but I’d rather not put all my eggs into the luck basket.
  • An RDS instance ran out of space. There wasn’t a notification to tell me: ‘yo dumbass, ya ran out of space’. Statistically it wasn’t the case, but a huge data import proved me wrong. I increased the available storage. Problem solved. A day later, somebody dropped by accident a couple of tables. I had to restore them. How? Take the latest snapshot, spin up a new instance, dig through the data. The latest snapshot contained a couple of corrupted databases due to the space issue, one of them being the database I needed to restore. I had to take a bunch of time in order to repair the database before the restoration process. Fortunately nothing really bad happened. But it was a signal that the RDS snapshot methodology is broken by design.

Lesson learned. The current way of doing backups puts the data, not the block storage, first. If you’re doing EBS snapshots as the sole method, you may need to rethink your strategy.