Category Archives: Cloud

Doing what Dropbox is doing and doing it wrong

Let’s take a couple of examples. I recently switched from an older machine, so I need to set up all my stuff again. As I don’t like to depend on a single service, for redundancy’s sake I also keep a backup besides Dropbox.

SpiderOak – backs up stuff, uses client side encryption, has optional sync between your machines. So far, so good. In the latest OS X client, at least, you can’t paste the password. Thanks, but I’ll use my password manager with services that don’t do such a braindead thing instead. Seriously, there’s a thing that actually improves the security of password authentication. It is called two-factor authentication. Dropbox has it. Google has it. In fact, any decent service has it. Disabling password pasting, not so much.

Google Drive – you didn’t think I’d let Google off the hook this time, did you? As I don’t trust these sync services with my data, I always do client side encryption. Dropbox doesn’t choke on it, SpiderOak doesn’t choke on it. Google Drive must be a special breed, as it chokes on my encrypted files with “Upload Error – An unknown issue has occurred”. Gee, let me fix the error message for you: “your piece of shit encrypted files aren’t of any use to us, there’s no personal info there”. Was it that difficult? Thanks, but the market is full of alternatives. Seriously, Google, you could do better than this “not being evil” thing.

When not to use Amazon’s SimpleDB

When it turns out that the cost of keeping a few gigabytes of data is too fucking much.

When it turns out that it doesn’t keep even the most basic promises. The AWS marketing machine did it. Again.

When it turns out that the latency is absolutely crap. I mean, SDB vs. RDS, as shown by New Relic: 183 ms vs. 1.6 ms. And I’m only talking about averages. Plotting the whole thing on a graph along with the standard deviation would drive a statistician insane.

I could go on about this all day long. But why bother?

How to rotate the MySQL logs on Amazon RDS

One day we enabled MySQL’s slow log, as described in the RDS FAQ. That’s the (mostly) easy part. I say “mostly” because you need to add your own DB Parameter Group in order to enable the damn thing. Adding a group is easy. Editing it still requires API calls (either via rds-api-tools or your own implementation).
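These days the same edit can be done with an AWS SDK. A minimal sketch in Python, assuming boto3 (not mentioned in the original post) and a hypothetical group name; the parameter list is built by a plain function so it can be checked without AWS credentials:

```python
"""Sketch: enable the MySQL slow query log in a custom RDS DB Parameter Group.

Assumptions (not from the original post): boto3 instead of rds-api-tools,
and the group name "my-mysql-params" is a hypothetical placeholder.
"""


def slow_log_parameters(long_query_time: float = 2.0) -> list:
    """Build the Parameters list for the RDS ModifyDBParameterGroup call.

    slow_query_log, long_query_time and log_output are dynamic MySQL
    parameters, so ApplyMethod="immediate" is used. log_output=TABLE
    sends the entries to the mysql.slow_log table.
    """
    values = {
        "slow_query_log": "1",
        "long_query_time": str(long_query_time),
        "log_output": "TABLE",
    }
    return [
        {"ParameterName": name, "ParameterValue": value, "ApplyMethod": "immediate"}
        for name, value in values.items()
    ]


def enable_slow_log(group_name: str, long_query_time: float = 2.0) -> dict:
    # Imported lazily so the pure helper above works without boto3 installed.
    import boto3

    rds = boto3.client("rds")
    return rds.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=slow_log_parameters(long_query_time),
    )


if __name__ == "__main__":
    print(slow_log_parameters())
    # enable_slow_log("my-mysql-params")  # hypothetical name; needs AWS credentials
```

Keep in mind that the custom group only does anything once it is attached to the instance, and switching an instance to a different group may require a reboot before it takes effect.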

Days flew by, queries started to fill our log, and we started fixing the slow points of the application. The thing that didn’t change was that the mysql.slow_log table kept growing. Then I took some time to apply all my MySQL-fu to cleaning up the mysql.slow_log table. Imagine my surprise when none of it worked. Since the master user of an RDS instance doesn’t have all the privileges, it wasn’t entirely unexpected, though.

For the first time, AWS Premium Support was actually useful: they sent one email that actually provided a solution. Imagine my surprise. The RDS team implemented a couple of stored procedures that can be used for rotating the slow log and the general log:

CALL mysql.rds_rotate_slow_log;
CALL mysql.rds_rotate_general_log;

Basically they move the contents to a *_backup table while the original is replaced by an empty table. The exact quote:

When invoked, these procedures move the contents of the corresponding log to a backup table and clear the contents of the log. For example, invoking rds_rotate_slow_log moves the contents of the slow_log table to a new table called slow_log_backup and then clears the contents of the slow_log table. This is done by renaming tables, so no data is actually copied, making this a very light-weight, non-blocking procedure. Invoking the same procedure twice effectively purges the log from the database.
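To make the quoted rename semantics concrete, here is a toy model in Python using SQLite. It only illustrates the behaviour AWS describes (rename to *_backup, fresh empty table, a second call purging everything); it is not the actual code behind mysql.rds_rotate_slow_log:

```python
"""Toy model of the rename-based log rotation described in the AWS quote.

SQLite stands in for MySQL purely for illustration; this is NOT the RDS
implementation. Table names mirror the quote.
"""
import sqlite3


def rotate(conn: sqlite3.Connection, table: str = "slow_log") -> None:
    backup = f"{table}_backup"
    # The previous backup is discarded here, which is why rotating twice
    # effectively purges the log.
    conn.execute(f"DROP TABLE IF EXISTS {backup}")
    # Rename instead of copy: no rows are moved.
    conn.execute(f"ALTER TABLE {table} RENAME TO {backup}")
    # Recreate an empty table with the same columns (WHERE 0 selects no rows).
    conn.execute(f"CREATE TABLE {table} AS SELECT * FROM {backup} WHERE 0")
    conn.commit()


def row_count(conn: sqlite3.Connection, table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE slow_log (start_time TEXT, query_time REAL, sql_text TEXT)")
    conn.executemany(
        "INSERT INTO slow_log VALUES (?, ?, ?)",
        [("t1", 3.2, "hypothetical query 1"), ("t2", 5.0, "hypothetical query 2")],
    )
    conn.commit()
    rotate(conn)
    print(row_count(conn, "slow_log"), row_count(conn, "slow_log_backup"))
    rotate(conn)
    print(row_count(conn, "slow_log"), row_count(conn, "slow_log_backup"))
```

On RDS itself you just CALL the procedure; if you want to keep the entries, copy them out of mysql.slow_log_backup before the next rotation.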

They have been available since March 22, 2010, but apparently nobody took the time to document them. All I could find via online searches was utterly useless junk. I hope this saves some time for some poor chap in the same situation as I was.

Snapshots are not backups

Some people may plant the idea in your head that by taking snapshots, you’re free from the burden of doing proper backups. While this may sound good in theory, in practice there are a bunch of caveats. Certain technologies do use snapshots at their core, but they make sure that your data isn’t corrupted. Some even provide access to the actual file revisions.

Data corruption is precisely the thing that snapshots simply don’t care about, at least the way Amazon does them. For EC2, this isn’t exactly Amazon’s fault. EBS actually stands for Elastic Block Store. They provide you with block storage; you do whatever you want with it. For RDS, though, they should do a better job, as it’s a managed service where you don’t have access to the actual instance. The issue is those ‘specialists’ who put the emphasis on the ‘easy, cloud-ish way’ of doing backups with snapshots. If you’re new to the ‘cloud’ stuff, you may actually believe that crap. As I used to.

A couple of real life examples:

  • An EBS-backed instance suffered some filesystem level corruption. Since EXT3 is not as smart as ZFS when it comes to silent data corruption, you may not know until it’s too late. Going back through revisions to find the last good copy of the data is a pain. I could fix the filesystem corruption and retrieve the lost data, but it took quite a lot of work. Luck is an important skill, but I’d rather not put all my eggs in the luck basket.
  • An RDS instance ran out of space. There wasn’t a notification to tell me: ‘yo dumbass, ya ran out of space’. Going by the usage statistics it shouldn’t have happened, but a huge data import proved me wrong. I increased the available storage. Problem solved. A day later, somebody accidentally dropped a couple of tables. I had to restore them. How? Take the latest snapshot, spin up a new instance, dig through the data. The latest snapshot contained a couple of databases corrupted by the space issue, one of them being the database I needed to restore. I had to spend a good amount of time repairing the database before I could restore anything. Fortunately nothing really bad happened. But it was a sign that the RDS snapshot methodology is broken by design.
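RDS does publish a FreeStorageSpace metric (in bytes) to CloudWatch, so a low-storage alarm would likely have given some warning before the import hit the wall. A minimal sketch, assuming boto3; the instance id, SNS topic ARN and 2 GiB threshold are hypothetical placeholders:

```python
"""Sketch: CloudWatch alarm on low free storage for an RDS instance.

Assumptions: boto3 as the client library; instance id, SNS topic ARN and
threshold are hypothetical placeholders to adapt.
"""

GIB = 1024 ** 3


def free_storage_alarm(instance_id: str, topic_arn: str, threshold_gib: float = 2.0) -> dict:
    """Keyword arguments for CloudWatch PutMetricAlarm.

    AWS/RDS FreeStorageSpace is reported in bytes, so the threshold is
    converted from GiB.
    """
    return {
        "AlarmName": f"{instance_id}-low-free-storage",
        "Namespace": "AWS/RDS",
        "MetricName": "FreeStorageSpace",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 300,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_gib * GIB),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],  # e.g. an SNS topic that emails you
    }


def create_alarm(instance_id: str, topic_arn: str, threshold_gib: float = 2.0) -> None:
    import boto3  # imported lazily; only needed when talking to AWS

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(**free_storage_alarm(instance_id, topic_arn, threshold_gib))
```

The threshold is a judgment call: it should leave enough headroom for your biggest expected import, not just for day-to-day growth.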

Lesson learned. The way I do backups now puts the data, not the block storage, first. If you’re using EBS snapshots as your sole backup method, you may need to rethink your strategy.
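What “data first” looks like will vary. One cheap ingredient (an assumption on my part, not something prescribed above) is recording a checksum for every file of a logical backup, e.g. a set of dump files, and re-verifying before trusting it, so silent corruption shows up as a mismatch instead of a surprise during a restore. A minimal Python sketch:

```python
"""Sketch: verify a file-level backup against a SHA-256 manifest.

Assumption: the backup is a directory of files (e.g. database dumps);
the manifest is a dict of relative path -> hex digest.
"""
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Read in chunks so large dump files don't need to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict) -> list:
    """Return the relative paths that are missing or whose contents changed."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

Store the manifest somewhere other than the backup itself, otherwise the same failure can take out both.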

Boxing on EC2 – Windows Server instances

> bashing mode on

This ought to be a relatively easy task, right? Wrong! I don’t know whether Amazon is to blame. Or Microsoft. Or both. I really don’t care. The annoying thing is the large pile of fail that destroys productivity for the most basic tasks. However, besides cookies, we, the sysadmins of the Dark Side, also have solutions. We have to.

Those who know me better may think that I hate MS/Windows. I don’t, as long as I don’t have to administer those machines. I do have a Windows 7 Pro box at home. While the Windows soft-RAID implementation blows for certain reasons, apart from that and the small quirks that every OS has, it’s a pretty dandy setup. So no, I am not a pure Windows hater. Don’t get the wrong impression.

The first really annoying thing is how long it takes for the instance to be ready. Getting the ‘Administrator’ password takes ages; Amazon says between 15 and 30 minutes. Come on, EC2: you take an AMI off S3, spin up a new instance, then set up a new password. How hard can it be? A Linux instance is ready almost instantly after the console says it’s running. Even worse, after the password becomes available via the console or the API (sometimes one of them gets it faster), RDP still takes a few more minutes to become available. Since Amazon & Microsoft don’t provide this for free, and we pay by the hour, STOP WASTING OUR TIME. Sincerely, the EC2 Windows Server users.
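If you’d rather not keep refreshing the console, the waiting can at least be scripted. A minimal sketch in Python, assuming boto3’s EC2 get_password_data call, which returns an empty PasswordData until the password exists; what comes back is still base64-encoded and encrypted with your key pair, so decrypting it is a separate step not shown here. The fetch function, sleep and clock are injected so the loop can be tested offline:

```python
"""Sketch: poll until EC2 has generated the Windows Administrator password.

Assumptions: boto3 as the client library; the instance id passed to
ec2_fetcher is yours. Decryption with the key pair is not shown.
"""
import time
from typing import Callable, Optional


def wait_for_password(fetch: Callable[[], str], interval: float = 30.0,
                      timeout: float = 45 * 60, sleep=time.sleep,
                      clock=time.monotonic) -> Optional[str]:
    """Call fetch() until it returns a non-empty string or timeout expires."""
    deadline = clock() + timeout
    while True:
        data = fetch()
        if data:
            return data
        if clock() >= deadline:
            return None  # gave up; the password still isn't there
        sleep(interval)


def ec2_fetcher(instance_id: str) -> Callable[[], str]:
    import boto3  # imported lazily; only needed when talking to AWS

    ec2 = boto3.client("ec2")

    def fetch() -> str:
        # Empty until Windows has generated and posted the password.
        return ec2.get_password_data(InstanceId=instance_id).get("PasswordData", "").strip()

    return fetch
```

Recent boto3 versions also ship an EC2 waiter named password_data_available, which may be simpler if yours has it. Either way, it won’t make the 15 to 30 minutes any shorter.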

The second really annoying thing is that Amazon recommends you change the default password they provide. While the recommendation is sound, if you follow it, you get locked out of the instance. If you don’t have a second account with RDP access, you have one option: terminate the instance. The almost useless AWS Premium Support might be able to do something about it. But if you’re in a hurry, going back to square one is probably the best option.

There is a workaround, though, which I discovered during my time boxing with EC2 and Windows Server: add Administrator to the RDP allowed list, even though the window says that ‘Administrator already has access’. Otherwise, if you change the password, Administrator loses RDP access, even though the password is valid. The welcome screen will tell you rubbish about a wrong password, when the input is clearly correct. I tried to reproduce this a few times. Each time I got locked out. I even validated the new password by running something with runas. What? Wrong password? Listen, a’holes, I asked you to change the f’ing password, not make me French fries. How hard is it not to screw up such a basic task? Initially I thought my Terminal Server Client was dumb, but the *nix guys don’t screw this up, and a Windows 7 client did the same.

If you want to create a new AMI from the existing instance, boy, more gotchas are waiting around the corner. Even if the Administrator password is successfully changed, RDP access is preserved, and the ‘Set Password’ options of the ‘Ec2 Service Properties’ application are unchecked, sysprep still manages to tank both the instance and the new AMI. Administrator completely loses access on both instances, the existing one and the new one. I tried the ‘Set a random password’ option. It didn’t work as expected. I didn’t try the ‘SetPassword’ feature after sysprep, but I’d had enough anyway. I read somewhere around the Interwebs that sysprep somehow affects the Administrator user, but I couldn’t find any relevant details. Searching for it is a mess.

My workaround: create a new user that is a member of the Administrators group and set a strong password for it. The password Amazon sets is a joke anyway; I wouldn’t call it strong. Then disable the Administrator account:

net user Administrator /active:no

Enjoy your newly created AMIs without surprises. I didn’t have any issues with members of the Administrators group other than Administrator itself. Plus, the new account is usable as soon as the RDP service is available, unlike the ‘Amazon way’ of keeping you waiting 15 to 30 minutes for nothing.
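If you bake AMIs often, the workaround can be scripted before running sysprep. A minimal Python sketch, assuming it runs on the instance itself, from an elevated session, with Python installed; the user names and passwords in the examples are hypothetical. Only standard net user / net localgroup commands are used:

```python
"""Sketch: script the "new admin, disable Administrator" workaround.

Assumptions: runs on the Windows instance itself from an elevated session;
the account name and password are yours to choose (hypothetical here).
"""
import subprocess
import sys


def workaround_commands(user: str, password: str) -> list:
    return [
        # Create the replacement account with a strong password.
        ["net", "user", user, password, "/add"],
        # Make it a member of the local Administrators group.
        ["net", "localgroup", "Administrators", user, "/add"],
        # Disable the built-in Administrator account (the command from the post).
        ["net", "user", "Administrator", "/active:no"],
    ]


def apply_workaround(user: str, password: str) -> None:
    if sys.platform != "win32":
        raise RuntimeError("this sketch only makes sense on the Windows instance")
    for cmd in workaround_commands(user, password):
        subprocess.run(cmd, check=True)  # stop at the first failing command
```

Passing the password as an argument leaves it visible to anyone who can list processes; net user <name> * /add prompts for it interactively instead, at the cost of no longer being unattended.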

The third really annoying thing is that the AMIs are awfully outdated. Since there’s an official partnership between Amazon and Microsoft, I would expect a higher level of support. I know, there’s Windows Update, but even after all these years it still doesn’t manage to work in a braindead-free way. I know I’m biased, as I’m used to Debian and friends, but the OS upgrade could be a little bit faster. OK, a lot faster. Downloading and installing 54 updates totalling around 200 MiB takes about an hour and a half on a c1.medium instance, which has the best specs you can get for an x86 instance. It takes 3 reboots. It takes 4 clicks on the ‘Check for updates’ link to actually end up up to date. I get that there are dependencies, but an OS that claims to be ‘Datacenter’ grade should handle this properly: download all your junk, deal with it later. What’s more, I used c1.medium because m1.small is even slower at the Windows Update part. Even when the update manager claims it’s only downloading updates, CPU usage won’t drop below 50%. What? A download that takes a lot of CPU time? WTF? The actual installation isn’t any nicer or faster anyway.

A fellow sysadmin suggested VBScript + AutoIt. Now why the hell should I write a script to fix issues that shouldn’t exist in the first place? I don’t know why, but Windows Server on EC2 still feels like a second-class citizen, despite the fact that you pay extra for the privilege of running Windows. Anyway, in the end I managed to bake my own AMI, ready for spinning up new instances for our HA setup. But the experience wasn’t even remotely close to the advertised level.

> bashing mode off