
Why sometimes I hate RFCs

Every time there’s a debate about the format of something that floats around the Internets, people go to the RFCs in order to figure out who’s right and who’s wrong. Which may be a great thing in theory. In practice, the rocket scientists who wrote those papers can squeeze a lot of confusion into a single page of text, as the G-WAN manual puts it.

Today’s case was a debate about the Expires header timestamps as defined by the HTTP/1.1 spec (RFC 2616). If you read section 14.21, which covers the Expires header, you can see the following statement:

The format is an absolute date and time as defined by HTTP-date in section 3.3.1; it MUST be in RFC 1123 date format:

Expires = “Expires” “:” HTTP-date

I made a newbie mistake in thinking that any RFC 1123 date is a legal Expires timestamp. Actually, by carefully reading section 3.3.1 of RFC 2616 you can deduce the following: the dates used by the HTTP/1.1 protocol are not simply dates in the RFC 1123 format; the actual format is a subset of RFC 1123. The debate started around the GMT specifier, which in the HTTP/1.1 context actually means UTC, but must be spelled GMT anyway. What’s more, +0000, which is a valid timezone specifier as defined by RFC 1123, is not valid for Expires timestamps. Some caches accept +0000 as a timezone specifier for HTTP timestamps, but some of them don’t.
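
To make the difference concrete, here’s a rough illustration (the date is the example used in the spec itself): the first line is a valid HTTP-date, while the second uses the numeric zone that RFC 1123 allows but that HTTP caches may choke on:

Expires: Thu, 01 Dec 1994 16:00:00 GMT
Expires: Thu, 01 Dec 1994 16:00:00 +0000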

It isn’t that the RFCs are broken per se, but the language they use can be very confusing sometimes.

How to fix the nginx and PHP/FastCGI PATH_INFO issue

You may be disappointed by this statement: you don’t. nginx has something in it that’s broken by design, and the author didn’t even bother to reply to my email explaining the situation. I can demonstrate this with some comparisons.

Apache (+mod_php5) knows the difference between a script and a PATH_INFO request that ends in .php. It would be ridiculous not to, since the PHP runtime is part of the web server itself. I didn’t bother to try various Apache + mod_fcgid configurations, since most of the time Apache simply wastes my time. lighttpd binds the FastCGI proxying to the file extension. This is the part where nginx fails: it tries to use a one-size-fits-all configuration logic that actually doesn’t fit all the usage modes. The FastCGI pass is done inside a location (not file / file extension!) directive that says nothing about the nature of the input. Let’s take a look at this example:

/directory.php/file.php/pathinfo.php

Unlike nginx, neither Apache nor lighttpd creates a mess out of a path like this. A location directive is a little bit vague, and no amount of regex will ever fix this. Of course, you can fix it for every single damn virtual host, but then again, can you spell boilerplate? Having different configurations for every virtual host, when clearly for other web servers this isn’t a bundled “feature”, is not fun from the system administration point of view. Usually I generate the virtual host configuration from a bash script I wrote myself. I have configuration templates for all of the applications I administer, thus it’s all about flags and options rather than manually writing configuration files. Working around nginx’s inability to tell which stuff is which could only mean that I have to write a whole bunch of configuration boilerplate for each type of application. That doesn’t sound like one-size-fits-all anymore. Is that fun? Let me rephrase that: that’s not fun. In case AWS disappears into a black hole, I can recreate everything from scratch in a matter of minutes on a completely new hosting service, while changing Cotendo origins is child’s play.
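
To make the point concrete, here’s a minimal sketch of the usual location-based handoff (php_upstream is just a placeholder upstream name, and the fastcgi_param lines are the usual boilerplate). The request above ends in .php, so it matches the regex and nginx happily forwards the whole path to FastCGI without ever checking what actually exists on disk:

location ~ \.php$
{
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass php_upstream;
}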

Here are a bunch of proposed solutions for something that can turn into a remote exploit. For quite a while I’ve been using the same solution as the one provided by one of the people commenting on that article:

if (-f $request_filename)
{
    fastcgi_pass php_upstream;
}

Mostly because this is more readable to me than try_files; I tend to understand code blocks better.
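
For the record, the try_files version would look roughly like this (a sketch with the same placeholder upstream as above; the usual fastcgi_param boilerplate is omitted). It refuses anything that doesn’t map to an existing file before the FastCGI pass ever happens:

location ~ \.php$
{
    # return 404 unless the URI maps to an existing file on disk
    try_files $uri =404;
    fastcgi_pass php_upstream;
}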

Of course, a proper PHP script won’t save any uploaded junk to a publicly accessible location, but what sysadmin trusts his coders anyway? I usually don’t. That doesn’t mean that they don’t do a good job, but mistakes happen. I can’t make every living thing be as paranoid about security as I am. This exploitable situation happens when people validate their uploads via the $_FILES array. I have a news flash for you: the MIME type reported by the $_FILES array is provided by the browser. The browser does a lousy job of providing a proper MIME type: it picks one based on the file extension. A PHP file with a JPEG extension, anyone? fileinfo would be the proper alternative. PHP should deal with this junk by design, but that’s a whole other joke about the design of PHP.
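
For illustration only, a rough sketch of the fileinfo approach, assuming the fileinfo extension is loaded and a form field named upload (both assumptions on my part); unlike $_FILES['upload']['type'], this inspects the actual bytes of the uploaded file:

<?php
// hypothetical upload check; the field name 'upload' is an assumption
$finfo = new finfo(FILEINFO_MIME_TYPE);
$mime  = $finfo->file($_FILES['upload']['tmp_name']);

if ($mime !== 'image/jpeg') {
    // reject the upload instead of trusting the browser-supplied MIME type
    exit('not a JPEG');
}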

Getting back to PATH_INFO: the juicy part is that basically you can extract PATH_INFO from an input path by using fastcgi_split_path_info, but that directive uses … regex. Which brings us back to the statement above: no amount of regex will ever fix this crap. Let’s take a look at $request_filename by throwing in a custom debug logging configuration that dumps some variables into the access_log. Guess what: the $request_filename for the example above is … /directory.php/file.php/pathinfo.php, while it’s pretty clear that the actual request filename is /directory.php/file.php. Which is the other broken-by-design thing that nginx features. Q: what kind of server-side variable lies to your face by calling something $request_filename when it isn’t even a file? A: duh!
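
For reference, the classic split looks roughly like this (the regex is the one you usually see in examples; php_upstream is still a placeholder). It works for simple cases, but the split point is chosen by pattern matching on the URI alone, with no knowledge of what actually exists on disk, which is the whole problem:

location ~ \.php
{
    fastcgi_split_path_info ^(.+\.php)(/.+)$;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_param PATH_INFO $fastcgi_path_info;
    fastcgi_pass php_upstream;
}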

This doesn’t mean that you should throw nginx and PHP-FPM away and go back crying to Apache. Just avoid the PATH_INFO junk. However, even when using my proposed configuration, i.e. checking whether $request_filename is actually a file before doing the FastCGI pass, you can still use fastcgi_split_path_info for a limited amount of work. fastcgi_split_path_info can replace the need for URL rewrites by simply using:

if (!-e $request_filename)
{
    rewrite ^ /index.php last;
}

This works for a lot of stuff like WordPress, Drupal, or Zend Framework. It works for certain stuff, that is, except the stuff containing .php somewhere in the path. I might want to use /%postname%.php as the permalink structure in WordPress. Guess what … with a properly configured nginx (cough!) plus the above rewrite rule replacement, you simply can’t. You have to go back to:

if (!-e $request_filename)
{
    rewrite ^/(.*)$ /index.php?q=$1 last;
}

Which is exactly what I did. For all my apps. It happens to be more deterministic by nature, and I tend to sleep better when I can predict the request pipeline, no matter what the input junk is.

I guess I could give Hiawatha a run, which seems to be lightweight enough and supports PHP from a threaded architecture. Then again, PHP is process-based and does blocking I/O by design, therefore the web server is rarely the actual bottleneck.