Sunday evening, after migrating the MMBA Forum to a new webserver, I received email from a user who was unable to sign up for the forum: they got a 500 Internal Server Error some time after clicking submit. The problem turned out to be the signup page taking longer than expected to run and timing out. It was resolved by raising the timeout, adding -idle-timeout 60 to the FastCgiExternalServer line in the vhost’s config.
More specifically, I’d just moved from an older server running lighttpd to a new one using the venerable Apache HTTP Server v2.2. Both had per-vhost FastCGI setups pointing to PHP instances running as the user who owned the vhost, which helps ensure that compromised PHP apps affect only files/sites owned by that user†.
For example, lighttpd would be set up something like this:
fastcgi.server = ( ".php" =>
( "socket" => "/var/run/php-fastcgi/username/username-php-fastcgi.sock",
"check-local" => "disable",
"broken-scriptfilename" => "enable"
)
)
Apache uses something like this:
FastCgiExternalServer /var/run/php-fastcgi/vhosts/example.com -socket /var/run/php-fastcgi/users/username/username-php-fastcgi.sock
AddHandler php-fastcgi .php
Action php-fastcgi /php-fastcgi
Alias /php-fastcgi /var/run/php-fastcgi/vhosts/example.com
To help cut down on the number of spammy accounts created, the forum signup runs both reCAPTCHA and DNS blacklist checks before the account is actually created. These were taking longer than the default 30-second timeout, causing the FastCGI interface to time out and close the connection, resulting in log entries such as this:
[Sun Apr 15 20:00:09 2012] [error] [client 192.168.0.2] FastCGI: comm with server "/var/run/php-fastcgi/vhosts/mmba.org" aborted: idle timeout (30 sec)
This led me to increase the FastCgiExternalServer timeout in mod_fastcgi by adding -idle-timeout 60, doubling it from its default, as follows:
FastCgiExternalServer /var/run/php-fastcgi/vhosts/example.com -socket /var/run/php-fastcgi/users/username/username-php-fastcgi.sock -idle-timeout 60
AddHandler php-fastcgi .php
Action php-fastcgi /php-fastcgi
Alias /php-fastcgi /var/run/php-fastcgi/vhosts/example.com
The problem then went away.
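To confirm that the timeouts have really stopped, or to see which vhosts are hitting them, the error log can be scanned for the message shown earlier. This is only a rough sketch in Python: the pattern is based on the single log line quoted above, and the function name count_idle_timeouts is my own invention for illustration.

```python
import re
from collections import Counter

# Pattern derived from the one mod_fastcgi log line quoted in this post;
# other mod_fastcgi versions may word the message differently (assumption).
IDLE_TIMEOUT_RE = re.compile(
    r'FastCGI: comm with server "(?P<server>[^"]+)" aborted: '
    r'idle timeout \((?P<seconds>\d+) sec\)'
)

def count_idle_timeouts(lines):
    """Count idle-timeout aborts, keyed by (server path, timeout in seconds)."""
    counts = Counter()
    for line in lines:
        match = IDLE_TIMEOUT_RE.search(line)
        if match:
            key = (match.group("server"), int(match.group("seconds")))
            counts[key] += 1
    return counts

# The log entry from above, used here as sample input.
sample = [
    '[Sun Apr 15 20:00:09 2012] [error] [client 192.168.0.2] FastCGI: comm with '
    'server "/var/run/php-fastcgi/vhosts/mmba.org" aborted: idle timeout (30 sec)',
]

for (server, seconds), hits in count_idle_timeouts(sample).items():
    print(hits, server, seconds)
```

In practice the list of lines would be replaced by an open file handle on the vhost's error log. If entries keep appearing with the new value in the parentheses, the timeout is still too low.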
I’m not exactly sure why this cropped up with the move to Apache, but I suspect that lighttpd had a considerably longer default timeout. This can be set in the lighttpd config by setting idle-timeout, but I wasn’t able to easily figure out what the default is. It’s possible I’ll have to tune this further in the future, but at least I now know why the problem was occurring.
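If further tuning is needed, it would help to know how long the blacklist lookups actually take rather than guessing at a number. The sketch below is an assumption-heavy illustration in Python, not what the forum software does: it builds a DNSBL query name the usual way (IPv4 octets reversed, followed by the list's zone) and times a lookup through the system resolver. The zone dnsbl.example.org and both function names are placeholders.

```python
import ipaddress
import socket
import time

def dnsbl_query_name(ip, zone):
    """Build a DNSBL lookup name: reversed IPv4 octets, then the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def time_lookup(name):
    """Resolve name and return (listed, elapsed_seconds).

    A name that resolves is treated as listed. Any resolver failure is
    treated as not listed, which is a simplification for this sketch.
    """
    start = time.monotonic()
    try:
        socket.gethostbyname(name)
        listed = True
    except OSError:
        listed = False
    return listed, time.monotonic() - start

# dnsbl.example.org is a placeholder zone, not a real blacklist.
print(dnsbl_query_name("192.168.0.2", "dnsbl.example.org"))
# → 2.0.168.192.dnsbl.example.org
```

Calling time_lookup() on the generated name for each blacklist the forum checks, from the webserver itself, should show whether one slow or unresponsive list is eating most of the timeout.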
† Yes, I know this isn’t a perfect solution, but it’s been proven to work when sites are compromised by automated tools that attempt to change/delete all they can. In each case that I’ve experienced, the damage has been limited to content in that user’s home directory. This would not be a good mitigation against something which attempted privilege escalation once on the box, went after the httpd itself, etc.