Hi,
We have an issue when using IIS with FastCGI under a load of POST requests within our application.
After troubleshooting, we have narrowed down the cause: when there are a large number of concurrent requests with large POST data, IIS fails to comply with the FCGI protocol as defined at http://www.fastcgi.com/drupal/node/6?q=node/22
To reproduce the issue in IIS:
- Download the fastcgi developer kit at http://www.fastcgi.com/dist/fcgi.tar.gz.
- Compile the fastcgi library and the echo example, which is part of the fastcgi developer kit.
- Set up a site within IIS and configure this site to use the echo fastcgi app you compiled in the step above.
- Configure the fastcgi app to have 'Instance MaxRequests' of a very large number, say 10 million, and the 'Max Instances' to have a value of 2.
- Test your fastcgi IIS handler by loading the page in your browser. It should return an html page which says:
FastCGI echo
Request number 1, Process ID: nnnn
etc.
- Use any web server load testing tool that can send POST requests, and ensure your POST data is larger than the standard fastcgi packet size. We used approximately 2600 bytes of POST data and Apache Bench (ab) under Linux with the following command:
ab -n 100000 -c 20 -k -p postdata.txt http://yourserver/yoursite
where the file postdata.txt had 2600 bytes of text data in it.
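For anyone who wants to build the POST data file without typing it by hand, here is a minimal Python sketch. The helper name write_post_data and the repeating alphabet filler are our own assumptions for illustration; any 2600 bytes of text should behave the same way.

```python
from pathlib import Path


def write_post_data(path="postdata.txt", size=2600):
    """Write `size` bytes of plain ASCII text to `path` and return the bytes.

    The filler (a repeating lowercase alphabet) is an arbitrary choice;
    only the total size matters for the test described above.
    """
    alphabet = b"abcdefghijklmnopqrstuvwxyz"
    # Repeat the alphabet enough times, then cut to the exact size.
    data = (alphabet * (size // len(alphabet) + 1))[:size]
    Path(path).write_bytes(data)
    return data


if __name__ == "__main__":
    written = write_post_data()
    print(f"wrote {len(written)} bytes to postdata.txt")
```

The resulting file can then be passed to ab with the -p option as in the command above.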
The above load test performs a total of 100000 requests, with 20 requests outstanding at any point. Since IIS is configured for a maximum of 2 process instances, requests will queue within IIS, which exposes the issue. If the concurrency is less than the 'Max Instances' setting, the issue does not surface readily.
The ab tool reports a large percentage of failed requests when performing this test. Using Wireshark to analyse the network packets (with the TCP/IP protocol selected under the IIS FastCGI 'Advanced Settings'), we tracked the cause of these failed requests to IIS not sending the terminating FCGI_STDIN packet of zero length. Instead, IIS sends the next requestId, and the fastcgi console process ignores all later requestIds while it waits for the terminating FCGI_STDIN packet. Eventually, the fastcgi console process times out for lack of a response, and the request fails.
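To make the missing packet concrete, here is a rough Python sketch of how a request body is framed as FCGI_STDIN records, and how a captured byte stream can be checked for the terminating empty record. The 8-byte header layout and the type value 5 for FCGI_STDIN come from the FastCGI specification. The function names and the 1024-byte chunk size are our own assumptions for illustration and do not reflect what IIS actually uses.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_STDIN = 5
# Header fields: version, type, requestId, contentLength, paddingLength, reserved
HEADER = struct.Struct("!BBHHBB")


def record(rec_type, request_id, content=b""):
    """Build one FastCGI record with no padding."""
    return HEADER.pack(FCGI_VERSION_1, rec_type, request_id, len(content), 0, 0) + content


def stdin_stream(request_id, body, chunk_size=1024):
    """Frame `body` as FCGI_STDIN records, ending with the empty record.

    chunk_size is a hypothetical value chosen for this example.
    """
    records = [
        record(FCGI_STDIN, request_id, body[i:i + chunk_size])
        for i in range(0, len(body), chunk_size)
    ]
    records.append(record(FCGI_STDIN, request_id))  # zero-length terminator
    return b"".join(records)


def parse_records(data):
    """Split a byte stream into (type, requestId, content) tuples."""
    out = []
    offset = 0
    while offset < len(data):
        _version, rec_type, request_id, length, padding, _ = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        out.append((rec_type, request_id, data[offset:offset + length]))
        offset += length + padding
    return out


def stdin_terminated(data, request_id):
    """True if the last FCGI_STDIN record seen for request_id is empty."""
    stdin = [c for t, r, c in parse_records(data) if t == FCGI_STDIN and r == request_id]
    return bool(stdin) and stdin[-1] == b""


if __name__ == "__main__":
    good = stdin_stream(1, b"x" * 2600)
    bad = good[:-HEADER.size]  # drop the terminator, as in the failing captures
    print(stdin_terminated(good, 1), stdin_terminated(bad, 1))
```

In the failing captures, the stream for a request looks like the second case: the data records arrive, but the zero-length FCGI_STDIN record never does.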
In our testing, we found that faster computers, such as an i7, did not readily reproduce the error, whereas lower-end computers showed the error very rapidly under load.
We also found that adding a busy-wait loop of at least 300 milliseconds to the request loop in echo.c made even fast computers fail readily.
Hopefully this is sufficient detail to allow someone at IIS to reproduce the issue, and release a fix.
Yours Sincerely,
Stuart Smith
Striata