I'm using Nginx to forward requests to two Thin web server processes on ports 5000 and 5001. Every once in a while one of the Thin processes stops responding to requests and Nginx logs the following error:
2014/11/28 21:40:05 [error] 21516#0: *1458 upstream timed out (110: Connection timed out) while reading response header from upstream, client: X.X.X.X, server: www.X.com, request: "HEAD / HTTP/1.1", upstream: "http://127.0.0.1:5001/", host: "www.example.com", referrer: "http://www.example.com/"
The affected Thin process stays unresponsive for a couple of minutes and then starts responding again on its own. While it is frozen it also does not respond to wget (e.g. wget http://127.0.0.1:5000) or to a request from Python (e.g. requests.get('http://127.0.0.1:5000')).
I set up three machines: two on Google Compute Engine - Debian 7.7 and Ubuntu 14.04 - and one AWS instance - Ubuntu 14.04. This error only happens on Google Compute Engine - Amazon Web Services does not have the same problem.
The software on all machines is as close to identical as can be. All operating systems are completely up to date through apt-get and the project is pulled from the same Git commit. I use the same deployment method on all three machines and they are all using the same Google Cloud SQL service.
I'm using Thin 1.5.1, Ruby 1.9.3-p448, and Rails 3.2.11. Updating Thin to 1.6.3 did not make a difference.
I set up a Python script to detect timeouts and restart the offending Thin process. It makes a request to each Thin process on ports 5000 and 5001 and waits 5 seconds for a response. However, simply running the script makes the timeouts happen much less frequently. I haven't run it long enough to say it makes them go away completely, but I easily get 24-hour periods with no timeouts while the script is running, whereas without it timeouts happen at least once every two hours (usually much more often).
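For reference, the watchdog described above could look roughly like the sketch below. This is not the exact script; it uses only the standard library, and the names `is_responsive`, `find_unresponsive`, and `check_and_restart` are made up for illustration. The restart action is passed in as a callback because the actual restart command depends on how Thin is deployed.

```python
import http.client
import urllib.error
import urllib.request

PORTS = (5000, 5001)     # the two Thin ports mentioned in the question
TIMEOUT_SECONDS = 5.0    # how long to wait for each response


def is_responsive(port, timeout=TIMEOUT_SECONDS):
    """Return True if the server on 127.0.0.1:port answers within timeout."""
    url = "http://127.0.0.1:%d/" % port
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status (4xx/5xx) still means the process answered.
        return True
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections, and broken responses all count
        # as "not responding" (URLError and socket.timeout are OSErrors).
        return False


def find_unresponsive(ports=PORTS, probe=is_responsive):
    """Return the list of ports whose probe failed."""
    return [port for port in ports if not probe(port)]


def check_and_restart(restart, ports=PORTS, probe=is_responsive):
    """Call restart(port) for every unresponsive port; return those ports.

    `restart` is a hypothetical callback supplied by the caller.
    """
    stuck = find_unresponsive(ports, probe)
    for port in stuck:
        restart(port)
    return stuck
```

In practice this would be called in a loop with a sleep between rounds, with `restart` wrapping whatever command restarts the Thin process bound to the given port.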