Thursday, February 26, 2015

AWS/ELB connection draining issues

This question was posted on the AWS forums and has not received any responses. The original question is reproduced below.




Hi!


We are doing rolling upgrades of our API instances behind an ELB and are seeing alarmingly long waits for connection draining to finish. The scenario is as follows:


We're running two identical systems, each with four c3.large instances behind an ELB: one system for dev and one for production. The only difference between the two is that the production system continuously serves requests.


A rolling upgrade on the dev system takes about 3 minutes for all 4 instances when there is no traffic. On the production system the time fluctuates between 6 and 17+ minutes. For various reasons we need to do these rolling upgrades about twice per hour on average, so 17+ minutes per rolling upgrade is becoming a problem.
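
As a rough sanity check on these figures, the sketch below bounds the duration of a rolling upgrade. It assumes instances are upgraded strictly one at a time and that each one waits for draining to finish or time out. The 45-second per-instance overhead is inferred from the 3-minute dev run (180 s / 4); the 300-second drain timeout is the Classic ELB default and is an assumption, since the configured value is not stated here.

```python
def rolling_upgrade_bounds(instances, overhead_s, drain_timeout_s):
    """Return (best, worst) total duration in seconds for a sequential rolling upgrade.

    best:  every instance drains instantly, only the fixed overhead remains.
    worst: every instance waits for the full drain timeout.
    """
    best = instances * overhead_s
    worst = instances * (overhead_s + drain_timeout_s)
    return best, worst


# Assumed values: 4 instances, 45 s overhead each, 300 s drain timeout.
best, worst = rolling_upgrade_bounds(instances=4, overhead_s=45, drain_timeout_s=300)
print(f"best: {best / 60:.1f} min, worst: {worst / 60:.1f} min")
```

Under these assumptions the observed 6 to 17+ minutes falls between the two bounds, which would be consistent with some instances waiting for most of the drain timeout rather than draining within seconds.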


All our API calls take < 100ms, so there are no long-running requests that should hold connection draining back for that long. We have tried changing both the idle timeout and the connection draining timeout on the ELB, with no good results.


When we lower the connection draining timeout we see 502 responses from the API, since the ELB forcibly drops the connections. Lowering the idle timeout seems to have no effect.
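
For reference, here is a minimal sketch of how the two settings discussed above can be changed programmatically, assuming a Classic ELB and the boto3 library. The timeout values and the load balancer name `my-api-elb` are illustrative placeholders, not values taken from the setup described here.

```python
def build_elb_attributes(drain_timeout_s, idle_timeout_s, draining_enabled=True):
    """Build the attribute payload for a Classic ELB (values in seconds)."""
    return {
        "ConnectionDraining": {
            "Enabled": draining_enabled,
            "Timeout": drain_timeout_s,
        },
        "ConnectionSettings": {"IdleTimeout": idle_timeout_s},
    }


def apply_elb_attributes(load_balancer_name, attributes):
    """Send the attributes to AWS. Requires boto3 and valid credentials."""
    # Imported lazily so the payload builder above works without boto3 installed.
    import boto3

    elb = boto3.client("elb")  # "elb" is the Classic Load Balancer API
    return elb.modify_load_balancer_attributes(
        LoadBalancerName=load_balancer_name,
        LoadBalancerAttributes=attributes,
    )


# Illustrative values only: 30 s drain timeout, 60 s idle timeout.
attrs = build_elb_attributes(drain_timeout_s=30, idle_timeout_s=60)
print(attrs)
# apply_elb_attributes("my-api-elb", attrs)  # hypothetical load balancer name
```

Keeping the payload builder separate from the AWS call makes it easy to log or diff the exact settings used in each test run.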


All in all, we would like to know what can be done to reduce these times. Since all our requests take < 100ms, it should in theory not take more than a second or two to drain the connections from an instance. Is there something we are missing here?


A last note: we tried turning off connection draining altogether, and this seemed to work better than lowering the connection draining timeout. On average there were only 1 or 2 errors per test run, and some runs had no errors. Is this because the response times are so fast? Our responses are also relatively small, so is it possible that the response is already sitting in the OS TCP send buffer and can still be delivered even with connection draining turned off? What is the difference between having the connection draining timeout set to 0 and having connection draining turned off?


Additional info:



  • All traffic is HTTPS

  • SSL termination happens on the instances

  • keep-alive is enabled on nginx (we tried varying the value here too, without any effect)


Thanks!




