Running HAProxy behind Amazon's ELB

Photo by Nayu Kim (http://www.flickr.com/photos/nayukim/5704132134/), CC BY 2.0

Amazon’s Elastic Load Balancer (ELB) is a great tool for serving traffic across availability zones in an AWS region. It can do SSL termination, it can handle almost any amount of traffic (as long as that traffic ramps up over time), and your only variable cost is bandwidth. However, it is a complete black box, with no way to shape or manipulate traffic for your application’s needs (say, to send DDoS traffic into oblivion instead of letting it blast your web servers). For this reason it can make sense to run your own load balancer, such as HAProxy, directly behind ELB.

In general, configuring HAProxy to sit behind ELB is no different from any other hosting setup, but there are two settings that can improve operational efficiency and prevent some common problems.

Monitoring

We are big fans of Passenger for serving our Rails applications, but until the release of Passenger Enterprise there was always a delay of a few seconds on deploys while Passenger shut down the existing processes before starting up the new code, leaving queued requests waiting for the new processes to boot. Because we had ELB’s Health Check hitting an application URI directly, this pseudo-downtime registered as full downtime in ELB, which removed the servers from the rotation, leading to actual downtime or reduced capacity until ELB saw a valid response again.

To work around this, we moved the monitoring URI to HAProxy itself using the monitor-uri option, and updated ELB to run its Health Check against that URI. To ensure that we never serve requests from an HAProxy server with no valid backends in a given availability zone (unless everything is down), we configure every HAProxy server to talk to every web server in our EC2 account, but we increase the weight of servers in other availability zones. For example, an HAProxy box in us-east-1a has the following:

backend web
  server web-1a ec2-us-east-1a.server:80 weight 25
  server web-1c ec2-us-east-1c.server:80 weight 100

With this, every layer of the application has redundancy: from availability zones to ELB to HAProxy to our app servers, each layer can survive a failure without taking the application down.
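On the HAProxy side, the monitor-uri piece lives in the frontend. A minimal sketch of what this can look like (the frontend name, port, URI path, and backend name here are our own assumptions, not taken from a real config):

frontend www
  bind *:80
  # Answer this URI with a 200 directly from HAProxy itself,
  # without touching any backend, so ELB's Health Check only
  # fails when HAProxy is actually down
  monitor-uri /haproxy-monitor
  default_backend web

Point the ELB Health Check at /haproxy-monitor and a deploy-time hiccup in the application processes will no longer pull the instance out of rotation.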

Timeouts

In this setup, ELB is always going to be the “client” to HAProxy, and if your timeout client value is too small you are going to see a massive number of <BADREQ> <NOSRV> errors flooding your HAProxy logs. It turns out that the monitoring side of ELB likes to open a couple of sockets to the backend servers without sending any data down them. These sockets have a 60-second timeout and reconnect immediately upon closing. We previously had a four-second timeout on these connections, which resulted in logs getting spammed with the following:

http http/ -1/-1/-1/-1/4000 408 212 - - cR-- 17/17/0/0/0 0/0 {} ""
http http/ -1/-1/-1/-1/4001 408 212 - - cR-- 16/16/0/0/0 0/0 {} ""
http http/ -1/-1/-1/-1/4001 408 212 - - cR-- 17/17/0/0/0 0/0 {} ""
http http/ -1/-1/-1/-1/4049 408 212 - - cR-- 16/16/0/0/0 0/0 {} ""
http http/ -1/-1/-1/-1/4001 408 212 - - cR-- 17/17/0/0/0 0/0 {} ""
http http/ -1/-1/-1/-1/4001 408 212 - - cR-- 17/17/0/0/0 0/0 {} ""

The -1/-1/-1/-1/4001 408 says that the request timed out at 4001ms and HAProxy returned a 408 Request Timeout. The cR-- section tells us that HAProxy force-closed the connection itself (c) while waiting for request headers that never came (R). You can read more about the log format on the official documentation’s Logging page.
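The fix is to raise timeout client to at least ELB’s 60-second idle timeout. As a sketch, in a defaults section (the other timeout values here are illustrative assumptions, not our actual settings):

defaults
  mode http
  # Match ELB's 60s idle timeout so HAProxy doesn't 408 the
  # empty sockets ELB pre-opens for monitoring
  timeout client 60s
  timeout connect 5s
  timeout server 30s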

After changing the timeout to 60 seconds, we saw a large drop in overall log volume in our Loggly graphs.

These few changes have given us a very reliable server setup and keep our logging as relevant as possible, and with HAProxy in the mix we have even more control over application scalability on top of what EC2 already provides.


Jason is a senior developer who has worked in the front-end, back-end, and everything in between. He has a deep understanding of all things code and can craft solutions for any problem. Jason leads development of our hosted CMS, Harmony.

Comments

  1. July 06, 2013 at 10:29 AM

    Thanks for the post, definitely kickstarted me on the right path. I should note that once I got rid of the “cR” NOSRV lines due to HAProxy cutting the request off, I ended up with a bunch of different “CR” NOSRV lines due to ELB cutting off its own connections after 60 seconds.

    Adding “option dontlognull” squelched these, and the haproxy logging is clean as expected now.

  2. anon
    January 22, 2014 at 2:48 PM

    You definitely helped me clear this 408 error,
    many thanks to you

  3. March 27, 2018 at 6:00 AM

    I got this in my apache2 domlog:
    xxx.xxx.xx.xx - - [xx/Feb/xxxx:03:27:05 +0100] "-" 408 - "-" "-"
    and, for the same IP, this error:
    (36)File name too long: [client xxx.xxx.xx.xx:31197] AH00127: Cannot map GET

    I found this reference: https://serverfault.com/questions/485063/getting-408-errors-on-our-logs-with-no-request-or-user-agent
    but I don’t have much experience with this.
    What should I do?