
Running HAProxy behind Amazon's ELB

By Jason Roelofs on January 11, 2013 in amazon, aws, elb, haproxy, load balancer, and ops

Amazon’s Elastic Load Balancer (ELB) is a great tool for serving traffic across availability zones in an AWS region. It can do SSL termination, it can handle nearly any amount of traffic (capacity ramps up over time), and your only variable cost is bandwidth. However, it is a complete black box with no ability to shape or manipulate traffic for your application’s needs (say, to send DDoS traffic into oblivion instead of blasting your web servers). For this reason it can make sense to run your own load balancer, such as HAProxy, right behind ELB.

In general, configuring HAProxy to sit behind ELB is no different from any other hosting setup, but there are two settings that can improve operational efficiency and prevent some common problems.

Monitoring

We are big fans of Passenger for serving our Rails applications, but until the release of Passenger Enterprise there was always a few seconds’ delay on deploys while Passenger shut down existing processes before starting up the new code, leaving queued requests waiting for the new processes to boot. When we had ELB’s Health Check hitting a direct application URI, this pseudo-downtime registered as full downtime in ELB, which removed the servers from the rotation, leading to actual downtime or reduced capacity until ELB picked up a valid request again.

To work around this, we moved the monitoring URI to HAProxy itself using the monitor-uri option, and updated ELB to Health Check against this URI. To ensure that we are never serving requests from an HAProxy server with no valid backends in a given availability zone (unless everything is down), we configure every HAProxy server to talk to every web server in our EC2 account, but we increase the weight of servers in other availability zones. For example, an HAProxy box in us-east-1a has the following:


backend web_servers
  # Backend and server names here are illustrative; HAProxy's
  # server directive requires a name before the address
  server web-1a ec2-us-east-1a.server:80 weight 25
  server web-1c ec2-us-east-1c.server:80 weight 100
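
The monitor-uri endpoint mentioned above lives in the frontend section. A minimal sketch, assuming port 80 and a check path of our choosing (the bind port, URI path, and names are all illustrative):

frontend www
  bind *:80
  # HAProxy intercepts requests to this URI and answers 200 OK itself,
  # so ELB's Health Check succeeds even while app processes restart
  monitor-uri /elb_check
  default_backend web_servers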

With this, every level of the stack has redundancy. From availability zones to ELB to HAProxy to our app servers, every layer can survive a failure without taking the application down.

Timeouts

In this setup, ELB is always going to be your “client” to HAProxy, and if your timeout client value is too small you are going to see a massive number of <BADREQ> <NOSRV> errors flooding your HAProxy logs. It turns out that the monitoring side of ELB likes to open a couple of sockets to the backend servers without sending any data down them. These sockets have a timeout of 60 seconds and reconnect immediately upon closing. We previously had a four-second timeout on these connections, which resulted in logs spammed with the following:


http http/<NOSRV> -1/-1/-1/-1/4000 408 212 - - cR-- 17/17/0/0/0 0/0 {} "<BADREQ>"
http http/<NOSRV> -1/-1/-1/-1/4001 408 212 - - cR-- 16/16/0/0/0 0/0 {} "<BADREQ>"
http http/<NOSRV> -1/-1/-1/-1/4001 408 212 - - cR-- 17/17/0/0/0 0/0 {} "<BADREQ>"
http http/<NOSRV> -1/-1/-1/-1/4049 408 212 - - cR-- 16/16/0/0/0 0/0 {} "<BADREQ>"
http http/<NOSRV> -1/-1/-1/-1/4001 408 212 - - cR-- 17/17/0/0/0 0/0 {} "<BADREQ>"
http http/<NOSRV> -1/-1/-1/-1/4001 408 212 - - cR-- 17/17/0/0/0 0/0 {} "<BADREQ>"

The -1/-1/-1/-1/4001 408 says that the request timed out at 4001 ms and HAProxy returned a 408 Request Timeout. The cR-- section tells us that HAProxy force-closed the connection itself (c) while waiting for headers that never came (R). You can read more about the log format on the official documentation’s Logging page.

After changing the timeout to 60 seconds, we saw a dramatic drop in overall log volume (as measured in Loggly).
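The fix itself is one line in the defaults section. A sketch, where 60 seconds matches ELB’s idle-connection timeout and the other timeout values are illustrative:

defaults
  mode http
  timeout connect 5s
  # ELB holds idle monitoring sockets open for 60s, so the client
  # timeout must be at least that long to avoid spurious 408s
  timeout client  60s
  timeout server  30s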

These few changes have given us a very reliable server setup and keep our logging as relevant as possible, and with HAProxy in the mix we have even more control over application scalability on top of what EC2 already provides.


2 Comments

  1. Scott Klein, July 06, 2013 (https://www.statuspage.io)

    Thanks for the post, definitely kickstarted me on the right path. I should note that once I got rid of the “cR” NOSRV lines due to HAProxy cutting the request off, I ended up with a bunch of different “CR” NOSRV lines due to ELB cutting off its own connections after 60 seconds.

    Adding “option dontlognull” squelched these, and the HAProxy logging is clean as expected now.

  2. anon, January 22, 2014

    You definitely helped me clear this 408 error. Many thanks to you.
