Optimizing Rails for Memory Usage Part 1: Before You Optimize

This is part one in a four-part series on optimizing a potentially memory-heavy Rails action without resorting to pagination. The posts in the series are:

Part 1: Before You Optimize

Part 2: Tuning the GC

Part 3: Pluck and Database Laziness

Part 4: Lazy JSON Generation and Final Thoughts

We recently built an API server for a mobile application that had an interesting requirement: the mobile application needed to work offline. To support this, we built an API action that can generate a dump of a user’s records from the database. The action constructs a very large JSON response. To keep the action atomic we did not use pagination.

We quickly ran out of memory on Heroku, and the memory was not reclaimed after the dump action finished. The dyno stayed over Heroku’s memory limit with degraded performance until Heroku’s daily dyno restart.

Running into problems almost immediately suggested that we should reduce the memory our application needed rather than try to work around the problem. This series is what I wish had been written when I began to optimize our memory usage.

These posts describe my recommended steps to optimize the memory usage of a Rails app. Even though they are inspired by our particular problem with one large JSON index action, many of these steps apply broadly to any Ruby memory issue.

Parts 2-4 will discuss various optimization strategies. However, before you actually optimize you need to do a couple of things: verify that you really need to optimize, and then set up metrics.

Do You Need to Optimize?

The first rule of optimization is, “Don’t do it.” Of course, experienced developers still optimize from time to time because “don’t do it” is really shorthand for several warnings:

Optimization may not be necessary.
Optimization may not be feasible and will waste your time.
You are going to be tempted to optimize the wrong thing.
Even successful optimization can make your code harder to understand.

Consequently, before considering optimization ask these questions:

Do you even need to optimize? Is there an actual pain point, or just a speculated pain point? If it’s only speculation, don’t optimize. YAGNI it and work on more important functionality.
Do you know the root of the problem? If not, find it. Otherwise you will guess and probably guess wrong.
Is there a clean way to solve the problem? Can it be solved at another layer? Think creatively. You want to keep your application code clean. For example, to avoid the complexities of optimizing, intermittent memory problems might be solved by setting up a worker killer on your server to restart any Ruby process that starts using too much memory. Search the netz for <my server here> worker killer. Restarting workers is a band-aid, but if your memory problems are minor it may effectively defer those problems for a long time. YAGNI.
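To make the worker-killer idea concrete, here is a minimal sketch of a Rack middleware that checks the worker's resident set size after each request and asks the worker to exit once it crosses a limit. The class name MemoryLimitRestarter and its options are made up for illustration. It assumes a forking server such as Unicorn, where a worker that receives QUIT finishes its current request and is then replaced by the master, and it assumes `ps -o rss=` is available to report the size in kilobytes. For real use, prefer an existing worker killer for your server.

```ruby
# Hypothetical Rack middleware (not from any gem): after each response,
# signal the current worker if its memory use exceeds a limit.
class MemoryLimitRestarter
  # limit_kb:   resident set size threshold in kilobytes
  # signal:     signal to send; "QUIT" is a graceful shutdown for Unicorn workers
  # rss_reader: callable returning the current RSS in kB (injectable for tests)
  # killer:     callable that delivers the signal (injectable for tests)
  def initialize(app, limit_kb:, signal: "QUIT", rss_reader: nil, killer: nil)
    @app = app
    @limit_kb = limit_kb
    @signal = signal
    # Assumption: `ps -o rss= -p PID` prints the RSS in kilobytes.
    @rss_reader = rss_reader || -> { `ps -o rss= -p #{Process.pid}`.to_i }
    @killer = killer || ->(sig) { Process.kill(sig, Process.pid) }
  end

  def call(env)
    response = @app.call(env)
    # Check after the app has run, so the request that pushed the worker
    # over the limit still gets its response.
    @killer.call(@signal) if @rss_reader.call > @limit_kb
    response
  end
end
```

Mounted in config.ru with something like a 300 MB limit (limit_kb: 300 * 1024), this would recycle a bloated worker without touching application code. The limit is a guess you would tune against your host's memory ceiling.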

So you’ve asked yourself these questions and discovered there’s no other way out. You know it’s time to optimize. The next pre-optimization step is to set up metrics.

Set Up Metrics

Pink and plaid do not go together, so if you get dressed in the dark you may surprise your coworkers when you walk into the office. Similarly, you can’t optimize code in the dark: you need numbers to show that you are making progress. It’s time to set up metrics.

You need to measure the memory used after tens or hundreds of requests. One request is not sufficient because Ruby’s memory allocation isn’t perfect. Even if your app is not technically leaking memory there may be fragmentation that grows the memory usage over time.

Use this template script to measure the total memory usage of your Rails process after 30 requests. Modify the script as necessary for your setup.
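A measurement script along these lines might look like the following minimal sketch. It assumes a Linux or Mac OS X machine, a Rails server that is already running, and an action that can be fetched with a plain GET. The method names, the PID file path, and the URL in the usage comment are hypothetical, so adjust them for your app.

```ruby
require "net/http"
require "uri"

# Resident set size of a process in kilobytes.
# Reads /proc on Linux; otherwise falls back to `ps`, which also works on Mac OS X.
def rss_kb(pid)
  status_path = "/proc/#{Integer(pid)}/status"
  if File.readable?(status_path)
    line = File.foreach(status_path).find { |l| l.start_with?("VmRSS:") }
    return line[/\d+/].to_i if line
  end
  output = `ps -o rss= -p #{Integer(pid)}`.strip
  raise ArgumentError, "no such process: #{pid}" if output.empty?
  output.to_i
end

# Make `count` GET requests to `url`, then return the RSS of `pid` in kB.
def rss_after_requests(pid:, url:, count: 30)
  uri = URI(url)
  count.times do
    response = Net::HTTP.get_response(uri)
    raise "request failed with status #{response.code}" unless response.is_a?(Net::HTTPSuccess)
  end
  rss_kb(pid)
end

# Example usage (hypothetical PID file and URL):
#   pid = Integer(File.read("tmp/pids/server.pid"))
#   kb  = rss_after_requests(pid: pid, url: "http://localhost:3000/dump.json")
#   puts "RSS after 30 requests: #{kb / 1024} MB"
```

Restart the server before each run so every measurement starts from the same baseline, and compare the final number before and after each change you make.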

The script only measures the resident set size of the Rails application. The script will show you if you are actually making improvements, but you will need other tools to discover where exactly all the memory is coming from in your application. We will mention some of those tools in part 3, but first you should do something easier: optimize Ruby’s garbage collection parameters.

On to Part 2: Tuning the GC →

About Brian Hempel

Brian worked with us long before he came on full-time, and had we seen the baby face lurking beneath his programmer beard, we probably wouldn’t have assumed he was as smart. He proved quickly that he has earned the beard, both as a graduate of Michigan Tech in Bioinformatics and Biochemistry/Molecular Biology, and as an experienced coder who picks up new tools quickly.

An occasional violinist and lover of birds, Brian is a cheerful addition to the office.

Comments

Kenn
February 27, 2015 at 12:44 PM

I thought you might be interested: https://github.com/kenn/memstat

It’s important to consider CoW (copy-on-write) memory optimization for forking servers like Unicorn, where PSS could be a more important metric than RSS. Ruby 2.0’s Bitmap GC was not as CoW-friendly as we hoped, but Ruby is continuously improving. So…

The crux of the idea is that it’s best to take a look at the memory usage from higher / OS level rather than from internal / user-land. Messing with the GC params is a black magic and shouldn’t be recommended for production use IMO… :)

Michael Joseph Cohen
March 02, 2015 at 16:38 PM

I get that there’s a lot of useful things to share in this regard, but in this case it seems that using postgres to generate the JSON might have saved y’all a bunch of work.

https://github.com/dockyard/postgres_ext-serializers can do this if you’re using ActiveModelSerializers or it can be done by hand fairly easily, as well.

Brian Hempel
March 04, 2015 at 14:56 PM

@Kenn: Cool gem. Are you using it in production? My experience is that memory usage can grow over time regardless of GC, so I would expect that having a fixed memory threshold above which the GC is triggered will cause GC to occur more and more often as the server process ages. Do you see that behavior?

PSS can be a better metric. I do not believe it is supported on Mac OS X, however. As you noted, PSS shouldn’t give substantially different results compared to RSS.

While GC tuning might be a sort of magic, I’m surprised to hear it called a “black” magic. Is there a reason? Ruby has a long history of GC parameter tuning in production. See REE for example: http://www.rubyenterpriseedition.com/

Brian Hempel
March 04, 2015 at 15:23 PM

@Michael Joseph Cohen: Thanks for your comment. When I began the memory optimization, I hoped there would be one single, simple fix that would get us where we needed to be. Instead, I discovered that a lot of smaller changes added up. JSON caching helped significantly, but accounted for less than half of the memory usage improvement. Caching may have helped more if our action was a simple list view, but our action had a sort of git diff functionality to it that required some fancy database access.

Kenn
March 15, 2015 at 10:31 AM

@Brian Hempel Thanks for your reply.

> Are you using it in production? My experience is that memory usage can grow over time regardless of GC, so I would expect that having a fixed memory threshold above which the GC is triggered will cause GC to occur more and more often as the server process ages. Do you see that behavior?

Yes and Yes. Whatever we do to optimize the GC behavior, it just slows down the heap growth; it does not stop it. So we need the unicorn-worker-killer gem, too. But even that is not enough - when the memory pressure is tight on the system (e.g. a small DigitalOcean instance that runs Unicorn together with MySQL / Redis, etc.) any process can be killed by the OOM-killer, so it’s a good idea to add the following as the last line of defense:

after_fork do |server, worker|
  # Children should die first when OOM kicks in
  File.write "/proc/#{Process.pid}/oom_score_adj", '999'
end

Note that we let worker processes get killed first rather than manually whitelisting processes like mysqld, redis-server or sshd, as that is prone to error.

> I’m surprised to hear it called a “black” magic. Is there a reason?

It’s about the sheer number of configurables. If it were a “for web apps use config A, for batch apps use config B” kind of setting, I would definitely be OK with it.

Even though I have a personal interest in playing around with GC params, a configuration that is optimal at one point is not necessarily optimal 12 months later, and it’s possible that I won’t be working on the project by then - I wouldn’t expose that kind of complexity as part of the project scope.

So if you have great Ruby devs on the team, it should be totally fine - I just suppose that’s a rare occasion rather than a typical one. :)

Coolman
December 21, 2016 at 23:49 PM

Cool