Fixing Intermittent Failing Tests

Some tricks to help you fix tests that sometimes fail

Fail Parking Meter by Karl Norling is licensed under CC BY 2.0

I’m deep in a codebase (Rails with RSpec and Cucumber) that has a variety of intermittent test failures. They drive me nuts because it means we can’t fully trust the test suite. Sometimes they pass, sometimes they fail, and we can’t predict what will happen. We get used to seeing broken tests, which means we might miss real problems.

Here are 4 issues I fixed. Hopefully, they’ll help you out if you stumble across this post.

#1 Don’t mess with Time

We’re using the timecop gem to muck with time in our test suite. I realized there was a problem when I found this in an RSpec before(:each):

before(:each) do
  Timecop.return
  ...
end

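Uh oh. Someone knew time changes were spilling over between tests and figured that resetting Timecop at the start of each test would fix it. It does, but only for the tests covered by this hook. Tests elsewhere could still inherit a changed clock from whatever ran before them, which is why we were still seeing intermittent failures.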

Instead of this, I switched to an after(:each), so every test cleans up after itself. I went a step further and put it in my spec_helper.rb so time gets reset after every test in the suite.

config.after(:each) do
  Timecop.return
end

I’d rather be more precise, but this is a huge test suite so we’ll start by being overzealous and refactor after we have passing tests.
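
One more precise option is to scope each time change to a block, so the clock is restored even when the example raises. Timecop’s block form, Timecop.freeze(time) { ... }, behaves this way. The plain-Ruby sketch below illustrates the underlying pattern with a hypothetical FakeClock module (an assumption for illustration, not Timecop’s implementation) so it runs without any gems.

```ruby
# Hypothetical stand-in for a time-travel helper; not Timecop's implementation.
module FakeClock
  @frozen_at = nil

  class << self
    # Current time: the frozen value if one is set, otherwise the real clock.
    def now
      @frozen_at || Time.now
    end

    # Freeze time for the duration of the block only.
    def freeze(time)
      previous = @frozen_at
      @frozen_at = time
      yield
    ensure
      # Runs even if the block raises, so nothing leaks into the next test.
      @frozen_at = previous
    end

    def frozen?
      !@frozen_at.nil?
    end
  end
end

launch_day = Time.utc(2015, 5, 26, 12, 0, 0)

# Inside the block, the clock reports the frozen time.
inside = FakeClock.freeze(launch_day) { FakeClock.now }

# A failing "test" still leaves the clock unfrozen afterwards.
begin
  FakeClock.freeze(launch_day) { raise "simulated test failure" }
rescue RuntimeError
  # Expected; what matters is the state left behind.
end
```

Because the reset lives in an ensure clause, there is no separate cleanup step to forget. The global after(:each) can stay in place as a safety net while tests are migrated.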

#2 Be careful of class variables

This one is a bit harder to describe without showing pages of code, but I’ll try.

In an RSpec before(:each) we create a few Category records. In production these records never change once they’re in the database, so we had a model that caches some of their IDs in class variables.

def self.special_category_id
  @@special_category_id ||= Category.find_by(name: "Special Category").id
end

Tests were failing due to categories not being associated correctly with other objects.

The class variable gets set once, but the database is cleared out between test cases. The Category records are recreated with different IDs, so the cached ID no longer points at the right record.

In the interest of getting the tests passing, I added an after to clear out the cached variables.

after do
  # Navigator caches the category IDs, but they are different between tests.
  # Unsetting them ensures we get the correct data in every test run.
  Navigator.class_variable_set "@@special_category_id", nil
end

When we get the tests all passing again, we’ll work on a cleaner solution.
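
A cleaner solution might give the class an explicit way to drop its cache, rather than having specs reach in with class_variable_set. The sketch below is one possible shape under stated assumptions: CategoryTable is a hypothetical in-memory stand-in for the database, reset_cache! is an assumed method name, and the cache uses a class-level instance variable instead of @@. It also reproduces the stale-ID problem described above.

```ruby
# Hypothetical in-memory stand-in for the categories table.
Category = Struct.new(:id, :name)

class CategoryTable
  def initialize
    @rows = []
    @next_id = 1
  end

  def create(name)
    row = Category.new(@next_id, name)
    @next_id += 1
    @rows << row
    row
  end

  def find_by_name(name)
    @rows.find { |row| row.name == name }
  end

  # Mimics cleaning the database between tests: rows go, IDs keep counting.
  def clear
    @rows.clear
  end
end

class Navigator
  class << self
    attr_accessor :categories

    # Memoized lookup, as in the original model.
    def special_category_id
      @special_category_id ||= categories.find_by_name("Special Category")&.id
    end

    # Assumed helper: one public place to forget everything that is cached.
    def reset_cache!
      @special_category_id = nil
    end
  end
end

Navigator.categories = CategoryTable.new

# First "test": create the record and warm the cache.
Navigator.categories.create("Special Category")
first_id = Navigator.special_category_id

# Between tests the table is cleared and the record is recreated.
Navigator.categories.clear
Navigator.categories.create("Special Category")
stale_id = Navigator.special_category_id # still the old, cached ID

Navigator.reset_cache!
fresh_id = Navigator.special_category_id # looked up again
```

In a real suite, a method like this could be called from a global after hook, which keeps the knowledge of what is cached inside the model rather than in the specs.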

#3 Watch out for rogue requests

If you’re testing in the browser (we’re using Capybara with Firefox) you can hit issues where there are still requests floating around after a test has finished. These requests can then hit during another test without the data you expected.

This was happening here. We had some JavaScript that checks if another user is online. It would hit this action in a controller:

def check_online_status
  @user = User.find_by_id(params[:id])
  render :json => @user.online_status
end

The error we were getting was a NoMethodError on online_status, which hints that our User isn’t there. Since the failure was intermittent, I suspected rogue requests.

One test makes a request for the online status of User X. The test finishes, but the request hits after User X has been deleted from the database, causing the next test to fail with an exception.

I changed the code slightly:

def check_online_status
  @user = User.find_by_id(params[:id]) || User.new(online_status: "unavailable")
  render :json => @user.online_status
end

Now it returns “unavailable” when the user doesn’t exist. Our code will happily move along and the problem is avoided. It also helps us avoid leaking data about which User IDs are actual users.
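
The same fallback idea can be tried outside Rails. This is a minimal sketch, assuming a hypothetical in-memory USERS hash and find_user_by_id helper in place of the real model. It shows both the original failure and the behavior after the change.

```ruby
# Hypothetical stand-ins for the User model and its lookup.
User = Struct.new(:id, :online_status)

USERS = { 1 => User.new(1, "online") }

# Returns nil when no user matches, like find_by_id.
def find_user_by_id(id)
  USERS[id.to_i]
end

# Falls back to a placeholder user instead of calling a method on nil.
def online_status_for(id)
  user = find_user_by_id(id) || User.new(nil, "unavailable")
  user.online_status
end

# Without the fallback, a missing user blows up just like the rogue request did.
error = begin
  find_user_by_id(99).online_status
rescue NoMethodError => e
  e
end

known = online_status_for(1)
missing = online_status_for(99)
```

The placeholder object answers the same message as a real user, so the calling code needs no nil check.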

#4 Use expectations to avoid rogue requests

A final rogue request was a doozy. We had an in-browser test which used Capybara to switch between multiple sessions, so it could simulate two users entering a chat room and interacting. At the end of the test, we had something like this:

[@teacher.username, @student.username].each do |username|
  Capybara.session_name = username
  expect(page).to have_link("Exit Chat")
  click_link "Exit Chat"
end

Unfortunately, this gave another NoMethodError that was harder to track down. It looked like the teacher had the wrong data set up, but only sometimes.

Finally, we realized that the click_link "Exit Chat" wasn’t finishing before the loop continued. It was redirecting, and the redirected URL was hitting as the student, not the teacher.

It’s tempting to put a sleep(5) here, which fixes the bug, but we can do better.

The solution is to add an expectation for what the page should show after the click. That way we know the redirect has finished before we continue with the next session.

[@teacher.username, @student.username].each do |username|
  Capybara.session_name = username
  expect(page).to have_link("Exit Chat")
  click_link "Exit Chat"
  expect(page).to have_content("Come back soon!")
end
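
The expectation works as a synchronization point because Capybara’s matchers retry until the content appears or Capybara.default_max_wait_time runs out. The plain-Ruby sketch below shows that polling idea with a hypothetical wait_until helper and a thread standing in for the redirect. It is an illustration of the pattern, not Capybara’s implementation.

```ruby
# Hypothetical helper: retry the block until it returns a truthy value
# or the timeout elapses. Returns false on timeout.
def wait_until(timeout: 2.0, interval: 0.01)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Simulate a redirect that finishes a moment after the click.
page_text = "Chat room"
redirect = Thread.new do
  sleep 0.05
  page_text = "Come back soon!"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
arrived = wait_until { page_text.include?("Come back soon!") }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
redirect.join

# A condition that never becomes true gives up after the timeout.
never = wait_until(timeout: 0.05) { false }
```

Unlike a fixed sleep, the wait ends as soon as the condition holds, so the suite only pays for the time the redirect actually takes.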

Summary

Phew, lots of crazy intermittently failing tests, but we have them fixed now. These are annoying because the test that causes the problem usually isn’t the one that fails.

Hopefully, some of these tricks will help you. If you have other techniques, leave them in the comments!

About Daniel Morrison

Daniel founded Collective Idea in 2005 to put a name to his growing and already full-time freelance work. He works hard writing code, teaching, and mentoring.

Comments

Michael
May 27, 2015 at 21:54 PM

Thanks guys, love your work! I’ve found setting expectations on page content to be a good way to ensure everything’s finalised with a test before it moves on. Thanks for the other ideas!

By Daniel Morrison

May 26, 2015