Building Better Test Suites

Ensure your test suite stays maintainable, usable and a joy to work in over the long term

Business Technology Notebook by Goumbik is licensed under CC0 Creative Commons

Here at Collective Idea, we are strong proponents of not only writing tests but of Test Driven Development (TDD), where we write the tests first and then implement after. In our experience, writing tests first goes a long way towards promoting better code design and confidence that the application functions as expected. If tests are written afterward, they often miss important test cases, implement too much at once, or both. We’ve also learned that maintaining an application’s test suite can be more important than maintaining the application itself! We’ve all seen situations where an application has a huge test suite that takes forever to run, has random failures, and no longer inspires confidence. No one plans on building such unmaintainable behemoths, but if no effort is spent to keep the test suite clean, it will rot like any other piece of code.

While I can’t claim to have a one-size-fits-all answer, I do have a number of patterns and practices that I’ve seen and used to help keep maintainable test suites. Specifically, I’ll be going over high-level architecting and organizing of test suites rather than how to use specific testing tools and techniques. Before we get started, though, we need to first figure out how to tell a “good” test suite from a “bad” one. A good test suite provides the confidence to make significant changes to an application and the safety net to know if something broke unexpectedly or unintentionally. In my experience, a good test suite does this by adhering to three primary attributes: it must be Reliable, Comprehensive, and Fast.

Reliable: First, the tests in a test suite must pass, must pass every time they are run, and must fail if the code they cover breaks in unexpected or unintended ways. If a test suite is not regularly passing or has random failures, try to fix those up as soon as possible. A persistently failing test suite is quickly ignored since it provides no confidence that it’s catching actual problems.

Comprehensive: Second, the test suite must cover enough of the application’s functionality to inspire confidence in its protection. Some teams will prefer concrete metrics like percentage of covered code, while others will prefer a focus on the most important features and user flows. The meaning of “comprehensive” will vary for every application and will require each team to figure out their own comfort levels.

Fast: Last, but definitely not least, a test suite needs to be fast. By “fast” I mean that the full test suite should finish in under a minute and individual tests should take, at most, a couple of seconds. Obviously there will be some exceptions, but for a test suite to inspire confidence, developers should want to and be able to run it continuously.
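One way to keep an eye on those time budgets, assuming the suite runs on RSpec, is to have it report its slowest examples after every run. This is only a sketch of a configuration fragment; the file location and the count of ten are assumptions, not something the setup described here requires.

```ruby
# spec/spec_helper.rb (assumed location of the RSpec configuration)
RSpec.configure do |config|
  # Print the ten slowest examples and example groups after each run,
  # so tests that creep past a couple of seconds are easy to spot.
  config.profile_examples = 10
end
```

The same report is available for a single run with `rspec --profile 10`.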

I’d like to clarify that even though these three attributes are ordered here, they are all of equal importance. If you have a reliable and comprehensive, but slow test suite, the test suite will not get run enough, or it will only get run in CI. This will lead to eventual loss of confidence and in many cases reliability will suffer as well. You can have reliable and fast, but if you’re missing comprehensive, there will be no confidence that the test suite will catch problems. Likewise, if the suite is comprehensive and fast, that doesn’t necessarily mean it’s reliable. This situation may sound strange but if the test suite, for example, overuses mocks, it can end up in a situation where the tests are testing implementation. Even worse, it could mean they’re only testing themselves and never actually hitting the application!
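To make that last point concrete, here is a deliberately contrived sketch in plain Ruby. The class names and the bug are made up for illustration. The same check is run against a stub and against the real object, and only the second one tells us anything about the application.

```ruby
# Hypothetical class under test, with a deliberate bug: tax is never added.
class PriceCalculator
  def total_cents(subtotal_cents, tax_percent)
    subtotal_cents # bug: should add subtotal_cents * tax_percent / 100
  end
end

# Hand-rolled stub that returns a canned answer instead of running real code.
class StubbedCalculator
  def total_cents(_subtotal_cents, _tax_percent)
    11_000
  end
end

# Returns true when the given calculator produces the expected total.
def total_is_correct?(calculator)
  calculator.total_cents(10_000, 10) == 11_000
end

# "Passes", but only proves the stub returns what we told it to return.
stubbed_check = total_is_correct?(StubbedCalculator.new)

# Fails, because this one actually reaches the application code.
real_check = total_is_correct?(PriceCalculator.new)

puts "stubbed: #{stubbed_check}, real: #{real_check}"
```

A suite full of checks like the first one can be fast and look comprehensive while never noticing the bug.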

Now that we’ve defined what a good test suite is, how do we get there? Whether you’re starting fresh or working with an existing test suite, I’d like to argue that focusing on fast, first and foremost, will lead towards the other two aspects better than any other focus you could take. In my experience, nothing destroys confidence and reliability in a test suite more than slow tests. I’ve dealt with test suites that passed the hour mark and I never want to do that again. To understand why I believe fast is the right focus, let’s take a look at the Ideal Test Suite organization (otherwise known as the Test Suite Pyramid):

Collective Idea - the-right-way - test suite.jpg

A majority of tests in any test suite should be unit tests. These are tests that test one single piece of functionality at the lowest possible level, touching as few dependencies as possible. They are also the fastest to run and easiest to write, maintain, and prove correct. However, a test suite of only unit tests, while potentially comprehensive, is rarely reliable. This is because there are no tests confirming that interactions between these units also work, and that’s where the integration layer comes in.

Integration tests ensure that all of the piping between units is set up correctly. In Rails, many consider controller and request tests (using RSpec parlance) to be integration tests. These tests will be slower as they often require more data setup and execute more of the application per test. Given that the units themselves are already well tested, fewer integration tests are required for comprehensiveness, which helps offset the slower speed in keeping the overall test suite fast. But we’re still not really reliable yet, because there’s one more layer that hasn’t been touched: how the user interacts with your application.

You can have a comprehensive and fast test suite with 100% code coverage that’s still unreliable if it all passes when the user experience isn’t hooked up yet. This is where the final part of the pyramid comes into play: end-to-end tests. These tests click through the application like a user, exercising the entire stack to ensure everything from the top to the bottom works. In Rails/RSpec land, these are the feature tests and they are commonly written using tools like Capybara and Selenium. These tests are orders of magnitude slower than any unit test, so there should be only as many of them as it takes to reach your desired level of comprehensiveness.

Unfortunately, there is an easy trap to fall into when building out a test suite, and that’s putting too much importance on comprehensiveness. While it’s tempting to want to test every possible user interaction in an application, doing so is just not possible (see J. B. Rainsberger’s post and talk entitled “Integrated Tests are a Scam”). Attempting to do this will quickly lead to a painful, difficult to maintain test suite. That greater-than-an-hour test suite I mentioned before? It was because a majority of the test suite was implemented in the End-to-End layer. Every use case, every error path, every bug was put into an end-to-end test. Unlike the image above, this application’s test suite flipped the sections of the pyramid around.

Collective Idea - the-wrong-way - test suite.jpg

There’s another aspect about fast that doesn’t get as much discussion as it should and that’s how fast developers can confirm their changes didn’t break anything. Yes, the test suite itself should be fast, but developers should also be running the tests that have the greatest chance of catching a problem most often. Or to put it another way, we should be able to run the fastest tests all of the time, the slower tests less often, and the slowest tests can be held until a feature or bug fix is considered complete. This, at least in Rails and RSpec land, has proven to be a bit more difficult and less often practiced, mainly because of (in my opinion) poor decisions and defaults provided by the rspec-rails gem. When using RSpec with Rails, the library encourages developers to put all of their tests in spec/ where every test will run together in the same suite. As such, many Rails test suites end up looking like a test “blob” more than a pyramid.

Collective Idea - the-test-blob - test suite.jpg

This type of test structure negatively impacts the reliability and speed of a test suite for a few reasons. First, it’s difficult to run just a certain set of tests because there’s no well-defined boundary to denote which are Unit tests, which are Integration, and which are End-to-End. Second, ensuring the full test suite is green takes longer because you have to wait for every type of test to run to know if anything is failing. Third, it’s no longer possible to use some features Rails provides to make tests fast, like fixtures (yes! Fixtures are awesome!). That’s because end-to-end tests can rarely use the same test setup as your unit tests, and instead they rely on database truncation and building new records for every single test.
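As a rough illustration of what a well-defined boundary could look like, the sketch below maps spec file paths to pyramid layers by directory. The directory names and the mapping itself are assumptions for the example, not a convention that RSpec or Rails enforces.

```ruby
# Hypothetical directory-to-layer mapping; adjust to your project's layout.
TIERS = {
  unit:        %w[spec/models spec/lib spec/services],
  integration: %w[spec/controllers spec/requests],
  end_to_end:  %w[features]
}.freeze

# Returns the pyramid layer a spec file belongs to, based on its directory.
def tier_for(path)
  TIERS.each do |tier, dirs|
    return tier if dirs.any? { |dir| path.start_with?("#{dir}/") }
  end
  :unknown
end

# Groups a list of spec paths by layer, e.g. to see how blob-like a suite is.
def group_by_tier(paths)
  paths.group_by { |path| tier_for(path) }
end

paths = [
  "spec/models/user_spec.rb",
  "spec/requests/login_spec.rb",
  "features/checkout_spec.rb"
]
group_by_tier(paths).each { |tier, files| puts "#{tier}: #{files.size}" }
```

Once every test has an unambiguous layer, running only the fastest layer becomes a matter of pointing the test runner at the right directories.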

Even though we at Collective Idea don’t use it anymore, this is one aspect of testing that I believe Cucumber gets right. Cucumber tests live in their own folder (features/) with their own setup and support framework. Additionally, they are run separately from any other test suite. We can do the same with RSpec, but it just takes a little extra work. If we set up a features directory to contain all end-to-end tests, running those tests requires an explicit include:

rspec -Ifeatures features/...

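Also, running this test suite along with any others via rake is done with its own task:

require "rspec/core/rake_task"

# Defines `rake features`, which runs only the specs under features/.
RSpec::Core::RakeTask.new(:features) do |task|
  features_dir = Rails.root.join("features")
  task.pattern = "#{features_dir}/**/*_spec.rb"
  task.rspec_opts = ["-I#{features_dir}"]
end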

And hooking this up to whatever task your CI tool-of-choice runs is also simple:

task ci: [:spec, :features, ...]
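Because rake runs prerequisites in the order they are listed, putting the fast suite first means a unit-level failure is reported before the slow end-to-end run even starts. Here is a small self-contained sketch of that ordering. The task bodies are hypothetical stand-ins that only record when they ran; in a real application they would be the actual suites.

```ruby
require "rake"

# Records the order in which the stand-in suites run.
RUN_ORDER = []

# Hypothetical stand-ins for the real suites.
Rake::Task.define_task(:spec)     { RUN_ORDER << :spec }
Rake::Task.define_task(:features) { RUN_ORDER << :features }

# Prerequisites run in the listed order: fast suite first, slow suite second.
Rake::Task.define_task(ci: [:spec, :features])

Rake::Task[:ci].invoke
puts RUN_ORDER.inspect
```

If the first task raises, rake stops there and the later tasks never run.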

With this type of setup, the normal rspec suite will stay super fast and the end-to-end tests are runnable on their own without interfering with any other suite. Don’t be afraid to set up multiple test suites; you will be grateful for the explicit separation.

To recap, a test suite that inspires confidence and is a joy to work with is one that is reliable, comprehensive, and fast. To get there, write a lot of fast unit tests, fewer integration tests, and even fewer end-to-end tests. I’d like to close with one more recommendation:

Fix The Pain!

Above all else, if something you’re building is painful to work with, whether it be in the application or the tests, don’t ignore that pain, fix it! TDD strongly pushes the notion that if a given piece of functionality is difficult to test, then the application’s design has a problem. This is doubly true for the test suite itself. If the test suite is getting painful, stop, identify the pain, and fix it. Nothing else will improve a code base faster.

Photo of Jason Roelofs

Jason is a senior developer who has worked in the front-end, back-end, and everything in between. He has a deep understanding of all things code and can craft solutions for any problem. Jason leads development of our hosted CMS, Harmony.