Programming Tech Industry

The How and Why of End-to-End Testing

Perhaps the most significant and under-appreciated aspect of Rails and Agile software development of the last roughly 15 years is the culture and discipline around testing and test-driven development.

I’ve never come to understand why testing and TDD is often maligned by the loudest, most vocal developers: It’s too slow, it takes longer, the boss didn’t want or ask for it, they’ll say.

You don’t hear about these developers or this ideology often in professional circles, but you encounter them quickly in the wild west of freelance development.

[SEIZURE WARNING: There is an animated GIF towards the bottom of this page that flashes. Please disable animated GIFs if you are susceptible to flashing lights.]

Indeed, much of the popular and discussed rhetoric in the community of Rails is about a codebase’s test suite (a suite of tests for your whole application, this collective is called “the tests” or “the specs”): How much of your codebase is covered (as measured in %, which I discuss further below)? How easy are the tests to write? Do they use factories or fixtures? How brittle are they? Do they test the right things and are they valuable?

All of these are correct questions. Although there is no substitute for the day-in-day-out practice of this to become great at testing, I will try to offer some broad ‘best practice’ answers to these questions.

The enlightened developers don’t ask or care about whether or not the boss told us to write a tested codebase. We just know the answers to the above questions and do what’s right for the codebase: write specs.

Testing has varying degrees, varying methods, varying strengths.

In 99 Bottles of OOP, Metz, Owen, and Stankus make this interesting observation:

Belief in the value of TDD has become mainstream, and the pressure to follow this practice approaches an unspoken mandate. Acceptance of this mandate is illustrated by the fact that it’s common for folks who don’t test to tender sheepish apologies. Even those who don’t test seem to believe they ought to do so.

(Metz, et al, 99 Bottles of OOP, Second Edition, 2020. p 43)

So testing exists in a murky space: The top dev shops and teams know it is essential, but its implementation is inconsistent. Sadly, I’ve seen lots of development happen where people either just don’t write tests, write tests blindly, use tests as a cudgel, or skip end-to-end testing altogether.

Many years in this industry have led me to what seems like an extreme position. Not writing tests should be seen as akin to malpractice in software development. Hiring someone to write untested code should be outlawed.

Having a tested codebase is absolutely the most significant benchmark in producing quality software today. If you are producing serious application development but you don’t have tests, you have already lost.

Having a good test suite is not only the benchmark of quality, it means that you can refactor with confidence.

There are two kinds of tests you should learn and write:

  • Unit testing (also called model testing or black-box testing)
  • End-to-end testing (also called integration testing, feature testing, or system tests)

These go by different names. Focus on the how and why of testing and don’t get lost in the implementation details of the different kinds of tests. (To learn to do testing in Ruby, you can check out my course where I go over all the details.)

Unit Testing

Unit tests are the “lowest-level” tests. In unit testing, we are testing only one single unit of code: Typically for Rails, a model. When we talk about Unit testing in other languages, it means the same as it does for Rails, but might be applied in other contexts.

The thing you are testing is a black box. In your test, you will give your black box some inputs, tell it to do something, and assert that a specific output has been produced. The internals (implementation details) of the black box should not be known to your unit test.

This fundamental tenet of unit testing is probably one of the single most commonly repeated axioms of knowledge in software development today.

The way to miss the boat here (unfortunately) is to follow the axiom strictly but misunderstand why you are doing it.

Testing, and especially learning to practice test-driven development (that’s when you force yourself not to write any code unless you write a test first), is in fact a lot deeper and more significant than just about quality, refactoring, and black boxes. (Although if you’ve learned that much by now you’re on the right track.)

Most people think that software, especially web software, is written once and then done. This is a fallacy: Any serious piece of software today is iterated and iterated. Even if you are writing an application for rolling out all at once, on the web there should always be a feedback loop.

Perhaps one of the worst and most problematic anti-patterns I’ve ever seen is when contractors write code, it is deployed, and nobody ever looks at any error logs. Or any stack-traces. Or even at the database records. (Typically this happens less in the context of companies hiring employees because employees tend to keep working for your company on an ongoing basis whereas contractors tend to ‘deliver’ the product and then leave.)

It’s not just about “catching a bug’” here or there. Or tweaking or modifying the software once it’s live. (Which, to be fair, most developers don’t actually like to do.)

It’s about the fact that once it is live, anything and everything can and will happen. As a result, the data in your data stores might get into all kinds of states you weren’t expecting. Or maybe someone visits your website in a browser that doesn’t support the Javascript syntax you used. Or maybe this, or maybe that. It’s always something.

This is the marriage of testing & ‘real life’: You want your tests to be ‘as isolated’ as possible, yet at the same time ‘as realistic’ as they need to be in order to anticipate what your users will experience.

That’s the right balance. Your code doesn’t exist in a vacuum, and the test environment is only a figment of your imagination. The unit test is valuable to you because it is as realistic as it needs to be to mimic what will happen to your app in the real, wild world of production.

With unit testing, you aren’t actually putting the whole application through its places: You’re just testing one unit against a set of assertions.

In the wild (that is, real live websites), all kinds of chaos happens. Your assumptions that user_id would never be nil, for example, proves out not to be the case in one small step of the workflow because the user hasn’t been assigned yet. (Stop me if you’ve heard this one before.)

You never wrote a spec for the user_id being nil, because you assumed that that could never happen. Well, it did. Or rather, it might.

Many developers, especially the ones with something to prove, get too focused on unit testing. For one thing, they use the percentage of codebase covered as a badge of honor.

Percentage of Codebase Covered

When you run your tests, a special tool called a coverage reporter can scan the lines of code in your application to determine if that line of code was run through during your test. It shows you which lines got the test to run over them and which lines were ‘missed.’

It doesn’t tell you if your test was correct, that it asserted the correct thing of course. It just tells you where you’ve missed lines of code. The typical benchmark for a well-tested Rails application is about 85–95% test coverage. (Because of various nuanced factors, there are always some files that you can’t or don’t need to test— typically not your application files.)

Here I use a tool in Ruby called simplecov-rcov to show which lines (precisely, line-by-line, and file-by-file) are covered. Here in this baby little project of mine, I have an unfortunate 36.55% of my codebase covered:

example coverage report

As you see, the files are sorted with the least covered files shown up top. The top files are in red and say “0.00 %” covered because the test suite does not go into that file.

When I click into the file, I can actually see which lines are covered and uncovered, in red & green like so:

Code coverage report showing an untested line of Ruby

(Here’s a great example of that “it only happens in the wild” thing I was talking about earlier. In theory, I should never get passed a room id (params[:room]) that is not in my database [see line 4], but in practice, for some reason, while I was debugging I did. So I added a small little guard to catch for this while debugging, thus making the line of code inside the if statement uncovered by my test suite.)

Correlating the total percentage of test coverage to your code quality and/or value of the tests is often a fallacy: Look at the percentage of codebase covered, but not every day.

The problem with over-emphasis on unit testing is the dirty little secret of unit testing: Unit tests rarely catch bugs.

So why do we unit test at all then? Unit tests do catch all of your problems when you are upgrading.

You should unit test your code for the following four reasons:

(1) It helps you think about and structure your code more consistently.

(2) It will help you produce cleaner, more easily reasoned code as you refactor.

(3) Refactoring will, in turn, reveal more about the form (or shape) of your application that you couldn’t realize upfront.

(4) Your unit tests will catch bugs quickly when you upgrade Rails.

That’s it. Notice that not listed hereis ‘catching regressions’ (or bugs). That’s significant because many developers think unit testing cover all of their bases. Not only do they not cover all of your bases: They don’t even catch or prevent regressions (bugs) in live production apps very often.

Testing is important. Unit testing and end-to-end testing are both important, but between the two, end-to-end testing is the most important of all.

End-To-End Testing

End-to-end testing goes by many names: System specs, integration specs, Capybara, Cypress, Selenium.

End-to-end testing for Javascript applications means the following things:

  1. Your test starts in the database. I like factories, but fixtures are also popular.
  2. Your test ‘hits’ the server (Rails, Node, Java, etc)
  3. The server returns a front-end in Javascript
  4. Your test interacts in Javascript with your web page

If you do not have all four of those components, you do not have end-to-end testing. Using Capybara, you are really doing all of these things.

If you’ve never seen a Capybara test run, here’s what it looks like:

Moving Visualization of Selenium testing
A moving visualization showing a Selenium suite running in a Rails application.

I like to show this to people because I don’t think many people see it. Often the specs are run in headless mode, which means those things are happening just not on the screen. (But you’re still really doing invisibly them which is the important part.) While headless mode is much faster (and typically preferred by developers), using Selenium to control a real browser is an extraordinarily powerful tool— not for the development, but for evangelizing these techniques and spreading the good word of end-to-end-testing.

Most non-developers simply don’t even know what this is. I’ve talked to countless CEOs, product people, people who’ve worked in tech for years and have never even seen an end-to-end test be run. (They’ve literally never witnessed with their own eyes what I’ve shown you in the animated GIF above.)

What these people don’t even understand is that TDD, and end-to-end testing, is a practice of writing a web application development that is itself an advancement. The advancement facilitates a more rapid development process, less code debt, and a lower cost of change.

Without having actually witnessed the test runner run against the browser, it is shocking to me how many people in positions of authority are happy to hire teams of QA people to do manual testing for every new feature or release. (Disparigingly called “monkey testing” by the code-testing community.) With the easy and “inexpensive” availability of remote QA people, an industry of people are happy to keep monkey testing until judgment day. What they don’t know is that those of us who are code-testing are already in the promised land of sweet milk and honey.

My biggest disappointment personally moving from Rails to the Javascript world (Vue, Ember, Angular, React) is the lack of end-to-end-testing in Javascript. It’s not that JSers don’t ever do end-to-end testing— it’s that it might not be possible in your setup or your team.

If you are only working on the frontend, by definition you don’t have access to the database or the backend.

The fundamental issue with the shift away from Rails monoliths and towards microservices is: How are these apps tested?

I don’t know about you, but after years of being a user of microservices, I’m not entirely sold.

Don’t get me wrong: I am not categorically opposed to microservices. (Your database, and Redis, both probably already in your app, could be thought of as microservices and they work very well for us Rails developers.)

But designing applications around microservices is a paradigm ideal for huge conglomerate platforms that simultaneously want to track youshow you ads, and curate massive amounts of content using algorithms.

Most apps aren’t Facebook. I hypothesize that the great apps of the 2020s and 2030s won’t be like Facebook either.

That’s why having the power to do database migrations without involving “a DBA” (or a separate database team), or having to get the change through a backend team— something which is normal for smaller startups and Rails — has been so powerful for the last 15 years.

The social media companies are well poised for leveraging microservices, but most small-medium (even large) Rails apps are not, and here’s why: Doing end-to-end testing with a suite of microservices is a huge headache.

It’s a lot of extra work, and because it’s so hard many developers just don’t do it. Instead, they fall back lazily to their unit testing and run their test coverage reports and say they have tested code. What? The API sent a field to the React Native app that it couldn’t understand so there’s a bug?

Oh well, that was the React Native developer’s problem. OR, that was the services layer problem.

It’s a slow, creeping NIMBY (not-in-my-backyard) or NIH (not-invented-here) kind of psychology that I see more and more as I learn about segregated, siloed teams where there’s a backend in Rails, a frontend in React or another JS framework, and a mobile app — all written by segregated, separated teams who need to have product managers coordinate changes between them.

Already we see lots of major companies with websites made up of thousands of microservices. I don’t think our web is better because of it: for me, most of my experience using these websites is spinning and waiting to load. Every interaction feels like a mindless, aimless journey waiting for the widget to load the next set of posts to give me that dopamine-kick. Everywhere I look things kind of work, mostly, but every now and then just sort of have little half-bugs or non-responses. It’s all over Facebook, and sadly it seems like more and more of the web I use this degradation in experience quality has gotten worse and worse over the last few years.

It’s a disaster. I blame microservices.

I hear about everybody rushing into mobile development or Node development or re-writing it all in React and I just wonder: Where are the lessons learned by the Rubyists of the last 15 years?

Does anyone care about end-to-end-testing anymore? I predict the shortsightedness will be short-lived, and that testing will see a resurgence of the importance of popularity in the 2020s.

I don’t know where the web or software will go next, but I do know that end-to-end testing, as pioneered by Selenium in the last 10 years, is one of the most significant stories to happen to software development. There will always be CEOs who say they don’t care about tests. Don’t listen to them (also, don’t work for and don’t fund them). Keep testing and carry on.

[Disclaimer: The conjecture made herein should be thought of in the context of web application development, specifically modern Javascript apps. I wouldn’t presume to make generalizations about other kinds of software development but I know that testing is a big deal in other software development too.]


Capybara: Taming the Hydrochoerus (with Poltergeist, database_cleaner and friends)

If you’re a Ruby or Rails developer looking for some advice on how to get better at integration testing: congratulations! You’ve reached the highest level of difficulty in all of the areas of the stack you must conquer to become a great Ruby developer.

Integration testing is hard, but it doesn’t have to be. This the subtle of this truth lies in the fact that you must be skilled in both the backend and front-end of your app: you must understand factories and your Ruby objects, and if you have a Javascript-heavy app, the deep fundamentals of Javascript as well.

First things first, you will want to learn how to debug in both the front and back-ends. For the sake of this post, I’m going to assume you have learned a backend debugging tool like byebug. If not, try this tutorial now.

Second, you need to know that Capybara is a syntax for writing Ruby – “a DSL” – for telling a browser what to do. It can work against several of different browsers – Firefox, Chrome, and ‘headless’ browsers you can’t see. If you use a browser you can see, you get the neat effect of being able to view your results as they run, which can be fun (and you should do it) but may not work on a Continuous Testing / Continuous Integration platform.

A headless browser (of which I will discuss two: webkit and poltergeist) is complex to debug, and requires a command of all the parts of the stack.

Occasionally, you may write some Javascript code that will work in one browser and not another (you should learn to avoid this) – that’s why you can run Capybara with a single syntax against many different browsers.

The bad news is, in short, despite it being 2016 and Rails having been around for nearly 12 years none of the drivers is perfect.

Sometime ago I wrote about a neat little trick to view console messages while debugging Capybara webkit.

Driver’s name Browser The Bad The Good
Selenium Firefox Firefox doesn’t let you paste into the console
Chrome Chrome The Chrome debugging experience has some annoying gotchas. Don’t try to open the debugger while your spec is running, unless you pause on the back-end (for example, byebug). If you do pause on the Rails-side, you should be able to also fall into the debugger on the Chrome driver side too. If you do actually manage to open Developer Tools, you can reasonably debug your Javascript
webkit headless There are problems with PATCH requests when using this legacy headless driver. Take note that this PATCH problem was fixed in PhantomJS version 2.0. Webkit also requires you install QT on your system. Webkit lets you inspect status codes using driver.status_code and as mentioned in the post above, console messages too.
poltergeist headless If you app makes PATCH requests, note that poltergeist needs you to be running on Phantom JS 2.0 or higher to be able to process PATCH requests corruptly (when they aren’t, they come through on the server side as empty requests)
By default, anything that is sent from your Javascript as a console message makes your spec run fail (this can be turned off).
“The Worst Except For All the Others” (as Churchill said). This is the one I use primarily. Your console.log output is automatically ported from your Javascript into your test results.

Here’s a list of other notes of things to keep in mind.

  1. You should be using Capybara version 2.7.1 or higher. Earlier versions do not wait for all sessions to close before kicking off Database cleaner’s truncation. When truncation happens before all sessions are closed, bad things happen (like intermittent failing tests). Waiting and timing is explained in detail below.
  2. This applies to you if you app makes PATCH requests: Make sure you are on Phantom JS 2.0 or higher. Note this is a binary to install and on CI server it probably is a global (shell) configuration. (On ours, Semaphore, you need to specify the Phantom JS in the global build commands, not just in your Gemfile.) You to be running on Phantom JS 2.0 or higher to be able to process PATCH requests corruptly. When they aren’t, they come through on the server side as empty requests, which can lead to unexpected results.
  3. Capybara-webkit sucks. It just does. Don’t use it. The intermittent issues alone are enough to throw it out. Use Poltergeist instead. It was an older technology and by and large it has been replaced by Poltergeist. Experienced developers know this and don’t use webkit for this reason. Junior developers fight in vein trying to get webkit to work and waste lots of time believing in something that simply is a shitty piece of technology.
  4. When working with ChromeDriver note that it is annoyingly difficult to open the Developer Tools while the test is running. This is a knonw-issue, and the Chrome developers advise you pause your test to open Chrome Dev toos. This is explained here.
  5. When using Database cleaner with Truncation, Make sure you have it in an append_after hook and not in config.after(:each) (several tutorials will mistakenly lead you down the wrong path here.) It should look like this:
    config.append_after(:each) do
  6. Prefer transaction instead of truncation for all non-Javascript tests (unit tests, controller tests, etc). For Javascript integration specs, you need truncation. An explanation about why can be found at
  7. Use Factories and don’t use fixture data. Fixture data can lead to brittle tests. Generally the entire Rails community has learned from the Dark Days and recommends factories over fixtures.
  8. Don’t use connection pooling. Some people on the internet will tell you to use connection pooling to solve thread-locking problems – don’t listen to them. Capybara already has dealt with this under the hood, make sure you are on a recent version of Capybara.
  9. Avoid using .trigger. Sometimes if an element isn’t visible Capybara will advise you when it fails you can ‘work around’ the element not being on the page by referencing the element and calling .trigger. You’re just trying to get around the on-screen-and-visible enforcement by Capybara, but this isn’t a good idea. If the thing isn’t on the screen and visible, it probably means there’s a bug and you want to catch that as a failure. Remember your tests are only as valuable as what they catch when things mess up.
  10. Circular Dependancy when trying to load ___
    This development issue that causes race condition (intermittent) failures has been explained on this Thoughtbot blog post.

    To fix if you’re on Rails 4.1 or prior, set allow_concurrency = false in test.rb (Rails 4.1 + earlier only)

    Set this in your config/environments/test.rb file set this:

    config.allow_concurrency = false

    You do not need this if you are on Rails 4.2 and above.


Timing is super hard to debug, but there’s an art to it. Tame your Capy specs like a lion tamer. Make them jump through hoops and bedazzel them to calm them down. You need to understand 3 things: Capybara’s native waiting, (2) a wait helper, and (3) an explicit sleep.

Capybara Native Waiting Behavior
If you’re on Capy 2.7 understand that Capybara natively waits for content on a page when you assert it to be there, even when Ajax and rendering might not have it ready to be there at the very moment the assertion runs. Thomas Wolpole, author of Capybara, advises me:

The way 2.7.1 is handling this is through middleware that keeps a counter of any current requests being processed by the app. First it tells the browser to visit about:blank and waits for that to happen, at which point the browser should not be initiating any more requests to the app. Then it waits for the active request counter to be 0, and then continues on.

Instead of using sleep, use expect(page).to have_content(…) to wait for the content you want to appear. Specifically I believe that using expect/have_content waits for the page to have the content you want it to have, but expect/value/to eq does not actually wait. For this reason, sprinkle in some expect(page).to have_content even when you don’t have to just to get Capybara to pause until the page is re-rendered.

You often find yourself writing

expect(page).to have_content(“xxx”)

over and over again. Contrary to the instinct to not Repeat Yourself, that’s a good thing! If this really irks you may write yourself helpers to make this repeated step more encapsulated. What you are really doing is putting the UX through it paces, so think of it like a player piano instructions not like the code you so work on to make beautiful.

It will be easier to do this if your app has natural-language responses like “You have logged in successfully.” For this reason you should encourage your Product Owners/Stakeholders to put in such natural language indicators – it makes your site easier, safer, and your regression suite more solid. And your users will appreciate instant feedback it too. If your product managers insist on ‘silent’ feedback, remember you can use Capybara to assert that things are or aren’t disabled, grayed-out, etc.

Basically, although you can sometimes get away with expectations that do direct Ruby object lookup, you really shouldn’t, or should use it as infrequently as possible

Wait Helpers (Ruby metaprograms Javascript)

Sometimes you’ll see a spec failure that will pass if you add sleep 1 or sleep 2. Avoid this, but use a very fast sleep (I use 0.1) when necessary. Instead of sleeping, turn off animations and write wait helpers for yourself to pause until certain conditions are met.

You should use wait helpers to wait for:

– Ajax requests that Capybara doesn’t seem to pick up natively (Later versions of Capy are supposed to count the number of outstanding Ajax requests but I’ve had difficulty getting this to work consistently. You can and should assert content is on page and prefer Capybara’s native waiting to a wait helper)

– Your app is doing something like initializing (you can even write your app to set itself a global flag when initialization has finished which can be checked from Capy helpers)

Here’s an example of a wait helper that waits for an Ajax request. Note here we are using page.evaluate_script to metaprogram Javascript by way of Ruby code, waiting until a condition is met before continuing the spec.

def wait_for_ajax
 counter = 0
 while page.evaluate_script(“$.active > 0”)
  counter += 1
  print “_”
  if counter >= 100
   msg = “AJAX request took longer than 10 seconds.”
   if page.driver.respond_to?(:console_messages)
    msg << ” console messages at time of failure: ” + page.driver.console_messages.inspect
   raise msg

Here’s an example of a wait helper that would wait for your app’s own initialization cycle, provided yourApp is the variable in Javascript where you app is namespaced, and when it is finished with its own initialization cycle it sets _initialized to true (on itself). You can write your own wait helpers appropriate to things you app does.

def wait_for_your_app
 counter = 0
 while page.evaluate_script(“typeof(yourApp) === ‘undefined’ || typeof(yourApp._initialized) === ‘undefined'”)
  counter += 1
  print “~”
  raise “Your app failed to initialize after 10 seconds” if counter >= 100

Explicit Sleeps

When all else fails sometimes you just need a sleep, which you just do in ruby as sleep X where [X] is the number of seconds you want to sleep.

You should use sleeps very rarely, but I’ve found they are needed in these cases:
– After an Ajax request, sometimes a sleep is needed to let the database catch up. (I try to keep these at about 0.5 seconds)
– A small timing delay (no longer than 0.1 seconds) for your app doing something like re-rendering

In theory you can wait for anything, so try to use Capybara’s internal waiting mechanisms first. In this order, your toolkit is:

1. Capybara’s internal waiting
2. A wait helper (as explained above)
3. An actual explicit sleep (try to keep all sleeps under 0.2 secs)

Remember each knife is shaper than the next, and so you should strive for minimal intrusiveness, but know that a combination of all three is likely necessary. The more astray you go from the art the more likely you experience timing delays.

Warning for anyone who has an expires_in set as a cache-control header in your controller endpoints (html or json).

Yes you! Go look in your code right now for expires_in set in your controllers and if you have any pay attention.

As I documented here, you’ve got to watch out if you have endpoints that have non-Zero cache-control headers on them. The headless driver (poltergeist or webkit) will hang onto the HTTP response between specs. This can be detrimental to you, if, say, the content of that endpoint’s response is what you are testing. In my case, I just used an inelegant hack to work around this- suggestions for improvements welcome.

if Rails.env.test?
 expires_in 0.minutes, :public => false
 expires_in 3.minutes, :public => true


Try to keep your feature specs to about 1-3 minutes to run per file, also maybe split them off when they are about 200-300 lines long. Be mindful of the total run time – since they are so valuable you can afford a little leeway here but keep in mind it slows down you time to develop new features.

Be careful about assertions that reach back into the database. Although you can get it to work, reloading objects and asserting things have changed is prone to race conditions, particularly with database cleaner. Remember that you have two threads operating separately, and even if you are able to do .reload on the object to get it into the right state, it’s actually nearly always better when writing Capybara specs to just assert the UX has changed the way you think it will.

And finally: Patience, discipline, know that others have been here before you and others will come here again. You are on the pinnacle of Rails development – don’t fall! Patience and faith.

Suggestion or feedback? Log in with your Stackoverflow, Github, Facebook or Google account to leave a comment.


A Better wait_for_ajax

def wait_for_ajax
 counter = 0
 while page.evaluate_script(“typeof($) === ‘undefined'”)
  counter += 1
  print “^”
  raise “Jquery not initialized after 10 seconds.” if counter >= 100

 counter = 0
 while page.evaluate_script(“$.active > 0”)
  counter += 1
  print “_”
  if counter >= 100
   msg = “AJAX request took longer than 10 seconds.”
   if page.driver.respond_to?(:console_messages)
    msg << ” console messages at time of failure: ” + page.driver.console_messages.inspect
   raise msg