Testing Effectively
The What, Why, When, and How of Writing Tests
This is a simplified transcription of a presentation I gave on writing software tests.
Introduction ¶
Hello, my name is Sam Bossley and I'm a senior engineer on the frontend web team. I'm here today to talk about tests in software - what they are, why we write them, when we write them, and how to write them. In our field it's taken as a given that everyone should write tests, but the reason is not always obvious. First, let's talk about what tests are.
What Are Tests? ¶
So what are tests, really? Tests are checks that ensure your software is working as expected. There are many different types, categories, and tooling around testing but the primary goal of a test is to ensure correctness in your code. The keyword here is "correctness". We want to know that when our tests pass, our code is correct.
Tests aren't always written as code but can also come in the form of automated or manual checks. I'll talk more on this in a second.
From a frontend perspective, we use four different types of tests:
- Static tests are tests that do not execute any of your code and instead statically analyze it. Some example tools are SonarQube, TypeScript, and ESLint.
- Unit tests are tests that check the functionality of a single method, class, or function. An example tool for unit testing is Jest.
- Integration tests are tests that check the functionality of multiple parts of software in tandem; in that regard, they sit one step above unit tests. An example tool for integration testing is Jest (see the sketch after this list).
- E2E (End-to-End) tests are tests that check the functionality of the end-user product in a production (or close to production) environment. Usually these tools can simulate user actions such as clicking, navigating, and typing. The technology in this category is fascinating. Some example E2E tools are Cypress, Detox, Playwright, Selenium, and even manual QA testing.
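To make the difference between unit and integration tests concrete, here's a minimal sketch in Jest and React Testing Library. The formatPrice helper and PriceTag component are hypothetical, invented purely for illustration:

import React from "react";
import { render, screen } from "@testing-library/react";

// Hypothetical helper and component, invented for this example.
function formatPrice(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

function PriceTag({ cents }) {
  return <span>{formatPrice(cents)}</span>;
}

// Unit test: exercises a single pure function in isolation.
test("formatPrice formats cents as dollars", () => {
  expect(formatPrice(1999)).toBe("$19.99");
});

// Integration test: exercises the component and the helper together.
test("PriceTag renders the formatted price", () => {
  render(<PriceTag cents={1999} />);
  expect(screen.getByText("$19.99")).toBeInTheDocument();
});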
It's important to understand that some of these definitions might differ from the "standard definition" or how your individual team might define them. This is simply how I was taught and learned the different types of tests, and I think it's a great mental model for them.
Test Type Tradeoffs ¶
Why do we write different kinds of tests? Based on my previous definitions, you might think that E2E tests are the only kind of tests necessary because they are the closest approximation of real user interactions.
Unfortunately, there's a tradeoff for each type of test you write.
I've shamelessly stolen this graphic from one of Kent C. Dodds' articles on testing, but it summarizes the tradeoffs well. It's a bit of a messy graphic, but in the middle we have the four types of tests, and on the left and right we have arrows. These arrows symbolize the monetary cost and the time cost of each test type respectively. As you transition from static to E2E tests, your tests get stronger and more representative of your software, but runtime and costs increase. Running E2E tests means paying for CI credits to run them all, and E2E test software is still painfully slow as of 2024. If we convert every single one of our [REDACTED]
tests to E2E tests, we might have better testing overall, but at the cost of paying significantly more to CircleCI, and waiting 5 hours for CI to succeed per merge request. It's not always worth the marginal benefit.
Test Coverage ¶
When we talk about tests we also need to understand the concept of code coverage. Code coverage is the approximate percentage of your code (lines, branches, and cases) that is exercised by tests.
We ideally want to strive for 100% code coverage in all projects, but unfortunately that's not always practical. With any software development there's always a time tradeoff: If getting from 95% to 100% test coverage requires 3 straight years of test writing, it's very likely not worth the time investment.
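If you use Jest, coverage can be collected with the --coverage flag, and a pragmatic target can be encoded in the config instead of chasing 100%. The thresholds below are purely illustrative, not a recommendation:

// jest.config.js - illustrative numbers only
module.exports = {
  collectCoverage: true,
  collectCoverageFrom: ["src/**/*.{js,jsx,ts,tsx}"],
  coverageThreshold: {
    global: {
      statements: 80,
      branches: 75,
      functions: 80,
      lines: 80,
    },
  },
};

With coverageThreshold in place, the test run fails if coverage drops below the configured numbers, which keeps coverage from silently eroding.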
Here is our test coverage for the projects you might be working on. On [REDACTED]
we maintain around 80-90% code coverage and on [REDACTED]
we maintain around 75-80% code coverage. As I mentioned earlier, lower code coverage percentages aren't always bad.
Why Do We Write Tests? ¶
But why should we write tests? Do we write tests for safety, or best practice? And is it acceptable to not write tests, or should we always have 100% test coverage?
If there's anything you should take away from this presentation, it's this:
Tests give you confidence that your software is working as expected.
What this means is that you should write the number of tests necessary to give you confidence that your specific software works. Confidence is the key here.
What does it mean to be *confident* in your software? Depending on your software, your threshold of confidence might be different from someone else's. For example, a demo React application requires a vastly lower level of confidence than the firmware for an airplane autopiloting system. If you have a bug in your React app, it might be a button that is styled green instead of blue. On the other hand, if you have a bug in your autopilot firmware, it might result in a horrific plane crash. Imagine if your pilot said on the intercom, "we only have 50% code coverage in our firmware"!
To summarize, each piece of software you write will have a different testing strategy, a different level of confidence, and a different number of tests. There's no "one size fits all". In general, tests are beneficial because they bestow confidence and help you improve your software down the road.
When Do We Write Tests? ¶
In what scenarios should we write tests? And how many tests should we write? Here is a general overview of when I would write tests:
- If you're working on a new feature, write at least one test for each business requirement or acceptance criterion. This will ensure that your business logic never breaks unexpectedly without a test failure. It also documents your software: a new developer can read the tests to understand the goals and purposes of the software without having to read the code.
- If you're fixing a bug, write a failing test that reproduces it before fixing the bug (see the sketch after this list). You can be fairly confident that you've fixed the issue when the test begins to pass.
- If you're working on a chore or refactor, write tests if they don't exist, but don't be alarmed if you end up not writing any new ones. Because refactors should not change any logic, you should expect your existing tests to continue passing after the refactor is complete.
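As a sketch of that bug-fix workflow, imagine a hypothetical truncate helper where strings exactly at the length limit were getting an unnecessary ellipsis. The test below was written first, failed against the buggy implementation, and now guards against the regression:

// Hypothetical helper that had a bug: the comparison used < instead of <=,
// so strings exactly at the limit were truncated unnecessarily.
function truncate(text, maxLength) {
  if (text.length <= maxLength) return text; // the one-character fix
  return text.slice(0, maxLength) + "…";
}

// Written before the fix; it failed against the old implementation,
// and passing it is the signal that the bug is actually gone.
test("leaves a string exactly at the limit untouched", () => {
  expect(truncate("hello", 5)).toBe("hello");
});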
There is no such thing as too many tests. Just like there is no such thing as too few bugs. When in doubt, write a test!
How Do We Write Tests? ¶
Structure ¶
Now let's talk about testing structure and how we can write really robust tests. I'm using Jest for code demonstrations. In unit, integration, and E2E tests, the basic structure of a test should follow this template:
- Given - What is the initial state of the software? What are the existing conditions?
- When - What action is being taken? What change has taken place?
- Then - What happens as a result of the action? What is the final state of the software?
The template reads in English like this: "Given X, when Y, then Z." In fact, all business requirements should be documented this way. As an example, "Given I am logged into the system, when I press the log out button, then I am logged out of the system".
Structure in Code ¶
Here is how this test structure might look in code:
test("opens the dialog", async () => {
// given
jest.mock(sendAnalytics).mockReturnValue(0);
const { user } = render(<Page />);
expect(screen.queryByRole("dialog")).not.toBeInTheDocument();
// when
await user.click(screen.getByRole("button"));
// then
expect(screen.getByRole("dialog", { name: "Confirmation" })).toBeInTheDocument();
expect(screen.getByText("Confirm your choice."));
});
First, we set up the given state of the component. We render the component and verify that a dialog is not present in the document. Then, we press the button to update the state. Finally, we verify that the state has changed.
If you adhere to this structure, it becomes very easy to write tests and even easier to write tests according to some business requirement.
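As a sketch, the log-out requirement from earlier might map onto a test like this, following the same conventions as the example above (the AccountMenu component, useAuth hook, and log-out copy are all hypothetical):

// "Given I am logged into the system, when I press the log out button,
// then I am logged out of the system."
test("logs the user out when the log out button is pressed", async () => {
  // given: an authenticated session (hypothetical useAuth hook)
  jest.mocked(useAuth).mockReturnValue({ isLoggedIn: true });
  const { user } = render(<AccountMenu />);

  // when: the user presses the log out button
  await user.click(screen.getByRole("button", { name: "Log out" }));

  // then: the logged-out state is shown
  expect(screen.getByText("You have been logged out.")).toBeInTheDocument();
});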
Tips and Tricks ¶
Here are some super effective tips and tricks I've learned from experience writing tests. They will make your tests less flaky and catch more bugs. Some of these tips might seem obvious to follow, but I've seen plenty of tests in our repositories that don't follow them.
I also want to clarify that these tips are mostly specific to the frontend domain.
Set up mocks before rendering the document or DOM. ¶
When the component below is rendered, what value does useCart return?
jest.mocked(useCart).mockReturnValue(result1)
render(<Cart />)
jest.mocked(useCart).mockReturnValue(result2)
// should we expect to see result1 or result2?
This is a trick question! We don't know and can't definitively say for sure. If we move mocks to the top of the test, it makes thinking about the initial state of the test much simpler.
jest.mocked(useCart).mockReturnValue(result1)
jest.mocked(useCart).mockReturnValue(result2)
render(<Cart />)
Now we know with absolute certainty that when <Cart /> renders, it will use result2.
Mock as little as possible. ¶
The less you mock, the stronger your tests are going to be. Consider the following example:
import { add } from "utils/add";

jest.mock("utils/add");

function addPlusOne(x, y) {
  return add(x, y) + 1;
}

test("adds two numbers", () => {
  jest.mocked(add).mockReturnValue(4);
  expect(addPlusOne(2, 2)).toBe(5);
});
Because we're mocking out the majority of the functionality, there's not really much functionality we're testing. We're just asserting that our mocks are mocked properly. When it comes to tests, you should almost never mock utilities and pure functions. Even raw API calls can be mocked with something like axios-mock-adapter, so there's no need to mock React Query mutations.
It would be better to write the test like this:
import { add } from "utils/add";

function addPlusOne(x, y) {
  return add(x, y) + 1;
}

test("adds two numbers", () => {
  expect(addPlusOne(2, 2)).toBe(5);
});
In general, you should only mock libraries, language/browser APIs, and production data - in other words, libraries and functions outside your control.
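For example, mocking at the network boundary with axios-mock-adapter keeps everything above that layer - your hooks, React Query, and utilities - running for real. The /cart endpoint and fetchCart function here are hypothetical:

import axios from "axios";
import MockAdapter from "axios-mock-adapter";

// Hypothetical function under test: a thin wrapper around a real API call.
async function fetchCart() {
  const response = await axios.get("/cart");
  return response.data.items;
}

test("returns the items from the cart endpoint", async () => {
  // Mock only the network boundary; fetchCart itself runs for real.
  const mock = new MockAdapter(axios);
  mock.onGet("/cart").reply(200, { items: [{ id: 1, name: "Socks" }] });

  await expect(fetchCart()).resolves.toEqual([{ id: 1, name: "Socks" }]);

  mock.restore();
});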
Use specific selectors for positive assertions and broad selectors for negative assertions. ¶
When I say "selector", I'm referring to the expression we use to query an element in the document. When I refer to a "positive" or "negative" assertion, I'm talking about assertions that assert the existence or absence of an element respectively.
Consider the following assertion:
expect(screen.getByText(/click here/i)).toBeInTheDocument();
This selector will match the text content "Click here". There's nothing inherently wrong with this assertion. However, because the selector uses regex and case insensitivity to broadly find text, it will also match the following texts:
- "click here" (uncapitalized)
- "CLIck HeRe" (case is completely wrong)
- "double click heresy" (substring of another string we might not intend to match)
- "PLEASE DO NOT click here" (substring of another string we might not intend to match)
The best practice for asserting existence of an element is to use a very specific selector to find the exact element.
expect(screen.getByRole("button", { name: "Click here" })).toBeInTheDocument();
Now we're using an exact string literal to ensure that our text exactly matches a DOM element's text. We've also added a role to double-check that we're actually matching a <button />.
You can even go further: testing-library lets you work with selectors within selectors using its within function:
expect(
  within(screen.getByRole("dialog"))
    .getByRole("button", { name: "Click here" })
).toBeInTheDocument();
In contrast, the opposite is true for asserting the absence of an element. If we write the assertion below and make an accidental typo:
expect(screen.queryByText("click here")).not.toBeInTheDocument(); // "click" should have been capitalized
This assertion will always pass, regardless of whether the element exists or not. The best practice here is to use a very broad selector to prevent typos and refactors from creating false positives in your tests.
expect(screen.queryByText(/click/i)).not.toBeInTheDocument();
If we use regex and case insensitivity, we prevent human errors (typos) from creating false positives in our tests.
Better yet, if you can broaden your scope even further, write an assertion only based on role:
expect(screen.queryByRole("button")).not.toBeInTheDocument();
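Putting both rules together, a dialog-dismissal test might use a specific selector to assert the dialog appeared and a broad one to assert it's gone. This sketch reuses the hypothetical Page component and custom render from earlier, and the button names are made up:

test("closes the confirmation dialog", async () => {
  const { user } = render(<Page />);

  // positive assertion: specific selector (role plus exact accessible name)
  await user.click(screen.getByRole("button", { name: "Delete account" }));
  expect(screen.getByRole("dialog", { name: "Confirmation" })).toBeInTheDocument();

  // negative assertion: broad selector, so a typo can't make it pass vacuously
  await user.click(screen.getByRole("button", { name: "Cancel" }));
  expect(screen.queryByRole("dialog")).not.toBeInTheDocument();
});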
Conclusion ¶
Writing tests for software gives you confidence that your code works as expected and can catch critical bugs before they happen. The more you write tests, the better you will become at writing great tests. Remember, you're writing tests to document your business logic and prevent yourself and other developers from making obvious mistakes years from now. Thanks for listening!