
Mock soup

Why and how to move away from test mocking

I write a lot of code, but not so many tests. I haven't been sneaky about not writing tests or anything like that; I just haven't found most tests to be valuable.

Where I am still writing tests is around logic and API/user boundaries. I'm testing simple, pure functions. And it's usually the case that I can take a complex set of side-effecting operations, littered with the real logic, and split it into something testable, a "policy," and the procedural code that executes the side effects. Nothing ideal or purist here; these things often muddle together into large functions where it makes sense not to split them into chunks.
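A minimal sketch of what that split can look like, with made-up names (discountFor, applyDiscount, a db handle) standing in for whatever the real code does:

// The "policy": a pure function that holds the real logic. Trivial to unit test.
function discountFor(order) {
  if (order.total >= 100 && order.customer.isReturning) {
    return order.total * 0.1;
  }
  return 0;
}

// The procedural shell: executes the side effects, with as little logic of its own as possible.
async function applyDiscount(order, db) {
  const discount = discountFor(order);
  if (discount > 0) {
    await db.orders.update(order.id, { discount });
  }
}

The tests worth writing target discountFor; applyDiscount is boring plumbing.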

Don't mock me

What I'm not writing is mocks or code "expectations." In JavaScript/Jest parlance these would be jest.fn() spies paired with expect(mock).toHaveBeenCalledWith(...) style assertions. In Ruby, it would be expect(SomeClass).to receive(:method) or some such meta-language/DSL.
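For concreteness, this is roughly the shape I mean, using a hypothetical mailer module and signUp function (the names are invented for illustration):

// signup.test.js
jest.mock('./mailer');                      // replace the real module with auto-generated mocks
const mailer = require('./mailer');
const { signUp } = require('./signup');

test('sends a welcome email', async () => {
  await signUp('user@example.com');
  expect(mailer.send).toHaveBeenCalledWith('user@example.com', 'welcome');
});

The test passes whether or not an email could ever actually be sent; all it checks is that a particular method was called.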

This expectation-style mocking is useless and a net negative for these reasons:

Mocked tests test things that don't happen

Unless you're very careful or clever, mocking allows code not to run. Inject your spies, method replacements, and so on; it all boils down to making the code behave differently than it would in production.

Mocked tests are brittle

The implementation details that no end-user sees ought to be easy to revise and refactor. We don't care whether method x was called; we care that our contract with the edge (the API, the end-user) is upheld.

The carrying cost of mocked tests is huge. They don't reflect production, as stated above. So not only does the code need to be refactored, those mocked tests also need to be refactored. Tests littered with expectations rarely come with much explanation. A codebase of mocks becomes a mentally taxing game of trying to decipher what's real real, what's fake real, and what is now irrelevant, either because the code now does something different or because the test never really tested anything useful to begin with. In mock-heavy codebases I come into, the ratio of code-change time to test-fixing time is always at least 1:3.

Mocks are a false sense of security

Developers are lulled into a false sense of security by the presence of heavily mocked tests. A developer seeing a mountain of tests and having no time to delve into their intricacies may assume their change is safe when it passes existing tests. They may not know that everything is passing simply because no code is executed. Especially after fighting through a mountain of brittle tests, one might think, "Yes, finally. I know this change is safe."

This is especially the case for integrations, i.e. external API calls and events. There's too much dogma that every code change needs a test. When you're writing an API integration you have three choices: write tests with heavy mocking, toil away at dependency injection to achieve the same useless outcome, or skip automated tests entirely (the correct choice, but the one dogma doesn't allow).

Alternatives

"Okay, mocks suck duh," said anyone who's ever worked in one of these codebases. What should we do instead?

Abandon mocks

If you work with others, mocks are just too sexy to resist for someone who needs to ship hot garbage on a tight deadline. Mocks are the glorious answer to test failures: just turn off whatever fails, just enough to squeeze the code change into production.

This is not an article about banning the quick shipping of "bad" code. That's not always the wrong thing to do (see: any startup). Mocks are particularly bad because they won't ever get cleaned up and will be a long-term maintenance burden. There are better ways to ship bad code quickly.

It's prudent to give your developers no way to write mocks at all. This means dropping the hot-garbage test frameworks like JavaScript's Jest and choosing something more like ava, where the worst someone can do is use dependency injection patterns (strictly better).
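A rough sketch of that worst case, assuming a hypothetical createUser function that takes its mailer as a parameter; no mocking framework, just a hand-rolled fake passed in:

const test = require('ava');

// Hypothetical code under test: the mailer is a plain parameter, not a patched module.
async function createUser(email, mailer) {
  await mailer.send(email, 'welcome');
  return { email };
}

test('createUser sends a welcome email', async t => {
  const sent = [];
  const fakeMailer = { send: async (to, template) => { sent.push({ to, template }); } };

  const user = await createUser('user@example.com', fakeMailer);

  t.is(user.email, 'user@example.com');
  t.deepEqual(sent, [{ to: 'user@example.com', template: 'welcome' }]);
});

The fake is still a stand-in, but it's explicit, local to the test, and can't silently rewire modules the way a mocking framework can.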

With no mocks allowed in the tool belt, developers will be forced into finding or building the better options. What are those options?

Feature flags

Feature flags still fit that bill of letting anyone ship bad code to production quickly. Instead of littering mocks, someone can litter disable_flag(name) and enable_flag(name) calls across their tests to get them passing.

Feature flags are themselves mocks in automated tests, but they are strictly better.

Flags are much, much less powerful. A flag can only be on or off (generally; there are crazy feature flag systems that overpriced companies peddle). Storing a single bit of difference, flags represent the smallest amount of mocking possible. The flag divides the code into two states, and at least one branch must be real code that actually runs in production.
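As a sketch, with enable_flag/disable_flag as hypothetical helpers backed by an in-memory store (a real system would read flags from config or a flag service):

const flags = new Set();
const enable_flag = name => flags.add(name);
const disable_flag = name => flags.delete(name);
const flag_enabled = name => flags.has(name);

// Production code split into two states by the flag; at least one branch is real
// code that runs in production.
function shippingFee(cart) {
  if (flag_enabled('free_shipping_over_50')) {
    return cart.subtotal >= 50 ? 0 : 5;   // new behavior, behind the flag
  }
  return 5;                               // current production behavior
}

const test = require('ava');

test('free shipping flag waives the fee on big carts', t => {
  enable_flag('free_shipping_over_50');
  t.is(shippingFee({ subtotal: 80 }), 0);

  disable_flag('free_shipping_over_50');
  t.is(shippingFee({ subtotal: 80 }), 5);
});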

Flags are also much simpler. Did you know that no matter what mocking/expectation library or DSL you use, you can't possibly know all its features? With flags you can; in fact, I just presented them above and they were obvious: disable_flag(name) and enable_flag(name). Much better than that expect(x).toBeBlueOnATuesday() nonsense.

With some light nudging, you can build a very nice system of hygiene around flags, getting everyone to either drive them to completion or remove them if they never get turned on. We couldn't do the same for mocks: they're too complex, and those get-shit-done hacks riddling the codebase don't have one nice name, like a flag does, to identify them by. It's all just spaghetti, and the only cure is bankruptcy.

Feature flags are awesome because you probably should have been decoupling your deploys from your behavior changes anyway. Good stuff & thank me later!

Real test dependencies

I don't mock, but that doesn't mean I can't write more than just unit tests. I can use real dependencies just fine. For example, after an operation that is supposed to write a record to the database, my test can query the database for the presence of that record.
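A sketch of that shape using ava and node-postgres; the users table, the createUser operation, and the TEST_DATABASE_URL environment variable are assumptions for illustration:

const test = require('ava');
const { Pool } = require('pg');               // real Postgres client, pointed at a test database
const { createUser } = require('./users');    // hypothetical operation under test

const db = new Pool({ connectionString: process.env.TEST_DATABASE_URL });

test('createUser writes a user record', async t => {
  await createUser({ email: 'user@example.com' });

  // Assert against the real dependency: query the database for the record,
  // checking only the fields we actually care about.
  const { rows } = await db.query(
    'SELECT email FROM users WHERE email = $1',
    ['user@example.com']
  );
  t.is(rows[0].email, 'user@example.com');
});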

You can do this kind of testing poorly too, like asserting that the whole record matches a certain shape instead of making assertions only on what you need. And real dependencies, if you're not keeping an eye on test times and overuse, can become super slow. But as with feature flags, there are tools, ratchets, and hygiene you can build to mitigate this.

At the end of the day, real dependencies are strictly better: they're testing real behavior.

Better testing environments

I'm no QA or staging shill; that stuff is hugely wasteful and largely useless to maintain in a steady state. An absence of mocks can put good pressure on an organization to invest in quickly reproducible testing environments. Again, nothing crazy here: just some ready-made test accounts, API keys, and fake data for hitting real APIs to see what happens.

This is especially valuable for local development in an environment like a REPL. Being able to run and debug code locally is hugely underrated these days, and it really shouldn't be.