Code quality: Locality

Locality is a measure of how code is organized and how it is interacted with. It's an important dimension of writing good code because, in my experience, it is one of the largest factors in development velocity.

In general, you want good locality between relevant pieces of code. It's like writing a good story: you explain things in a certain order, and related events are located in the same chapter. However, in a sufficiently large code base, not everything can be local, and you would not want it to be anyway: too many things in one place are hard to reason about. The story would be too long and you don't have time to read it all.

Placement

Locality at a more "physical" level is how things are grouped:

  • Similar methods are grouped together within a section of a file
  • Similar sections are grouped into a file
  • Similar files are grouped in a directory
  • Similar directories are grouped into a package, repo, and so on

The elephant in the room is “what does similar mean?” There are two schools of thought:

  • Group by type of thing
  • Group by feature or concern

Supposing we have a simple app with users logging in and checking their bank balance, if we were to group by type of thing we'd have:

my-app/
  models/
    user
    user_session
    bank_account
    balance_entries
  views/
    login
    logout
    balance_page
  servers/
    primary

Grouping this same structure by feature, we'd have:

my-app/
  authentication/
    user
    user_session
    login
    logout
  banking/
    bank_account
    balance_entries
    balance_page
  servers/
    primary

In this feature (or “concern”) case, you'll notice that servers remains the same, as presumably the server is hosting both the authentication and the banking features. It does not distinctly belong in one or the other. In each style there will be things that don't fit into neat boxes.

But let's compare these two approaches:

  • Grouping by type of thing, it's immediately obvious how many things of each type there are. For example, we can plainly see there are 4 models. However, with all models placed in the same folder, it is hard to draw conceptual boundaries between them: which models are used together and which ought to interact.

  • Grouping by feature, it's immediately clear that user_session is for authentication and balance_entries is for banking. However, we cannot be certain what is a model and what is a view.

When we consider change management, the "type of thing" approach is generally better suited to the "type of thing" team: everything they need to touch regarding the big picture of models is in one place. It's easier to make sweeping changes to all models and oversee their usages. The feature approach is better suited to feature developers, as the elements that compose into a useful feature are in one place and can be iterated on in lockstep. Ideally, if an entire feature is made obsolete, you can simply delete the feature's folder and move on.

There's an inherent struggle here over which locality scheme is optimal. In my experience, the style adopted in a given situation follows the organization of the people working on the code. An infrastructure team will naturally bias towards the "type of thing" they are responsible for: this code structure is neat and orderly for them. A product team will group by feature as that's how they iterate: building feature after feature, some succeed and some fail. They are primarily responsible for producing value, not for organizing things such that they can steward a single feature long term or maintain it indefinitely.

I recommend a balanced approach here where infrastructure teams and product teams are catered to. The reasoning:

  • Product teams are nomadic. Once a feature is built, they move on. This is optimal: innovation comes from chasing, taking risks, and failing. With this mobility and churn, knowledge transfer breaks down more readily: people come and go and don't understand the things that have been built. As such you want to allow product teams the liberty to isolate features and draw proper boundaries. Those isolated features that do succeed tend to be easier for infrastructure teams to adopt and champion.

  • Infrastructure teams grind. All those successful features become entrenched and hard to change. When they are strewn throughout a codebase with no way to grok the big picture or make cross-feature changes to common types of things, infrastructure teams are left in a tough spot. They may not be able to dig themselves out of a hole, and in the meantime product teams working in isolation exacerbate the problems in aggregate. To keep pace with product teams, infrastructure teams need to trail them, but not by too much.

  • Culture and leadership are major factors. Leaders who value features but don't see the value in infrastructure watch developer velocity grind to a halt. Leaders who insist the infrastructure always be tip-top never see features. Interestingly, in these environments, the best engineers are often running the unspoken counter-culture: incremental releases of features and continual, strategic investment in velocity. This is the balanced approach.

A balanced approach to the above, in which you can enable both styles to coexist, looks like:

my-app/
  authentication/
    models/
      user
      user_session
    views/
      login
      logout
  banking/
    models/
      bank_account
      balance_entries
    views/
      balance_page
  servers/
    primary

We can build tools to observe code structured in this format using both product and infrastructure lenses. We treat the authentication and banking directories as features (more commonly called modules or packages), modeling them as a "type of thing." We build tooling to support introspecting these features in aggregate, e.g. "Give me a list of all the features."

This serves as a starting point to drive a consistent structure within features. For example, enforce that all models must live in the ${feature}/models directory. Now we can get all of the models by querying for all models within each feature. In this case, product teams write against a prescribed pattern but are otherwise free to model and work out a problem how they want. And infrastructure teams can still understand the big picture and make sweeping changes across models as they need.
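
As a rough illustration of such tooling, here is a minimal sketch in Python. It assumes the balanced layout shown above lives on disk. The function names (list_features, list_by_type, find_misplaced) and the two constants are hypothetical and not part of any existing tool; treating servers as a non-feature directory is also an assumption made for this example.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: top-level directories named here are shared, not features.
NON_FEATURE_DIRS = {"servers"}
# Assumption: the only "types of thing" allowed directly under a feature.
ALLOWED_KINDS = {"models", "views"}


def list_features(root: Path) -> list[str]:
    """Answer "give me a list of all the features" for the app at root."""
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir()
        and p.name not in NON_FEATURE_DIRS
        and not p.name.startswith(".")
    )


def list_by_type(root: Path, kind: str) -> dict[str, list[str]]:
    """Map each feature to the entries in its <feature>/<kind> directory."""
    result: dict[str, list[str]] = {}
    for feature in list_features(root):
        kind_dir = root / feature / kind
        if kind_dir.is_dir():
            result[feature] = sorted(p.name for p in kind_dir.iterdir())
    return result


def find_misplaced(root: Path) -> list[str]:
    """List entries directly under a feature that are not an allowed kind directory."""
    misplaced = []
    for feature in list_features(root):
        for p in sorted((root / feature).iterdir()):
            if not (p.is_dir() and p.name in ALLOWED_KINDS):
                misplaced.append(f"{feature}/{p.name}")
    return misplaced
```

With something like this, list_by_type(Path("my-app"), "models") gives the infrastructure view (every model, across features), while each feature directory remains the product view. A check such as find_misplaced could run in continuous integration to keep the prescribed pattern from drifting.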

At a macro level, code locality is an approximation of the organization of the people who maintain the code. Expect the optimal locality to change over a feature's lifetime.

Immediacy

Code locality is an optimization problem of:

  1. Make code easier to find.
  2. Make code easier to read. (If you can't find it, you can't read it.)
  3. Make code easier to change. (If you can't read it, good luck changing it.)

Logical and consistent grouping, in addition to search tools, makes finding code much easier. This is the fundamental prerequisite to any development not starting from scratch. That grouping aids in reading: it's the difference between reading a chaotic novel and a beloved short story. And finally, accomplishing a change in a single grouping is much easier than making a single change in dozens of disparate places.

But great code locality is not merely about the right place. It is also about the right time. Discovery, readability, and iteration can be aided by a spectrum of tools, and engineers ought to be diligent and mindful about taking advantage of them.

Here's a list of assistive tools sorted from best to worst:

  • Auto-complete
  • Type-checking
  • Tests
  • Code comments
  • Documentation
  • Human assistance

The "right time" with regard to code locality is almost always as soon as possible. Auto-complete helps you type, type-checking helps you as you type (in most integrated development environments, or IDEs), and tests help you as you save changes locally, provided they are fast enough (so make them fast). Code comments, documentation, and other humans provide context that is not encoded in the code, though that context may be outdated; each is progressively slower to find and use.
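
To make the difference between rungs concrete, here is a minimal sketch in Python that reuses the banking example. All names (UserId, AccountId, BalanceEntry, balance_cents) are hypothetical and chosen only for illustration. The idea is that encoding the domain in types moves feedback earlier: mixing up a user ID and an account ID can be reported by a static type checker while you type, instead of by a test or a colleague later.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import NewType

# Hypothetical domain types: distinct to a type checker, plain ints at runtime.
UserId = NewType("UserId", int)
AccountId = NewType("AccountId", int)


@dataclass(frozen=True)
class BalanceEntry:
    """One credit (positive) or debit (negative) on an account, in cents."""

    account_id: AccountId
    amount_cents: int


def balance_cents(entries: list[BalanceEntry], account_id: AccountId) -> int:
    """Sum the entries that belong to the given account."""
    return sum(e.amount_cents for e in entries if e.account_id == account_id)


entries = [
    BalanceEntry(AccountId(1), 500),
    BalanceEntry(AccountId(1), -200),
    BalanceEntry(AccountId(2), 1000),
]

# Correct usage: the argument is an AccountId.
total = balance_cents(entries, AccountId(1))

# Misuse: passing a UserId where an AccountId is expected. This still runs,
# because both are ints at runtime, but a static type checker such as mypy
# would report it as an incompatible argument type before any test runs.
# balance_cents(entries, UserId(1))
```

The same mistake could also be caught by a test, a code comment, or a reviewer, but each of those sits further down the list and gives feedback later.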

One instance in which timeliness of code locality should not be ASAP is when it results in an intractable proliferation of options. So-called "over-choice" is a burden and a distraction, as it presents the developer with new problems to solve, not solutions to their problems.

Another instance in which ASAP timeliness is not optimal is when it creates poor code quality. A common example of this is the abuse of code-generation tools. Often engineers struggle to properly encode a comprehensive and minimal interface and resort to generating massive amounts of boilerplate code. This is a form of auto-complete which can be actively harmful, littering a code base with inconsistency and noise that can be difficult to claw back. Helping fellow engineers write bad code faster is not worth doing. More timely and easy to use tools must create positive feedback loops.

Leveraging code locality on this dimension is a potential boon for every engineer:

  • For product engineers, more precise modeling and shorter feedback loops lead to faster development and fewer bugs.

  • For infrastructure engineers, leveraging these tools makes the features they support more self-service, freeing them up to support product teams in more domains.

However, the rub is that leveraging these assistive tools generally takes more of an engineer's immediate time. You would not want to build a collection of auto-complete snippets for something that turns out to never be used. You must be judicious in deciding at which rung you ought to help yourself and fellow developers, compared to delivering the features themselves.

This is yet again a place where I recommend the balanced approach: incremental releases of features and continual, strategic investment in velocity.

Infrastructure teams do well to allow product teams to try a bunch of things, see what sticks, and then encode and elevate the victors. Product teams do well to explicitly model and encode the domain they are working in such that they succeed in building features. This is why we see strong engineering cultures enforce bare minimums, for example, that all code must have tests or that all code must have type annotations.


Locality is a balancing act in two aspects: placement and timeliness.

With regard to placement, the correct approach is a trade-off of who to cater to, i.e. who the audience is. Above we juxtaposed infrastructure and product teams, but we can have many more audiences than that: non-technical people (e.g. needing to introspect/audit the system) and even computers and the technical limitations they present us (e.g. constraints of hardware and distributed systems).

For timeliness, we weigh investing in velocity versus features. We consider the tools at our disposal to help us write code faster and better.

In each dimension, moderation and incremental, continual investment are the best path forward between possible extremes.