Bound APIs

Enforcing that API contracts have maximum sizes

Much of good API design comes down to being careful about the limits of computers.

It's common to see APIs like:

all_records = ORM::MyTable.load_all

Such APIs can make sense for prototyping or non-critical systems, but we know that there is an implicit limit to how much memory can be allocated to loading all the records. If this piece of code is put into production, it will eventually fail once the table outgrows that memory.

These APIs are unbounded. The tipping point where they cease to function as expected is not specified. And even if some limit does exist, nothing in the interface makes a consumer of the API aware of it.

Making bounded APIs isn't difficult. In this case the interface can require a limit to be specified:

some_records = ORM::MyTable.load_all(limit: 1000)

This immediately shines a light on the limitation: load_all alone will be insufficient for everything we need. We'll need pagination and other tools to work with the data in a reasonable way.
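
As a rough sketch of what a required limit plus pagination could look like, here is a small in-memory stand-in. The class name BoundedTable, the load_page method, the offset-based paging, and the MAX_LIMIT ceiling are all hypothetical and not part of any real ORM.

```ruby
# Hypothetical in-memory stand-in for an ORM table; not a real ORM API.
class BoundedTable
  MAX_LIMIT = 1000 # assumed hard ceiling on a single page

  def initialize(rows)
    @rows = rows
  end

  # `limit:` is a required keyword, so a caller cannot forget the bound.
  # Returns one page plus the offset of the next page (nil when done).
  def load_page(limit:, offset: 0)
    unless (1..MAX_LIMIT).cover?(limit)
      raise ArgumentError, "limit must be between 1 and #{MAX_LIMIT}"
    end

    page = @rows[offset, limit] || []
    next_offset = offset + page.size
    [page, next_offset < @rows.size ? next_offset : nil]
  end
end

# Walk 2,500 rows in pages of at most 1,000.
table = BoundedTable.new((1..2500).to_a)
offset = 0
page_sizes = []
while offset
  page, offset = table.load_page(limit: 1000, offset: offset)
  page_sizes << page.size
end
```

Because the keyword has no default, calling load_page without a limit raises an ArgumentError, which puts the bound in the interface rather than in documentation.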

Becoming bound

When we start to focus on predictable performance, having bound inputs and outputs becomes very important.

In my own projects where I think bound APIs are worthwhile, I've been experimenting with ways to enforce them.

Take a public-facing API, for example. When I'm ready to enforce that an API be bound, I flip on an annotation:

 {
   operation: 'my_api_operation',
+  sizeBound: true
 }

This annotation enforces that every input and output in my_api_operation's API specification must be bound.

In practical terms:

  • All strings must have a max length.
  • All lists must have a max length.
  • For hashes with arbitrary keys and values, all keys and values must have a max length.

If I am sloppy or miss a field, a test fails.
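
To make the idea concrete, here is a minimal sketch of the kind of check such a test might run. The schema format (hashes with :type, :max_length, :items, :keys, :values, :fields) and the size_bound key are assumptions made up for this illustration, not the format of any particular spec tool.

```ruby
# Hypothetical schema format: each field is a Hash with a :type and,
# for bound fields, a :max_length. Returns the paths of unbound fields.
def unbound_fields(schema, path)
  case schema[:type]
  when :string
    schema[:max_length] ? [] : [path]
  when :list
    own = schema[:max_length] ? [] : [path]
    own + unbound_fields(schema[:items], "#{path}[]")
  when :map
    # Hash with arbitrary keys and values: both sides must be bound.
    # Only keys and values are checked here, matching the rules above.
    unbound_fields(schema[:keys], "#{path}{key}") +
      unbound_fields(schema[:values], "#{path}{value}")
  when :struct
    # Fixed set of named fields: recurse into each one.
    schema[:fields].flat_map do |name, field|
      unbound_fields(field, "#{path}.#{name}")
    end
  else
    [] # fixed-size types such as integers and booleans
  end
end

operation = {
  operation: "my_api_operation",
  size_bound: true,
  input: {
    type: :struct,
    fields: {
      name: { type: :string, max_length: 64 },
      # The list is bound, but its items are not.
      tags: { type: :list, max_length: 10, items: { type: :string } },
      labels: {
        type: :map,
        keys: { type: :string, max_length: 32 },
        values: { type: :string, max_length: 128 }
      }
    }
  }
}

# Only operations that opted in are checked.
violations = operation[:size_bound] ? unbound_fields(operation[:input], "input") : []
```

A test would then assert that violations is empty. Here it contains the path of the one string that was missed, which is what makes the failure easy to act on.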

Having an opt-in annotation is better than simply requiring all these conditions upfront because, in my opinion, that locks things down prematurely. It's better to have control over when you decide an operation or a whole API should become bound.

Some applications

Okay, so we have this bound API. It's good for performance, security, setting user expectations, and compute and storage costs. What else can we gain from it?

Max request sizes

Most APIs sit behind some reverse proxy or "API gateway" that accepts incoming requests and fans them out. In DDoS attacks against that API, there are a bunch of options in our tool belt to reduce load.

Often what we try to do in this API gateway is figure out which requests are garbage, e.g. would never make sense or be processed by API operations, and reject them. Rejecting requests saves wasted work.

An example of a request that probably doesn't make sense is one sending a 10GB request body. It's very common that a gateway has a single max request size limit to reject anything above that size.

Because we know our API is bounded, we can go further and calculate for each operation its exact total max size. If a request were to come in that was over that size, the API gateway could reject it.
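
The calculation might look something like the sketch below. It assumes a hypothetical schema format (hashes with :type, :max_length, :items, :fields), compact JSON on the wire, strings of at most 4 bytes per character with no JSON escaping, and integers of at most 20 characters. A real gateway would need to adjust these assumptions to its actual wire format.

```ruby
require "json"

BYTES_PER_CHAR = 4 # assumption: UTF-8 worst case, no JSON escaping
MAX_INT_BYTES = 20 # assumption: signed 64-bit integer written in decimal

# Worst-case size in bytes of a compact JSON encoding of `schema`.
def max_bytes(schema)
  case schema[:type]
  when :string
    2 + schema.fetch(:max_length) * BYTES_PER_CHAR # two quotes + content
  when :integer
    MAX_INT_BYTES
  when :list
    n = schema.fetch(:max_length)
    2 + n * max_bytes(schema[:items]) + [n - 1, 0].max # brackets + items + commas
  when :struct
    fields = schema[:fields]
    # Each field costs its quoted name, a colon, and its worst-case value.
    body = fields.sum { |name, field| name.to_s.bytesize + 3 + max_bytes(field) }
    2 + body + [fields.size - 1, 0].max # braces + fields + commas
  else
    raise ArgumentError, "unbound or unknown type: #{schema[:type].inspect}"
  end
end

input = {
  type: :struct,
  fields: {
    name: { type: :string, max_length: 64 },
    tags: { type: :list, max_length: 10, items: { type: :string, max_length: 32 } }
  }
}

# Per-operation limits a gateway could consult before doing any other work.
limits = { "my_api_operation" => max_bytes(input) }

def too_large?(limits, operation, body)
  body.bytesize > limits.fetch(operation)
end

# Largest payload the schema allows, built from 4-byte characters.
worst_case = JSON.generate(
  name: "\u{1F600}" * 64,
  tags: Array.new(10) { "\u{1F600}" * 32 }
)
```

Under these assumptions the largest valid payload lands exactly on the computed limit, so anything one byte larger cannot be a valid request for that operation.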

Generative testing

Property-based testing, or generative testing, is where we specify an expected behavior and let the computer generate the test cases to check it.

An unbound API can't have empirically provable correctness. Being unbound, there is either an infinite number of behaviors (and computers can't do infinite amounts of work) or a lack of specificity that allows bugs to surface.

A bound API can have empirically provable correctness. We know the allowable set of inputs and outputs. We could, in theory, generate them all and test each one.
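
For a very small input space this can be done literally. The sketch below enumerates every string over a two-letter alphabet up to a max length of 3 and checks a round-trip property on each. The run-length encoder is only a stand-in for "some function under test", and the alphabet and max length are arbitrary choices for illustration.

```ruby
# Every string over `alphabet` with length 0..max_length.
def all_strings(alphabet, max_length)
  (0..max_length).flat_map do |len|
    alphabet.repeated_permutation(len).map(&:join)
  end
end

# Stand-in function under test: run-length encoding and its inverse.
def rle_encode(string)
  string.chars.chunk_while { |a, b| a == b }.map { |run| [run.first, run.size] }
end

def rle_decode(pairs)
  pairs.map { |char, count| char * count }.join
end

# The bound (alphabet of 2, max length 3) makes the input space finite:
# 1 + 2 + 4 + 8 = 15 strings, including the empty string.
inputs = all_strings(%w[a b], 3)

# Property: decoding an encoded string gives back the original.
failures = inputs.reject { |s| rle_decode(rle_encode(s)) == s }
```

An empty failures list means the property holds for every allowed input, not just a sample. The number of inputs grows exponentially with the max length, so this only works directly for tiny bounds.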

In practice, generative testing, even with a bound set of inputs, pushes up against the limits of the computer. Much of what makes generative testing work is the set of algorithms for bisecting the problem space and shrinking a found test failure down to the minimal reproducible input.

Nonetheless, bound APIs have a narrower space of possible values and therefore generative testing can be more useful as a tool for verifying correctness.