API localization

Table of contents

Error messages: user vs developer
Data is the bedrock
Developer localization

You may have been tasked with localizing an API. This is often the case with APIs that sit closer to end customers or users. Offloading the responsibility for localization to an API provider can, in some ways, be seen as optimal. Why have N integrations repeat the translation of the simpler touch-points of an API? Alas, the only absolute truth of software localization is that we always underestimate the complexity of doing it well.

That difficulty is a challenge to the pursuit of localizing an API, not a dismissal of it. Even if the implementation is not perfect, localization can add a lot of value.

As the "Internationalization guy" at Stripe, I've been asked on numerous occasions about whether and how Stripe should localize its API. These are my thoughts on the broader question of how to localize any API in a way that gets you the most bang for your localization buck.

Error messages: user vs developer

The first thing one might consider localizing is the API error messages. They are the most textual part of an API and fit into existing string translation pipelines with only a bit of work. One such pipeline looks like this:

1. Tag each message with an ID.
2. Upload source message files to translators (as data, a HashMap<MessageID, ErrorMessage> in which both types are strings).
3. Download completed translations to be used in the application.
4. For a locale, if the translated message is available for the ID, use it.
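
The lookup in the last step can be sketched roughly as follows. This is a minimal illustration, not any particular vendor's pipeline: the message IDs, catalog contents, and the `resolveMessage` helper are all hypothetical, and falling back to the source message when no translation exists is an assumption on my part.

```typescript
// Hypothetical types and names for illustration only.
type MessageId = string;
type Locale = string;
type Catalog = Record<MessageId, string>;

// Source messages keyed by ID. This is the data uploaded to translators.
const sourceMessages: Catalog = {
  "tax_id.required": "Tax ID is required",
  "amount.too_small": "Amount is below the minimum",
};

// Completed translations downloaded from translators, one catalog per locale.
// The French catalog is deliberately incomplete to show the fallback.
const translations: Record<Locale, Catalog> = {
  fr: { "tax_id.required": "Le numéro d'identification fiscale est requis" },
};

// Use the translated message if one exists for this ID and locale,
// otherwise fall back to the source message (an assumed policy).
function resolveMessage(id: MessageId, locale: Locale): string {
  const translated = translations[locale]?.[id];
  if (translated !== undefined) return translated;

  const source = sourceMessages[id];
  if (source === undefined) throw new Error(`Unknown message ID: ${id}`);
  return source;
}
```

Keying everything by message ID, rather than by the source text, means the source wording can change without orphaning existing translations.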

Something you'll want to hammer out quickly is the intended audience of the error messages. Are they for developers, to aid them in debugging integration issues, or are they intended to be surfaced directly to users, perhaps in a user interface?

I recommend separating the considerations for the two audiences. The API error messages should be exclusively developer-facing. If the API errors are intended to be resurfaced to users, I would suggest a more human-oriented API.

A more human API (1) returns all validation errors in a batch, as opposed to returning early on the first failed validation the way a computer would be okay with. It also (2) understands that the user interface is most likely some kind of form and makes errors easy to attribute back to specific form fields. Lastly, (3) humans provide data sparingly and out of order, so it has API resources which model partial submissions or drafted objects. These considerations make for a larger, different API.

That's a lot of work and not applicable to some integrators' use cases anyway. I think human-oriented APIs work best when offered as a layer atop the low-level API.
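
To make the three traits of a human-oriented API concrete, here is a small sketch. The `CustomerDraft` resource, its fields, and the `validateDraft` function are hypothetical names invented for this example; a real API would have its own resources and rules.

```typescript
// Hypothetical draft resource: every field is optional because humans
// fill forms in sparingly and out of order (trait 3).
interface CustomerDraft {
  name?: string;
  email?: string;
  taxId?: string;
}

interface FieldError {
  field: keyof CustomerDraft; // lets a UI attach the error to a form field (trait 2)
  kind: string;               // machine-readable error kind
  message: string;            // developer-facing message
}

// Collect every validation error instead of returning on the first one (trait 1).
function validateDraft(draft: CustomerDraft): FieldError[] {
  const errors: FieldError[] = [];

  if (!draft.name?.trim()) {
    errors.push({ field: "name", kind: "required", message: "Name is required" });
  }
  if (!draft.email) {
    errors.push({ field: "email", kind: "required", message: "Email is required" });
  } else if (!draft.email.includes("@")) {
    errors.push({ field: "email", kind: "invalid", message: "Email must contain @" });
  }
  if (!draft.taxId) {
    errors.push({ field: "taxId", kind: "required", message: "Tax ID is required" });
  }
  return errors;
}
```

A form can call this on every save of a partial draft and highlight each field named in the returned list, rather than making the user fix one problem per round trip.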

Data is the bedrock

A common scenario API integrators find themselves in: the error is generic and the error message is more specific. If the integrator cares to model that more specific information, they must resort to string parsing on the error message.

An example would be:

// US
{
  type: 'error',
  kind: 'tax_id_invalid',
  message: "Taxpayer identification number (TIN) is required"
}
// Canada
{
  type: 'error',
  kind: 'tax_id_invalid',
  message: "Individual Tax Number (ITN) is required"
}

We've embedded context that is only available in the error message, so to provide a decent user experience the integrator must either parse the string or present this error to the user directly.

This isn't a localization-only concern, but it is something that a lot of API providers inadvertently force on their users by providing great error messages without great error data.

Localization is no different from other cases. If you're going to offer more context in the error message, you'll also want to offer the same information as machine-readable data.

In the above example, we'd want to offer data to match:

// US
{
  type: 'error',
  kind: 'tax_id_invalid',
  message: "Taxpayer identification number (TIN) is required",
  data: {
    tax_id_type: 'us_tin'
  }
}
// Canada
{
  type: 'error',
  kind: 'tax_id_invalid',
  message: "Individual Tax Number (ITN) is required",
  data: {
    tax_id_type: 'ca_itn'
  }
}

We don't need to give the pretty, human-readable names as data fields. We only need to convey the information in a distinguishable way.

For localization, this error data becomes essential: it allows integrators both to localize their interfaces with the proper tone of voice for their businesses and target markets (some markets like a calm voice, some like a strictly professional voice) and to expand beyond the set of locales that your API supports.
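
As a rough sketch of what that looks like on the integrator's side, the code below picks the integrator's own copy from the error data instead of parsing the message. It reads the tax ID type from the error's data object (named `tax_id_type` in this sketch); the `taxIdCopy` table, the `userFacingMessage` helper, the wording, and the generic fallback are all assumptions made for illustration.

```typescript
interface ApiError {
  type: "error";
  kind: string;
  message: string;                 // developer-facing source message
  data?: { tax_id_type?: string }; // machine-readable context
}

// The integrator's own copy, keyed by locale and then by tax ID type.
// "pt-BR" stands in for a locale the API itself might not support.
const taxIdCopy: Record<string, Record<string, string>> = {
  "en-US": { us_tin: "Please enter your TIN.", ca_itn: "Please enter your ITN." },
  "pt-BR": { us_tin: "Informe o seu TIN.", ca_itn: "Informe o seu ITN." },
};

// Choose user-facing text from the error data, never from the message string.
function userFacingMessage(error: ApiError, locale: string): string {
  const idType = error.data?.tax_id_type;
  const copy = idType ? taxIdCopy[locale]?.[idType] : undefined;
  // Hypothetical generic fallback when the integrator has no specific copy.
  return copy ?? "Something went wrong. Please check your details.";
}
```

Because the lookup never touches `message`, the API provider can reword or retranslate its messages without breaking the integrator's interface.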

Developer localization

It is easy to get really excited about localizing APIs. After all, programming is so overwhelmingly English-centric it's hard not to have sympathy for everyone else.

However, we need to balance our pure vision with the realities of programming. The vast majority of API integrators, upon hitting an error or unexpected state, ask a computer. Search engines, help forums, and LLMs work best on a common text or source language. If we provide a localized error and developers can't easily get to the reference material and external information about that error, that's also a bad development experience.

Having extensive error catalogs is great, and more APIs should have them alongside their API reference documentation, but these resources often can't tell the whole story of how to handle these cases. We need richer examples, peer discussion, and lessons already learned the hard way.

Giving access to the canonical / source error message is essential. Make it easy to get it! Some ways you might do so:

Include message and localized_message together always. This is simplest, allowing the developer to use both without flipping any configuration. The edge cases are:
- How to handle the source locale? Do we include the exact error message twice, or do we only include localized_message when the two messages differ? Probably the latter.
- Be prepared for scrutiny of your translation quality. Side-by-side translations presented this way will help you stamp out quality issues quickly, if you embrace the feedback on how to improve them instead of being demoralized by it.
Make developer locale something easily configured per request using HTTP headers, environment variables, or other easy-to-add inputs to your API.
- Not allowing per-request configuration, and instead requiring that locale be set at some account, user, or API key level, adds friction. Our goal in catering to the developer via localization is to make it easier for them to get their job done well. Our goal is not to localize for the sake of it.
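
Both options can be combined. The sketch below builds an error body that always carries the source `message` and adds `localized_message` only when it differs. Using an `Accept-Language` style header value as the per-request input is my assumption, as are the `catalog` contents and the `pickLocale` and `buildError` helpers; quality values in the header are ignored to keep the example short.

```typescript
const SOURCE_LOCALE = "en";

// Hypothetical message catalog, keyed by locale and then by error kind.
const catalog: Record<string, Record<string, string>> = {
  en: { tax_id_invalid: "Tax ID is invalid" },
  de: { tax_id_invalid: "Steuer-ID ist ungültig" },
};

interface ErrorBody {
  type: "error";
  kind: string;
  message: string;            // always the canonical source message
  localized_message?: string; // only present when it differs from message
}

// Return the first requested language tag that has a catalog,
// trying the exact tag first and then its base language.
function pickLocale(header: string | undefined): string {
  for (const part of (header ?? "").split(",")) {
    const [rawTag = ""] = part.split(";"); // drop any ";q=..." suffix
    const tag = rawTag.trim().toLowerCase();
    const [base = ""] = tag.split("-");
    if (catalog[tag]) return tag;
    if (catalog[base]) return base;
  }
  return SOURCE_LOCALE;
}

function buildError(kind: string, acceptLanguage?: string): ErrorBody {
  const message = catalog[SOURCE_LOCALE]?.[kind] ?? kind;
  const body: ErrorBody = { type: "error", kind, message };

  const localized = catalog[pickLocale(acceptLanguage)]?.[kind];
  if (localized !== undefined && localized !== message) {
    body.localized_message = localized;
  }
  return body;
}
```

A developer who sends a German preference gets both strings and can paste the source `message` into a search engine, while a developer who sends nothing gets only the source message.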

Developer localization is an area which, even after so long, is still ripe for experimentation on the right way to do it. Good luck!

Published 7/4/2024
