
Human-friendly tokens

Thoughts on designing human-friendly formats for secrets, sessions, and tokens.

I've been working on an API that requires a bearer token (an API key) to authenticate. I've iterated on the token format a fair amount, so here are some of the considerations that shaped it.

Make copy/paste easy

Try to copy this secret:

bearer token: csjkncc csjndkc4-cdsnjk?34893cdnj

Try to copy this secret:

bearer token: csjkncc_csjndkc4_cdsnjk_34893cdnj

Text selection is an important property of a token format. The ideal copying experience for a single token is to double-click it and have the OS, browser, etc. select the full token for the user.

To achieve this, the token roughly needs to follow the rules for what text selection generally considers a whole word: A-Z, a-z, 0-9, and underscore (_). Accented and other non-ASCII letters usually count too, but we don't tend to lean on those because we want to stay within ASCII.
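Here's a rough way to check a candidate format against that rule: treat "selectable as one word" as "made only of ASCII letters, digits, and underscore". This is a sketch under that assumption; real OS and browser selection rules are more nuanced, and `is_double_click_selectable` is a hypothetical helper name.

```python
import re

# Assumption: double-click selection treats a run of ASCII letters,
# digits, and underscore as one "word". Real rules vary by OS, browser,
# and locale, so this is only a heuristic.
WORD_RUN = re.compile(r"[A-Za-z0-9_]+")


def is_double_click_selectable(token: str) -> bool:
    """Return True if the whole token is a single run of word characters."""
    return WORD_RUN.fullmatch(token) is not None


print(is_double_click_selectable("csjkncc csjndkc4-cdsnjk?34893cdnj"))  # False
print(is_double_click_selectable("csjkncc_csjndkc4_cdsnjk_34893cdnj"))  # True
```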

The quickest way to get there is to take whatever value you actually store as your secret (say, in your database) and run it through a base encoding.

Base64 is super common and well supported, but its alphabet doesn't make the cut. Base64 uses + and / as regular characters and = for padding, which, as we can see, doesn't play nicely with text selection:

bearer token: csjkncc=csjndkc4=cdsnjk=34893cdnj
bearer token: csjkncc+csjndkc4+cdsnjk+34893cdnj
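To see those characters come out of a real encoder, here's Python's standard `base64` module on two bytes I picked (an illustrative choice, not a real secret) so that the output lands on the last alphabet slots and needs padding.

```python
import base64

# 0xFB 0xFF splits into the 6-bit groups 62, 63, 60, which map to
# '+', '/', '8'; one '=' pads the output to a multiple of 4 characters.
encoded = base64.b64encode(b"\xfb\xff").decode("ascii")
print(encoded)  # +/8=

# Every one of these characters breaks double-click selection.
print(sorted(set(encoded) & set("+/=")))  # ['+', '/', '=']
```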

The URL-safe variant, Base64URL, swaps + and / for - and _ (and the = padding is often dropped), but dashes don't play nicely either:

bearer token: csjkncc-csjndkc4-cdsnjk-34893cdnj
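The same illustrative bytes through `base64.urlsafe_b64encode` show that even with the padding stripped, the dash survives.

```python
import base64

raw = b"\xfb\xff"  # illustrative bytes that hit the last two alphabet slots

# Base64URL replaces '+' and '/' with '-' and '_'.
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(url_safe)  # -_8=

# Stripping the '=' padding, as URL-safe tokens often do, still leaves '-'.
unpadded = url_safe.rstrip("=")
print(unpadded)  # -_8
print("-" in unpadded)  # True
```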

We're so close! Luckily, a base encoding can use whatever alphabet we define, so a Base63 encoding (A-Z, a-z, 0-9, and _) shaves the dash off the alphabet.
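Since the base is just the alphabet's length, a positional encoder can take the alphabet as a parameter. This is a minimal sketch; the Base63 ordering below (digits, uppercase, lowercase, then underscore) is my own arbitrary choice, and `encode_int` is a hypothetical helper.

```python
import string

# Hypothetical Base63 alphabet: Base64URL's characters minus '-'.
# The ordering is arbitrary; nothing standardizes it.
BASE63_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase + "_"


def encode_int(n: int, alphabet: str) -> str:
    """Encode a non-negative integer using len(alphabet) as the base."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    digits = []
    while n:
        n, rem = divmod(n, base)
        digits.append(alphabet[rem])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


print(len(BASE63_ALPHABET))              # 63
print("-" in BASE63_ALPHABET)            # False
print(encode_int(62, BASE63_ALPHABET))   # _
print(encode_int(63, BASE63_ALPHABET))   # 10
```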

The downside to defining your own base encoding is the lack of ecosystem help (libraries that provide encode/decode for free) and of alignment with other developers, since there is often no standard for exactly which characters make up the alphabet, or in what order.

Base63 would solve the text-selection problem, but I chose Base62 for my bearer tokens. I've seen Base62 used before and never seen Base63, but I also needed the underscore (_) removed from the alphabet so I could use it for my own purpose.
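A Base62 round trip for a random secret might look like the sketch below. It assumes fixed-length 32-byte secrets (so decoding can restore leading zero bytes) and the digits-uppercase-lowercase ordering; both are my assumptions, and `b62encode`/`b62decode` are hypothetical names rather than a particular library's API.

```python
import secrets
import string

# Assumed ordering: digits, uppercase, lowercase. Other Base62
# implementations order these differently, so both sides must agree.
BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
SECRET_BYTES = 32  # assumption: fixed-length 256-bit secrets


def b62encode(data: bytes) -> str:
    """Encode bytes as Base62 (output never contains '_' or '-')."""
    n = int.from_bytes(data, "big")
    if n == 0:
        return BASE62_ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(BASE62_ALPHABET[rem])
    return "".join(reversed(out))


def b62decode(text: str, length: int = SECRET_BYTES) -> bytes:
    """Decode Base62 back into exactly `length` bytes."""
    n = 0
    for ch in text:
        n = n * 62 + BASE62_ALPHABET.index(ch)  # ValueError on bad characters
    # The fixed length restores leading zero bytes lost by int conversion.
    return n.to_bytes(length, "big")


secret = secrets.token_bytes(SECRET_BYTES)
encoded = b62encode(secret)
assert b62decode(encoded) == secret
assert "_" not in encoded
```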

Prefix the secret

Developers juggle a bunch of secrets and tokens in a reasonably complex system. When someone is staring at walls and walls of nonsense text, a common prefix lets a secret identify itself.

For my bearer token I'm doing:

bt_{base62_encoded_secret_value}
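Because Base62 never produces an underscore, the first `_` always marks the end of the prefix, so parsing is a single regex. This sketch assumes prefixes are lowercase letters; `format_token`, `parse_token`, and `TOKEN_RE` are hypothetical names.

```python
import re

# Assumption: prefix is lowercase letters, followed by exactly one '_',
# then only Base62 characters. No '_' can appear after the separator.
TOKEN_RE = re.compile(r"([a-z]+)_([0-9A-Za-z]+)")


def format_token(prefix: str, encoded_secret: str) -> str:
    """Join a prefix and an already Base62-encoded secret."""
    return f"{prefix}_{encoded_secret}"


def parse_token(token: str) -> tuple[str, str]:
    """Split a token into (prefix, encoded_secret) or raise ValueError."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError("not a prefixed Base62 token")
    return match.group(1), match.group(2)


print(parse_token(format_token("bt", "4fR9xQ2")))  # ('bt', '4fR9xQ2')
```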

The prefix lets a human know at a glance that they are probably dealing with a bearer token rather than some random junk string.
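The same prefix that helps a human also lets a simple pattern pick tokens out of a wall of text. In this sketch only `bt` comes from the article; the `KNOWN_PREFIXES` registry and `label_tokens` helper are hypothetical.

```python
import re

# Hypothetical registry of prefixes; only "bt" is from the article.
KNOWN_PREFIXES = {"bt": "bearer token"}

# '_' and Base62 characters are all regex word characters, so \b marks
# the same boundaries that double-click selection would.
PREFIXED = re.compile(r"\b([a-z]+)_[0-9A-Za-z]+\b")


def label_tokens(text: str) -> list[tuple[str, str]]:
    """Return (kind, token) for each token whose prefix is registered."""
    found = []
    for match in PREFIXED.finditer(text):
        kind = KNOWN_PREFIXES.get(match.group(1))
        if kind is not None:
            found.append((kind, match.group(0)))
    return found


print(label_tokens("header: Bearer bt_4fR9xQ2 and id_77"))
# [('bearer token', 'bt_4fR9xQ2')]
```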

Reasonably ergonomic

For my bearer tokens I was deciding between Base62 and Base58. Base58 is also common, and the difference between the two is their alphabet: Base58 removes characters that are visually similar to others, so 0, O, I, and l are not included.
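Starting from the digits-uppercase-lowercase Base62 ordering and dropping the four look-alikes yields 58 characters. With that ordering, the result matches the alphabet Bitcoin's Base58 uses; other Base58 variants may order things differently.

```python
import string

BASE62_ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Drop the visually ambiguous characters 0, O, I, and l.
BASE58_ALPHABET = "".join(c for c in BASE62_ALPHABET if c not in "0OIl")

print(len(BASE58_ALPHABET))  # 58
print(BASE58_ALPHABET)
# 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
```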

This is a thoughtful human consideration: people reading these secrets can do so more reliably across different fonts and contexts.

However, I don't want or need anyone reading these bearer tokens. I've made them copy-pasteable and obvious, but I don't want to encourage reading, speaking, or typing the individual characters of the secret, or to make that easy. Copy it, paste it, and get on with life. My bearer tokens are long enough that someone would screw up a manual transcription anyway.
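To put "long enough" in numbers: a positional encoding needs ceil(bits / log2(base)) characters to hold any value of that many bits. Assuming a 256-bit secret (the article doesn't state the actual size), Base58 costs one extra character over Base62, and both are far too long to transcribe comfortably.

```python
import math

SECRET_BITS = 256  # assumption: a 256-bit random secret


def encoded_length(bits: int, base: int) -> int:
    """Characters needed to write any `bits`-bit value in the given base."""
    return math.ceil(bits / math.log2(base))


for base in (58, 62):
    print(base, encoded_length(SECRET_BITS, base))
# 58 44
# 62 43
```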

Base58 has its place where humans might need to compare these values, but with bearer tokens, the only thing that needs to look at the token is a computer.