ArticlesEngineering

Postgres: On Conflict Retrieve

A Postgres feature request for supporting create-or-retrieve on INSERT statements.

There's a feature I really want for Postgres. I call it ON CONFLICT RETRIEVE.
Basically I want to be able to do:

CREATE TABLE dog_breeds (
  id serial,
  name text,

  CONSTRAINT unique_name UNIQUE (name)
);

INSERT INTO dog_breeds (name)
VALUES ('labrador')
RETURNING (id);

-- =>
--  id
-- ----
--   1
-- (1 row)

INSERT INTO dog_breeds (name)
VALUES ('labrador')
ON CONFLICT ON CONSTRAINT unique_name RETRIEVE
RETURNING (id);

-- =>
--  id
-- ----
--   1
-- (1 row)

In a single statement, I can attempt to insert a record and if it already exists just return the conflicting record.

Without this feature, we tend to write application code that looks like this:

async function createOrRetrieveDogBreedId(db, dogBreedName) {
  // See note A
  try {
    const result = await db.exec(
      `INSERT INTO dog_breeds (name) VALUES (?) RETURNING (id)`,
      [dogBreedName]
    )

    return result.rows[0].id
  } catch (error) {
    // See note B
    if (!isDogNameUniqueConstraintError(error)) {
      throw error
    }

    const result = await db.exec(`SELECT id from dog_breeds where name = ?`, [
      dogBreedName,
    ])
    const id = result.rows[0]?.id

    // See note C
    return id
  }
}

Note A: In a distributed context, you generally want to attempt the INSERT then SELECT if needed. This is because two processes may both query for a result, find nothing, and then both attempt to INSERT. This means one will hit the unique constraint error and need to handle it anyway.

Note B: Detecting constraint errors is relatively straight forward. Most Postgres drivers will return the name of the constraint that was violated so you just need to make sure your application's understanding of the constraint names matches the database naming. The potential for mismatch is reason to explicitly name constraints which is a good idea in general.

Note C: Here we have to make a choice:

  1. Assume it is highly unlikely that the dog breed was deleted or changed since trying to insert. This gets more and more risky in GC'd languages where your application might pause at random points between queries or when database request latency is high.

  2. If the ID is not returned, try the create-or-retrieve again until you succeed. A programmer might reach for recursion to drive towards a consistent state, which is problematic for reliable systems. For example, if isDogNameUniqueConstraintError is implemented incorrectly, this function may infinite loop due to a different error.

The ON CONFLICT conflict_target RETRIEVE feature is awesome because:

  • It's a single point-in-time operation and a single statement. I don't have to issue a new SELECT query that might query state that may have changed since the INSERT attempt.
  • It's fast: we've already traversed the indices and have access to the row IDs from trying to INSERT so it is a perfect time to pivot to returning results.
  • It's so handy. Create-or-retrieve is a very common operation. It is a huge win to reduce either the amount of application code I need to write or cut down the complexity of submitted statements.

I've seen many StackOverflow threads where people have proposed hacks to work around not having this, leveraging:

  • The existing ON CONFLICT DO UPDATE SET column = column for a no-op mutation but this does lock the row and has other consequences and the overhead of an UPDATE.
  • A butt ton of Postgres wizardry and CTEs, which again carries overhead.

Perhaps there's already a way to do this in Postgres while meeting all my desired requirements. I have been unable to find it written about on the internet.

I don't think RETRIEVE needs to be the syntax. Today we have:

  • ON CONFLICT DO NOTHING
  • ON CONFLICT DO UPDATE

So it seems nice to keep with the DO _ pattern. I could see these working:

  • ON CONFLICT DO SELECT. I hesitate on this one though because SELECT is so overloaded and we wouldn't want to be so feature rich in my opinion.
  • ON CONFLICT DO RETURN. This one is nice and simple, but DO RETURN is kinda weird in that it's really forcing the DO _ convention. Perhaps a simple ON CONFLICT RETURN would be good.

It's super odd we don't have this feature yet because it seems so easy to implement since we already have DO UPDATE. Hopefully it is easy to add because I want this really really bad!