A Postgres feature request for supporting create-or-retrieve on INSERT statements.
There's a feature I really want for Postgres. I call it
ON CONFLICT RETRIEVE
.
Basically I want to be able to do:
CREATE TABLE dog_breeds (
id serial,
name text,
CONSTRAINT unique_name UNIQUE (name)
);
INSERT INTO dog_breeds (name)
VALUES ('labrador')
RETURNING (id);
-- =>
-- id
-- ----
-- 1
-- (1 row)
INSERT INTO dog_breeds (name)
VALUES ('labrador')
ON CONFLICT ON CONSTRAINT unique_name RETRIEVE
RETURNING (id);
-- =>
-- id
-- ----
-- 1
-- (1 row)
In a single statement, I can attempt to insert a record and if it already exists just return the conflicting record.
Without this feature, we tend to write application code that looks like this:
async function createOrRetrieveDogBreedId(db, dogBreedName) {
// See note A
try {
const result = await db.exec(
`INSERT INTO dog_breeds (name) VALUES (?) RETURNING (id)`,
[dogBreedName]
)
return result.rows[0].id
} catch (error) {
// See note B
if (!isDogNameUniqueConstraintError(error)) {
throw error
}
const result = await db.exec(`SELECT id from dog_breeds where name = ?`, [
dogBreedName,
])
const id = result.rows[0]?.id
// See note C
return id
}
}
Note A: In a distributed context, you generally
want to attempt the INSERT
then
SELECT
if needed. This is because two processes may
both query for a result, find nothing, and then both attempt to
INSERT
. This means one will hit the unique
constraint error and need to handle it anyway.
Note B: Detecting constraint errors is relatively straight forward. Most Postgres drivers will return the name of the constraint that was violated so you just need to make sure your application's understanding of the constraint names matches the database naming. The potential for mismatch is reason to explicitly name constraints which is a good idea in general.
Note C: Here we have to make a choice:
Assume it is highly unlikely that the dog breed was deleted or changed since trying to insert. This gets more and more risky in GC'd languages where your application might pause at random points between queries or when database request latency is high.
If the ID is not returned, try the create-or-retrieve again
until you succeed. A programmer might reach for recursion to
drive towards a consistent state, which is problematic for
reliable systems. For example, if
isDogNameUniqueConstraintError
is implemented
incorrectly, this function may infinite loop due to a
different error.
The ON CONFLICT conflict_target RETRIEVE
feature is
awesome because:
SELECT
query that
might query state that may have changed since the
INSERT
attempt.
INSERT
so it is a
perfect time to pivot to returning results.
I've seen many StackOverflow threads where people have proposed hacks to work around not having this, leveraging:
ON CONFLICT DO UPDATE SET column = column
for a
no-op mutation but this does lock the row and has other
consequences and the overhead of an UPDATE
.
Perhaps there's already a way to do this in Postgres while meeting all my desired requirements. I have been unable to find it written about on the internet.
I don't think RETRIEVE
needs to be the syntax.
Today we have:
ON CONFLICT DO NOTHING
ON CONFLICT DO UPDATE
So it seems nice to keep with the DO _
pattern. I
could see these working:
ON CONFLICT DO SELECT
. I hesitate on this one
though because SELECT
is so overloaded and we
wouldn't want to be so feature rich in my opinion.
ON CONFLICT DO RETURN
. This one is nice and
simple, but DO RETURN
is kinda weird in that it's
really forcing the DO _
convention. Perhaps a
simple ON CONFLICT RETURN
would be good.
It's super odd we don't have this feature yet because it seems
so easy to implement since we already have
DO UPDATE
. Hopefully it is easy to add because I
want this really really bad!