Surprising infinite retrying of SQL statement on a replica causes replication lag #3673

@steve-chavez

Problem

On a read replica, this happens:

GET /rpc/xx..

Jul 22 23:21:04 [redacted] postgrest[115960]: 22/Jul/2024:23:21:04 +0000: 
{"code":"40001",
"details":"User query might have needed to see row versions that must be removed.",
"hint":null,
"message":"canceling statement due to conflict with recovery"}

In the Postgres logs, this error repeats indefinitely:

..,ERROR,40001,"canceling statement due to conflict with recovery","User query might have needed to see row versions that must be removed.",,
PL/pgSQL function xx() line 3 at RETURN QUERY","WITH pgrst_source AS (SELECT ""pgrst_call"".
...

For some reason this causes noticeable replication lag; I'm assuming the repeated error somehow interferes with the replication process. The only way to resolve it is to stop postgrest.

The root cause is in hasql-transaction, which retries a transaction when it fails with 40001.

Solution

Don't retry on 40001. We would need to patch hasql-transaction to make the retrying behavior configurable and then disable it.
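
To illustrate what "configurable and then disabled" could look like, here is a minimal Python sketch. It is not hasql-transaction's API (that library is Haskell), and `SqlError`, `RetryPolicy`, `run_with_policy` and `make_conflicting_statement` are hypothetical names made up for this example. It only models the idea: a statement that fails with 40001 is re-run at most a configured number of times, and the default is zero.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

SERIALIZATION_FAILURE = "40001"  # SQLSTATE reported in the logs above


class SqlError(Exception):
    """Hypothetical stand-in for a database error carrying a SQLSTATE code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical knob: how many times a failed statement may be re-run."""

    max_retries: int = 0  # 0 disables retrying, the default proposed here


def run_with_policy(statement: Callable[[], T], policy: RetryPolicy) -> T:
    """Run `statement`, re-running it on 40001 at most `policy.max_retries` times."""
    attempt = 0
    while True:
        try:
            return statement()
        except SqlError as err:
            # Only 40001 is ever retried; any other error is raised at once.
            if err.code != SERIALIZATION_FAILURE or attempt >= policy.max_retries:
                raise
            attempt += 1


def make_conflicting_statement() -> Tuple[Callable[[], None], List[int]]:
    """Build a statement that always fails with 40001, plus a call counter."""
    calls = [0]

    def statement() -> None:
        calls[0] += 1
        raise SqlError(
            SERIALIZATION_FAILURE,
            "canceling statement due to conflict with recovery",
        )

    return statement, calls
```

With the default `RetryPolicy()`, a statement that keeps conflicting with recovery runs once and the error goes straight back to the client. With `RetryPolicy(max_retries=3)` it runs four times and then gives up, so the loop is bounded either way.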

Should be solved in nikita-volkov/hasql-transaction#22

Notes

This is hard to reproduce, but I think we shouldn't do any retrying by default.
