(2.12) Read-after-write & monotonic reads #6970

MauriceVanVeen · 2025-06-13T10:27:37Z

See ADR: JetStream Read-after-Write for context, problem statement, and design.

Resolves #6557

Signed-off-by: Maurice van Veen [email protected]

derekcollison · 2025-06-16T15:25:47Z

Hard problem for sure..

I like the opt-in, but why not just make it such that the leader is the only one that listens on a non-DQ subscriber that direct gets, with the read after write opt-in, utilizes?

The redirect seems interesting, but not sure its practical. For one, we could just wait til we feel we are caught up from a replica standpoint (normally this will be faster than a redirect IMO).

MauriceVanVeen · 2025-06-16T15:38:19Z

I like the opt-in, but why not just make it such that the leader is the only one that listens on a non-DQ subscriber that direct gets, with the read after write opt-in, utilizes?

Multiple reasons for this, one is that KV's design requires direct get to be used. Currently you can do a workaround where you disable this, but this is not intended (and doesn't guarantee fresh data either). @ripienaar might have more info about the design aspect as well.

Just the leader answering is the MsgGet we already have (although it's a different format under the JS API). But, then we haven't solved guaranteeing that an old leader doesn't send stale data. Which can happen during network partitions, or even during a quick publish and message get while leaders are changing.

For one, we could just wait til we feel we are caught up from a replica standpoint

That may be true, but hard to know how long to wait. And if we wait at all, wouldn't redirecting to the leader not just be faster/simpler?
Especially for cases where a follower is behind enough and is churning through backlog of catchup, or generally slow on applies.
Because mirrors can also answer DirectGet if enabled, we also need some way to have the guarantee there. So this proposal solves for both cases.

derekcollison · 2025-06-16T16:03:08Z

Adding a new opt in such that direct gets can be only to the leader with new API feels correct.

I understand what you are saying about a new leader who is behind, but I am not sure the redirect to the leader would help that either IMO. WDYT?

MauriceVanVeen · 2025-06-16T20:18:12Z

The idea is specifically to still have replicas be allowed to respond. That's (likely) why DirectGet also is a requirement for our KV, and disabling DirectGet is considered to be a work around.

Just to clarify, it is an old/outdated leader that can still respond as well as the new leader. So even if we'd only allow "the leader" to respond we can't make any guarantees still. Because an old leader would still be "allowed" to respond and send stale data back to the client.

And because we don't want reads to become heavy and go through Raft, the Nats-Min-Last-Sequence header was introduced. Exposing it in read responses and allowing to require the minimum on subsequent read requests is what makes us able to guarantee read-after-write and monotonic reads.

This PR changes two read paths, namely the MsgGet and DirectGet paths:

With MsgGet, if a leader is old/outdated, it knows to delay the error response to let the real leader respond to the client faster. If the error does reach the client, it can simply retry.
With DirectGet this error logic is also there, but there's no delay. Not sure if we support delaying non-API error responses?

For the user it would be simpler to only use this header and have consistent reads, than to also need to set a stream setting that's not considered to be KV-compatible. (Although arguably, having the user flip a bool in a read request for it to go through Raft would be easiest for them, but I think that's a tradeoff we don't want to make?)

Note that reads only get redirected to the leader if the follower or mirror knows it doesn't have the data. If it does have the data, it can simply respond knowing its answer is correct. So DirectGet is still effective most of the time, and helps spread the load.

derekcollison · 2025-06-16T23:40:48Z

The idea is specifically to still have replicas be allowed to respond. That's (likely) why DirectGet also is a requirement for our KV, and disabling DirectGet is considered to be a work around.

DirectGet has two parts, it does not encode the payload and we allow DQ semantics. The DQ semantics are for horizontal scalability with lots of concurrent GETS.

Just to clarify, it is an old/outdated leader that can still respond as well as the new leader. So even if we'd only allow "the leader" to respond we can't make any guarantees still. Because an old leader would still be "allowed" to respond and send stale data back to the client.

Meaning the new leader has not caught up its state? I am not following, if a new leader is elected that does not have all the data that the previous one did, we will sync followers to our new state no?

And because we don't want reads to become heavy and go through Raft, the Nats-Min-Last-Sequence header was introduced. Exposing it in read responses and allowing to require the minimum on subsequent read requests is what makes us able to guarantee read-after-write and monotonic reads.

I like the concept for sure, but if I had a single app (needed in your solution), I would solve read after write a much different way that did not depend on the external system (The NATS servers).

This PR changes two read paths, namely the MsgGet and DirectGet paths:

With MsgGet, if a leader is old/outdated, it knows to delay the error response to let the real leader respond to the client faster. If the error does reach the client, it can simply retry.

With DirectGet this error logic is also there, but there's no delay. Not sure if we support delaying non-API error responses?

We could if we believe there is a good reason to do so.

For the user it would be simpler to only use this header and have consistent reads, than to also need to set a stream setting that's not considered to be KV-compatible. (Although arguably, having the user flip a bool in a read request for it to go through Raft would be easiest for them, but I think that's a tradeoff we don't want to make?)

My suggestion does not turn off direct get, it simply changes it to have only one subscriber, the leader. So 100% KV compatible, what you lose is the horizontal scalability of replicas being able to answer also.

Note that reads only get redirected to the leader if the follower or mirror knows it doesn't have the data. If it does have the data, it can simply respond knowing its answer is correct. So DirectGet is still effective most of the time, and helps spread the load.

Like I said, I like the idea with the header of the last sequence number. Would need to see it fleshed out in API form of course.

I do think in most misses we should wait for a short period of time and recheck last, and if we are still behind we can forward to the leader.

MauriceVanVeen · 2025-06-17T07:53:07Z

if a new leader is elected that does not have all the data that the previous one did, we will sync followers to our new state no?

This is not possible (because it would mean data loss). For someone to become leader they need to have the most up-to-date Raft log. This does not mean that it has applied all entries yet though. During a leader change you'll see the old leader have an up-to-date log and have applied most of those entries. The new leader will also have an up-to-date log, but it will not be up-to-date on applies, so it first needs to wait with responding to read/write until it has quorum on ALL uncommitted entries in its log. That ensures the new leader is in a consistent state before allowing read/writes.

Meaning the new leader has not caught up its state?

The new leader always has sufficient data in its log, but it first needs to get quorum and apply them. Because the previous leader did not inform them yet that those entries could be applied.

One could now write new data to the new leader and get a PubAck, but the old leader can still respond! (At least until it's informed of the leader change, which can take longer depending on RTT and partitions)
This violates read-after-write if the state on the old leader was stale. If you're only reading and not writing, this would also violate monotonic reads.

My suggestion does not turn off direct get, it simply changes it to have only one subscriber, the leader.

Above is also why just having "the leader" respond does not work. Because multiple servers can think they are leader. (Even if for a short period, but the property can be violated within that period)

I like the concept for sure, but if I had a single app (needed in your solution), I would solve read after write a much different way that did not depend on the external system (The NATS servers).

Not sure how you can achieve that without involving the servers that store the state?
The server that's responding to a read needs to know it is allowed to respond with some data. That can be either because it successfully went through Raft, so it's ordered and a valid response at that time in the log. Or with such a header so it can check inline by itself without needing Raft and without needing to be leader per se.
Not sure if there are alternatives?

MauriceVanVeen · 2025-06-25T13:35:04Z

I like the idea with the header of the last sequence number. Would need to see it fleshed out in API form of course.

Have implemented this in the Go client almost fully. Halfway through I realised.. consumers also don't guarantee read-after-write!
This means you could do kv.Create and then a kv.Keys. You'd expect the just-created key is reflected in the returned keys list, but currently there are no guarantees.

The PR is updated now to:

Not do any direct/msg get read request redirecting. That's fully removed. Either you are okay with getting an error if you try to read on a follower/mirror that's just slightly behind to satisfy read-after-write, or you disable MirrorDirect or AllowDirect so only the leader can answer.
The direct/msg get JSON request bodies now contain MinLastSeq uint64 'json:"min_last_seq,omitempty"' as an option, this turned out to be easier to implement in clients than passed as a Nats-Min-Last-Sequence header.
Consumer create requests can now also specify MinLastSeq. The consumer will then wait before delivering any messages, until the stream has stored at least up to that last sequence. That satisfies the kv.Create-kv.Keys case as well.
The additional exposing of Nats-Min-Last-Sequence on the read request was redundant, because with message gets you already get the sequence, similarly when consuming messages. So those sequences can just be used, no need for an additional field.

In terms of API, it looks like this:

// Write
r, err := kv.Put(ctx, "key", []byte("value"))

// Read request
kve, err := kv.Get(ctx, "key", jetstream.MinLastRevision(r))

// Watch/consumer
kl, err := kv.ListKeys(ctx, jetstream.MinLastRevision(r))

By specifying the MinLastRevision (or MinLastSequence when using a stream normally), you can be sure your read request will be rejected by a follower if it can't be satisfied, or the follower will wait to deliver you messages from the consumer until it's up-to-date.

This satisfies read-after-write and monotonic reads when combining the write and read paths.

MauriceVanVeen · 2025-06-25T13:36:16Z

Here's the Go client implementation in a branch.
It has full support for both the old and new JetStream API's, as well as for KV. Have not yet touched the ObjectStore API, but that could get similar support to support read-after-write, etc.

neilalexander

Overall LGTM but some questions.

server/stream.go

neilalexander

LGTM!

Signed-off-by: Maurice van Veen <[email protected]>

…ect get Signed-off-by: Maurice van Veen <[email protected]>

Signed-off-by: Maurice van Veen <[email protected]>

Remove read-after-write/monotonic reads for 2.12. > Although the design (read-after-write/monotonic reads/linearizable) gives a lot of control and flexibility, we'll need to decide whether this adds a bunch of complexity which would be (too) hard to understand by an end-user, in which case we might need to revisit this. #7146 (comment) The current design is very flexible, but adds a ton of complexity that needs to be handled on a per-request basis. We've discussed whether this should instead be on a per-stream basis. Where on a replicated stream/KV/ObjectStore the reads are serializable by default, but one can opt-in to linearizable for that whole stream. That would mean client code doesn't need to change at all to get this guarantee, only the stream setting would need to be updated. We'll need to discuss this further, and likely introduce this early in the next 2.14 (skipping 2.13) release cycle. Relates to #6557 Relates to PRs: #6970, #7146 Signed-off-by: Maurice van Veen <[email protected]>

MauriceVanVeen requested a review from a team as a code owner June 13, 2025 10:27

MauriceVanVeen force-pushed the maurice/direct-get-consistent-read branch 3 times, most recently from 7286c89 to 270af06 Compare June 15, 2025 07:20

MauriceVanVeen force-pushed the maurice/direct-get-consistent-read branch from 270af06 to 300c5c5 Compare June 25, 2025 13:18

MauriceVanVeen changed the title ~~(2.12) Direct Get read-after-write & monotonic reads~~ (2.12) Read-after-write & monotonic reads Jun 25, 2025

MauriceVanVeen force-pushed the maurice/direct-get-consistent-read branch from 300c5c5 to 663e294 Compare July 3, 2025 10:53

MauriceVanVeen force-pushed the maurice/direct-get-consistent-read branch from 663e294 to da76ee2 Compare July 11, 2025 09:08

MauriceVanVeen marked this pull request as draft July 11, 2025 09:19

MauriceVanVeen mentioned this pull request Jul 11, 2025

ADR-53: JetStream Read-after-Write nats-io/nats-architecture-and-design#358

Draft

neilalexander reviewed Jul 17, 2025

View reviewed changes

server/stream.go Show resolved Hide resolved

server/stream.go Outdated Show resolved Hide resolved

MauriceVanVeen force-pushed the maurice/direct-get-consistent-read branch 3 times, most recently from 74230a8 to 4a5aa79 Compare July 18, 2025 14:25

MauriceVanVeen force-pushed the maurice/direct-get-consistent-read branch from 4a5aa79 to 2d11e56 Compare July 29, 2025 12:10

MauriceVanVeen marked this pull request as ready for review July 29, 2025 12:38

MauriceVanVeen force-pushed the maurice/direct-get-consistent-read branch from 05bc7d7 to 4f61a7d Compare July 30, 2025 14:53

neilalexander approved these changes Jul 30, 2025

View reviewed changes

MauriceVanVeen added 4 commits July 31, 2025 17:00

(2.12) Direct/msg get read-after-write & monotonic reads

4de3ee9

Signed-off-by: Maurice van Veen <[email protected]>

(2.12) Consumer read-after-write

f2643bd

Signed-off-by: Maurice van Veen <[email protected]>

(2.12) Delay direct get min last seq error

8241a15

Signed-off-by: Maurice van Veen <[email protected]>

(2.12) Test read-after-write on mirrors

91c485d

Signed-off-by: Maurice van Veen <[email protected]>

MauriceVanVeen added 3 commits July 31, 2025 17:00

(2.12) Redirect direct get to leader if outdated

2112454

Signed-off-by: Maurice van Veen <[email protected]>

(2.12) Direct no redirect if mirror & mirror followers respond to dir…

da0bfae

…ect get Signed-off-by: Maurice van Veen <[email protected]>

(2.12) Test redirected direct get with permissions

dace607

Signed-off-by: Maurice van Veen <[email protected]>

MauriceVanVeen force-pushed the maurice/direct-get-consistent-read branch from 4f61a7d to dace607 Compare July 31, 2025 15:00

neilalexander merged commit 46dec81 into main Jul 31, 2025
90 of 92 checks passed

neilalexander deleted the maurice/direct-get-consistent-read branch July 31, 2025 16:05

MauriceVanVeen mentioned this pull request Aug 4, 2025

(2.12) Linearizable Message Get request #7146

Closed

MauriceVanVeen mentioned this pull request Aug 13, 2025

(2.12) Remove Read-after-Write for now #7167

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

(2.12) Read-after-write & monotonic reads #6970

(2.12) Read-after-write & monotonic reads #6970

Uh oh!

MauriceVanVeen commented Jun 13, 2025 •

edited

Loading

Uh oh!

derekcollison commented Jun 16, 2025

Uh oh!

MauriceVanVeen commented Jun 16, 2025

Uh oh!

derekcollison commented Jun 16, 2025

Uh oh!

MauriceVanVeen commented Jun 16, 2025 •

edited

Loading

Uh oh!

derekcollison commented Jun 16, 2025

Uh oh!

MauriceVanVeen commented Jun 17, 2025

Uh oh!

MauriceVanVeen commented Jun 25, 2025 •

edited

Loading

Uh oh!

MauriceVanVeen commented Jun 25, 2025

Uh oh!

neilalexander left a comment

Uh oh!

Uh oh!

Uh oh!

neilalexander left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

(2.12) Read-after-write & monotonic reads #6970

(2.12) Read-after-write & monotonic reads #6970

Uh oh!

Conversation

MauriceVanVeen commented Jun 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

derekcollison commented Jun 16, 2025

Uh oh!

MauriceVanVeen commented Jun 16, 2025

Uh oh!

derekcollison commented Jun 16, 2025

Uh oh!

MauriceVanVeen commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

derekcollison commented Jun 16, 2025

Uh oh!

MauriceVanVeen commented Jun 17, 2025

Uh oh!

MauriceVanVeen commented Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MauriceVanVeen commented Jun 25, 2025

Uh oh!

neilalexander left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

neilalexander left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

MauriceVanVeen commented Jun 13, 2025 •

edited

Loading

MauriceVanVeen commented Jun 16, 2025 •

edited

Loading

MauriceVanVeen commented Jun 25, 2025 •

edited

Loading