Skip to content

Commit 5b42cf8

Browse files
ADR-53: JetStream Read-after-Write
Signed-off-by: Maurice van Veen <[email protected]>
1 parent ba71212 commit 5b42cf8

File tree

5 files changed

+182
-21
lines changed

5 files changed

+182
-21
lines changed

README.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
2121
|[ADR-49](adr/ADR-49.md)|jetstream, spec, 2.12|JetStream Distributed Counter CRDT|
2222
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
2323
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
24+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
2425

2526
## Client
2627

@@ -55,6 +56,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
5556
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
5657
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
5758
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
59+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
5860

5961
## Jetstream
6062

@@ -85,6 +87,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
8587
|[ADR-49](adr/ADR-49.md)|jetstream, spec, 2.12|JetStream Distributed Counter CRDT|
8688
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
8789
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
90+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
8891

8992
## Kv
9093

@@ -93,13 +96,15 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
9396
|[ADR-8](adr/ADR-8.md)|jetstream, client, kv, spec|JetStream based Key-Value Stores|
9497
|[ADR-19](adr/ADR-19.md)|jetstream, client, kv, objectstore|API prefixes for materialized JetStream views|
9598
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
99+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
96100

97101
## Objectstore
98102

99103
|Index|Tags|Description|
100104
|-----|----|-----------|
101105
|[ADR-19](adr/ADR-19.md)|jetstream, client, kv, objectstore|API prefixes for materialized JetStream views|
102106
|[ADR-20](adr/ADR-20.md)|jetstream, client, objectstore, spec|JetStream based Object Stores|
107+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
103108

104109
## Observability
105110

@@ -120,6 +125,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
120125
|-----|----|-----------|
121126
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
122127
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
128+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
123129

124130
## Security
125131

@@ -157,6 +163,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
157163
|[ADR-43](adr/ADR-43.md)|jetstream, client, server, 2.11|JetStream Per-Message TTL|
158164
|[ADR-44](adr/ADR-44.md)|jetstream, server, 2.11|Versioning for JetStream Assets|
159165
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
166+
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
160167

161168
## Spec
162169

adr-template.md

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -24,10 +24,6 @@
2424

2525
[If this is a specification or actual design, write something here.]
2626

27-
## Decision
28-
29-
[Maybe this was just an architectural decision...]
30-
3127
## Consequences
3228

3329
[Any consequences of this design, such as breaking change or Vorpal Bunnies]

adr/ADR-31.md

Lines changed: 20 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -7,11 +7,12 @@
77
| Status | Implemented |
88
| Tags | jetstream, client, server, 2.11 |
99

10-
| Revision | Date | Author | Info |
11-
|----------|------------|------------|----------------------------------------------------------|
12-
| 1 | 2022-08-08 | @tbeets | Initial design |
13-
| 2 | 2024-03-06 | @ripienaar | Adds Multi and Batch behaviors for Server 2.11 |
14-
| 3 | 2025-06-19 | @ripienaar | Support surpressing headers in replies using `NoHeaders` |
10+
| Revision | Date | Author | Info | Refinement | Server Requirement |
11+
|----------|------------|-----------------|----------------------------------------------------------|------------|--------------------|
12+
| 1 | 2022-08-08 | @tbeets | Initial design | | |
13+
| 2 | 2024-03-06 | @ripienaar | Adds Multi and Batch behaviors for Server 2.11 | | |
14+
| 3 | 2025-06-19 | @ripienaar | Support suppressing headers in replies using `NoHeaders` | | |
15+
| 4 | 2025-07-11 | @MauriceVanVeen | Update on Read-after-Write guarantee | ADR-53 | |
1516

1617
## Context and motivation
1718

@@ -42,14 +43,20 @@ clients. Also, read availability can be enhanced as mirrors may be available to
4243

4344
###### A note on read-after-write coherency
4445

45-
The existing Get API `$JS.API.STREAM.MSG.GET.<stream>` provides read-after-write coherency by routing requests to a
46-
stream's current peer leader (R>1) or single server (R=1). A client that publishes a message to stream (with ACK) is
47-
assured that a subsequent call to the Get API will return that message as the read will go a server that defines
48-
_most current_.
46+
The existing Get API `$JS.API.STREAM.MSG.GET.<stream>` as well as _Direct Get_ do NOT provide any read-after-write
47+
guarantees by default. The existing Get API only guarantees read-after-write if the underlying stream is not
48+
replicated (R=1).
4949

50-
In contrast, _Direct Get_ does not assure read-after-write coherency as responders may be non-leader stream servers
51-
(that may not have yet applied the latest consensus writes) or MIRROR downstream servers that have not yet _consumed_
52-
the latest consensus writes from upstream.
50+
_Direct Get_ does not assure read-after-write coherency as responders may be non-leader stream servers (that may not
51+
have yet applied the latest consensus writes) or MIRROR downstream servers that have not yet _consumed_ the latest
52+
consensus writes from upstream.
53+
54+
The Get API routes requests to a stream's current peer leader (R>1). A client that publishes multiple messages to a
55+
stream (with ACK) is assured that they will be properly ordered by sequence, regardless of which peer leader was active
56+
at that time. However, during and after leader elections, calls to the Get API could still be served by a server that
57+
still thinks it's leader even if a new leader was elected in the meantime (but it doesn't know yet).
58+
59+
Read-after-write guarantees can be opted into with [ADR-53](adr/ADR-53.md).
5360

5461
## Implementation
5562

@@ -61,7 +68,7 @@ the latest consensus writes from upstream.
6168
based on `max_msgs_per_subject`
6269

6370
> Allow Direct is set automatically based on the inferred use case of the stream. Maximum messages per subject is a
64-
tell-tale of a stream that is a KV bucket.
71+
> tell-tale of a stream that is a KV bucket.
6572
6673
### Direct Get API
6774

adr/ADR-53.md

Lines changed: 150 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,150 @@
1+
# JetStream Read-after-Write
2+
3+
| Metadata | Value |
4+
|----------|--------------------------------------------------------------|
5+
| Date | 2025-07-11 |
6+
| Author | @MauriceVanVeen |
7+
| Status | Proposed |
8+
| Tags | jetstream, kv, objectstore, server, client, refinement, 2.12 |
9+
| Updates | ADR-8, ADR-17, ADR-20, ADR-31, ADR-37 |
10+
11+
| Revision | Date | Author | Info |
12+
|----------|------------|-----------------|----------------|
13+
| 1 | 2025-07-11 | @MauriceVanVeen | Initial design |
14+
15+
## Problem Statement
16+
17+
JetStream does NOT support read-after-write or monotonic reads. This can be especially problematic when
18+
using [ADR-8 JetStream based Key-Value Stores](ADR-8.md), primarily but not limited to the use of _Direct Get_.
19+
20+
Specifically, we have no way to guarantee a write like `kv.Put` can be observed by a subsequent `kv.Get` or `kv.Watch`,
21+
especially when the KV/stream is replicated or mirrored.
22+
23+
## Context
24+
25+
The topic of immediate consistency within NATS JetStream can sometimes be a bit confusing. On our docs we claim we
26+
maintain immediate consistency (as opposed to eventual consistency) even in the face of failures. Which is true, but
27+
as with anything, it depends.
28+
29+
- **Monotonic writes**, all writes to a single stream (replicated or not) are monotonic. It's ordered regardless of
30+
publisher by the stream sequence.
31+
- **Monotonic reads**, if you're using consumers. All reads for a consumer (replicated or not) are monotonic. It's
32+
ordered by consumer delivery sequence. (Messages can be redelivered on failure, but this also depends on which
33+
settings are used)
34+
35+
Those paths are immediately consistent, but they are not immediately consistent with respect to each other. This is no
36+
problem for publishers and consumers of a stream, because they observe all operations to be monotonic.
37+
But, if you use the KV abstraction for example, you're more often going to use single message gets through `kv.Get`.
38+
Since those rely on `DirectGet`, even followers can answer, which means we (by default) can't guarantee read-after-write
39+
or even monotonic reads. Such message GET requests get served randomly by all servers within the peer group (or even
40+
mirrors if enabled). Those obviously can't be made immediately consistent, since both replication and mirroring are
41+
async.
42+
43+
Also, when following up a `kv.Create` with `kv.Keys`, you might expect read-after-write such that the returned keys
44+
contains the key you've just written to. This also requires read-after-write.
45+
46+
## Design
47+
48+
Before sharing the proposed design, let's look at an alternative. Read-after-write could be achieved by having reads (on
49+
an opt-in basis) go through Raft replication first. This has several disadvantages:
50+
51+
- Reads will become significantly slower, due to requiring replication first.
52+
- Reads require quorum, due to replication, disallowing any reads when there's downtime or temporarily no leader.
53+
- Only the stream leader can answer reads, as it is the first one to know that it can answer the request. (Followers
54+
replicate asynchronously, so letting them answer would make the response take even longer to return.)
55+
- Mirrors can still answer `DirectGet` requests, the transparency of mirrors answering read requests will violate any
56+
read-after-write guarantees (as the client will not know). This would mean mirrors must not be enabled if this
57+
guarantee should be kept.
58+
- Read-after-write guarantees could temporarily be violated when scaling streams up or down.
59+
- This is not a compatible approach for consumers, meaning they could not have these guarantees based on this approach.
60+
It would require limiting consumer creation to R1 on the stream leader, which is not possible since the assignment is
61+
done by the meta leader that has no knowledge about the stream leader. A replicated consumer could violate the
62+
requirement if the consumer leader changes to an outdated follower in between.
63+
64+
Although having reads be served through Raft does (mostly) offer a strong guarantee of read-after-write and monotonic
65+
reads, the disadvantages outway the advantages. Ideally, the solution has the following advantages:
66+
67+
- It's explicitly defined, either in configuration or in code.
68+
- Works for both replicated and non-replicated streams. (Scale up/down has no influence, and implementation is not
69+
replication-specific)
70+
- Incurs no slowdown, just as fast as reads that don't guarantee read-after-write (no prior replication required).
71+
- Let followers, and even mirrors, answer read requests as long as they can make the guarantee.
72+
- Let followers, and mirrors, inform the client when they can't make the guarantee. The guarantee is always kept, but
73+
an error is returned that can be retried (to get a successful read). This can be tuned by disabling reads on mirrors
74+
or followers.
75+
76+
Now, on to the proposed design which has the above advantages.
77+
78+
The write and read paths remain eventually consistent as it is now. But one can opt-in for immediate consistency to
79+
guarantee read-after-write and monotonic reads, for both direct/msg read requests as well as consumers.
80+
81+
- **Read-after-write** is achieved because all writes through `js.Publish`, `kv.Put`, etc. return the sequence
82+
(inherently last sequence) of the stream. In `DirectGet` requests those observed last sequences can be used for read
83+
requests.
84+
- **Monotonic reads** is achieved by collecting the highest sequence seen in read requests and using that sequence for
85+
subsequent read requests.
86+
87+
This can be implemented with an additional `MinLastSeq` field in `JSApiMsgGetRequest` and `ConsumerConfig`.
88+
89+
- This ensures the server only replies with data if it can actually 100% guarantee immediate consistency. This is done
90+
by confirming the `LastSeq` it has for its local stream, is at least the `MinLastSeq` specified.
91+
- Side-note: although `MsgGet` is only answered by the leader, technically an old leader could still respond and serve
92+
stale reads. Although this shouldn't happen often in practice, until now we couldn't guarantee it. The error can be
93+
detected on the old leader, and it can delay the error response, allowing for the real leader to send the actual
94+
answer.
95+
- Followers that can't satisfy the `MinLastSeq` redirect the request to the leader for it to answer instead. This allows
96+
followers to still serve reads and share the load if they can, but if they can't, they defer to the leader to not
97+
require a client to retry on what would otherwise be an error.
98+
- Mirrors reject the read request if they can't satisfy the `MinLastSeq`. But can serve reads and share the load
99+
otherwise. Mirrors don't redirect requests to a leader, not even to the stream leader if the mirror is replicated.
100+
- Leaders/followers/mirrors don't reject a request immediately, but delay this error response to make sure clients don't
101+
spam these requests while allowing the underlying resources to try and become up-to-date enough in the meantime.
102+
- Rejected read requests have the error code returned as a header, e.g. `NATS/1.0 412 Min Last Sequence`.
103+
- Consumers don't start delivering messages until the `MinLastSeq` is reached, and don't reject the consumer creation.
104+
This allows consumers to be created successfully, even on outdated followers or mirrors, while waiting to ensure
105+
`pending` counts are correct when following up `kv.Create` with `kv.Keys` for example.
106+
107+
In terms of API, it can look like this:
108+
109+
```go
110+
// Write
111+
r, err := kv.Put(ctx, "key", []byte("value"))
112+
113+
// Read request
114+
kve, err := kv.Get(ctx, "key", jetstream.MinLastRevision(r))
115+
116+
// Watch/consumer
117+
kl, err := kv.ListKeys(ctx, jetstream.MinLastRevision(r))
118+
```
119+
120+
By specifying the `MinLastRevision` (or `MinLastSequence` when using a stream normally), you can be sure your read
121+
request will be rejected if it can't be satisfied, or the follower/mirror will wait to deliver you messages from
122+
the consumer until it's up-to-date. Followers redirect requests, that would otherwise error, to the leader to not
123+
require the client to retry in these cases.
124+
125+
This satisfies read-after-write and monotonic reads when combining the write and read paths, as well as when only
126+
preforming reads.
127+
128+
### A note about message deletion and purges
129+
130+
JetStream allows in-place deletion of messages through a "message delete" or "purge" request. These don't write new
131+
messages, and thus don't increase the last sequence. This means there are no read-after-write or monotonic reads after a
132+
message is deleted or purged. For example, after deleting a message or purging the stream, multiple requests can flip
133+
between returning the original messages and returning them as deleted.
134+
135+
Although a downside of this approach, it can only be supported when using a replicated stream that's not mirrored, which
136+
would be too restrictive. Whereas with the proposed approach, all followers and mirrors can contribute to providing the
137+
guarantee, regardless of replication or topology (which is valued more highly).
138+
139+
When deleting or purging messages is still desired AND you want to rely on read-after-write or monotonic reads, rollups
140+
can be used instead. The `Nats-Rollup` header can be used to purge messages where the subject equals, or purge the whole
141+
stream. Because a rollup message increases the last sequence, these guarantees can be relied upon again. However, the
142+
client application will need to interpret this rollup message as a "delete/purge" similar to how KV uses delete and
143+
purge markers. Therefore, the KV abstraction still has these guarantees since it places a new message for its
144+
`kv.Delete` and uses a rollup message for its `kv.Purge`.
145+
146+
## Consequences
147+
148+
Since this is an opt-in on a read request or consumer create basis, this is not a breaking change. Depending on client
149+
implementation, this could be harder to implement. But given it's just another field in the `JSApiMsgGetRequest` and
150+
`ConsumerConfig`, each client should have no trouble supporting it.

adr/ADR-8.md

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -22,6 +22,7 @@
2222
| 7 | 2025-01-23 | Add Max Age limit Markers, remove non direct gets | ADR-48 | 2.11.0 |
2323
| 8 | 2025-02-17 | Add Metadata | | 2.10.0 |
2424
| 9 | 2025-04-09 | Document max_age and duplicate_window requirements | | |
25+
| 10 | 2025-07-11 | Update on Read-after-Write guarantee | ADR-53 | |
2526

2627
## Context
2728

@@ -291,12 +292,12 @@ The features to support KV is in NATS Server 2.6.0.
291292

292293
#### Consistency Guarantees
293294

294-
We do not provide read-after-write consistency. Reads are performed directly to any replica, including out
295-
of date ones. If those replicas do not catch up multiple reads of the same key can give different values between
296-
reads. If the cluster is healthy and performing well most reads would result in consistent values, but this should not
295+
We do not provide read-after-write consistency by default. Reads are performed directly to any replica, including
296+
out-of-date ones. If those replicas do not catch up, multiple reads of the same key can give different values between
297+
reads. If the cluster is healthy and performing well, most reads would result in consistent values, but this should not
297298
be relied on to be true.
298299

299-
Historically we had read-after-write consistency, this has been deprecated and retained here for historical record only.
300+
Read-after-write guarantees can be opted into with [ADR-53](adr/ADR-53.md).
300301

301302
#### Buckets
302303

0 commit comments

Comments
 (0)