Commit 2848047

ADR-52: JetStream Read-after-Write
Signed-off-by: Maurice van Veen <[email protected]>
1 parent 488f364 commit 2848047

4 files changed

+155
-16
lines changed

README.md

Lines changed: 7 additions & 0 deletions
@@ -20,6 +20,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
2020
|-----|----|-----------|
2121
|[ADR-49](adr/ADR-49.md)|jetstream, spec, 2.12|JetStream Distributed Counter CRDT|
2222
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
23+
|[ADR-52](adr/ADR-52.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
2324

2425
## Client
2526

@@ -53,6 +54,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
5354
|[ADR-47](adr/ADR-47.md)|client, spec, orbit|Request Many|
5455
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
5556
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
57+
|[ADR-52](adr/ADR-52.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
5658

5759
## Jetstream
5860

@@ -82,6 +84,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
8284
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
8385
|[ADR-49](adr/ADR-49.md)|jetstream, spec, 2.12|JetStream Distributed Counter CRDT|
8486
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
87+
|[ADR-52](adr/ADR-52.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
8588

8689
## Kv
8790

@@ -90,13 +93,15 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
9093
|[ADR-8](adr/ADR-8.md)|jetstream, client, kv, spec|JetStream based Key-Value Stores|
9194
|[ADR-19](adr/ADR-19.md)|jetstream, client, kv, objectstore|API prefixes for materialized JetStream views|
9295
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
96+
|[ADR-52](adr/ADR-52.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
9397

9498
## Objectstore
9599

96100
|Index|Tags|Description|
97101
|-----|----|-----------|
98102
|[ADR-19](adr/ADR-19.md)|jetstream, client, kv, objectstore|API prefixes for materialized JetStream views|
99103
|[ADR-20](adr/ADR-20.md)|jetstream, client, objectstore, spec|JetStream based Object Stores|
104+
|[ADR-52](adr/ADR-52.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
100105

101106
## Observability
102107

@@ -116,6 +121,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
116121
|Index|Tags|Description|
117122
|-----|----|-----------|
118123
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
124+
|[ADR-52](adr/ADR-52.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
119125

120126
## Security
121127

@@ -153,6 +159,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
153159
|[ADR-43](adr/ADR-43.md)|jetstream, client, server, 2.11|JetStream Per-Message TTL|
154160
|[ADR-44](adr/ADR-44.md)|jetstream, server, 2.11|Versioning for JetStream Assets|
155161
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
162+
|[ADR-52](adr/ADR-52.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|
156163

157164
## Spec
158165

adr/ADR-31.md

Lines changed: 19 additions & 12 deletions
@@ -7,10 +7,11 @@
77
| Status | Implemented |
88
| Tags | jetstream, client, server, 2.11 |
99

10-
| Revision | Date | Author | Info |
11-
|----------|------------|------------|------------------------------------------------|
12-
| 1 | 2022-08-08 | @tbeets | Initial design |
13-
| 2 | 2024-03-06 | @ripienaar | Adds Multi and Batch behaviors for Server 2.11 |
10+
| Revision | Date | Author | Info | Refinement | Server Requirement |
11+
|----------|------------|-----------------|------------------------------------------------|------------|--------------------|
12+
| 1 | 2022-08-08 | @tbeets | Initial design | | |
13+
| 2 | 2024-03-06 | @ripienaar | Adds Multi and Batch behaviors for Server 2.11 | | |
14+
| 3 | 2025-07-11 | @MauriceVanVeen | Update on Read-after-Write guarantee | ADR-52 | |
1415

1516
## Context and motivation
1617

@@ -41,14 +42,20 @@ clients. Also, read availability can be enhanced as mirrors may be available to
4142

4243
###### A note on read-after-write coherency
4344

44-
The existing Get API `$JS.API.STREAM.MSG.GET.<stream>` provides read-after-write coherency by routing requests to a
45-
stream's current peer leader (R>1) or single server (R=1). A client that publishes a message to stream (with ACK) is
46-
assured that a subsequent call to the Get API will return that message as the read will go a server that defines
47-
_most current_.
45+
The existing Get API `$JS.API.STREAM.MSG.GET.<stream>` as well as _Direct Get_ do NOT provide any read-after-write
46+
guarantees by default. The existing Get API only guarantees read-after-write if the underlying stream is not
47+
replicated (R=1).
4848

49-
In contrast, _Direct Get_ does not assure read-after-write coherency as responders may be non-leader stream servers
50-
(that may not have yet applied the latest consensus writes) or MIRROR downstream servers that have not yet _consumed_
51-
the latest consensus writes from upstream.
49+
_Direct Get_ does not assure read-after-write coherency as responders may be non-leader stream servers (that may not
50+
have yet applied the latest consensus writes) or MIRROR downstream servers that have not yet _consumed_ the latest
51+
consensus writes from upstream.
52+
53+
The Get API routes requests to a stream's current peer leader (R>1). A client that publishes multiple messages to a
54+
stream (with ACK) is assured that they will be properly ordered by sequence, regardless of which peer leader was active
55+
at that time. However, during and after leader elections, calls to the Get API could still be served by a server that
56+
still thinks it's the leader even if a new leader was elected in the meantime (which it doesn't know about yet).
57+
58+
Read-after-write guarantees can be opted into with [ADR-52](adr/ADR-52.md).
5259

5360
## Implementation
5461

@@ -60,7 +67,7 @@ the latest consensus writes from upstream.
6067
based on `max_msgs_per_subject`
6168

6269
> Allow Direct is set automatically based on the inferred use case of the stream. Maximum messages per subject is a
63-
tell-tale of a stream that is a KV bucket.
70+
> tell-tale of a stream that is a KV bucket.
6471
6572
### Direct Get API
6673

adr/ADR-52.md

Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
1+
# JetStream Read-after-Write
2+
3+
| Metadata | Value |
4+
|----------|--------------------------------------------------------------|
5+
| Date | 2025-07-11 |
6+
| Author | @MauriceVanVeen |
7+
| Status | Proposed |
8+
| Tags | jetstream, kv, objectstore, server, client, refinement, 2.12 |
9+
| Updates | ADR-8, ADR-17, ADR-20, ADR-31, ADR-37 |
10+
11+
| Revision | Date | Author | Info |
12+
|----------|------------|-----------------|----------------|
13+
| 1 | 2025-07-11 | @MauriceVanVeen | Initial design |
14+
15+
## Problem Statement
16+
17+
JetStream does NOT support read-after-write or monotonic reads. This can be especially problematic when
18+
using [ADR-8 JetStream based Key-Value Stores](ADR-8.md), primarily but not limited to the use of _Direct Get_.
19+
20+
Specifically, we have no way to guarantee a write like `kv.Put` can be observed by a subsequent `kv.Get` or `kv.Watch`,
21+
especially when the KV/stream is replicated or mirrored.
22+
23+
## Context
24+
25+
The topic of immediate consistency within NATS JetStream can be confusing. In our docs we claim that we
26+
maintain immediate consistency (as opposed to eventual consistency) even in the face of failures. That is true, but
27+
as with anything, it depends.
28+
29+
- **Monotonic writes**: all writes to a single stream (replicated or not) are monotonic. They are ordered by stream
30+
sequence, regardless of publisher.
31+
- **Monotonic reads**, if you're using consumers: all reads for a consumer (replicated or not) are monotonic. They are
32+
ordered by consumer delivery sequence. (Messages can be redelivered on failure, but this also depends on which
33+
settings are used.)
34+
35+
Those paths are immediately consistent, but they are not immediately consistent with respect to each other. This is no
36+
problem for publishers and consumers of a stream, because they observe all operations to be monotonic.
37+
But, if you use the KV abstraction for example, you're more often going to use single message gets through `kv.Get`.
38+
Since those rely on `DirectGet`, even followers can answer, which means we (by default) can't guarantee read-after-write
39+
or even monotonic reads. Such message GET requests get served randomly by all servers within the peer group (or even
40+
mirrors if enabled). Those obviously can't be made immediately consistent, since both replication and mirroring are
41+
async.
42+
43+
Also, when following up a `kv.Create` with `kv.Keys`, you might expect read-after-write such that the returned keys
44+
contain the key you've just written. This also requires read-after-write.
45+
46+
## Design
47+
48+
Before sharing the proposed design, let's look at an alternative. Read-after-write could be achieved by having reads (on
49+
an opt-in basis) go through Raft replication first. This has several disadvantages:
50+
51+
- Reads will become significantly slower, due to requiring replication first.
52+
- Reads require quorum, due to replication, disallowing any reads when there's downtime or temporarily no leader.
53+
- Only the stream leader can answer reads, as it is the first one to know that it can answer the request. (Followers
54+
replicate asynchronously, so letting them answer would make the response take even longer to return.)
55+
- Mirrors can still answer `DirectGet` requests. Because mirrors answer read requests transparently, they will violate any
56+
read-after-write guarantees (as the client will not know). This would mean mirrors must not be enabled if this
57+
guarantee should be kept.
58+
- Read-after-write guarantees could temporarily be violated when scaling streams up or down.
59+
- This is not a compatible approach for consumers, meaning they could not have these guarantees based on this approach.
60+
61+
Although having reads be served through Raft does (mostly) offer a strong guarantee of read-after-write and monotonic
62+
reads, the disadvantages outweigh the advantages. Ideally, the solution has the following properties:
63+
64+
- It's explicitly defined, either in configuration or in code.
65+
- Works for both replicated and non-replicated streams. (Scale up/down has no influence, and implementation is not
66+
replication-specific)
67+
- Incurs no slowdown, just as fast as reads that don't guarantee read-after-write (no prior replication required).
68+
- Let followers, and even mirrors, answer read requests as long as they can make the guarantee.
69+
- Let followers, and mirrors, inform the client when they can't make the guarantee. The guarantee is always kept, but
70+
an error is returned that can be retried (to get a successful read). This can be tuned by disabling reads on mirrors
71+
or followers.
72+
73+
Now, on to the proposed design which has the above advantages.
74+
75+
The write and read paths remain eventually consistent, as they are now. But one can opt in to immediate consistency to
76+
guarantee read-after-write and monotonic reads, for both direct/msg read requests as well as consumers.
77+
78+
- **Read-after-write** is achieved because all writes through `js.Publish`, `kv.Put`, etc. return the sequence
79+
(inherently last sequence) of the stream. In `DirectGet` requests those observed last sequences can be used for read
80+
requests.
81+
- **Monotonic reads** are achieved by collecting the highest sequence seen in read requests and using that sequence for
82+
subsequent read requests.
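
The bookkeeping described in the two bullets above can be sketched on the client side. This is a minimal illustration, not part of any NATS client: `SeqTracker`, `Observe`, and `MinLastSeq` are hypothetical names, and the sketch assumes the caller feeds it every sequence returned by a write ack or carried by a read response. It is not safe for concurrent use as written.

```go
package main

import "fmt"

// SeqTracker is a hypothetical client-side helper (not a NATS client type).
// It remembers the highest stream sequence the client has observed so far.
type SeqTracker struct {
	highest uint64
}

// Observe records a sequence returned by a write ack or seen in a read response.
// Lower sequences never lower the recorded value.
func (t *SeqTracker) Observe(seq uint64) {
	if seq > t.highest {
		t.highest = seq
	}
}

// MinLastSeq returns the floor to attach to the next read request.
func (t *SeqTracker) MinLastSeq() uint64 {
	return t.highest
}

func main() {
	var t SeqTracker
	t.Observe(10) // sequence returned by a write
	t.Observe(7)  // sequence seen in an earlier, older read
	t.Observe(12) // sequence seen in a later read
	fmt.Println(t.MinLastSeq()) // prints 12
}
```

Feeding write acks into the tracker gives read-after-write; feeding read responses into it gives monotonic reads.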
83+
84+
This can be implemented with an additional `MinLastSeq` field in `JSApiMsgGetRequest` and `ConsumerConfig`.
85+
86+
- This ensures the server only replies with data if it can actually 100% guarantee immediate consistency. This is done
87+
by confirming that the `LastSeq` it has for its local stream is at least the `MinLastSeq` specified.
88+
- Side-note: although `MsgGet` is only answered by the leader, technically an old leader could still respond and serve
89+
stale reads. Although this shouldn't happen often in practice, until now we couldn't guarantee it. The error can be
90+
detected on the old leader, and it can delay the error response, allowing for the real leader to send the actual
91+
answer.
92+
- Followers/mirrors reject the read request if they can't satisfy the `MinLastSeq`, but can serve reads and share the
93+
load otherwise.
94+
- Consumers don't start delivering messages until the `MinLastSeq` is reached. (To ensure `pending` counts are correct
95+
when following up `kv.Create` with `kv.Keys` for example)
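
The server-side check in the list above can be modeled in a few lines. This is a simplified sketch and not the server implementation: `Replica`, `Get`, and `ErrBehind` are hypothetical names, the ADR does not specify the actual error, and treating `minLastSeq == 0` as "not opted in" is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBehind is a hypothetical retryable error; the ADR does not name the real one.
var ErrBehind = errors.New("replica has not reached the requested MinLastSeq")

// Replica models one server's local copy of a stream (leader, follower, or mirror).
type Replica struct {
	LastSeq uint64            // highest sequence applied locally
	Msgs    map[uint64]string // sequence -> payload
}

// Get serves seq only if the local stream has caught up to minLastSeq.
// A minLastSeq of 0 always passes, which models a request that did not opt in.
func (r *Replica) Get(seq, minLastSeq uint64) (string, error) {
	if r.LastSeq < minLastSeq {
		return "", ErrBehind
	}
	msg, ok := r.Msgs[seq]
	if !ok {
		return "", errors.New("message not found")
	}
	return msg, nil
}

func main() {
	leader := &Replica{LastSeq: 5, Msgs: map[uint64]string{5: "v5"}}
	follower := &Replica{LastSeq: 4, Msgs: map[uint64]string{4: "v4"}}

	// The client wrote sequence 5 and asks for it with MinLastSeq = 5.
	_, err := follower.Get(5, 5)
	fmt.Println("follower:", err)

	msg, err := leader.Get(5, 5)
	fmt.Println("leader:", msg, err)
}
```

The lagging follower rejects the request instead of returning stale data, while any replica that has applied sequence 5 can answer.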
96+
97+
In terms of API, it can look like this:
98+
99+
```go
100+
// Write
101+
r, err := kv.Put(ctx, "key", []byte("value"))
102+
103+
// Read request
104+
kve, err := kv.Get(ctx, "key", jetstream.MinLastRevision(r))
105+
106+
// Watch/consumer
107+
kl, err := kv.ListKeys(ctx, jetstream.MinLastRevision(r))
108+
```
109+
110+
By specifying the `MinLastRevision` (or `MinLastSequence` when using a stream normally), you can be sure your read
111+
request will be rejected by a follower if it can't be satisfied, or the follower will wait to deliver you messages from
112+
the consumer until it's up-to-date.
113+
114+
This satisfies read-after-write and monotonic reads when combining the write and read paths.
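
On the client side, a rejected read can simply be retried. The following is a minimal sketch under stated assumptions: `getWithRetry`, `reader`, and `errBehind` are hypothetical names, the rejection is assumed to be distinguishable from other errors, and a real client would likely add a backoff between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errBehind is a hypothetical stand-in for the retryable rejection.
var errBehind = errors.New("replica behind MinLastSeq")

// reader stands in for one read attempt that some replica answers.
type reader func(minLastSeq uint64) (string, error)

// getWithRetry retries while the answering replica reports that it is behind.
// Any other error is returned immediately. At least one attempt is always made.
func getWithRetry(read reader, minLastSeq uint64, maxAttempts int) (string, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for i := 0; i < maxAttempts; i++ {
		var v string
		v, err = read(minLastSeq)
		if err == nil {
			return v, nil
		}
		if !errors.Is(err, errBehind) {
			return "", err
		}
	}
	return "", err
}

func main() {
	lastSeq := uint64(3) // the follower's local last sequence
	read := func(min uint64) (string, error) {
		if lastSeq < min {
			lastSeq++ // simulate replication catching up between attempts
			return "", errBehind
		}
		return "value", nil
	}
	v, err := getWithRetry(read, 5, 5)
	fmt.Println(v, err) // prints "value <nil>"
}
```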
115+
116+
## Decision
117+
118+
[Maybe this was just an architectural decision...]
119+
120+
## Consequences
121+
122+
Since this is an opt-in on a read request or consumer create basis, this is not a breaking change. Depending on client
123+
implementation, this could be harder to implement. But given it's just another field in the `JSApiMsgGetRequest` and
124+
`ConsumerConfig`, each client should have no trouble supporting it.

adr/ADR-8.md

Lines changed: 5 additions & 4 deletions
@@ -22,6 +22,7 @@
2222
| 7 | 2025-01-23 | Add Max Age limit Markers, remove non direct gets | ADR-48 | 2.11.0 |
2323
| 8 | 2025-02-17 | Add Metadata | | 2.10.0 |
2424
| 9 | 2025-04-09 | Document max_age and duplicate_window requirements | | |
25+
| 10 | 2025-07-11 | Update on Read-after-Write guarantee | ADR-52 | |
2526

2627
## Context
2728

@@ -291,12 +292,12 @@ The features to support KV is in NATS Server 2.6.0.
291292

292293
#### Consistency Guarantees
293294

294-
We do not provide read-after-write consistency. Reads are performed directly to any replica, including out
295-
of date ones. If those replicas do not catch up multiple reads of the same key can give different values between
296-
reads. If the cluster is healthy and performing well most reads would result in consistent values, but this should not
295+
We do not provide read-after-write consistency by default. Reads are performed directly to any replica, including
296+
out-of-date ones. If those replicas do not catch up, multiple reads of the same key can give different values between
297+
reads. If the cluster is healthy and performing well, most reads would result in consistent values, but this should not
297298
be relied on to be true.
298299

299-
Historically we had read-after-write consistency, this has been deprecated and retained here for historical record only.
300+
Read-after-write guarantees can be opted into with [ADR-52](adr/ADR-52.md).
300301

301302
#### Buckets
302303
