7 changes: 7 additions & 0 deletions README.md
@@ -21,6 +21,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
|[ADR-49](adr/ADR-49.md)|jetstream, spec, 2.12|JetStream Distributed Counter CRDT|
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|

## Client

@@ -56,6 +57,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|

## Jetstream

@@ -87,6 +89,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
|[ADR-49](adr/ADR-49.md)|jetstream, spec, 2.12|JetStream Distributed Counter CRDT|
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|

## Kv

@@ -95,13 +98,15 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
|[ADR-8](adr/ADR-8.md)|jetstream, client, kv, spec|JetStream based Key-Value Stores|
|[ADR-19](adr/ADR-19.md)|jetstream, client, kv, objectstore|API prefixes for materialized JetStream views|
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|

## Objectstore

|Index|Tags|Description|
|-----|----|-----------|
|[ADR-19](adr/ADR-19.md)|jetstream, client, kv, objectstore|API prefixes for materialized JetStream views|
|[ADR-20](adr/ADR-20.md)|jetstream, client, objectstore, spec|JetStream based Object Stores|
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|

## Observability

@@ -122,6 +127,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
|-----|----|-----------|
|[ADR-48](adr/ADR-48.md)|jetstream, client, kv, refinement, 2.11|TTL Support for Key-Value Buckets (updating [ADR-8](adr/ADR-8.md))|
|[ADR-52](adr/ADR-52.md)|jetstream, client, refinement, 2.12|No Headers support for Direct Get (updating [ADR-31](adr/ADR-31.md))|
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|

## Security

@@ -160,6 +166,7 @@ This repository captures Architecture, Design Specifications and Feature Guidanc
|[ADR-43](adr/ADR-43.md)|jetstream, client, server, 2.11|JetStream Per-Message TTL|
|[ADR-44](adr/ADR-44.md)|jetstream, server, 2.11|Versioning for JetStream Assets|
|[ADR-50](adr/ADR-50.md)|jetstream, server, client, 2.12|JetStream Batch Publishing|
|[ADR-53](adr/ADR-53.md)|jetstream, kv, objectstore, server, client, refinement, 2.12|JetStream Read-after-Write (updating [ADR-8](adr/ADR-8.md), [ADR-17](adr/ADR-17.md), [ADR-20](adr/ADR-20.md), [ADR-31](adr/ADR-31.md), [ADR-37](adr/ADR-37.md))|

## Spec

4 changes: 0 additions & 4 deletions adr-template.md
@@ -24,10 +24,6 @@

[If this is a specification or actual design, write something here.]

## Decision

[Maybe this was just an architectural decision...]

## Consequences

[Any consequences of this design, such as breaking change or Vorpal Bunnies]
33 changes: 20 additions & 13 deletions adr/ADR-31.md
@@ -7,11 +7,12 @@
| Status | Implemented |
| Tags | jetstream, client, server, 2.11 |

| Revision | Date | Author | Info |
|----------|------------|------------|----------------------------------------------------------|
| 1 | 2022-08-08 | @tbeets | Initial design |
| 2 | 2024-03-06 | @ripienaar | Adds Multi and Batch behaviors for Server 2.11 |
| 3 | 2025-06-19 | @ripienaar | Support surpressing headers in replies using `NoHeaders` |
| Revision | Date | Author | Info | Refinement | Server Requirement |
|----------|------------|-----------------|----------------------------------------------------------|------------|--------------------|
| 1 | 2022-08-08 | @tbeets | Initial design | | |
| 2 | 2024-03-06 | @ripienaar | Adds Multi and Batch behaviors for Server 2.11 | | |
| 3 | 2025-06-19 | @ripienaar | Support suppressing headers in replies using `NoHeaders` | | |
| 4 | 2025-07-11 | @MauriceVanVeen | Update on Read-after-Write guarantee | ADR-53 | |

## Context and motivation

@@ -42,14 +43,20 @@ clients. Also, read availability can be enhanced as mirrors may be available to

###### A note on read-after-write coherency

The existing Get API `$JS.API.STREAM.MSG.GET.<stream>` provides read-after-write coherency by routing requests to a
stream's current peer leader (R>1) or single server (R=1). A client that publishes a message to stream (with ACK) is
assured that a subsequent call to the Get API will return that message as the read will go a server that defines
_most current_.
The existing Get API `$JS.API.STREAM.MSG.GET.<stream>` as well as _Direct Get_ do NOT provide any read-after-write
guarantees by default. The existing Get API only guarantees read-after-write if the underlying stream is not
replicated (R=1).

In contrast, _Direct Get_ does not assure read-after-write coherency as responders may be non-leader stream servers
(that may not have yet applied the latest consensus writes) or MIRROR downstream servers that have not yet _consumed_
the latest consensus writes from upstream.
_Direct Get_ does not assure read-after-write coherency as responders may be non-leader stream servers (that may not
have yet applied the latest consensus writes) or MIRROR downstream servers that have not yet _consumed_ the latest
consensus writes from upstream.

The Get API routes requests to a stream's current peer leader (R>1). A client that publishes multiple messages to a
stream (with ACK) is assured that they will be properly ordered by sequence, regardless of which peer was leader at
the time. However, during and after leader elections, calls to the Get API could still be served by a server that
believes it is still the leader, because it has not yet learned that a new leader was elected.

Read-after-write guarantees can be opted into with [ADR-53](ADR-53.md).

## Implementation

@@ -61,7 +68,7 @@ the latest consensus writes from upstream.
based on `max_msgs_per_subject`

> Allow Direct is set automatically based on the inferred use case of the stream. Maximum messages per subject is a
tell-tale of a stream that is a KV bucket.
> tell-tale of a stream that is a KV bucket.

### Direct Get API

151 changes: 151 additions & 0 deletions adr/ADR-53.md
@@ -0,0 +1,151 @@
# JetStream Read-after-Write

| Metadata | Value |
|----------|--------------------------------------------------------------|
| Date | 2025-07-11 |
| Author | @MauriceVanVeen |
| Status | Proposed |
| Tags | jetstream, kv, objectstore, server, client, refinement, 2.12 |
| Updates | ADR-8, ADR-17, ADR-20, ADR-31, ADR-37 |

| Revision | Date | Author | Info |
|----------|------------|-----------------|----------------|
| 1 | 2025-07-11 | @MauriceVanVeen | Initial design |

## Problem Statement

JetStream does NOT support read-after-write or monotonic reads. This can be especially problematic when
using [ADR-8 JetStream based Key-Value Stores](ADR-8.md), primarily but not limited to the use of _Direct Get_.

> **Review comment (Collaborator):** Maybe say before 2.12?
>
> **Reply (Member Author):** Done, dcb9074: "JetStream does NOT support read-after-write or monotonic reads (prior to
> server version 2.12)."

Specifically, we have no way to guarantee a write like `kv.Put` can be observed by a subsequent `kv.Get` or `kv.Watch`,
especially when the KV/stream is replicated or mirrored.

## Context

The topic of immediate consistency within NATS JetStream can be confusing. In our docs we claim to maintain immediate
consistency (as opposed to eventual consistency) even in the face of failures. This is true, but it depends on which
path is used:

- **Monotonic writes**: all writes to a single stream (replicated or not) are monotonic. They are ordered by the
stream sequence, regardless of publisher.
- **Monotonic reads**, if you're using consumers: all reads for a consumer (replicated or not) are monotonic. They are
ordered by consumer delivery sequence. (Messages can be redelivered on failure, but this also depends on which
settings are used.)

Those paths are immediately consistent, but they are not immediately consistent with respect to each other. This is no
problem for publishers and consumers of a stream, because they observe all operations to be monotonic.
But, if you use the KV abstraction for example, you're more often going to use single message gets through `kv.Get`.
Since those rely on `DirectGet`, even followers can answer, which means we (by default) can't guarantee read-after-write
or even monotonic reads. Such message GET requests get served randomly by all servers within the peer group (or even
mirrors if enabled). Those obviously can't be made immediately consistent, since both replication and mirroring are
async.
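
The effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `replica` type and the helper functions are hypothetical stand-ins for servers in a peer group, and no NATS client or server code is involved.

```go
package main

import "fmt"

// replica is a hypothetical stand-in for one server in a stream's peer group.
// It holds the latest revision of a single key that it has applied so far.
type replica struct {
	name     string
	revision uint64
}

// readAll issues one read per replica, in order, and returns the revisions
// seen. This mimics consecutive gets that happen to land on different servers.
func readAll(replicas []replica) []uint64 {
	seen := make([]uint64, 0, len(replicas))
	for _, r := range replicas {
		seen = append(seen, r.revision)
	}
	return seen
}

// isMonotonic reports whether the observed revisions never go backwards.
func isMonotonic(revs []uint64) bool {
	for i := 1; i < len(revs); i++ {
		if revs[i] < revs[i-1] {
			return false
		}
	}
	return true
}

func main() {
	// The leader has applied revision 5, while one follower still lags at 4.
	replicas := []replica{
		{name: "leader", revision: 5},
		{name: "follower-1", revision: 4},
		{name: "follower-2", revision: 5},
	}
	revs := readAll(replicas)
	fmt.Println("revisions seen:", revs, "monotonic:", isMonotonic(revs))
}
```

A client that reads revision 5 and then revision 4 has observed the key going back in time, which is exactly what monotonic reads rule out.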

Also, when following up a `kv.Create` with `kv.Keys`, you might expect the returned keys to contain the key you've
just written. This also requires read-after-write.

## Design

Before sharing the proposed design, let's look at an alternative. Read-after-write could be achieved by having reads (on
an opt-in basis) go through Raft replication first. This has several disadvantages:

- Reads will become significantly slower, due to requiring replication first.
- Reads require quorum, due to replication, disallowing any reads when there's downtime or temporarily no leader.
- Only the stream leader can answer reads, as it is the first one to know that it can answer the request. (Followers
replicate asynchronously, so letting them answer would make the response take even longer to return.)
- Mirrors can still answer `DirectGet` requests. Because mirrors answer read requests transparently, the client cannot
know that a mirror responded, which violates any read-after-write guarantee. Mirrors must therefore not be enabled if
this guarantee should be kept.
- Read-after-write guarantees could temporarily be violated when scaling streams up or down.
- This is not a compatible approach for consumers, meaning they could not have these guarantees based on this approach.
It would require limiting consumer creation to R1 on the stream leader, which is not possible since the assignment is
done by the meta leader that has no knowledge about the stream leader. A replicated consumer could violate the
requirement if the consumer leader changes to an outdated follower in between. It would also not work at all when
creating a consumer on a mirrored stream.

Although having reads be served through Raft does (mostly) offer a strong guarantee of read-after-write and monotonic
reads, the disadvantages outweigh the advantages. Ideally, the solution has the following advantages:

- It's explicitly defined, either in configuration or in code.
- Works for both replicated and non-replicated streams. (Scale up/down has no influence, and implementation is not
replication-specific)
- Incurs no slowdown, just as fast as reads that don't guarantee read-after-write (no prior replication required).
- Let followers, and even mirrors, answer read requests as long as they can make the guarantee.
- Let followers, and mirrors, inform the client when they can't make the guarantee. The guarantee is always kept, but
an error is returned that can be retried (to get a successful read). This can be tuned by disabling reads on mirrors
or followers.

Now, on to the proposed design which has the above advantages.

The write and read paths remain eventually consistent, as they are now. But one can opt in to immediate consistency to
guarantee read-after-write and monotonic reads, for both direct/msg read requests and consumers.

- **Read-after-write** is achieved because all writes through `js.Publish`, `kv.Put`, etc. return the sequence
(inherently last sequence) of the stream. Those observed last sequences can then be passed along with `DirectGet`
read requests.
- **Monotonic reads** are achieved by collecting the highest sequence seen in read requests and using that sequence for
subsequent read requests.
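
The client-side bookkeeping behind both bullets can be sketched as a single "highest sequence seen" value. The `seqTracker` type below is an illustrative assumption, not part of any NATS client library; it only shows that write acks and read responses feed the same floor.

```go
package main

import (
	"fmt"
	"sync"
)

// seqTracker remembers the highest stream sequence a client has observed,
// whether from a publish ack or from a read response. The name and shape
// are hypothetical and exist only for this sketch.
type seqTracker struct {
	mu      sync.Mutex
	highest uint64
}

// Observe records a sequence returned by a write or seen in a read.
// Lower values are ignored so the floor never moves backwards.
func (s *seqTracker) Observe(seq uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if seq > s.highest {
		s.highest = seq
	}
}

// MinLastSeq returns the value to attach to the next read request.
func (s *seqTracker) MinLastSeq() uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.highest
}

func main() {
	var tracker seqTracker
	tracker.Observe(10) // a publish ack reported sequence 10
	tracker.Observe(7)  // a read returned an older message: floor stays at 10
	tracker.Observe(12) // a later read observed sequence 12
	fmt.Println("next read uses MinLastSeq =", tracker.MinLastSeq())
}
```

Keeping one tracker per stream (or per KV bucket) is enough, since sequences are only comparable within a single stream.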

This can be implemented with an additional `MinLastSeq` field in `JSApiMsgGetRequest` and `ConsumerConfig`.

- This ensures the server only replies with data if it can guarantee immediate consistency. This is done by confirming
that the `LastSeq` of its local stream is at least the `MinLastSeq` specified.
- Side-note: although `MsgGet` is only answered by the leader, technically an old leader could still respond and serve
stale reads. Although this shouldn't happen often in practice, until now we couldn't guarantee it. The error can be
detected on the old leader, and it can delay the error response, allowing for the real leader to send the actual
answer.
- Followers that can't satisfy the `MinLastSeq` redirect the request to the leader for it to answer instead. This allows
followers to still serve reads and share the load if they can, but if they can't, they defer to the leader to not
require a client to retry on what would otherwise be an error.
- Mirrors reject the read request if they can't satisfy the `MinLastSeq`, but can serve reads and share the load
otherwise. Mirrors don't redirect requests to a leader, not even to the stream leader if the mirror is replicated.
- Leaders/followers/mirrors don't reject a request immediately, but delay this error response to make sure clients don't
spam these requests while allowing the underlying resources to try and become up-to-date enough in the meantime.
- Rejected read requests have the error code returned as a header, e.g. `NATS/1.0 412 Min Last Sequence`.
- Consumers don't start delivering messages until the `MinLastSeq` is reached, and don't reject the consumer creation.
This allows consumers to be created successfully, even on outdated followers or mirrors, while waiting to ensure
`pending` counts are correct when following up `kv.Create` with `kv.Keys` for example.
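
The responder rules for read requests listed above can be condensed into one decision function. This is a minimal sketch, assuming hypothetical `role` and `action` types; it does not reflect the actual server implementation and leaves out the delay applied before an error is sent.

```go
package main

import "fmt"

// role identifies which kind of server received the read request.
type role int

const (
	leader role = iota
	follower
	mirror
)

// action is the outcome chosen for a read request carrying MinLastSeq.
type action string

const (
	serve    action = "serve"
	redirect action = "redirect to leader"
	reject   action = "412 Min Last Sequence (after a delay)"
)

// decide sketches how a responder could handle a read with MinLastSeq:
// anyone who is up to date serves, a lagging follower defers to the leader,
// and a lagging mirror or (stale) leader answers with a delayed error.
func decide(r role, localLastSeq, minLastSeq uint64) action {
	if localLastSeq >= minLastSeq {
		return serve
	}
	if r == follower {
		return redirect
	}
	return reject
}

func main() {
	fmt.Println("up-to-date follower:", decide(follower, 5, 5))
	fmt.Println("lagging follower:   ", decide(follower, 4, 5))
	fmt.Println("lagging mirror:     ", decide(mirror, 4, 5))
	fmt.Println("stale old leader:   ", decide(leader, 4, 5))
}
```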

In terms of API, it can look like this:

```go
// Write
r, err := kv.Put(ctx, "key", []byte("value"))

// Read request
kve, err := kv.Get(ctx, "key", jetstream.MinLastRevision(r))

// Watch/consumer
kl, err := kv.ListKeys(ctx, jetstream.MinLastRevision(r))
```

By specifying the `MinLastRevision` (or `MinLastSequence` when using a stream normally), you can be sure your read
request will be rejected if it can't be satisfied, or the follower/mirror will wait to deliver you messages from
the consumer until it's up-to-date. Followers redirect requests that would otherwise error to the leader, so the
client does not need to retry in these cases.
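
For the case where a read is rejected (for example by a mirror that is not yet up to date), a client could retry until the guarantee can be met. The following is a sketch only: `errMinLastSeq`, `getFunc` and `getWithRetry` are hypothetical names, and the fake `get` function stands in for a real read request.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errMinLastSeq is a hypothetical client-side error representing a
// "412 Min Last Sequence" reply.
var errMinLastSeq = errors.New("min last sequence not reached")

// getFunc is any read that accepts a minimum last sequence.
type getFunc func(minLastSeq uint64) (string, error)

// getWithRetry repeats the read while it is rejected with errMinLastSeq,
// waiting backoff between attempts. Other errors are returned immediately.
func getWithRetry(get getFunc, minLastSeq uint64, attempts int, backoff time.Duration) (string, error) {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		var value string
		value, err = get(minLastSeq)
		if err == nil {
			return value, nil
		}
		if !errors.Is(err, errMinLastSeq) {
			return "", err
		}
		if i < attempts-1 {
			time.Sleep(backoff)
		}
	}
	return "", err
}

func main() {
	// Fake responder that only catches up on the third request.
	calls := 0
	get := func(minLastSeq uint64) (string, error) {
		calls++
		if calls < 3 {
			return "", errMinLastSeq
		}
		return "value", nil
	}
	value, err := getWithRetry(get, 42, 5, 10*time.Millisecond)
	fmt.Println(value, err, "after", calls, "calls")
}
```

Because the server already delays its error response, a short client-side backoff (or none at all) may be sufficient in practice.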

This satisfies read-after-write and monotonic reads when combining the write and read paths, as well as when only
performing reads.

### A note about message deletion and purges

JetStream allows in-place deletion of messages through a "message delete" or "purge" request. These don't write new
messages, and thus don't increase the last sequence. This means there are no read-after-write or monotonic read
guarantees after a message is deleted or purged. For example, after deleting a message or purging the stream, multiple
requests can flip between returning the original messages and returning them as deleted.

Although this is a downside of the proposed approach, guarantees for deletes and purges could only be supported when
using a replicated stream that's not mirrored, which would be too restrictive. Whereas with the proposed approach, all
followers and mirrors can contribute to providing the guarantee, regardless of replication or topology (which is valued
more highly).

When deleting or purging messages is still desired AND you want to rely on read-after-write or monotonic reads, rollups
can be used instead. The `Nats-Rollup` header can be used to purge earlier messages on the same subject, or to purge
the whole stream. Because a rollup message increases the last sequence, these guarantees can be relied upon again.
However, the client application will need to interpret this rollup message as a "delete/purge" similar to how KV uses
delete and purge markers. Therefore, the KV abstraction still has these guarantees since it places a new message for
its `kv.Delete` and uses a rollup message for its `kv.Purge`.

## Consequences

Since this is an opt-in on a read request or consumer create basis, this is not a breaking change. Depending on client
implementation, this could be harder to implement. But given it's just another field in the `JSApiMsgGetRequest` and
`ConsumerConfig`, each client should have no trouble supporting it.
9 changes: 5 additions & 4 deletions adr/ADR-8.md
@@ -22,6 +22,7 @@
| 7 | 2025-01-23 | Add Max Age limit Markers, remove non direct gets | ADR-48 | 2.11.0 |
| 8 | 2025-02-17 | Add Metadata | | 2.10.0 |
| 9 | 2025-04-09 | Document max_age and duplicate_window requirements | | |
| 10 | 2025-07-11 | Update on Read-after-Write guarantee | ADR-53 | |

## Context

@@ -291,12 +292,12 @@ The features to support KV is in NATS Server 2.6.0.

#### Consistency Guarantees

We do not provide read-after-write consistency. Reads are performed directly to any replica, including out
of date ones. If those replicas do not catch up multiple reads of the same key can give different values between
reads. If the cluster is healthy and performing well most reads would result in consistent values, but this should not
We do not provide read-after-write consistency by default. Reads are performed directly to any replica, including
out-of-date ones. If those replicas do not catch up, multiple reads of the same key can give different values between
reads. If the cluster is healthy and performing well, most reads would result in consistent values, but this should not
be relied on to be true.

Historically we had read-after-write consistency, this has been deprecated and retained here for historical record only.
Read-after-write guarantees can be opted into with [ADR-53](ADR-53.md).

#### Buckets
