|
| 1 | +# JetStream Read-after-Write |
| 2 | + |
| 3 | +| Metadata | Value | |
| 4 | +|----------|--------------------------------------------------------------| |
| 5 | +| Date | 2025-07-11 | |
| 6 | +| Author | @MauriceVanVeen | |
| 7 | +| Status | Proposed | |
| 8 | +| Tags | jetstream, kv, objectstore, server, client, refinement, 2.12 | |
| 9 | +| Updates | ADR-8, ADR-17, ADR-20, ADR-31, ADR-37 | |
| 10 | + |
| 11 | +| Revision | Date | Author | Info | |
| 12 | +|----------|------------|-----------------|----------------| |
| 13 | +| 1 | 2025-07-11 | @MauriceVanVeen | Initial design | |
| 14 | + |
| 15 | +## Problem Statement |
| 16 | + |
| 17 | +JetStream does NOT support read-after-write or monotonic reads. This can be especially problematic when |
| 18 | +using [ADR-8 JetStream based Key-Value Stores](ADR-8.md), primarily but not limited to the use of _Direct Get_. |
| 19 | + |
| 20 | +Specifically, we have no way to guarantee a write like `kv.Put` can be observed by a subsequent `kv.Get` or `kv.Watch`, |
| 21 | +especially when the KV/stream is replicated or mirrored. |
| 22 | + |
| 23 | +## Context |
| 24 | + |
| 25 | +The topic of immediate consistency within NATS JetStream can sometimes be a bit confusing. On our docs we claim we |
| 26 | +maintain immediate consistency (as opposed to eventual consistency) even in the face of failures. Which is true, but |
| 27 | +as with anything, it depends. |
| 28 | + |
| 29 | +- **Monotonic writes**, all writes to a single stream (replicated or not) are monotonic. It's ordered regardless of |
| 30 | + publisher by the stream sequence. |
| 31 | +- **Monotonic reads**, if you're using consumers. All reads for a consumer (replicated or not) are monotonic. It's |
| 32 | + ordered by consumer delivery sequence. (Messages can be redelivered on failure, but this also depends on which |
| 33 | + settings are used) |
| 34 | + |
| 35 | +Those paths are immediately consistent, but they are not immediately consistent with respect to each other. This is no |
| 36 | +problem for publishers and consumers of a stream, because they observe all operations to be monotonic. |
| 37 | +But, if you use the KV abstraction for example, you're more often going to use single message gets through `kv.Get`. |
| 38 | +Since those rely on `DirectGet`, even followers can answer, which means we (by default) can't guarantee read-after-write |
| 39 | +or even monotonic reads. Such message GET requests get served randomly by all servers within the peer group (or even |
| 40 | +mirrors if enabled). Those obviously can't be made immediately consistent, since both replication and mirroring are |
| 41 | +async. |
| 42 | + |
| 43 | +Also, when following up a `kv.Create` with `kv.Keys`, you might expect read-after-write such that the returned keys |
| 44 | +contains the key you've just written to. This also requires read-after-write. |
| 45 | + |
| 46 | +## Design |
| 47 | + |
| 48 | +Before sharing the proposed design, let's look at an alternative. Read-after-write could be achieved by having reads (on |
| 49 | +an opt-in basis) go through Raft replication first. This has several disadvantages: |
| 50 | + |
| 51 | +- Reads will become significantly slower, due to requiring replication first. |
| 52 | +- Reads require quorum, due to replication, disallowing any reads when there's downtime or temporarily no leader. |
| 53 | +- Only the stream leader can answer reads, as it is the first one to know that it can answer the request. (Followers |
| 54 | + replicate asynchronously, so letting them answer would make the response take even longer to return.) |
| 55 | +- Mirrors can still answer `DirectGet` requests, the transparency of mirrors answering read requests will violate any |
| 56 | + read-after-write guarantees (as the client will not know). This would mean mirrors must not be enabled if this |
| 57 | + guarantee should be kept. |
| 58 | +- Read-after-write guarantees could temporarily be violated when scaling streams up or down. |
| 59 | +- This is not a compatible approach for consumers, meaning they could not have these guarantees based on this approach. |
| 60 | + It would require limiting consumer creation to R1 on the stream leader, which is not possible since the assignment is |
| 61 | + done by the meta leader that has no knowledge about the stream leader. A replicated consumer could violate the |
| 62 | + requirement if the consumer leader changes to an outdated follower in between. |
| 63 | + |
| 64 | +Although having reads be served through Raft does (mostly) offer a strong guarantee of read-after-write and monotonic |
| 65 | +reads, the disadvantages outway the advantages. Ideally, the solution has the following advantages: |
| 66 | + |
| 67 | +- It's explicitly defined, either in configuration or in code. |
| 68 | +- Works for both replicated and non-replicated streams. (Scale up/down has no influence, and implementation is not |
| 69 | + replication-specific) |
| 70 | +- Incurs no slowdown, just as fast as reads that don't guarantee read-after-write (no prior replication required). |
| 71 | +- Let followers, and even mirrors, answer read requests as long as they can make the guarantee. |
| 72 | +- Let followers, and mirrors, inform the client when they can't make the guarantee. The guarantee is always kept, but |
| 73 | + an error is returned that can be retried (to get a successful read). This can be tuned by disabling reads on mirrors |
| 74 | + or followers. |
| 75 | + |
| 76 | +Now, on to the proposed design which has the above advantages. |
| 77 | + |
| 78 | +The write and read paths remain eventually consistent as it is now. But one can opt-in for immediate consistency to |
| 79 | +guarantee read-after-write and monotonic reads, for both direct/msg read requests as well as consumers. |
| 80 | + |
| 81 | +- **Read-after-write** is achieved because all writes through `js.Publish`, `kv.Put`, etc. return the sequence |
| 82 | + (inherently last sequence) of the stream. In `DirectGet` requests those observed last sequences can be used for read |
| 83 | + requests. |
| 84 | +- **Monotonic reads** is achieved by collecting the highest sequence seen in read requests and using that sequence for |
| 85 | + subsequent read requests. |
| 86 | + |
| 87 | +This can be implemented with an additional `MinLastSeq` field in `JSApiMsgGetRequest` and `ConsumerConfig`. |
| 88 | + |
| 89 | +- This ensures the server only replies with data if it can actually 100% guarantee immediate consistency. This is done |
| 90 | + by confirming the `LastSeq` it has for its local stream, is at least the `MinLastSeq` specified. |
| 91 | +- Side-note: although `MsgGet` is only answered by the leader, technically an old leader could still respond and serve |
| 92 | + stale reads. Although this shouldn't happen often in practice, until now we couldn't guarantee it. The error can be |
| 93 | + detected on the old leader, and it can delay the error response, allowing for the real leader to send the actual |
| 94 | + answer. |
| 95 | +- Followers that can't satisfy the `MinLastSeq` redirect the request to the leader for it to answer instead. This allows |
| 96 | + followers to still serve reads and share the load if they can, but if they can't, they defer to the leader to not |
| 97 | + require a client to retry on what would otherwise be an error. |
| 98 | +- Mirrors reject the read request if they can't satisfy the `MinLastSeq`. But can serve reads and share the load |
| 99 | + otherwise. Mirrors don't redirect requests to a leader, not even to the stream leader if the mirror is replicated. |
| 100 | +- Leaders/followers/mirrors don't reject a request immediately, but delay this error response to make sure clients don't |
| 101 | + spam these requests while allowing the underlying resources to try and become up-to-date enough in the meantime. |
| 102 | +- Rejected read requests have the error code returned as a header, e.g. `NATS/1.0 412 Min Last Sequence`. |
| 103 | +- Consumers don't start delivering messages until the `MinLastSeq` is reached, and don't reject the consumer creation. |
| 104 | + This allows consumers to be created successfully, even on outdated followers or mirrors, while waiting to ensure |
| 105 | + `pending` counts are correct when following up `kv.Create` with `kv.Keys` for example. |
| 106 | + |
| 107 | +In terms of API, it can look like this: |
| 108 | + |
| 109 | +```go |
| 110 | +// Write |
| 111 | +r, err := kv.Put(ctx, "key", []byte("value")) |
| 112 | + |
| 113 | +// Read request |
| 114 | +kve, err := kv.Get(ctx, "key", jetstream.MinLastRevision(r)) |
| 115 | + |
| 116 | +// Watch/consumer |
| 117 | +kl, err := kv.ListKeys(ctx, jetstream.MinLastRevision(r)) |
| 118 | +``` |
| 119 | + |
| 120 | +By specifying the `MinLastRevision` (or `MinLastSequence` when using a stream normally), you can be sure your read |
| 121 | +request will be rejected if it can't be satisfied, or the follower/mirror will wait to deliver you messages from |
| 122 | +the consumer until it's up-to-date. Followers redirect requests, that would otherwise error, to the leader to not |
| 123 | +require the client to retry in these cases. |
| 124 | + |
| 125 | +This satisfies read-after-write and monotonic reads when combining the write and read paths, as well as when only |
| 126 | +preforming reads. |
| 127 | + |
| 128 | +### A note about message deletion and purges |
| 129 | + |
| 130 | +JetStream allows in-place deletion of messages through a "message delete" or "purge" request. These don't write new |
| 131 | +messages, and thus don't increase the last sequence. This means there are no read-after-write or monotonic reads after a |
| 132 | +message is deleted or purged. For example, after deleting a message or purging the stream, multiple requests can flip |
| 133 | +between returning the original messages and returning them as deleted. |
| 134 | + |
| 135 | +Although a downside of this approach, it can only be supported when using a replicated stream that's not mirrored, which |
| 136 | +would be too restrictive. Whereas with the proposed approach, all followers and mirrors can contribute to providing the |
| 137 | +guarantee, regardless of replication or topology (which is valued more highly). |
| 138 | + |
| 139 | +When deleting or purging messages is still desired AND you want to rely on read-after-write or monotonic reads, rollups |
| 140 | +can be used instead. The `Nats-Rollup` header can be used to purge messages where the subject equals, or purge the whole |
| 141 | +stream. Because a rollup message increases the last sequence, these guarantees can be relied upon again. However, the |
| 142 | +client application will need to interpret this rollup message as a "delete/purge" similar to how KV uses delete and |
| 143 | +purge markers. Therefore, the KV abstraction still has these guarantees since it places a new message for its |
| 144 | +`kv.Delete` and uses a rollup message for its `kv.Purge`. |
| 145 | + |
| 146 | +## Consequences |
| 147 | + |
| 148 | +Since this is an opt-in on a read request or consumer create basis, this is not a breaking change. Depending on client |
| 149 | +implementation, this could be harder to implement. But given it's just another field in the `JSApiMsgGetRequest` and |
| 150 | +`ConsumerConfig`, each client should have no trouble supporting it. |
0 commit comments