(2.12) Read-after-write & monotonic reads #6970
Conversation
Hard problem for sure. I like the opt-in, but why not just make it so that the leader is the only one listening on a non-DQ subscriber that direct gets, with the read-after-write opt-in, utilize? The redirect seems interesting, but I'm not sure it's practical. For one, we could just wait until we feel we are caught up from a replica standpoint (normally this will be faster than a redirect IMO).
Multiple reasons for this; one is that KV's design requires direct get to be used. Currently you can use a workaround by disabling this, but that is not intended (and doesn't guarantee fresh data either). @ripienaar might have more info about the design aspect as well. Just the leader answering is the MsgGet we already have (although it's a different format under the JS API). But then we haven't solved guaranteeing that an old leader doesn't send stale data, which can happen during network partitions, or even during a quick publish and message get while leaders are changing.
That may be true, but it's hard to know how long to wait. And if we wait at all, wouldn't redirecting to the leader be faster/simpler anyway?
Adding a new opt-in such that direct gets can go only to the leader, with a new API, feels correct. I understand what you are saying about a new leader who is behind, but I am not sure the redirect to the leader would help that either IMO. WDYT?
The idea is specifically to still have replicas be allowed to respond. That's (likely) why DirectGet is also a requirement for our KV, and disabling DirectGet is considered a workaround. Just to clarify: it is an old/outdated leader that can still respond, as well as the new leader. So even if we'd only allow "the leader" to respond, we still can't make any guarantees, because an old leader would still be "allowed" to respond and send stale data back to the client. And because we don't want reads to become heavy and go through Raft, this PR changes two read paths, namely the MsgGet and DirectGet paths.
For the user it would be simpler to only use this header and have consistent reads than to also need to set a stream setting that's not considered KV-compatible. (Although arguably, having the user flip a bool in a read request for it to go through Raft would be easiest for them, but I think that's a tradeoff we don't want to make?) Note that reads only get redirected to the leader if the follower or mirror knows it doesn't have the data. If it does have the data, it can simply respond, knowing its answer is correct. So DirectGet is still effective most of the time, and helps spread the load.
DirectGet has two parts: it does not encode the payload, and we allow DQ semantics. The DQ semantics are for horizontal scalability with lots of concurrent GETs.
Meaning the new leader has not caught up its state? I am not following; if a new leader is elected that does not have all the data that the previous one did, we will sync followers to our new state, no?
I like the concept for sure, but if I had a single app (needed in your solution), I would solve read-after-write in a much different way that did not depend on the external system (the NATS servers).
We could if we believe there is a good reason to do so.
My suggestion does not turn off direct get, it simply changes it to have only one subscriber, the leader. So 100% KV compatible, what you lose is the horizontal scalability of replicas being able to answer also.
Like I said, I like the idea of the header with the last sequence number. Would need to see it fleshed out in API form, of course. I do think in most misses we should wait for a short period of time and recheck last, and if we are still behind we can forward to the leader.
This is not possible (because it would mean data loss). For someone to become leader they need to have the most up-to-date Raft log. That does not mean it has applied all entries yet, though. During a leader change you'll see the old leader have an up-to-date log and have applied most of those entries. The new leader will also have an up-to-date log, but it will not be up to date on applies, so it first needs to wait before responding to reads/writes until it has quorum on ALL uncommitted entries in its log. That ensures the new leader is in a consistent state before allowing reads/writes.
The new leader always has sufficient data in its log, but it first needs to get quorum on those entries and apply them, because the previous leader had not yet informed it that those entries could be applied. One could now write new data to the new leader and get a PubAck, but the old leader can still respond! (At least until it's informed of the leader change, which can take longer depending on RTT and partitions.)
Above is also why just having "the leader" respond does not work: multiple servers can think they are leader. (Even if only for a short period, the property can be violated within that period.)
Not sure how you can achieve that without involving the servers that store the state?
Have implemented this in the Go client almost fully. Halfway through I realised: consumers also don't guarantee read-after-write! The PR is now updated accordingly.

In terms of API, it looks like this:

```go
// Write
r, err := kv.Put(ctx, "key", []byte("value"))

// Read request
kve, err := kv.Get(ctx, "key", jetstream.MinLastRevision(r))

// Watch/consumer
kl, err := kv.ListKeys(ctx, jetstream.MinLastRevision(r))
```

By specifying the `MinLastRevision` option, this satisfies read-after-write and monotonic reads when combining the write and read paths.
Here's the Go client implementation in a branch.
Overall LGTM but some questions.
LGTM!
Remove read-after-write/monotonic reads for 2.12.

> Although the design (read-after-write/monotonic reads/linearizable) gives a lot of control and flexibility, we'll need to decide whether this adds a bunch of complexity which would be (too) hard to understand by an end-user, in which case we might need to revisit this. #7146 (comment)

The current design is very flexible, but adds a ton of complexity that needs to be handled on a per-request basis. We've discussed whether this should instead be on a per-stream basis, where on a replicated stream/KV/ObjectStore the reads are serializable by default, but one can opt in to linearizable for that whole stream. That would mean client code doesn't need to change at all to get this guarantee; only the stream setting would need to be updated. We'll need to discuss this further, and likely introduce this early in the next 2.14 (skipping 2.13) release cycle.

Relates to #6557
Relates to PRs: #6970, #7146

Signed-off-by: Maurice van Veen <[email protected]>
See ADR: JetStream Read-after-Write for context, problem statement, and design.
Resolves #6557
Signed-off-by: Maurice van Veen <[email protected]>