(2.12) Atomic batch: support large batches #7067


Open
MauriceVanVeen wants to merge 5 commits into main from maurice/batch-large

Conversation

MauriceVanVeen
Member

This PR adds support for batches that are larger than a single append entry. Batches are first accumulated before replicating; once all expected checks pass, the batch is proposed (this was already the case). New in this PR: when applying the batch, the entries are staged in memory until we see the commit message. If the commit message is not seen, or gaps are detected, the batch is rejected to prevent partially applying it.

New batchMsgOp and batchCommitMsgOp ops were introduced to "wrap" streamMsgOp and compressedStreamMsgOp so they also carry the batchId and batchSeq, without needing to decompress/decode the message to get at the raw headers when doing consistency checks prior to commit.

Applying a streamMsgOp/compressedStreamMsgOp is extracted into a new applyStreamMsgOp function. This supports doing consistency checks prior to commit when batching, and then applying the whole batch in one go.
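As an illustration of the wrapping idea, here is a minimal, self-contained sketch of how such a wrapper op could prepend the batch metadata to an opaque (possibly compressed) inner message. The byte layout and names below are assumptions for illustration only, not the server's actual encoding.

```go
// Illustrative only: one possible layout for a wrapper op that carries the
// batch id and sequence in front of an opaque (possibly compressed) inner
// message, so consistency checks can read the batch metadata without
// decoding the message itself. Not the server's actual wire format.
package opsketch

import (
	"bytes"
	"encoding/binary"
	"errors"
)

// encodeBatchMsgOp writes: uint16 id length, id bytes, uint64 batch sequence,
// then the untouched inner op (e.g. a compressed stream message op).
func encodeBatchMsgOp(batchID string, batchSeq uint64, inner []byte) []byte {
	var buf bytes.Buffer
	var scratch [8]byte
	binary.LittleEndian.PutUint16(scratch[:2], uint16(len(batchID)))
	buf.Write(scratch[:2])
	buf.WriteString(batchID)
	binary.LittleEndian.PutUint64(scratch[:8], batchSeq)
	buf.Write(scratch[:8])
	buf.Write(inner)
	return buf.Bytes()
}

// decodeBatchMsgOp reads the metadata back and returns the inner op as-is.
func decodeBatchMsgOp(b []byte) (batchID string, batchSeq uint64, inner []byte, err error) {
	if len(b) < 2 {
		return "", 0, nil, errors.New("short buffer")
	}
	idLen := int(binary.LittleEndian.Uint16(b[:2]))
	b = b[2:]
	if len(b) < idLen+8 {
		return "", 0, nil, errors.New("short buffer")
	}
	batchID, b = string(b[:idLen]), b[idLen:]
	batchSeq = binary.LittleEndian.Uint64(b[:8])
	return batchID, batchSeq, b[8:], nil
}
```

The point is only that the batch metadata sits in front of the wrapped op, so batchId/batchSeq can be compared cheaply across entries before anything is decompressed.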

Resolves #6978

Signed-off-by: Maurice van Veen [email protected]

@MauriceVanVeen
Member Author

The PR is ready for review, but I've put it in draft since I need to fix one bug: if you constantly use batching and the batches are spread over multiple append entries, the applied index would not move up, which would prevent making snapshots.

Instead of fully blocking n.Applied while there's an active batch across append entries, we need to track the ce.Index prior to the batch starting so we can gradually move n.Applied up.
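A rough sketch of what that could look like (the applyState type and its fields are invented here for illustration; n.Applied and ce.Index are the server fields referenced above):

```go
// Illustrative only: the applied index can still advance up to just before an
// in-flight batch instead of being blocked entirely while the batch spans
// multiple append entries. Field names are assumptions, not the actual raft
// state in the server.
package appliedsketch

type applyState struct {
	applied    uint64 // last index safely reported as applied
	batchStart uint64 // index of the first entry of the active batch
	inBatch    bool
}

// advance is called for each committed entry at index idx.
func (s *applyState) advance(idx uint64, batchActive bool) uint64 {
	if !batchActive {
		// No batch in flight (or it just committed): everything is applied.
		s.inBatch = false
		s.applied = idx
		return s.applied
	}
	if !s.inBatch {
		s.inBatch, s.batchStart = true, idx
	}
	// Only report progress up to just before the batch started; the batch's
	// own entries are reported once its commit entry has been applied.
	if s.batchStart > 0 && s.batchStart-1 > s.applied {
		s.applied = s.batchStart - 1
	}
	return s.applied
}
```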

@derekcollison
Member

Have not looked at this specific PR, but if we stage and wait to send the batch to replicas this will be a large seesaw in throughput as the app waits a longer period of time for the ack of the commit since we have to replicate all messages after the commit. WDYT?

@MauriceVanVeen
Member Author

> Have not looked at this specific PR, but if we stage and wait to send the batch to replicas this will be a large seesaw in throughput as the app waits a longer period of time for the ack of the commit since we have to replicate all messages after the commit. WDYT?

I'm thinking there's a slight misunderstanding here around what happens where. But I could also be misinterpreting what you're asking. Let me explain the current flow.

This is how it works already on main:

  • Many publishers can publish batches to the stream leader.
  • Once a publisher says "commit", the stream leader does all consistency checks and only afterward proposes to all followers.

Below is added in this PR, and happens after the above:

  • On the follower(/leader) apply path, we do a consistency check to confirm the leader didn't change halfway when proposing the batch.
  • All followers do this complete batch/consistency check, and then apply as normal. This can happen safely without needing to consult the leader.

So, we need to do consistency checks prior to replicating, AND the follower needs to do a consistency check that the batch wasn't abandoned mid-proposal.
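To make the follower-side check concrete, here is a rough sketch under assumed names (batchEntry, applier, and so on are not the PR's types):

```go
// Illustrative only: the follower-side staging described above. Entries of a
// batch are buffered until the commit entry is seen, then applied in one go;
// a gap or an abandoned batch causes the staged entries to be rejected.
package applysketch

import "errors"

type batchEntry struct {
	id     string
	seq    uint64
	commit bool // true on the last entry of the batch
}

var errRejected = errors.New("batch abandoned or has gaps, rejecting staged entries")

type applier struct {
	staged []batchEntry
}

// onEntry is called for each applied raft entry that belongs to a batch.
// (What happens to the new entry after a rejection is omitted here.)
func (a *applier) onEntry(e batchEntry, apply func(batchEntry) error) error {
	// A different batch id or an out-of-order sequence means the previous
	// batch was abandoned mid-proposal (e.g. the leader changed): reject it.
	if n := len(a.staged); n > 0 && (a.staged[0].id != e.id || a.staged[n-1].seq+1 != e.seq) {
		a.staged = nil
		return errRejected
	}
	a.staged = append(a.staged, e)
	if !e.commit {
		return nil // keep staging until the commit entry shows up
	}
	// Commit seen with no gaps: apply the whole batch, no leader round-trip.
	for _, s := range a.staged {
		if err := apply(s); err != nil {
			return err
		}
	}
	a.staged = nil
	return nil
}
```

The key property is that nothing from the batch is applied until the commit entry has been seen locally, so a partially proposed batch never becomes visible.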

@MauriceVanVeen
Member Author

What I think you meant is that instead of replicating after the commit as seen on the leader, we replicate prior to seeing the commit, and then we either invalidate the batch if it turns out to be invalid or we commit it. That should be somewhat faster since we don't need to first replicate and then apply.

I do want to consider and try out that approach, but I'd first want to walk down this implementation path. This approach is going to be fully safe and give all the guarantees the batch needs.
The other approach might be faster, but will be less safe without additional fixes.

What I'd prefer we do:

  • Continue the current implementation path that will have the consistency and safety required.
  • Do benchmarking on this approach. (I already know it is faster than PublishAsync but it can be done more thoroughly)
  • Assert this approach really is 100% safe, not only with unit tests, but also when run under Antithesis.
  • Then look into implementing the alternative approach mentioned above, and do the same benchmarks and testing. If it's faster and just as safe, I'd be happy to use it. (Likely the implementation will not differ too much, so we can start here and extend to it later.)

@derekcollison
Member

Yes, so this means a large latency spike when an app sends COMMIT and waits on that ack, since we have to replicate all of the batch only on COMMIT.

We could replicate as we go, and then commit would check everything (if deterministic) and each message would already (essentially) be ready on a commit.


@MauriceVanVeen
Member Author

There are some additional issues I can foresee when doing that; for example, counters can't have this optimistic replication because they change the headers and message body on commit, so those can't be replicated first.
I think those issues are solvable, and primarily batched counter updates would need to opt out.

I do want to note, though, that there's no large latency spike on commit! If the batch fits in a single append entry, the latency will be exactly the same as with PublishAsync. Point taken, though, on potentially having better latencies with optimistic replication; we'll need to measure whether that's indeed the case, and by how much.

@derekcollison
Member

Agreed, if they fit into one AE we should not see any effect. And under $SYS from a server that message can be any size (though we prefer not to go over 8M). But once hoisted into a user account it could be subject to max payload restrictions.

@MauriceVanVeen force-pushed the maurice/batch-large branch 2 times, most recently from 6531e0f to d8e92fd on July 16, 2025 10:33
@MauriceVanVeen marked this pull request as ready for review July 16, 2025 11:34
@MauriceVanVeen requested a review from a team as a code owner July 16, 2025 11:34
"fmt"
"strconv"
"testing"
"time"

"github.com/klauspost/compress/s2"

Member

remove new line here.

Member Author

Done

continue
} else if batchActiveId != _EMPTY_ {
// If a batch is abandoned without a commit, reject it.
mset.batchMu.Lock()
Member

Keep repeating ourselves here, maybe these are functions?

@MauriceVanVeen
Member Author
Jul 21, 2025

With the move into a separate batchApply struct (d85ff64), the batchActiveId variable that needed to be kept up-to-date could be removed.

As mentioned in #7067 (comment), I could only introduce a rejectBatchState() that's used in two places; in all other cases the lock needs to be held for longer.


// clearBatchStateLocked clears in-memory apply-batch-related state.
// mset.batchMu lock should be held.
func (mset *stream) clearBatchStateLocked() {
Member

Maybe introduce a non-Locked version that acquires the proper lock, calls these functions, and releases it.

Member Author

I've introduced a rejectBatchState(), which is now used in the two places where that's the only thing being done.
The others all need the lock to be held for a longer period.
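For context on the Locked/unlocked split being discussed, a minimal sketch of the shape such helpers typically take (the fields are placeholders, not the actual batch state):

```go
// Illustrative only: the Locked/unlocked helper split discussed above.
// The fields are placeholders; the locking pattern is the point.
package lockingsketch

import "sync"

type batchApply struct {
	mu      sync.Mutex
	id      string
	entries int
}

// clearBatchStateLocked resets the staged state. The caller must hold b.mu.
func (b *batchApply) clearBatchStateLocked() {
	b.id, b.entries = "", 0
}

// rejectBatchState is the unlocked convenience wrapper: it takes the lock,
// clears the state, and releases the lock. Callers that already hold the
// lock for a longer critical section call the Locked variant directly.
func (b *batchApply) rejectBatchState() {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.clearBatchStateLocked()
}
```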

server/stream.go Outdated
batches *batching

// State to check for batch completeness before applying it.
batchMu sync.Mutex
Member

Should these all be in their own struct that we alloc only when we see the first batch?

@MauriceVanVeen
Member Author
Jul 21, 2025

Moved into a separate batchApply struct (d85ff64).
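As a sketch of the lazy-allocation idea from this thread (the stream and batchApply shapes below are assumptions, not the PR's actual types):

```go
// Illustrative only: allocate the batch-apply state the first time a batch
// entry is seen, so streams that never use batches carry no extra state.
package lazysketch

import "sync"

type batchApply struct {
	mu      sync.Mutex
	id      string
	entries int
}

type stream struct {
	mu    sync.Mutex
	batch *batchApply // nil until the first batch entry arrives
}

// batchApplyState returns the per-stream batch state, allocating on first use.
func (s *stream) batchApplyState() *batchApply {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.batch == nil {
		s.batch = &batchApply{}
	}
	return s.batch
}
```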

Successfully merging this pull request may close these issues: Batch publish - large batch support (#6978)