New ScyllaDB key space partitioning #4049
Base: `06-02-optimize_scylladb_s_batch_writes`
Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite. (This stack of pull requests is managed by Graphite.)
@@ -243,6 +243,8 @@ impl TestContextFactory for ScyllaDbContextFactory {
    let config = ScyllaDbStore::new_test_config().await?;
    let namespace = generate_test_namespace();
    let store = ScyllaDbStore::recreate_and_connect(&config, &namespace).await?;
    // TODO(#4065): Remove this once we enforce exclusive mode for views.
I don't think we're going to remove the line you added.
Ah, you’re right, I guess the tests lacking these calls are an orthogonal issue. I’ll remove the comment tomorrow
back to your queue as we iterate on the design
Motivation

ScyllaDB hashes each partition key into a token. Each token range is assigned to a shard, and each shard is pinned to a specific CPU by default. This means that having very few partitions is bad for ScyllaDB's performance.

Right now we have just one partition key: the `root_key`. For views, which are our mutable data in the DB, `root_key` will be set and we'll be in "exclusive mode". For everything else (`Certificate`s, `ConfirmedBlock`s, `Blob`s, etc.), everything lives on the same partition. For example, when we're doing 1M TPS, several thousand blocks, as well as certificates, will be created every second. The current scheme won't scale to those numbers.
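To illustrate why a single partition key pins all of this load onto one shard, here is a small simulation. It is not ScyllaDB's actual token mechanism (which uses Murmur3 tokens and vnode ownership); the hash function and shard count are stand-ins.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Toy model: hash the partition key into a "token" and map it to one of
/// `num_shards` shards. Each shard is pinned to a CPU, so a shard is the unit
/// of parallelism.
fn shard_for(partition_key: &[u8], num_shards: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    partition_key.hash(&mut hasher);
    hasher.finish() % num_shards
}

fn main() {
    let num_shards = 8;

    // One shared partition key: every row lands on the same shard/CPU.
    let single: HashSet<u64> = (0..1000)
        .map(|_| shard_for(b"the_only_partition", num_shards))
        .collect();
    assert_eq!(single.len(), 1);

    // Many partition keys: the same rows spread across multiple shards.
    let spread: HashSet<u64> = (0..1000u32)
        .map(|i| shard_for(&i.to_be_bytes(), num_shards))
        .collect();
    assert!(spread.len() > 1);
    println!("single: {} shard(s), spread: {} shard(s)", single.len(), spread.len());
}
```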
Proposal

New schema proposal: instead of a `root_key`, we'll have a `partition_key`. This `partition_key` will have two modes: exclusive mode (mutable data) and non-exclusive mode (immutable data). The former will work exactly how the previous `root_key` schema worked.

For exclusive mode, the `partition_key`'s first byte will be `0`, indicating the mode. The rest of the bytes will be the `root_key` that was provided when calling `open_exclusive`. Everything else should work as it previously did.

For non-exclusive mode, the `partition_key`'s first byte will be `1`. The rest of the partition key will be a prefix of the key, of a predetermined length (set through a configuration parameter). This mode will have some reservations:

- … (`partition_key`'s prefix size)
- In `read_multi_values_internal` and `contains_keys_internal` we currently group the keys by `partition_key` and execute one query per `partition_key`, in parallel. This is done to keep the queries token-aware across partitions.
- In `find_keys_by_prefix_internal` and `find_key_values_by_prefix_internal`, if the prefix is smaller than the `partition_key`'s prefix size, we'll do a full table scan. This happens infrequently enough that we're willing to take the performance hit.
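A minimal sketch of the two encodings and the per-partition grouping described above. The function names (`exclusive_partition_key`, `shared_partition_key`, `group_by_partition`) and the prefix length are hypothetical illustrations, not the PR's actual API:

```rust
use std::collections::HashMap;

// Hypothetical tag bytes for the two modes described in the proposal.
const EXCLUSIVE_TAG: u8 = 0; // mutable data: one partition per root_key
const SHARED_TAG: u8 = 1;    // immutable data: partitioned by key prefix

/// Exclusive mode: tag byte 0 followed by the root_key given to open_exclusive.
fn exclusive_partition_key(root_key: &[u8]) -> Vec<u8> {
    let mut pk = Vec::with_capacity(1 + root_key.len());
    pk.push(EXCLUSIVE_TAG);
    pk.extend_from_slice(root_key);
    pk
}

/// Non-exclusive mode: tag byte 1 followed by a fixed-length prefix of the key.
/// The prefix length would come from a configuration parameter.
fn shared_partition_key(key: &[u8], prefix_len: usize) -> Vec<u8> {
    assert!(key.len() >= prefix_len, "key shorter than the partition prefix");
    let mut pk = Vec::with_capacity(1 + prefix_len);
    pk.push(SHARED_TAG);
    pk.extend_from_slice(&key[..prefix_len]);
    pk
}

/// Grouping step like the one multi-key reads use: bucket keys by partition
/// key so one token-aware query can be issued per partition, in parallel.
fn group_by_partition(keys: &[Vec<u8>], prefix_len: usize) -> HashMap<Vec<u8>, Vec<Vec<u8>>> {
    let mut groups: HashMap<Vec<u8>, Vec<Vec<u8>>> = HashMap::new();
    for key in keys {
        groups
            .entry(shared_partition_key(key, prefix_len))
            .or_default()
            .push(key.clone());
    }
    groups
}

fn main() {
    // Two keys sharing a 2-byte prefix land in the same partition; a third does not.
    let keys = vec![vec![1, 2, 3], vec![1, 2, 4], vec![9, 9, 9]];
    let groups = group_by_partition(&keys, 2);
    assert_eq!(groups.len(), 2);
    assert_eq!(groups[&vec![SHARED_TAG, 1, 2]].len(), 2);
    assert_eq!(exclusive_partition_key(b"view")[0], EXCLUSIVE_TAG);
    println!("partitions: {}", groups.len());
}
```

With this layout, exclusive-mode and non-exclusive-mode keys can never collide, since their first bytes differ.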
Test Plan

CI, plus benchmarking to check performance. Some tests had to be altered because they didn't respect the invariant that if we're using a `root_key`, we should be in exclusive mode.

Follow ups
There are several follow ups here:

- Move the `Certificate` and `ConfirmedBlock` `BaseKey`s closer together, as well as `Blob` and `BlobState`.
- Use `find_keys_by_prefix_internal` for prefix deletes, and take the perf hit of the full table scan, as these seem to be currently infrequent.
- Enforce for `View`s (maybe on `load`) that they must always be in exclusive mode.
- `WritableKeyValueStore` should have `write_value` and `write_multi_values` methods, analogous to `ReadableKeyValueStore`. Batches should be used when atomicity is wanted, or when we want to save network requests to the DB.

Release Plan