New ScyllaDB key space partitioning #4049

Draft: wants to merge 1 commit into base 06-02-optimize_scylladb_s_batch_writes

ndr-ds (Contributor) commented Jun 2, 2025

Motivation

ScyllaDB hashes each partition key into a token. Each token range is assigned to a shard, and each shard is pinned to a specific CPU by default. With very few partitions, the load lands on only a few shards (and therefore CPUs), which is bad for ScyllaDB's performance.
Currently we have just one partition key, the root_key. For views, which are our mutable data in the DB, root_key is set and we are in "exclusive mode". Everything else (Certificates, ConfirmedBlocks, Blobs, etc.) lives on the same partition. For example, at 1M TPS we'll have several thousand blocks, as well as certificates, being created every second. The current scheme won't scale to those numbers.
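
To illustrate the token-to-shard mapping described above, here is a toy sketch. It is not ScyllaDB's actual partitioner or sharding algorithm: `DefaultHasher` and the equal-range shard split are stand-ins chosen for illustration, and the names `token`, `shard_for`, and `shards_used` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Toy stand-in for the partitioner: hashes a partition key into a 64-bit "token".
/// Assumption: ScyllaDB uses its own partitioner; DefaultHasher is only illustrative.
fn token(partition_key: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    partition_key.hash(&mut hasher);
    hasher.finish()
}

/// Toy shard assignment: splits the token space into `num_shards` equal ranges.
fn shard_for(token: u64, num_shards: u64) -> u64 {
    // Multiply-shift maps [0, 2^64) onto [0, num_shards).
    ((token as u128 * num_shards as u128) >> 64) as u64
}

/// Counts how many distinct shards a set of partition keys touches.
fn shards_used<'a>(keys: impl IntoIterator<Item = &'a [u8]>, num_shards: u64) -> usize {
    keys.into_iter()
        .map(|key| shard_for(token(key), num_shards))
        .collect::<HashSet<_>>()
        .len()
}
```

In this model, any number of writes to a single partition key touches exactly one shard, while many distinct partition keys spread over several shards.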

Proposal

New schema proposal: replace root_key with a partition_key. The partition_key has two modes: exclusive mode (mutable data) and non-exclusive mode (immutable data). The former works exactly as the previous root_key schema did.

For exclusive mode, the partition_key's first byte will be 0, indicating the mode. The rest of the bytes will be the root_key that was provided when calling open_exclusive. Everything else should work as it previously did.
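
A minimal sketch of the exclusive-mode encoding described above. The constant and function names are hypothetical and not taken from the PR; only the layout (mode byte 0 followed by the root_key) comes from the description.

```rust
/// Mode byte for exclusive mode, as described in the proposal.
const EXCLUSIVE_MODE: u8 = 0;

/// Builds an exclusive-mode partition key: the mode byte followed by the root key.
fn exclusive_partition_key(root_key: &[u8]) -> Vec<u8> {
    let mut partition_key = Vec::with_capacity(1 + root_key.len());
    partition_key.push(EXCLUSIVE_MODE);
    partition_key.extend_from_slice(root_key);
    partition_key
}

/// Recovers the root key from an exclusive-mode partition key.
/// Returns `None` if the key is empty or carries a different mode byte.
fn root_key_of(partition_key: &[u8]) -> Option<&[u8]> {
    match partition_key.split_first() {
        Some((mode, root_key)) if *mode == EXCLUSIVE_MODE => Some(root_key),
        _ => None,
    }
}
```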

For non-exclusive mode, the partition_key's first byte will be 1. The rest of the partition key will be a prefix of the key, of a predetermined length (set through a configuration parameter). This mode comes with some caveats:

  • Batches whose queries touch keys in different partitions are not allowed (this also applies to exclusive mode, by design)
  • Batches with prefix deletes are not allowed (they would require a full table scan whenever the prefix is shorter than the partition_key's prefix size)
  • In read_multi_values_internal and contains_keys_internal, we currently group the keys by partition_key and execute one query per partition_key in parallel. This keeps the queries token-aware across partitions
  • In find_keys_by_prefix_internal and find_key_values_by_prefix_internal, if the prefix is shorter than the partition_key's prefix size, we'll do a full table scan. This happens infrequently enough that we're willing to take the performance hit.
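
The non-exclusive rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: all names are hypothetical, `prefix_len` stands in for the configuration parameter, and the handling of keys shorter than `prefix_len` (using the whole key) is an assumption.

```rust
use std::collections::BTreeMap;

/// Mode byte for non-exclusive mode, as described in the proposal.
const NON_EXCLUSIVE_MODE: u8 = 1;

/// Builds a non-exclusive partition key: mode byte plus the first `prefix_len` key bytes.
/// Assumption: a key shorter than `prefix_len` contributes all of its bytes.
fn non_exclusive_partition_key(key: &[u8], prefix_len: usize) -> Vec<u8> {
    let prefix = &key[..key.len().min(prefix_len)];
    let mut partition_key = Vec::with_capacity(1 + prefix.len());
    partition_key.push(NON_EXCLUSIVE_MODE);
    partition_key.extend_from_slice(prefix);
    partition_key
}

/// Groups keys by partition key, so that each group can be served by one
/// query that targets a single partition.
fn group_by_partition<'a>(
    keys: &[&'a [u8]],
    prefix_len: usize,
) -> BTreeMap<Vec<u8>, Vec<&'a [u8]>> {
    let mut groups = BTreeMap::new();
    for &key in keys {
        groups
            .entry(non_exclusive_partition_key(key, prefix_len))
            .or_insert_with(Vec::new)
            .push(key);
    }
    groups
}

/// A batch is acceptable only if all of its keys fall into one partition.
fn batch_is_single_partition(keys: &[&[u8]], prefix_len: usize) -> bool {
    group_by_partition(keys, prefix_len).len() <= 1
}

/// How a prefix search would be served.
#[derive(Debug, PartialEq, Eq)]
enum PrefixScan {
    /// The prefix pins down the partition, so one partition is queried.
    SinglePartition(Vec<u8>),
    /// The prefix is shorter than the partition prefix: matches may be anywhere.
    FullTableScan,
}

/// Decides between a single-partition query and a full table scan.
fn plan_prefix_scan(prefix: &[u8], prefix_len: usize) -> PrefixScan {
    if prefix.len() < prefix_len {
        PrefixScan::FullTableScan
    } else {
        PrefixScan::SinglePartition(non_exclusive_partition_key(prefix, prefix_len))
    }
}
```

With `prefix_len = 2`, the keys `ab1` and `ab2` share the partition `[1, b'a', b'b']`, while a search for the prefix `a` cannot be pinned to one partition and falls back to a full scan.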

Test Plan

CI, plus I will benchmark this to check performance. Some tests had to be altered because they didn't respect the invariant that if we're using a root_key, we should be in exclusive mode.

Follow ups

There are several follow ups here:

  • Bundle Certificate and ConfirmedBlock BaseKeys closer together, as well as Blob and BlobState
  • Handle prefix deletes the same way we handled find_keys_by_prefix_internal: take the perf hit of the full table scan, as these currently seem to be infrequent
  • Enforce for Views (maybe on load) that they must always be in exclusive mode
  • We have a lot of places in the code where we use batches of size 1. These should not be batches. More specifically, WritableKeyValueStore should have write_value and write_multi_values methods, analogous to ReadableKeyValueStore. Batches should be used when atomicity is wanted, or when we want to save network requests to the DB
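
As a rough picture of the last follow-up, the write API could look something like the sketch below. It is a hypothetical, synchronous simplification: the trait name `WritableKeyValueStoreSketch`, the method signatures, and the in-memory `MemoryStore` are all assumptions made for illustration and do not reflect the actual trait in the codebase.

```rust
use std::collections::BTreeMap;
use std::convert::Infallible;

/// Hypothetical sketch of a write API with non-batch methods.
trait WritableKeyValueStoreSketch {
    type Error;

    /// Writes a single key-value pair without going through a batch.
    fn write_value(&mut self, key: &[u8], value: &[u8]) -> Result<(), Self::Error>;

    /// Writes several pairs one by one. This default gives no atomicity;
    /// a batch would still be the tool when atomicity is required.
    fn write_multi_values(&mut self, entries: &[(Vec<u8>, Vec<u8>)]) -> Result<(), Self::Error> {
        for (key, value) in entries {
            self.write_value(key, value)?;
        }
        Ok(())
    }
}

/// In-memory store used only to exercise the sketch.
#[derive(Default)]
struct MemoryStore {
    map: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl WritableKeyValueStoreSketch for MemoryStore {
    type Error = Infallible;

    fn write_value(&mut self, key: &[u8], value: &[u8]) -> Result<(), Self::Error> {
        self.map.insert(key.to_vec(), value.to_vec());
        Ok(())
    }
}
```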

Release Plan

  • Nothing to do / These changes follow the usual release cycle.

ndr-ds (Contributor Author) commented Jun 2, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch 4 times, most recently from e2afc91 to 72b1567 Compare June 2, 2025 22:27
@ndr-ds ndr-ds force-pushed the 06-02-optimize_scylladb_s_batch_writes branch 2 times, most recently from 768208f to 0b093e3 Compare June 2, 2025 23:22
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 72b1567 to 7cfc4d3 Compare June 2, 2025 23:22
@ndr-ds ndr-ds changed the base branch from 06-02-optimize_scylladb_s_batch_writes to graphite-base/4049 June 3, 2025 16:50
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 7cfc4d3 to 0b8cf01 Compare June 3, 2025 16:52
@ndr-ds ndr-ds changed the base branch from graphite-base/4049 to 06-02-optimize_scylladb_s_batch_writes June 3, 2025 16:52
@ndr-ds ndr-ds changed the base branch from 06-02-optimize_scylladb_s_batch_writes to graphite-base/4049 June 3, 2025 19:11
@ndr-ds ndr-ds force-pushed the graphite-base/4049 branch from 09d6c81 to a8361a5 Compare June 4, 2025 03:39
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 0b8cf01 to 1f38372 Compare June 4, 2025 03:39
@ndr-ds ndr-ds changed the base branch from graphite-base/4049 to 06-02-optimize_scylladb_s_batch_writes June 4, 2025 03:39
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 1f38372 to 07ca203 Compare June 4, 2025 03:40
@ndr-ds ndr-ds changed the base branch from 06-02-optimize_scylladb_s_batch_writes to graphite-base/4049 June 4, 2025 17:17
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 07ca203 to 314f92f Compare June 4, 2025 17:17
@ndr-ds ndr-ds force-pushed the graphite-base/4049 branch from a8361a5 to 0ff7949 Compare June 4, 2025 17:17
@ndr-ds ndr-ds changed the base branch from graphite-base/4049 to 06-04-some_code_cleanups June 4, 2025 17:18
@ndr-ds ndr-ds mentioned this pull request Jun 4, 2025
@ndr-ds ndr-ds force-pushed the 06-04-some_code_cleanups branch from 0ff7949 to 2bd4ab3 Compare June 4, 2025 17:25
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 314f92f to 63fd134 Compare June 4, 2025 17:25
@ndr-ds ndr-ds changed the base branch from 06-04-some_code_cleanups to graphite-base/4049 June 4, 2025 18:21
@ndr-ds ndr-ds force-pushed the graphite-base/4049 branch from 2bd4ab3 to a05e631 Compare June 4, 2025 18:50
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 63fd134 to ff4a945 Compare June 4, 2025 18:50
@ndr-ds ndr-ds changed the base branch from graphite-base/4049 to 06-04-some_code_cleanups June 4, 2025 18:50
@ndr-ds ndr-ds changed the base branch from 06-04-some_code_cleanups to graphite-base/4049 June 4, 2025 19:02
@ndr-ds ndr-ds requested review from MathieuDutSik and Twey June 5, 2025 03:51
@@ -243,6 +243,8 @@ impl TestContextFactory for ScyllaDbContextFactory {
let config = ScyllaDbStore::new_test_config().await?;
let namespace = generate_test_namespace();
let store = ScyllaDbStore::recreate_and_connect(&config, &namespace).await?;
// TODO(#4065): Remove this once we enforce exclusive mode for views.
ma2bd (Contributor) commented Jun 5, 2025

I don't think we're going to remove the line you added.

ndr-ds (Contributor Author)

Ah, you’re right, I guess the tests lacking these calls are an orthogonal issue. I’ll remove the comment tomorrow.

@ndr-ds ndr-ds changed the base branch from 06-04-some_code_cleanups to graphite-base/4049 June 5, 2025 05:10
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from d9c9974 to d04a148 Compare June 5, 2025 05:28
@ndr-ds ndr-ds force-pushed the graphite-base/4049 branch from b72852f to c8b5e29 Compare June 5, 2025 05:28
@ndr-ds ndr-ds changed the base branch from graphite-base/4049 to 06-02-optimize_scylladb_s_batch_writes June 5, 2025 05:28
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from d04a148 to d33ab0d Compare June 5, 2025 06:25
@ndr-ds ndr-ds force-pushed the 06-02-optimize_scylladb_s_batch_writes branch from c8b5e29 to 36cfe7b Compare June 5, 2025 06:25
@ndr-ds ndr-ds changed the base branch from 06-02-optimize_scylladb_s_batch_writes to graphite-base/4049 June 5, 2025 10:47
@ndr-ds ndr-ds force-pushed the graphite-base/4049 branch from 36cfe7b to cef2f5d Compare June 5, 2025 10:49
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from d33ab0d to 1cf50f7 Compare June 5, 2025 10:49
@ndr-ds ndr-ds changed the base branch from graphite-base/4049 to 06-02-optimize_scylladb_s_batch_writes June 5, 2025 10:49
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 1cf50f7 to 5161f03 Compare June 5, 2025 10:55
@ndr-ds ndr-ds force-pushed the 06-02-optimize_scylladb_s_batch_writes branch from cef2f5d to aca7d91 Compare June 5, 2025 20:40
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 5161f03 to a419c54 Compare June 5, 2025 20:40
@ndr-ds ndr-ds force-pushed the 06-02-optimize_scylladb_s_batch_writes branch from aca7d91 to 9367ee5 Compare June 5, 2025 23:21
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from a419c54 to 0428ee1 Compare June 5, 2025 23:21
ma2bd (Contributor) left a comment

back to your queue as we iterate on the design

@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 0428ee1 to 496ba58 Compare June 30, 2025 19:02
@ndr-ds ndr-ds force-pushed the 06-02-optimize_scylladb_s_batch_writes branch 2 times, most recently from a5ebfeb to 0a80832 Compare June 30, 2025 19:07
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 496ba58 to 0fb57f0 Compare June 30, 2025 19:07
@ndr-ds ndr-ds marked this pull request as draft July 2, 2025 15:09
@ndr-ds ndr-ds changed the base branch from 06-02-optimize_scylladb_s_batch_writes to graphite-base/4049 July 2, 2025 17:04
@ndr-ds ndr-ds force-pushed the graphite-base/4049 branch from 0a80832 to 7be4b46 Compare July 2, 2025 17:19
@ndr-ds ndr-ds force-pushed the 05-29-new_scylladb_key_space_partitioning branch from 0fb57f0 to f1df5d1 Compare July 2, 2025 17:19
@ndr-ds ndr-ds changed the base branch from graphite-base/4049 to 06-02-optimize_scylladb_s_batch_writes July 2, 2025 17:19
@ndr-ds ndr-ds mentioned this pull request Jul 2, 2025