
Fix calculation of multipart S3 upload chunk size #1648


Open · wants to merge 1 commit into base: main

Conversation


@byteOfChaos byteOfChaos commented Mar 31, 2025

Description

Fix an incorrect calculation of the chunk size for a multipart file upload to S3. The original code appeared to target the smallest chunk size that would still allow a successful S3 file upload.

Added a few range checks (which may be out of place; I'll happily take correction) at the top of the update() method and after the chunk size and chunk count are calculated.
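
A minimal sketch of the intended shape of the calculation, assuming the constant names from s3_store.rs and AWS's documented S3 limits (illustrative only, not the exact diff):

    // Illustrative only: smallest part size that still fits the object into
    // S3's 10,000-part limit, clamped to the allowed per-part size range.
    const MIN_MULTIPART_SIZE: u64 = 5 * 1024 * 1024;            // 5 MiB minimum part
    const MAX_MULTIPART_SIZE: u64 = 5 * 1024 * 1024 * 1024;     // 5 GiB maximum part
    const MAX_UPLOAD_PARTS: u64 = 10_000;                       // S3 part-count limit
    const MAX_UPLOAD_SIZE: u64 = 5 * 1024 * 1024 * 1024 * 1024; // 5 TiB object limit

    fn bytes_per_upload_part(max_size: u64) -> Option<u64> {
        if max_size > MAX_UPLOAD_SIZE {
            return None; // too large for S3 altogether
        }
        // Ceiling division so the computed parts always cover the whole object.
        let part_size = max_size
            .div_ceil(MAX_UPLOAD_PARTS)
            .clamp(MIN_MULTIPART_SIZE, MAX_MULTIPART_SIZE);
        Some(part_size)
    }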

Fixes #1425

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Included a range check after the calculation, which requires visual verification. For the S3 store, there seem to be only integration tests, and they do not cover multipart upload.

I'd be happy to explore more thorough testing after receiving some guidance.

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend (see some docs)



CLAassistant commented Mar 31, 2025

CLA assistant check
All committers have signed the CLA.

Member

@aaronmondal aaronmondal left a comment


+@jaroeichler as a second pair of eyes to check that this calculation is correct.

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: 0 of 2 LGTMs obtained, and all files reviewed, and 3 discussions need to be resolved (waiting on @jaroeichler)


-- commits line 7 at r1:
This message seems to be out-of-sync with the code


nativelink-store/src/s3_store.rs line 82 at r1 (raw file):

// S3 parts cannot be more than this number. See:
// https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html
const MAX_UPLOAD_PARTS: u64 = 10_000;

nit: Consider doing one of the following:

  • Leave a comment to explain that this makes calculations with this value easier
  • Make this a u16 and use some u64::from conversions in the calculations.
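
A rough sketch of the second option, with a hypothetical use site (not code from the PR):

    const MAX_UPLOAD_PARTS: u16 = 10_000;

    // Hypothetical helper showing the widening conversion at the use site.
    fn clamp_chunk_count(raw_count: u64) -> u64 {
        raw_count.clamp(1, u64::from(MAX_UPLOAD_PARTS))
    }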

nativelink-store/src/s3_store.rs line 425 at r1 (raw file):

        // Sanity check
        if max_size > MAX_UPLOAD_SIZE {

fyi I think this is fine.


nativelink-store/src/s3_store.rs line 546 at r1 (raw file):

        // First approximation of number of chunks if we upload in 5MB chunks (min chunk size),
        // clamp to 10,000 parts.
        let chunk_count = (max_size / MIN_MULTIPART_SIZE).clamp(1, MAX_UPLOAD_PARTS) + 1;

Let's create a regression test for this.

@MarcusSorealheis
Collaborator

I think with a test, this PR will be close.

Contributor

@jaroeichler jaroeichler left a comment


Reviewable status: 0 of 2 LGTMs obtained, and all files reviewed, and 5 discussions need to be resolved


nativelink-store/src/s3_store.rs line 546 at r1 (raw file):

        // First approximation of number of chunks if we upload in 5MB chunks (min chunk size),
        // clamp to 10,000 parts.
        let chunk_count = (max_size / MIN_MULTIPART_SIZE).clamp(1, MAX_UPLOAD_PARTS) + 1;

Shouldn't it be clamp(0, MAX_UPLOAD_PARTS - 1) here to adjust for the +1?


nativelink-store/src/s3_store.rs line 551 at r1 (raw file):

        // chunk, excluding last chunk, clamping to min/max upload size 5MB, 5GB.
        let bytes_per_upload_part =
            (max_size / chunk_count).clamp(MIN_MULTIPART_SIZE, MAX_MULTIPART_SIZE);

Can't you just cut this off at the minimum MIN_MULTIPART_SIZE and then check if bytes_per_upload_part < MAX_MULTIPART_SIZE is satisfied? Then you wouldn't have to recalculate chunk_count. Also why is there no +1 here? Don't we again have to take into account rounding from integer division to ensure that chunk_count * bytes_per_upload_part >= max_size?
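
A toy illustration of the rounding concern (numbers chosen for readability, not the store's actual values): plain integer division can undershoot, while ceiling division cannot.

    fn main() {
        let max_size: u64 = 11;
        let chunk_count: u64 = 2;

        let floor_part = max_size / chunk_count;        // 5
        assert!(chunk_count * floor_part < max_size);   // 10 < 11: last byte left uncovered

        let ceil_part = max_size.div_ceil(chunk_count); // 6
        assert!(chunk_count * ceil_part >= max_size);   // 12 >= 11: everything covered
    }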

Author

@byteOfChaos byteOfChaos left a comment


Apologies for the long reply time. Thank you for the thoughtful comments; I hope I have addressed everything to your satisfaction except for the test. As the update() method currently stands, I feel the test could be "pretty involved"; it would be nicer to isolate the finer points into a unit test, separate from the other business logic...
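
For example, if the calculation were pulled out into its own helper, a unit test could look roughly like this (calc_bytes_per_upload_part and the expected behavior are assumptions, not current code):

    #[cfg(test)]
    mod chunk_size_tests {
        use super::*;

        #[test]
        fn small_object_clamps_to_multipart_minimum() {
            // A 20 MiB object naively splits into ~2 KiB parts across 10,000
            // parts, so the result must be clamped up to the 5 MiB minimum.
            let part_size = calc_bytes_per_upload_part(20 * 1024 * 1024);
            assert_eq!(part_size, MIN_MULTIPART_SIZE);
        }

        #[test]
        fn largest_object_stays_within_part_limit() {
            let max_size: u64 = 5 * 1024 * 1024 * 1024 * 1024; // 5 TiB S3 maximum
            let part_size = calc_bytes_per_upload_part(max_size);
            let parts = max_size.div_ceil(part_size);
            assert!(parts <= 10_000);
            assert!(parts * part_size >= max_size);
        }
    }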

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 1 files reviewed, and pending CI: Bazel Dev / macos-15, Bazel Dev / ubuntu-24.04, Cargo Dev / macos-15, Cargo Dev / ubuntu-24.04, Coverage, Installation / macos-13, Installation / macos-14, Installation / macos-15, Installation / ubuntu-22.04, Installation / ubuntu-24.04, Local / lre-rs / macos-15, NativeLink.com Cloud / Remote Cache / macos-15, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Publish image, Publish nativelink-worker-init, Remote / lre-cc / large-ubuntu-22.04, Remote / lre-rs / large-ubuntu-22.04, asan / ubuntu-24.04, macos-15, ubuntu-24.04, ubuntu-24.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved (waiting on @jaroeichler)


-- commits line 7 at r1:

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

This message seems to be out-of-sync with the code

Concur; will fix


nativelink-store/src/s3_store.rs line 82 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

nit: Consider doing one of the following:

  • Leave a comment to explain that this makes calculations with this value easier
  • Make this a u16 and use some u64::from conversions in the calculations.

Comment added


nativelink-store/src/s3_store.rs line 425 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

fyi I think this is fine.

As in - good addition (to which, I'll say thanks)? Or..?


nativelink-store/src/s3_store.rs line 546 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Let's create a regression test for this.

I've been looking at the mechanisms already used to test the S3 store and should have a solution in the next 24 hours. Is an approach with StaticReplayClient what you were expecting?
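
For reference, the rough shape such a test could take (the bodies and wiring here are assumptions; the real harness may differ): every request the store issues has to be matched by a canned ReplayEvent, so the number of UploadPart events pins down the chunk size being tested.

    use aws_smithy_runtime::client::http::test_util::{ReplayEvent, StaticReplayClient};
    use aws_smithy_types::body::SdkBody;

    fn make_replay_client() -> StaticReplayClient {
        StaticReplayClient::new(vec![
            // 1. CreateMultipartUpload
            ReplayEvent::new(
                http::Request::builder().body(SdkBody::empty()).unwrap(),
                http::Response::builder()
                    .status(200)
                    .body(SdkBody::from(
                        "<InitiateMultipartUploadResult>\
                         <UploadId>test-id</UploadId>\
                         </InitiateMultipartUploadResult>",
                    ))
                    .unwrap(),
            ),
            // 2..N: one ReplayEvent per expected UploadPart, then one for
            // CompleteMultipartUpload.
        ])
    }

    // The S3 client under test would then be built with
    // .http_client(make_replay_client()) and handed to the store.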


nativelink-store/src/s3_store.rs line 546 at r1 (raw file):

Previously, jaroeichler wrote…

Shouldn't it be clamp(0, MAX_UPLOAD_PARTS - 1) here to adjust for the +1?

Concur; appreciate the fresh eyes


nativelink-store/src/s3_store.rs line 551 at r1 (raw file):

Previously, jaroeichler wrote…

Can't you just cut this off at the minimum MIN_MULTIPART_SIZE and then check if bytes_per_upload_part < MAX_MULTIPART_SIZE is satisfied? Then you wouldn't have to recalculate chunk_count. Also why is there no +1 here? Don't we again have to take into account rounding from integer division to ensure that chunk_count * bytes_per_upload_part >= max_size?

I'm not sure; I was probably being overly cautious. If we have gotten past the first sanity check, there is no reason we cannot break the file down into at most 10,000 parts with sizes between 5 MiB and 5 GiB. I have made the corrections with the associated comment clarifications.

Member

@aaronmondal aaronmondal left a comment


Reviewed 1 of 1 files at r2.
Reviewable status: 0 of 2 LGTMs obtained, and all files reviewed, and pending CI: Bazel Dev / macos-15, Bazel Dev / ubuntu-24.04, Cargo Dev / macos-15, Cargo Dev / ubuntu-24.04, Coverage, NativeLink.com Cloud / Remote Cache / macos-15, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, asan / ubuntu-24.04, macos-15, ubuntu-24.04, ubuntu-24.04 / stable, windows-2022 / stable, and 4 discussions need to be resolved (waiting on @jaroeichler)


nativelink-store/src/s3_store.rs line 425 at r1 (raw file):

Previously, byteOfChaos (Kyle Wheat) wrote…

As in - good addition (to which, I'll say thanks)? Or..?

As in "good addition" 😊 (IIUC the commit message mentioned that you weren't sure about this)


nativelink-store/src/s3_store.rs line 546 at r1 (raw file):

Previously, byteOfChaos (Kyle Wheat) wrote…

Been looking at the mechanisms already used to test s3 storage and should have a solution in the next 24 hours. Is an approach with StaticReplayClient what you were expecting?

Ideally yes. Although I noticed that it might be a bit tricky since we probably can't easily test this with a "naive" approach due to limited resources on the github runners. Maybe a streaming approach could work, though even with that, the time needed to chunk a 5TB file might be a bit too time consuming for CI. I'd say please try it, but if it turns out to be too much of a hassle it would also be ok to treat this as an exception where a regression test might simply be infeasible.
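
A minimal sketch of the streaming idea (names and wiring are hypothetical): generate the payload lazily in small reusable buffers so the test never materializes the whole object, though as noted, pushing 5 TB through the chunker may still be too slow for CI.

    use bytes::Bytes;
    use futures::stream::{self, Stream};

    // Yield `total_size` zero bytes as a stream of `feed_chunk`-sized Bytes,
    // reusing a single allocation for every chunk.
    fn zero_stream(total_size: u64, feed_chunk: usize) -> impl Stream<Item = Bytes> {
        let buf = Bytes::from(vec![0u8; feed_chunk]);
        let mut remaining = total_size;
        stream::iter(std::iter::from_fn(move || {
            if remaining == 0 {
                return None;
            }
            let len = remaining.min(feed_chunk as u64) as usize;
            remaining -= len as u64;
            Some(buf.slice(0..len))
        }))
    }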

Author

@byteOfChaos byteOfChaos left a comment


Testing is done. I believe it's impossible for an upload within the max upload size limit to break any of the limits on max upload parts or min/max multipart size, which simplifies the test and lets us avoid testing with outrageously large files (unless we want to add a test case that calculates a chunk size strictly between MIN_MULTIPART_SIZE and MAX_MULTIPART_SIZE, which would probably strain CI).

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 2 files reviewed, and pending CI: pre-commit-checks, and 4 discussions need to be resolved (waiting on @jaroeichler)


nativelink-store/src/s3_store.rs line 425 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

As in "good addition" 😊 (IIUC the commit message mentioned that you weren't sure about this)

Ah, I see. I'll go all in and remove my uncertainty from the commit message.


nativelink-store/src/s3_store.rs line 546 at r1 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

Ideally yes. Although I noticed that it might be a bit tricky since we probably can't easily test this with a "naive" approach due to limited resources on the github runners. Maybe a streaming approach could work, though even with that, the time needed to chunk a 5TB file might be a bit too time consuming for CI. I'd say please try it, but if it turns out to be too much of a hassle it would also be ok to treat this as an exception where a regression test might simply be infeasible.

Given the way we calculate chunk_count and then bytes_per_upload_part, I believe it's impossible to exceed MAX_MULTIPART_SIZE and trigger that clamp, so the test exercises the clamp to MIN_MULTIPART_SIZE instead. I left a comment that outlines some thoughts on a test case that would produce a chunk size strictly between MIN_MULTIPART_SIZE and MAX_MULTIPART_SIZE.
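
A back-of-the-envelope check of that claim, using the documented S3 limits rather than code from the PR:

    const MAX_UPLOAD_SIZE: u64 = 5 * 1024_u64.pow(4);    // 5 TiB object limit
    const MAX_MULTIPART_SIZE: u64 = 5 * 1024_u64.pow(3); // 5 GiB part limit
    const MAX_UPLOAD_PARTS: u64 = 10_000;

    fn main() {
        // Worst case: the largest allowed object split into the maximum part count.
        let worst_case_part = MAX_UPLOAD_SIZE.div_ceil(MAX_UPLOAD_PARTS);
        // ~525 MiB, far below the 5 GiB cap, so once max_size has passed the
        // MAX_UPLOAD_SIZE sanity check the upper clamp can never trigger.
        assert!(worst_case_part < MAX_MULTIPART_SIZE);
        println!("worst-case part size: {worst_case_part} bytes");
    }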

- Add check for maximum file size
- Calculate minimum chunk size (maximum chunk count) for multipart S3
  file upload, checking for proper limit of chunk count before upload
- Make MAX_UPLOAD_PARTS u64 for uniformity and simplicity

Successfully merging this pull request may close these issues.

S3 Store's bytes per upload part calculation logic doesn't make sense