Update fsspec parameters for cloud reads #677
Open
What has been built?
This PR sets up fsspec to use the cloud-optimized parameters recommended in this whitepaper. It does not add the h5py parameters, due to the benchmarking discussed in #675. fsspec is used for cloud reading (and only cloud reading), so this update should only affect reads of data in s3.
Merging Note: I put this PR up for reference and visibility. I know there are other big changes and a v2 release coming soon, so this PR can wait until the appropriate time for merging.
Approximate timing results
Even on v006 data, adding the fsspec parameters noticeably speeds up data reads from the cloud (~12 times faster).
Timing was done with the Jupyter magic %%timeit on the 3 lines below: 1) create Read object, 2) append variables, 3) load data. This does not account for s3 caching.
How was it done?
The biggest decision, I think, was whether or not to expose this as an option to the user. This PR is the simplest possible implementation, in which the user is not given a choice. Given what a low-level change this is, I think this makes the most sense, but others should jump in if they would like.
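For illustration only, here is a minimal sketch of what a hard-coded, non-user-facing configuration could look like. The names CLOUD_OPEN_KWARGS and open_cloud_file are hypothetical, and the cache type and block size are assumed values, not confirmed to be the ones this PR uses. The sketch relies on fsspec with the s3fs backend installed and passes the parameters to the filesystem's open() call.

```python
# Assumed values for illustration; the PR's actual parameters may differ.
CLOUD_OPEN_KWARGS = {
    "cache_type": "blockcache",     # fsspec block cache
    "block_size": 8 * 1024 * 1024,  # 8 MiB per block
}


def open_cloud_file(s3_url, **storage_options):
    """Open an s3 object for reading with fixed, cloud-oriented parameters.

    Hypothetical helper: the caller supplies only the URL and any
    credentials (storage_options); the read parameters are not exposed.
    """
    import fsspec  # imported lazily; requires fsspec and s3fs

    fs = fsspec.filesystem("s3", **storage_options)
    return fs.open(s3_url, mode="rb", **CLOUD_OPEN_KWARGS)
```

Keeping the parameters in one module-level constant would also make it straightforward to expose them later if reviewers prefer a user-facing option.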
How can it be tested?
The code below reads a small v006 file from s3.
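The snippet itself is not reproduced here. As a rough sketch of the three timed steps (create Read object, append variables, load data), something like the following could be used outside Jupyter. The helper names time_steps and timed_cloud_read are hypothetical, the s3 path and variable list are left as arguments because the originals are not given, and the icepyx calls assume a version whose Read class accepts an s3 path and handles Earthdata authentication.

```python
import time


def time_steps(steps):
    """Run zero-argument callables in order; return total elapsed seconds."""
    start = time.perf_counter()
    for step in steps:
        step()
    return time.perf_counter() - start


def timed_cloud_read(s3_path, var_list):
    """Time the three steps described above for one s3 file.

    Hypothetical helper; s3_path and var_list must be supplied by the caller.
    """
    import icepyx as ipx  # imported lazily; requires icepyx and credentials

    state = {}

    def create():
        state["reader"] = ipx.Read(s3_path)

    def append():
        state["reader"].vars.append(var_list=var_list)

    def load():
        state["dataset"] = state["reader"].load()

    elapsed = time_steps([create, append, load])
    return elapsed, state["dataset"]
```

A single run like this includes any s3 caching effects, so comparing branches fairly would need repeated runs, as %%timeit does.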