cache preset dict for LZ4WithPresetDictDecompressor #14397
@@ -44,4 +44,6 @@ public abstract void decompress(

   @Override
   public abstract Decompressor clone();
+
+  public void reset() {}
 }
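
To illustrate why a no-op `reset()` on the base class is useful, here is a minimal sketch of a subclass that keeps a cached dictionary until `reset()` is called. It does not use Lucene's classes: `SketchDecompressor`, `CachingDictDecompressor`, `dictionary()` and the `Supplier` loader are all hypothetical names introduced for illustration, and the real `LZ4WithPresetDictDecompressor` may manage its cache differently.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for the abstract Decompressor in the diff; not Lucene's class. */
abstract class SketchDecompressor {
  /** Returns the dictionary bytes that would be used to decode the next block. */
  abstract byte[] dictionary();

  /** No-op by default, mirroring the hook added in the diff. */
  public void reset() {}
}

/** Hypothetical subclass that reuses the dictionary until reset() is called. */
class CachingDictDecompressor extends SketchDecompressor {
  // Assumption: the loader stands in for reading the preset dictionary from the input stream.
  private final Supplier<byte[]> loader;
  // null means the dictionary must be loaded again on the next request.
  private byte[] cached;

  CachingDictDecompressor(Supplier<byte[]> loader) {
    this.loader = loader;
  }

  @Override
  byte[] dictionary() {
    if (cached == null) {
      cached = loader.get(); // expensive path: load and remember
    }
    return cached; // cheap path: reuse the cached copy
  }

  @Override
  public void reset() {
    cached = null; // the caller moved to another block, so the cache is stale
  }
}
```

In this sketch, stateless decompressors inherit the empty `reset()` and need no changes, while a caching one overrides it to drop its state whenever the reader seeks to a different block.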
@@ -511,6 +511,7 @@ private void doReset(int docID) throws IOException {
       bytes.offset = bytes.length = 0;
       for (int decompressed = 0; decompressed < totalLength; ) {
         final int toDecompress = Math.min(totalLength - decompressed, chunkSize);
+        decompressor.reset();
         decompressor.decompress(fieldsStream, toDecompress, 0, toDecompress, spare);

Comment on lines +514 to 515

I am wondering whether reset should be the default behavior. We could pass another flag to indicate reuse where possible.

It seems that

I am not questioning that. My point is to not have

I tried, but failed, to rely only on the outer
We have two chunks:
The steps are as follows:
In this case, we should call

         bytes.bytes = ArrayUtil.grow(bytes.bytes, bytes.length + spare.length);
         System.arraycopy(spare.bytes, spare.offset, bytes.bytes, bytes.length, spare.length);

@@ -559,6 +560,7 @@ SerializedDocument document(int docID) throws IOException {
         documentInput = new ByteArrayDataInput(bytes.bytes, bytes.offset + offset, length);
       } else if (sliced) {
         fieldsStream.seek(startPointer);
+        decompressor.reset();
         decompressor.decompress(
             fieldsStream, chunkSize, offset, Math.min(length, chunkSize - offset), bytes);
         documentInput =

@@ -572,6 +574,7 @@ void fillBuffer() throws IOException {
           throw new EOFException();
         }
         final int toDecompress = Math.min(length - decompressed, chunkSize);
+        decompressor.reset();
         decompressor.decompress(fieldsStream, toDecompress, 0, toDecompress, bytes);
         decompressed += toDecompress;
       }

@@ -643,6 +646,7 @@ SerializedDocument serializedDocument(int docID) throws IOException {
     if (state.contains(docID) == false) {
       fieldsStream.seek(indexReader.getStartPointer(docID));
       state.reset(docID);
+      decompressor.reset();
     }
     assert state.contains(docID);
     return state.document(docID);

I am wondering whether we should expose a metric (perhaps a simple counter) showing how many times the cached preset dict could be reused and how many times it had to be read from disk. That would give useful insight into how effective this change is.
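
One possible shape for such a counter is sketched below. `DictReuseStats` and its methods are hypothetical and are not part of this PR or of Lucene. The sketch uses `java.util.concurrent.atomic.LongAdder` on the assumption that several threads might share one stats instance.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counters for cache reuse; nothing like this exists in the diff above. */
final class DictReuseStats {
  private final LongAdder reused = new LongAdder();
  private final LongAdder readFromDisk = new LongAdder();

  /** Call when the cached preset dict was reused. */
  void onReuse() {
    reused.increment();
  }

  /** Call when the preset dict had to be read from disk. */
  void onDiskRead() {
    readFromDisk.increment();
  }

  long reusedCount() {
    return reused.sum();
  }

  long diskReadCount() {
    return readFromDisk.sum();
  }

  /** Fraction of lookups served from the cache, or 0.0 if nothing was recorded. */
  double reuseRatio() {
    long hits = reused.sum();
    long total = hits + readFromDisk.sum();
    return total == 0 ? 0.0 : (double) hits / total;
  }
}
```

A decompressor that caches its dictionary could call `onReuse()` on the cached path and `onDiskRead()` on the reload path. How the numbers would be surfaced (logging, a test hook, or something else) is left open here.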