
Discussion: Improve DirectIO Directory for Java 24/25 #14928


Draft
wants to merge 1 commit into
base: main

uschindler
Contributor

@uschindler uschindler commented Jul 9, 2025

Currently the DirectIODirectory allocates direct byte buffers outside of the heap (because that is needed for direct IO to work). It also needs to align them to the block size. The current code may also be wrong if mergeBufferSize is not a multiple of blockSize. This PR fixes that so the buffer is correctly aligned and has the correct length.

With MemorySegments we can improve that:

  • There is a direct allocator method that takes care that a MemorySegment is allocated with the correct alignment. In contrast to ByteBuffers the length is not aligned, so we have to take care of that ourselves (I added code in the ctor to get correct multiples of blockSize).
  • We can convert this MemorySegment to a direct buffer with MemorySegment#asByteBuffer(). The resulting buffer is compatible with direct IO.
  • We can free the buffer using an Arena at the correct time, when closing the output or input.
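The three points above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the class name `AlignedBufferSketch`, the helper `roundUpToBlockSize`, and the 4096-byte block size are assumptions made for the example. It relies only on `Arena#allocate(long, long)` and `MemorySegment#asByteBuffer()` from the java.lang.foreign API (final since Java 22).

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.nio.ByteBuffer;

public class AlignedBufferSketch {

  /** Rounds size up to the next multiple of blockSize (hypothetical helper). */
  static int roundUpToBlockSize(int size, int blockSize) {
    long blocks = ((long) size + blockSize - 1) / blockSize;
    return Math.toIntExact(blocks * blockSize);
  }

  /**
   * Allocates an off-heap buffer whose start address is aligned to blockSize and
   * whose length is a multiple of blockSize. The memory is owned by the arena.
   * blockSize must be a power of two, as required for the alignment argument.
   */
  static ByteBuffer allocateAligned(Arena arena, int size, int blockSize) {
    int alignedSize = roundUpToBlockSize(size, blockSize);
    // The allocator aligns the address only; the length was rounded above.
    MemorySegment segment = arena.allocate(alignedSize, blockSize);
    return segment.asByteBuffer();
  }

  public static void main(String[] args) {
    int blockSize = 4096; // assumed value; a real implementation would query the file system
    try (Arena arena = Arena.ofConfined()) {
      ByteBuffer buffer = allocateAligned(arena, 10_000, blockSize);
      System.out.println("direct=" + buffer.isDirect() + " capacity=" + buffer.capacity());
    } // closing the arena frees the memory deterministically
  }
}
```

Closing the arena plays the role of closing the output or input: once it is closed, the buffer can no longer be accessed.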

As IndexOutputs are only used by one thread we can use a confined arena and allocate the buffer there.

With IndexInputs it is more complicated: Theoretically they should also only be used from one thread (also RandomAccessInputs as far as I remember), but unfortunately the buffer is allocated at the time of cloning (which is not the thread when it is used). The buffering code is a bit cryptic to me and I had no time to look closely into it. As in BufferedIndexInput, the buffer should be lazily initialized on the first real READ access (not on cloning and not on seeking for the first time after cloning). To implement this correctly we may need to refactor the buffer code a bit.

Therefore in this mockup I use an AUTO arena, which lets the garbage collector free the buffer. A shared arena is too expensive.
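The difference between the arena types can be shown with a small standalone sketch. It is not Lucene code; the class name `ArenaThreadingSketch` and the helper `writeFromOtherThread` are made up for illustration. A segment from a confined arena rejects access from any thread other than the one that created the arena, while a segment from an auto arena can be used from any thread and is released by the garbage collector.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.concurrent.atomic.AtomicReference;

public class ArenaThreadingSketch {

  /** Writes one byte to the segment from a new thread; returns the failure, or null on success. */
  static Throwable writeFromOtherThread(MemorySegment segment) {
    AtomicReference<Throwable> failure = new AtomicReference<>();
    Thread worker = new Thread(() -> {
      try {
        segment.set(ValueLayout.JAVA_BYTE, 0, (byte) 1);
      } catch (Throwable t) {
        failure.set(t);
      }
    });
    worker.start();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return failure.get();
  }

  public static void main(String[] args) {
    // Confined: allocated here, so the worker thread is not allowed to touch it.
    try (Arena confined = Arena.ofConfined()) {
      MemorySegment segment = confined.allocate(4096, 4096);
      // A WrongThreadException is expected here.
      System.out.println("confined: " + writeFromOtherThread(segment));
    }

    // Auto: no thread restriction and no explicit close; freed once unreachable.
    MemorySegment autoSegment = Arena.ofAuto().allocate(4096, 4096);
    System.out.println("auto: " + writeFromOtherThread(autoSegment));
  }
}
```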

If you have an idea how to fix the IndexInput to use a lazy buffer like BufferedIndexInput without mixing everything up, let me know. The buffer should be confined and allocated only from the thread actually using the clone. An alternative is to have a pool of buffers for reuse "per thread" (thread-local). The JDK internally uses a ThreadLocal for such buffers when implementing Java's IO layer.
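One possible shape for such a lazy, confined buffer is sketched below. It is only an illustration under stated assumptions, not a proposal for the actual refactoring: `LazyAlignedBufferSketch` and its methods are hypothetical names, and it ignores how the buffer interacts with seeking and file positions. The idea is that cloning copies only the sizes, and the confined arena is created by whichever thread first asks for the buffer.

```java
import java.lang.foreign.Arena;
import java.nio.ByteBuffer;

/** Hypothetical holder for a lazily allocated, block-aligned buffer; not part of Lucene. */
public class LazyAlignedBufferSketch implements AutoCloseable {
  private final int bufferSize; // expected to be a multiple of blockSize
  private final int blockSize;  // expected to be a power of two
  private Arena arena;          // created by the first thread that reads
  private ByteBuffer buffer;

  public LazyAlignedBufferSketch(int bufferSize, int blockSize) {
    if (bufferSize % blockSize != 0) {
      throw new IllegalArgumentException("bufferSize must be a multiple of blockSize");
    }
    this.bufferSize = bufferSize;
    this.blockSize = blockSize;
  }

  /** Stands in for clone/slice: copies the configuration but allocates nothing. */
  public LazyAlignedBufferSketch copy() {
    return new LazyAlignedBufferSketch(bufferSize, blockSize);
  }

  public boolean isAllocated() {
    return buffer != null;
  }

  /** Called on the first real read; the calling thread becomes the owner of the arena. */
  public ByteBuffer buffer() {
    if (buffer == null) {
      arena = Arena.ofConfined();
      buffer = arena.allocate(bufferSize, blockSize).asByteBuffer();
    }
    return buffer;
  }

  /** Must run on the thread that allocated the buffer, because the arena is confined. */
  @Override
  public void close() {
    if (arena != null) {
      arena.close();
      arena = null;
      buffer = null;
    }
  }
}
```

With this shape a copy can be created on one thread and handed to another before any memory exists. The open point is the close: a confined arena can only be closed by its owner thread, so the input would have to be closed by the thread that read from it.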

…ers outside heap without control of freeing them

github-actions bot commented Jul 9, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@uschindler uschindler changed the title Improve DirectIO Directory for Java 24/25 Discussion: Improve DirectIO Directory for Java 24/25 Jul 9, 2025
@uschindler
Contributor Author

P.S.: I only looked at this out of interest: I was discussing an encrypted directory implementation on the OpenSearch issue tracker, and my suggestion was to add a buffer pool which is fed by direct IO. While thinking about this I noticed that MemorySegments can be allocated with alignment.

@ChrisHegarty
Contributor

Theoretically they should also only be used from one thread (also RandomAccessInputs as far as I remember), but unfortunately the buffer is allocated at the time of cloning (which is not the thread when it is used).

Hmmm.... I get confused about the model; is it slice or clone that I need to do to operate on a thread other than the one that created the IndexInput?

@uschindler
Contributor Author

Basically the problem is: Some thread creates a clone (or a slice, it does not matter). Unfortunately the clone's private direct IO buffer is created at the time of cloning, by the thread that cloned it.
After that it is handed over to another thread. As the buffer is confined, it can't be used after the hand-over. So the creation of the direct IO buffer needs to be delayed until after the hand-over is done.

Or is there a way to hand over a confined arena to another thread?
