Issue
Following up from #19249 (comment):
There is a race condition in the stack-walker, specifically when getting a method's jitInfo/artifact/metadata.
The AVLTree holding the jitInfo hash-tables is synchronized with VMAccess, and the hash-tables themselves are synchronized with a monitor.
When a stack-walker accesses a hash-table to retrieve a jitInfo (at every JIT frame during a walk, on a cache miss), it does not hold the monitor. So if the hash-table is being modified concurrently (e.g. the JIT inserting a newly compiled method's jitInfo), the stack-walker can retrieve incorrect info.
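A minimal sketch of the hazard (all type and helper names here are hypothetical, not the actual OpenJ9 code): the JIT mutates the hash-table under its monitor, while the stack-walker reads the same table with no lock at all.

```c
/* Sketch of the race; type and helper names are hypothetical. */

/* JIT compilation thread: inserts under the hash-table monitor. */
void jit_insert_artifact(JitHashTable *table, JitArtifact *artifact)
{
    omrthread_monitor_enter(table->monitor);
    hash_table_add(table, artifact);    /* may rehash/relink buckets */
    omrthread_monitor_exit(table->monitor);
}

/* Stack-walker: looks up with NO synchronization (the bug). */
JitArtifact *walker_find_artifact(JitHashTable *table, uintptr_t pc)
{
    /* If a concurrent insert is relinking buckets, this unlocked read can
     * follow a stale or partially-written link and return incorrect info. */
    return hash_table_find(table, pc);
}
```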
Train of thought
Current state
- CodeCache changes acquire VMExclusiveAccess (modifies the AVLTree)
- jitInfo changes acquire the monitor (modifies the hash-table)
- GC gets VMAccess (not exclusive)
- Stack-walkers don't use any synchronization when getting jitInfo
- No need to synchronize AVLTree access when adding/accessing/removing jitInfo, only the hash-table (as sketched below).
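A hedged sketch of the two-level structure and who currently holds what (again with hypothetical names):

```c
/* Sketch of the two-level structure and current locking; names are hypothetical. */
typedef struct CodeCacheEntry {
    uintptr_t startPC;        /* code range start (AVLTree key) */
    uintptr_t endPC;          /* code range end */
    JitHashTable *artifacts;  /* per-code-cache jitInfo hash-table */
} CodeCacheEntry;

/* The AVLTree is only mutated under VMExclusiveAccess, so traversing it
 * while holding VMAccess is already safe. */
JitArtifact *find_artifact(AVLTree *tree, uintptr_t pc)
{
    CodeCacheEntry *entry = avl_search(tree, pc);  /* safe under VMAccess */
    if (NULL == entry) {
        return NULL;
    }
    /* Only this hash-table lookup needs extra synchronization; the
     * stack-walker currently performs it without the monitor. */
    return hash_table_find(entry->artifacts, pc);
}
```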
Solutions
1. Use VMExclusiveAccess/VMAccess for jitInfo changes/access (hash-table)
   - Uses the existing reader/writer synchronization infrastructure
   - No need for the monitor if hash-table access uses VMAccess for synchronization
   - Using VMAccess for hash-table synchronization intrudes on/may hold up other VMExclusiveAccess actions unnecessarily
   - Least intrusive solution(?)
2. Add the monitor to the stack-walker
   - "Obvious" solution, not performant (serializes different stack-walkers getting jitInfo)
   - Might be better than expected if the cache hits most of the time and the monitor is rarely needed when stack walking(?)
   - Walking a parked virtual thread's continuation does not use the cache.
3. Use reader/writer synchronization (instead of the normal monitor) for artifact modification/access (sketched after this list)
   - "Obvious" solution, improved
   - Change the monitor to a reader/writer mutex
   - Overhead of reader/writer synchronization(?)
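A minimal sketch of solution 3, assuming the omrthread reader/writer mutex API (omrthread_rwmutex_enter_read and friends); the table and helper names are hypothetical:

```c
/* Sketch of solution 3; table and helper names are hypothetical. */

/* JIT insert/remove: exclusive (writer) access. */
void jit_insert_artifact(JitHashTable *table, JitArtifact *artifact)
{
    omrthread_rwmutex_enter_write(table->rwmutex);
    hash_table_add(table, artifact);
    omrthread_rwmutex_exit_write(table->rwmutex);
}

/* Stack-walker lookup: shared (reader) access, so concurrent stack-walkers
 * only block on writers, never on each other. */
JitArtifact *walker_find_artifact(JitHashTable *table, uintptr_t pc)
{
    JitArtifact *artifact = NULL;
    omrthread_rwmutex_enter_read(table->rwmutex);
    artifact = hash_table_find(table, pc);
    omrthread_rwmutex_exit_read(table->rwmutex);
    return artifact;
}
```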
I ended up prototyping solution 3.
Experiments
Drivers
Testing five different drivers:
- A: Base/Nightly build
- B: Replace the hash-table monitor with a reader/writer mutex
  - Insert/remove artifact takes the writer mutex
  - Retrieve artifact takes the reader mutex
  - Does not solve the race condition; the stack-walker does not take the mutex
  - Evaluates the reader/writer mutex overhead
- C: Stack-walker takes the reader mutex when needed (contrasted with D in the sketch after this list)
  - Only takes the mutex when getting a jitInfo/artifact, on a cache miss when going to the hash-table
  - Possibly many short mutex acquire/release pairs
- D: Stack-walker always holds the reader mutex
  - Acquires the reader mutex for the whole stack-walk
  - Few long mutex acquire/release pairs
  - Holds reader access longer than needed
- E: Solution 2 (use the same monitor in the stack-walker)
  - Serializes stack-walkers getting jitInfo
  - Only when accessing the hash-table (cache miss)
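To make the difference between C and D concrete, a hedged sketch of both locking patterns (walker state, cache, and helper names are hypothetical):

```c
/* Sketch of drivers C and D; walker state and helpers are hypothetical. */

/* Driver C: take the reader mutex only on a cache miss, per lookup. */
JitArtifact *lookup_driver_c(WalkState *state, uintptr_t pc)
{
    JitArtifact *artifact = cache_lookup(state->cache, pc);
    if (NULL == artifact) {                        /* cache miss */
        omrthread_rwmutex_enter_read(state->rwmutex);
        artifact = hash_table_find(state->artifacts, pc);
        omrthread_rwmutex_exit_read(state->rwmutex);
        cache_insert(state->cache, pc, artifact);
    }
    return artifact;    /* many short acquire/release pairs */
}

/* Driver D: hold the reader mutex for the entire walk. */
void walk_driver_d(WalkState *state)
{
    omrthread_rwmutex_enter_read(state->rwmutex);  /* one long hold */
    while (walk_next_frame(state)) {
        /* jitInfo lookups here need no per-lookup locking,
         * but writers are blocked for the whole walk. */
    }
    omrthread_rwmutex_exit_read(state->rwmutex);
}
```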
Results (UPDATED)
| SPECjbb2015 driver | max-jOPS | Diff vs. A |
|---|---|---|
| A | 67801 | 100.00% |
| B | 68292 | 100.73% |
| C | 67699 | 99.85% |
| D | 36185 | 53.37% |
| E | 67894 | 100.14% |
Currently looking at profiles; the only immediately apparent observation is that in D, 25% of the time is spent in omrthread_spinlock_acquire.
Conclusion
It seems like C (solution 3) and E (solution 2) are valid solutions.
The best solution is C: taking the mutex only when needed, even if that means more frequent acquires/releases.
The overhead of using the reader/writer mutex instead of the monitor alone is 4.5%, and using the mutex to actually solve the race condition adds another 1%.
It might be worth prototyping and measuring solution 2, as the difference between C and D might indicate a high cache-hit rate, which would make solution 2 less problematic. But with virtual threads the cache is not used, so having the GC stack-walk many vThreads while synchronizing on the monitor could be problematic, depending on the number of parallel GC threads/stack-walkers.
I'll analyze the profiles of the different drivers to see if anything might help reduce the 4.5% overhead introduced by the reader/writer mutex change.