Stack-walker jitInfo race condition #22371

@rmnattas

Issue

Following up from #19249 (comment):

A race condition exists in the stack-walker, specifically when retrieving a method's jitInfo/artifact/metadata.
The AVLTree holding the jitInfo hash-tables is synchronized with VMAccess, and the hash-tables themselves are synchronized with a monitor.
When a stack-walker accesses the hash-tables to retrieve a given jitInfo (at every JIT frame during a walk, on a cache miss), it does not hold the monitor; hence, if the hash-table is being modified concurrently (e.g. the JIT inserting a new compiled method's jitInfo), the stack-walker can retrieve incorrect info.
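To make the hazard concrete, here is a minimal sketch of that pattern. The types and function names are illustrative stand-ins only (not the actual OpenJ9 code), with a plain pthread mutex standing in for the VM monitor: the JIT mutates the table under the monitor while the stack-walker traverses it with no lock at all.

```c
#include <pthread.h>
#include <stddef.h>

/* Illustrative stand-ins for the jitInfo hash-table; not the real J9 types. */
typedef struct Entry { void *methodPC; void *jitInfo; struct Entry *next; } Entry;
typedef struct { Entry *buckets[256]; pthread_mutex_t monitor; } JitInfoTable;

/* JIT side: the hash-table is modified while holding the monitor. */
void insertJitInfo(JitInfoTable *table, Entry *entry, size_t bucket)
{
	pthread_mutex_lock(&table->monitor);
	entry->next = table->buckets[bucket];
	table->buckets[bucket] = entry;
	pthread_mutex_unlock(&table->monitor);
}

/* Stack-walker side (cache miss): the same bucket is traversed with no
 * lock at all, so the walk can observe the table mid-modification and
 * return stale or incorrect jitInfo. */
void *lookupJitInfoUnsynchronized(JitInfoTable *table, void *pc, size_t bucket)
{
	for (Entry *e = table->buckets[bucket]; NULL != e; e = e->next) {
		if (e->methodPC == pc) {
			return e->jitInfo;
		}
	}
	return NULL;
}
```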

Train of thought

Current state

  • CodeCache changes acquire VMExclusiveAccess (modify the AVLTree)
  • JITInfo changes acquire the monitor (modify the hash-table)
  • GC gets VMAccess (not exclusive)
  • Stack-walkers don’t use any synchronization when getting jitInfo
  • There is no need to synchronize AVLTree access when adding/accessing/removing jitInfo, only the hash-table.

Solutions

  1. Use VMExclusiveAccess/VMAccess for jitInfo changes/access (hash-table)
    • Uses the existing reader/writer synchronization infrastructure
    • No need for the monitor if hash-table accesses use VMAccess for synchronization
    • Using VMAccess for hash-table synchronization may unnecessarily hold up other VMExclusiveAccess actions
    • Least intrusive solution(?)
  2. Add the monitor to the stack-walker
    • “Obvious” solution, not performant (serializes different stack-walkers getting jitInfo)
    • Might be better than expected if the cache hits most of the time and the monitor is rarely needed when stack walking(?)
    • Walking a parked virtual thread's continuation does not use the cache.
  3. Use reader/writer synchronization (instead of the normal monitor) for artifact modification/access
    • The “obvious” solution, improved
    • Change the monitor to a reader/writer mutex (see the sketch after this list)
    • Overhead of reader/writer synchronization(?)
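As a rough illustration of solution 3, the change amounts to replacing the single monitor around the hash-table with a reader/writer lock so that concurrent lookups no longer serialize. The sketch below uses a POSIX rwlock purely for illustration; the actual prototype would use the OMR thread library's equivalent, and the function names are placeholders, not the real artifact API.

```c
#include <pthread.h>

/* Illustration only: a POSIX rwlock standing in for the reader/writer
 * mutex that would replace the hash-table monitor. */
static pthread_rwlock_t jitInfoLock = PTHREAD_RWLOCK_INITIALIZER;

/* Artifact insert/remove (e.g. a new compiled method): writer lock. */
void modifyArtifactTable(void)
{
	pthread_rwlock_wrlock(&jitInfoLock);
	/* ... insert or remove the jitInfo entry in the hash-table ... */
	pthread_rwlock_unlock(&jitInfoLock);
}

/* Artifact lookup (including the stack-walker on a cache miss): reader
 * lock, so multiple walkers can proceed concurrently. */
void lookupArtifact(void)
{
	pthread_rwlock_rdlock(&jitInfoLock);
	/* ... look up the jitInfo entry in the hash-table ... */
	pthread_rwlock_unlock(&jitInfoLock);
}
```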

Ended up prototyping solution number 3.

Experiments

Drivers

Testing five different drivers:

  • A: Base/Nightly build
  • B: Replace hash-table monitor with reader/writer mutex
    • Insert/remove artifact gets writer mutex
    • Retrieve artifact gets reader mutex
    • Does not solve the race condition; the stack-walker does not get the mutex
    • Evaluate reader/writer mutex overhead
  • C: Stack-walker gets reader mutex when needed
    • Only gets the mutex when retrieving a jitInfo/artifact, i.e. on a cache miss that goes to the hash-table
    • Potentially many short mutex acquire/release cycles
  • D: Stack-walker always holds the reader mutex
    • Acquires the reader mutex for the entire stack-walk
    • Few long mutex acquire/release cycles (see the sketch after this list)
    • Holds reader access longer than needed
  • E: Solution 2 (use the same monitor in the stack-walker)
    • Will serialize stack-walkers getting jitInfo
    • Only when accessing the hash-table (cache miss)
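The difference between drivers C and D is only where the read lock is taken. A simplified sketch, again with placeholder names and a POSIX rwlock standing in for the prototype's reader/writer mutex:

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_rwlock_t jitInfoLock = PTHREAD_RWLOCK_INITIALIZER;

/* Placeholder for the per-walk jitInfo cache lookup. */
static bool cacheHit(void) { return false; }

/* Driver C: take the read lock only around the hash-table lookup, and
 * only on a cache miss. Many short lock/unlock pairs per walk. */
void walkFrameDriverC(void)
{
	if (!cacheHit()) {
		pthread_rwlock_rdlock(&jitInfoLock);
		/* ... fetch jitInfo from the hash-table, populate the cache ... */
		pthread_rwlock_unlock(&jitInfoLock);
	}
	/* ... process the frame ... */
}

/* Driver D: take the read lock once for the entire walk. One long
 * lock/unlock pair, but the lock is held far longer than needed. */
void walkStackDriverD(void)
{
	pthread_rwlock_rdlock(&jitInfoLock);
	/* ... for each frame: look up jitInfo and process the frame ... */
	pthread_rwlock_unlock(&jitInfoLock);
}
```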

Results (UPDATED)

Driver   SPECjbb2015 max-jOPS   Diff
A        67801                  100.00%
B        68292                  100.73%
C        67699                  99.85%
D        36185                  53.37%
E        67894                  100.14%

Currently looking at profiles; the only immediately apparent observation is that in D, 25% of the time is spent in omrthread_spinlock_acquire.

Conclusion

It seems like C (solution 3) and E (solution 2) are valid solutions.

The best solution is "C": getting the mutex only when needed, even at the cost of more frequent acquires/releases.
The overhead of using a reader/writer mutex instead of the monitor alone is 4.5%, and using the mutex to actually solve the race condition adds another 1%.

It might be worth prototyping and measuring solution number 2, as the difference between C and D might indicate a high cache-hit rate, which would make solution 2 less problematic. But with virtual threads the cache is not used, and having GC stack-walk many vThreads while synchronizing around the monitor might be problematic, depending on the number of parallel GC threads/stack-walkers.

I'll analyze the profiles of the different drivers to see if anything might help reduce the 4.5% overhead caused by the reader/writer mutex change.
