fix(profiler): TripleBufferedDictionary — fix vtable-stub class lookup and counter tracking#525
fix(profiler): TripleBufferedDictionary — fix vtable-stub class lookup and counter tracking#525jbachorik wants to merge 4 commits into
Conversation
…edDictionary Replace SpinLock-guarded Dictionary with DoubleBufferedDictionary for _class_map, _string_label_map and _context_value_map. Signal handlers and JNI writers write to the active buffer lock-free. The dump/stop paths call rotate() before lockAll() so the standby buffer holds a stable snapshot for writeCpool()/writeClasses(); clearStandby() frees old-active memory and resets counters after the dump completes. Removes _class_map_lock, classMapSharedGuard(), classMapTrySharedGuard(), tryLockSharedBounded() and BoundedOptionalSharedLockGuard — all of which existed solely to protect Dictionary reads from concurrent clear() calls. Fixes aarch64 regression where tryLockSharedBounded(5) spuriously failed under heavy wall-clock load, causing class lookups to return -1 and corrupting JFR recordings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
J9 uses a different stack-walker that does not populate the HotSpot class map, so dictionary_classes_keys stays 0. Wrap the regression assertion with a non-J9 guard. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dictionary_classes_keys only tracks active-dict inserts from signal handlers (objectSampler/liveness). For pure wall-clock profiling, all class name inserts happen via fillJavaMethodInfo() into the standby (id=0 slot) and never reach the active dict — so the counter is 0 on both HotSpot and J9. The regression for this path is that MethodSample events are generated, which verifyEvents() already checks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rRotator - Extract RefCountGuard to standalone refCountGuard.h (void* generalised) - Add generic TripleBufferRotator<T> template in tripleBuffer.h - Replace DoubleBufferedDictionary with TripleBufferedDictionary: - Three buffers (active/dump/recent) fix vtable-stub class lookup regression - bounded_lookup falls back to _recent_ptr for read-only signal-safe path - All three buffers carry real counter id (fixes dictionary_classes_keys=0) - clearStandby uses waitForRefCountToClear instead of lockAll/unlockAll - Update profiler.h maps to TripleBufferedDictionary - Rewrite dictionary_ut.cpp for TripleBufferedDictionary semantics Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Superseded by PR #524 — the TripleBufferedDictionary commit was pushed directly to that branch. |
CI Test ResultsRun: #25748670182 | Commit:
Status Overview
Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled Failed Testsmusl-amd64/debug / 21-librcaJob: View logs No detailed failure information available. Check the job logs. musl-amd64/debug / 17-librcaJob: View logs No detailed failure information available. Check the job logs. musl-aarch64/debug / 21-librcaJob: View logs No detailed failure information available. Check the job logs. musl-aarch64/debug / 17-librcaJob: View logs No detailed failure information available. Check the job logs. musl-amd64/debug / 25-librcaJob: View logs No detailed failure information available. Check the job logs. musl-aarch64/debug / 11-librcaJob: View logs No detailed failure information available. Check the job logs. musl-aarch64/debug / 25-librcaJob: View logs No detailed failure information available. Check the job logs. musl-amd64/debug / 11-librcaJob: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 8-ibmJob: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 17Job: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 17-graalJob: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 11-j9Job: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 17-j9Job: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 8-j9Job: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 8-orclJob: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 21-graalJob: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 21Job: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 25Job: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 11Job: View logs No detailed failure information available. Check the job logs. glibc-amd64/debug / 25-graalJob: View logs No detailed failure information available. Check the job logs. Summary: Total: 32 | Passed: 11 | Failed: 21 Updated: 2026-05-12 17:39:08 UTC |
| // Advance _active_index by 1 mod 3. | ||
| // After this call the old active is accessible via dumpBuffer(). | ||
| void rotate() { | ||
| int old = __atomic_load_n(&_active_index, __ATOMIC_ACQUIRE); |
There was a problem hiding this comment.
Non-atomic rotate() — concurrent callers can alias the same index
rotate() performs a non-atomic load-then-store on _active_index. If stop() and dump() race to call it concurrently, both can read the same old value and advance to the same next index, silently skipping a rotation step.
Applied as code fix — replaced with a CAS loop so each concurrent caller advances by exactly one step.
|
|
||
| // Clears all three buffers. Call on profiler reset (no concurrent writers). | ||
| void clearAll() { | ||
| _a.clear(); |
There was a problem hiding this comment.
DICTIONARY_KEYS counter zeroed incorrectly in clearStandby()
clearStandby() calls Dictionary::clear() which resets the shared DICTIONARY_KEYS counter slot to zero. Any insertions into the current active buffer that happened after the preceding rotate() — but before clearStandby() — are silently discarded from the counter.
Fixed per user guidance: "Fix the keys tracking"
Applied as code fix — after clear() the counter is recalibrated to active()->size() so monitoring sees only live entries.
|
|
||
| // Promote accumulated writes to standby so that writeCpool() (called from | ||
| // ~Recording() inside _jfr.stop()) reads a complete, stable snapshot. | ||
| _class_map.rotate(); |
There was a problem hiding this comment.
stop() never reclaims standby buffers — memory held until next restart
stop() rotates all three maps to give JFR a stable snapshot, but never calls clearStandby(). The standby buffers (holding the last pre-stop entries) are not freed until clearAll() on the next start(), leaking memory for the full IDLE period.
Applied as code fix — added clearStandby() calls after _jfr.stop(), mirroring the dump() path.
|
|
||
| // dump() triggers: rotate() → lockAll() → jfr.dump(snapshot) → unlockAll() | ||
| // → clearStandby() (resets counter to 0, frees pre-dump buffer) | ||
| Path snapshot = Files.createTempFile(Paths.get("/tmp/recordings"), "DictionaryRotation_snapshot_", ".jfr"); |
There was a problem hiding this comment.
Hardcoded /tmp/recordings path may not exist in CI
Paths.get("/tmp/recordings") fails with NoSuchFileException in CI environments that do not pre-create this directory.
Applied as code fix — changed to Files.createTempFile("DictionaryRotation_snapshot_", ".jfr") with no directory argument, which uses the JVM default temp dir.
| import static org.openjdk.jmc.common.unit.UnitLookup.PLAIN_TEXT; | ||
|
|
||
| /** | ||
| * Verifies that the DoubleBufferedDictionary rotate+clearStandby cycle correctly: |
There was a problem hiding this comment.
Stale class name in Javadoc
The class Javadoc says DoubleBufferedDictionary but the production class is TripleBufferedDictionary.
Applied as code fix.
| active_table->setInstanceId(getNextInstanceId()); | ||
| active_table->setParentStorage(this); | ||
| __atomic_store_n(&_active_storage, active_table.release(), __ATOMIC_RELEASE); | ||
| auto active_ptr = std::make_unique<CallTraceHashTable>(); |
There was a problem hiding this comment.
Inconsistent naming: active_ptr / standby_table in the same block
active_ptr was renamed but the adjacent standby_table was not, creating an inconsistent naming convention in the same initialization block.
Applied as code fix — renamed standby_table to standby_ptr.
| }; | ||
|
|
||
| // Triple-buffered wrapper for signal-handler-safe concurrent dictionary access. | ||
| // |
There was a problem hiding this comment.
Dictionary::_id should be const
_id was changed from const int to int to support setCounterId(), but setCounterId() is not called anywhere in production or test code, so the non-const field only weakens the compile-time immutability invariant.
Applied as code fix — restored const int _id and removed unused setCounterId().
| */ | ||
| struct alignas(DEFAULT_CACHE_LINE_SIZE) RefCountSlot { | ||
| volatile uint32_t count; // Reference count (0 = inactive) | ||
| char _padding1[4]; // Alignment padding for pointer |
There was a problem hiding this comment.
Hard-coded _padding1[4] assumes 64-bit ABI
_padding1[4] aligns active_ptr on an 8-byte boundary on 64-bit, but on a 32-bit ABI where void* is 4 bytes the padding is unnecessary and the remaining[] calculation silently over-fills.
Applied as code fix — replaced with alignas(alignof(void*)) void* active_ptr and a portable padding formula DEFAULT_CACHE_LINE_SIZE - alignof(void*) - sizeof(void*).
|
RefCountGuard activation window — trace-drop, not UAF The In practice this window is never hit for signal handlers: signal handlers complete within microseconds, while Observable effect if the window were hit:
|
What does this PR do?:
Replaces
DoubleBufferedDictionarywithTripleBufferedDictionarybacked by a genericTripleBufferRotator<T>template, fixing two regressions introduced by the double-buffered class/endpoint/context maps:Vtable-stub class lookup regression (
walkVMbounded_lookup): with two buffers, class names inserted byfillJavaMethodInfo()only land in the dump buffer and are invisible to the active buffer between dumps. The triple-buffer adds a_recent_ptr(last completed dump's snapshot) thatbounded_lookup(size_limit=0)falls back to when the active buffer misses. The fallback is guarded byRefCountGuardwith TOCTOU revalidation (pointer-first protocol), identical to theCallTraceStoragepattern.dictionary_classes_keyscounter always 0 during wall-clock profiling: fill-path inserts went to the standby buffer whose counter id was 0 (not the named slot). All threeTripleBufferedDictionarybuffers now carry the real counter id so every insert is counted correctly.Also extracts
RefCountGuardinto a standalonerefCountGuard.h(generalised fromCallTraceHashTable*tovoid*) and the triple-buffer rotation logic into a genericTripleBufferRotator<T>template intripleBuffer.h, shared betweenTripleBufferedDictionaryand (existing)CallTraceStorage.Motivation:
Follow-up fixes for PR #524. Synthetic-class churn means we cannot keep the class map growing forever; the triple-buffer bounds memory to ~2× one dump cycle's data and purges dead class names after at most one full dump cycle.
Additional Notes:
clearStandby()usesRefCountGuard::waitForRefCountToClear(old_recent)instead oflockAll()/unlockAll()— the lock-free drain is sufficient and avoids stalling all signal handlers.bounded_lookup(size_limit=0)is read-only (no malloc) and therefore signal-safe; the recent-fallback path preserves this property.How to test the change?:
ddprof-lib:gtestDebug_dictionary_ut— fully rewritten forTripleBufferedDictionarysemantics; includesBoundedLookupReadOnlyFallsBackToRecentwhich verifies keys survive exactly one dump cycle then disappear.DictionaryRotationTest(Java integration) — asserts counter resets afterclearStandbyand correct counts after fill-path inserts.For Datadog employees:
credentials of any kind, I've requested a review from
@DataDog/security-design-and-guidance.