Skip to content

fix(profiler): TripleBufferedDictionary — fix vtable-stub class lookup and counter tracking#525

Closed
jbachorik wants to merge 4 commits into
mainfrom
fix/double-buffered-dictionary
Closed

fix(profiler): TripleBufferedDictionary — fix vtable-stub class lookup and counter tracking#525
jbachorik wants to merge 4 commits into
mainfrom
fix/double-buffered-dictionary

Conversation

@jbachorik
Copy link
Copy Markdown
Collaborator

What does this PR do?:

Replaces DoubleBufferedDictionary with TripleBufferedDictionary backed by a generic TripleBufferRotator<T> template, fixing two regressions introduced by the double-buffered class/endpoint/context maps:

  1. Vtable-stub class lookup regression (walkVM bounded_lookup): with two buffers, class names inserted by fillJavaMethodInfo() only land in the dump buffer and are invisible to the active buffer between dumps. The triple-buffer adds a _recent_ptr (last completed dump's snapshot) that bounded_lookup(size_limit=0) falls back to when the active buffer misses. The fallback is guarded by RefCountGuard with TOCTOU revalidation (pointer-first protocol), identical to the CallTraceStorage pattern.

  2. dictionary_classes_keys counter always 0 during wall-clock profiling: fill-path inserts went to the standby buffer whose counter id was 0 (not the named slot). All three TripleBufferedDictionary buffers now carry the real counter id so every insert is counted correctly.

Also extracts RefCountGuard into a standalone refCountGuard.h (generalised from CallTraceHashTable* to void*) and the triple-buffer rotation logic into a generic TripleBufferRotator<T> template in tripleBuffer.h, shared between TripleBufferedDictionary and (existing) CallTraceStorage.

Motivation:

Follow-up fixes for PR #524. Synthetic-class churn means we cannot keep the class map growing forever; the triple-buffer bounds memory to ~2× one dump cycle's data and purges dead class names after at most one full dump cycle.

Additional Notes:

  • clearStandby() uses RefCountGuard::waitForRefCountToClear(old_recent) instead of lockAll()/unlockAll() — the lock-free drain is sufficient and avoids stalling all signal handlers.
  • bounded_lookup(size_limit=0) is read-only (no malloc) and therefore signal-safe; the recent-fallback path preserves this property.
  • The three buffer roles cycle: active → dump → recent → (cleared). At most two non-empty buffers exist at any time.

How to test the change?:

  • ddprof-lib:gtestDebug_dictionary_ut — fully rewritten for TripleBufferedDictionary semantics; includes BoundedLookupReadOnlyFallsBackToRecent which verifies keys survive exactly one dump cycle then disappear.
  • DictionaryRotationTest (Java integration) — asserts counter resets after clearStandby and correct counts after fill-path inserts.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: [JIRA-XXXX]

jbachorik and others added 4 commits May 12, 2026 15:04
…edDictionary

Replace SpinLock-guarded Dictionary with DoubleBufferedDictionary for
_class_map, _string_label_map and _context_value_map.

Signal handlers and JNI writers write to the active buffer lock-free.
The dump/stop paths call rotate() before lockAll() so the standby buffer
holds a stable snapshot for writeCpool()/writeClasses(); clearStandby()
frees old-active memory and resets counters after the dump completes.

Removes _class_map_lock, classMapSharedGuard(), classMapTrySharedGuard(),
tryLockSharedBounded() and BoundedOptionalSharedLockGuard — all of which
existed solely to protect Dictionary reads from concurrent clear() calls.

Fixes aarch64 regression where tryLockSharedBounded(5) spuriously failed
under heavy wall-clock load, causing class lookups to return -1 and
corrupting JFR recordings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
J9 uses a different stack-walker that does not populate the HotSpot
class map, so dictionary_classes_keys stays 0. Wrap the regression
assertion with a non-J9 guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dictionary_classes_keys only tracks active-dict inserts from signal
handlers (objectSampler/liveness). For pure wall-clock profiling, all
class name inserts happen via fillJavaMethodInfo() into the standby
(id=0 slot) and never reach the active dict — so the counter is 0 on
both HotSpot and J9. The regression for this path is that
MethodSample events are generated, which verifyEvents() already checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rRotator

- Extract RefCountGuard to standalone refCountGuard.h (void* generalised)
- Add generic TripleBufferRotator<T> template in tripleBuffer.h
- Replace DoubleBufferedDictionary with TripleBufferedDictionary:
  - Three buffers (active/dump/recent) fix vtable-stub class lookup regression
  - bounded_lookup falls back to _recent_ptr for read-only signal-safe path
  - All three buffers carry real counter id (fixes dictionary_classes_keys=0)
  - clearStandby uses waitForRefCountToClear instead of lockAll/unlockAll
- Update profiler.h maps to TripleBufferedDictionary
- Rewrite dictionary_ut.cpp for TripleBufferedDictionary semantics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik added the AI label May 12, 2026
@jbachorik
Copy link
Copy Markdown
Collaborator Author

Superseded by PR #524 — the TripleBufferedDictionary commit was pushed directly to that branch.

@jbachorik jbachorik closed this May 12, 2026
@dd-octo-sts
Copy link
Copy Markdown
Contributor

dd-octo-sts Bot commented May 12, 2026

CI Test Results

Run: #25748670182 | Commit: e576438 | Duration: 51m 2s (longest job)

21 of 32 test jobs failed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Failed Tests

musl-amd64/debug / 21-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-amd64/debug / 17-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-aarch64/debug / 21-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-aarch64/debug / 17-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-amd64/debug / 25-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-aarch64/debug / 11-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-aarch64/debug / 25-librca

Job: View logs

No detailed failure information available. Check the job logs.

musl-amd64/debug / 11-librca

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 8-ibm

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 8

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 17

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 17-graal

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 11-j9

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 17-j9

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 8-j9

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 8-orcl

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 21-graal

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 21

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 25

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 11

Job: View logs

No detailed failure information available. Check the job logs.

glibc-amd64/debug / 25-graal

Job: View logs

No detailed failure information available. Check the job logs.

Summary: Total: 32 | Passed: 11 | Failed: 21


Updated: 2026-05-12 17:39:08 UTC

// Advance _active_index by 1 mod 3.
// After this call the old active is accessible via dumpBuffer().
void rotate() {
int old = __atomic_load_n(&_active_index, __ATOMIC_ACQUIRE);
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-atomic rotate() — concurrent callers can alias the same index

rotate() performs a non-atomic load-then-store on _active_index. If stop() and dump() race to call it concurrently, both can read the same old value and advance to the same next index, silently skipping a rotation step.

Applied as code fix — replaced with a CAS loop so each concurrent caller advances by exactly one step.


// Clears all three buffers. Call on profiler reset (no concurrent writers).
void clearAll() {
_a.clear();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DICTIONARY_KEYS counter zeroed incorrectly in clearStandby()

clearStandby() calls Dictionary::clear() which resets the shared DICTIONARY_KEYS counter slot to zero. Any insertions into the current active buffer that happened after the preceding rotate() — but before clearStandby() — are silently discarded from the counter.

Fixed per user guidance: "Fix the keys tracking"

Applied as code fix — after clear() the counter is recalibrated to active()->size() so monitoring sees only live entries.


// Promote accumulated writes to standby so that writeCpool() (called from
// ~Recording() inside _jfr.stop()) reads a complete, stable snapshot.
_class_map.rotate();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop() never reclaims standby buffers — memory held until next restart

stop() rotates all three maps to give JFR a stable snapshot, but never calls clearStandby(). The standby buffers (holding the last pre-stop entries) are not freed until clearAll() on the next start(), leaking memory for the full IDLE period.

Applied as code fix — added clearStandby() calls after _jfr.stop(), mirroring the dump() path.


// dump() triggers: rotate() → lockAll() → jfr.dump(snapshot) → unlockAll()
// → clearStandby() (resets counter to 0, frees pre-dump buffer)
Path snapshot = Files.createTempFile(Paths.get("/tmp/recordings"), "DictionaryRotation_snapshot_", ".jfr");
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded /tmp/recordings path may not exist in CI

Paths.get("/tmp/recordings") fails with NoSuchFileException in CI environments that do not pre-create this directory.

Applied as code fix — changed to Files.createTempFile("DictionaryRotation_snapshot_", ".jfr") with no directory argument, which uses the JVM default temp dir.

import static org.openjdk.jmc.common.unit.UnitLookup.PLAIN_TEXT;

/**
* Verifies that the DoubleBufferedDictionary rotate+clearStandby cycle correctly:
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale class name in Javadoc

The class Javadoc says DoubleBufferedDictionary but the production class is TripleBufferedDictionary.

Applied as code fix.

active_table->setInstanceId(getNextInstanceId());
active_table->setParentStorage(this);
__atomic_store_n(&_active_storage, active_table.release(), __ATOMIC_RELEASE);
auto active_ptr = std::make_unique<CallTraceHashTable>();
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent naming: active_ptr / standby_table in the same block

active_ptr was renamed but the adjacent standby_table was not, creating an inconsistent naming convention in the same initialization block.

Applied as code fix — renamed standby_table to standby_ptr.

};

// Triple-buffered wrapper for signal-handler-safe concurrent dictionary access.
//
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dictionary::_id should be const

_id was changed from const int to int to support setCounterId(), but setCounterId() is not called anywhere in production or test code, so the non-const field only weakens the compile-time immutability invariant.

Applied as code fix — restored const int _id and removed unused setCounterId().

*/
struct alignas(DEFAULT_CACHE_LINE_SIZE) RefCountSlot {
volatile uint32_t count; // Reference count (0 = inactive)
char _padding1[4]; // Alignment padding for pointer
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coded _padding1[4] assumes 64-bit ABI

_padding1[4] aligns active_ptr on an 8-byte boundary on 64-bit, but on a 32-bit ABI where void* is 4 bytes the padding is unnecessary and the remaining[] calculation silently over-fills.

Applied as code fix — replaced with alignas(alignof(void*)) void* active_ptr and a portable padding formula DEFAULT_CACHE_LINE_SIZE - alignof(void*) - sizeof(void*).

@jbachorik
Copy link
Copy Markdown
Collaborator Author

RefCountGuard activation window — trace-drop, not UAF

The RefCountGuard constructor stores the active pointer before incrementing the reference count. There is a theoretical window between these two operations where a scanner (calling waitForRefCountToClear) could see count=0 and skip the slot, allowing clearStandby() to proceed two dump cycles later while the slot is in the process of being activated.

In practice this window is never hit for signal handlers: signal handlers complete within microseconds, while clearStandby() can only be called after two full dump cycles (typically seconds apart).

Observable effect if the window were hit: bounded_lookup returns INT_MAX (miss) — a dropped trace or a generic vtable frame, not a crash or heap corruption. This is the graceful degradation path already in use today.

Fixed per user guidance: confirmed as observable incorrect-state / trace-drop, not a hard UAF. Updated comments in refCountGuard.h and TripleBufferedDictionary to document the activation window and its bounded, graceful consequence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant