
fix(profiler): lock-free class/endpoint/context maps via TripleBufferedDictionary #524

Draft
jbachorik wants to merge 20 commits into main from muse/crash-sigsegv-in-std-rb-tree-increment-clean

Conversation

@jbachorik
Collaborator

@jbachorik jbachorik commented May 12, 2026

What does this PR do?:

Replaces the SpinLock-guarded Dictionary instances for _class_map, _string_label_map, and _context_value_map with a new TripleBufferedDictionary that eliminates all locking from the read/write fast paths.

TripleBufferedDictionary holds three Dictionary buffers cycling through three roles via a generic TripleBufferRotator<T> template:

  • active — receives new writes (signal handlers + JNI threads), lock-free via CAS
  • dump — snapshot being read by the dump thread; promoted from old active on rotate()
  • scratch — two rotations behind active; ready to be cleared lock-free

The "scratch" role exists for safe lock-free reclamation: when a buffer enters that role, at least one full dump cycle has elapsed since it was last in the active or dump role. That grace period is orders of magnitude longer than the time any signal handler (per-thread-locked, drained by lockAll() around the dump) or JNI thread (microsecond lookup) could plausibly hold a stale active pointer, so the buffer can be cleared without any explicit drain.
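The three-role cycle can be sketched as a single atomic index over three buffers. This is an illustrative toy, not the actual TripleBufferRotator<T> from the PR; the role-to-offset mapping is an assumption made for the sketch:

```cpp
#include <atomic>
#include <cstdint>

// Minimal sketch of a three-slot rotator: slot idx%3 is active,
// (idx+2)%3 is the dump snapshot (the previous active), and (idx+1)%3 is
// scratch, two rotations behind active and therefore safe to clear.
template <typename T>
class TripleRotatorSketch {
    T _bufs[3];
    std::atomic<uint32_t> _idx{0};
public:
    T* active()  { return &_bufs[_idx.load(std::memory_order_acquire) % 3]; }
    T* dump()    { return &_bufs[(_idx.load(std::memory_order_acquire) + 2) % 3]; }
    T* scratch() { return &_bufs[(_idx.load(std::memory_order_acquire) + 1) % 3]; }

    // CAS loop so two concurrent rotations cannot both advance from the
    // same index and alias a role.
    void rotate() {
        uint32_t cur = _idx.load(std::memory_order_relaxed);
        while (!_idx.compare_exchange_weak(cur, cur + 1,
                                           std::memory_order_acq_rel)) {
        }
    }
};
```

One rotate() promotes the old active to the dump role and the old dump to scratch, which is exactly the grace-period property the reclamation argument relies on.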

bounded_lookup(size_limit=0) is signal-safe (no malloc) and checks the active buffer only — there is no fallback to older snapshots.
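The shape of a malloc-free, CAS-based insert/lookup can be illustrated with a toy fixed-capacity table. This is a hedged sketch of the general technique only, not the Dictionary or bounded_lookup implementation; all names here are invented for the example:

```cpp
#include <atomic>
#include <cstring>

// Toy open table: writers claim an empty slot with one CAS on the key
// pointer; no allocation ever happens, so the path is signal-safe.
struct ToyDict {
    static constexpr int N = 64;
    std::atomic<const char*> keys[N];

    ToyDict() {
        for (auto& k : keys) k.store(nullptr, std::memory_order_relaxed);
    }

    // Returns the slot index, or -1 when the table is full -- callers
    // must tolerate misses, as with bounded_lookup.
    int insert_or_find(const char* key) {
        for (int i = 0; i < N; i++) {
            const char* cur = keys[i].load(std::memory_order_acquire);
            if (cur == nullptr) {
                const char* expected = nullptr;
                if (keys[i].compare_exchange_strong(expected, key,
                                                    std::memory_order_acq_rel)) {
                    return i;       // claimed an empty slot
                }
                cur = expected;     // another writer won this slot
            }
            if (std::strcmp(cur, key) == 0) return i;
        }
        return -1;
    }
};
```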

As part of this change the following dead code is removed:

  • _class_map_lock (SpinLock)
  • classMapSharedGuard() / classMapTrySharedGuard() on Profiler
  • tryLockSharedBounded() and BoundedOptionalSharedLockGuard on SpinLock

Motivation:

Three production crashes (fingerprint v10.DAECC680F0728EAB44F26DB0B91B703F, 2026-05-06 to 2026-05-08) showed SIGSEGV in std::_Rb_tree_increment via Recording::writeCpool → Recording::writeClasses → Dictionary::collect, caused by a race between writeClasses and concurrent Dictionary::clear().

PR #516 patched this with a shared-lock, but that introduced tryLockSharedBounded(5) in the signal-handler path (walkVM). Under heavy 100 µs wall-clock load on aarch64 the bounded CAS retries were consistently exhausted, causing class lookups to return -1 and corrupting JFR recordings.

This PR also fixes a related counter-tracking gap: dictionary_classes_keys was always 0 during wall-clock profiling because fill-path inserts went to a buffer with counter id=0. All three buffers now carry the real id.

Note: walkVM's vtable-stub class resolution remains best-effort (it can only find classes that some other path has already inserted into the active buffer); a proper fix would require pre-populating the dictionary via JVMTI ClassPrepare, which is left to a follow-up.

Supersedes PR #522 (fix(profiler): fix SIGSEGV in Dictionary::clear under concurrent lookup).

Additional Notes:

  • The grace period for safe lock-free clearing is the time between two consecutive clearStandby() calls (one full dump interval, typically ≥60s). This is many orders of magnitude longer than any signal-handler or JNI lookup, so explicit drains (RefCountGuard, lockAll) are unnecessary for the dictionary clear path.
  • At most two non-empty buffers exist at any time (active + dump).
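The reader-drain mechanism the notes refer to (RefCountGuard) can be sketched as a per-thread published-pointer slot that a rotator scans. Illustrative names only, under the assumption of a fixed thread-slot array; the real ddprof-lib header is refCountGuard.h:

```cpp
#include <atomic>

// Each reader publishes the buffer pointer it is currently using in a
// per-thread slot; a rotator spins until no slot still references the
// retired buffer before handing it off.
struct Slot {
    std::atomic<void*> active_ptr{nullptr};
};

constexpr int kMaxThreads = 8;
Slot g_slots[kMaxThreads];

struct RefGuard {
    Slot& _s;
    RefGuard(Slot& s, void* buf) : _s(s) {
        _s.active_ptr.store(buf, std::memory_order_release);
    }
    ~RefGuard() { _s.active_ptr.store(nullptr, std::memory_order_release); }
};

// Busy-wait until no reader still holds 'retired'.
void waitForRefCountToClear(void* retired) {
    for (int i = 0; i < kMaxThreads; i++) {
        while (g_slots[i].active_ptr.load(std::memory_order_acquire) == retired) {
        }
    }
}
```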

How to test the change?:

  • ddprof-lib:gtestDebug_dictionary_ut — covers TripleBufferedDictionary rotation, counter semantics, and concurrent writer safety.
  • DictionaryRotationTest (Java) — counter reset after clearStandby; correct counts after fill-path inserts.

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

  • This PR doesn't touch any of that.

  • JIRA: [JIRA-XXXX]

@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 12, 2026

CI Test Results

Run: #25924315087 | Commit: 9131fd0 | Duration: 31m 4s (longest job)

All 32 test jobs passed

Status Overview

JDK glibc-aarch64/debug glibc-amd64/debug musl-aarch64/debug musl-amd64/debug
8 - - -
8-ibm - - -
8-j9 - -
8-librca - -
8-orcl - - -
11 - - -
11-j9 - -
11-librca - -
17 - -
17-graal - -
17-j9 - -
17-librca - -
21 - -
21-graal - -
21-librca - -
25 - -
25-graal - -
25-librca - -

Legend: ✅ passed | ❌ failed | ⚪ skipped | 🚫 cancelled

Summary: Total: 32 | Passed: 32 | Failed: 0


Updated: 2026-05-15 15:24:20 UTC

@jbachorik jbachorik force-pushed the muse/crash-sigsegv-in-std-rb-tree-increment-clean branch 2 times, most recently from 2f23bab to 132b472 May 13, 2026 07:42
@jbachorik jbachorik changed the title from fix(profiler): lock-free class/endpoint/context maps via DoubleBufferedDictionary to fix(profiler): lock-free class/endpoint/context maps via TripleBufferedDictionary May 13, 2026
jbachorik and others added 2 commits May 13, 2026 17:38
Production crash (SIGSEGV) in Recording::cleanupUnreferencedMethods,
first seen in dd-trace-java 1.56.1 after PR #327 introduced method-map
cleanup. cleanupUnreferencedMethods() was called after finishChunk()
released the GetLoadedClasses pins, so jvmti->Deallocate(_ptr) inside
~SharedLineNumberTable could access freed line number table memory on
JVMs that reclaim JVMTI allocations on class unload.

Fix: detach SharedLineNumberTable from JVMTI lifetime by copying the
table into a malloc'd buffer in Lookup::fillJavaMethodInfo() and freeing
the JVMTI allocation immediately. SharedLineNumberTable destructor now
calls free(). As defence-in-depth, finishChunk() gains a do_cleanup
parameter so cleanup runs inside the GetLoadedClasses pin window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
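The copy-and-detach pattern this commit describes can be sketched without JVMTI: copy the foreign-owned table into a malloc'd buffer immediately, free (or let the original allocator reclaim) the source, and let the owner's destructor free() the private copy. All types and names below are invented for illustration; the source buffer stands in for a JVMTI allocation:

```cpp
#include <cstdlib>
#include <cstring>

struct LineEntry {
    int start_location;
    int line_number;
};

// Owner whose lifetime is decoupled from the original allocator: it keeps
// a private malloc'd copy and frees only that copy in its destructor.
struct OwnedTable {
    LineEntry* entries = nullptr;
    int count = 0;

    void adopt_by_copy(const LineEntry* raw, int n) {
        count = n;
        entries = static_cast<LineEntry*>(std::malloc(n * sizeof(LineEntry)));
        std::memcpy(entries, raw, n * sizeof(LineEntry));
        // caller releases 'raw' right away; our copy outlives it safely
    }

    ~OwnedTable() { std::free(entries); }
};
```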
On musl/aarch64/JDK11, HotSpot's deoptimisation blob
(generate_deopt_blob in sharedRuntime_aarch64.cpp) rebuilds interpreter
frames near the compiled frame's stack boundary, corrupting the top
~224 bytes of the thread stack where start_routine_wrapper_spec's frame
lives. Two crashes follow:

(a) -fstack-protector-strong inserts a canary into any frame with a
    non-trivially-destructed local (e.g. struct Cleanup); the canary
    lands in the corruption zone and fires __stack_chk_fail.
(b) Even without a canary, 'return' loads the corrupted saved LR and
    jumps to a garbage address.

Fix: no_stack_protector removes the canary; pthread_exit() replaces
'return' so LR is never used; cleanup is performed explicitly with
the tid read from TLS (ProfiledThread::currentTid()), which survives
frame corruption.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik force-pushed the muse/crash-sigsegv-in-std-rb-tree-increment-clean branch from 7844134 to b90761e May 13, 2026 16:15
@jbachorik jbachorik changed the base branch from main to muse/sigsegv-in-recording May 13, 2026 16:15
JavaThread::~JavaThread / OSThread::~OSThread crashed on JDK 25 when the
ddprof pthread_create hook delivered SIGVTALRM between
Profiler::unregisterThread() returning and ProfiledThread::release()
acquiring its internal guard. The signal handler called
currentSignalSafe() and dereferenced the now-freed ProfiledThread.

Fix: extract unregister_and_release(tid) — a noinline helper that holds
a SignalBlocker for the entire unregister+release sequence. Both
start_routine_wrapper and start_routine_wrapper_spec invoke it; the
race window is eliminated without duplicating signal-masking logic.

Same SignalBlocker pattern is applied to perfEvents_linux.cpp's
pthread_setspecific_hook teardown path.

thread.h guards clearCurrentThreadTLS() with #ifdef UNIT_TEST so it
is absent from production builds; GtestTaskBuilder.kt adds -DUNIT_TEST
to the gtest compiler flags so the guarded method compiles in tests.

thread_teardown_safety_ut.cpp adds an acceptance-test suite
(ThreadTeardownSafetyTest T-01..T-10) covering the full teardown
lifecycle under signal load.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
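The SignalBlocker pattern mentioned in this commit is a standard RAII wrapper over pthread_sigmask: block all signals on construction, restore the previous mask on destruction, so the guarded sequence cannot be interrupted by a profiling signal. This is a generic sketch, not the ddprof-lib class:

```cpp
#include <csignal>
#include <pthread.h>

// RAII signal blocker: the whole scope between construction and
// destruction runs with all signals masked on the current thread.
class SignalBlocker {
    sigset_t _old;
public:
    SignalBlocker() {
        sigset_t all;
        sigfillset(&all);
        pthread_sigmask(SIG_BLOCK, &all, &_old);
    }
    ~SignalBlocker() {
        pthread_sigmask(SIG_SETMASK, &_old, nullptr);
    }
};
```

Wrapping the unregister+release sequence in one such scope is what closes the window in which SIGVTALRM could observe a freed ProfiledThread.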
@jbachorik jbachorik force-pushed the muse/crash-sigsegv-in-std-rb-tree-increment-clean branch from b90761e to 76d919d May 13, 2026 16:52
ASSERT_NE expands to a bare 'return;' on failure, which is a compile
error in a function whose return type is void*. Use ADD_FAILURE +
explicit 'return nullptr;' instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik force-pushed the muse/crash-sigsegv-in-std-rb-tree-increment-clean branch from 76d919d to 2105b61 May 14, 2026 17:16
jbachorik and others added 3 commits May 15, 2026 12:01
- libraryPatcher_linux.cpp:151 — add __builtin_unreachable() after pthread_exit()
- flightRecorder.cpp:607 — update pin-window comment to reflect malloc'd ownership
- CleanupAfterClassUnloadTest.java:51 — fix Javadoc to describe both fix mechanisms
- thread_teardown_safety_ut.cpp:43 — add SigGuard RAII to restore signal dispositions
- thread_teardown_safety_ut.cpp:236 — gate T-07 on __GLIBC__; add musl cleanup path

Co-Authored-By: muse <muse@noreply>
- Add pthread_cleanup_push/pop + noinline cleanup_unregister() to
  start_routine_wrapper_spec so ProfiledThread is released when
  the wrapped routine calls pthread_exit() or the thread is canceled
- Extend CleanupAfterClassUnloadTest from AbstractDynamicClassTest to
  reuse generateClassBytecode/IsolatedClassLoader/tempFile helpers

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik force-pushed the muse/crash-sigsegv-in-std-rb-tree-increment-clean branch from 30a5959 to 6645120 May 15, 2026 13:26
jbachorik and others added 2 commits May 15, 2026 15:35
- thread_teardown_safety_ut.cpp: guard T-01 sentinel with ASSERT_NE(kNotYetRun) to short-circuit if handler never ran; replace T-06 absolute mask assertions with relative before/after comparisons
- AbstractDynamicClassTest.java: split compound Label statements to separate lines for Spotless/google-java-format compliance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik requested a review from Copilot May 15, 2026 14:37
@jbachorik
Collaborator Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 66451207c3


Comment on lines +1404 to +1405
    _string_label_map.rotate();
    _context_value_map.rotate();


P1 Badge Rotate label dictionaries only after blocking writers

When a Java thread calls recordTrace0 in the window after these rotations but before lockAll(), bounded_lookup inserts the endpoint/operation into the new active dictionary while recordTraceRoot can still acquire _locks and append the event to the current recording. writeCpool() then serializes only stringLabelMap()->standby() and contextValueMap()->standby() (flightRecorder.cpp:1187-1190), so that recording can contain a trace-root event whose label id is absent from the constant pool. Block/drop event writers before rotating these two dictionaries, or include the post-rotate active labels in the same dump.


    // and after rotate() the new active is already fully populated.
    void rotatePersistent() {
      Dictionary* old_active = _rot.active();
      _rot.clearTarget()->mergeFrom(*old_active);


P1 Badge Drain class-map writers before copying persistent entries

If lookupClass() inserts a class into old_active while this merge is already iterating, the new key can be missed by clearTarget()->mergeFrom(*old_active). After _rot.rotate() that partially copied buffer becomes the active class map, so later signal-handler bounded_lookup(..., 0) calls can miss a class that was registered just before the dump/stop rotation and emit generic/missing vtable frames until it is registered again. The old active should be drained or otherwise made stable before it is used as the source for the persistent copy.


    // on miss; callers must tolerate misses.
    unsigned int bounded_lookup(const char* key, size_t length, int size_limit) {
      Dictionary* active = _rot.active();
      RefCountGuard guard(active);


P1 Badge Avoid reusing one refcount slot for nested dictionary guards

This guard uses the global RefCountGuard slot, which stores only one active_ptr per thread. If a profiling signal interrupts a JNI dictionary lookup (for example recordTrace0/registerConstant0) and performs a class-map bounded_lookup, the inner guard overwrites and then clears the same slot while the outer dictionary lookup is still using its buffer; a concurrent rotation can then stop seeing the outer buffer in waitForRefCountToClear() and clear it underneath the outer lookup. The dictionary path needs either non-reentrant exclusion around these guards or per-thread stacked/nested protected pointers before it can be safely used from both JNI code and signal handlers.


Contributor

Copilot AI left a comment


Pull request overview

Replaces the SpinLock-guarded Dictionary instances (_class_map, _string_label_map, _context_value_map) with a new TripleBufferedDictionary that rotates three Dictionary buffers (active / dump / scratch) under a generic TripleBufferRotator<T> template. Writes go to the active buffer lock-free; the dump thread reads standby() after rotate()/rotatePersistent() drains in-flight readers via RefCountGuard. The previous bounded shared-lock primitives in spinLock.h and the related classMap*Guard() factory methods are removed, and RefCountGuard is generalised (extracted to refCountGuard.h) to operate on void* so it can protect dictionaries as well as call-trace tables.

Changes:

  • New TripleBufferRotator<T> + TripleBufferedDictionary (with rotate, rotatePersistent, clearStandby, clearAll) and a new Dictionary::mergeFrom used by rotatePersistent.
  • Generalised RefCountGuard to void* and extracted it from callTraceStorage.h into refCountGuard.h; removed tryLockSharedBounded / BoundedOptionalSharedLockGuard and the _class_map_lock member.
  • Wired Profiler::dump/stop/start to rotate / clearStandby / clearAll, switched flightRecorder.cpp and hotspotSupport.cpp to read standby() / call bounded_lookup() directly without locking, replaced old gtest suites with a new dictionary_ut.cpp, added a Java DictionaryRotationTest, and disabled ContendedCallTraceStorageTest on musl/aarch64.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
ddprof-lib/src/main/cpp/tripleBuffer.h New generic 3-buffer rotator with atomic CAS rotate.
ddprof-lib/src/main/cpp/refCountGuard.h New header extracting RefCountGuard/RefCountSlot and generalising active_ptr to void*.
ddprof-lib/src/main/cpp/dictionary.h Adds TripleBufferedDictionary, Dictionary::mergeFrom, counterId(), size() accessors.
ddprof-lib/src/main/cpp/dictionary.cpp Implements mergeFrom (recursive re-insert via lookup).
ddprof-lib/src/main/cpp/spinLock.h Removes tryLockSharedBounded and BoundedOptionalSharedLockGuard.
ddprof-lib/src/main/cpp/profiler.h Replaces Dictionary members with TripleBufferedDictionary; drops _class_map_lock + guard factories.
ddprof-lib/src/main/cpp/profiler.cpp Switches start/stop/dump/lookupClass to triple-buffer ops; removes shared-lock dance.
ddprof-lib/src/main/cpp/flightRecorder.cpp writeCpool/writeClasses now read standby() snapshot without lock.
ddprof-lib/src/main/cpp/hotspot/hotspotSupport.cpp Removes classMapTrySharedGuard; uses bounded_lookup directly in vtable-stub path.
ddprof-lib/src/main/cpp/callTraceStorage.h/.cpp Removes the inline RefCountGuard definitions and updates the impl to void* resource.
ddprof-lib/src/test/cpp/dictionary_ut.cpp New gtest covering rotate / clearStandby / merge / counter id / concurrent writes.
ddprof-lib/src/test/cpp/dictionary_concurrent_ut.cpp Removed (covered the old shared-lock contract).
ddprof-lib/src/test/cpp/spinlock_bounded_ut.cpp Removed alongside the deleted bounded shared-lock primitive.
ddprof-test/.../DictionaryRotationTest.java New Java test asserting pre/post-dump separation and counter recalibration.
ddprof-test/.../BoundMethodHandleMetadataSizeTest.java Drops the (now unused) counter-size assertion.
ddprof-test/.../ContendedCallTraceStorageTest.java Skips on musl/aarch64 due to a separately tracked native bug.
AGENTS.md Adds musl/aarch64/JDK11 frame-corruption rule for start_routine_wrapper_spec.


Comment on lines +180 to +186
// Before rotating, all entries from the current active are merged into the
// current clearTarget (the future active after rotation). Signal handlers
// observe no gap: they use the old active — still live during the merge —
// and after rotate() the new active is already fully populated.
void rotatePersistent() {
Dictionary* old_active = _rot.active();
_rot.clearTarget()->mergeFrom(*old_active);
Comment on lines +1082 to +1085
// Reset dictionaries
_class_map.clearAll();
_string_label_map.clearAll();
_context_value_map.clearAll();
Comment on lines +171 to +189
void rotate() {
Dictionary* old_active = _rot.active();
_rot.rotate();
RefCountGuard::waitForRefCountToClear(old_active);
}

// Variant of rotate() for persistent dictionaries (e.g. class map) whose
// entries must survive across dump cycles.
//
// Before rotating, all entries from the current active are merged into the
// current clearTarget (the future active after rotation). Signal handlers
// observe no gap: they use the old active — still live during the merge —
// and after rotate() the new active is already fully populated.
void rotatePersistent() {
Dictionary* old_active = _rot.active();
_rot.clearTarget()->mergeFrom(*old_active);
_rot.rotate();
RefCountGuard::waitForRefCountToClear(old_active);
}
Comment on lines +108 to +132
// Concurrency safety:
// lookup() and bounded_lookup() acquire a per-thread RefCountGuard on the
// active buffer pointer before touching it. rotate() and rotatePersistent()
// call RefCountGuard::waitForRefCountToClear(old_active) after advancing the
// active index, which provably drains all in-flight callers (signal handlers
// AND JNI threads) before the old buffer is handed to the dump thread.
// clearStandby() clears the scratch buffer, which was already drained by the
// rotate() two cycles earlier — no additional drain is needed.
//
// Trace-drop window: RefCountGuard uses a pointer-first activation protocol
// (see refCountGuard.h). In the theoretical window between storing the active
// pointer and incrementing the reference count a scanner could skip the slot.
// In practice signal handlers complete in microseconds and a buffer is only
// cleared after TWO dump cycles (seconds), so this window is never hit.
// Should it occur, bounded_lookup returns INT_MAX (miss) — a dropped trace or
// generic vtable frame — not a crash.
//
// Lifecycle per dump cycle:
// rotate() — advance active; drain old active via RefCountGuard
// standby()->... — dump thread reads stable snapshot
// clearStandby() — clear the scratch buffer (safe: drained two cycles ago)
//
// Memory: at most two non-empty buffers at any time (active + dump).
// Churn: entries purged after at most two full dump cycles.
//
Comment on lines +154 to +160
const DictRow *row = &table->rows[i];
for (int j = 0; j < CELLS; j++) {
if (const char *key = row->keys[j]) {
lookup(key, strlen(key));
}
}
if (row->next) mergeFrom(row->next);
Comment on lines +1411 to 1413
// Note: No need to clear call trace storage here - the triple-buffering in
// processTraces() already handles clearing old traces while preserving
// traces referenced by surviving LivenessTracker objects
Comment on lines +195 to +200
_rot.clearTarget()->clear();
// Dictionary::clear() zeroed the shared DICTIONARY_KEYS slot. Re-set it
// to the active buffer's actual insertion count so that monitoring sees
// only live entries, not the just-cleared scratch buffer's (zero) state.
Counters::set(DICTIONARY_KEYS, _rot.active()->size(), _counter_id);
}
protected boolean isPlatformSupported() {
return !Platform.isJ9(); // Avoid J9-specific issues
// CTimer::unregisterThread races with concurrent thread teardown on musl-aarch64 debug;
// tracked separately as a pre-existing native bug.
Comment on lines 29 to 32
assertTrue(x != 0);
stopProfiler();
verifyEvents("datadog.MethodSample");
Map<String, Long> counters = profiler.getDebugCounters();
assertFalse(counters.isEmpty());
// assert about the size of metadata here
}
Comment on lines +1301 to 1308
// Promote accumulated writes to standby so that writeCpool() (called from
// ~Recording() inside _jfr.stop()) reads a complete, stable snapshot.
_class_map.rotatePersistent();
_string_label_map.rotate();
_context_value_map.rotate();

// Acquire all spinlocks to avoid race with remaining signals
lockAll();
…edDictionary

Replace SpinLock-guarded Dictionary with DoubleBufferedDictionary for
_class_map, _string_label_map and _context_value_map.

Signal handlers and JNI writers write to the active buffer lock-free.
The dump/stop paths call rotate() before lockAll() so the standby buffer
holds a stable snapshot for writeCpool()/writeClasses(); clearStandby()
frees old-active memory and resets counters after the dump completes.

Removes _class_map_lock, classMapSharedGuard(), classMapTrySharedGuard(),
tryLockSharedBounded() and BoundedOptionalSharedLockGuard — all of which
existed solely to protect Dictionary reads from concurrent clear() calls.

Fixes aarch64 regression where tryLockSharedBounded(5) spuriously failed
under heavy wall-clock load, causing class lookups to return -1 and
corrupting JFR recordings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
jbachorik and others added 10 commits May 15, 2026 16:48
J9 uses a different stack-walker that does not populate the HotSpot
class map, so dictionary_classes_keys stays 0. Wrap the regression
assertion with a non-J9 guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dictionary_classes_keys only tracks active-dict inserts from signal
handlers (objectSampler/liveness). For pure wall-clock profiling, all
class name inserts happen via fillJavaMethodInfo() into the standby
(id=0 slot) and never reach the active dict — so the counter is 0 on
both HotSpot and J9. The regression for this path is that
MethodSample events are generated, which verifyEvents() already checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rRotator

- Extract RefCountGuard to standalone refCountGuard.h (void* generalised)
- Add generic TripleBufferRotator<T> template in tripleBuffer.h
- Replace DoubleBufferedDictionary with TripleBufferedDictionary:
  - Three buffers (active/dump/recent) fix vtable-stub class lookup regression
  - bounded_lookup falls back to _recent_ptr for read-only signal-safe path
  - All three buffers carry real counter id (fixes dictionary_classes_keys=0)
  - clearStandby uses waitForRefCountToClear instead of lockAll/unlockAll
- Update profiler.h maps to TripleBufferedDictionary
- Rewrite dictionary_ut.cpp for TripleBufferedDictionary semantics

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ionary

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The recent-fallback in bounded_lookup was not a real fix for walkVM
vtable-stub class resolution: it served data up to a full dump interval
stale, and provided nothing at all in configurations without periodic
dumps. walkVM lookups remain best-effort against the active buffer; a
proper fix requires pre-populating via JVMTI ClassPrepare.

Simplify TripleBufferRotator: drop _recent_ptr, recent(), advanceRecent().
The third buffer's only remaining role is a grace-period delay before
clearing, which is enough on its own to keep the lock-free clear safe
against in-flight JNI lookups.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- tripleBuffer.h: use CAS loop in rotate() to prevent concurrent index aliasing
- dictionary.h: recalibrate DICTIONARY_KEYS counter in clearStandby() to active size;
  add trace-drop window documentation; restore const int _id; remove unused setCounterId()
- profiler.cpp: call clearStandby() after _jfr.stop() to reclaim standby buffers
- refCountGuard.h: use alignas(alignof(void*)) for portable padding; document activation window
- callTraceStorage.cpp: rename standby_table → standby_ptr for consistency
- DictionaryRotationTest.java: remove hardcoded /tmp/recordings path; fix stale Javadoc
- dictionary.cpp: add mergeFrom() implementations (required by rotatePersistent)
- dictionary_ut.cpp: additional counter-id and rotation tests
- AGENTS.md: document musl/aarch64 noinline frame invariant

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…er-stop tests

clearStandby() in stop() zeroed DICTIONARY_ENDPOINTS_KEYS (via _string_label_map,
id=2) after the final JFR rotation, breaking EndpointTest and DictionaryRotationTest
which both check counters after stopProfiler(). Scratch buffers are released by the
next clearAll() before profiler restart; no memory concern.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CTimer::unregisterThread races under concurrent thread teardown on
musl-aarch64 debug — pre-existing bug, tracked separately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbachorik jbachorik force-pushed the muse/crash-sigsegv-in-std-rb-tree-increment-clean branch from 6645120 to 2a9dd08 May 15, 2026 14:49
Base automatically changed from muse/sigsegv-in-recording to main May 15, 2026 16:59