Skip to content

Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272

Open
max-charlamb wants to merge 18 commits into
dotnet:mainfrom
max-charlamb:maxcharlamb/stresslog-cdac-bridge
Open

Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272
max-charlamb wants to merge 18 commits into
dotnet:mainfrom
max-charlamb:maxcharlamb/stresslog-cdac-bridge

Conversation

@max-charlamb

@max-charlamb max-charlamb commented Jun 11, 2026

Copy link
Copy Markdown
Member

Note

This PR was created with assistance from GitHub Copilot.

Summary

Add a new ISOSDacInterface17 COM interface that exposes the cDAC's IStressLog contract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT processes without hardcoded struct offsets.

Motivation

SOS (!DumpLog) and clrmd currently read stress logs via raw memory reads with hardcoded struct offsets matching the CoreCLR layout. NativeAOT's StressLog struct has a different layout (different lock type, no module table, no padding sentinel), so these tools cannot read NativeAOT stress logs.

The cDAC already has a working IStressLog contract (StressLog_1/StressLog_2) that reads stress logs from both runtimes using datadescriptor.inc field offsets. This PR bridges that contract to the COM interface layer so SOS and clrmd can consume it.

Changes

cDAC contract updates

  • Add StartTime (wall-clock FILETIME) to data descriptors (CoreCLR + NativeAOT), Data/StressLog.cs, StressLogData record, and contract implementation
  • Add Address field to ThreadStressLogData for thread identification across the COM boundary
  • Add StressLog to ContractRegistry

ISOSDacInterface17 definition

  • Data structs: SOSStressLogData, SOSThreadStressLogData, SOSStressMsgData (defined inline in IDL following modern convention)
  • Enumerator interfaces: ISOSStressLogThreadEnum, ISOSStressLogMsgEnum
  • ISOSDacInterface17: GetStressLogData, GetStressLogThreadEnumerator, GetStressLogMessageEnumerator
  • All APIs return S_FALSE when stress log is not enabled (no separate availability check needed)

SOSDacImpl bridge

  • Thread enumerator: materialized array with try/catch COM boundary protection
  • Message enumerator: eagerly materialized with full Reset/GetCount support and GetArguments for per-batch arg retrieval
  • The native DAC does not implement this interface -- QueryInterface for the IID will fail on the legacy DAC, and only the cDAC handles it

IDL + testing

  • Full IDL definitions in sospriv.idl
  • StressLog debuggee + dump-based integration tests validating the contract and the COM interface layer

Test Results

All dump tests pass:

  • StressLogIsAvailable -- PASSED
  • StressLogDataIsValid -- PASSED
  • CanEnumerateThreadsAndMessages -- PASSED

Related PRs

Add a new ISOSDacInterface17 COM interface that exposes the cDAC's
IStressLog contract to SOS and clrmd consumers. This enables reading
stress logs from both CoreCLR and NativeAOT without hardcoded struct
offsets.

Changes:
- Add StartTime (wall-clock FILETIME) field to cDAC data descriptors,
  Data/StressLog.cs, StressLogData record, and contract implementation
- Add StressLog to ContractRegistry
- Add Address field to ThreadStressLogData for thread identification
- Define ISOSDacInterface17 with IsStressLogAvailable, GetStressLogData,
  GetStressLogThreadEnumerator, GetStressLogMessageEnumerator, and
  GetStressLogMemoryRanges
- Define ISOSStressLogThreadEnum, ISOSStressLogMsgEnum, and
  ISOSStressLogMemoryEnum enumerator interfaces
- Implement SOSDacImpl bridge delegating to _target.Contracts.StressLog
- Add IDL definitions in sospriv.idl (native DAC does not implement
  this interface; QueryInterface for the IID will fail on the legacy DAC)
- Add StressLog debuggee and StressLogDumpTests for dump-based validation
- Add design document explaining the architecture and migration path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the maxcharlamb/stresslog-cdac-bridge branch from 714265b to b31ec46 Compare June 11, 2026 02:34

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the cDAC stress log support by (1) enriching the StressLog contract data model (notably adding StartTime and thread-log address identity) and (2) introducing a new ISOSDacInterface17 COM surface (IDL + managed declarations + SOSDacImpl bridge) intended to let SOS/clrmd enumerate stress logs without hardcoded layout offsets. It also adds a new dump-test debuggee plus contract-level dump tests and a detailed design document.

Changes:

  • Add StartTime to StressLog data descriptors (CoreCLR + NativeAOT) and flow it through the managed contract types.
  • Add ISOSDacInterface17 + related StressLog enumerator interfaces/structs and implement the bridge in SOSDacImpl.
  • Add StressLog dump debuggee and dump-based integration tests for the StressLog contract, plus a design doc.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs Adds dump-based integration tests for the StressLog contract (availability, header data, basic enumeration).
src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/StressLog.csproj New debuggee project enabling stress logging via environment variables.
src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/Program.cs New debuggee that generates allocations/GC activity then FailFasts to produce a dump.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs Implements ISOSDacInterface17 and new StressLog enumerator implementations bridging to the cDAC IStressLog contract.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/ISOSDacInterface.cs Adds public COM interface/struct declarations for ISOSDacInterface17 and StressLog enumerators.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/StressLog.cs Adds [Field] StartTime to the StressLog data model.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StressLog.cs Plumbs StartTime into StressLogData and includes thread-log address in ThreadStressLogData.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs Extends StressLogData with StartTime and ThreadStressLogData with Address.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs Exposes StressLog via Target.Contracts.StressLog.
src/coreclr/vm/datadescriptor/datadescriptor.inc Exports StressLog.StartTime for CoreCLR via data descriptors.
src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc Exports StressLog.StartTime for NativeAOT via data descriptors.
src/coreclr/inc/sospriv.idl Adds IDL definitions for ISOSDacInterface17 and associated StressLog structs/enumerator interfaces.
docs/design/datacontracts/stresslog-isosinterface-design.md Design rationale and planned consumer migration strategy for exposing StressLog via SOS DAC interface.

@jkotas

jkotas commented Jun 11, 2026

Copy link
Copy Markdown
Member

Is there a specific reason that regular CoreCLR and NativeAOT stress logs must be different?

- Add [GeneratedComClass] to all three enumerator classes
- Use ToTargetPointer for ClrDataAddress comparison
- Shrink _lastBatchRaw to actual fetched size
- Return ex.HResult instead of E_FAIL in IsStressLogAvailable

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 15:21

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

@jkotas

jkotas commented Jun 11, 2026

Copy link
Copy Markdown
Member

We should be able to delete https://github.com/max-charlamb/runtime/blob/bd3de83c6195bdbda612e67672fb1248a1014d28/src/coreclr/inc/stresslog.h#L225 once this is in place. (It does not have to be done in this PR.)

Max Charlamb and others added 3 commits June 16, 2026 11:50
- Remove DacpStressLogMemoryRange, ISOSStressLogMemoryEnum, and
  GetStressLogMemoryRanges from IDL, managed interface, and impl
  since no SOS consumer uses them.
- Refactor SOSStressLogMsgEnum to eagerly materialize messages into
  an array, enabling Reset and GetCount support consistent with
  other ISOSEnum implementations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename DacpStressLogData -> SOSStressLogData,
  DacpThreadStressLogData -> SOSThreadStressLogData,
  DacpStressMsgData -> SOSStressMsgData
- Define structs inline in sospriv.idl with typedef struct and
  cpp_quote guards, following the SOSMethodData/SOSMemoryRegion
  convention used by newer interfaces
- Remove Dacp forward declarations and dacprivate.h definitions
- Put ISOSDacInterface16, ISOSDacInterface17 on same line in class decl

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use 'written < count' pattern for S_FALSE (standard COM IEnumXxx
  convention matching SOSMethodEnum/SOSMemoryEnum)
- Return E_POINTER directly for null output in GetStressLogData

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 13:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Max Charlamb and others added 2 commits June 17, 2026 10:13
StressMsgHeaderSize, ChunkSize, MaxMessageSize, and PointerSize are
implementation details for raw memory walking that the message
enumerator now abstracts away. Consumers no longer need them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Callers can just call GetStressLogData/GetStressLogThreadEnumerator
directly -- they return S_FALSE when stress log is not enabled.
No need for a separate availability check method.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 14:18
Copilot AI review requested due to automatic review settings June 17, 2026 21:51

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28

  • These are public positional record structs. Adding new positional parameters (StartTime / Address) is a breaking change for any consumer that constructs, deconstructs, or pattern-matches these records. If this API is intended to be compatible across versions, consider keeping the existing primary constructor shape and adding new members as additional properties (e.g., public ulong StartTime { get; init; }) instead.
public record struct StressLogData(
    uint LoggedFacilities,
    uint Level,
    uint MaxSizePerThread,
    uint MaxSizeTotal,
    int TotalChunks,
    ulong TickFrequency,
    ulong StartTimestamp,
    ulong StartTime,
    TargetPointer Logs);

public record struct ThreadStressLogData(
    TargetPointer Address,
    TargetPointer NextPointer,
    ulong ThreadId,
    bool WriteHasWrapped,
    TargetPointer CurrentPointer,
    TargetPointer ChunkListHead,
    TargetPointer ChunkListTail,
    TargetPointer CurrentWriteChunk);

Initialize *pFetched = 0 early in Next and GetArguments so callers
see a well-defined value on all failure paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb requested a review from noahfalk June 17, 2026 22:08
@jkotas

jkotas commented Jun 17, 2026

Copy link
Copy Markdown
Member

Remove unused fields from SOS stress log structs

Should these unused field be deleted from the data contract as well?

@max-charlamb

Copy link
Copy Markdown
Member Author

Remove unused fields from SOS stress log structs

Should these unused field be deleted from the data contract as well?

The datadescriptors are previously existing for the IStressLog when used by the StressLogAnalzer.csproj tool. I looked into the source there are we do seem to be leaking more data than we need in the contract.

I'm happy to remove the things we don't use, but I'd prefer to do so in a follow-up.

@jkotas

jkotas commented Jun 17, 2026

Copy link
Copy Markdown
Member

I'm happy to remove the things we don't use, but I'd prefer to do so in a follow-up.

Sounds good

Messages with Timestamp==0 can appear at chunk boundaries on x86 with
small stress log sizes. These represent unwritten/partial message slots
and are valid contract output per the spec. Update tests to find the
first message with a non-zero timestamp rather than assuming the first
message is valid.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 00:35

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

@noahfalk noahfalk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Matches the convention used by other SOSDAC sized-buffer APIs where
S_FALSE indicates fewer items returned than requested.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@steveisok steveisok self-requested a review June 18, 2026 15:02
On x86 with small stress logs, partially-written messages can have
non-zero timestamps but null format strings due to chunk-boundary
wrapping. Look for messages with both valid timestamp and format
string.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 15:41

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Comment thread src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs
Comment thread src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs
Comment thread src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs Outdated
Temporary diagnostics to understand why x86 can't find valid
messages: prints thread count, pointer size, per-thread metadata,
and per-message timestamp/format/facility/argcount.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 18:50

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

StressLog.modules is an inline array (ModuleDesc modules[MAX_MODULES]),
not a pointer. Using [Field] reads the first bytes of the array as a
pointer value, which on x86 happens to be the coreclr base address
(from modules[0].baseAddress), causing GetFormatPointer's module table
lookup to iterate from the PE header instead of the module descriptors.

Using [FieldAddress] returns the address of the inline array itself,
which is the correct base for iterating the ModuleDesc entries.

Verified fix locally against CI x86 dump that previously failed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the maxcharlamb/stresslog-cdac-bridge branch from bded221 to cb02ef6 Compare June 18, 2026 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants