Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272
Add ISOSDacInterface17 for StressLog enumeration via cDAC#129272max-charlamb wants to merge 18 commits into
Conversation
1c1c89a to
714265b
Compare
Add a new ISOSDacInterface17 COM interface that exposes the cDAC's IStressLog contract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT without hardcoded struct offsets. Changes: - Add StartTime (wall-clock FILETIME) field to cDAC data descriptors, Data/StressLog.cs, StressLogData record, and contract implementation - Add StressLog to ContractRegistry - Add Address field to ThreadStressLogData for thread identification - Define ISOSDacInterface17 with IsStressLogAvailable, GetStressLogData, GetStressLogThreadEnumerator, GetStressLogMessageEnumerator, and GetStressLogMemoryRanges - Define ISOSStressLogThreadEnum, ISOSStressLogMsgEnum, and ISOSStressLogMemoryEnum enumerator interfaces - Implement SOSDacImpl bridge delegating to _target.Contracts.StressLog - Add IDL definitions in sospriv.idl (native DAC does not implement this interface; QueryInterface for the IID will fail on the legacy DAC) - Add StressLog debuggee and StressLogDumpTests for dump-based validation - Add design document explaining the architecture and migration path Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
714265b to
b31ec46
Compare
There was a problem hiding this comment.
Pull request overview
This PR extends the cDAC stress log support by (1) enriching the StressLog contract data model (notably adding StartTime and thread-log address identity) and (2) introducing a new ISOSDacInterface17 COM surface (IDL + managed declarations + SOSDacImpl bridge) intended to let SOS/clrmd enumerate stress logs without hardcoded layout offsets. It also adds a new dump-test debuggee plus contract-level dump tests and a detailed design document.
Changes:
- Add
StartTimeto StressLog data descriptors (CoreCLR + NativeAOT) and flow it through the managed contract types. - Add
ISOSDacInterface17+ related StressLog enumerator interfaces/structs and implement the bridge inSOSDacImpl. - Add StressLog dump debuggee and dump-based integration tests for the StressLog contract, plus a design doc.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/DumpTests/StressLogDumpTests.cs | Adds dump-based integration tests for the StressLog contract (availability, header data, basic enumeration). |
| src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/StressLog.csproj | New debuggee project enabling stress logging via environment variables. |
| src/native/managed/cdac/tests/DumpTests/Debuggees/StressLog/Program.cs | New debuggee that generates allocations/GC activity then FailFasts to produce a dump. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/SOSDacImpl.cs | Implements ISOSDacInterface17 and new StressLog enumerator implementations bridging to the cDAC IStressLog contract. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Legacy/ISOSDacInterface.cs | Adds public COM interface/struct declarations for ISOSDacInterface17 and StressLog enumerators. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/StressLog.cs | Adds [Field] StartTime to the StressLog data model. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/StressLog.cs | Plumbs StartTime into StressLogData and includes thread-log address in ThreadStressLogData. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs | Extends StressLogData with StartTime and ThreadStressLogData with Address. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/ContractRegistry.cs | Exposes StressLog via Target.Contracts.StressLog. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Exports StressLog.StartTime for CoreCLR via data descriptors. |
| src/coreclr/nativeaot/Runtime/datadescriptor/datadescriptor.inc | Exports StressLog.StartTime for NativeAOT via data descriptors. |
| src/coreclr/inc/sospriv.idl | Adds IDL definitions for ISOSDacInterface17 and associated StressLog structs/enumerator interfaces. |
| docs/design/datacontracts/stresslog-isosinterface-design.md | Design rationale and planned consumer migration strategy for exposing StressLog via SOS DAC interface. |
|
Is there a specific reason that regular CoreCLR and NativeAOT stress logs must be different? |
- Add [GeneratedComClass] to all three enumerator classes - Use ToTargetPointer for ClrDataAddress comparison - Shrink _lastBatchRaw to actual fetched size - Return ex.HResult instead of E_FAIL in IsStressLogAvailable Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
We should be able to delete https://github.com/max-charlamb/runtime/blob/bd3de83c6195bdbda612e67672fb1248a1014d28/src/coreclr/inc/stresslog.h#L225 once this is in place. (It does not have to be done in this PR.) |
- Remove DacpStressLogMemoryRange, ISOSStressLogMemoryEnum, and GetStressLogMemoryRanges from IDL, managed interface, and impl since no SOS consumer uses them. - Refactor SOSStressLogMsgEnum to eagerly materialize messages into an array, enabling Reset and GetCount support consistent with other ISOSEnum implementations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename DacpStressLogData -> SOSStressLogData, DacpThreadStressLogData -> SOSThreadStressLogData, DacpStressMsgData -> SOSStressMsgData - Define structs inline in sospriv.idl with typedef struct and cpp_quote guards, following the SOSMethodData/SOSMemoryRegion convention used by newer interfaces - Remove Dacp forward declarations and dacprivate.h definitions - Put ISOSDacInterface16, ISOSDacInterface17 on same line in class decl Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Use 'written < count' pattern for S_FALSE (standard COM IEnumXxx convention matching SOSMethodEnum/SOSMemoryEnum) - Return E_POINTER directly for null output in GetStressLogData Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StressMsgHeaderSize, ChunkSize, MaxMessageSize, and PointerSize are implementation details for raw memory walking that the message enumerator now abstracts away. Consumers no longer need them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Callers can just call GetStressLogData/GetStressLogThreadEnumerator directly -- they return S_FALSE when stress log is not enabled. No need for a separate availability check method. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Abstractions/Contracts/IStressLog.cs:28
- These are
publicpositionalrecord structs. Adding new positional parameters (StartTime/Address) is a breaking change for any consumer that constructs, deconstructs, or pattern-matches these records. If this API is intended to be compatible across versions, consider keeping the existing primary constructor shape and adding new members as additional properties (e.g.,public ulong StartTime { get; init; }) instead.
public record struct StressLogData(
uint LoggedFacilities,
uint Level,
uint MaxSizePerThread,
uint MaxSizeTotal,
int TotalChunks,
ulong TickFrequency,
ulong StartTimestamp,
ulong StartTime,
TargetPointer Logs);
public record struct ThreadStressLogData(
TargetPointer Address,
TargetPointer NextPointer,
ulong ThreadId,
bool WriteHasWrapped,
TargetPointer CurrentPointer,
TargetPointer ChunkListHead,
TargetPointer ChunkListTail,
TargetPointer CurrentWriteChunk);
Initialize *pFetched = 0 early in Next and GetArguments so callers see a well-defined value on all failure paths. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Should these unused field be deleted from the data contract as well? |
The datadescriptors are previously existing for the I'm happy to remove the things we don't use, but I'd prefer to do so in a follow-up. |
Sounds good |
Messages with Timestamp==0 can appear at chunk boundaries on x86 with small stress log sizes. These represent unwritten/partial message slots and are valid contract output per the spec. Update tests to find the first message with a non-zero timestamp rather than assuming the first message is valid. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Matches the convention used by other SOSDAC sized-buffer APIs where S_FALSE indicates fewer items returned than requested. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On x86 with small stress logs, partially-written messages can have non-zero timestamps but null format strings due to chunk-boundary wrapping. Look for messages with both valid timestamp and format string. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Temporary diagnostics to understand why x86 can't find valid messages: prints thread count, pointer size, per-thread metadata, and per-message timestamp/format/facility/argcount. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
StressLog.modules is an inline array (ModuleDesc modules[MAX_MODULES]), not a pointer. Using [Field] reads the first bytes of the array as a pointer value, which on x86 happens to be the coreclr base address (from modules[0].baseAddress), causing GetFormatPointer's module table lookup to iterate from the PE header instead of the module descriptors. Using [FieldAddress] returns the address of the inline array itself, which is the correct base for iterating the ModuleDesc entries. Verified fix locally against CI x86 dump that previously failed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
bded221 to
cb02ef6
Compare
Note
This PR was created with assistance from GitHub Copilot.
Summary
Add a new
ISOSDacInterface17COM interface that exposes the cDAC'sIStressLogcontract to SOS and clrmd consumers. This enables reading stress logs from both CoreCLR and NativeAOT processes without hardcoded struct offsets.Motivation
SOS (
!DumpLog) and clrmd currently read stress logs via raw memory reads with hardcoded struct offsets matching the CoreCLR layout. NativeAOT'sStressLogstruct has a different layout (different lock type, no module table, no padding sentinel), so these tools cannot read NativeAOT stress logs.The cDAC already has a working
IStressLogcontract (StressLog_1/StressLog_2) that reads stress logs from both runtimes usingdatadescriptor.incfield offsets. This PR bridges that contract to the COM interface layer so SOS and clrmd can consume it.Changes
cDAC contract updates
StartTime(wall-clock FILETIME) to data descriptors (CoreCLR + NativeAOT),Data/StressLog.cs,StressLogDatarecord, and contract implementationAddressfield toThreadStressLogDatafor thread identification across the COM boundaryStressLogtoContractRegistryISOSDacInterface17 definition
SOSStressLogData,SOSThreadStressLogData,SOSStressMsgData(defined inline in IDL following modern convention)ISOSStressLogThreadEnum,ISOSStressLogMsgEnumISOSDacInterface17:GetStressLogData,GetStressLogThreadEnumerator,GetStressLogMessageEnumeratorS_FALSEwhen stress log is not enabled (no separate availability check needed)SOSDacImpl bridge
Reset/GetCountsupport andGetArgumentsfor per-batch arg retrievalQueryInterfacefor the IID will fail on the legacy DAC, and only the cDAC handles itIDL + testing
sospriv.idlTest Results
All dump tests pass:
StressLogIsAvailable-- PASSEDStressLogDataIsValid-- PASSEDCanEnumerateThreadsAndMessages-- PASSEDRelated PRs
!DumpLogupdated to useISOSDacInterface17with fallback