Rename dump tensor surface to dump args#1072
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughRenames the per-task diagnostic dump feature from "tensor dump" to "args dump" across the entire stack: the shared AICPU C API header and implementation, host collector output paths and manifest schema, ChangesArgs Dump Rename
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request renames the "tensor dump" feature to "args dump" across the codebase, updating documentation, CLI flags, output directories, artifact names, and internal APIs. The review feedback identifies a potential race condition and state leakage issue on AICPU due to the lazy allocation of global tables, suggesting eager allocation and cleanup during initialization and deinitialization to ensure thread safety and prevent memory leaks.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
1c48e2a to
cd63d98
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
.github/workflows/ci.yml (1)
1-14:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestrict default
GITHUB_TOKENpermissions for this workflow.Line 1 currently has no top-level
permissionsblock, so jobs inherit default token scopes. That leaves this CI pipeline broader than needed for PR validation.Suggested hardening
name: CI + +permissions: + contents: read + pull-requests: read🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 1 - 14, The CI workflow in .github/workflows/ci.yml lacks a top-level permissions block, which causes jobs to inherit default GITHUB_TOKEN scopes with broader access than needed. Add a top-level permissions block immediately after the name field (before the `on` section) to explicitly restrict token permissions to only what is required for pull request validation, such as read-only access to repository contents and write access only to pull request checks if needed.Source: Linters/SAST tools
src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp (1)
527-545:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winAdd a hard upper-bound check for
num_dump_threadsbefore indexing fixed arrays.Line 544 iterates to
num_dump_threadsbut writes into static arrays sized byPLATFORM_MAX_AICPU_THREADS. Without validation, oversized thread counts can corrupt memory.Proposed patch
void dump_args_init(int num_dump_threads) { + if (num_dump_threads <= 0 || num_dump_threads > PLATFORM_MAX_AICPU_THREADS) { + LOG_ERROR( + "invalid num_dump_threads=%d (expected 1..%d)", num_dump_threads, PLATFORM_MAX_AICPU_THREADS + ); + return; + } + void *dump_base = reinterpret_cast<void *>(get_platform_dump_base()); if (dump_base == nullptr) { LOG_ERROR("platform dump base is NULL, cannot initialize args dump"); return; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp` around lines 527 - 545, The function dump_args_init uses the num_dump_threads parameter to iterate and index into fixed-size arrays in the for loop (starting at line 544 with `for (int t = 0; t < num_dump_threads; t++)`), but the parameter is never validated against the maximum allowed threads (PLATFORM_MAX_AICPU_THREADS). Add a bounds check immediately after the existing dump_base null check to verify that num_dump_threads does not exceed PLATFORM_MAX_AICPU_THREADS, and if it does, log an error message and return early to prevent out-of-bounds array access.src/common/platform/include/aicpu/tensor_dump_aicpu.h (1)
123-130:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRename residual warning text to “args dump” for consistency.
Line 126, Line 256, and Line 271 still log
"tensor dump skipped...", which leaves user-visible output inconsistent with the renameddump_argssurface and makes log filtering harder.Proposed patch
- "Thread %d: tensor dump skipped for task 0x%" PRIx64 + "Thread %d: args dump skipped for task 0x%" PRIx64 ": active callable tensor args (%d) do not match payload tensor args (%d). " "Task-level dump assumes payload tensor args are concatenated by active subtask order.", @@ - "Thread %d: tensor dump skipped for task 0x%" PRIx64 + "Thread %d: args dump skipped for task 0x%" PRIx64 ": task args (%d) smaller than callable signature (%d)", @@ - "Thread %d: tensor dump skipped for task 0x%" PRIx64 + "Thread %d: args dump skipped for task 0x%" PRIx64 ": callable tensor args (%d) do not match registered tensor info (%d)",Also applies to: 255-258, 270-273
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/common/platform/include/aicpu/tensor_dump_aicpu.h` around lines 123 - 130, The LOG_WARN message in the tensor dump validation block contains inconsistent terminology that references "tensor dump skipped" but should use "args dump" terminology to align with the renamed dump_args surface. Update the warning text at the current location (around the try_log_dump_args_layout_mismatch() call) and at the two other similar logging locations (around lines 255-258 and 270-273) to replace "tensor dump skipped" with "args dump skipped" to ensure consistent user-visible output and improve log filtering.src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp (1)
878-901:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReset
l2_swimlane_level_before the enablement branch to prevent cross-run state leak.
l2_swimlane_level_should be set toL2SwimlaneLevel::DISABLEDunconditionally beforeif (is_l2_swimlane_enabled()). Keeping reset only in theelsepath can leave stale state across launches in edge initialization flows.Suggested fix
`#if` PTO2_PROFILING + l2_swimlane_level_ = L2SwimlaneLevel::DISABLED; // l2_swimlane_aicpu_init promotes g_l2_swimlane_level from the shared-memory // header — must be called BEFORE the orchestrator thread caches the level // via rt->orchestrator.l2_swimlane_level = get_l2_swimlane_level() in @@ if (is_l2_swimlane_enabled()) { l2_swimlane_aicpu_init(runtime->worker_count); l2_swimlane_level_ = get_l2_swimlane_level(); @@ - } else { - l2_swimlane_level_ = L2SwimlaneLevel::DISABLED; - } + } `#endif`Based on learnings: “In
SchedulerContext::init()inside the scheduler cold-path implementation, ensurel2_swimlane_level_cannot retain a stale value across runs; reset toDISABLEDunconditionally before conditional logic.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp` around lines 878 - 901, The `l2_swimlane_level_` member variable is currently only reset to `L2SwimlaneLevel::DISABLED` in the else branch, which can leave stale state from previous initialization runs. Add an unconditional reset of `l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;` before the `if (is_l2_swimlane_enabled())` check in the scheduler cold-path initialization logic. This ensures the variable always starts in a clean state regardless of which branch is taken, preventing cross-run state leakage in edge initialization flows.Source: Learnings
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/dfx/tensor-dump.md`:
- Around line 280-283: The relative links in docs/dfx/tensor-dump.md at lines
280, 282, and 410 are using single-level parent directory traversal (../) but
need to traverse two levels to correctly resolve to the tests directory at the
repository root. Change all instances of `../tests/` to `../../tests/` in the
markdown link references to fix the broken paths that currently would resolve to
the non-existent docs/tests/ location instead of the actual tests/ directory.
In `@docs/testing.md`:
- Line 108: The `--dump-args` option documentation in the table is incorrectly
showing `false` as the default value, but this option is level-based with
possible values 0, 1, 2, or 3, where the actual default is 0. Update the default
column for the `--dump-args` row to show `0` instead of `false` to accurately
document that this is a numeric level option, not a boolean flag.
In `@simpler_setup/scene_test.py`:
- Around line 1696-1698: The current code at the dump_args check appends only
the bare flag `--dump-args` to the common list without preserving its value.
When a user specifies a dump level like `--dump-args 2` or `--dump-args 3`, this
value is lost because subprocesses only receive the flag and argparse defaults
to const=1. Modify the condition around args.dump_args to check if the value is
greater than its default/const value and append both the flag and the actual
value to the common list to preserve the user's specified dump level in child
process invocations.
- Around line 1419-1421: The help text for the --dump-args argument is
incomplete and omits the level 3 option (full_json_only) from its description.
Update the help string in the add_argument call for --dump-args to include all
supported dump levels: add documentation for level 3 (full_json_only) to the
existing descriptions of levels 0 (off), 1 (partial), and 2 (full) so that the
help text accurately reflects all available options.
---
Outside diff comments:
In @.github/workflows/ci.yml:
- Around line 1-14: The CI workflow in .github/workflows/ci.yml lacks a
top-level permissions block, which causes jobs to inherit default GITHUB_TOKEN
scopes with broader access than needed. Add a top-level permissions block
immediately after the name field (before the `on` section) to explicitly
restrict token permissions to only what is required for pull request validation,
such as read-only access to repository contents and write access only to pull
request checks if needed.
In
`@src/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cpp`:
- Around line 878-901: The `l2_swimlane_level_` member variable is currently
only reset to `L2SwimlaneLevel::DISABLED` in the else branch, which can leave
stale state from previous initialization runs. Add an unconditional reset of
`l2_swimlane_level_ = L2SwimlaneLevel::DISABLED;` before the `if
(is_l2_swimlane_enabled())` check in the scheduler cold-path initialization
logic. This ensures the variable always starts in a clean state regardless of
which branch is taken, preventing cross-run state leakage in edge initialization
flows.
In `@src/common/platform/include/aicpu/tensor_dump_aicpu.h`:
- Around line 123-130: The LOG_WARN message in the tensor dump validation block
contains inconsistent terminology that references "tensor dump skipped" but
should use "args dump" terminology to align with the renamed dump_args surface.
Update the warning text at the current location (around the
try_log_dump_args_layout_mismatch() call) and at the two other similar logging
locations (around lines 255-258 and 270-273) to replace "tensor dump skipped"
with "args dump skipped" to ensure consistent user-visible output and improve
log filtering.
In `@src/common/platform/shared/aicpu/tensor_dump_aicpu.cpp`:
- Around line 527-545: The function dump_args_init uses the num_dump_threads
parameter to iterate and index into fixed-size arrays in the for loop (starting
at line 544 with `for (int t = 0; t < num_dump_threads; t++)`), but the
parameter is never validated against the maximum allowed threads
(PLATFORM_MAX_AICPU_THREADS). Add a bounds check immediately after the existing
dump_base null check to verify that num_dump_threads does not exceed
PLATFORM_MAX_AICPU_THREADS, and if it does, log an error message and return
early to prevent out-of-bounds array access.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5779ec94-992b-4fb9-97de-71161490d1e8
📒 Files selected for processing (31)
.github/workflows/ci.ymlconftest.pydocs/dfx/tensor-dump.mddocs/profiling-framework.mddocs/testing.mdsimpler_setup/scene_test.pysimpler_setup/tools/README.mdsimpler_setup/tools/dump_viewer.pysrc/a2a3/platform/onboard/aicpu/kernel.cppsrc/a2a3/platform/sim/host/device_runner.cppsrc/a2a3/platform/sim/host/device_runner.hsrc/a2a3/runtime/host_build_graph/aicpu/aicpu_executor.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.hsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cppsrc/a2a3/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cppsrc/a5/platform/onboard/aicpu/kernel.cppsrc/a5/platform/sim/host/device_runner.cppsrc/a5/platform/sim/host/device_runner.hsrc/a5/runtime/host_build_graph/aicpu/aicpu_executor.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/pto_orchestrator.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.hsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_cold_path.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_completion.cppsrc/a5/runtime/tensormap_and_ringbuffer/runtime/scheduler/scheduler_dispatch.cppsrc/common/platform/include/aicpu/tensor_dump_aicpu.hsrc/common/platform/include/host/tensor_dump_collector.hsrc/common/platform/shared/aicpu/tensor_dump_aicpu.cppsrc/common/platform/shared/host/tensor_dump_collector.cpptests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py
c78e7d7 to
1267f5c
Compare
Align the focused PR792/PR966 dump flow on the new dump-args naming across CLI, runtime artifacts, viewer, tests, CI, and user-facing docs while keeping the underlying compatibility fields intact.
1267f5c to
86267e2
Compare
Summary
dump tensortodump argsacross the CLI, runtime artifacts, viewer, and manifest schemadlsym, and onboard entrypoints with the newdump_argssymbol surface while keeping the underlying compatibility fields intactargs_dumpcontractTest plan
python -m py_compile conftest.py simpler_setup/scene_test.py simpler_setup/tools/dump_viewer.py tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.pypytest tests/st/a2a3/tensormap_and_ringbuffer/dfx/tensor_dump/test_tensor_dump.py --platform a2a3sim --device 0-15 -p no:xdist --pto-session-timeout 600 --pto-isa-commit ddafa8da9c760ecd13fe9fe2833d6ee55fb20bd8 --dump-args --clone-protocol https🤖 Generated with Claude Code