Skip to content

Return command output verbatim by default#538

Merged
tony merged 9 commits into
masterfrom
subprocess-trimming
Jun 21, 2026
Merged

Return command output verbatim by default#538
tony merged 9 commits into
masterfrom
subprocess-trimming

Conversation

@tony

@tony tony commented Jun 20, 2026

Copy link
Copy Markdown
Member

Summary

  • Breaking: .run() on the Git/Hg/Svn command classes (and the internal runner) now returns command output verbatim by default — exactly what the VCS printed. It previously stripped each line, dropped blank lines, and removed the trailing newline, so a captured git diff was rejected by git apply, git cat-file blob lost indentation and blank lines, and multi-line stderr was concatenated into one run-on line.
  • Add a trim parameter (default False). Pass trim=True for the convenient "bare value" output the old default produced — a single whole-output str.rstrip, never per-line.
  • Add trim passthrough on the named Git/Hg/Svn command methods, so the opt-in works everywhere, not only the low-level run().
  • Keep downstream-facing scalar APIs bare: GitSync.get_revision() (and rev_parse/rev_list via trim=True) still return a clean value.
  • Fix stderr assembly so multi-line errors keep their line breaks in libvcs.exc.CommandError.output.
  • Document the decision, measured alternatives, and prior art in ADR 0001; add a changelog entry.

Breaking change

The default return of .run() flips from trimmed to verbatim. Reads that want a bare value opt in with trim=True.

Before — a bare value came back trimmed:

sha = git.run(["rev-parse", "HEAD"])

After — output is verbatim; opt into trimming:

sha = git.run(["rev-parse", "HEAD"], trim=True)

High-level helpers are unaffected: GitSync.get_revision() still returns a bare revision, and downstream vcspull (which strips defensively) passes unchanged — see the companion PR.

The bug, now fixed at the default

Before (on master) — .run(["diff"]) corrupts the patch; git apply rejects it:

@@ -1,5 +1,5 @@
def greet(name):
-    message = "hello, " + name
+    message = "hi, " + name
print(message)
return message

After (this PR) — .run(["diff"]) is byte-identical to git diff and applies cleanly:

@@ -1,5 +1,5 @@
 def greet(name):
-    message = "hello, " + name
+    message = "hi, " + name
 
     print(message)
     return message

Changes by area

  • src/libvcs/_internal/run.py: capture stdout/stderr verbatim and decode once; new trim parameter (default False) applies a single whole-output rstrip when set; stderr keeps its line structure; the UTF-8 decode fallback uses errors="backslashreplace".
  • src/libvcs/cmd/git.py, cmd/hg.py, cmd/svn.py: forward trim from the named command methods to run(); doctests show clean output via trim=True; the GitRemoteCmd.set_url doctest is now git-version-agnostic; Svn.blame reflects the faithful output.
  • src/libvcs/sync/git.py: get_revision() and update_repo's rev_list reads strip to keep bare SHAs.
  • tests/: functional tests for verbatim diff/blob fidelity + git apply, the default-preserves-newline contract, and stderr line preservation; sync/cmd tests adapted to verbatim output.
  • docs/project/adr/, CHANGES: ADR 0001 (faithful subprocess output capture) wired into the Project docs, plus a changelog entry.

Design decisions

  • Verbatim by default, transform at the edge: matches the convention of every tool surveyed in ADR 0001 (pip, uv, mise, mercurial, gitoxide, jujutsu) — capture pristine, trim only where a scalar is wanted. The per-line strip was the outlier and the source of the corruption.
  • trim=True is the opt-in for bare values: a single whole-output rstrip, never per-line; high-level sync APIs apply it internally so downstream comparisons stay bare.
  • Phase 1 of ADR 0001: the structured CompletedProcess backend (libvcs._internal.subprocess.SubprocessCommand) and retiring console_to_str are Phase 2, deferred to keep this change focused.

Companion PR

Verification

The old per-line strip/filter is gone:

$ rg "line.strip\(\) for line in" src/libvcs/_internal/run.py

(returns zero matches)

Test plan

  • test_run_trim_false_preserves_diff — captured diff is byte-identical to git diff and passes git apply --check
  • test_run_trim_false_preserves_blobcat-file blob round-trips byte-for-byte
  • test_run_default_preserves_trailing_newline — default .run() returns verbatim output (trailing newline kept); trim=True yields the bare value
  • test_run_failure_preserves_stderr_lines — failed command keeps stderr line structure in CommandError.output
  • uv run pytest — full suite green
  • uv run ruff check . && uv run ruff format --check .
  • uv run mypy
  • Downstream: vcspull's suite passes against this branch (py(deps) libvcs 0.43.0 -> 0.44.0 vcspull#556)

@codecov

codecov Bot commented Jun 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 81.69014% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.19%. Comparing base (163a945) to head (0069b20).

Files with missing lines Patch % Lines
src/libvcs/cmd/svn.py 0.00% 8 Missing ⚠️
src/libvcs/cmd/git.py 0.00% 3 Missing and 1 partial ⚠️
src/libvcs/_internal/run.py 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #538      +/-   ##
==========================================
+ Coverage   60.98%   61.19%   +0.20%     
==========================================
  Files          40       40              
  Lines        6531     6563      +32     
  Branches     1101     1103       +2     
==========================================
+ Hits         3983     4016      +33     
+ Misses       1951     1950       -1     
  Partials      597      597              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tony tony force-pushed the subprocess-trimming branch 2 times, most recently from c42d2d6 to 2049c04 Compare June 21, 2026 11:10
@tony

tony commented Jun 21, 2026

Copy link
Copy Markdown
Member Author

Code review

Found 1 issue:

  1. rev_parse (and rev_list) were missed by the new trim passthrough, so trim=True is silently ignored on them. rev_parse declares **kwargs but its self.run(...) call never forwards them, so rev_parse(args="HEAD", trim=True) returns verbatim output with a trailing newline instead of a bare SHA — the documented opt-in is a no-op. The same gap leaves rev_list verbatim, and GitSync.update_repo() feeds that newline-suffixed value straight into reset(pathspec=head_sha), which git rejects (fatal: invalid reference: <sha>). get_revision() got a .strip() fix for exactly this; these two methods did not. The update_repo path is a stash-pop recovery branch not exercised by tests, so the suite stays green.

rev_parse drops the kwargs (no trim/**kwargs forwarded):

libvcs/src/libvcs/cmd/git.py

Lines 2059 to 2063 in 2049c04

return self.run(
["rev-parse", *local_flags],
check_returncode=check_returncode,
)

The newline-suffixed SHA used as a git pathspec:

# Stash pop failed: Restore previous state.
with contextlib.suppress(exc.CommandError):
self.cmd.reset(
pathspec=head_sha,
hard=True,
quiet=True,

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony tony force-pushed the subprocess-trimming branch from 2049c04 to 94436ff Compare June 21, 2026 12:38
@tony tony force-pushed the subprocess-trimming branch 2 times, most recently from c1ba17a to 788caff Compare June 21, 2026 13:33
@tony tony changed the title Fix corrupted VCS command output (diffs, blobs, errors) Return command output verbatim by default Jun 21, 2026
@tony

tony commented Jun 21, 2026

Copy link
Copy Markdown
Member Author

Code review

Fresh re-review at head 788caff. The earlier rev_parse/rev_list trim finding is resolved (both forward **kwargs now, and update_repo strips). Found 2 issues, both in the new ADR:

  1. "Alternatives considered" carries branch-internal narrative and a fragile test count (AGENTS.md/CLAUDE.md AI-slop rubric: "Avoid ... fragile file/test counts" and "Do not mention intermediate branch states ... unless users of a published release actually experienced the old state"). Drop the subprocess-trimming branch name and "575 tests at baseline".

Each candidate was spiked on the `subprocess-trimming` branch on
2026-06-20 and measured against the test suite (575 tests at baseline)
and a probe checking whether a captured diff survives `git apply` and

  1. The ADR contradicts its own shipped decision on .run(). Phase 1 makes output verbatim by default, but Phase 2 calls .run() -> str a "trimmed convenience facade" and the Risks section says "the facade is defined as rstrip()" — leftover wording from the earlier trimmed-default design.

callers that want streams, exit code, and exact bytes. Keep
`.run() -> str` as a trimmed convenience facade layered on top.
- Retire `libvcs._internal.run.console_to_str` in favor of subprocess's

(also Risks:

- Behavior drift between the trimmed facade and the structured result.
Mitigation: the facade is defined as `rstrip()` over the structured
result's decoded stdout, not a separate code path.
)

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

tony added 9 commits June 21, 2026 09:56
why: The legacy run() helper post-processes captured output
(per-line strip, drops blank lines, no trailing newline), which
corrupts whitespace-significant output -- captured diffs no
longer apply, blob reads lose content, multi-line errors smash
together. The decision and rejected alternatives need a durable
record so the behavior isn't "simplified" back into the bug.

what:
- Record the decision: capture pristine, trim/decode at the
  edge, phased migration onto SubprocessCommand.
- Capture the measured alternatives matrix and prior-art lessons
  (pip, uv, mise, mercurial, gitoxide, jujutsu, git) as evidence.
- Start a docs/project/adr/ section with an index and toctree.
why: The runner trimmed each captured line and dropped blank lines,
so whitespace-significant output was corrupted: captured diffs were
rejected by `git apply`, `cat-file blob` lost indentation and blank
lines, and multi-line stderr was concatenated into one run-on line.
Single-token reads like `rev-parse HEAD` hid the defect.

what:
- Capture stdout/stderr verbatim; decode once; trim only the whole
  output (str.rstrip), preserving leading indentation, blank lines,
  and interior structure.
- Add `trim` (default True). Pass `trim=False` for byte-faithful
  output -- diffs that feed `git apply`, exact blob contents.
- Rejoin stderr without smashing lines together.
- Correct the Svn.blame doctest, which had encoded the stripped output.
- Cover trim=False fidelity, the default trim contract, and stderr
  line preservation with functional tests.
why: Trimming in the capture path corrupted whitespace-significant
output. Returning verbatim by default makes the plain
`.run(["diff"])` faithful and applyable, fixing the original bug
without an opt-in. Bare-value reads opt in with `trim=True`.

what:
- Flip `trim` default from True to False (verbatim).
- Rewrite the docstring around the verbatim default.
why: get_revision() is a scalar API downstream compares to bare
SHAs; with verbatim run() it would carry a trailing newline.

what:
- Strip the rev-parse result so get_revision() stays bare.
why: With run() verbatim by default, command methods return output
with a trailing newline. Forwarding a `trim` flag lets callers opt
into the bare value, and lets doctests show clean output without
raw-string newline noise.

what:
- Forward a `trim` parameter from git/hg/svn command methods to run().
- Use `trim=True` in command doctests and drop the trailing newline.
- Restore plain docstrings where the escaped newline is gone.
why: Tests that read scalars via the low-level run() or thin cmd
wrappers relied on the old implicit trim.

what:
- Strip rev-parse / symbolic-ref / is-shallow / version reads.
- Invert the default-behavior test to assert verbatim output.
why: The chosen design flipped to verbatim-by-default during review.

what:
- Update ADR 0001 Phase 1, alternatives, and consequences to
  reflect verbatim default with `trim=True` opt-in.
why: Document the verbatim-output breaking change and the related
output-fidelity fixes for downstream readers.

what:
- Add a Breaking changes entry for the verbatim-default `.run()`
  (with the trim=True migration) and Fixes for applyable diffs,
  byte-identical blobs, and intact multi-line error output.
why: rev_list's since/after/until/before/max_age/min_age flags were
never applied -- the guard tested the imported `datetime` module
instead of the loop value, so the condition was always False.

what:
- Test the loop value (datetime_kwarg), matching the adjacent flag
  loop, so the date-range flags reach git rev-list.
@tony tony force-pushed the subprocess-trimming branch from c267a51 to 0069b20 Compare June 21, 2026 14:57
@tony tony merged commit 0876582 into master Jun 21, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant