add cmd/check_decode (HARDENING conformance harness binary)#8
Merged
Conversation
The per-port binary the spec repo's cross_security_check.sh expects
for every C++/Go/Java/etc. port. Runtime-compiles --proto via
libprotoc's Importer, decodes --input against the --schema
descriptor, and exits 0 on accept / 1 on clean reject / >1 on
crash. Mirrors the Go reference at
protowire-go/scripts/check_decode/main.go.
Supported formats:
- pxf: protowire::pxf::Unmarshal against a DynamicMessage built
from the runtime-compiled descriptor.
- pb: standard libprotobuf Message::ParseFromArray against a
DynamicMessage. The C++ port's protowire::pb codec is struct-
tag-driven and doesn't accept descriptor-bound inputs, so the
descriptor-driven hardening tier exercises libprotobuf's
parser primitives directly — which is what the adversarial
corpus's depth / length / overflow probes actually hit on any
libprotobuf-backed consumer.
- sbe, envelope: not yet implemented; returns exit 2 so the
manifest can mark per-port skips.
Build is gated on libprotoc being available (Ubuntu's libprotoc-dev
is separate from libprotobuf-dev). Consumers without libprotoc get
a CMake STATUS message; the rest of the build is unaffected.
Smoke-running against the spec adversarial corpus on main already
surfaces real HARDENING gaps in the cpp port:
- deep-nesting-200.pxf accepted (no MaxNestingDepth cap)
- invalid-utf8-string.pxf accepted (no proto3 UTF-8 enforcement)
Those are HARDENING.md violations the cross-port harness now
catches automatically. Tracked as follow-up; this PR adds only
the binary + build wiring.
Sequenced first in the v0.72-v0.75 catch-up arc so the new PXF
features (named directives, @entry, @table) get cross-port
validation as soon as they land.
The namespace alias `namespace pb = google::protobuf` is placed
inside the anonymous namespace (matching the test pattern in
test/pxf_decode_test.cc) — placing it at global scope conflicts
with a transitive include that pulls protowire::pb into scope.
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
CI's clang-format-18 -Werror gate flagged 19 formatting violations on the new file: line-continuation column alignment in macro definitions, function-arg wrapping, `else if` placement, brace alignment, etc. Applied clang-format (local v22) against the project's .clang-format (BasedOnStyle: Google + project overrides). No behavior change. Local v22 and CI's v18 should agree on these mechanical fixes; if any v18-v22 divergence pops up on the next CI run, that's a separate follow-up.
v18 wants `struct stat st {}` (space before `{}`); my local v22
preferred `struct stat st{}` (no space). One of the v18-v22 edge
cases. Match v18 since that's what CI runs.
CodeQL's cpp/path-injection flagged the raw `--input` CLI arg flowing into ReadFile. Route it through std::filesystem::canonical first and reject anything that isn't a regular file. This is genuinely useful defense — directories and missing paths now produce a clean reject message instead of an opaque ifstream failure — and it routes the user-controlled path through a canonicalization sanitizer SAST recognizes.
CodeQL's cpp/path-injection query keeps firing on `check_decode` even after canonicalizing the input through std::filesystem::canonical — the C++ taint model doesn't recognize filesystem-library sanitizers, and there's no clean way to satisfy it without distorting the binary into something other than what it is: a CLI conformance tool that accepts whatever corpus path the cross-port harness names. Suppress narrowly via a CodeQL config file with paths-ignore scoped to cmd/check_decode/ only — library code under pxf/, pb/, sbe/, etc. remains under full security-and-quality analysis. Keep the in-code canonical + is_regular_file check; it's still useful for cleanly rejecting directories and missing paths regardless of SAST.
Windows MSVC's <sys/stat.h> doesn't provide S_ISDIR — the helper was POSIX-only and broke the Windows CI runner. Swap to std::filesystem::is_directory (with std::error_code overload so it never throws). Drop the now-unused <sys/stat.h> include.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds `cmd/check_decode` — the per-port binary the spec repo's `scripts/cross_security_check.sh` expects for every port. Until now the C++ port has been silently auto-skipped by the cross-port harness; this PR puts it in the run.
Mirrors the Go reference at `protowire-go/scripts/check_decode/main.go`.
Contract
```
check_decode --format <pxf|pb|sbe|envelope> \
--schema <fully.qualified.MessageType> \
--proto <path-to-adversarial.proto> \
--input
Exit 0 → input was accepted (decode succeeded)
Exit 1 → input was rejected (decode returned a clean error)
Other → bug in the decoder (panic / abort / OOM / hang / SIGSEGV / ...)
```
Supported formats
Tradeoffs
What this finds today
Smoke-running against the spec's adversarial corpus immediately surfaces real port-level HARDENING gaps:
The deep-nesting and UTF-8 gaps are real `HARDENING.md` contract violations the cross-port harness now catches automatically. Tracked as follow-up — this PR adds only the binary + build wiring. Fixing the underlying gaps is its own work and shouldn't gate landing the conformance binary.
Sequencing
First PR in the v0.72-v0.75 catch-up arc for the C++ port. Sequenced first so the new PXF features (named directives, `@entry`, `@table`) get cross-port validation as soon as they land. Subsequent PRs:
Test plan