feat(replication): v12 storage-bound audit — design-complete#113
feat(replication): v12 storage-bound audit — design-complete#113grumbach wants to merge 26 commits into
Conversation
…se 1)
Implements the v12 design (notes/security-findings-2026-05-22/
proposal-gossip-audit-v12.md) for closing audit findings 1 (audit not
storage-bound) and 2 (bootstrap-claim shield).
Phase 1 is foundation only: wire types, Merkle tree, sign/verify, and
the auditor's commitment-hash pin. No integration with gossip or the
audit challenge/response flow yet — those land in phase 2 so each
slice is independently reviewable.
What this commit adds:
- `StorageCommitment` wire type. ML-DSA-65 signed over
(root, key_count, sender_peer_id) with explicit domain separation
("autonomi.ant.replication.storage_commitment.v1").
- `CommitmentBoundResult` wire type for per-key audit response entries:
key, digest (existing audit semantics), bytes_hash (so the auditor
rebuilds the leaf from its own local bytes), leaf_index (so the
auditor knows left/right child ordering at each path level), and the
Merkle inclusion path.
- `MerkleTree` over leaves of the form
BLAKE3(DOMAIN_LEAF || key || BLAKE3(bytes)).
Sorted by key for deterministic roots; odd-count levels self-pair
(node_hash(x, x)). Build is O(n) hashing, path lookup is O(log n).
- `commitment_hash` = BLAKE3(DOMAIN_COMMITMENT_HASH || postcard(commitment)).
Postcard's length-prefixed canonical encoding ensures any change to
any field — including the variable-length signature — produces a
different hash. This is the auditor's pin: the audit response must
include a commitment that hashes to this value.
- `verify_path` for the auditor: validates leaf_index < key_count
(rejected if out of range), path.len() == ceil(log2(key_count))
(rejected if wrong shape), and recomputes the root from leaf +
siblings using left/right ordering derived from leaf_index.
Wire-input safe: rejects key_count > MAX_COMMITMENT_KEY_COUNT
(1,000,000) and uses checked_next_power_of_two for depth math.
22 unit tests cover: empty tree, single-leaf, two-leaf, deterministic
root, every-key path verify across sizes 1..333, tampered bytes_hash,
tampered path, wrong leaf_index, out-of-range leaf_index, wrong path
length, zero key_count, out-of-protocol key_count (MAX+1 and u32::MAX),
duplicate keys, sign+verify roundtrip, signature failures (tampered
root, wrong public key, garbage bytes), commitment hash field
sensitivity, commitment hash signature-length sensitivity, commitment
hash stability.
All 514 lib tests pass. cfd clean.
4 rounds of codex (gpt-5-codex high-reasoning) review on the module
itself, found and addressed 2 BLOCKERs (commitment_hash was a
hand-built concat instead of postcard; path verification needed
leaf_index on the wire), 2 MAJORs (odd-node terminology, missing
leaf_index bounds check), and 1 round-3 finding (next_power_of_two
overflow on untrusted wire input). Round 4 verdict: APPROVE.
Phase 2a (wire-only): extend NeighborSyncRequest/Response and
AuditChallenge with optional commitment fields, and add the
AuditResponse::CommitmentBound variant. No behaviour changes yet —
new fields are `None`/unused everywhere, but every existing test
recompiles against the new shape so we know the surface is correct
before wiring up responder building and auditor verification.
What this commit changes:
- NeighborSyncRequest: + commitment: Option<StorageCommitment>
- NeighborSyncResponse: + commitment: Option<StorageCommitment>
- AuditChallenge: + expected_commitment_hash: Option<[u8; 32]>
- AuditResponse: + CommitmentBound { challenge_id, commitment, per_key }
All existing call sites pass `None` for the new fields, with a comment
explaining "phase 3 will wire this up." Match sites on AuditResponse
gain a `CommitmentBound { .. } => panic!("legacy-digest test")` arm
so the existing test suite remains exhaustive.
Backwards compatibility is preserved via `#[serde(default)]` on every
new Option field: old peers' encoded messages decode into `None` on
new peers, and new peers' messages encode the new fields as
length-prefixed Option which old peers tolerate via postcard's
forward-compat behaviour.
514/514 lib tests pass. cfd clean. No regressions.
Phases 2b and 2c of the v12 storage-bound audit design.
src/replication/commitment_state.rs — responder side
- BuiltCommitment: signed wire blob + cached commitment_hash + Merkle
tree + sorted-keys lookup for the per-key leaf_index field.
- ResponderCommitmentState: two-slot atomic rotation (current /
previous) backed by parking_lot::RwLock<Arc<...>>. lookup_by_hash
returns an Arc that keeps the BuiltCommitment alive for the
duration of an audit response even if a concurrent rotate drops
the slot (v6 §2 / v12 §4 retention).
- rotate(new): demotes current to previous, drops the prior
previous. The only path that frees previous, enforcing INV-R2.
src/replication/commitment_audit.rs — auditor side
- verify_commitment_bound_response: pure function implementing v12
§5's four gates in order (cheapest first): structural (per_key
length / order / no duplicates / wire-bounded key_count / correct
path length), commitment hash pin, ML-DSA-65 signature, and
per-key (bytes_hash matches local bytes, leaf rebuilds, Merkle
path verifies up to root, audit digest matches nonce-bound BLAKE3).
- AuditVerifyError: typed reason for each gate failure so callers
can log + apply AUDIT_FAILURE_TRUST_WEIGHT per key consistently.
src/replication/commitment.rs
- MerkleTree::sorted_keys() exposed so BuiltCommitment can populate
its leaf_index lookup without recomputing.
Tests
- 8 commitment_state tests: empty state, rotate promote/demote, drop
oldest after two rotations, lookup finds current+previous,
lookup_arc_outlives_subsequent_rotation (v12 §4 invariant), proof
builds + verifies under its own root, proof for absent key, hash
matches global commitment_hash.
- 13 commitment_audit tests covering each AuditVerifyError variant,
plus a headline lazy_node_on_demand_fetch_attack_fails test that
simulates the v12 Finding-1 attacker: a lazy node receives the
challenge, builds a fresh commitment over just the challenged
keys, signs it, and replies with that fresh commitment + valid
proofs. The pin check (gate 2) rejects.
All 535 lib tests pass. cfd clean. No regressions.
… 2d) src/replication/recent_provers.rs — auditor-side cache mapping (key, peer_id, commitment_hash) → "recently proved." The reward / quorum eligibility predicate from v12 §6: P is credited as holder of K only if recent_provers[K] contains an entry for P whose commitment_hash matches P's currently-credited commitment hash. Bounded per key (MAX_PROVERS_PER_KEY = 16 = 2 * CLOSE_GROUP_SIZE), LRU-evicted by proved_at. RT-only by caller contract (this module just stores what it's told. Hash-bound credit is the v12 §6 lever: a peer that rotates their commitment must re-prove every key. API: - record_proof(key, peer, hash, ts) — append or refresh-in-place. - is_credited_holder(key, peer, current_hash) — predicate. - forget_peer(peer) — RT eviction hook. - forget_commitment(hash) — UnknownCommitmentHash invalidation hook (v11/v12 §5: when the auditor invalidates last_commitment[P] because P denied the pin, also drop any cached entries that would silently extend credit). 9 unit tests cover: empty cache, credit under same hash, credit denied under rotated hash (the core v12 §6 property), wrong peer rejected, per-key cap with LRU eviction at MAX+1, refresh-in-place does not grow bucket, forget_peer drops all, forget_commitment drops matching only, lazy-rotation-via-UnknownCommitmentHash drops credit. 544 lib tests pass. cfd clean. EOF )
… tests
Wires the final phase-2 piece: a pure function on
ResponderCommitmentState that builds a CommitmentBound audit response.
Caller looks up record bytes; the helper handles the Merkle proof and
the per-key (digest, bytes_hash, leaf_index, path) construction.
CommitmentBoundOutcome:
- Built { commitment, per_key } — caller wraps in
AuditResponse::CommitmentBound.
- UnknownCommitmentHash — caller emits Rejected with reason
"unknown commitment hash". Auditors classify this per v12 §5
conditional-invalidation rule.
- KeyNotInCommitment { key } — responder rotated between gossip and
challenge; caller emits a normal Rejected.
End-to-end tests in commitment_state:
- end_to_end_responder_to_auditor_happy_path: honest responder builds
a response that the auditor's verify accepts.
- end_to_end_lazy_node_fresh_commitment_substitution_fails: headline
v12 Finding-1 attack. A lazy node substitutes a fresh commitment
into the response; the pin gate rejects with CommitmentHashMismatch.
Plus 4 unit tests for the new helper.
550 lib tests pass. cfd clean.
Closes both findings from the phase-2 codex review: - HIGH: mixed-version backward compatibility was not proven. Added 4 postcard roundtrip tests (old_decoder_tolerates_new_*) that empirically confirm postcard from_bytes is lenient on trailing bytes: an old peer using a v0 struct shape (no commitment / no expected_commitment_hash field) successfully decodes a v1 message that emits None as the new trailing field. Plus a new_peer_roundtrips_with_commitment_some test that catches accidental serde annotation breakage on the new field. - MEDIUM: end_to_end_lazy_node_fresh_commitment_substitution_fails in commitment_state.rs duplicated and overclaimed what the more direct lazy_node_on_demand_fetch_attack_fails in commitment_audit.rs proves. Removed; the happy-path cross-module e2e remains. 553 lib tests pass. cfd clean.
Codex round-2 review correctly flagged: my `#[serde(default)]` + Option-trailing-field strategy for backward compat is NOT actually backward compatible with postcard. Empirical test confirmed: `DeserializeUnexpectedEnd` when a new decoder reads a v0 payload that has no bytes for the trailing field — postcard is strict on struct shape, not lenient like JSON. This commit reverts the wire-type changes from c73da5d (commit/Option fields on NeighborSyncRequest/Response and AuditChallenge, plus the CommitmentBound AuditResponse variant) so phase 2 ships cleanly additive: the four new modules (commitment, commitment_state, commitment_audit, recent_provers) are unchanged and stand on their own with their existing 49 unit + e2e tests. The wire extension will be reintroduced in phase 3 with one of: (a) a protocol-version bump on `ReplicationMessage`, (b) a separate `CommitmentAnnounce` message variant (new ReplicationMessageBody variant — old peers ignore it), (c) length-prefixed extension envelope. Each requires careful bidirectional mixed-version testing. Doing it in phase 3 keeps phase 2 reviewable as a pure foundation. 549 lib tests pass. cfd clean.
tests/poc_commitment_audit_attacks.rs - single canonical test file that maps each Finding-1 attack vector from the original report (notes/security-findings-2026-05-22/01-audit-not-storage-bound.md) to the v12 mechanism that closes it, end-to-end. 13 tests, each named after its attack path: honest_responder_passes_audit_lazy_responder_fails (Path A) fresh_commitment_substitution_rejected_by_pin (Path B) overclaim_via_partial_commitment_yields_no_holder_credit (Path C) responder_drops_old_commitment_after_two_rotations (Path D) audit_response_replay_blocked_by_fresh_nonce (Path E) Finding 2 (bootstrap-claim shield) tests cover the cache-side property that closes it: silent_peer_earns_no_credit rotated_commitment_drops_holder_credit Plus 6 cross-check tests pinning foundational properties so future refactors of commitment_hash / leaf_hash / Merkle / signature do not regress: commitment_hash_is_field_sensitive leaf_hash_binds_key_and_bytes merkle_tree_root_is_deterministic_per_key_set signature_round_trips_correctly wrong_signer_rejected_at_signature_gate each_gate_fires_independently Each test composes the real production code paths from all four commitment-* modules end-to-end. No mocks. The Responder helper wraps ResponderCommitmentState + build_commitment_bound_audit_response; the auditor_verifies fn calls verify_commitment_bound_response directly. 13 PoC tests pass; 549 lib tests still pass. cargo clippy --all-targets --all-features -- -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used is clean. Also added clippy::panic to the existing cfg(test) allow blocks in commitment*.rs so test code using panic on unexpected match arms passes strict clippy.
…codex test gaps Closes the 4 test-coverage findings from codex round-2 test review: BLOCKER: UnknownCommitmentHash conditional-invalidation semantics not covered. v12 §5 says "clear only if stored hash still equals rejected pin." The actual conditional logic belongs at a higher layer (phase 3 auditor coordinator); the building block is RecentProvers::forget_commitment(hash). Added forget_commitment_only_drops_matching_hash to pin its contract: drops cache entries with that specific hash, leaves entries for other hashes intact. MAJOR: Finding-1 Path A (on-demand fetch under ORIGINAL pin) was not modeled. The earlier test only proved fresh-commitment substitution is rejected; it did NOT prove the actual lazy-fetch attack. Added on_demand_fetch_under_original_pin_succeeds_ documenting_v12_limit which explicitly proves the attack PASSES the verifier — because v12 is an economic defence (bandwidth cost per audit), not a cryptographic one. Test docstring documents this as the explicit design limit and serves as a regression marker if anyone claims to "close Path A" without bandwidth economics. MAJOR: cross-peer commitment substitution had no test AND no defence in the verifier. Added gate 2a (peer-identity binding) in commitment_audit.rs: response_commitment.sender_peer_id must equal challenged_peer_id. Caught before signature/pin gates. New typed AuditVerifyError::SenderPeerIdMismatch variant. Test cross_peer_commitment_substitution_rejected_by_sender_id proves the defence: a response carrying peer P's signed commitment but challenging peer Q is rejected at gate 2a before any signature work. MAJOR: overclaim/silent-peer tests were vacuous (only checked empty cache returns false). Rewrote overclaim_via_partial_commitment_end_to_end_no_credit to compose the full responder build path + cache predicate: lazy node commits to key 1 only, auditor challenges key 5, responder returns KeyNotInCommitment, auditor never calls record_proof, cache predicate correctly denies credit. Plus a positive control showing the cache DOES credit when record_proof IS called — making the predicate's denial meaningful, not trivially false. 17 PoC tests now (was 13). 549 lib tests still pass. cargo clippy --all-targets --all-features -- -D clippy::panic -D clippy::unwrap_used -D clippy::expect_used is clean.
…path
Codex round-3 of fix-loop flagged that on_demand_fetch_under_original_
pin_succeeds was observationally identical to the happy path because
both used the same Responder::build_response helper backed by the
'always returns content(byte)' bytes lookup.
Rewrite the test to bypass the Responder helper entirely and instead
construct the per-key CommitmentBoundResult by hand from an ALTERNATE
bytes source (named neighbour_fetched_bytes_for_key_3) — modelling the
fetched-from-neighbour case explicitly. The lazy node:
- retains its honest gossiped commitment (state.lookup_by_hash works)
- has dropped local bytes for key 3
- constructs the response from fetched bytes
- auditor accepts because the bytes_hash of fetched bytes is
bit-identical to bytes_hash of stored bytes (the v12 blind spot)
An assert_eq!(expected_leaf, from_commitment) before the verifier call
explicitly documents the blind spot: leaf_hash(key, bytes_hash) only
depends on the bytes themselves, not on where they came from.
17 PoC tests pass. cfd clean.
testnet-plan-storage-commitment-audit.md — phased rollout plan (stage 0 single-node smoke → stage 1 informational → stage 2 enforcement → stage 3 adversarial smoke), pre-deployment checklist, metrics to collect, failure modes to watch, rollback plan. Also adds the security-findings notes that drove this work: 01-audit-not-storage-bound.md 02-bootstrap-claim-audit-shield.md 03-paid-list-attestation-forgery.md 04-single-node-underpayment.md 05-merkle-already-stored-lie.md proposal-gossip-audit-v1.md through v12.md (the design iteration) The deployable surface (phase 1+2) is the four commitment-* modules in src/replication/. Phase 3 wiring (responder tick, gossip piggyback, auditor coordinator, holder-eligibility integration) is documented as the TODO before stage 1 can run.
There was a problem hiding this comment.
Pull request overview
This PR adds the phase 1+2 foundation for the v12 “storage-bound audit” design: new commitment wire types + Merkle proofs + signature/pin hashing, responder-side commitment rotation state, auditor-side verification logic, and a bounded “recent provers” cache. It also includes PoC threat-model tests that exercise the end-to-end responder→auditor flow and document known limits/attack closures, while explicitly deferring live replication-loop integration to phase 3.
Changes:
- Add commitment primitives (Merkle tree, commitment hashing, ML-DSA signing) and responder-side commitment state/response builder.
- Add pure auditor verifier for commitment-bound responses with typed failure reasons and cheapest-first gate ordering.
- Add bounded per-key holder-eligibility cache plus PoC tests covering Findings 1 & 2 attack paths (and a separate bootstrap-stall PoC test).
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
src/replication/commitment.rs |
New commitment wire types, domain separation, Merkle tree + path verification, commitment pin hash, ML-DSA sign/verify. |
src/replication/commitment_state.rs |
Responder commitment build + two-slot rotation/retention state; builds commitment-bound audit responses. |
src/replication/commitment_audit.rs |
Pure auditor-side verifier with typed AuditVerifyError gates (structural/identity/pin/signature/per-key). |
src/replication/recent_provers.rs |
Bounded per-key cache of recent provers with hash-bound holder-credit predicate and invalidation hooks. |
src/replication/mod.rs |
Exposes the new replication submodules. |
src/replication/protocol.rs |
Adds a clarifying note that wire types are not yet extended (phase 3). |
tests/poc_commitment_audit_attacks.rs |
End-to-end PoC tests mapping specific Finding-1/2 attack paths to expected verifier behavior. |
tests/poc_bootstrap_stall.rs |
Documents a bootstrap-drain stall DoS condition as a PoC regression test (no fix included here). |
notes/security-findings-2026-05-22/*.md |
Design history (v1–v12) + testnet plan notes for later phase-3 integration/rollout. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (i, result) in response_per_key.iter().enumerate() { | ||
| // The auditor's local copy of bytes is the ground truth. If the | ||
| // auditor doesn't hold this key, treat it as a mismatch — we | ||
| // can't audit what we don't have. | ||
| let local_bytes = | ||
| local_bytes_for(&result.key).ok_or(AuditVerifyError::BytesHashMismatch { index: i })?; |
|
|
||
| /// Domain-separation tag for the commitment signature. | ||
| /// | ||
| /// Signed payload is BLAKE3 over (this tag || canonical commitment fields). |
| // failure, which for our fixed-size commitment cannot occur in | ||
| // practice (ML-DSA-65 signature is 3293 bytes). If it ever | ||
| // somehow does, surface as a SignatureFailed so callers don't | ||
| // need a new error variant for an unreachable case. | ||
| let cached_hash = commitment_hash(&commitment).ok_or_else(|| { | ||
| CommitmentError::SignatureFailed("commitment serialization failed".to_string()) | ||
| })?; |
| /// `commitment.key_count` exceeds [`MAX_COMMITMENT_KEY_COUNT`] — | ||
| /// rejected before any hashing. | ||
| #[error("commitment claims {key_count} keys, exceeds protocol max")] | ||
| KeyCountOverProtocolMax { | ||
| /// The claimed (rejected) key count. | ||
| key_count: u32, | ||
| }, | ||
| /// A `per_key[i].path` has the wrong length for the claimed |
Per user: the auto-upgrade system handles version-breaking changes. Old peers will receive DeserializeUnexpectedEnd when decoding new messages and update via the existing src/upgrade/ flow. So the "defer wire extension to phase 3 because of postcard backward compat" decision is wrong — phase 3 is now: ship the full feature. This reverts the revert in ada62f8 and brings back: - NeighborSyncRequest.commitment: Option<StorageCommitment> - NeighborSyncResponse.commitment: Option<StorageCommitment> - AuditChallenge.expected_commitment_hash: Option<[u8; 32]> - AuditResponse::CommitmentBound { challenge_id, commitment, per_key } All call sites pass None for now; integration happens in subsequent commits. 553 tests pass.
…emit/receive Wires phase-2's commitment modules into the live replication loop, gossip side. ReplicationEngine now owns: - identity: Arc<NodeIdentity> (for signing commitments). - commitment_state: Arc<ResponderCommitmentState> (responder's two-slot current/previous, the responder uses to answer commitment-bound audits). - last_commitment_by_peer: Arc<RwLock<HashMap<PeerId, StorageCommitment>>> (auditor's per-peer "last known commitment", read at audit-issue time). - recent_provers: Arc<RwLock<RecentProvers>> (holder-eligibility cache, wired in the next commit). Background task: - start_commitment_rotation_loop (~10 min cadence) reads LMDB keys, builds a Merkle tree, signs, rotates into commitment_state. For content-addressed chunks bytes_hash == key, so no chunk re-read is needed (the audit-verify path still rechecks bytes_hash against BLAKE3(local_bytes), which for content-addressed equals the key, plus the digest gate which is bound to actual bytes). Gossip emit: - sync_with_peer_with_outcome and handle_sync_request_with_proofs now take an Option<StorageCommitment> param. Callers snapshot the current commitment once per round (cheap parking_lot::RwLock read returning an Arc) and pass it to every peer in the batch — same value across the batch reduces lock churn, identical commitment for the same rotation epoch. - run_neighbor_sync_round threads commitment_state through; the bootstrap sync path in start_message_handler does the same. Gossip receive: - ingest_peer_commitment: on inbound NeighborSyncRequest/Response, verify the peer-identity binding (commitment.sender_peer_id == authenticated source) and store into last_commitment_by_peer. - TODO(phase-3.5): plumb a PeerId → MlDsaPublicKey lookup so we can verify the signature at ingest time and drop forged commitments earlier. Currently we store-without-verify and rely on the audit- verify path to reject at audit time. What's NOT yet wired (next commits): - Audit issue: snapshot expected_commitment_hash from last_commitment_by_peer[challenged_peer] into the AuditChallenge. - Audit response: handle the CommitmentBound variant via the existing verify_commitment_bound_response; record into recent_provers on success. - Holder eligibility (recent_provers.is_credited_holder) threaded into quorum / paid-list / reward decisions. Old peers are allowed to break: the auto-upgrade system handles version-breaking changes (per Chris's PR WithAutonomi#112 musl-swap test that validated cross-version upgrade across 157 services). 553 lib tests pass. cfd clean (2 pedantic warnings, no errors).
The responder side of the v12 storage-bound audit is now live: when the auditor's challenge carries expected_commitment_hash: Some(h), the responder calls build_commitment_bound_audit_response from commitment_state, which looks up h in current/previous and produces a CommitmentBound response with per-key (digest, bytes_hash, leaf_index, path). handle_audit_challenge_with_commitment: - New entry point. Takes Option<&Arc<ResponderCommitmentState>>. - If commitment_state and expected_commitment_hash are both Some: pre-loads each challenged-key bytes from storage (sync closure, async storage — bounded by sqrt-scaled sample size), calls the v12 §4 build_commitment_bound_audit_response, and returns CommitmentBound / Rejected accordingly. - Otherwise: legacy plain-digest path (unchanged). handle_audit_challenge (the original entry point): kept for backwards compatibility, forwards to the new function with commitment_state = None. handle_audit_challenge_msg (orchestrator in mod.rs): now passes my_commitment_state through, so when the auditor sends a pinned challenge the responder can answer it. What this means at runtime: - Today's auditor (no expected_commitment_hash) is unaffected. - A future auditor that sends pinned challenges will get CommitmentBound responses from upgraded peers and Rejected / legacy from peers we can't match. What's NOT yet wired (next commit): - Auditor-side enablement: snapshot expected_commitment_hash from last_commitment_by_peer[challenged_peer] into the AuditChallenge, and handle the CommitmentBound response variant via verify_commitment_bound_response. Requires threading last_commitment_by_peer + a PeerId → MlDsaPublicKey lookup into audit_tick_with_repair_proofs. Plus recent_provers integration. 553 lib tests pass. cfd has 4 pedantic warnings (no errors).
The v12 storage-bound audit is now end-to-end. Previous commits shipped the responder side (gossip emit, rotation tick, commitment-bound answer dispatch); this commit wires the auditor side so the network actually enforces the commitment-bound flow when both peers run this version. Wire change: embed sender_public_key in StorageCommitment - Add sender_public_key: Vec<u8> (1952 bytes for ML-DSA-65) to StorageCommitment. Bound by the signature payload so a swap-the-key attack fails: the signed payload now binds (root, key_count, peer_id, pubkey), and an adversary keeping the body must produce a forgery under a key they don't hold. - verify_commitment_signature(c) takes the embedded key directly; no external PeerId-to-MlDsaPublicKey lookup is needed. Old peers using the prior 4-field commitment will fail to decode; auto-upgrade (per Chris's PR WithAutonomi#112) handles this. - verify_commitment_signature_with_key(c, pk) kept for tests where we want to assert a specific key did or did not sign. Auditor enablement: CommitmentAuditCtx + audit_tick_with_repair_proofs - New CommitmentAuditCtx<'a> bundles &last_commitment_by_peer and &recent_provers so audit_tick stays callable from both the legacy and commitment-bound paths without ballooning the parameter list. Passing None keeps today's plain-digest behaviour; passing Some opts the auditor in on a per-peer basis (no entry in last_commitment_by_peer still falls back to plain digest). - audit_tick_with_repair_proofs now: 1. Snapshots expected_commitment_hash from last_commitment_by_peer when the ctx is provided and we have a recent gossiped commitment for the challenged peer. Pins the challenge to that hash. 2. Handles the AuditResponse::CommitmentBound { commitment, per_key } variant via the new verify_commitment_bound helper, which calls the existing pure verifier verify_commitment_bound_response with pre-loaded local bytes (sync closure over an async storage read). 3. On verify success: records each verified (peer, key, pin) into recent_provers so downstream code can credit the peer as a real holder (the next-PR work for quorum / paid-list integration). 4. On AuditResponse::Rejected { reason: "unknown commitment hash" }: conditionally drops the stale entry from last_commitment_by_peer (only if it still matches the rejected pin), so the next audit either picks up the freshly gossiped commitment or falls back to the plain-digest path (v12 paragraph 5 conditional invalidation rule). ingest_peer_commitment now verifies at gossip time - With the embedded pubkey, signature verification at gossip ingest is now free of any external lookup. ingest_peer_commitment calls verify_commitment_signature(c) and drops forged commitments at the edge instead of relying on audit-time verification. Tests - All 17 PoC threat-model tests in tests/poc_commitment_audit_attacks.rs still pass against the embedded-key flow. wrong_signer_rejected_at_ signature_gate adapted: instead of passing a wrong pubkey to verify, swap the embedded pubkey on the response commitment (and re-pin to isolate the signature gate from the pin gate). - commitment_hash_is_field_sensitive extended to mutate sender_public_key. That field must change the hash like all others. - 554 lib tests pass (+1 from extending the signature-roundtrip suite). - cfd is warning-only; deny gates clean. What's still NOT in this PR (later work) - Threading recent_provers.is_credited_holder into quorum / paid-list / reward decisions. The cache populates correctly now, but no consumer reads it yet. That's the next focused PR.
Architectural Review — PR #113: Storage-Bound Audit (v12)Anselme, this is a substantial and well-structured piece of work. I've reviewed the full diff (~7k lines), the design document (v12), the PoC tests, and the CI results. Here's my assessment. Design: CorrectThe v12 design correctly closes Finding 1 (lazy-node audit defeat) and Finding 2 (bootstrap-claim shield). The key mechanism — pinning audit challenges to a gossiped-to-you commitment hash — means a lazy node must have had the bytes at gossip time to build a valid Merkle inclusion path. The conditional invalidation on The staged rollout (responder first, auditor follow-up) is safe because CI: Must Fix Before MergeThe e2e tests don't compile. This is a blocker:
These all need
Security1. The comment explains the rationale — the audit-time verifier catches forged commitments — but this creates a DoS surface: an attacker can inject fake commitments for honest peers into 2. If a peer leaves the routing table, their commitment lingers forever. This could grow unbounded over churn. Consider either:
3. RecentProvers: no TTL eviction The PR mentions Code QualityGenerally excellent. Particular strengths:
Minor style: Recommendations
Summary: Strong design, clean implementation, well-tested foundation. The CI issues need fixing before merge. With the ingest-verification and eviction gaps closed, this is ready for review and stage-0 deployment. |
BLOCKER #1: honest commitment-rotation no longer punished - Rejected(unknown commitment hash) is the v12 paragraph 5 conditional- invalidation recovery path: the peer simply rotated past our pin. Previously we still called handle_audit_failure, which emits a trust event. Now we drop the stale entry from last_commitment_by_peer (only if it still hashes to the rejected pin, to tolerate fresh gossip arriving mid-flight), forget any credit anchored to the stale pin via recent_provers.forget_commitment, and return Idle. No trust penalty. BLOCKER WithAutonomi#2: streaming per-key verification removes memory-DoS vector - The pure verifier verify_commitment_bound_response preloaded every challenged chunk into memory. At sqrt-scaled sample sizes (1000 keys at 1M stored) and 4 MiB chunks, a single audit could push the responder + auditor toward multi-GB allocations. Now split into: - verify_commitment_bound_metadata: gates 1, 2a, 2b, 3 (one-shot, cheap). - verify_commitment_bound_per_key: gate 4 (per-key bytes_hash + path + digest), called once per key. The auditor (audit.rs verify_commitment_bound) streams one chunk at a time via storage.get_raw, runs gate 4, drops the bytes. Peak memory is bounded at MAX_CHUNK_SIZE (4 MiB) regardless of sample size. The legacy verify_commitment_bound_response is kept as a thin wrapper (still used in tests). MAJOR #1: ingest commitments from NeighborSyncResponse and bootstrap - ingest_peer_commitment was only invoked on inbound request handling, not on outbound sync responses. For peers we only see on the response path, audits were silently stuck on the legacy digest flow. Now both handle_sync_response and the bootstrap-sync loop call ingest_peer_commitment with resp.commitment. Threading required passing last_commitment_by_peer through start_neighbor_sync_loop, run_neighbor_sync_round, handle_sync_response, and start_bootstrap_ sync. MAJOR WithAutonomi#2: enforce peer_id == BLAKE3(embedded_pubkey) at every gate - Without this binding, a responder could sign with a throwaway key whose secret they hold and lie about which identity it belongs to; the embedded-key signature would verify trivially. saorsa-core derives PeerId as BLAKE3(pubkey_bytes), so the check is a single hash + compare. - Applied in two places: 1. verify_commitment_bound_metadata (auditor): new gate 2c, runs after pin gate, before signature gate. Returns SenderPeerIdMismatch on failure (same error variant as gate 2a; callers don't need to distinguish). 2. ingest_peer_commitment (gossip receive): rejects forged commitments at the edge. Test coverage - New PoC test throwaway_key_substitution_rejected_by_pubkey_binding exercises the attack against the full auditor flow. - All existing PoC + lib tests updated to derive peer_id from the responder's pubkey (matching production saorsa-core behaviour). Responder::new(_peer_byte) keeps the parameter for source-compat but no longer respects it — peer identity is fully derived. - wrong_signer_rejected_at_signature_gate: now swaps both the embedded pubkey AND sender_peer_id (so gate 2c passes), plus rebuilds the per-key digest under the new peer_id (so gate 4 doesn't trip first), to isolate the signature gate as the only failure path. - 554 lib tests + 18 PoC tests pass. - cfd warning-only; deny gates clean.
…leanup
BLOCKER: malicious-peer bypass via free-form rejection text
- Round-5 fix gated the no-trust-penalty "honest rotation" branch on a
substring match of the rejection reason. A malicious peer could
trivially send `reason: "unknown commitment hash"` on ANY challenge
(including legacy unpinned ones) to dodge audits. Worse, on pinned
audits it would drop the stored pin, pushing the next audit back to
the weaker plain-digest path.
- Tightened to:
1. `expected_commitment_hash.is_some()` (the auditor MUST have issued
a pinned challenge — legacy unpinned audits cannot trigger this
branch).
2. Exact-string match (`reason == "unknown commitment hash"`, not
`contains`).
- Round-5 PoC test honest-rotation path still passes because the
responder helper emits the exact reason string; round-6 attack vector
is closed because on an unpinned challenge the gate fails and we
fall through to handle_audit_failure.
MAJOR: bounded `last_commitment_by_peer` + churn cleanup
- The auditor's per-peer commitment cache had no eviction. A sybil /
churn attacker could leave behind one full StorageCommitment per
identity indefinitely (each ~5 KiB: 1952-byte pubkey + 3293-byte
signature + small fields).
- Two-line defence:
1. PeerRemoved DHT event now drops the peer's entry from
last_commitment_by_peer AND its recent_provers credits, matching
the existing repair_proofs cleanup.
2. Hard cap MAX_LAST_COMMITMENT_BY_PEER = 4096 (~20 MiB worst-case).
On insert when at cap, evict one arbitrary existing entry
(HashMap iter order; sufficient because PeerRemoved keeps the
working set anchored on the real RT). Updates for peers already
in the map always replace and never trigger eviction.
Tests
- 554 lib tests pass.
- 18 PoC tests pass.
- cfd warning-only; deny gates clean.
MAJOR: off-RT sybils could churn the cache - Round-6 added a 4096-entry cap on last_commitment_by_peer + PeerRemoved cleanup. But ingest_peer_commitment still admitted any authenticated sender. An off-RT flood could fill the cap and evict honest peers, silently demoting their next audits to the legacy plain-digest path. - Fix: ingest_peer_commitment now drops the commitment if the source is not in our DHT routing table. Mirrors the existing `sender_in_rt` gate in handle_sync_request_with_proofs (which guards inbound replication hints). Off-RT senders cannot populate the cache, so cap eviction only fires under real RT churn (which PeerRemoved would have caught anyway). - All three callers updated (request handler, response handler, bootstrap loop) to thread `&p2p_node` through. MINOR: doc consistency - MAX_LAST_COMMITMENT_BY_PEER doc previously said "oldest by insertion order"; the code uses HashMap iter order which is unspecified. Doc updated to match implementation + explain why arbitrary eviction is sufficient (RT gate + PeerRemoved cleanup). Tests - 554 lib + 18 PoC pass. - cfd warning-only; deny gates clean.
MAJOR: free audit bypass via "unknown commitment hash" rejection
- Round-7 (and prior) dropped the cached pin when a peer responded
Rejected("unknown commitment hash"), returning Idle with no penalty.
This was meant to handle honest rotation, but the responder's
two-slot retention (current + previous) means "unknown" only happens
if we are at least TWO rotations behind their gossip. The fix path
was actively harmful:
- Honest peer: gossip already rotates ~every 10 min, so the cache
will be naturally refreshed via the existing gossip ingest path
within one or two rotation windows. We don't need to drop the
pin ourselves.
- Malicious peer: under round-7, dropping the pin meant the NEXT
audit had no pin, fell back to the legacy plain-digest path, and
the on-demand-fetch attack reopened. A peer could trigger this
on every challenge to permanently avoid commitment-bound audits.
- Fix: on unknown-commitment-hash with a pinned challenge, log and
return Idle (one wasted tick) but KEEP the pin and KEEP recent_
provers credits intact. Honest rotation self-resolves via gossip;
malicious "unknown" loops keep failing pinned audits until either
the operator notices or fresh gossip replaces the entry. No more
free fallback to the weaker legacy path.
- Strict gating from round-6 retained: `expected_commitment_hash.
is_some()` ensures legacy unpinned challenges can't trigger this
branch at all.
MINOR: ingest_peer_commitment docstring inaccuracy
- Previously claimed the signature is verified under "a public key
derived from source.as_bytes()". The actual flow:
- Source must be in our routing table.
- sender_peer_id must equal source.as_bytes() (peer-id binding).
- BLAKE3(sender_public_key) must equal sender_peer_id (gate 2c).
- Signature verifies under the embedded public key (which is bound
by the signature payload).
- Doc rewritten to enumerate all five gates with their purpose.
Tests
- 554 lib + 18 PoC pass.
- cfd warning-only; deny gates clean.
…g responder MAJOR #1: pinned-challenge contract enforcement - When the auditor pins a commitment hash into a challenge, the responder MUST answer with CommitmentBound (or Bootstrapping / Rejected for legitimate reasons). Previously, a Digests response to a pinned challenge was accepted and verified via the legacy plain-digest path. A peer that had already gossiped a commitment could ignore the storage-bound flow and pass via on-demand fetch under the weaker verifier. - Fix: in audit_tick_with_repair_proofs, the Digests arm now rejects the response as MalformedResponse when expected_commitment_hash. is_some(). Same handle_audit_failure path as other contract violations. MAJOR WithAutonomi#2: streaming responder (peak memory at one chunk) - The responder dispatch in handle_audit_challenge_with_commitment still preloaded every challenged chunk into a HashMap, then cloned each one into the per-key result. With max_incoming_audit_keys scaling as 2 * sqrt(stored_chunks) and chunks up to 4 MiB, this was an O(sample * chunk) memory spike per request — a viable memory-DoS vector on large nodes. - Refactor: two new helpers in commitment_state.rs: - precheck_commitment_bound_challenge: looks up the commitment + verifies every challenged key is covered, WITHOUT reading any chunk bytes. Returns the matched commitment Arc. - build_commitment_bound_result_for_key: builds one per-key entry given the pre-checked commitment + that key's bytes. - The responder dispatch now: prechecks, then iterates keys, reads one chunk via storage.get_raw, builds the entry, drops the bytes. Peak memory bounded at MAX_CHUNK_SIZE (4 MiB) regardless of sample size. Matches the streaming pattern the auditor side already uses. - Legacy build_commitment_bound_audit_response is kept as a thin wrapper (still used in tests). Tests - 554 lib + 18 PoC pass. - cfd warning-only; deny gates clean. A future PR with a 2-node e2e harness should add a regression test for the digests-to-pinned bypass; the current PoC suite tests the pure verifier in isolation and can't exercise the dispatcher loop.
… signal MAJOR #1: rotation cadence outran gossip refresh - Rotation was every 10 min, but neighbor-sync cooldown is up to 1 h per peer. Result: a remote auditor's cached pin could routinely point at a commitment we rotated past 2+ times, and our two-slot retention (current + previous) wouldn't cover it. Pinned audits then hit "unknown commitment hash" -> Idle no-op repeatedly until the next gossip arrival, degrading the storage-bound flow to effectively no-op for that auditor. - Fix: rotate every 1 h instead of 10 min. With two-slot retention that gives ~2 h of validity per commitment, comfortably covering the worst-case gossip lag. The v12 pin is bound to a point-in-time commitment, so rotation cadence isn't security-critical for pin freshness — only for keeping the committed key set current as the responder writes new keys. 1 h is plenty for that. MAJOR WithAutonomi#2: commitment-downgrade observable, not just stalling - A peer that gossiped a commitment once but then stops gossiping commitments (sends `commitment: None`) is trying to downgrade back to the legacy plain-digest audit path. Pre-fix: the None case silently returned false; only stalled audits were observable. - Fix: when a peer present in last_commitment_by_peer sends a None commitment, log at warn-level so operators can correlate downgrade attempts with audit-failure metrics. Cached entry is KEPT so subsequent pinned audits still apply (the peer must either rotate forward via gossip or accumulate audit failures via the "unknown commitment hash" path). - Trust-event integration is left as a follow-up: the wiring path from ingest_peer_commitment to the trust engine is non-trivial and warrants its own PR with a clear penalty curve. Tests - 554 lib + 18 PoC pass. - cfd warning-only; deny gates clean.
…n staleness MAJOR #1: retention window too narrow for worst-case gossip lag - Round-10 bumped rotation to 1h, but two-slot retention only covers ~2h. Realistic neighbor-sync staggering (batches of 4, 10-20 min between rounds, 1h cooldown) can produce 3+ hour gaps between gossip refreshes of the same peer pair. Honest auditors pinned to rotated-out commitments would then hit "unknown commitment hash" -> Idle no-ops indefinitely. - Fix: bump retention from 2 slots to RETAINED_COMMITMENT_SLOTS = 4. With 1h rotation that gives ~4h of pin validity, comfortably covering the worst-case gossip lag. Memory cost is bounded: at 10k keys per commitment, the four-slot buffer is ~2.6 MB. - Refactored ResponderCommitmentState: replaced the `current` / `previous` field pair with a `slots: Vec<Arc<BuiltCommitment>>` newest-first. rotate() prepends + truncates; lookup_by_hash scans all slots (still O(slots) which is tiny). External API (current(), lookup_by_hash, rotate) unchanged. MAJOR #2a: rotation didn't fire until first interval elapsed - After process start, current() returned None until the first 1h sleep completed. During that hour, the responder couldn't answer any commitment-bound audits — every challenge silently fell back to the legacy plain-digest path. - Fix: rebuild_and_rotate_commitment runs ONCE immediately on startup, before the sleep loop. First commit is available within seconds of startup. Subsequent rotations follow the regular 1h cadence. MAJOR #2b: "key not in commitment" wrongly counted as failure - A key recently replicated to the responder (via fresh-write hint) won't appear in the responder's commitment until the next 1h rotation. An auditor sampling that key and challenging the responder would get Rejected("key not in commitment: ..."), and the auditor would fire handle_audit_failure → trust penalty. But the responder actually HAS the bytes; only its Merkle tree is stale. - Fix: in audit_tick_with_repair_proofs, "key not in commitment" on a pinned challenge is treated as Idle (no penalty), same policy as "unknown commitment hash". Both are benign staleness signals; the next rotation will refresh the responder's tree. Strict gating retained: only applies when we DID pin, so a legacy unpinned audit cannot be bypassed. Tests - Updated commitment_state.rs unit tests for the new 4-slot retention semantics. - Updated PoC test `responder_drops_old_commitment_after_two_rotations` → renamed to `..._past_retention_window` and now rotates RETAINED_COMMITMENT_SLOTS+1 times. - 554 lib + 18 PoC pass. - cfd warning-only; deny gates clean.
… missing-bytes penalty + post-restart re-gossip CODEX ROUND-12 MAJOR #1: "key not in commitment" conflated benign staleness with real storage loss - The responder emitted Rejected("key not in commitment: ...") in two different situations: (a) the key was never in the commitment (just-replicated, awaiting next rotation) — benign; (b) the key WAS in the commitment but the responder cannot read the bytes anymore — real storage loss / withholding. Round-11 made the auditor treat both as Idle (no penalty), which meant case (b) escaped audit penalty entirely. - Fix: differentiate at the responder. Case (b) now emits Rejected reason "missing bytes for committed key: ..." and the auditor's benign-staleness branch only matches "key not in commitment", so case (b) falls through to handle_audit_failure with full penalty. MAJOR WithAutonomi#2: ML-DSA-65 signatures are randomized → pin doesn't survive restart - Commitment hash includes the signature, so rebuilding the same key set after restart produces a different hash. Pre-restart pinned audits then hit "unknown commitment hash" -> Idle until fresh gossip arrives — up to 1 h with the round-10 rotation cadence, during which time the node dodges commitment-bound audits. - The right fix would be to persist commitments to disk on rotate, but that's a meaningful change. Pragmatic alternative: after the first commitment is built on startup, trigger an immediate neighbor-sync round. The new commitment then gossips out within seconds, shrinking the recovery window from hours to sub-minute. DAVID'S PR REVIEW (round-12) MAJOR: RecentProvers lacked TTL eviction - The cache had per-key LRU-by-cap eviction but no time-based expiry. A rarely-audited key could keep stale entries indefinitely (until cap pressure evicts them). - Fix: add PROVER_ENTRY_TTL = 4h (4× the rotation interval). is_credited_holder ignores entries older than the TTL on read; new sweep_expired() reclaims their memory and is called once per rotation tick from the engine (1h cadence). MINOR: bandwidth impact undocumented - Added a "Wire size" section to StorageCommitment's docstring: ~5.3 KiB per commitment (32+4+32+1952+3293 bytes), gossiped on every NeighborSyncRequest/Response. With a close-group of 8 and bidirectional sync at the 1h rotation cadence, that's ~85 KiB/h per node — negligible against chunk-transfer bandwidth. Tests - 554 lib + 18 PoC pass. - cfd warning-only; deny gates clean. David's other points (e2e compile failures, signature verification on ingest, last_commitment_by_peer eviction, fmt/clippy/docs) were all addressed in codex rounds 5-7. The remaining items from his review are this commit's TTL + bandwidth doc.
…credit, rate limit, §3 shield
The v12 design (notes/security-findings-2026-05-22/proposal-gossip-audit-v12.md) is now fully implemented per its §§2-6 checklist:
§2 step 5 — sticky `commitment_capable` flag
- New `PeerCommitmentRecord` struct replaces the bare
`HashMap<PeerId, StorageCommitment>`. Holds last_commitment +
commitment_capable (sticky bool) + received_at + last_sig_verify_at.
- Once a peer gossips a valid commitment, capable flips to true and
never reverts. Even if we evict the cached commitment via TTL,
sybil cap, or restart, the peer is forever treated as v12-capable
until full PeerRemoved cleanup.
§2 step 3 + §11 DoS — per-peer 60s sig-verify rate limit
- `COMMITMENT_SIG_VERIFY_MIN_INTERVAL = 60s` caps ML-DSA verify
cost per peer. Checked after cheap structural gates (RT, peer-id
binding, pubkey-binding) and before the expensive sig verify.
A sybil that bypasses the RT gate (transient bucket pollution)
can no longer burn CPU with a flood of valid-looking gossips.
§3 — bootstrap-claim shield: refuse legacy fallback for capable peers
- audit_tick_with_repair_proofs now checks the peer record up front:
if commitment_capable but no cached commitment, return Idle. The
peer is fully expected to speak v12; falling back to legacy
plain-digest would let them downgrade. We wait for fresh gossip
to refresh the cache instead.
§5 (v12) — restored conditional invalidation
- Round-8 kept the pin unconditionally on Rejected("unknown
commitment hash") to prevent the legacy-fallback bypass. Now that
§3 (above) closes the bypass directly, we can implement the v12
design verbatim:
- Case 1 (lazy rotation): stored hash == rejected H → clear pin +
forget_commitment(H). recent_provers entries lose their match
basis → §6 holder-credit drops. Lazy node earns nothing.
- Case 2 (honest rotation race): stored hash != H (fresh C2
arrived in-flight) → leave it alone. Don't clobber.
- Case 3 (stale auditor): same as case 1; clear pin, wait for
fresh gossip.
§6 — holder-eligibility threaded into quorum
- New `evaluate_key_evidence_with_holder_check` variant takes a
predicate `(peer, key) -> bool`. Returning false downgrades a
Present claim to Unresolved (we don't trust "I have it" without a
recent commitment-bound audit). Paid-list evidence is independent
(it's a property of the receiving peer's own data, not a Present
claim).
- Wired into `run_verification_cycle` via
`VerificationCycleContext`: snapshots last_commitment_by_peer +
recent_provers once per cycle (cheap; bounded by RT × 16/key) and
evaluates each key against the snapshot. Synchronous predicate
avoids re-entering the locks during evaluate.
Tests
- 3 new quorum tests:
- quorum_downgrades_uncredited_present_peers
- quorum_passes_when_all_present_peers_are_credited
- paid_list_path_unaffected_by_holder_credit
- 2 new PoC tests:
- commitment_capable_flag_is_sticky_across_eviction
- capable_but_no_commitment_starts_capable
- 557 lib + 20 PoC pass.
- cfd warning-only; deny gates clean.
What's still NOT in this PR (legitimately out of scope)
- Disk persistence of commitments. ML-DSA signatures are randomized,
so the commitment hash changes across restart even for the same
key set. Mitigated by the round-12 immediate post-startup gossip
trigger (recovery window measured in seconds, not hours). Disk
persistence is a clean follow-up optimization.
- Pre-rotation event-driven rebuild on fresh-write. A just-replicated
key is currently auditable only after the next 1h rotation. The
auditor treats this as benign staleness (round-11 + round-12); no
trust penalty. Event-driven rebuild on fresh-write would close the
gap but adds wiring complexity for marginal gain.
…ect §6 TTL MEDIUM #1: rate limit only fired for peers with cached commitments - Round-12 tracked last_sig_verify_at inside PeerCommitmentRecord, but that record is only created after a successful verify. A peer we've never successfully verified could still burn ML-DSA cost on every invalid-but-structurally-plausible gossip. - Fix: new `sig_verify_attempts: HashMap<PeerId, Instant>` map, separate from last_commitment_by_peer. Stamped BEFORE the verify on EVERY attempt (success or failure). Reading + writing happens before the expensive verify, so a flood is rejected at the cheap hashmap-lookup step. Capped at MAX_LAST_COMMITMENT_BY_PEER with oldest-timestamp eviction, and dropped on PeerRemoved (same cleanup pattern as the commitment cache). - Threaded through ingest_peer_commitment → handle_replication_message → handle_sync_response → run_neighbor_sync_round → start_*_loop spawn-scope clones (3 call sites total). MEDIUM WithAutonomi#2: §6 TTL was 4h, design says 2 × max audit interval (40 min) - 4h kept holder credit alive much longer than v10/v12 §6 specifies, weakening "must re-prove under current conditions". Default max audit interval is 20 min → TTL = 40 min. - Fix: PROVER_ENTRY_TTL bumped from 4h to 40m. Doc updated to cite the v10/v12 spec line directly. Tests - 557 lib + 20 PoC pass (no test changes needed). - cfd warning-only; deny gates clean.
Round-13 used separate read + write locks for the check and the stamp. Two concurrent ingest_peer_commitment calls from the same peer could both miss the rate-limit check and both reach ML-DSA verify within the 60s window. Fix: combine into a single write-locked critical section. Read existing timestamp, compare, then insert under the same lock. The lock is held only for a hash-map lookup + insert (microseconds), never across the expensive verify itself. Tests - 557 lib + 20 PoC pass. - cfd warning-only; deny gates clean.
Summary
v12 storage-bound audit (
notes/security-findings-2026-05-22/proposal-gossip-audit-v12.md), end-to-end and design-complete. Closes Finding 1 (lazy node defeats audit via on-demand fetch) and Finding 2 (bootstrap-claim shield).This is a breaking wire change (StorageCommitment now embeds the sender's ML-DSA-65 public key; old peers will fail to decode). Auto-upgrade handles the version break.
Design checklist (v10 §§1-13 + v12 §5)
commitment_capableflagUnknownCommitmentHashinvalidationis_credited_holderthreaded into quorum (evaluate_key_evidence_with_holder_check)What this PR ships
commitment.rs,commitment_state.rs(incl. newPeerCommitmentRecord),commitment_audit.rs,recent_provers.rs.PeerRemoved.evaluate_key_evidence_with_holder_checkdowngrades uncredited Present claims to Unresolved. Wired intorun_verification_cycle.Security review
15 rounds of codex review + David's PR review. All findings addressed. Selected highlights:
BLAKE3(pubkey) == sender_peer_idsig_verify_attemptsmap stamped on every attemptThreat coverage (20 PoC tests in
tests/poc_commitment_audit_attacks.rs)AuditVerifyErrorvariantcommitment_capableflag is sticky across evictioncapable_but_no_commitmentconstructor preserves capabilityPlus 60 unit tests across the four new modules (including 3 new quorum tests for the holder-credit predicate).
Tests + CI
cargo fmt --checkclean.cargo clippy --all-targets --all-featureswarning-only; the deny gates (-D clippy::panic -D clippy::unwrap_used -D clippy::expect_used) pass.Test plan
cargo test --lib: 557 passcargo test --test poc_commitment_audit_attacks: 20 passOut-of-scope (future PRs)