Skip to content

feat(daemon): macOS DNS privacy — demote the hijacked default + scope corp domains (Phase 7d-4)#40

Merged
stslex merged 13 commits into
devfrom
phase-7d-4-macos-dns-privacy
Jun 29, 2026
Merged

feat(daemon): macOS DNS privacy — demote the hijacked default + scope corp domains (Phase 7d-4)#40
stslex merged 13 commits into
devfrom
phase-7d-4-macos-dns-privacy

Conversation

@stslex

@stslex stslex commented Jun 27, 2026

Copy link
Copy Markdown
Owner

What & why

The macOS backend only scoped the corp domains via /etc/resolver. That is insufficient against a VPN client that hijacks the system default resolver (the corp DNS is the global default, scoped to no utun, so non-corp DNS would also traverse the tunnel — the privacy leak Splitway exists to prevent). This phase adds the missing demote so macOS reaches DNS parity with Linux: the corp resolver sees only the configured corp domains; everything else resolves off-tunnel.

Implementation + synthetic-fixture tests only. No live infra was touched. The live packet-level / reconnect / revert acceptance is machine-bound and verified separately (checklist below). This gates PR #38 (dev→master) — macOS installs but does not deliver privacy against the VPN until this lands and verifies live.

How it works

  • Structural, vendor-neutral detection (detector/macos/parser.rs, detector.rs): reads the per-service DNS model — VPN-up = a network service whose resolver differs (set-wise) from the physical primary interface's own DHCP resolver. No utun keying (the active tunnel index varies between sessions), no vendor/product strings.
  • Detection reads the VPN's own service, not State:/Network/Global/DNS — load-bearing: our demote overwrites the physical service's DNS, which changes the global default. A Global-keyed detector would flip to "down" the instant the demote took effect → revert → re-hijack → oscillation. Reading the per-service signal our demote doesn't touch keeps the verdict stable (pinned by detection_survives_our_own_demote_of_the_physical_service).
  • Demote + restore (backend/macos/demote.rs, backend.rs): apply_rules snapshots the primary service's prior DNS to disk, overwrites it with the off-tunnel fallback (physical DHCP resolver, or a configured fallback_dns override), and revert_rules restores it on every exit path. Transactional across scope+demote — a demote failure rolls back the resolver scope, so the system is never left half-changed. The on-disk snapshot survives an unclean (SIGKILL) exit.
  • State machine decoupled from vpn_name on macOS (state.rs) via the existing reverts_globally() seam, at all three gating sites (desired / status.detected_dns / routing_state). Linux stays interface-keyed and unchanged. Applied now tracks demote_target so a fallback change (e.g. Wi-Fi switch) forces a re-apply; it is not added to the wire AppliedInfo (no protocol change).
  • Seams for testability: ScutilRunner (real shells scutil in script mode; fake captures the exact script + returns canned state) and SnapshotStore (real = on-disk; fake = in-memory). apply_with/revert_with take both so the scope+demote wiring — including rollback-on-demote-failure — is unit-tested with no live system.
  • Config: LocalConfig.fallback_dns: Option<Vec<String>> (serde default None) — None → detector's physical DHCP resolver; Some → pinned public resolver. Root-config-file-only; SetConfig preserves it (GUI doesn't edit it — GUI changes are out of scope).
  • VpnInfo.demote_target added (additive; Linux/OpenVPN detectors set None).

Scope boundary

DNS only — no IP-route manipulation on macOS (the client already keeps non-corp IP off the tunnel; same boundary as Linux). The GUI interface-picker becomes a benign no-op on macOS (its removal is a later GUI phase). Homebrew packaging is a later phase.

Verification (all in nix develop, rustc 1.95)

  • cargo fmt --all -- --check
  • cargo build
  • cargo clippy --all-targets -- -D warnings ✅ (0 warnings)
  • cargo test ✅ — all pass (164 daemon incl. 67 macOS-related, 54 shared, 43 gui-tauri)
  • scripts/check-no-leaked-infra.sh ✅ — synthetic placeholders only (corp.example.com, RFC 5737 IPs, utun0); no vendor/product names anywhere.

Deferred — live acceptance (machine-bound, run on the real VPN)

  1. Packet-level, both interfaces, fresh random subdomains: public queries on the physical interface, zero on the tunnel; a corp host still resolves.
  2. Event robustness: reconnect / Wi-Fi toggle / sleep-wake → the demote re-holds.
  3. Clean revert: stop / VPN-down restores the machine exactly as found.
  4. status / verify / check <corp-host> / check <public-host> report true state.

See docs/design/macos-dns-privacy.md.

🤖 Generated with Claude Code

… corp domains

The macOS backend only scoped the corp domains via /etc/resolver, which is
insufficient against a VPN client that hijacks the system *default* resolver
(corp DNS is the global default, scoped to no utun, so non-corp DNS would also
traverse the tunnel). This adds the missing demote, reaching DNS parity with
Linux: the corp resolver sees only the configured corp domains; everything else
resolves off-tunnel.

- Detection is rewritten to be structural and vendor-neutral: it reads the
  per-service DNS model (a service whose resolver differs from the physical
  link's is the VPN) instead of filtering `scutil --dns` by a utun. It reads the
  VPN signal from the VPN's own service, NOT the global default Splitway mutates,
  so our own demote cannot make detection oscillate.
- apply_rules demotes the primary service's DNS to an off-tunnel fallback (the
  physical interface's DHCP resolver, or a configured `fallback_dns` override),
  snapshotting the prior value to disk first; revert_rules restores it on every
  exit path. Transactional across scope+demote (a demote failure rolls back the
  scope); the on-disk snapshot survives an unclean exit.
- The state machine is decoupled from `vpn_name` on macOS via the existing
  reverts_globally() seam; Linux stays interface-keyed and unchanged.
- New seams (ScutilRunner, SnapshotStore) make the demote/restore wiring
  unit-testable without touching the live system; detection parsing + decision
  are pure. Tests use synthetic fixtures only.

DNS only — no IP-route manipulation (the client already splits IP; same boundary
as Linux). The GUI interface-picker becomes a benign no-op on macOS (removal is a
later GUI phase). Live packet/reconnect/revert acceptance is machine-bound and
verified separately.

See docs/design/macos-dns-privacy.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19b290fb46

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/detector/macos/parser.rs Outdated
Comment thread splitway-daemon/src/backend/macos/demote.rs Outdated
stslex and others added 2 commits June 28, 2026 00:27
…eholder

The scrub guard's internal-TLD check (check #1 in scripts/check-no-leaked-infra.sh)
flags any label followed by `.corp` (e.g. `jira.corp`), so the placeholder
`jira.corp.example.com` tripped it in CI even though it is synthetic. Plain
`corp.example.com` is fine (corp is the leading label, no preceding label+dot).
Swap the second corp test domain to `jira.example.com`, which is equally
synthetic and passes the guard. Test/doc-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…a primary-service change (Codex P1/P2)

Two correctness fixes from PR review of the macOS demote:

P1 — the demote rebuilt the primary service's DNS dict with `d.init` +
`ServerAddresses` only, dropping `InterfaceName`. The detector identified the
physical service by `InterfaceName == PrimaryInterface`, so the next detection
round after our own demote could fail to find the physical service (or pick the
VPN service if it also reported the primary interface), inverting corp/fallback
or undoing the demote — the exact oscillation the per-service model exists to
prevent. Now:
  - the demote re-adds the captured `InterfaceName` (`build_set_dns_script` takes
    it; the snapshot stores it);
  - detection anchors the physical service on the authoritative `PrimaryService`
    id (falling back to the interface name only when the id is unknown) and
    excludes it by id from the VPN-service search, so a VPN service that also
    reports the primary interface can never be mistaken for physical.

P2 — when the primary network service changes while the VPN stays up (e.g. Wi-Fi
→ Ethernet) and a snapshot already existed, `demote` skipped snapshotting the new
key but still overwrote it; a later `restore` only replayed the old snapshot,
leaving the new primary pinned to the fallback. Now a demote whose primary key
differs from the snapshot restores the old service first, then snapshots and
demotes the new one — exactly one service demoted at a time, none stranded.

Snapshot format gains a self-describing `<tag>\t<value>` form (key/iface/server)
so the optional interface line is unambiguous. New unit tests cover interface
preservation, service-id anchoring (incl. a VPN sharing the primary interface),
the interface-name fallback, and the primary-service-change restore-then-snapshot
path. Tests/synthetic-fixtures only; four Cargo checks + scrub green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ef6978c0d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/detector/macos/parser.rs
Comment thread splitway-daemon/src/backend/macos/demote.rs
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

Independent deep review (multi-agent) — corrected & re-scoped

Correction (this comment was edited). My first pass was generated against a stale local checkout (2ef6978). The branch is at 4a8a1b9, and commits 23f9678..4a8a1b9 already address everything I'd originally flagged — the scutil list subKey [N] = parsing, the scope-restore on a demote-failure rollback, secondary-network scoping, DNS-dict preservation (get before override), watching State:/Network/Global/IPv4, and the installed_fallback recording. Apologies for the noise; the original findings are obsolete.

Re-reviewed against the current tree, validated to compile clippy-clean for macOS via cargo clippy --target x86_64-apple-darwin -p splitway-daemon --all-targets -- -D warnings. Of the 16 review threads, 2 are genuinely unresolved, both real P2s introduced by the latest installed_fallback work in 4a8a1b9:

1. state.rs:395 — removing fallback_dns while up doesn't re-demote to the real DHCP resolver

After a demote with an override, the detector reads the demoted value back as demote_target (the watcher dedups on it), so last_info.demote_target becomes the override we installed. When fallback_dns is then removed, effective_demote_target's fallthrough returns that same override, desired() still equals the applied target, and nothing re-demotes to the real DHCP resolver until a later network/VPN cycle. (Not a leak — the override is still off-tunnel — and the snapshot still holds the real prior for the eventual revert.)

Fix: treat a fallback_dns change as re-arm-worthy in watch_settings_changed. The re-arm reverts first (restoring the real prior DHCP from the demote snapshot), then re-samples, so the next demote targets the correct resolver. Cleanly reuses the existing switch machinery; harmless on Linux (ignores fallback_dns).

2. demote.rs:420 — a lost installed_fallback record makes a retry capture our own fallback as the prior

If the post-write installed_fallback save fails (e.g. a transient /var/run write error) after the set already succeeded, the snapshot's installed_fallback stays empty while the service shows our fallback. The next same-service demote then sees the live servers differ from the (empty) recorded installed_fallback and snapshots our own fallback as prior_servers — so a later restore writes the fallback back instead of the real DHCP resolver.

Fix: in the same-service branch, also treat the live servers as "ours" when they equal the fallback about to be (re)installed — so a retry with the same fallback is recognised regardless of whether the installed_fallback record persisted.


Both fixes are minimal and come with regression tests; I can push them to the branch if useful. The rest of the PR (redaction, CI running cargo test on macОS, atomic_write symlink-safety, the fallback_dns IPC boundary, Linux non-regression) checks out.

@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ef6978c0d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/detector/macos/detector.rs Outdated
Comment thread splitway-daemon/src/backend/macos/demote.rs
…+ DNS-dict preservation (Codex P1/P1/P2)

Three correctness fixes from PR review of the macOS DNS privacy work.

P1 — `scutil list State:/Network/Service/.*/DNS` does NOT print bare keys: each
match is a prefixed row (`subKey [n] = State:/.../DNS`). `read_dns_model` passed
the whole row to `show`, so every per-service read returned `No such key`, the
service model stayed empty, and `current_vpn_state()` reported `Down` even with
the VPN DNS present — the detector was effectively inert. Parse the key out of
each row (new `parse_list_key`, unit-tested) and filter to `State:/Network/Service/`
keys before showing them.

P1 — `decide` treated any non-physical service whose DNS differs from the
physical resolver as a VPN. On a multi-homed Mac (Ethernet primary while Wi-Fi
stays associated with its own DHCP DNS) the secondary service differs, so with no
VPN running the daemon would apply the corp domains to the secondary resolver and
report VPN up. The differing service must now also be the default-resolver
hijacker (`is_default_resolver_hijacker`): it rides the primary interface's
default route, runs on a tunnel pseudo-interface (utun/ppp/ipsec/tun/tap), or is
unscoped (no interface). A service bound to a distinct hardware interface (a
second Ethernet/Wi-Fi/cellular link, a VM/Thunderbolt bridge) is a parallel
network and is excluded. The hijacker filter precedes the difference check, so a
real VPN is still found when a secondary network is also present, regardless of
order. Matching is by interface kind (the stable BSD driver prefix), never a
vendor string, so detection stays vendor-neutral and does not read the mutable
global default (no oscillation under our own demote).

P2 — `build_set_dns_script` used `d.init`, starting from an empty dictionary, so
the `set` replaced the whole service DNS dict and dropped DHCP/VPN-provided fields
(`SearchDomains`, `DomainName`, `SupplementalMatchDomains`, `SearchOrder`) both
while demoted and on restore, breaking local search-domain behaviour until the
next network reconfiguration. The script now `get`s the service's current dict
into the working buffer before overriding only `ServerAddresses`, so unmanaged
fields ride through; `InterfaceName` is preserved by `get` (still re-added as a
belt-and-suspenders). Restore of a service that had no prior `ServerAddresses`
now `d.remove`s only our override after the `get` instead of removing the whole
key, so the rest of the dict survives.

New/updated unit tests cover list-row parsing, the hijacker predicate (incl. a
secondary physical network, a tunnel VPN alongside one, and an unscoped
hijacker), and the get-before-override / remove-only-our-servers script shapes.
Tests/synthetic-fixtures only (RFC 5737 placeholders). Verified: fmt, Linux
build/clippy/test, and x86_64-apple-darwin cargo check + clippy -D warnings
(--all-targets) all green; macOS test execution runs on CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23f96781df

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/backend/macos/demote.rs
Comment thread splitway-daemon/src/backend/macos/backend.rs Outdated
…lush cache on restore-only revert (Codex P2/P2)

Two more P2 findings from the re-review of the macOS demote.

P2 — the same-service snapshot could go stale. When the same primary service
stayed active but its DHCP resolver changed (a new lease) and the watcher
re-emitted Up, `demote` kept the original snapshot, so a later `restore` wrote the
pre-change resolver back over the current DHCP one — disconnecting the VPN could
leave macOS on the previous network's DNS. The same-service branch now reads the
service's live DNS: if it differs from our fallback (the real resolver changed
under us) it refreshes the snapshot to the current resolver; if it still shows our
fallback it keeps the original (so we never snapshot our own fallback as the
"prior").

P2 — a restore-only revert skipped the cache flush. `revert_rules` flushed only
when resolver files were removed, but a revert also restores any demoted default.
If the files were already gone (e.g. a prior partial revert removed them before a
failed `scutil` restore, then a retry succeeds) the default DNS changed while
`removed == 0`, so the stale cache kept serving demoted/default answers.
`demote::restore` now reports whether it restored a snapshot, `revert_with`
returns `RevertOutcome { removed, restored_default }`, and `revert_rules` flushes
when either changed.

New/updated unit tests cover same-service keep-vs-refresh and a revert that
reports a restore with zero files removed. Tests/synthetic-fixtures only (RFC 5737
placeholders). Verified: fmt and x86_64-apple-darwin cargo check + clippy
-D warnings (--all-targets) green; macOS test execution runs on CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4f68d3583a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/backend/macos/backend.rs Outdated
Comment thread splitway-daemon/src/detector/macos/watch.rs Outdated
…back + watch Global/IPv4 (Codex P2/P2)

Two more P2 findings from the re-review.

P2 — a demote-failure rollback during a re-apply wiped the working scope.
`apply_with`'s rollback removed every managed `/etc/resolver` file, so if a
re-apply's scope write succeeded but the demote then failed (a transient `scutil`
hiccup), the previously-working split DNS — for unchanged AND dropped domains —
was deleted until a later retry, a half-configured state. `apply_to_dir` now
returns the journal of every change it made (files overwritten, created, and —
newly journaled — pruned), and `apply_with` rolls back exactly that on a demote
failure, restoring the prior scope file-for-file. `remove_managed` gained an
optional journal parameter so a prune is undoable along with the writes.

P2 — the watcher missed the Global IPv4 key that detection reads.
`current_vpn_state` reads `State:/Network/Global/IPv4` for `PrimaryService`/
`PrimaryInterface`, but the watch only armed Global DNS plus the per-service/
interface patterns. macOS can switch the primary route between already-configured
services by updating only `Global/IPv4` without touching any watched DNS key,
leaving the daemon asleep with a stale primary-service decision and demote target.
Add `State:/Network/Global/IPv4` to the watched keys.

New test: a re-apply whose demote fails restores the overwritten file's prior
content and re-creates the pruned file. Tests/synthetic-fixtures only (RFC 5737
placeholders). Verified: fmt and x86_64-apple-darwin cargo check + clippy
-D warnings (--all-targets) green; macOS test execution runs on CI. `watch.rs` is
FFI glue (intentionally not unit-tested).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4de18938a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/detector/macos/detector.rs Outdated
Comment thread splitway-daemon/src/detector/macos/state.rs
…tity + reconcile an orphaned demote at startup (Codex P2/P2)

Two more P2 findings from the re-review.

P2 — the secondary-network filter read `InterfaceName` from the wrong key. The
per-service DNS entity does not reliably carry `InterfaceName` (its schema is the
DNS fields), so a secondary Wi-Fi/Ethernet service often had none in its DNS dict
→ `read_dns_model` stored `None` → `is_default_resolver_hijacker(None)` treated it
as an unscoped hijacker → a multi-homed Mac with no VPN could be detected VPN-up
and point the corp domains at the secondary DHCP resolver. Read the interface from
the service's `State:/Network/Service/<id>/IPv4` (then IPv6) entity — the reliable
binding — falling back to the DNS dict only if neither names one.

P2 — an unclean exit could strand a demote. The on-disk demote snapshot survives a
SIGKILL, but a suppressed initial `Down` sample (`last == None`) meant that if the
daemon was killed while demoted and the VPN went down before restart, nothing
restored the default resolver or cleared the snapshot — it stayed pinned to the
off-tunnel fallback until the next VPN up→down cycle. `run_state` now reconciles
orphaned persisted state once at startup, before arming the watch, for
global-revert backends (macOS): an idempotent revert that clears any leftover
snapshot and restores the prior default, after which the watch's initial `Up` (if
any) re-applies. Per-interface backends (Linux) keep no such cross-restart state
and skip it; `applied` stays `None` (this only clears a prior process's state).

New daemon-core tests (run on Linux): the startup cleanup reverts for a
global-revert backend and is a no-op for a per-interface one. The detector
interface read is I/O glue (untested, like the rest of `read_dns_model`). Verified:
fmt, Linux clippy + daemon tests (174 pass), and x86_64-apple-darwin clippy
-D warnings (--all-targets).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81f367090f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/backend/macos/demote.rs Outdated
Comment thread splitway-daemon/src/daemon/state.rs Outdated
…change keeps the real prior + retry a failed startup demote cleanup (Codex P2/P2)

Two more P2 findings, both follow-ups on the previous re-review fixes.

P2 — a `fallback_dns` change while demoted could pin the wrong default. The
same-service snapshot refresh compared the service's live DNS against the NEW
`fallback`. When `fallback_dns` changed (F1→F2) while the VPN was active, the
service still showed our PREVIOUS fallback F1; since F1 != F2 the refresh captured
F1 as `prior_servers`, so a later restore wrote our old fallback instead of the
original DHCP resolver. `DemoteSnapshot` now records `installed_fallback` (the
fallback we actually wrote, updated only after a successful write); the refresh
compares the live DNS against that, so a previous fallback we installed is no
longer mistaken for a DHCP update. A genuine DHCP change is still adopted.

P2 — a failed startup cleanup was never retried. The startup reconcile of orphaned
demote state only logged on failure; with `applied == None`, later reconciles take
`revert()`'s no-op path, so a transient failure left the machine pinned to the
fallback until a full VPN up→down cycle. A `pending_global_cleanup` flag now
records the failure, and every `revert()` retries the global cleanup until it
succeeds (cleared on success, or once an apply establishes our own state).

New tests: a `fallback_dns` change keeps the original prior (macOS, darwin
cross-check) and a failed startup cleanup is retried on the next reconcile (daemon
core, runs on Linux). Verified: fmt, Linux clippy + daemon tests (175 pass), and
x86_64-apple-darwin clippy -D warnings (--all-targets).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a8a1b957b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/daemon/state.rs
Comment thread splitway-daemon/src/backend/macos/demote.rs
…al prior when the installed-fallback record was lost (Codex P2/P2)

Two reversibility gaps in the demote path, both around the installed_fallback
bookkeeping added in 4a8a1b9:

- Removing (or changing) `fallback_dns` while the VPN stays up did not re-demote
  to the real DHCP resolver. After a demote the detector reads the demoted value
  back as `demote_target`, so `effective_demote_target`'s fallthrough keeps
  returning the override and `desired()` stays equal to the applied target.
  Treat a `fallback_dns` change as re-arm-worthy in `watch_settings_changed`:
  the re-arm reverts first (restoring the real prior DHCP resolver from the
  demote snapshot) then re-samples, so the next demote targets the correct
  resolver. Harmless on Linux (ignores fallback_dns).

- A same-service re-demote could capture our own fallback as `prior_servers`
  when the post-write `installed_fallback` record had been lost (e.g. a
  transient /var/run write error after the `set` succeeded): the live servers
  then differ from the empty recorded fallback and look like a DHCP update. Also
  treat the live servers as ours when they equal the `fallback` about to be
  (re)installed, so a retry is recognised regardless of whether the
  installed_fallback record persisted.

Adds a regression test for each. Verified with `cargo fmt` and
`cargo clippy --target x86_64-apple-darwin -p splitway-daemon --all-targets -- -D warnings`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019HLLqoCWZb1fAqYPsw1hug
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7657bee75c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/daemon/state.rs
Follow-up on the startup orphaned-state retry: `pending_global_cleanup`'s retry
was only wired through `revert()`, which `shutdown()` skipped when `applied` is
`None`. So if the macOS startup cleanup failed transiently and the daemon was
stopped before any later reconcile/Down/disable called `revert()`, shutdown
reported the system clean while a stale demote snapshot / default DNS could
remain. `shutdown()` now calls `revert()` when `applied` is set OR a global
cleanup is pending — `revert()` runs the pending cleanup and returns `Err`
(-> unclean) if it still fails, so shutdown no longer falsely reports clean.

New daemon-core tests (run on Linux): shutdown runs the pending cleanup when
nothing is applied, and reports unclean when it still fails. Verified: fmt, Linux
clippy + daemon tests (177 pass), and x86_64-apple-darwin clippy -D warnings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ef6267754d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/detector/macos/detector.rs Outdated
Comment thread splitway-daemon/src/detector/macos/watch.rs
…the service IPv4/IPv6 keys (Codex P2/P2)

- read_dns_model now propagates a `scutil show` error instead of skipping the
  service. A missing/vanished key is reported as `No such key` on stdout with
  exit 0 (→ Ok, then empty servers), so an Err here is a genuine command failure
  (spawn error or non-zero exit). Skipping it silently dropped a live service
  from the model — e.g. the VPN service read after a successful `list` — letting
  `decide` conclude "down" and revert the rules. Propagating makes the watcher
  keep the last known state on a transient failure.

- The SCDynamicStore watch now also subscribes to `State:/Network/Service/.*/IPv4`
  and `/IPv6`. The detector reads each service's `InterfaceName` from those
  entities (`read_service_interface`) to classify physical vs. an unscoped default
  hijacker; a service gaining/losing its interface binding without a DNS or global
  change must re-trigger detection, else it stays classified from stale data (an
  unknown interface misread as a hijacker → false "VPN up").

Verified with `cargo fmt` and `cargo clippy --target x86_64-apple-darwin -p splitway-daemon --all-targets -- -D warnings`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019HLLqoCWZb1fAqYPsw1hug
@stslex

stslex commented Jun 28, 2026

Copy link
Copy Markdown
Owner Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e66e80f1a4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread splitway-daemon/src/detector/macos/parser.rs Outdated
Comment thread splitway-daemon/src/daemon/state.rs
…+ surface a pending demote-cleanup in status (Codex P2/P2)

Two more P2s from the re-review.

P2 (parser) — when `PrimaryService` is absent, `decide` anchored the physical
service on the FIRST service matching the primary interface. A VPN service can
also report that interface, so if `scutil list` returned it first, the physical
service became the corp resolver and the VPN service became the DHCP resolver —
inverting corp/fallback (apply corp domains to the off-tunnel DNS, demote the
default to the corp DNS, a leak). The no-`PrimaryService` fallback now anchors
only when EXACTLY ONE DNS service reports the primary interface; more than one is
ambiguous -> conservative Down.

P2 (state) — `routing_state()` ignored the new `pending_global_cleanup` flag, so a
restart that left an orphaned macOS demote pending reported `VpnDown`/`Disabled`/
`NoDomains` while the default DNS could still be demoted. The flag now joins
`needs_resync`/`orphaned` in the out-of-sync check, so status reads `ApplyFailed`
until the cleanup succeeds.

Tests: an ambiguous primary-interface fallback is Down while the unambiguous one
(VPN on a tunnel) is still Up (parser, darwin cross-check); status reads
`ApplyFailed` while a global cleanup is pending (daemon core, runs on Linux).
Verified: fmt, Linux clippy + daemon tests (178 pass), and x86_64-apple-darwin
clippy -D warnings.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@stslex stslex left a comment

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent review — macOS DNS demote (Phase 7d-4)

I reviewed this PR from several angles in parallel (detector/parser classification, demote backend transactionality + snapshot lifecycle, daemon state-machine reconcile/shutdown, and config/platform/docs cross-cutting). Overall this is high-quality, well-tested work — the transactional rollback, the on-disk snapshot survival, the reverts_globally() seam, and the no-oscillation detection design are all sound, and the prior review rounds closed a lot of sharp edge cases. CI is green and no real infra data is leaked (synthetic placeholders throughout). Below is one new finding plus independent confirmation of the two threads still open.

New finding (inline)

  • P2 — fallback_dns is never validated and reaches the scutil script unescaped (backend/macos/demote.rs build_set_dns_script, + shared/config/mod.rs). Detector-sourced resolvers are safe, but a config-supplied fallback_dns value is interpolated verbatim, so a typo/whitespace/newline silently malforms the script instead of failing with a clear error. Details and suggested fix in the inline comment.

Confirming the two open threads (both still valid at e66e80f — not re-posting inline to avoid duplicate threads)

  • detector/macos/parser.rs decide (~L199–205) — ambiguous physical-service fallback. When PrimaryService is absent, physical is the first service whose InterfaceName == primary_interface. A hijacking VPN service can also report the primary interface, so if scutil list returns it first, physical becomes the VPN and the later vpn_dns becomes the real DHCP resolver → corp domains get pointed off-tunnel and the default is demoted to the corp DNS (a full inversion). This is the privacy-critical case. Worth noting: the existing test falls_back_to_interface_name_when_primary_service_is_unknown only exercises the physical-first ordering, so the inverted ordering is currently untested. Suggested direction: in the no-PrimaryService path, exclude is_default_resolver_hijacker services before picking the physical anchor (or treat multiple primary-interface DNS services as ambiguous → Down), and add a test with the VPN service listed first.
  • daemon/state.rs routing_state (L865) — pending_global_cleanup not surfaced. The out-of-sync check is self.needs_resync || !self.orphaned.is_empty(); it omits pending_global_cleanup. In the "VPN already down at restart + startup cleanup failed" case (applied == None), status reports a clean VpnDown/Disabled/NoDomains while a stale demote snapshot/default DNS may still be installed and pending retry. One-line fix: add || self.pending_global_cleanup to L865. routing_state_maps_each_branch has no assertion for this state, so a regression test would help.

Looked at and judged fine (not issues)

  • effective_demote_target correctly maps Some(vec[]) to "no override" (L394) — no empty-demote failure loop.
  • The post-write installed_fallback record failure is already defended by the !same_server_set(&current.servers, fallback) guard (L385) — re-checked, no snapshot corruption on that path.
  • atomic_write (tmp + O_EXCL + fsync + rename) makes a torn snapshot write very unlikely; serde defaults keep old configs parsing; Linux stays interface-keyed and sets demote_target = None — no regression.

Nice work overall — addressing the fallback_dns validation and the two open threads would close out the remaining gaps.


Generated by Claude Code

Comment thread splitway-daemon/src/backend/macos/demote.rs
…util demote (P2)

The configured `fallback_dns` override is folded into the demote-target and, on
macOS, appended verbatim into the root `scutil` demote script. A non-IP entry (a
typo, a hostname, whitespace, or a newline) would malform that script rather than
fail cleanly — sitting against the "a failed apply must never leave the system
half-configured" bar.

- Add `splitway_shared::config::is_ip_literal` (strict `IpAddr` parse — rejects
  surrounding/embedded whitespace and newlines, so it doubles as an injection
  guard). Pure and unit-tested.
- `effective_demote_target` now accepts the override only when every entry is an
  IP literal; otherwise it logs and falls back to the detector's demote-target
  (the physical DHCP resolver), so a bad config degrades to auto-detection rather
  than a broken/mangled demote.

Adds tests for the helper and for the override-rejection path. Verified with
`cargo fmt` and `cargo clippy --target x86_64-apple-darwin -p splitway-daemon -p splitway-shared --all-targets -- -D warnings`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_019HLLqoCWZb1fAqYPsw1hug
@stslex stslex merged commit a58c2d4 into dev Jun 29, 2026
23 checks passed
@stslex stslex deleted the phase-7d-4-macos-dns-privacy branch June 29, 2026 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant