diff --git a/ROADMAP.md b/ROADMAP.md index b064de5..200735e 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -244,6 +244,27 @@ reimplemented per frontend: [`docs/design/macos-self-install.md`](docs/design/macos-self-install.md). (Homebrew — installing the same `.app` + binaries, with no competing `service` block — is the next phase.) +- **7d-4 — macOS DNS privacy (demote + scope)**: the macOS backend reaches DNS + parity with Linux. The previous backend only *scoped* the corp domains via + `/etc/resolver`; that is insufficient against a VPN client that hijacks the + system **default** resolver (the corp resolver is the global default, scoped to + no `utun`, so non-corp DNS would also traverse the tunnel). This phase adds the + **demote**: snapshot the primary network service's DNS, overwrite it with an + off-tunnel fallback (the physical interface's own DHCP resolver, or a + configured `fallback_dns` override), and restore it on every revert path — + transactional and reversible (an on-disk snapshot survives an unclean exit). + Detection is rewritten to be **structural and vendor-neutral**: it reads the + per-service DNS model (a service whose resolver differs from the physical + link's is the VPN) rather than filtering `scutil --dns` by a `utun` — and reads + the VPN signal from the VPN's *own* service, not the global default Splitway + mutates, so the demote does not cause detection to oscillate. The state machine + is decoupled from `vpn_name` on macOS (gated via `reverts_globally()`); Linux + stays interface-keyed and unchanged. DNS only — no IP-route manipulation (the + client already splits IP; same boundary as Linux). The GUI interface-picker + becomes a benign no-op on macOS (removal is a later GUI phase). Implementation + + synthetic-fixture tests land here; the live packet-level / reconnect / revert + acceptance is machine-bound and verified separately. See + [`docs/design/macos-dns-privacy.md`](docs/design/macos-dns-privacy.md). ### Phase 8 — feature freeze + hardening diff --git a/docs/design/macos-dns-privacy.md b/docs/design/macos-dns-privacy.md new file mode 100644 index 0000000..19f9667 --- /dev/null +++ b/docs/design/macos-dns-privacy.md @@ -0,0 +1,224 @@ +# macOS DNS privacy — demote the hijacked default + scope the corp domains + +The macOS backend used to do only half of split-DNS: it wrote +`/etc/resolver/` files so the configured corp domains resolve via the +corp DNS. That is the **scope** half. It is sufficient only when the VPN client +*scopes* its DNS to the tunnel. The observed corporate VPN does the opposite — a +**global DNS hijack**: it registers the corp resolver as the system **default**, +so every query that is *not* a configured corp domain would also go to the corp +resolver, over the tunnel. That is the privacy leak Splitway exists to prevent. + +This phase adds the missing half — **demote** — so macOS reaches parity with the +Linux build: the corp resolver sees only the configured corp domains; everything +else leaves via the normal (physical) path and never traverses the tunnel. + +> **Detection is structural and vendor-neutral.** Nothing in the code, tests, or +> this document names a VPN product or its services. Detection keys on the +> *shape* of the DNS configuration (a service whose resolver differs from the +> physical link's), never on a vendor string — so it generalises across clients. + +## What the established facts were (proven on the author's Mac) + +- **The corp resolver is the system default, not tunnel-scoped.** There is no + resolver scoped to any `utun`; the active tunnel `utun` index even varies + between sessions. So the previous interface-keyed detector (filter + `scutil --dns` by a chosen `utun`) found nothing, for any interface choice. +- **Demote holds.** Overwriting the primary network service's `ServerAddresses` + to a different resolver stuck (the client did not re-assert in a tight loop). +- **Demote is real, not cosmetic.** With the default demoted, no DNS for public + names traversed the tunnel — the client does not transparently intercept `:53`. +- **`/etc/resolver/` is immune.** The client never touches those files; a + scoped resolver there takes precedence over the default for its domain. +- **IP is already split by the client** — non-corp IPs route off-tunnel — so + Splitway does **no** IP-route manipulation on macOS (unlike Linux's + default-route demote). Splitway governs DNS only; see the boundary below. + +(Synthetic stand-ins used throughout: corp DNS `192.0.2.53`, physical DHCP DNS +`198.51.100.1`, public-resolver override `203.0.113.9`, corp domains +`corp.example.com` / `jira.example.com`, interfaces `en0` / `utun0`.) + +## The mechanism: demote + scope, transactional and reversible + +On apply (VPN up, with corp domains configured), the macOS backend does both: + +1. **Scope** — write `/etc/resolver/` → corp DNS for each corp domain + (unchanged from before; on-tunnel, intended). Transactional: a mid-write + failure restores every file to its prior bytes. +2. **Demote** — overwrite the primary network service's DNS + (`State:/Network/Service//DNS` `ServerAddresses`) with an off-tunnel + **fallback** resolver, so non-corp DNS resolves off-tunnel. The prior value is + **snapshotted to disk first** so it can be restored exactly. + +Net effect: corp domains → corp DNS (on-tunnel); everything else → fallback DNS +(off-tunnel, invisible to the corp resolver). + +The two steps are transactional **across both**: if the demote fails, the +resolver scope just written is rolled back, so the system is never left +half-changed (scoped but with the default still hijacked). The apply then +surfaces the error rather than recording success. + +On revert (VPN down / disable / stop / shutdown) the backend removes every +managed `/etc/resolver` file **and** restores the demoted default from the +snapshot (then clears it). Restore runs on every exit path — the daemon already +reverts on `SIGTERM` (what `launchctl bootout` sends). + +### The fallback resolver + +The off-tunnel fallback defaults to the **physical primary interface's own DHCP +resolver** (the resolver that interface would use absent any override), which the +detector discovers. A config override — `fallback_dns` in the daemon config — +pins a specific public resolver instead (e.g. `["203.0.113.9"]`). The state +machine folds the override (if set) over the detector's value before handing the +effective fallback to the backend. The override is a root-config-file-only field +(the GUI does not edit it — out of scope this phase). + +## Structural, vendor-neutral detection + +Detection reads the SystemConfiguration dynamic store (via `scutil` in script +mode) and decides **structurally**: + +- the **primary interface** — `State:/Network/Global/IPv4` `PrimaryInterface` + (e.g. `en0`); +- every network service's DNS entry — `State:/Network/Service/.*/DNS` + (`InterfaceName` + `ServerAddresses`). + +The **physical service** is anchored authoritatively by the **primary service +id** (`State:/Network/Global/IPv4` `PrimaryService`), falling back to the primary +interface name only when the id is unknown — because a VPN service can *also* +report the primary interface name, so matching on the interface alone could pick +the wrong service and invert corp/fallback. Its resolver is the demote-target. A +**VPN service** is any service *other than the physical one* (compared by id) +whose DNS differs from the physical resolver — i.e. a non-physical resolver is in +play. VPN is **up** iff such a service exists; its resolver is the corp DNS. The +decision is a set comparison (order-insensitive), and it never references a +`utun` name or a vendor string. + +When Splitway demotes the physical service, it **re-adds that service's +`InterfaceName`** (the demote write would otherwise drop it, since it rebuilds a +minimal DNS dict). Keeping `InterfaceName` — and anchoring on the service id — +means the physical service stays identifiable on the next detection round, so our +own demote never makes detection lose the physical service (which would invert +corp/fallback or undo the demote). + +### Why detection reads per-service DNS, not `State:/Network/Global/DNS` + +This is load-bearing for stability. Splitway's *demote* overwrites the physical +primary service's DNS, which changes the **global default**. A detector keyed on +`State:/Network/Global/DNS` would therefore see the global default become the +fallback the moment our demote took effect, conclude "no VPN" (global == the +physical resolver), revert → the VPN's default returns → re-demote → **oscillation**. + +Reading the VPN's corp DNS from its *own* service entry — which our demote does +**not** touch — keeps detection stable while the demote is in effect: the verdict +is unchanged before and after our own write. A unit test +(`detection_survives_our_own_demote_of_the_physical_service`) pins this. + +## Decoupling the state machine from `vpn_name` (macOS only) + +The state machine used to gate apply/revert on +`info.interface_name == config.vpn_name`. On macOS there is no stable, DNS-scoped +VPN interface to pin (the active `utun` varies), so that gate would never pass. +The gate is now branched on the backend's existing `reverts_globally()` seam: + +- **global-revert backend (macOS)** → the interface gate is skipped; apply is + driven by the DNS-model detection the detector already decided. The advisory + `interface_name` rides along in `VpnInfo` but nothing keys on it. +- **per-interface backend (Linux)** → unchanged: the gate still requires + `interface_name == vpn_name`. + +The same branch covers the two read-only projections that used the gate +(`status().detected_dns` and `routing_state()`), so the macOS status readout is +honest. Linux behaviour is byte-for-byte unchanged (its `MockBackend` defaults +`reverts_globally` to false; all existing Linux state tests pass untouched). + +## Reconcile on event + +Reconnect / Wi-Fi toggle / sleep-wake / re-auth can re-install the corp default +or change the physical DHCP resolver. The SCDynamicStore watch fires on the +relevant DNS keys; the detector re-reads the model and re-emits `Up` whenever the +corp DNS **or** the demote-target changed (the watcher dedups only genuine +no-ops). The applied snapshot now also tracks the demote-target, so a change to +it forces a re-apply (re-demote to the new fallback) rather than being treated as +already converged. This is purely event-driven — no busy loop, no timer; one +re-apply per genuine change. + +Because the observed client did **not** re-assert in a tight loop, no +sub-second re-apply guard is needed; a re-assert that arrives as a DNS-key change +is handled by the normal event path. The demote is idempotent (re-setting the +same fallback), and the snapshot is captured only on the *first* demote of a +given service, so a re-apply never overwrites the original prior state with our +own fallback. + +**Primary-service change.** If the primary network service itself changes while +the VPN stays up (e.g. Wi-Fi → Ethernet) and a demote snapshot already exists for +the *old* service, the demote first **restores the old service** from its +snapshot (so it is not left pinned to Splitway's fallback), then snapshots and +demotes the new primary. Exactly one service is ever demoted at a time, and no +previous primary is stranded on the fallback. + +## Reversibility (the operational-safety contract) + +- The pre-demote primary-service DNS is **snapshotted to disk** before the + overwrite (`/var/run/splitway/dns-demote.snapshot`), so an unclean exit — the + daemon `SIGKILL`ed between demote and a later revert — can still be undone on + the next start. The snapshot uses `atomic_write` (intact on a crash mid-write). +- Restore rewrites exactly the snapshotted servers, or removes the key entirely + when the service had no explicit prior DNS (so SystemConfiguration repopulates + it from the real source), then clears the snapshot. +- Every system-network mutation is captured-before-write and rolled back on any + failure path, mirroring the `/etc/resolver` apply's existing discipline — the + machine is never left with a broken or half-demoted resolver. + +## Testability + +All `scutil` contact goes through two seams so the logic is unit-tested without +touching the live system: + +- detection parses the dynamic-store dumps with **pure** functions over the + synthetic-fixture dump shapes; the structural decision is a separate pure step; +- the demote/restore go through an injected `ScutilRunner` (the real impl shells + out; tests inject a fake that **captures the exact script issued** and returns + canned state) and a `SnapshotStore` (real = on-disk; tests = in-memory). The + `apply_with` / `revert_with` wiring (including the rollback-on-demote-failure) + is tested with both seams faked. + +## Scope boundary — DNS only, not IP routing + +Splitway governs **DNS**, not IP routing. The observed client's split-tunnel +already keeps non-corp **IP** traffic off the tunnel, so macOS does no route +manipulation. A full-tunnel / include-routes VPN that carried IP traffic through +the corp tunnel is **out of scope** (the same boundary as Linux): Splitway would +still split the DNS, but the IP path is the VPN client's concern. This is +deliberate and documented so the boundary is not mistaken for a gap. + +## What is not built here + +- **GUI changes** (interface-picker removal, a corp-domains / fallback-DNS UI) — + a later phase. The macOS daemon no longer depends on `vpn_name` for + correctness; the picker remains a benign no-op on macOS (the daemon + auto-detects and ignores a picked `vpn_name`). The state machine is strictly + daemon-side here. +- **Homebrew packaging** — a later phase; nothing ships until the live + acceptance below passes. +- **The live verification itself** — see below. This phase is implementation + + synthetic-fixture tests only. + +## Deferred — live acceptance (run on the real VPN, not in this phase) + +1. Packet-level, both interfaces, fresh random subdomains: public queries on the + physical interface, **zero on the tunnel**; a corp host still resolves (via + `/etc/resolver`). +2. Event robustness: reconnect / Wi-Fi toggle / sleep-wake → the demote re-holds. +3. Clean revert: stop / VPN-down restores the machine exactly as found. +4. `status` / `verify` / `check ` / `check ` report the + true state. + +## Links + +- [socket-group.md](socket-group.md) — the unprivileged-GUI access model. +- [macos-self-install.md](macos-self-install.md) — how the macOS daemon is + installed/run (the privileged bootstrap this DNS work runs under). +- [linux-default-route-catch-all.md](linux-default-route-catch-all.md) — the + Linux analogue (demote the link's DNS default-route so split-DNS holds). +- [architecture.md](../architecture.md) — the truth contract and the DNS-only + boundary. diff --git a/splitway-daemon/src/backend/linux/backend.rs b/splitway-daemon/src/backend/linux/backend.rs index aa580b2..ca93170 100644 --- a/splitway-daemon/src/backend/linux/backend.rs +++ b/splitway-daemon/src/backend/linux/backend.rs @@ -307,6 +307,7 @@ mod tests { let info = VpnInfo { interface_name: "tun0".to_string(), dns_servers: Vec::new(), + demote_target: None, }; let result = LinuxBackend.apply_rules(&info, &["corp.example.com".to_string()]); assert!( diff --git a/splitway-daemon/src/backend/macos/backend.rs b/splitway-daemon/src/backend/macos/backend.rs index 12ec67e..f26f5f7 100644 --- a/splitway-daemon/src/backend/macos/backend.rs +++ b/splitway-daemon/src/backend/macos/backend.rs @@ -13,6 +13,7 @@ use splitway_shared::config::atomic_write; use splitway_shared::ipc::{LinkDnsState, ResolutionInfo}; use splitway_shared::platform::{DnsBackend, PlatformError, VpnInfo}; +use super::demote; use super::resolver::{is_managed, is_valid_domain, resolver_contents, resolver_path}; use super::MacosBackend; @@ -26,21 +27,36 @@ impl DnsBackend for MacosBackend { "no DNS servers in VpnInfo".to_string(), )); } - apply_to_dir(Path::new(RESOLVER_DIR), &vpn_info.dns_servers, domains)?; + let fallback = vpn_info.demote_target.as_deref().filter(|f| !f.is_empty()); + apply_with( + Path::new(RESOLVER_DIR), + &vpn_info.dns_servers, + domains, + fallback, + &demote::RealScutil, + &demote::FileSnapshotStore::new(), + )?; flush_dns_cache(); Ok(()) } fn revert_rules(&self, _interface: &str) -> Result<(), PlatformError> { - // There is no per-interface resolver state on macOS — resolver files - // are keyed by domain. Reverting removes every file we own. - let removed = remove_managed(Path::new(RESOLVER_DIR), None).map_err(|e| { - PlatformError::CommandFailed(format!("failed to remove resolver files: {e}")) - })?; - if removed > 0 { + let outcome = revert_with( + Path::new(RESOLVER_DIR), + &demote::RealScutil, + &demote::FileSnapshotStore::new(), + )?; + // Flush when EITHER resolver files were removed OR a demoted default was + // restored — restoring the default changes resolution even when no files + // were removed (e.g. a retried revert after the files were already gone), + // so the cache must not keep serving the demoted/default answers. + if outcome.removed > 0 || outcome.restored_default { flush_dns_cache(); } - log::info!("reverted {removed} splitway resolver file(s)"); + log::info!( + "reverted {} splitway resolver file(s) and restored any demoted default", + outcome.removed + ); Ok(()) } @@ -98,8 +114,79 @@ impl DnsBackend for MacosBackend { } } +/// The apply pipeline, parameterized over the resolver dir and the demote seam +/// so the scope+demote wiring is unit-testable without touching the live system. +/// Two steps, transactional across both: +/// +/// 1. **Scope** — `apply_to_dir` routes `domains` to `servers` via +/// `/etc/resolver` (itself transactional; rolls back on any failure). +/// 2. **Demote** — when `fallback` is `Some` (the VPN hijacked the system +/// default), [`demote::demote`] sends non-corp DNS off-tunnel. If it fails, +/// step 1's writes are rolled back **to the exact pre-apply scope** so the +/// system is never left half-changed (scoped but with the default still +/// hijacked), and the demote error is surfaced. Rolling back this apply's +/// journal — not a blanket managed-file removal — means a re-apply whose +/// demote fails restores the previously-working split DNS instead of wiping +/// it. The demote's own snapshot still lets a later revert restore the default. +fn apply_with( + dir: &Path, + servers: &[String], + domains: &[String], + fallback: Option<&[String]>, + scutil: &dyn demote::ScutilRunner, + snapshots: &dyn demote::SnapshotStore, +) -> Result<(), PlatformError> { + let journal = apply_to_dir(dir, servers, domains)?; + + if let Some(fallback) = fallback { + if let Err(e) = demote::demote(scutil, snapshots, fallback) { + // Undo exactly this apply's scope changes — overwritten files back to + // their prior bytes, newly-created and pruned files undone — so a + // transient demote failure restores the prior scope rather than + // dropping every managed resolver file. + rollback(&journal); + return Err(e); + } + } + Ok(()) +} + +/// What a revert changed: how many managed resolver files were removed, and +/// whether a demoted system default was restored. Either being non-trivial means +/// resolution changed and the DNS cache must be flushed. +struct RevertOutcome { + removed: usize, + restored_default: bool, +} + +/// The revert pipeline, parameterized over the resolver dir and the demote seam. +/// Removes every managed `/etc/resolver` file AND restores any demoted system +/// default from the on-disk snapshot. Reports both (see [`RevertOutcome`]) so the +/// caller can flush the DNS cache when either changed. +/// +/// The restore runs even when no resolver files were removed: a prior run may +/// have demoted without (or after pruning) resolver files, and the snapshot is +/// the record of record. A restore failure is surfaced (not swallowed) so the +/// caller retries rather than recording a clean revert over a still-demoted +/// default. +fn revert_with( + dir: &Path, + scutil: &dyn demote::ScutilRunner, + snapshots: &dyn demote::SnapshotStore, +) -> Result { + let removed = remove_managed(dir, None, None).map_err(|e| { + PlatformError::CommandFailed(format!("failed to remove resolver files: {e}")) + })?; + let restored_default = demote::restore(scutil, snapshots)?; + Ok(RevertOutcome { + removed, + restored_default, + }) +} + /// Prior on-disk state of a resolver file before this apply touched it, used to /// undo a failed apply. +#[derive(Debug)] enum Prior { /// The file existed; these are its bytes, restored on rollback. Existed(Vec), @@ -118,7 +205,15 @@ enum Prior { /// never a partial or empty set. Then prunes our now-unwanted files; a prune /// failure is returned (not swallowed) so the caller retries rather than /// recording success while a dropped domain's file keeps routing. -fn apply_to_dir(dir: &Path, servers: &[String], domains: &[String]) -> Result<(), PlatformError> { +/// +/// On success returns the **journal** of every change made (files overwritten, +/// created, and pruned), so a caller (the apply pipeline) can undo exactly this +/// reconcile — restoring the prior scope — if a *later* step (the demote) fails. +fn apply_to_dir( + dir: &Path, + servers: &[String], + domains: &[String], +) -> Result, PlatformError> { // Reject names that are not safe single path components before touching the // filesystem: `dir.join("../x")` would escape /etc/resolver, and a // slash-containing name would never match the single-component filenames @@ -201,20 +296,17 @@ fn apply_to_dir(dir: &Path, servers: &[String], domains: &[String]) -> Result<() let keep: BTreeSet<&str> = domains.iter().map(String::as_str).collect(); // Surface prune failures: a leftover file for a dropped domain keeps routing - // it through the VPN. Roll back this call's writes first — on a failed - // reconcile the state machine does not record `applied`, so newly-created - // files left behind would be skipped by a later revert. This undoes only - // this call's writes; a partially-completed prune is not un-deleted, but - // prune only ever removes files that are being dropped or are already stale, - // so those deletions are safe to leave. The result stays consistent with the - // retained `applied` state. - if let Err(e) = remove_managed(dir, Some(&keep)) { + // it through the VPN. The prune records each removed file into `written` (as + // `Prior::Existed`), so on a prune failure — or a later demote failure — the + // rollback restores the pruned files too, leaving the prior scope exactly as + // it was rather than a partial set. + if let Err(e) = remove_managed(dir, Some(&keep), Some(&mut written)) { rollback(&written); return Err(PlatformError::CommandFailed(format!( "failed to prune stale resolver files: {e}" ))); } - Ok(()) + Ok(written) } /// Undo the writes recorded in `written`, most recent first: restore each @@ -238,7 +330,15 @@ fn rollback(written: &[(PathBuf, Prior)]) { /// filename is in `keep` (when `Some`). A missing `dir` is treated as "nothing /// to remove". Returns how many files were removed. Files we do not own (no /// marker) and unreadable files are left untouched. -fn remove_managed(dir: &Path, keep: Option<&BTreeSet<&str>>) -> io::Result { +/// +/// When `journal` is `Some`, each removed file is recorded as +/// `(path, Prior::Existed(bytes))` so a later [`rollback`] can re-create it — the +/// apply pipeline uses this so a prune is undoable along with the writes. +fn remove_managed( + dir: &Path, + keep: Option<&BTreeSet<&str>>, + mut journal: Option<&mut Vec<(PathBuf, Prior)>>, +) -> io::Result { let entries = match fs::read_dir(dir) { Ok(entries) => entries, Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(0), @@ -262,11 +362,16 @@ fn remove_managed(dir: &Path, keep: Option<&BTreeSet<&str>>) -> io::Result { + // Only touch files we wrote; never a user-authored resolver file. Read + // the bytes (not a lossy string) so a journaled entry restores them + // exactly on rollback. + match fs::read(&path) { + Ok(bytes) if is_managed(&String::from_utf8_lossy(&bytes)) => { fs::remove_file(&path)?; removed += 1; + if let Some(journal) = journal.as_deref_mut() { + journal.push((path, Prior::Existed(bytes))); + } } _ => {} } @@ -373,6 +478,226 @@ mod tests { vec!["10.0.0.1".to_string(), "10.0.0.2".to_string()] } + // --- scope + demote wiring (apply_with / revert_with) -------------------- + // + // These exercise the two-step pipeline with the demote seam faked, so they + // assert BOTH the /etc/resolver files written and the exact scutil script + // issued — without touching the live system. + + use super::demote::test_support::{FakeScutil, MemSnapshots}; + + #[test] + fn apply_with_scopes_and_demotes_to_the_fallback() { + let dir = temp_dir("apply-with-demote"); + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + let corp = vec!["192.0.2.53".to_string()]; + let fallback = vec!["198.51.100.1".to_string()]; + + apply_with( + &dir, + &corp, + &[ + "corp.example.com".to_string(), + "jira.example.com".to_string(), + ], + Some(&fallback), + &scutil, + &snaps, + ) + .unwrap(); + + // Scope: a managed resolver file per corp domain, pointing at corp DNS. + for domain in ["corp.example.com", "jira.example.com"] { + let body = fs::read_to_string(dir.join(domain)).unwrap(); + assert!(is_managed(&body)); + assert!(body.contains("nameserver 192.0.2.53")); + } + // Demote: exactly the set-fallback script on the primary service, and + // the prior default snapshotted for restore. + let scripts = scutil.scripts.borrow(); + assert_eq!(scripts.len(), 1); + assert!(scripts[0].contains("d.add ServerAddresses * 198.51.100.1\n")); + assert!(scripts[0].contains("set State:/Network/Service/ABC/DNS\n")); + drop(scripts); + assert_eq!( + snaps.slot.borrow().as_ref().unwrap().prior_servers, + vec!["198.51.100.1".to_string()] + ); + } + + #[test] + fn apply_with_no_fallback_scopes_only_no_demote() { + let dir = temp_dir("apply-scope-only"); + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + apply_with( + &dir, + &["192.0.2.53".to_string()], + &["corp.example.com".to_string()], + None, + &scutil, + &snaps, + ) + .unwrap(); + assert!(dir.join("corp.example.com").exists()); + // No demote issued, nothing snapshotted. + assert!(scutil.scripts.borrow().is_empty()); + assert!(snaps.slot.borrow().is_none()); + } + + #[test] + fn apply_with_rolls_back_the_scope_when_the_demote_fails() { + // Transactional across both steps: a demote failure must undo the + // resolver scope so the system is never left half-changed. + let dir = temp_dir("apply-rollback"); + let scutil = FakeScutil::up(); + scutil.fail_on_set(); + let snaps = MemSnapshots::default(); + + let result = apply_with( + &dir, + &["192.0.2.53".to_string()], + &["corp.example.com".to_string()], + Some(&["198.51.100.1".to_string()]), + &scutil, + &snaps, + ); + assert!(result.is_err(), "a demote failure fails the whole apply"); + // The scope was rolled back — no managed resolver file remains. + assert!( + !dir.join("corp.example.com").exists(), + "the resolver scope must be rolled back on demote failure" + ); + } + + #[test] + fn demote_failure_on_a_reapply_restores_the_previous_scope() { + // A re-apply over a previously-working scope whose demote then fails must + // restore the PRIOR files (overwritten ones back to their old content, + // pruned ones re-created), not wipe every managed resolver file. + let dir = temp_dir("apply-reapply-rollback"); + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + // First apply (no demote): two managed files become the working scope. + apply_with( + &dir, + &["192.0.2.53".to_string()], + &[ + "corp.example.com".to_string(), + "old.example.com".to_string(), + ], + None, + &scutil, + &snaps, + ) + .unwrap(); + let corp_before = fs::read_to_string(dir.join("corp.example.com")).unwrap(); + assert!(dir.join("old.example.com").exists()); + + // Re-apply with new servers, dropping old.example.com — but the demote fails. + scutil.fail_on_set(); + let result = apply_with( + &dir, + &["192.0.2.99".to_string()], + &["corp.example.com".to_string()], + Some(&["198.51.100.1".to_string()]), + &scutil, + &snaps, + ); + assert!(result.is_err(), "the demote failure fails the apply"); + // The previous scope is restored exactly: corp.example.com back to its + // prior content, and the pruned old.example.com re-created. + assert_eq!( + fs::read_to_string(dir.join("corp.example.com")).unwrap(), + corp_before, + "the overwritten file is restored to its pre-reapply content" + ); + assert!( + dir.join("old.example.com").exists(), + "the pruned file is restored on rollback — not left dropped" + ); + fs::remove_dir_all(&dir).unwrap(); + } + + #[test] + fn revert_with_removes_files_and_restores_the_default() { + let dir = temp_dir("revert-with"); + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + // Apply (scope + demote), then revert. + apply_with( + &dir, + &["192.0.2.53".to_string()], + &["corp.example.com".to_string()], + Some(&["198.51.100.1".to_string()]), + &scutil, + &snaps, + ) + .unwrap(); + scutil.scripts.borrow_mut().clear(); + + let outcome = revert_with(&dir, &scutil, &snaps).unwrap(); + assert_eq!( + outcome.removed, 1, + "the one managed resolver file is removed" + ); + assert!(outcome.restored_default, "the demoted default was restored"); + assert!(!dir.join("corp.example.com").exists()); + // The default was restored to the snapshotted prior servers, and the + // snapshot cleared. + let scripts = scutil.scripts.borrow(); + assert_eq!(scripts.len(), 1); + assert!(scripts[0].contains("d.add ServerAddresses * 198.51.100.1\n")); + drop(scripts); + assert!(snaps.slot.borrow().is_none()); + } + + #[test] + fn revert_with_nothing_applied_is_a_clean_noop() { + let dir = temp_dir("revert-noop"); + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + let outcome = revert_with(&dir, &scutil, &snaps).unwrap(); + assert_eq!(outcome.removed, 0); + assert!(!outcome.restored_default); + assert!(scutil.scripts.borrow().is_empty()); + } + + #[test] + fn revert_with_reports_a_restore_even_when_no_files_were_removed() { + // P2: the resolver files were already gone (removed == 0) but a demote + // snapshot still exists (e.g. a prior revert removed the files, then the + // scutil restore failed; this retry restores the default). revert_with must + // report the restore so the caller still flushes the DNS cache. + let dir = temp_dir("revert-restore-only"); + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + apply_with( + &dir, + &["192.0.2.53".to_string()], + &["corp.example.com".to_string()], + Some(&["198.51.100.1".to_string()]), + &scutil, + &snaps, + ) + .unwrap(); + // Simulate the resolver file already being gone from a prior partial revert. + fs::remove_file(dir.join("corp.example.com")).unwrap(); + scutil.scripts.borrow_mut().clear(); + + let outcome = revert_with(&dir, &scutil, &snaps).unwrap(); + assert_eq!(outcome.removed, 0, "no resolver files left to remove"); + assert!( + outcome.restored_default, + "the demoted default was still restored → caller must flush" + ); + assert!( + snaps.slot.borrow().is_none(), + "snapshot cleared after restore" + ); + } + #[test] fn apply_writes_one_marked_file_per_domain() { let dir = temp_dir("apply-writes"); @@ -425,7 +750,7 @@ mod tests { let user_file = dir.join("user.example"); fs::write(&user_file, "nameserver 9.9.9.9\n").unwrap(); - let removed = remove_managed(&dir, None).unwrap(); + let removed = remove_managed(&dir, None, None).unwrap(); assert_eq!(removed, 1); assert!(!dir.join("a.com").exists(), "our file is removed"); assert!(user_file.exists(), "the user's file is untouched"); @@ -437,7 +762,7 @@ mod tests { let mut missing = std::env::temp_dir(); missing.push(format!("splitway-absent-resolver-{}", std::process::id())); let _ = fs::remove_dir_all(&missing); - assert_eq!(remove_managed(&missing, None).unwrap(), 0); + assert_eq!(remove_managed(&missing, None, None).unwrap(), 0); } #[test] diff --git a/splitway-daemon/src/backend/macos/demote.rs b/splitway-daemon/src/backend/macos/demote.rs new file mode 100644 index 0000000..835acaa --- /dev/null +++ b/splitway-daemon/src/backend/macos/demote.rs @@ -0,0 +1,1012 @@ +//! Demoting a hijacked system **default** resolver so non-corp DNS resolves +//! off-tunnel — the second half of macOS DNS privacy (the first half being the +//! per-domain `/etc/resolver` scope in [`super::backend`]). +//! +//! # Why a demote is needed +//! +//! When the VPN client registers the corp resolver as the system *default* +//! (rather than scoping it to the tunnel), every query that is **not** for a +//! corp domain would otherwise go to the corp resolver, over the tunnel — +//! exactly the privacy leak Splitway exists to prevent. Demoting overwrites the +//! primary network service's `ServerAddresses` (the source of the global +//! default) with an off-tunnel **fallback** resolver, so only the corp domains +//! (pinned by the `/etc/resolver` files) reach the corp resolver. +//! +//! # Reversibility (the load-bearing property) +//! +//! The demote must never leave the machine with a broken or half-changed +//! resolver. Before overwriting, the prior `ServerAddresses` is **snapshotted to +//! disk** ([`SnapshotStore`]); restore rewrites exactly that. The snapshot lives +//! on disk (not just in memory) so an unclean exit — the daemon SIGKILLed +//! between demote and a later revert — can still be undone on the next start. +//! +//! # Testability +//! +//! All `scutil` contact goes through the [`ScutilRunner`] seam: the real impl +//! shells out, tests inject a fake that *captures the exact script issued* and +//! returns canned state, so the demote/restore logic is unit-tested without +//! touching the live system. The script text itself is built by pure functions +//! ([`build_set_dns_script`] / `build_*`) that are tested directly. + +use std::fs; +use std::io::Write; +use std::path::PathBuf; +use std::process::{Command, Stdio}; + +use splitway_shared::platform::PlatformError; + +/// The dynamic-store key holding the primary network service id we demote. +const GLOBAL_IPV4_KEY: &str = "State:/Network/Global/IPv4"; + +/// The snapshot file: the pre-demote primary-service DNS, captured so revert can +/// restore it byte-for-byte even across an unclean daemon exit. Lives under the +/// daemon's runtime dir (root-owned, like the socket). +pub(super) const SNAPSHOT_PATH: &str = "/var/run/splitway/dns-demote.snapshot"; + +/// A captured pre-demote state: which service was demoted, the `InterfaceName` +/// it was bound to, and the `ServerAddresses` it had before. +/// +/// Serialised to [`SNAPSHOT_PATH`] as a tiny self-describing line format (no +/// serde dependency pulled in for one struct): each line is `\t`, +/// with tags `key` (the service DNS key, exactly once), `iface` (the +/// `InterfaceName`, at most once — omitted when the service had none), `server` +/// (one per prior resolver, in order), and `fallback` (one per installed-fallback +/// resolver, in order). The tags make the optional `iface` line unambiguous even +/// when there are no servers. +#[derive(Debug, Clone, PartialEq, Eq)] +pub(super) struct DemoteSnapshot { + /// The full `State:/Network/Service//DNS` key that was overwritten. + pub service_dns_key: String, + /// The `InterfaceName` the service was bound to before the demote, if any. + /// Re-applied on demote so the demoted service stays identifiable as the + /// physical service by the detector (a `d.init` write would otherwise drop + /// it — see [`build_set_dns_script`] and the detector's `decide`). + pub interface_name: Option, + /// The `ServerAddresses` the service had before the demote (possibly empty + /// if the service carried no explicit servers — restore then clears ours). + pub prior_servers: Vec, + /// The fallback resolver Splitway last installed on this service. Lets a + /// same-service re-demote tell our own fallback (which we must NOT capture as + /// the "prior") from a real DHCP update — even after a `fallback_dns` config + /// change, where the service may still show a *previous* fallback we wrote. + /// Updated only after a fallback write succeeds. + pub installed_fallback: Vec, +} + +impl DemoteSnapshot { + /// Serialise to the on-disk `\t` line format. + fn serialize(&self) -> String { + let mut out = String::new(); + out.push_str("key\t"); + out.push_str(&self.service_dns_key); + out.push('\n'); + if let Some(iface) = &self.interface_name { + out.push_str("iface\t"); + out.push_str(iface); + out.push('\n'); + } + for s in &self.prior_servers { + out.push_str("server\t"); + out.push_str(s); + out.push('\n'); + } + for f in &self.installed_fallback { + out.push_str("fallback\t"); + out.push_str(f); + out.push('\n'); + } + out + } + + /// Parse the on-disk `\t` line format. Requires a `key` line; + /// `iface` is optional; `server` and `fallback` lines are collected in order. + /// Returns `None` for an empty/garbled snapshot (no `key`). + fn deserialize(text: &str) -> Option { + let mut service_dns_key: Option = None; + let mut interface_name: Option = None; + let mut prior_servers: Vec = Vec::new(); + let mut installed_fallback: Vec = Vec::new(); + for line in text.lines() { + let line = line.trim_end_matches(['\r', '\n']); + let Some((tag, value)) = line.split_once('\t') else { + continue; // skip a malformed line rather than failing the whole load + }; + match tag { + "key" => service_dns_key = Some(value.to_string()), + "iface" if !value.is_empty() => interface_name = Some(value.to_string()), + "server" if !value.is_empty() => prior_servers.push(value.to_string()), + "fallback" if !value.is_empty() => installed_fallback.push(value.to_string()), + _ => {} + } + } + Some(DemoteSnapshot { + service_dns_key: service_dns_key?, + interface_name, + prior_servers, + installed_fallback, + }) + } +} + +/// The current DNS dict of one service, as read from +/// `State:/Network/Service//DNS` — the bits the demote must snapshot and +/// preserve. +#[derive(Debug, Clone, Default, PartialEq, Eq)] +pub(super) struct ServiceDnsState { + /// The `InterfaceName` the service is bound to, if present. + pub interface_name: Option, + /// The service's `ServerAddresses`. + pub servers: Vec, +} + +/// The seam over `scutil`: read the bits the demote needs, and run a script. +/// The real impl ([`RealScutil`]) shells out; tests inject a fake. +pub(super) trait ScutilRunner { + /// The primary network service id from `State:/Network/Global/IPv4`'s + /// `PrimaryService`, or `None` if there is no primary service (offline). + fn primary_service(&self) -> Result, PlatformError>; + + /// The current DNS dict (servers + `InterfaceName`) of + /// `State:/Network/Service//DNS`. + fn service_dns_state(&self, service_dns_key: &str) -> Result; + + /// Run a `scutil` script (the text piped to `scutil`'s stdin). + fn run_script(&self, script: &str) -> Result<(), PlatformError>; +} + +/// Build the `scutil` script that overrides `key`'s `ServerAddresses`, leaving +/// every other field of the service's DNS dictionary intact. +/// +/// `ServerAddresses` is the only field Splitway manages; the rest of the dict +/// (`SearchDomains`, `DomainName`, `SupplementalMatchDomains`, `SearchOrder`, +/// `InterfaceName`, …) belongs to DHCP / the VPN. So the script `get`s the +/// service's CURRENT dictionary into the working buffer first, then overrides +/// only `ServerAddresses`, then writes the merged dict back. A bare `d.init` +/// would instead start from an empty dictionary and the `set` would *replace* the +/// whole dict, dropping those unmanaged fields until the next network +/// reconfiguration repopulated them (breaking local search-domain / supplemental +/// DNS behaviour while Splitway is active). On a missing key `get` leaves the +/// (empty) initialised dict, so a fresh service is still created. +/// +/// - `servers` non-empty → override `ServerAddresses` (demote to the fallback, or +/// restore the prior resolvers). +/// - `servers` empty → `d.remove ServerAddresses`, dropping only *our* override +/// while keeping the rest of the dict (restore of a service that had no explicit +/// prior `ServerAddresses`). +/// +/// `InterfaceName` is preserved automatically by `get`; it is additionally +/// re-added when known as a belt-and-suspenders guarantee that the demoted +/// *physical* service stays identifiable to the detector (which anchors it by +/// `InterfaceName == PrimaryInterface`; see the detector's `decide`), even on the +/// missing-key path where `get` loaded nothing. +pub(super) fn build_set_dns_script( + key: &str, + servers: &[String], + interface_name: Option<&str>, +) -> String { + let mut script = String::new(); + script.push_str("open\n"); + script.push_str("d.init\n"); + // Load the live dict so unmanaged fields ride through unchanged. + script.push_str(&format!("get {key}\n")); + if !servers.is_empty() { + script.push_str("d.add ServerAddresses *"); + for s in servers { + script.push(' '); + script.push_str(s); + } + script.push('\n'); + } else { + // Drop only our ServerAddresses override; keep every other field. + script.push_str("d.remove ServerAddresses\n"); + } + if let Some(iface) = interface_name { + // A single-value key (no `*`): the interface the service is bound to. + script.push_str(&format!("d.add InterfaceName {iface}\n")); + } + script.push_str(&format!("set {key}\n")); + script.push_str("quit\n"); + script +} + +/// The `State:/Network/Service//DNS` key for a service id. +pub(super) fn service_dns_key(service_id: &str) -> String { + format!("State:/Network/Service/{service_id}/DNS") +} + +/// The real `scutil`-backed [`ScutilRunner`]. +pub(super) struct RealScutil; + +impl ScutilRunner for RealScutil { + fn primary_service(&self) -> Result, PlatformError> { + let dump = run_scutil(&format!("show {GLOBAL_IPV4_KEY}\n"))?; + Ok(crate::detector::macos_parse_scalar_field( + &dump, + "PrimaryService", + )) + } + + fn service_dns_state(&self, service_dns_key: &str) -> Result { + let dump = run_scutil(&format!("show {service_dns_key}\n"))?; + Ok(ServiceDnsState { + interface_name: crate::detector::macos_parse_scalar_field(&dump, "InterfaceName"), + servers: crate::detector::macos_parse_array_field(&dump, "ServerAddresses"), + }) + } + + fn run_script(&self, script: &str) -> Result<(), PlatformError> { + run_scutil(script).map(|_| ()) + } +} + +/// Pipe `script` to `scutil`'s stdin and return its stdout. +fn run_scutil(script: &str) -> Result { + let mut child = Command::new("scutil") + .stdin(Stdio::piped()) + .stdout(Stdio::piped()) + .stderr(Stdio::piped()) + .spawn() + .map_err(|e| PlatformError::CommandFailed(format!("failed to spawn scutil: {e}")))?; + { + let mut stdin = child.stdin.take().ok_or_else(|| { + PlatformError::CommandFailed("scutil stdin was not captured".to_string()) + })?; + stdin + .write_all(script.as_bytes()) + .map_err(|e| PlatformError::CommandFailed(format!("writing scutil script: {e}")))?; + } + let output = child + .wait_with_output() + .map_err(|e| PlatformError::CommandFailed(format!("waiting for scutil: {e}")))?; + if !output.status.success() { + return Err(PlatformError::CommandFailed(format!( + "scutil exited with {}: {}", + output.status, + String::from_utf8_lossy(&output.stderr).trim() + ))); + } + Ok(String::from_utf8_lossy(&output.stdout).into_owned()) +} + +/// Reads/writes the on-disk demote snapshot. A trait so tests inject a temp-dir +/// store; the real impl uses [`SNAPSHOT_PATH`]. +pub(super) trait SnapshotStore { + fn load(&self) -> Option; + fn save(&self, snapshot: &DemoteSnapshot) -> Result<(), PlatformError>; + fn clear(&self); +} + +/// The real on-disk snapshot store at [`SNAPSHOT_PATH`]. +pub(super) struct FileSnapshotStore { + path: PathBuf, +} + +impl FileSnapshotStore { + pub(super) fn new() -> Self { + FileSnapshotStore { + path: PathBuf::from(SNAPSHOT_PATH), + } + } +} + +impl SnapshotStore for FileSnapshotStore { + fn load(&self) -> Option { + let text = fs::read_to_string(&self.path).ok()?; + DemoteSnapshot::deserialize(&text) + } + + fn save(&self, snapshot: &DemoteSnapshot) -> Result<(), PlatformError> { + if let Some(parent) = self.path.parent() { + fs::create_dir_all(parent).map_err(|e| { + PlatformError::CommandFailed(format!( + "creating snapshot dir {}: {e}", + parent.display() + )) + })?; + } + // atomic_write keeps the snapshot intact on a crash mid-write. + splitway_shared::config::atomic_write(&self.path, snapshot.serialize().as_bytes()) + .map_err(|e| PlatformError::CommandFailed(format!("writing demote snapshot: {e}"))) + } + + fn clear(&self) { + let _ = fs::remove_file(&self.path); + } +} + +/// Order-insensitive equality of two resolver lists (treated as sets) — used to +/// tell whether a service's live DNS is still our fallback or a new real resolver. +fn same_server_set(a: &[String], b: &[String]) -> bool { + if a.len() != b.len() { + return false; + } + let mut a: Vec<&String> = a.iter().collect(); + let mut b: Vec<&String> = b.iter().collect(); + a.sort(); + b.sort(); + a == b +} + +/// Demote the system default resolver to `fallback`, snapshotting the prior +/// primary-service DNS first so [`restore`] can undo it. Idempotent: re-running +/// with the same fallback is a no-op-equivalent (it re-sets the same servers); +/// the snapshot is only captured on the *first* demote (when none exists), so a +/// re-demote never overwrites the original prior state with our own fallback. +/// +/// Returns `Ok(false)` (not an error) when there is no primary service to demote +/// — the caller treats that as "nothing to do". +pub(super) fn demote( + scutil: &dyn ScutilRunner, + snapshots: &dyn SnapshotStore, + fallback: &[String], +) -> Result { + if fallback.is_empty() { + return Err(PlatformError::CommandFailed( + "refusing to demote the system default to an empty resolver set".to_string(), + )); + } + let Some(service_id) = scutil.primary_service()? else { + log::warn!("no primary network service; skipping default-DNS demote"); + return Ok(false); + }; + let key = service_dns_key(&service_id); + + // Snapshot handling, keyed by the service we are about to demote: + // + // - No snapshot yet → capture this service's prior DNS (the first demote). + // - A snapshot for THIS SAME service → keep it, UNLESS the service's live DNS + // has changed to something other than our fallback (a DHCP renewal handed the + // same service a new resolver and the watcher re-emitted Up): then refresh + // the snapshot to the current resolver, so a later restore writes back the + // latest service DNS rather than the stale pre-change one. A reconcile + // re-apply where the service still shows our fallback keeps the snapshot (we + // must never capture our own fallback as the "prior"). + // - A snapshot for a DIFFERENT service (the primary changed while up — e.g. a + // Wi-Fi↔Ethernet switch) → restore that previous service from its snapshot + // first, so it is not left stranded on our fallback, then capture and demote + // the new primary. Exactly one service is ever demoted at a time. + match snapshots.load() { + Some(existing) if existing.service_dns_key == key => { + let current = scutil.service_dns_state(&key)?; + // Keep the original prior while the service still shows a fallback WE + // installed — never capture our own fallback. We recognise "ours" two + // ways: the recorded `installed_fallback` (handles a `fallback_dns` + // change, where the service still shows a *previous* fallback we wrote) + // AND the `fallback` we are about to (re)install (handles a retry after + // the post-write `installed_fallback` record below failed, leaving it + // stale/empty while the service already shows this fallback — without + // this clause the retry would mistake our own fallback for a DHCP + // update and snapshot it as the prior, losing the real resolver). A + // value that is neither empty nor one of those is a genuine DHCP update + // on the same service → adopt it as the new prior so a later restore + // writes it. + if !current.servers.is_empty() + && !same_server_set(¤t.servers, &existing.installed_fallback) + && !same_server_set(¤t.servers, fallback) + { + snapshots.save(&DemoteSnapshot { + service_dns_key: key.clone(), + interface_name: current.interface_name, + prior_servers: current.servers, + // Preserved here; the post-write step below reconciles it to + // the fallback actually installed. + installed_fallback: existing.installed_fallback, + })?; + } + // else: keep the existing snapshot unchanged. + } + Some(existing) => { + // Primary changed: un-demote the old service, then snapshot the new. + restore_snapshot(scutil, &existing)?; + snapshots.clear(); + capture_snapshot(scutil, snapshots, &key)?; + } + None => { + capture_snapshot(scutil, snapshots, &key)?; + } + } + + // Re-apply the interface name we captured so the demoted (physical) service + // stays identifiable to the detector (see build_set_dns_script). + let interface_name = snapshots.load().and_then(|s| s.interface_name); + scutil.run_script(&build_set_dns_script( + &key, + fallback, + interface_name.as_deref(), + ))?; + + // Record the fallback now on the service so the NEXT same-service demote + // recognises it as ours, even across a `fallback_dns` change. Done AFTER a + // successful write so a failed write leaves the previously-recorded fallback + // (still the value on the service) intact. + if let Some(mut snapshot) = snapshots.load() { + if !same_server_set(&snapshot.installed_fallback, fallback) { + snapshot.installed_fallback = fallback.to_vec(); + snapshots.save(&snapshot)?; + } + } + + log::info!("demoted the system default resolver to the off-tunnel fallback"); + Ok(true) +} + +/// Capture `key`'s current DNS dict (servers + interface name) into the +/// snapshot store, before it is overwritten by a demote. `installed_fallback` is +/// left empty here and set by the demote's post-write step once the fallback is +/// actually on the service. +fn capture_snapshot( + scutil: &dyn ScutilRunner, + snapshots: &dyn SnapshotStore, + key: &str, +) -> Result<(), PlatformError> { + let state = scutil.service_dns_state(key)?; + snapshots.save(&DemoteSnapshot { + service_dns_key: key.to_string(), + interface_name: state.interface_name, + prior_servers: state.servers, + installed_fallback: Vec::new(), + }) +} + +/// Restore the demoted default from the snapshot, then clear it. Returns whether +/// a snapshot was present and restored (so the caller can flush the DNS cache +/// even when no resolver files were removed). A missing snapshot is a clean no-op +/// → `Ok(false)`. Always clears the snapshot on success so a later demote re-snaps. +pub(super) fn restore( + scutil: &dyn ScutilRunner, + snapshots: &dyn SnapshotStore, +) -> Result { + let Some(snapshot) = snapshots.load() else { + return Ok(false); // nothing demoted + }; + restore_snapshot(scutil, &snapshot)?; + snapshots.clear(); + log::info!("restored the system default resolver from the demote snapshot"); + Ok(true) +} + +/// Run the `scutil` write that restores one snapshot's service to its prior DNS. +/// Restores the prior `ServerAddresses` when there were any, otherwise removes +/// only our `ServerAddresses` override — either way `get`-preserving the rest of +/// the service's live DNS dict (so DHCP-provided SearchDomains/etc. survive the +/// round-trip). Does not touch the snapshot store — the caller owns clearing it. +fn restore_snapshot( + scutil: &dyn ScutilRunner, + snapshot: &DemoteSnapshot, +) -> Result<(), PlatformError> { + let script = build_set_dns_script( + &snapshot.service_dns_key, + &snapshot.prior_servers, + snapshot.interface_name.as_deref(), + ); + scutil.run_script(&script) +} + +/// Test doubles for the demote seam, shared by this module's tests and the +/// backend's apply/revert wiring tests. +#[cfg(test)] +pub(super) mod test_support { + use super::*; + use std::cell::RefCell; + + /// A fake `scutil` that captures every script run and returns canned state. + /// `primary` is interior-mutable so a test can model the primary service + /// changing while up (the P2 case). `service_states` maps a service DNS key + /// to the dict the fake reports for it; an unmapped key falls back to + /// `default_state`. + pub(in super::super) struct FakeScutil { + pub primary: RefCell>, + pub default_state: ServiceDnsState, + pub service_states: RefCell>, + pub scripts: RefCell>, + pub fail_on_set: std::cell::Cell, + } + + impl FakeScutil { + /// A primary service ("ABC", on en0) with one prior resolver — the + /// common case. + pub fn up() -> Self { + FakeScutil { + primary: RefCell::new(Some("ABC".to_string())), + default_state: ServiceDnsState { + interface_name: Some("en0".to_string()), + servers: vec!["198.51.100.1".to_string()], + }, + service_states: RefCell::new(std::collections::HashMap::new()), + scripts: RefCell::new(Vec::new()), + fail_on_set: std::cell::Cell::new(false), + } + } + + /// Make subsequent `set` scripts fail (to test rollback paths). + pub fn fail_on_set(&self) { + self.fail_on_set.set(true); + } + + /// Point `primary_service()` at a different service id (P2: the primary + /// changed while up). + pub fn set_primary(&self, id: &str) { + *self.primary.borrow_mut() = Some(id.to_string()); + } + + /// Script the DNS dict the fake reports for a specific service key. + pub fn set_service_state(&self, key: &str, state: ServiceDnsState) { + self.service_states + .borrow_mut() + .insert(key.to_string(), state); + } + } + + impl ScutilRunner for FakeScutil { + fn primary_service(&self) -> Result, PlatformError> { + Ok(self.primary.borrow().clone()) + } + fn service_dns_state(&self, key: &str) -> Result { + Ok(self + .service_states + .borrow() + .get(key) + .cloned() + .unwrap_or_else(|| self.default_state.clone())) + } + fn run_script(&self, script: &str) -> Result<(), PlatformError> { + if self.fail_on_set.get() && script.contains("set ") { + return Err(PlatformError::CommandFailed("simulated set failure".into())); + } + self.scripts.borrow_mut().push(script.to_string()); + Ok(()) + } + } + + /// An in-memory snapshot store for tests. + #[derive(Default)] + pub(in super::super) struct MemSnapshots { + pub slot: RefCell>, + } + impl SnapshotStore for MemSnapshots { + fn load(&self) -> Option { + self.slot.borrow().clone() + } + fn save(&self, snapshot: &DemoteSnapshot) -> Result<(), PlatformError> { + *self.slot.borrow_mut() = Some(snapshot.clone()); + Ok(()) + } + fn clear(&self) { + *self.slot.borrow_mut() = None; + } + } +} + +#[cfg(test)] +mod tests { + use super::test_support::{FakeScutil, MemSnapshots}; + use super::*; + + fn snap( + key: &str, + iface: Option<&str>, + servers: &[&str], + installed: &[&str], + ) -> DemoteSnapshot { + DemoteSnapshot { + service_dns_key: key.to_string(), + interface_name: iface.map(str::to_string), + prior_servers: servers.iter().map(|s| s.to_string()).collect(), + installed_fallback: installed.iter().map(|s| s.to_string()).collect(), + } + } + + // --- pure script builders ------------------------------------------------- + + #[test] + fn set_dns_script_sets_server_addresses_and_preserves_the_interface() { + let script = build_set_dns_script( + "State:/Network/Service/ABC/DNS", + &["198.51.100.1".to_string()], + Some("en0"), + ); + assert_eq!( + script, + "open\n\ + d.init\n\ + get State:/Network/Service/ABC/DNS\n\ + d.add ServerAddresses * 198.51.100.1\n\ + d.add InterfaceName en0\n\ + set State:/Network/Service/ABC/DNS\n\ + quit\n" + ); + } + + #[test] + fn set_dns_script_gets_the_live_dict_before_overriding_to_preserve_fields() { + // The P2 invariant: `get` the existing dict BEFORE overriding + // ServerAddresses so unmanaged DHCP/VPN fields (SearchDomains, DomainName, + // SupplementalMatchDomains, …) ride through instead of being replaced by a + // minimal dict. + let script = build_set_dns_script( + "State:/Network/Service/ABC/DNS", + &["203.0.113.9".to_string()], + Some("en0"), + ); + let get_at = script.find("get State:/Network/Service/ABC/DNS\n").unwrap(); + let override_at = script.find("d.add ServerAddresses").unwrap(); + assert!( + get_at < override_at, + "must load the live dict with `get` before overriding ServerAddresses" + ); + // No bare-init minimal write that would replace the whole dict. + assert!(!script.contains("d.init\nd.add ServerAddresses")); + } + + #[test] + fn set_dns_script_omits_the_interface_when_unknown() { + let script = build_set_dns_script( + "State:/Network/Service/ABC/DNS", + &["198.51.100.1".to_string()], + None, + ); + assert!(!script.contains("InterfaceName")); + assert!(script.contains("d.add ServerAddresses * 198.51.100.1\n")); + } + + #[test] + fn set_dns_script_handles_multiple_servers() { + let script = build_set_dns_script( + "State:/Network/Service/ABC/DNS", + &["198.51.100.1".to_string(), "198.51.100.2".to_string()], + Some("en0"), + ); + assert!(script.contains("d.add ServerAddresses * 198.51.100.1 198.51.100.2\n")); + } + + #[test] + fn set_dns_script_with_no_servers_removes_only_our_server_addresses() { + // No prior servers → drop our ServerAddresses override (after `get`-ing the + // live dict) rather than adding any, keeping every other field intact. + let script = build_set_dns_script("State:/Network/Service/ABC/DNS", &[], Some("en0")); + assert!(script.contains("get State:/Network/Service/ABC/DNS\n")); + assert!(script.contains("d.remove ServerAddresses\n")); + assert!(!script.contains("d.add ServerAddresses")); + assert!(script.contains("d.add InterfaceName en0\n")); + assert!(script.contains("set State:/Network/Service/ABC/DNS\n")); + } + + #[test] + fn service_dns_key_is_well_formed() { + assert_eq!( + service_dns_key("ABC-123"), + "State:/Network/Service/ABC-123/DNS" + ); + } + + #[test] + fn snapshot_round_trips_through_disk_format() { + let s = snap( + "State:/Network/Service/ABC/DNS", + Some("en0"), + &["198.51.100.1", "198.51.100.2"], + &["203.0.113.9", "203.0.113.10"], + ); + assert_eq!(DemoteSnapshot::deserialize(&s.serialize()).unwrap(), s); + } + + #[test] + fn snapshot_round_trips_with_no_prior_servers_and_no_interface() { + let s = snap("State:/Network/Service/ABC/DNS", None, &[], &[]); + assert_eq!(DemoteSnapshot::deserialize(&s.serialize()).unwrap(), s); + } + + #[test] + fn snapshot_round_trips_with_an_interface_but_no_servers() { + let s = snap( + "State:/Network/Service/ABC/DNS", + Some("en0"), + &[], + &["203.0.113.9"], + ); + assert_eq!(DemoteSnapshot::deserialize(&s.serialize()).unwrap(), s); + } + + #[test] + fn snapshot_deserialize_rejects_garbage_without_a_key() { + assert!(DemoteSnapshot::deserialize("").is_none()); + assert!(DemoteSnapshot::deserialize("server\t1.2.3.4\n").is_none()); + } + + // --- demote / restore orchestration -------------------------------------- + + #[test] + fn demote_snapshots_prior_then_sets_the_fallback_preserving_the_interface() { + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + let did = demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + assert!(did); + // Prior state captured for the primary service, including its interface. + assert_eq!( + snaps.load().unwrap(), + snap( + "State:/Network/Service/ABC/DNS", + Some("en0"), + &["198.51.100.1"], + &["203.0.113.9"], + ) + ); + // Exactly one script issued: set the fallback AND re-add InterfaceName so + // the demoted physical service stays identifiable (P1 fix). + let scripts = scutil.scripts.borrow(); + assert_eq!(scripts.len(), 1); + assert!(scripts[0].contains("d.add ServerAddresses * 203.0.113.9\n")); + assert!( + scripts[0].contains("d.add InterfaceName en0\n"), + "the demote must preserve InterfaceName so detection still finds the physical service" + ); + assert!(scripts[0].contains("set State:/Network/Service/ABC/DNS\n")); + } + + #[test] + fn redemote_same_service_still_on_our_fallback_keeps_the_snapshot() { + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + let after_first = snaps.load().unwrap(); + // The demote took effect: the service now reports OUR fallback as its DNS. + scutil.set_service_state( + "State:/Network/Service/ABC/DNS", + ServiceDnsState { + interface_name: Some("en0".to_string()), + servers: vec!["203.0.113.9".to_string()], + }, + ); + // A second demote (same primary, still our fallback) must keep the ORIGINAL + // prior, not snapshot our own fallback. + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + assert_eq!(snaps.load().unwrap(), after_first); + assert_eq!( + after_first.prior_servers, + vec!["198.51.100.1".to_string()], + "the snapshot must remain the real prior resolver, never our fallback" + ); + } + + #[test] + fn redemote_same_service_refreshes_the_snapshot_when_the_dhcp_dns_changed() { + // P2: the SAME primary service stays active but its DHCP resolver changes + // (a new lease overwrites our fallback) and the watcher re-emits Up. The + // snapshot must refresh to the NEW resolver, so a later restore writes back + // the latest service DNS instead of pinning the pre-change one. + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + // DHCP hands the same service a new resolver. + scutil.set_service_state( + "State:/Network/Service/ABC/DNS", + ServiceDnsState { + interface_name: Some("en0".to_string()), + servers: vec!["198.51.100.77".to_string()], + }, + ); + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + assert_eq!( + snaps.load().unwrap(), + snap( + "State:/Network/Service/ABC/DNS", + Some("en0"), + &["198.51.100.77"], + &["203.0.113.9"], + ), + "the snapshot must track the new DHCP resolver so restore is not stale" + ); + } + + #[test] + fn redemote_after_a_fallback_change_keeps_the_original_prior() { + // P2: `fallback_dns` changes from one resolver to another while demoted, so + // the service still shows our PREVIOUS fallback. That previous fallback must + // not be mistaken for a DHCP update and captured as the prior — restore must + // still write the real DHCP resolver, not our old fallback. + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + // The demote took effect: the service shows our first fallback. + scutil.set_service_state( + "State:/Network/Service/ABC/DNS", + ServiceDnsState { + interface_name: Some("en0".to_string()), + servers: vec!["203.0.113.9".to_string()], + }, + ); + // fallback_dns changes to a different resolver; re-demote. + demote(&scutil, &snaps, &["203.0.113.50".to_string()]).unwrap(); + let after = snaps.load().unwrap(); + assert_eq!( + after.prior_servers, + vec!["198.51.100.1".to_string()], + "the prior must stay the real DHCP resolver, never our previous fallback" + ); + assert_eq!( + after.installed_fallback, + vec!["203.0.113.50".to_string()], + "the recorded installed fallback tracks the newly applied one" + ); + } + + #[test] + fn redemote_keeps_prior_when_the_installed_fallback_record_was_lost() { + // P2: the post-write `installed_fallback` record can fail (e.g. a transient + // `/var/run` write error) after the demote's `set` already succeeded, leaving + // the snapshot's installed_fallback empty while the service already shows our + // fallback. A retry with the SAME fallback must still recognise that value as + // ours and keep the real DHCP prior — not snapshot our own fallback (which a + // later restore would then write back as if it were the prior resolver). + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + // Seed the post-failure state: the real DHCP prior is captured, but the + // installed_fallback record was lost (empty), and the service already shows + // the fallback the previous `set` wrote. + snaps + .save(&snap( + "State:/Network/Service/ABC/DNS", + Some("en0"), + &["198.51.100.1"], + &[], + )) + .unwrap(); + scutil.set_service_state( + "State:/Network/Service/ABC/DNS", + ServiceDnsState { + interface_name: Some("en0".to_string()), + servers: vec!["203.0.113.9".to_string()], + }, + ); + // Retry the demote with the SAME fallback. + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + let after = snaps.load().unwrap(); + assert_eq!( + after.prior_servers, + vec!["198.51.100.1".to_string()], + "a retry must not capture our own fallback as the prior when the \ + installed_fallback record was lost" + ); + assert_eq!( + after.installed_fallback, + vec!["203.0.113.9".to_string()], + "the post-write step repairs the lost installed_fallback record" + ); + } + + #[test] + fn demote_on_a_changed_primary_restores_the_old_then_snapshots_the_new() { + // P2: the primary service changes while up (e.g. Wi-Fi → Ethernet) with a + // snapshot already present. The old service must be un-demoted (not left + // stranded on our fallback), and the NEW service snapshotted + demoted. + let scutil = FakeScutil::up(); // primary = ABC (en0) + let snaps = MemSnapshots::default(); + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + scutil.scripts.borrow_mut().clear(); + + // The primary switches to XYZ on en1 with its own DHCP resolver. + scutil.set_primary("XYZ"); + scutil.set_service_state( + "State:/Network/Service/XYZ/DNS", + ServiceDnsState { + interface_name: Some("en1".to_string()), + servers: vec!["198.51.100.50".to_string()], + }, + ); + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + + // The snapshot now tracks the NEW primary with its real prior DNS. + assert_eq!( + snaps.load().unwrap(), + snap( + "State:/Network/Service/XYZ/DNS", + Some("en1"), + &["198.51.100.50"], + &["203.0.113.9"], + ) + ); + let scripts = scutil.scripts.borrow(); + // First the OLD service is restored to its real prior (198.51.100.1), + // then the NEW service is demoted to the fallback. + assert!( + scripts + .iter() + .any(|s| s.contains("set State:/Network/Service/ABC/DNS") + && s.contains("d.add ServerAddresses * 198.51.100.1")), + "the previous primary must be restored to its real DNS, not left on the fallback" + ); + assert!( + scripts + .iter() + .any(|s| s.contains("set State:/Network/Service/XYZ/DNS") + && s.contains("d.add ServerAddresses * 203.0.113.9")), + "the new primary must be demoted to the fallback" + ); + } + + #[test] + fn demote_with_no_primary_service_is_a_clean_noop() { + let scutil = FakeScutil::up(); + *scutil.primary.borrow_mut() = None; + let snaps = MemSnapshots::default(); + let did = demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + assert!(!did); + assert!(snaps.load().is_none()); + assert!(scutil.scripts.borrow().is_empty()); + } + + #[test] + fn demote_to_empty_fallback_is_rejected() { + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + assert!(demote(&scutil, &snaps, &[]).is_err()); + // Nothing captured or issued on rejection. + assert!(snaps.load().is_none()); + assert!(scutil.scripts.borrow().is_empty()); + } + + #[test] + fn restore_sets_prior_servers_with_the_interface_and_clears_snapshot() { + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + scutil.scripts.borrow_mut().clear(); + + restore(&scutil, &snaps).unwrap(); + let scripts = scutil.scripts.borrow(); + assert_eq!(scripts.len(), 1); + // Restored the original servers AND the interface name... + assert!(scripts[0].contains("d.add ServerAddresses * 198.51.100.1\n")); + assert!(scripts[0].contains("d.add InterfaceName en0\n")); + // ...and cleared the snapshot so a later demote re-snaps. + drop(scripts); + assert!(snaps.load().is_none()); + } + + #[test] + fn restore_clears_only_our_servers_when_there_were_no_prior_ones() { + // A service with no explicit prior ServerAddresses: restore drops just our + // override (d.remove ServerAddresses) after `get`-ing the live dict, so any + // DHCP-provided SearchDomains/etc. are preserved rather than wiped. + let scutil = FakeScutil::up(); + scutil.set_service_state( + "State:/Network/Service/ABC/DNS", + ServiceDnsState { + interface_name: Some("en0".to_string()), + servers: vec![], + }, + ); + let snaps = MemSnapshots::default(); + demote(&scutil, &snaps, &["203.0.113.9".to_string()]).unwrap(); + scutil.scripts.borrow_mut().clear(); + + restore(&scutil, &snaps).unwrap(); + let scripts = scutil.scripts.borrow(); + assert_eq!(scripts.len(), 1); + assert!(scripts[0].contains("get State:/Network/Service/ABC/DNS\n")); + assert!(scripts[0].contains("d.remove ServerAddresses\n")); + assert!(!scripts[0].contains("d.add ServerAddresses")); + assert!(scripts[0].contains("set State:/Network/Service/ABC/DNS\n")); + } + + #[test] + fn restore_with_no_snapshot_is_a_clean_noop() { + let scutil = FakeScutil::up(); + let snaps = MemSnapshots::default(); + restore(&scutil, &snaps).unwrap(); + assert!(scutil.scripts.borrow().is_empty()); + } + + #[test] + fn demote_propagates_a_set_failure_after_snapshotting() { + // If the set fails, the snapshot is already on disk, so a later restore + // can still recover — the demote surfaces the error rather than masking. + let scutil = FakeScutil::up(); + scutil.fail_on_set(); + let snaps = MemSnapshots::default(); + let result = demote(&scutil, &snaps, &["203.0.113.9".to_string()]); + assert!(result.is_err()); + // Snapshot persisted so the half-done demote is recoverable on revert. + assert!(snaps.load().is_some()); + } +} diff --git a/splitway-daemon/src/backend/macos/mod.rs b/splitway-daemon/src/backend/macos/mod.rs index f23754d..50da20d 100644 --- a/splitway-daemon/src/backend/macos/mod.rs +++ b/splitway-daemon/src/backend/macos/mod.rs @@ -5,6 +5,7 @@ //! actual filesystem work lives in [`backend`]. mod backend; +mod demote; mod resolver; pub struct MacosBackend; diff --git a/splitway-daemon/src/daemon/state.rs b/splitway-daemon/src/daemon/state.rs index 7e19ad3..13e9ef3 100644 --- a/splitway-daemon/src/daemon/state.rs +++ b/splitway-daemon/src/daemon/state.rs @@ -132,12 +132,18 @@ pub enum StateCommand { /// A snapshot of what is currently applied to the system. Includes the DNS /// servers so that a VPN DNS rotation (same interface and domains, different /// servers) is seen as out-of-sync and re-applied rather than treated as -/// already converged. +/// already converged. Likewise includes the demote-target (macOS off-tunnel +/// fallback) so a change to it — e.g. the physical DHCP resolver changing after +/// a Wi-Fi switch — also forces a re-apply (re-demote to the new fallback) +/// rather than being treated as converged. The demote-target is *not* part of +/// the wire `AppliedInfo` projection (it is a backend-internal concern, no +/// protocol change). #[derive(Debug, Clone, PartialEq, Eq)] struct Applied { interface: String, domains: Vec, dns_servers: Vec, + demote_target: Option>, } /// Project the internal applied snapshot to its wire form. Defined once and used @@ -204,6 +210,13 @@ pub struct StateMachine { /// desired target equals the — now possibly stale — `applied` snapshot, so a /// post-failure "already converged" check can never skip a needed re-apply. needs_resync: bool, + /// Set when the startup reconcile of orphaned persisted state (a global-revert + /// backend's demote left by a previous unclean exit) failed transiently. With + /// `applied == None`, an ordinary `revert()` is a no-op, so without this the + /// orphaned state would linger until a full VPN up→down cycle. While set, each + /// `revert()` retries the global cleanup until it succeeds. See + /// [`Self::cleanup_orphaned_state_on_startup`]. + pending_global_cleanup: bool, /// Cancel handle for the current watch's forwarding task. Aborting it drops /// the detector's `Receiver`, which closes the channel and lets the /// detector release its resources (see [`Self::arm_watch`]). @@ -238,6 +251,7 @@ impl StateMachine { applied: None, orphaned: Vec::new(), needs_resync: false, + pending_global_cleanup: false, watch_cancel: None, watch_generation: 0, // No watch is armed until `arm_watch` runs (in `run_state`, before @@ -349,20 +363,68 @@ impl StateMachine { /// never set (e.g. one an OpenVPN up-script installed). If a *prior* session /// had DNS and the new one does not, this reverts that prior session's rules. /// - /// The last event's interface must also match the configured `vpn_name`. - /// A config change that switches `vpn_name` resets `last_info`/`vpn_up` and - /// re-arms the watch (see [`Self::adopt_config`]), so the old interface is - /// reverted and the new watch resamples; `last_info.interface_name` therefore - /// matches `vpn_name` whenever the configured interface is up. + /// On interface-keyed backends (Linux) the last event's interface must also + /// match the configured `vpn_name`. A config change that switches `vpn_name` + /// resets `last_info`/`vpn_up` and re-arms the watch (see + /// [`Self::adopt_config`]), so the old interface is reverted and the new + /// watch resamples; `last_info.interface_name` therefore matches `vpn_name` + /// whenever the configured interface is up. + /// + /// On a **global-revert backend** (macOS) detection is driven by the system + /// DNS model, not an interface name — the VPN's DNS is the hijacked system + /// default, scoped to no `utun`, and the active tunnel index varies between + /// sessions. There the interface name is advisory and must NOT gate apply, so + /// this check is skipped (the backend gates on its own DNS-model signal). + /// Linux behaviour is unchanged. + fn interface_gate_passes(&self, info: &VpnInfo) -> bool { + if self.backend.reverts_globally() { + true + } else { + info.interface_name == self.config.vpn_name + } + } + + /// The off-tunnel fallback resolver to fold into the `VpnInfo` handed to the + /// backend: the configured `fallback_dns` override if set **and every entry is + /// a valid IP literal**, else the detector-supplied `demote_target` (the + /// physical interface's own DHCP resolver). A malformed override is ignored — + /// it would otherwise reach the root `scutil` demote script verbatim. `None` on + /// a backend/VPN that does not demote. The backend applies the demote only when + /// this is `Some(non-empty)`. + fn effective_demote_target(&self, info: &VpnInfo) -> Option> { + match &self.config.fallback_dns { + // A configured override wins — but only when every entry is a real IP + // literal. The values are appended verbatim into the root `scutil` + // demote script, so a non-IP entry (a typo, a hostname, whitespace, an + // empty element) would malform the script rather than fail cleanly. + // Reject the whole override in that case and fall back to the detector's + // demote-target, logging so the bad config is visible. + Some(servers) if !servers.is_empty() => { + if servers.iter().all(|s| config::is_ip_literal(s)) { + Some(servers.clone()) + } else { + log::warn!( + "ignoring fallback_dns: every entry must be an IP literal; \ + using the detected physical resolver instead" + ); + info.demote_target.clone() + } + } + _ => info.demote_target.clone(), + } + } + fn desired(&self) -> Option<(VpnInfo, Vec)> { let active = self.config.enabled && self.vpn_up && !self.config.vpn_hosts.is_empty(); match &self.last_info { Some(info) - if active - && !info.dns_servers.is_empty() - && info.interface_name == self.config.vpn_name => + if active && !info.dns_servers.is_empty() && self.interface_gate_passes(info) => { - Some((info.clone(), self.config.vpn_hosts.clone())) + // Fold the configured fallback override (if any) into the + // demote-target so the backend receives the effective fallback. + let mut info = info.clone(); + info.demote_target = self.effective_demote_target(&info); + Some((info, self.config.vpn_hosts.clone())) } _ => None, } @@ -459,6 +521,9 @@ impl StateMachine { interface: info.interface_name.clone(), domains, dns_servers: info.dns_servers.clone(), + // Already folded with the `fallback_dns` override by + // `desired()`; a change here forces a re-apply (re-demote). + demote_target: info.demote_target.clone(), }; // A matching snapshot only means "already converged" when the // last apply/revert actually succeeded; after a failure the @@ -483,6 +548,10 @@ impl StateMachine { ); self.applied = Some(target); self.needs_resync = false; + // Applying establishes our own state (on macOS the demote + // subsumes any prior process's snapshot), so a previously + // pending orphaned-cleanup is no longer outstanding. + self.pending_global_cleanup = false; Ok(()) } Ok(Err(e)) => { @@ -523,8 +592,16 @@ impl StateMachine { /// `applied` is left set, so a later reconcile or shutdown retries it. async fn revert(&mut self) -> Result<(), PlatformError> { let Some(applied) = self.applied.clone() else { - // Nothing recorded as applied: the system already matches the - // "reverted" goal, so any prior uncertainty is resolved. + // Nothing recorded as applied. A previous process's orphaned global + // demote may still need cleaning (a startup reconcile that failed + // transiently) — retry it here so a Down/disable/reload converges it + // instead of waiting for a full VPN up→down cycle. + if self.pending_global_cleanup { + self.run_global_cleanup().await?; + self.pending_global_cleanup = false; + } + // The system already matches the "reverted" goal, so any prior + // uncertainty is resolved. self.needs_resync = false; return Ok(()); }; @@ -558,6 +635,57 @@ impl StateMachine { } } + /// Reconcile orphaned persisted state left by a previous unclean exit, once at + /// startup (before the watch is armed). + /// + /// A **global-revert backend** (macOS) persists its default-DNS demote on disk + /// so it survives a SIGKILL. If the daemon was killed while demoted and the VPN + /// then went down before this restart, the watch's initial sample is `Down` + /// and is suppressed (nothing changed from the watcher's view), so without this + /// nothing would restore the default resolver or clear the snapshot — the + /// machine would stay pinned to the off-tunnel fallback until the next VPN + /// up→down cycle. A global revert is idempotent (a no-op when nothing was + /// persisted); and if a VPN is in fact up, the watch's initial `Up` re-applies + /// immediately after. Per-interface backends (Linux) keep no cross-restart + /// persisted state of this kind, so they skip it (their revert needs a live + /// `applied` interface anyway). `applied` stays `None` either way — this only + /// cleans state a *previous* process left behind. + async fn cleanup_orphaned_state_on_startup(&mut self) { + if !self.backend.reverts_globally() { + return; + } + if self.run_global_cleanup().await.is_err() { + // A transient failure must not strand the orphaned state: mark it so + // every later `revert()` retries until it succeeds. + self.pending_global_cleanup = true; + } + } + + /// Run one global cleanup revert (used by the startup reconcile and the + /// retry-on-`revert` path). Logs the outcome and returns the backend result. + async fn run_global_cleanup(&self) -> Result<(), PlatformError> { + let backend = self.backend.clone(); + let interface = self.config.vpn_name.clone(); + match tokio::task::spawn_blocking(move || backend.revert_rules(&interface)).await { + Ok(Ok(())) => { + log::debug!("reconciled any orphaned default-DNS demote state"); + Ok(()) + } + Ok(Err(e)) => { + log::warn!( + "orphaned-state cleanup failed: {e}; it will be retried on the next reconcile" + ); + Err(e) + } + Err(e) => { + log::error!("orphaned-state cleanup task panicked: {e}"); + Err(PlatformError::CommandFailed(format!( + "orphaned-state cleanup task panicked: {e}" + ))) + } + } + } + /// Entry point for a forwarded detector event. Drops events from a watch /// generation we have since superseded (an interface switch may leave an /// in-flight event from the old forwarding task), so the old interface's @@ -659,10 +787,14 @@ impl StateMachine { // intentionally ignored: shutdown reports cleanliness from the // `self.orphaned` check below, which covers the same lingering set. let _ = self.revert_orphaned().await; - let applied_clean = if self.applied.is_none() { - true - } else { - log::info!("shutdown: reverting active rules"); + // Revert applied rules, and ALSO run a pending global cleanup (a macOS + // startup reconcile that failed transiently) so a stale demote snapshot / + // default DNS is not left behind when `applied` is `None`. `revert()` + // covers both: it reverts when `applied` is set, and runs the pending + // global cleanup otherwise — returning `Err` (→ unclean) if it still fails, + // so shutdown does not falsely report clean while a demote may remain. + let applied_clean = if self.applied.is_some() || self.pending_global_cleanup { + log::info!("shutdown: reverting active rules and any pending cleanup"); match self.revert().await { Ok(()) => true, Err(e) => { @@ -670,6 +802,8 @@ impl StateMachine { false } } + } else { + true }; if applied_clean && self.orphaned.is_empty() { log::info!("shutdown: system left clean"); @@ -700,16 +834,17 @@ impl StateMachine { // surfaced regardless of *apply* state so a client can show the // interface's resolver read-only (the DNS-auto model). Sourced from // the last detector reading, gated on (a) the interface still being - // up and (b) its name matching the configured `vpn_name` — the same - // guards `desired()`/`routing_state()` use. The `vpn_up` gate matters - // because a `Down` event only flips `vpn_up` and leaves `last_info` - // populated, so without it a disconnected VPN would keep reporting its - // last DNS as "detected". Empty when no interface is configured, it is - // down, or it pushes no DNS. + // up and (b) the interface gate — its name matching `vpn_name` on + // Linux, or always (DNS-model-driven) on macOS — the same guard + // `desired()`/`routing_state()` use. The `vpn_up` gate matters because + // a `Down` event only flips `vpn_up` and leaves `last_info` populated, + // so without it a disconnected VPN would keep reporting its last DNS + // as "detected". Empty when no interface is configured, it is down, or + // it pushes no DNS. detected_dns: self .last_info .as_ref() - .filter(|info| self.vpn_up && info.interface_name == self.config.vpn_name) + .filter(|info| self.vpn_up && self.interface_gate_passes(info)) .map(|info| info.dns_servers.clone()) .unwrap_or_default(), detector_health: self.detector_health.clone(), @@ -729,12 +864,15 @@ impl StateMachine { /// occur) is reported as `NoDnsFromVpn` rather than a near-unreachable state. /// /// A known out-of-sync condition takes precedence over every "inactive" - /// reason: a failed apply/revert (`needs_resync`) or an interface orphaned by - /// a failed switch (`orphaned` non-empty) means stale split-DNS rules may - /// still be installed somewhere, so reporting `Disabled`/`Applied`/etc. would - /// claim a clean state the daemon does not believe in. Both surface as - /// `ApplyFailed` ("out of sync") until cleanup succeeds — e.g. a `Disable` - /// whose revert failed reads `ApplyFailed`, not `Disabled`. + /// reason: a failed apply/revert (`needs_resync`), an interface orphaned by a + /// failed switch (`orphaned` non-empty), or a pending global cleanup (a macOS + /// startup demote-reconcile that failed transiently — `pending_global_cleanup`) + /// each means stale split-DNS rules or a stale demoted default may still be + /// installed, so reporting `Disabled`/`VpnDown`/`Applied`/etc. would claim a + /// clean state the daemon does not believe in. All surface as `ApplyFailed` + /// ("out of sync") until cleanup succeeds — e.g. a `Disable` whose revert + /// failed reads `ApplyFailed`, not `Disabled`; and a restart that could not + /// clear an orphaned demote reads `ApplyFailed`, not `VpnDown`. fn routing_state(&self) -> RoutingState { // A config file that does not parse takes precedence over everything: // the daemon froze on the last-good config, and the user must learn @@ -743,9 +881,12 @@ impl StateMachine { if self.config_invalid { return RoutingState::ConfigInvalid; } - // Out-of-sync (a failed apply/revert, or a lingering orphaned interface) - // overrides the inactive-reason branches below — see the doc above. - if self.needs_resync || !self.orphaned.is_empty() { + // Out-of-sync (a failed apply/revert, a lingering orphaned interface, or a + // pending macOS demote-cleanup) overrides the inactive-reason branches + // below — see the doc above. Without the pending-cleanup term, a restart + // that left an orphaned demote pending would report `VpnDown` while the + // default DNS is still demoted. + if self.needs_resync || !self.orphaned.is_empty() || self.pending_global_cleanup { return RoutingState::ApplyFailed; } if !self.config.enabled { @@ -757,11 +898,10 @@ impl StateMachine { if !self.vpn_up { return RoutingState::VpnDown; } - let has_vpn_dns = matches!( - &self.last_info, - Some(info) - if !info.dns_servers.is_empty() && info.interface_name == self.config.vpn_name - ); + let has_vpn_dns = self + .last_info + .as_ref() + .is_some_and(|info| !info.dns_servers.is_empty() && self.interface_gate_passes(info)); if !has_vpn_dns { return RoutingState::NoDnsFromVpn; } @@ -1238,10 +1378,25 @@ impl StateMachine { } /// Whether a config delta requires re-arming the detector watch — i.e. it -/// touches a field the watch is keyed on. Domain/`enabled` edits do not (those -/// only change `desired()`), so they reconcile without tearing the watch down. +/// touches a field the watch is keyed on, OR a field whose change needs the +/// revert + re-sample that a re-arm performs. Domain/`enabled` edits do not +/// (those only change `desired()`), so they reconcile without tearing the watch +/// down. +/// +/// `fallback_dns` is included for the revert + re-sample, not because the watch +/// is keyed on it: after a macOS demote the detector reads the demoted value back +/// as the demote-target, so `last_info.demote_target` becomes the override we +/// installed. Without a fresh sample, removing or changing the override would +/// leave the physical service pinned to the old fallback (`desired()` still +/// equals the applied target, so nothing re-applies until the next network/VPN +/// cycle). The re-arm reverts first — restoring the real prior DHCP resolver from +/// the demote snapshot — then re-samples, so the next demote targets the correct +/// resolver. (Harmless on Linux, which ignores `fallback_dns`.) fn watch_settings_changed(old: &LocalConfig, new: &LocalConfig) -> bool { - old.vpn_name != new.vpn_name || old.vpn_backend != new.vpn_backend || old.openvpn != new.openvpn + old.vpn_name != new.vpn_name + || old.vpn_backend != new.vpn_backend + || old.openvpn != new.openvpn + || old.fallback_dns != new.fallback_dns } /// The reply when a mutation cannot read the config file (missing or malformed). @@ -1270,6 +1425,11 @@ pub async fn run_state( mut rx: mpsc::Receiver, mut shutdown: oneshot::Receiver>, ) { + // Reconcile any orphaned persisted state left by a previous unclean exit + // (e.g. a global-revert backend's default-DNS demote snapshot) before arming + // the watch — the initial Down sample is suppressed, so this is the only path + // that clears it when the VPN is already down at startup. + machine.cleanup_orphaned_state_on_startup().await; // Arm the VPN watch once here, before the command loop, so all watch // lifecycle (start at boot, re-arm on config change) lives in one owner — // the actor — rather than being split with `run_async`. @@ -1330,6 +1490,9 @@ mod tests { #[derive(Default)] struct MockBackend { applies: Mutex)>>, + /// The `demote_target` carried by each apply, in order — so tests can + /// assert the state machine folded in the `fallback_dns` override. + applied_demote_targets: Mutex>>>, reverts: Mutex>, fail_apply: AtomicBool, fail_revert: AtomicBool, @@ -1401,6 +1564,10 @@ mod tests { .lock() .unwrap() .push((info.interface_name.clone(), domains.to_vec())); + self.applied_demote_targets + .lock() + .unwrap() + .push(info.demote_target.clone()); Ok(()) } @@ -1529,6 +1696,7 @@ mod tests { Ok(VpnInfo { interface_name: interface.to_string(), dns_servers: vec!["10.0.0.1".to_string()], + demote_target: None, }) } @@ -1563,6 +1731,7 @@ mod tests { .send(VpnEvent::Up(VpnInfo { interface_name: iface, dns_servers: vec!["10.0.0.1".to_string()], + demote_target: None, })) .await; // Then release on receiver drop, like NM/OpenVPN's `tx.closed()`. @@ -1712,6 +1881,7 @@ mod tests { enabled, vpn_backend: config::VpnBackend::default(), openvpn: config::OpenVpnConfig::default(), + fallback_dns: None, } } @@ -1719,6 +1889,17 @@ mod tests { VpnEvent::Up(VpnInfo { interface_name: interface.to_string(), dns_servers: vec!["10.0.0.1".to_string()], + demote_target: None, + }) + } + + /// A macOS-style Up: the detector reports a demote-target (the physical DHCP + /// resolver) alongside the corp DNS. `interface_name` is advisory there. + fn vpn_up_macos(interface: &str, demote_target: &[&str]) -> VpnEvent { + VpnEvent::Up(VpnInfo { + interface_name: interface.to_string(), + dns_servers: vec!["192.0.2.53".to_string()], + demote_target: Some(demote_target.iter().map(|s| s.to_string()).collect()), }) } @@ -1787,6 +1968,122 @@ mod tests { assert!(sm.applied.is_none()); } + #[tokio::test] + async fn startup_reconciles_orphaned_demote_for_a_global_revert_backend() { + // A macOS-style backend persists its default-DNS demote across an unclean + // exit, so the daemon reverts any orphaned state once at startup — a + // suppressed initial Down would otherwise never clear it. + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + let mut sm = machine( + backend.clone(), + config(true, &["a.com"]), + "startup-cleanup-global", + ); + + sm.cleanup_orphaned_state_on_startup().await; + + assert_eq!( + backend.reverts.lock().unwrap().as_slice(), + &["wg0"], + "the global-revert backend is reconciled once at boot" + ); + assert!( + sm.applied.is_none(), + "cleanup records nothing as applied — it only clears a prior process's state" + ); + assert!( + !sm.pending_global_cleanup, + "a successful cleanup leaves nothing pending" + ); + } + + #[tokio::test] + async fn startup_skips_cleanup_for_a_per_interface_backend() { + // A per-interface backend (Linux) keeps no cross-restart persisted state of + // this kind, so startup must not issue a spurious revert. + let backend = Arc::new(MockBackend::default()); // reverts_globally == false + let mut sm = machine( + backend.clone(), + config(true, &["a.com"]), + "startup-cleanup-skip", + ); + + sm.cleanup_orphaned_state_on_startup().await; + + assert!( + backend.reverts.lock().unwrap().is_empty(), + "a per-interface backend is not reverted at boot" + ); + } + + #[tokio::test] + async fn failed_startup_cleanup_is_retried_on_a_later_reconcile() { + // P2: the startup cleanup fails transiently. Because `applied` is None an + // ordinary reconcile's revert() would be a no-op — but the pending flag + // makes the next revert retry the global cleanup until it succeeds, so an + // orphaned demote is not stranded until a full VPN up→down cycle. + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + backend.set_fail_revert(true); // the startup cleanup fails + let mut sm = machine( + backend.clone(), + config(true, &["a.com"]), + "startup-cleanup-retry", + ); + + sm.cleanup_orphaned_state_on_startup().await; + assert!( + sm.pending_global_cleanup, + "a failed cleanup is recorded as pending" + ); + assert_eq!( + backend.reverts.lock().unwrap().len(), + 1, + "the failing attempt still issued a revert" + ); + + // A later reconcile (VPN still down, nothing applied) retries the cleanup, + // which now succeeds and clears the pending flag. + backend.set_fail_revert(false); + sm.reconcile_primary().await.unwrap(); + assert!( + !sm.pending_global_cleanup, + "a successful retry clears the pending flag" + ); + assert_eq!( + backend.reverts.lock().unwrap().len(), + 2, + "the cleanup is retried until it succeeds" + ); + } + + #[tokio::test] + async fn status_reports_apply_failed_while_a_global_cleanup_is_pending() { + // P2: a pending macOS demote-cleanup (a startup reconcile that failed) means + // a stale default DNS may still be installed, so status must read + // ApplyFailed — not VpnDown — even though nothing is applied and the VPN is + // down at restart. + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + backend.set_fail_revert(true); + let mut sm = machine( + backend.clone(), + config(true, &["a.com"]), + "status-pending-cleanup", + ); + sm.cleanup_orphaned_state_on_startup().await; // fails → pending set + assert!(sm.pending_global_cleanup); + assert!(!sm.vpn_up, "VPN is down at restart"); + assert!(sm.applied.is_none()); + + assert_eq!( + sm.routing_state(), + RoutingState::ApplyFailed, + "a pending demote-cleanup is out-of-sync, not a clean VpnDown" + ); + } + #[tokio::test] async fn down_reverts() { let backend = Arc::new(MockBackend::default()); @@ -1877,6 +2174,7 @@ mod tests { sm.on_event(VpnEvent::Up(VpnInfo { interface_name: "wg0".to_string(), dns_servers: vec!["10.9.9.9".to_string()], + demote_target: None, })) .await; @@ -1974,6 +2272,7 @@ mod tests { sm.on_event(VpnEvent::Up(VpnInfo { interface_name: "wg0".to_string(), dns_servers: Vec::new(), + demote_target: None, })) .await; @@ -2005,6 +2304,7 @@ mod tests { sm.on_event(VpnEvent::Up(VpnInfo { interface_name: "wg0".to_string(), dns_servers: Vec::new(), + demote_target: None, })) .await; @@ -2045,6 +2345,7 @@ mod tests { enabled: true, vpn_backend: config::VpnBackend::default(), openvpn: config::OpenVpnConfig::default(), + fallback_dns: None, }; sm.config_store.save(&new_cfg).unwrap(); let resp = sm.on_request(Request::ReloadConfig).await; @@ -2169,6 +2470,59 @@ mod tests { assert!(sm.applied.is_some()); } + #[tokio::test] + async fn shutdown_runs_a_pending_global_cleanup_even_with_nothing_applied() { + // P2: a macOS startup cleanup failed (pending) and nothing is applied. If + // the daemon is stopped before any reconcile retries it, shutdown must + // still run the global cleanup rather than reporting clean while a stale + // demote snapshot / default DNS remains. + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + backend.set_fail_revert(true); + let mut sm = machine( + backend.clone(), + config(true, &["a.com"]), + "shutdown-pending-cleanup", + ); + sm.cleanup_orphaned_state_on_startup().await; // fails → pending set + assert!(sm.pending_global_cleanup); + backend.reverts.lock().unwrap().clear(); + + backend.set_fail_revert(false); // the retry at shutdown succeeds + let clean = sm.shutdown().await; + assert!(clean, "shutdown is clean once the pending cleanup succeeds"); + assert!(!sm.pending_global_cleanup, "the pending cleanup is cleared"); + assert_eq!( + backend.reverts.lock().unwrap().as_slice(), + &["wg0"], + "shutdown ran the global cleanup" + ); + } + + #[tokio::test] + async fn shutdown_reports_unclean_when_a_pending_cleanup_still_fails() { + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + backend.set_fail_revert(true); + let mut sm = machine( + backend.clone(), + config(true, &["a.com"]), + "shutdown-pending-fail", + ); + sm.cleanup_orphaned_state_on_startup().await; + assert!(sm.pending_global_cleanup); + + let clean = sm.shutdown().await; // fail_revert still true + assert!( + !clean, + "shutdown reports unclean while a demote snapshot may remain" + ); + assert!( + sm.pending_global_cleanup, + "the pending flag stays set for a future retry" + ); + } + #[tokio::test] async fn repeated_disable_retries_a_failed_revert() { let backend = Arc::new(MockBackend { @@ -2496,6 +2850,16 @@ mod tests { let mut ovpn = base.clone(); ovpn.openvpn.management = "127.0.0.1:7505".to_string(); assert!(watch_settings_changed(&base, &ovpn)); + + // A `fallback_dns` change also re-arms: the re-arm's revert + re-sample is + // what restores the real prior DHCP resolver and re-demotes to it after the + // override is changed/removed (the detector reads the demoted value back, so + // a plain reconcile would otherwise stay pinned to the old fallback). + let mut fallback = base.clone(); + fallback.fallback_dns = Some(vec!["203.0.113.9".to_string()]); + assert!(watch_settings_changed(&base, &fallback)); + // ...and removing it again is a change too. + assert!(watch_settings_changed(&fallback, &base)); } #[tokio::test] @@ -2533,6 +2897,7 @@ mod tests { sm.on_event(VpnEvent::Up(VpnInfo { interface_name: "wg0".to_string(), dns_servers: Vec::new(), + demote_target: None, })) .await; assert_eq!(sm.routing_state(), RoutingState::NoDnsFromVpn); @@ -3000,6 +3365,140 @@ mod tests { ); } + // --- macOS DNS-privacy path: gate decoupled from vpn_name, demote folding - + + #[tokio::test] + async fn global_revert_backend_applies_regardless_of_interface_name() { + // On macOS the VPN's interface is not a stable, DNS-scoped link, so the + // configured `vpn_name` must NOT gate apply. A global-revert backend with + // an Up whose advisory interface_name differs from `vpn_name` must still + // apply (driven by the DNS-model signal the detector already decided). + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + // config.vpn_name is "wg0" (the helper default), but the macOS Up carries + // an unrelated advisory interface name. + let mut sm = machine( + backend.clone(), + config(true, &["corp.example.com"]), + "macos-gate", + ); + + sm.on_event(vpn_up_macos("utun7", &["198.51.100.1"])).await; + assert!( + sm.applied.is_some(), + "macOS apply must not be gated on interface_name == vpn_name" + ); + assert_eq!( + backend.applies.lock().unwrap().len(), + 1, + "the corp domains are applied despite the interface-name mismatch" + ); + } + + #[tokio::test] + async fn linux_backend_still_gates_on_interface_name() { + // The Linux contract is unchanged: a per-interface backend (default, + // reverts_globally=false) still requires the Up's interface to match + // `vpn_name`, so a mismatched interface does NOT apply. + let backend = Arc::new(MockBackend::default()); // reverts_globally = false + let mut sm = machine(backend.clone(), config(true, &["a.com"]), "linux-gate"); + // config.vpn_name == "wg0"; an Up on a different interface must not apply. + sm.on_event(vpn_up("wg1")).await; + assert!( + sm.applied.is_none(), + "Linux must still gate apply on interface_name == vpn_name" + ); + } + + #[tokio::test] + async fn macos_apply_uses_the_detector_demote_target_without_a_config_override() { + // With no `fallback_dns` override, the demote-target handed to the backend + // is the detector's (the physical DHCP resolver). + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + let mut sm = machine( + backend.clone(), + config(true, &["corp.example.com"]), + "macos-demote", + ); + + sm.on_event(vpn_up_macos("utun7", &["198.51.100.1"])).await; + assert_eq!( + backend.applied_demote_targets.lock().unwrap().as_slice(), + &[Some(vec!["198.51.100.1".to_string()])], + "without an override, the detector's demote-target is used" + ); + } + + #[tokio::test] + async fn macos_apply_folds_in_the_fallback_dns_config_override() { + // A configured `fallback_dns` override replaces the detector's + // demote-target in the VpnInfo handed to the backend. + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + let mut cfg = config(true, &["corp.example.com"]); + cfg.fallback_dns = Some(vec!["203.0.113.9".to_string()]); + let mut sm = machine(backend.clone(), cfg, "macos-override"); + + // The detector still reports 198.51.100.1, but the config override wins. + sm.on_event(vpn_up_macos("utun7", &["198.51.100.1"])).await; + assert_eq!( + backend.applied_demote_targets.lock().unwrap().as_slice(), + &[Some(vec!["203.0.113.9".to_string()])], + "the fallback_dns override must replace the detector's demote-target" + ); + } + + #[tokio::test] + async fn macos_apply_ignores_a_non_ip_fallback_dns_override() { + // A `fallback_dns` override with a non-IP entry must NOT reach the backend: + // it would be interpolated verbatim into the root `scutil` demote script. It + // is rejected as a whole and the detector's demote-target is used instead. + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + let mut cfg = config(true, &["corp.example.com"]); + cfg.fallback_dns = Some(vec!["not-an-ip".to_string()]); + let mut sm = machine(backend.clone(), cfg, "macos-bad-override"); + + sm.on_event(vpn_up_macos("utun7", &["198.51.100.1"])).await; + assert_eq!( + backend.applied_demote_targets.lock().unwrap().as_slice(), + &[Some(vec!["198.51.100.1".to_string()])], + "a non-IP fallback_dns override is ignored; the detector's demote-target is used" + ); + } + + #[tokio::test] + async fn macos_reapplies_on_a_redetect_after_the_default_was_reasserted() { + // Reconcile-on-event: if the VPN re-asserts its default after a network + // change, the detector re-emits Up (the watch fires on the DNS change). + // A fresh Up must re-drive apply (the backend re-demotes), bounded — one + // re-apply per event, no busy loop (this is purely event-driven; there is + // no timer here). We model two Up events and assert two applies. + let backend = Arc::new(MockBackend::default()); + backend.set_reverts_globally(true); + let mut sm = machine( + backend.clone(), + config(true, &["corp.example.com"]), + "macos-reapply", + ); + + sm.on_event(vpn_up_macos("utun7", &["198.51.100.1"])).await; + // A second Up with a CHANGED demote-target (e.g. Wi-Fi switched, new DHCP + // resolver) — the dedup upstream only forwards genuine changes, so the + // state machine sees a distinct event and re-applies with the new target. + sm.on_event(vpn_up_macos("utun7", &["198.51.100.9"])).await; + let targets = backend.applied_demote_targets.lock().unwrap(); + assert_eq!( + targets.as_slice(), + &[ + Some(vec!["198.51.100.1".to_string()]), + Some(vec!["198.51.100.9".to_string()]), + ], + "a re-detected change re-applies the demote with the new target" + ); + } + #[tokio::test] async fn watch_stream_ending_on_its_own_marks_detector_error() { // Regression for the review: `watch()` returning Ok does not mean the @@ -3637,6 +4136,7 @@ mod tests { enabled: true, vpn_backend: config::VpnBackend::default(), openvpn: config::OpenVpnConfig::default(), + fallback_dns: None, }; let sm = machine(backend.clone(), cfg, "verify-no-iface"); diff --git a/splitway-daemon/src/detector/linux/detector.rs b/splitway-daemon/src/detector/linux/detector.rs index ed6b1b9..768be8c 100644 --- a/splitway-daemon/src/detector/linux/detector.rs +++ b/splitway-daemon/src/detector/linux/detector.rs @@ -232,6 +232,9 @@ impl VpnDetector for LinuxDetector { Ok(VpnInfo { interface_name: interface.to_string(), dns_servers, + // Linux scopes DNS per link (systemd-resolved); the system default + // is never hijacked, so no off-tunnel demote is needed. + demote_target: None, }) } diff --git a/splitway-daemon/src/detector/linux/openvpn/detector.rs b/splitway-daemon/src/detector/linux/openvpn/detector.rs index eeb6ab0..b3d0028 100644 --- a/splitway-daemon/src/detector/linux/openvpn/detector.rs +++ b/splitway-daemon/src/detector/linux/openvpn/detector.rs @@ -20,6 +20,8 @@ impl VpnDetector for OpenVpnDetector { Ok(VpnInfo { interface_name: interface.to_string(), dns_servers, + // Linux scopes DNS per link; the default is not hijacked. + demote_target: None, }) } diff --git a/splitway-daemon/src/detector/linux/openvpn/mgmt.rs b/splitway-daemon/src/detector/linux/openvpn/mgmt.rs index 0a7fdcc..f33f452 100644 --- a/splitway-daemon/src/detector/linux/openvpn/mgmt.rs +++ b/splitway-daemon/src/detector/linux/openvpn/mgmt.rs @@ -401,6 +401,8 @@ async fn handle_line( VpnEvent::Up(VpnInfo { interface_name: interface.to_string(), dns_servers: last_dns.clone(), + // Linux scopes DNS per link; the default is not hijacked. + demote_target: None, }) } Transition::Down => { diff --git a/splitway-daemon/src/detector/macos/detector.rs b/splitway-daemon/src/detector/macos/detector.rs index 31f07e3..becf3af 100644 --- a/splitway-daemon/src/detector/macos/detector.rs +++ b/splitway-daemon/src/detector/macos/detector.rs @@ -1,22 +1,39 @@ -//! Wires the macOS parser/watch to the [`VpnDetector`] trait. +//! Wires the macOS structural DNS-model detection to the [`VpnDetector`] trait. +//! +//! The I/O here is deliberately thin: it shells `scutil` to dump the +//! SystemConfiguration dynamic-store keys, hands the raw text to the pure +//! [`parser`](super::parser) module, and runs the pure structural decision. All +//! the logic (what counts as "VPN up", which resolver is the demote-target) is +//! in the parser and unit-tested without a live system. use std::process::Command; use splitway_shared::platform::{PlatformError, VpnDetector, VpnEvent, VpnInfo}; -use super::parser::parse_scutil_dns; +use super::parser::{ + decide, parse_array_field, parse_scalar_field, Detected, DnsModel, ServiceDns, +}; use super::MacosDetector; impl VpnDetector for MacosDetector { + /// One-shot detection. The `interface` argument is **advisory on macOS**: + /// detection is driven by the system DNS model (which resolver is the + /// default vs. the physical interface's own), not by an interface name — + /// the active VPN tunnel is not reliably a named, DNS-scoped link here. fn detect(&self, interface: &str) -> Result { - let dns_servers = current_dns(interface)?; - if dns_servers.is_empty() { - return Err(PlatformError::VpnNotFound(interface.to_string())); + match current_vpn_state()? { + Detected::Up { + corp_dns, + demote_target, + } => Ok(VpnInfo { + // Advisory only — nothing keys on it on macOS. Report the + // configured name so status output stays recognisable. + interface_name: interface.to_string(), + dns_servers: corp_dns, + demote_target: Some(demote_target), + }), + Detected::Down => Err(PlatformError::VpnNotFound(interface.to_string())), } - Ok(VpnInfo { - interface_name: interface.to_string(), - dns_servers, - }) } fn watch( @@ -27,18 +44,217 @@ impl VpnDetector for MacosDetector { } } -/// Read the DNS servers macOS currently associates with `interface` via -/// `scutil --dns`. Returns an empty vec (not an error) when the interface has -/// no resolver, so the watcher can treat that as "down". Shared by `detect` -/// and the SCDynamicStore callback. -pub(super) fn current_dns(interface: &str) -> Result, PlatformError> { - let output = Command::new("scutil").arg("--dns").output()?; +/// Read the system DNS model from the dynamic store and run the structural +/// up/down decision. Shared by [`detect`](MacosDetector::detect) and the +/// SCDynamicStore watch callback. +/// +/// `Detected::Down` (not an error) means no VPN-imposed default — the watcher +/// treats it as "down". A genuine `scutil` failure is an `Err` the caller may +/// retry; "no override" is a normal, successful "nothing here". +pub(super) fn current_vpn_state() -> Result { + let model = read_dns_model()?; + Ok(decide(&model)) +} + +/// Assemble the [`DnsModel`] from the dynamic store via `scutil`: the primary +/// interface plus every network service's DNS entry. Detection reads the +/// **per-service** entries (not the mutable global default), so Splitway's own +/// demote of the physical service does not make detection flip to "down". +fn read_dns_model() -> Result { + let global_ipv4_dump = scutil_show("State:/Network/Global/IPv4")?; + let primary_interface = parse_scalar_field(&global_ipv4_dump, "PrimaryInterface"); + let primary_service = parse_scalar_field(&global_ipv4_dump, "PrimaryService"); + + // Enumerate every per-service DNS key and read its id + InterfaceName + servers. + // `scutil`'s `list` does NOT print bare keys: each match is a prefixed row + // (`subKey [] = State:/.../DNS`), so the key must be parsed out of the row + // before it is shown — otherwise every `show` is fed the whole row, returns + // `No such key`, and the model stays empty (detection always "down"). + let listing = scutil_list("State:/Network/Service/.*/DNS")?; + let mut services = Vec::new(); + for row in listing.lines() { + let key = parse_list_key(row); + // Skip blank/header rows and anything outside the listed pattern. + if !key.starts_with("State:/Network/Service/") { + continue; + } + // Propagate a real `scutil` failure rather than skipping the service: a + // missing/vanished key is reported as `No such key` on stdout with exit 0 + // (→ `Ok`, then empty servers below), so an `Err` here is a genuine command + // failure (spawn error or non-zero exit). Skipping it would silently drop a + // live service from the model — e.g. the VPN service, read after a + // successful `list` — and `decide` could then conclude "down" and revert the + // rules. Failing the read instead makes the watcher keep the last known + // state until the next change. + let dump = scutil_show(key)?; + let servers = parse_array_field(&dump, "ServerAddresses"); + if servers.is_empty() { + continue; // a service with no DNS (incl. the `No such key` case) contributes nothing + } + let service_id = service_id_from_key(key); + // The interface binding is read from the service's IPv4/IPv6 entity, NOT + // the DNS dict — the DNS schema is the DNS fields (ServerAddresses, …) and + // does not reliably carry InterfaceName. Reading it only from the DNS dict + // would leave a secondary Wi-Fi/Ethernet service's interface unknown, and + // `decide` would then misread that `None` as an unscoped/default hijacker + // (a false "VPN up"). Fall back to the DNS dict only if neither entity + // names an interface. + let interface_name = read_service_interface(&service_id) + .or_else(|| parse_scalar_field(&dump, "InterfaceName")); + services.push(ServiceDns { + service_id, + interface_name, + servers, + }); + } + + Ok(DnsModel { + primary_interface, + primary_service, + services, + }) +} + +/// Read the BSD interface a service is bound to, from its IPv4 (then IPv6) +/// dynamic-store entity — the reliable source for the binding. The per-service +/// DNS entity does not reliably carry `InterfaceName` (its schema is the DNS +/// fields), so reading the interface only from the DNS dict can leave a secondary +/// network's interface unknown, which `decide` would misclassify as an unscoped +/// (default-hijacker) resolver. Returns `None` only when neither entity names one. +fn read_service_interface(service_id: &str) -> Option { + for entity in ["IPv4", "IPv6"] { + let key = format!("State:/Network/Service/{service_id}/{entity}"); + if let Ok(dump) = scutil_show(&key) { + if let Some(iface) = parse_scalar_field(&dump, "InterfaceName") { + return Some(iface); + } + } + } + None +} + +/// Extract the dynamic-store key from one `scutil list` output row. `scutil` +/// prints each match as ` subKey [] = State:/Network/Service//DNS`, not +/// as a bare key, so the `subKey [n] = ` prefix must be stripped before the key +/// can be passed to `show` / [`service_id_from_key`]. A row without the ` = ` +/// marker is returned trimmed (defensive); the caller filters anything that is +/// not a service DNS key. +fn parse_list_key(row: &str) -> &str { + let row = row.trim(); + match row.rsplit_once(" = ") { + Some((_, key)) => key.trim(), + None => row, + } +} + +/// Extract the `` from a `State:/Network/Service//DNS` key, so it can be +/// matched against `PrimaryService`. Returns the whole key if it does not fit +/// the expected shape (so an unexpected key never silently collides with a real +/// service id). +fn service_id_from_key(key: &str) -> String { + key.strip_prefix("State:/Network/Service/") + .and_then(|rest| rest.strip_suffix("/DNS")) + .unwrap_or(key) + .to_string() +} + +/// Run `scutil` with a `show ` script over stdin and return the dump. A +/// missing key makes `scutil` print `No such key` (not a process failure); the +/// parser treats that as empty, so it is returned as-is rather than an error. +fn scutil_show(key: &str) -> Result { + scutil_script(&format!("show {key}\n")) +} + +/// Run `scutil` with a `list ` script and return the matching keys, +/// one per line. +fn scutil_list(pattern: &str) -> Result { + scutil_script(&format!("list {pattern}\n")) +} + +/// Drive `scutil` in script mode by piping `script` (terminated by an implicit +/// quit on EOF) to its stdin. Centralises the spawn + error mapping so the +/// individual readers stay one-liners and the whole I/O surface is one function. +fn scutil_script(script: &str) -> Result { + use std::io::Write; + use std::process::Stdio; + + let mut child = Command::new("scutil") + .stdin(Stdio::piped()) + .stdout(Stdio::piped()) + .stderr(Stdio::piped()) + .spawn() + .map_err(|e| PlatformError::CommandFailed(format!("failed to spawn scutil: {e}")))?; + + // Write the script, then drop stdin so scutil sees EOF and exits. + { + let stdin = child.stdin.take().ok_or_else(|| { + PlatformError::CommandFailed("scutil stdin was not captured".to_string()) + })?; + let mut stdin = stdin; + stdin + .write_all(script.as_bytes()) + .map_err(|e| PlatformError::CommandFailed(format!("writing scutil script: {e}")))?; + } + + let output = child + .wait_with_output() + .map_err(|e| PlatformError::CommandFailed(format!("waiting for scutil: {e}")))?; if !output.status.success() { return Err(PlatformError::CommandFailed(format!( - "scutil --dns failed: {}", + "scutil exited with {}: {}", + output.status, String::from_utf8_lossy(&output.stderr).trim() ))); } - let stdout = String::from_utf8_lossy(&output.stdout); - Ok(parse_scutil_dns(&stdout, interface)) + Ok(String::from_utf8_lossy(&output.stdout).into_owned()) +} + +#[cfg(test)] +mod tests { + use super::{parse_list_key, service_id_from_key}; + + #[test] + fn parse_list_key_strips_the_scutil_subkey_prefix() { + // The real `scutil list` row shape — the key must be parsed out, not shown + // verbatim (the bug: the whole row went to `show` → `No such key`). + assert_eq!( + parse_list_key(" subKey [0] = State:/Network/Service/ABC-123/DNS"), + "State:/Network/Service/ABC-123/DNS" + ); + assert_eq!( + parse_list_key("subKey [12] = State:/Network/Service/XYZ/DNS"), + "State:/Network/Service/XYZ/DNS" + ); + } + + #[test] + fn parse_list_key_passes_a_bare_key_through() { + // Defensive: a row already in bare-key form is returned trimmed. + assert_eq!( + parse_list_key(" State:/Network/Service/ABC/DNS "), + "State:/Network/Service/ABC/DNS" + ); + } + + #[test] + fn service_id_from_key_extracts_the_id() { + assert_eq!( + service_id_from_key("State:/Network/Service/ABC-123/DNS"), + "ABC-123" + ); + } + + #[test] + fn service_id_from_key_returns_whole_key_when_shape_is_unexpected() { + // An unexpected key shape is returned verbatim, so it never collides with + // a real service id. + assert_eq!( + service_id_from_key("Setup:/Network/Foo"), + "Setup:/Network/Foo" + ); + assert_eq!( + service_id_from_key("State:/Network/Service/ABC"), + "State:/Network/Service/ABC" + ); + } } diff --git a/splitway-daemon/src/detector/macos/mod.rs b/splitway-daemon/src/detector/macos/mod.rs index cb960c1..ac34ed3 100644 --- a/splitway-daemon/src/detector/macos/mod.rs +++ b/splitway-daemon/src/detector/macos/mod.rs @@ -1,13 +1,21 @@ -//! macOS VPN detector: reads per-interface DNS from `scutil --dns` and watches -//! for changes via SCDynamicStore. +//! macOS VPN detector: reads the SystemConfiguration DNS model from the dynamic +//! store (via `scutil`) and decides VPN up/down **structurally** — is the system +//! default resolver overridden by a non-physical (VPN) service? — rather than by +//! an interface name. See [`parser`] for the rationale (a global-DNS-hijack VPN +//! client scopes no resolver to any `utun`, so an interface-keyed read is blind). //! -//! Mirrors the Linux split: [`parser`] (pure `scutil` parsing) and [`state`] -//! (pure up/down mapping + dedup) are unit-tested; [`watch`] is the thin Core -//! Foundation plumbing, and [`detector`] wires them to the `VpnDetector` trait. +//! Mirrors the Linux split: [`parser`] (pure dynamic-store parsing + the up/down +//! decision) and [`state`] (pure dedup + the initial-sample retry policy) are +//! unit-tested; [`watch`] is the thin Core Foundation plumbing, and [`detector`] +//! wires them to the `VpnDetector` trait and owns the `scutil` I/O. mod detector; mod parser; mod state; mod watch; +/// Re-export the dynamic-store dump parsers for the macOS DNS backend's demote, +/// which reads the same `scutil` dump shape (see [`crate::detector`]). +pub(crate) use parser::{parse_array_field, parse_scalar_field}; + pub struct MacosDetector; diff --git a/splitway-daemon/src/detector/macos/parser.rs b/splitway-daemon/src/detector/macos/parser.rs index f64e701..0d063b9 100644 --- a/splitway-daemon/src/detector/macos/parser.rs +++ b/splitway-daemon/src/detector/macos/parser.rs @@ -1,179 +1,759 @@ -//! Pure parsing of `scutil --dns` output — no I/O, unit tested. +//! Pure parsing of the macOS SystemConfiguration DNS model — no I/O, unit +//! tested. This is the structural, vendor-neutral core of macOS VPN detection. +//! +//! # Why not `scutil --dns`, and why per-service (not the global default) +//! +//! The original detector filtered `scutil --dns` by a chosen `utun*` interface. +//! That fails against a VPN client that hijacks the system **default** resolver +//! instead of scoping its DNS to the tunnel: there is then *no* resolver scoped +//! to any `utun` (the active tunnel `utun` index even varies between sessions), +//! so an interface-keyed read finds nothing. We instead read the SystemConfig +//! dynamic store directly and decide structurally over the **per-service** DNS: +//! +//! - `State:/Network/Global/IPv4` `PrimaryInterface` / `PrimaryService` — the +//! physical primary interface (e.g. `en0`) and its service id. +//! - each `State:/Network/Service//DNS` — that service's `ServerAddresses` +//! and `InterfaceName`. +//! +//! The **physical service** is the one whose id is the primary service (falling +//! back to the primary interface name); its resolver is the demote-target. A +//! **VPN service** is any *other* service that (a) carries DNS differing from the +//! physical resolver AND (b) is plausibly the **default-resolver hijacker** — +//! i.e. it rides the primary interface's own default route, runs on a tunnel +//! pseudo-interface (`utun`/`ppp`/`ipsec`/…), or reports no interface (an +//! unscoped/default resolver). A service bound to a *distinct hardware* interface +//! (a second Ethernet/Wi-Fi/cellular link while another is primary) is a parallel +//! physical network, not a hijacker: its differing DHCP resolver must **not** read +//! as "VPN up". When such a hijacker service exists → **VPN is up** and its +//! resolver is the corp DNS. The decision keys on this structural *difference* and +//! on interface *kind* (the stable BSD driver prefix), never on any vendor/product +//! string, so it generalises across VPN clients. +//! +//! Detection deliberately does **not** read `State:/Network/Global/DNS`: Splitway's +//! own demote overwrites the physical service's DNS, which can change the global +//! default — so a Global-keyed detector would flip to "down" the instant our +//! demote took effect, then revert → the VPN re-asserts → re-demote → oscillation. +//! Reading the VPN's corp DNS from its *own* service entry (which our demote does +//! not touch) keeps detection stable while the demote is in effect. -/// Collect the DNS servers `scutil --dns` reports for `interface` (e.g. -/// `utun3`), in order of first appearance and de-duplicated. Returns an empty -/// vec when the interface has no resolver entry (the VPN is down). +/// One network service's DNS entry, as read from `State:/Network/Service//DNS`. +#[derive(Debug, Clone, PartialEq, Eq)] +pub(super) struct ServiceDns { + /// The service id (the `` in the `State:/Network/Service//DNS` key). + /// The authoritative anchor for the *physical* service: it equals the + /// `PrimaryService` from `State:/Network/Global/IPv4`. Preferred over the + /// interface name, which a VPN service can also report. + pub service_id: String, + /// The `InterfaceName` the service is bound to (e.g. `en0`), if present. + /// The fallback anchor for the physical service when the primary service id + /// is unknown; a VPN service is otherwise identified structurally, not by + /// name. + pub interface_name: Option, + /// The service's `ServerAddresses`. + pub servers: Vec, +} + +/// The structural DNS picture read from the dynamic store, before the up/down +/// decision. All fields are plain parsed data — the decision in [`decide`] is a +/// separate pure step so both halves are independently testable. /// -/// `scutil --dns` groups settings into `resolver #N` blocks; a block's owning -/// interface is named in its `if_index : N (iface)` line, and its servers in -/// `nameserver[i] : addr` lines. The interface's resolver typically appears in -/// both the main and the "scoped queries" sections, so duplicates are expected -/// and dropped. -pub(super) fn parse_scutil_dns(output: &str, interface: &str) -> Vec { - let mut servers: Vec = Vec::new(); - let mut block: Vec = Vec::new(); - let mut block_iface: Option = None; - - for line in output.lines() { - let line = line.trim(); - if line.starts_with("resolver #") { - commit(&mut servers, &mut block, &block_iface, interface); - block_iface = None; - } else if let Some(rest) = line.strip_prefix("nameserver[") { - // `nameserver[0] : 10.0.0.1` — require the `[` so only real - // `nameserver[N]` keys match, never some other `nameserver*` key. - if let Some((_, value)) = rest.split_once(':') { - let ip = value.trim(); - if !ip.is_empty() { - block.push(ip.to_string()); +/// Detection is driven by the **per-service** entries, NOT by +/// `State:/Network/Global/DNS`. That is deliberate: Splitway's own *demote* +/// overwrites the primary (physical) service's DNS, which can change the global +/// default — so a detector keyed on Global would flip to "down" the moment our +/// demote took effect (Global == physical), then revert → the VPN re-asserts → +/// re-demote → oscillation. Reading the VPN's corp DNS from its *own* service +/// entry (which our demote does not touch) keeps detection stable while the +/// demote is in effect. +#[derive(Debug, Clone, Default, PartialEq, Eq)] +pub(super) struct DnsModel { + /// `State:/Network/Global/IPv4` `PrimaryInterface` (e.g. `en0`). `None` if + /// the key or field is absent (no primary network — effectively offline). + pub primary_interface: Option, + /// `State:/Network/Global/IPv4` `PrimaryService` (the primary service id). + /// The authoritative anchor for the physical service in [`decide`]; a VPN + /// service can also report the primary interface name, so the service id is + /// preferred and the interface name is only a fallback. + pub primary_service: Option, + /// Every network service's DNS entry (from the per-service DNS keys). The + /// physical service is the one whose `service_id` is the primary service + /// (or, failing that, whose `interface_name` is the primary interface); a + /// VPN service is one whose DNS differs from the physical service's. + pub services: Vec, +} + +/// The detector's verdict over a [`DnsModel`]. +#[derive(Debug, Clone, PartialEq, Eq)] +pub(super) enum Detected { + /// A non-physical (VPN) service is in play → VPN up. `corp_dns` is that + /// service's resolver (queries for the corp domains go here, on-tunnel); + /// `demote_target` is where non-corp DNS must be sent so it resolves + /// off-tunnel (the physical interface's own DHCP resolver). + Up { + corp_dns: Vec, + demote_target: Vec, + }, + /// No non-physical service (or no primary network) → no VPN. Nothing to do. + Down, +} + +/// Parse one `scutil` dictionary dump (the text emitted by `show ` / +/// `get ` + `d.show`) and return the values of a named array key. +/// +/// The dump form (placeholder values): +/// ```text +/// { +/// ServerAddresses : { +/// 0 : 192.0.2.53 +/// 1 : 192.0.2.54 +/// } +/// __CONFIGURATION_ID__ : Default: ... +/// PrimaryInterface : en0 +/// } +/// ``` +/// Returns the array elements in order (here for `ServerAddresses`: +/// `["192.0.2.53", "192.0.2.54"]`). An absent key yields an empty vec. +pub(crate) fn parse_array_field(dump: &str, field: &str) -> Vec { + let mut values = Vec::new(); + let mut in_array = false; + // The `` opener line for the field we want, e.g. + // `ServerAddresses : {`. + let opener = format!("{field} :"); + for raw in dump.lines() { + let line = raw.trim(); + if !in_array { + // Enter the array only for an exact ` : {` opener, so + // a same-prefixed field (e.g. `ServerAddressesV6`) is not matched. + if line.starts_with(&opener) && line.contains("") && line.ends_with('{') { + in_array = true; + } + continue; + } + if line == "}" { + break; // end of the array + } + // Array element line: ` : `. Keep the value verbatim + // (IPv6 with `::` and `%zone` stay intact via split_once on the FIRST + // colon only). + if let Some((idx, value)) = line.split_once(':') { + if idx.trim().chars().all(|c| c.is_ascii_digit()) && !idx.trim().is_empty() { + let v = value.trim(); + if !v.is_empty() { + values.push(v.to_string()); } } - } else if let Some(rest) = line.strip_prefix("if_index") { - // `if_index : 14 (utun3)` — take the name inside the parentheses. - if let Some(name) = rest.split('(').nth(1).and_then(|s| s.split(')').next()) { - block_iface = Some(name.trim().to_string()); + } + } + values +} + +/// Parse a scalar string field from a `scutil` dictionary dump, e.g. +/// `PrimaryInterface : en0` → `Some("en0")`. Returns `None` if absent. +pub(crate) fn parse_scalar_field(dump: &str, field: &str) -> Option { + let opener = format!("{field} :"); + for raw in dump.lines() { + let line = raw.trim(); + if let Some(rest) = line.strip_prefix(&opener) { + // A scalar line is `Field : value`; reject the array opener form so + // `Foo : {` is never read as the scalar string " {". + let v = rest.trim(); + if v.contains("") || v.contains("") { + return None; + } + if !v.is_empty() { + return Some(v.to_string()); } } } - commit(&mut servers, &mut block, &block_iface, interface); - servers + None } -/// Fold the just-parsed block into `servers` if it belongs to `interface`, -/// de-duplicating, then clear the block for the next one. -fn commit( - servers: &mut Vec, - block: &mut Vec, - block_iface: &Option, - interface: &str, -) { - if block_iface.as_deref() == Some(interface) { - for ns in block.iter() { - if !servers.contains(ns) { - servers.push(ns.clone()); +/// The structural decision: is a non-physical (VPN) service in play? Pure over a +/// fully-assembled [`DnsModel`], reading the **per-service** entries (not the +/// mutable global default — see [`DnsModel`]). +/// +/// The physical service is the one bound to the primary interface; its DNS is +/// the demote-target. A **VPN service** is any *other* service that both carries +/// DNS differing from the physical service's AND looks like the default-resolver +/// hijacker (see [`is_default_resolver_hijacker`]) — so a benign secondary +/// physical network is not mistaken for a VPN. VPN is **up** iff such a service +/// exists, and `corp_dns` is its resolver. +/// +/// Edge cases, all → [`Detected::Down`] (no false-positive apply): +/// - no primary interface (offline) — nothing to anchor on; +/// - no physical service DNS found — cannot determine the demote-target, and a +/// demote to nothing is worse than not applying, so stay conservative; +/// - the only services differing from the physical are distinct secondary +/// physical networks (not the hijacker) — no VPN. +pub(super) fn decide(model: &DnsModel) -> Detected { + // No primary network at all → nothing is in play. + // A primary network must exist to anchor on (offline → nothing in play). + if model.primary_interface.is_none() && model.primary_service.is_none() { + return Detected::Down; + } + + // The physical service, anchored authoritatively by the primary SERVICE id + // when known (a VPN service can also report the primary interface name, so + // the id is preferred), falling back to the primary interface name. Its + // resolver is the demote-target (where non-corp DNS goes off-tunnel). + let physical = match model.primary_service.as_deref() { + Some(id) => model + .services + .iter() + .find(|s| !s.servers.is_empty() && s.service_id == id), + // No authoritative PrimaryService id: anchor on the primary interface + // name — but only when EXACTLY ONE DNS-bearing service reports it. A VPN + // service can also report the primary interface, so if more than one + // matches we cannot tell which is physical; picking the first would risk + // selecting the VPN's corp resolver as "physical", inverting corp/fallback + // (apply corp domains to the off-tunnel DNS and demote the default to the + // corp DNS — a leak). Treat that as ambiguous and stay conservative. + None => { + let mut matches = model.services.iter().filter(|s| { + !s.servers.is_empty() + && s.interface_name.as_deref() == model.primary_interface.as_deref() + }); + match (matches.next(), matches.next()) { + (Some(only), None) => Some(only), + _ => None, // zero matches, or ambiguous (more than one) → Down below } } + }; + let Some(physical) = physical else { + // No unambiguous physical resolver → cannot pick a safe demote-target. + return Detected::Down; + }; + + // A VPN service: a service OTHER than the physical one, carrying DNS that + // differs from the physical resolver (a non-physical resolver is in play). + // Excluding the physical service by id means the comparison survives our own + // demote (which sets the physical service's DNS to the fallback) — the VPN's + // own service still differs — and never mistakes the physical service for a + // VPN. This signal is independent of the mutable global default. + // + // The differing service must ALSO be the default-resolver hijacker, not a + // distinct secondary physical network (e.g. Wi-Fi associated while Ethernet is + // primary, with its own DHCP DNS) — otherwise that benign secondary resolver + // would be a false "VPN up". The hijacker filter precedes the difference check + // so, with both a real VPN and a secondary network present, the VPN is still + // found regardless of service order. + let vpn_dns = model + .services + .iter() + .filter(|s| s.service_id != physical.service_id && !s.servers.is_empty()) + .filter(|s| is_default_resolver_hijacker(s, model.primary_interface.as_deref())) + .find(|s| !same_set(&s.servers, &physical.servers)) + .map(|s| &s.servers); + + match vpn_dns { + Some(corp_dns) => Detected::Up { + corp_dns: corp_dns.clone(), + demote_target: physical.servers.clone(), + }, + None => Detected::Down, + } +} + +/// Order-insensitive equality of two resolver lists (treated as sets). +fn same_set(a: &[String], b: &[String]) -> bool { + if a.len() != b.len() { + return false; + } + let mut a_sorted: Vec<&String> = a.iter().collect(); + let mut b_sorted: Vec<&String> = b.iter().collect(); + a_sorted.sort(); + b_sorted.sort(); + a_sorted == b_sorted +} + +/// macOS tunnel / virtual interface name prefixes. A global-default-hijack VPN +/// rides one of these pseudo-interfaces (or the primary interface itself); a +/// *secondary physical* network (Wi-Fi while Ethernet is primary, a second +/// Ethernet, cellular, a VM/Thunderbolt bridge, …) is bound to a distinct +/// hardware interface that is none of these. Matching by interface *kind* (the +/// stable BSD driver prefix) — never a vendor/product string — keeps the detector +/// vendor-neutral while still excluding parallel physical networks. +const TUNNEL_INTERFACE_PREFIXES: [&str; 5] = ["utun", "ppp", "ipsec", "tun", "tap"]; + +/// Whether an interface name is a tunnel / virtual pseudo-interface (a VPN +/// carrier) rather than a hardware link. +fn is_tunnel_interface(iface: &str) -> bool { + TUNNEL_INTERFACE_PREFIXES + .iter() + .any(|prefix| iface.starts_with(prefix)) +} + +/// Whether a (non-physical) service is plausibly the **default-resolver +/// hijacker** rather than a benign secondary physical network. +/// +/// A VPN that hijacks the system default resolver either rides the primary +/// interface's own default route (its service reports the *primary* interface), +/// runs on a tunnel pseudo-interface (`utun`/`ppp`/`ipsec`/…), or registers an +/// unscoped/default resolver (its service reports *no* interface). A service +/// bound to a *distinct hardware* interface is a parallel physical network — a +/// second Ethernet/Wi-Fi/cellular link — and its differing DHCP resolver must NOT +/// read as "VPN up". This signal is stable under Splitway's own demote (it does +/// not depend on the mutable global default). +fn is_default_resolver_hijacker(service: &ServiceDns, primary_interface: Option<&str>) -> bool { + match service.interface_name.as_deref() { + // No interface → an unscoped / default (global-path) resolver. + None => true, + // Rides the primary default route, or is a VPN tunnel pseudo-interface. + // Anything else is a distinct secondary physical link → not a hijacker. + Some(iface) => Some(iface) == primary_interface || is_tunnel_interface(iface), } - block.clear(); } #[cfg(test)] mod tests { - use super::parse_scutil_dns; - - const SCUTIL: &str = "\ -DNS configuration - -resolver #1 - search domain[0] : lan - nameserver[0] : 192.168.1.1 - if_index : 4 (en0) - flags : Request A records, Request AAAA records - reach : 0x00000002 (Reachable) - -resolver #2 - domain : corp.example.com - nameserver[0] : 10.0.0.1 - nameserver[1] : 10.0.0.2 - if_index : 14 (utun3) - flags : Request A records - order : 102400 - -DNS configuration (for scoped queries) - -resolver #1 - nameserver[0] : 192.168.1.1 - if_index : 4 (en0) - flags : Scoped, Request A records - -resolver #2 - nameserver[0] : 10.0.0.1 - nameserver[1] : 10.0.0.2 - if_index : 14 (utun3) - flags : Scoped, Request A records -"; - - #[test] - fn collects_vpn_interface_servers_deduped() { - // utun3 appears in both the main and scoped sections; dedup to one set, - // order preserved. + use super::*; + + // The breaking shape (synthetic placeholders): the corp DNS is the system + // DEFAULT (no domain / no if_index), and the only scoped resolver is the + // physical interface — there is NO utun-scoped resolver. Mirrors a real + // global-DNS-hijack VPN client. Values are RFC 5737 placeholders. + const GLOBAL_DNS: &str = "\ + { + ServerAddresses : { + 0 : 192.0.2.53 + } + __CONFIGURATION_ID__ : Default: vpn-service 0 + SearchOrder : 50000 +}"; + + const GLOBAL_IPV4: &str = "\ + { + PrimaryInterface : en0 + PrimaryService : 0AFE20D4-0000-0000-0000-PLACEHOLDER01 + Router : 198.51.100.1 +}"; + + // The physical interface's own DHCP resolver — the demote-target source. + const PHYSICAL_DNS: &str = "\ + { + ServerAddresses : { + 0 : 198.51.100.1 + } +}"; + + #[test] + fn parses_server_addresses_array() { + assert_eq!( + parse_array_field(GLOBAL_DNS, "ServerAddresses"), + vec!["192.0.2.53".to_string()] + ); + } + + #[test] + fn parses_multiple_server_addresses_in_order() { + let dump = "\ + { + ServerAddresses : { + 0 : 192.0.2.53 + 1 : 192.0.2.54 + } +}"; + assert_eq!( + parse_array_field(dump, "ServerAddresses"), + vec!["192.0.2.53".to_string(), "192.0.2.54".to_string()] + ); + } + + #[test] + fn array_field_absent_yields_empty() { + assert!(parse_array_field(GLOBAL_IPV4, "ServerAddresses").is_empty()); + assert!(parse_array_field("", "ServerAddresses").is_empty()); + } + + #[test] + fn same_prefixed_array_field_is_not_matched() { + // A `ServerAddressesV6` array must not be read as `ServerAddresses`. + let dump = "\ + { + ServerAddressesV6 : { + 0 : 2001:db8::1 + } +}"; + assert!(parse_array_field(dump, "ServerAddresses").is_empty()); + } + + #[test] + fn preserves_ipv6_values_intact() { + let dump = "\ + { + ServerAddresses : { + 0 : 2001:db8::1 + 1 : fe80::1%en0 + } +}"; + assert_eq!( + parse_array_field(dump, "ServerAddresses"), + vec!["2001:db8::1".to_string(), "fe80::1%en0".to_string()] + ); + } + + #[test] + fn parses_primary_interface_scalar() { + assert_eq!( + parse_scalar_field(GLOBAL_IPV4, "PrimaryInterface"), + Some("en0".to_string()) + ); + } + + #[test] + fn scalar_field_absent_yields_none() { + assert_eq!(parse_scalar_field(GLOBAL_DNS, "PrimaryInterface"), None); + assert_eq!(parse_scalar_field("", "PrimaryInterface"), None); + } + + #[test] + fn scalar_field_does_not_misread_an_array_opener() { + // `ServerAddresses : {` must not parse as the scalar " {". + assert_eq!(parse_scalar_field(GLOBAL_DNS, "ServerAddresses"), None); + } + + // --- the structural decision (per-service model) ------------------------- + + fn svc(id: &str, iface: Option<&str>, servers: &[&str]) -> ServiceDns { + ServiceDns { + service_id: id.to_string(), + interface_name: iface.map(str::to_string), + servers: servers.iter().map(|s| s.to_string()).collect(), + } + } + + /// Build a model from a primary interface, the primary service id, and the + /// service entries. + fn model( + primary_iface: Option<&str>, + primary_svc: Option<&str>, + services: Vec, + ) -> DnsModel { + DnsModel { + primary_interface: primary_iface.map(str::to_string), + primary_service: primary_svc.map(str::to_string), + services, + } + } + + #[test] + fn detects_up_when_a_service_differs_from_the_physical() { + // The breaking case: a (VPN) service carries corp DNS that differs from + // the physical en0 service's DHCP resolver. The physical service is the + // primary service (id "phys"). + let m = model( + Some("en0"), + Some("phys"), + vec![ + svc("phys", Some("en0"), &["198.51.100.1"]), // physical DHCP + svc("vpn", Some("en0"), &["192.0.2.53"]), // VPN's own service (corp) + ], + ); + assert_eq!( + decide(&m), + Detected::Up { + corp_dns: vec!["192.0.2.53".to_string()], + demote_target: vec!["198.51.100.1".to_string()], + } + ); + } + + #[test] + fn detection_survives_our_own_demote_of_the_physical_service() { + // After we demote, the PHYSICAL service's DNS is overwritten to the + // fallback (== its own DHCP resolver here). Detection must still see the + // VPN via its separate service entry — NOT flip to Down — so there is no + // oscillation. The physical service still reads 198.51.100.1; the VPN + // service still reads 192.0.2.53 → still Up, same verdict. + let m = model( + Some("en0"), + Some("phys"), + vec![ + svc("phys", Some("en0"), &["198.51.100.1"]), // physical (== demoted value) + svc("vpn", Some("en0"), &["192.0.2.53"]), // VPN service unchanged + ], + ); assert_eq!( - parse_scutil_dns(SCUTIL, "utun3"), - vec!["10.0.0.1".to_string(), "10.0.0.2".to_string()] + decide(&m), + Detected::Up { + corp_dns: vec!["192.0.2.53".to_string()], + demote_target: vec!["198.51.100.1".to_string()], + } ); } #[test] - fn collects_other_interface_independently() { + fn physical_is_anchored_by_service_id_even_if_a_vpn_service_shares_the_interface() { + // The deeper P1 concern: a VPN service also reports the primary interface + // name. Anchoring the physical service on the primary SERVICE id (not the + // interface name) means the VPN service — listed FIRST here — is not + // mistaken for physical, so corp/fallback are never swapped. + let m = model( + Some("en0"), + Some("phys"), + vec![ + svc("vpn", Some("en0"), &["192.0.2.53"]), // VPN, also on en0, listed first + svc("phys", Some("en0"), &["198.51.100.1"]), // the real physical service + ], + ); assert_eq!( - parse_scutil_dns(SCUTIL, "en0"), - vec!["192.168.1.1".to_string()] + decide(&m), + Detected::Up { + corp_dns: vec!["192.0.2.53".to_string()], + demote_target: vec!["198.51.100.1".to_string()], + } + ); + } + + #[test] + fn falls_back_to_interface_name_when_primary_service_is_unknown() { + // If PrimaryService is absent, anchor on the primary interface name — but + // only when EXACTLY ONE DNS service reports it. Here the VPN is on a tunnel, + // so the physical en0 service is the unambiguous match. + let m = model( + Some("en0"), + None, + vec![ + svc("phys", Some("en0"), &["198.51.100.1"]), + svc("vpn", Some("utun4"), &["192.0.2.53"]), + ], + ); + assert_eq!( + decide(&m), + Detected::Up { + corp_dns: vec!["192.0.2.53".to_string()], + demote_target: vec!["198.51.100.1".to_string()], + } + ); + } + + #[test] + fn ambiguous_primary_interface_fallback_is_conservative_down() { + // P2: PrimaryService is absent AND more than one DNS service reports the + // primary interface (a VPN service can also report it). We cannot tell which + // is physical, so picking the first risks selecting the VPN's corp resolver + // as "physical" — inverting corp/fallback (corp domains to the off-tunnel + // DNS, default demoted to corp DNS). Stay conservative: Down. + let m = model( + Some("en0"), + None, + vec![ + svc("vpn", Some("en0"), &["192.0.2.53"]), // VPN also on en0, listed first + svc("phys", Some("en0"), &["198.51.100.1"]), + ], + ); + assert_eq!(decide(&m), Detected::Down); + } + + #[test] + fn detects_down_when_only_the_physical_service_exists() { + // No VPN: the only service with DNS is the physical interface's own. + let m = model( + Some("en0"), + Some("phys"), + vec![svc("phys", Some("en0"), &["198.51.100.1"])], ); + assert_eq!(decide(&m), Detected::Down); } #[test] - fn absent_interface_yields_empty() { - assert!(parse_scutil_dns(SCUTIL, "utun9").is_empty()); - assert!(parse_scutil_dns("", "utun3").is_empty()); + fn detects_down_when_offline_no_primary() { + let m = model( + None, + None, + vec![ + svc("phys", Some("en0"), &["198.51.100.1"]), + svc("vpn", Some("en0"), &["192.0.2.53"]), + ], + ); + assert_eq!(decide(&m), Detected::Down); } #[test] - fn nameservers_without_matching_if_index_are_ignored() { - // A block with servers but no if_index line is attributed to no - // interface, so nothing is collected. - let output = "resolver #1\n nameserver[0] : 1.2.3.4\n"; - assert!(parse_scutil_dns(output, "utun3").is_empty()); + fn detects_down_when_no_physical_service_dns_found() { + // The physical service carries no DNS (cannot pick a safe demote-target), + // even though some other service has DNS → conservative Down. + let m = model( + Some("en0"), + Some("phys"), + vec![ + svc("phys", Some("en0"), &[]), // physical, no DNS + svc("other", Some("en0"), &["192.0.2.53"]), // some other service + ], + ); + assert_eq!(decide(&m), Detected::Down); + } + + #[test] + fn decision_is_order_insensitive() { + // A service whose DNS is the SAME set as physical (just reordered) is not + // a VPN → Down (no service differs from the physical resolver). + let m = model( + Some("en0"), + Some("phys"), + vec![ + svc("phys", Some("en0"), &["198.51.100.1", "198.51.100.2"]), + svc("other", Some("en0"), &["198.51.100.2", "198.51.100.1"]), + ], + ); + assert_eq!(decide(&m), Detected::Down); } #[test] - fn attributes_when_if_index_is_the_last_line_of_the_block() { - // The buffering invariant: nameservers parsed before the if_index line - // are still attributed to the interface named by that later line. - let output = "\ -resolver #1 - nameserver[0] : 10.0.0.1 - nameserver[1] : 10.0.0.2 - if_index : 14 (utun3) -"; + fn a_tunnel_vpn_is_detected_by_its_differing_dns_not_its_index() { + // No SPECIFIC utun index/name is keyed on: a tunnel VPN is recognised by + // its differing DNS, and its interface only has to be a tunnel *kind* + // (utun/ppp/ipsec/…) — never a particular name — to qualify as a hijacker. + let m = model( + Some("en0"), + Some("phys"), + vec![ + svc("phys", Some("en0"), &["198.51.100.1"]), + // Whatever the tunnel interface is named (utun index varies), it + // is recognised as a tunnel kind carrying differing DNS. + svc("vpn", Some("utun7"), &["192.0.2.53"]), + ], + ); assert_eq!( - parse_scutil_dns(output, "utun3"), - vec!["10.0.0.1".to_string(), "10.0.0.2".to_string()] + decide(&m), + Detected::Up { + corp_dns: vec!["192.0.2.53".to_string()], + demote_target: vec!["198.51.100.1".to_string()], + } + ); + } + + #[test] + fn a_secondary_physical_network_is_not_mistaken_for_a_vpn() { + // The P1 false positive: Ethernet en0 is primary while Wi-Fi en1 stays + // associated with its own DHCP resolver that differs from en0's. With NO + // VPN running, en1 is a distinct *hardware* interface (not the primary, + // not a tunnel) → a parallel network, not the default-resolver hijacker → + // Down (no false apply of the corp domains to the Wi-Fi resolver). + let m = model( + Some("en0"), + Some("phys"), + vec![ + svc("phys", Some("en0"), &["198.51.100.1"]), + svc("wifi", Some("en1"), &["198.51.100.99"]), // secondary DHCP DNS + ], ); + assert_eq!(decide(&m), Detected::Down); } #[test] - fn preserves_ipv6_and_double_digit_indices() { - // split_once(':') keeps the full IPv6 address (and %zone) intact, and - // double-digit nameserver indices parse like single-digit ones. - let output = "\ -resolver #1 - nameserver[0] : 2001:db8::1 - nameserver[1] : fe80::1%utun3 - nameserver[10] : 10.0.0.1 - if_index : 14 (utun3) -"; + fn a_tunnel_vpn_is_found_even_alongside_a_secondary_physical_network() { + // A real VPN (utun) AND a secondary physical Wi-Fi are both present, with + // the secondary listed FIRST. The secondary en1 is excluded; the tunnel + // hijacker is still found → Up with the VPN's corp DNS (order-insensitive). + let m = model( + Some("en0"), + Some("phys"), + vec![ + svc("wifi", Some("en1"), &["198.51.100.99"]), // secondary, listed first + svc("phys", Some("en0"), &["198.51.100.1"]), + svc("vpn", Some("utun4"), &["192.0.2.53"]), + ], + ); assert_eq!( - parse_scutil_dns(output, "utun3"), + decide(&m), + Detected::Up { + corp_dns: vec!["192.0.2.53".to_string()], + demote_target: vec!["198.51.100.1".to_string()], + } + ); + } + + #[test] + fn an_unscoped_hijacker_with_no_interface_is_detected() { + // A global-default-hijack VPN whose own service entry carries no + // InterfaceName is an unscoped/default resolver → still a hijacker → Up. + let m = model( + Some("en0"), + Some("phys"), vec![ - "2001:db8::1".to_string(), - "fe80::1%utun3".to_string(), - "10.0.0.1".to_string(), - ] + svc("phys", Some("en0"), &["198.51.100.1"]), + ServiceDns { + service_id: "vpn".to_string(), + interface_name: None, + servers: vec!["192.0.2.53".to_string()], + }, + ], ); + assert_eq!( + decide(&m), + Detected::Up { + corp_dns: vec!["192.0.2.53".to_string()], + demote_target: vec!["198.51.100.1".to_string()], + } + ); + } + + #[test] + fn hijacker_predicate_classifies_by_interface_kind() { + // Primary interface en0. Hijackers: the primary itself, any tunnel kind, + // or no interface. Non-hijackers: a distinct secondary hardware link. + let primary = Some("en0"); + let h = |iface: Option<&str>| { + is_default_resolver_hijacker(&svc("s", iface, &["192.0.2.53"]), primary) + }; + assert!(h(Some("en0")), "rides the primary default route"); + assert!( + h(Some("utun4")) && h(Some("ppp0")) && h(Some("ipsec0")), + "tunnel kinds" + ); + assert!(h(None), "unscoped/default resolver"); + assert!( + !h(Some("en1")), + "a second Ethernet/Wi-Fi is a secondary network" + ); + assert!( + !h(Some("bridge100")), + "a VM/Thunderbolt bridge is a secondary network" + ); + assert!(!h(Some("pdp_ip0")), "cellular is a secondary network"); } #[test] - fn interface_match_is_exact_not_substring() { - // utun3 must not match utun30. - let output = "\ -resolver #1 - nameserver[0] : 10.0.0.1 - if_index : 14 (utun30) -"; - assert!(parse_scutil_dns(output, "utun3").is_empty()); + fn full_fixture_parse_then_decide_is_up() { + // End-to-end over the real-form dumps: parse the primary interface + + // service id and the two service DNS dicts, assemble the per-service + // model, decide → Up with corp=192.0.2.53, demote=198.51.100.1. + let primary_iface = parse_scalar_field(GLOBAL_IPV4, "PrimaryInterface"); + let primary_svc = parse_scalar_field(GLOBAL_IPV4, "PrimaryService"); + let physical = parse_array_field(PHYSICAL_DNS, "ServerAddresses"); + // The VPN's own service DNS dump (corp resolver), same form as the others. + let vpn_service_dns = "\ + { + ServerAddresses : { + 0 : 192.0.2.53 + } + InterfaceName : en0 +}"; + let corp = parse_array_field(vpn_service_dns, "ServerAddresses"); + let phys_id = primary_svc.clone().unwrap(); + let m = DnsModel { + primary_interface: primary_iface, + primary_service: primary_svc, + services: vec![ + ServiceDns { + service_id: phys_id, + interface_name: Some("en0".to_string()), + servers: physical, + }, + ServiceDns { + service_id: "vpn-service".to_string(), + interface_name: Some("en0".to_string()), + servers: corp, + }, + ], + }; assert_eq!( - parse_scutil_dns(output, "utun30"), - vec!["10.0.0.1".to_string()] + decide(&m), + Detected::Up { + corp_dns: vec!["192.0.2.53".to_string()], + demote_target: vec!["198.51.100.1".to_string()], + } ); } } diff --git a/splitway-daemon/src/detector/macos/state.rs b/splitway-daemon/src/detector/macos/state.rs index 030347b..e49aa4c 100644 --- a/splitway-daemon/src/detector/macos/state.rs +++ b/splitway-daemon/src/detector/macos/state.rs @@ -3,29 +3,33 @@ //! ([`sample_with_retry`]). Kept here, free of FFI/IO, so both are unit-tested. //! //! SCDynamicStore re-fires on every related change, so the [`Deduper`] -//! suppresses samples that carry no new information. Unlike a plain up/down -//! flag, it tracks the actual DNS server list: a VPN DNS rotation (still "up" -//! but with different servers) is a real change and must be re-emitted, or the -//! `/etc/resolver` files would keep pointing at stale servers. +//! suppresses samples that carry no new information. It tracks the full detected +//! state ([`Detected`]) — both the corp DNS *and* the demote-target — so a +//! change to *either* (DNS rotation, or a new physical resolver after a Wi-Fi +//! switch) is a real change and is re-emitted, while an identical re-sample is +//! suppressed. Keying on the whole `Detected` (not just the servers) means a +//! demote-target change is not silently dropped. -/// What the watcher should do for a freshly-sampled DNS server list. +use super::parser::Detected; + +/// What the watcher should do for a freshly-sampled detection result. #[derive(Debug, Clone, Copy, PartialEq, Eq)] pub(super) enum Emit { - /// Emit `Up` with the sampled servers (gained DNS, or the server set changed). + /// Emit `Up` (gained an override, or the corp DNS / demote-target changed). Up, - /// Emit `Down` (lost its DNS). + /// Emit `Down` (the override went away). Down, /// No change since the last emission — emit nothing. Nothing, } -/// Tracks the last emitted DNS state so identical samples are suppressed while a -/// changed server set still propagates. +/// Tracks the last emitted detection so identical samples are suppressed while a +/// changed state still propagates. #[derive(Debug, Default)] pub(super) struct Deduper { - /// `Some(servers)` = last emitted `Up` with these servers; `None` = last + /// `Some(detected)` = last emitted `Up` carrying this state; `None` = last /// emitted `Down` (or nothing emitted yet). - last: Option>, + last: Option, } /// How many times the post-arm initial sample reads `scutil` before giving up. @@ -57,27 +61,35 @@ pub(super) fn sample_with_retry( } impl Deduper { - /// Decide what to emit for the just-sampled `servers` (empty = the interface - /// is down), recording it as the new last-emitted state. - pub(super) fn decide(&mut self, servers: &[String]) -> Emit { - if servers.is_empty() { - if self.last.is_none() { - Emit::Nothing - } else { - self.last = None; - Emit::Down + /// Decide what to emit for the just-sampled detection, recording it as the + /// new last-emitted state. `Detected::Down` maps to a `Down` only if we were + /// previously `Up`; an unchanged `Up` is suppressed; any change to the + /// `Up` payload (corp DNS or demote-target) re-emits `Up`. + pub(super) fn decide(&mut self, detected: &Detected) -> Emit { + match detected { + Detected::Down => { + if self.last.is_none() { + Emit::Nothing + } else { + self.last = None; + Emit::Down + } + } + up @ Detected::Up { .. } => { + if self.last.as_ref() == Some(up) { + Emit::Nothing + } else { + self.last = Some(up.clone()); + Emit::Up + } } - } else if self.last.as_deref() == Some(servers) { - Emit::Nothing - } else { - self.last = Some(servers.to_vec()); - Emit::Up } } } #[cfg(test)] mod tests { + use super::super::parser::Detected; use super::{sample_with_retry, Deduper, Emit}; use std::cell::Cell; @@ -85,6 +97,23 @@ mod tests { values.iter().map(|v| v.to_string()).collect() } + /// An `Up` detection with the given corp DNS and a fixed demote-target. + fn up(corp: &[&str]) -> Detected { + Detected::Up { + corp_dns: s(corp), + demote_target: s(&["198.51.100.1"]), + } + } + + /// An `Up` detection with an explicit demote-target (for the demote-change + /// test). + fn up_with(corp: &[&str], demote: &[&str]) -> Detected { + Detected::Up { + corp_dns: s(corp), + demote_target: s(demote), + } + } + #[test] fn retry_succeeds_on_first_try_without_pausing() { let pauses = Cell::new(0); @@ -159,28 +188,48 @@ mod tests { #[test] fn first_up_then_identical_is_suppressed() { let mut dedup = Deduper::default(); - assert_eq!(dedup.decide(&s(&["1.1.1.1"])), Emit::Up); - assert_eq!(dedup.decide(&s(&["1.1.1.1"])), Emit::Nothing); + assert_eq!(dedup.decide(&up(&["192.0.2.53"])), Emit::Up); + assert_eq!(dedup.decide(&up(&["192.0.2.53"])), Emit::Nothing); } #[test] fn down_emitted_only_when_previously_up() { let mut dedup = Deduper::default(); - assert_eq!(dedup.decide(&s(&[])), Emit::Nothing); // never up -> nothing - assert_eq!(dedup.decide(&s(&["1.1.1.1"])), Emit::Up); - assert_eq!(dedup.decide(&s(&[])), Emit::Down); - assert_eq!(dedup.decide(&s(&[])), Emit::Nothing); // already down - assert_eq!(dedup.decide(&s(&["1.1.1.1"])), Emit::Up); // up again (last reset to None) + assert_eq!(dedup.decide(&Detected::Down), Emit::Nothing); // never up -> nothing + assert_eq!(dedup.decide(&up(&["192.0.2.53"])), Emit::Up); + assert_eq!(dedup.decide(&Detected::Down), Emit::Down); + assert_eq!(dedup.decide(&Detected::Down), Emit::Nothing); // already down + assert_eq!(dedup.decide(&up(&["192.0.2.53"])), Emit::Up); // up again (last reset to None) } #[test] - fn dns_server_rotation_reemits_up() { + fn corp_dns_rotation_reemits_up() { let mut dedup = Deduper::default(); - assert_eq!(dedup.decide(&s(&["1.1.1.1"])), Emit::Up); - // Still up, but the server changed: must re-emit, not suppress. - assert_eq!(dedup.decide(&s(&["2.2.2.2"])), Emit::Up); - assert_eq!(dedup.decide(&s(&["2.2.2.2"])), Emit::Nothing); + assert_eq!(dedup.decide(&up(&["192.0.2.53"])), Emit::Up); + // Still up, but the corp DNS changed: must re-emit, not suppress. + assert_eq!(dedup.decide(&up(&["192.0.2.54"])), Emit::Up); + assert_eq!(dedup.decide(&up(&["192.0.2.54"])), Emit::Nothing); // A different set (added server) is also a change. - assert_eq!(dedup.decide(&s(&["2.2.2.2", "3.3.3.3"])), Emit::Up); + assert_eq!(dedup.decide(&up(&["192.0.2.54", "192.0.2.55"])), Emit::Up); + } + + #[test] + fn demote_target_change_reemits_up() { + // The demote-target (physical DHCP resolver) changing — e.g. after a + // Wi-Fi switch — is a real change even when the corp DNS is unchanged, + // so it must re-emit so the backend re-demotes to the new fallback. + let mut dedup = Deduper::default(); + assert_eq!( + dedup.decide(&up_with(&["192.0.2.53"], &["198.51.100.1"])), + Emit::Up + ); + assert_eq!( + dedup.decide(&up_with(&["192.0.2.53"], &["198.51.100.9"])), + Emit::Up + ); + assert_eq!( + dedup.decide(&up_with(&["192.0.2.53"], &["198.51.100.9"])), + Emit::Nothing + ); } } diff --git a/splitway-daemon/src/detector/macos/watch.rs b/splitway-daemon/src/detector/macos/watch.rs index 9233dd6..ef789dd 100644 --- a/splitway-daemon/src/detector/macos/watch.rs +++ b/splitway-daemon/src/detector/macos/watch.rs @@ -48,7 +48,8 @@ use tokio::sync::mpsc::{self, Receiver, Sender}; use splitway_shared::platform::{PlatformError, VpnEvent, VpnInfo}; -use super::detector::current_dns; +use super::detector::current_vpn_state; +use super::parser::Detected; use super::state::{sample_with_retry, Deduper, Emit, INITIAL_SAMPLE_ATTEMPTS}; /// Delay between retries of the post-arm initial sample (see [`emit_initial`]). @@ -99,14 +100,31 @@ fn run_watch(interface: String, tx: Sender) { } }; - // Watch global DNS plus per-service DNS and per-interface IPv4 changes; any - // of these fires when a VPN comes up or goes down. The callback re-reads the - // full state, so over-broad keys only cost a redundant (deduped) read. - let keys: CFArray = - CFArray::from_CFTypes(&[CFString::from_static_string("State:/Network/Global/DNS")]); + // Watch the per-service DNS and per-interface IPv4 keys (what detection now + // reads — see `parser`), plus the two global keys. `State:/Network/Global/IPv4` + // is essential, not just an extra trigger: detection reads `PrimaryService`/ + // `PrimaryInterface` from it to anchor the physical service and demote target, + // and macOS can switch the primary route between already-configured services by + // updating only this key, without touching any watched DNS key — so without it + // the daemon would sleep with a stale primary-service decision. `Global/DNS` is + // the extra trigger. The callback re-reads the full model and dedups, so + // over-broad keys (and the callback our own demote of a service's DNS triggers) + // only cost a redundant, suppressed read — never a spurious state change. + let keys: CFArray = CFArray::from_CFTypes(&[ + CFString::from_static_string("State:/Network/Global/IPv4"), + CFString::from_static_string("State:/Network/Global/DNS"), + ]); let patterns: CFArray = CFArray::from_CFTypes(&[ CFString::from_static_string("(State|Setup):/Network/Service/.*/DNS"), CFString::from_static_string("State:/Network/Interface/.*/IPv4"), + // The per-service IPv4/IPv6 entities carry each service's `InterfaceName` + // binding, which `read_service_interface` reads to classify a service + // (physical vs. an unscoped default hijacker). A service gaining/losing an + // interface binding without a DNS or global change must re-trigger + // detection — otherwise the service stays classified from stale data (an + // unknown interface can be misread as a hijacker → false "VPN up"). + CFString::from_static_string("State:/Network/Service/.*/IPv4"), + CFString::from_static_string("State:/Network/Service/.*/IPv6"), ]); if !store.set_notification_keys(&keys, &patterns) { log::error!("SCDynamicStore::set_notification_keys failed; macOS VPN watch disabled"); @@ -164,18 +182,18 @@ fn on_change(_store: SCDynamicStore, _changed_keys: CFArray, ctx: &mut } } -/// Steady-state callback path: read the interface's current DNS, decide whether -/// it represents a new state, and send the corresponding event. Returns `false` +/// Steady-state callback path: read the current DNS model, decide whether it +/// represents a new state, and send the corresponding event. Returns `false` /// if the receiver has been dropped. fn emit_current(interface: &str, tx: &Sender, dedup: &mut Deduper) -> bool { - match current_dns(interface) { - Ok(servers) => emit_servers(interface, tx, dedup, servers), + match current_vpn_state() { + Ok(detected) => emit_detected(interface, tx, dedup, detected), // A transient `scutil` failure is not "VPN down": keep the last known // state instead of emitting a spurious Down that would revert rules. // SCDynamicStore re-fires the callback on the next change, so a one-off // hiccup here recovers on its own. Err(e) => { - log::warn!("reading DNS for {interface} failed: {e}; keeping last state"); + log::warn!("reading the DNS model failed: {e}; keeping last state"); true } } @@ -192,16 +210,14 @@ fn emit_current(interface: &str, tx: &Sender, dedup: &mut Deduper) -> /// notifications that fire during the retries are queued and delivered once the /// run loop starts, and the shared dedup prevents a double-emit. fn emit_initial(interface: &str, tx: &Sender, dedup: &mut Deduper) -> bool { - let sample = sample_with_retry( - INITIAL_SAMPLE_ATTEMPTS, - || current_dns(interface), - || std::thread::sleep(INITIAL_SAMPLE_RETRY_DELAY), - ); + let sample = sample_with_retry(INITIAL_SAMPLE_ATTEMPTS, current_vpn_state, || { + std::thread::sleep(INITIAL_SAMPLE_RETRY_DELAY) + }); match sample { - Ok(servers) => emit_servers(interface, tx, dedup, servers), + Ok(detected) => emit_detected(interface, tx, dedup, detected), Err(e) => { log::error!( - "initial DNS read for {interface} failed after {INITIAL_SAMPLE_ATTEMPTS} \ + "initial DNS-model read failed after {INITIAL_SAMPLE_ATTEMPTS} \ attempts: {e}; auto-apply will start on the next DNS/network change" ); true // stay alive; a later notification can still recover @@ -209,19 +225,29 @@ fn emit_initial(interface: &str, tx: &Sender, dedup: &mut Deduper) -> } } -/// Dedup a freshly-read server list and send the resulting event. Returns -/// `false` if the receiver has been dropped. -fn emit_servers( +/// Dedup a freshly-read detection and send the resulting event. Returns `false` +/// if the receiver has been dropped. The `interface_name` carried in `Up` is +/// advisory on macOS (nothing keys on it); the demote-target rides along so the +/// backend can demote the hijacked default to it. +fn emit_detected( interface: &str, tx: &Sender, dedup: &mut Deduper, - servers: Vec, + detected: Detected, ) -> bool { - let event = match dedup.decide(&servers) { - Emit::Up => VpnEvent::Up(VpnInfo { - interface_name: interface.to_string(), - dns_servers: servers, - }), + let event = match dedup.decide(&detected) { + Emit::Up => match detected { + Detected::Up { + corp_dns, + demote_target, + } => VpnEvent::Up(VpnInfo { + interface_name: interface.to_string(), + dns_servers: corp_dns, + demote_target: Some(demote_target), + }), + // `decide` only returns `Up` for a `Detected::Up`. + Detected::Down => unreachable!("decide returned Up for a Down detection"), + }, Emit::Down => VpnEvent::Down { interface_name: interface.to_string(), }, diff --git a/splitway-daemon/src/detector/mod.rs b/splitway-daemon/src/detector/mod.rs index ed80726..1eb7531 100644 --- a/splitway-daemon/src/detector/mod.rs +++ b/splitway-daemon/src/detector/mod.rs @@ -4,6 +4,15 @@ mod linux; #[cfg(target_os = "macos")] mod macos; +/// Re-export the macOS dynamic-store dump parsers so the macOS DNS backend can +/// reuse them for the demote (it reads the same `scutil` dump shape to find the +/// primary service and its current DNS). Kept here, behind the platform gate, so +/// the two macOS modules share one parser rather than duplicating it. +#[cfg(target_os = "macos")] +pub(crate) use macos::{ + parse_array_field as macos_parse_array_field, parse_scalar_field as macos_parse_scalar_field, +}; + #[cfg(target_os = "windows")] mod windows; diff --git a/splitway-shared/src/config/mod.rs b/splitway-shared/src/config/mod.rs index a16392e..7d705eb 100755 --- a/splitway-shared/src/config/mod.rs +++ b/splitway-shared/src/config/mod.rs @@ -50,6 +50,7 @@ pub fn create_empty_config_at(path: &Path) -> Result<(), ConfigParseError> { enabled: default_enabled(), vpn_backend: VpnBackend::default(), openvpn: OpenVpnConfig::default(), + fallback_dns: None, }; save_config_to(path, &empty_config) } @@ -145,6 +146,17 @@ pub fn config_file_path() -> PathBuf { config_folder_path().join("config.json") } +/// Whether `s` is a bare IPv4/IPv6 address literal — not a hostname, not empty, +/// and with no surrounding or embedded whitespace. Used to validate resolver +/// addresses before they reach a platform DNS command that interpolates them +/// verbatim (e.g. the macOS `scutil` demote script: a non-IP value would malform +/// the script rather than fail cleanly). `IpAddr` parsing is strict — it rejects +/// trailing/leading spaces and embedded newlines — so this doubles as an +/// injection guard. Pure; unit-tested without a live system. +pub fn is_ip_literal(s: &str) -> bool { + s.parse::().is_ok() +} + fn config_folder_path() -> PathBuf { // Resolve without panicking — this runs inside a long-lived daemon that // may be a systemd service where HOME is not guaranteed. Prefer @@ -228,6 +240,16 @@ pub struct LocalConfig { /// pre-3c configs parsing; ignored unless `vpn_backend = openvpn`. #[serde(default)] pub openvpn: OpenVpnConfig, + /// Optional override for the off-tunnel fallback resolver used on platforms + /// that demote a hijacked system default (macOS — see + /// [`crate::platform::VpnInfo::demote_target`]). `None` (the default, and + /// always on Linux) means "use the physical primary interface's own DHCP + /// resolver", which the detector discovers; `Some(servers)` pins a specific + /// public resolver (e.g. `["1.1.1.1"]`) for non-corp DNS instead. Ignored on + /// platforms that do not demote. `#[serde(default)]` keeps older configs + /// (without this field) parsing unchanged. + #[serde(default)] + pub fallback_dns: Option>, } #[derive(Debug)] @@ -302,6 +324,8 @@ mod tests { management: "127.0.0.1:7505".to_string(), management_password_file: Some("/etc/splitway/mgmt.pass".to_string()), }, + // A non-default value so the round-trip exercises the new field. + fallback_dns: Some(vec!["192.0.2.1".to_string()]), }; let json = serde_json::to_string(&config).unwrap(); @@ -373,6 +397,21 @@ mod tests { assert!(!leftover_temp, "atomic_write left a temp file behind"); } + #[test] + fn is_ip_literal_accepts_addresses_and_rejects_everything_else() { + assert!(is_ip_literal("192.0.2.1")); + assert!(is_ip_literal("2001:db8::1")); + // Rejected: a hostname, an empty element, and — critically for the + // unescaped `scutil` script — embedded whitespace or a newline. + assert!(!is_ip_literal("example.com")); + assert!(!is_ip_literal("")); + assert!(!is_ip_literal("192.0.2.1 q")); + assert!(!is_ip_literal(" 192.0.2.1")); + assert!(!is_ip_literal( + "192.0.2.1\nset State:/Network/Service/x/DNS" + )); + } + #[test] fn save_config_to_round_trips() { let path = temp_config("save-round-trip"); @@ -382,6 +421,7 @@ mod tests { enabled: true, vpn_backend: VpnBackend::default(), openvpn: OpenVpnConfig::default(), + fallback_dns: None, }; save_config_to(&path, &config).unwrap(); let loaded = load_config_from(&path).unwrap(); diff --git a/splitway-shared/src/platform.rs b/splitway-shared/src/platform.rs index d1b34ab..319b808 100755 --- a/splitway-shared/src/platform.rs +++ b/splitway-shared/src/platform.rs @@ -24,6 +24,21 @@ pub enum PlatformError { pub struct VpnInfo { pub interface_name: String, pub dns_servers: Vec, + /// The resolver to fall back to for *non*-routed (off-tunnel) DNS when the + /// VPN client has hijacked the system **default** resolver rather than + /// scoping its DNS to the tunnel interface (observed on macOS: the corp + /// resolver is registered as the global default, so every query — not just + /// corp ones — would otherwise traverse the tunnel). + /// + /// `Some(servers)` means the backend must, in addition to scoping the corp + /// domains, **demote** the system default to these servers so non-corp DNS + /// resolves off-tunnel. Typically the physical primary interface's own DHCP + /// resolver, surfaced by the detector that found the default was overridden. + /// + /// `None` (the Linux case, and macOS when no default override is detected) + /// means scope-only: the platform already keeps the default off-tunnel, so + /// no demote is needed. Additive — backends that ignore it are unaffected. + pub demote_target: Option>, } #[derive(Debug, Clone)]