diff --git a/README.md b/README.md index 54d2035..c8ee4ca 100755 --- a/README.md +++ b/README.md @@ -418,6 +418,29 @@ equivalently, add the group via your own `users.users..extraGroups`. See the security note under [Using it under niri](#using-it-under-niri-wayland), then that section for binding it to a key. +#### macOS (`Splitway.app`) + +On macOS the GUI ships as a self-installing app — **no Terminal, no Homebrew, no +code signing**. Build it locally (it is ad-hoc/unsigned, `.app` only): + +```sh +bash splitway-gui-tauri/scripts/build-macos-app.sh +cp -R target/release/bundle/macos/Splitway.app /Applications/ +``` + +The wrapper pulls its off-PATH toolchain (`cargo-tauri`, `node`, `esbuild`, …) +from nixpkgs via `nix shell`, so it just runs. A locally-built `.app` carries no +quarantine flag, so it launches from `/Applications` with no Gatekeeper prompt. + +Open it and click **Install & start the Splitway service**: one native password +prompt installs the `splitway-daemon`/`splitway` helpers to `/usr/local/bin`, +creates the `splitway` group and adds you to it, and bootstraps the root +LaunchDaemon with a group-reachable socket — the app then shows **Connected** (no +re-login needed if you launch it after the install; an app left open across the +install reconnects after a relaunch). A discreet **Stop the Splitway service** +footer link reverses it (the daemon reverts `/etc/resolver` and won't relaunch at +boot). See [docs/design/macos-self-install.md](docs/design/macos-self-install.md). + ### Using it under niri (Wayland) niri is a tiling Wayland compositor with **no system tray**, so Splitway is a diff --git a/ROADMAP.md b/ROADMAP.md index e1e6ade..200735e 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -184,7 +184,7 @@ reimplemented per frontend: - **7a — `splitway-gui-core`** (done): extract the framework-agnostic GUI logic (the pure view-model **and** the truth-contract orchestration) into a crate depending on `splitway-shared` only, so both egui and the future Tauri backend - drive one `GuiCore`. See [`docs/design/gui-core-extraction.md`](design/gui-core-extraction.md). + drive one `GuiCore`. See [`docs/design/gui-core-extraction.md`](docs/design/gui-core-extraction.md). - **7b — Tauri shell + read-only view** (done): the `splitway-gui-tauri` backend hosts `GuiCore` and pushes the full view-model to a vanilla-TS frontend that renders it read-only (no mutations — those are 7c). gui-core gained `Verify` @@ -209,7 +209,7 @@ reimplemented per frontend: store. Frozen-on-malformed mutations are rejected with an on-disk-fix message and the frozen state is shown prominently. No protocol change (the verbs already exist at v6); egui stays a read/write reference, untouched. See - [`docs/design/tauri-mutations.md`](design/tauri-mutations.md). + [`docs/design/tauri-mutations.md`](docs/design/tauri-mutations.md). - **7d — visual design + window behavior** (done): the approved Variant B design as the real Tauri UI — full-window layout, the simplified interface-centric model (interface + domains; DNS auto-derived and shown read-only; no vpn-name/backend @@ -220,7 +220,7 @@ reimplemented per frontend: authorized additive protocol bump (v6 → **v7**): `StatusInfo.detected_dns` exposes the selected interface's detected DNS independent of apply state, so the DNS readout is honest in the empty/disabled states too. See - [`docs/design/tauri-design-window.md`](design/tauri-design-window.md). (Manual-DNS + [`docs/design/tauri-design-window.md`](docs/design/tauri-design-window.md). (Manual-DNS override — for VPNs that connect but push no DNS — is deferred as a real future daemon feature, not built here.) - **7d-2 — bundling**: Nix packaging (two-stage frontend + Rust, `wrapGAppsHook3`, @@ -229,6 +229,42 @@ reimplemented per frontend: `packages..splitway-gui`, bundling the IBM Plex OFL `woff2`, and the README GUI-install section. Split from 7d because its real proof — the *built* binary rendering for a fresh in-group niri user — is machine-bound. +- **7d-3 — macOS self-install**: the macOS counterpart of 7d-2's Linux bundling. + An ad-hoc/unsigned `Splitway.app` (`.app` only — no signing, notarization, + `.dmg`/`.pkg`, or `SMAppService`) that bundles the `splitway-daemon` + `splitway` + helpers, a GUI LaunchDaemon plist (carrying `--socket-group splitway`), and a + `bootstrap.sh`. Two health-keyed Tauri commands escalate via `osascript … with + administrator privileges` (one native password prompt) to install/start + (`NotRunning` → Install button) and disable (footer link) the root daemon — no + terminal. The bundle path is additive (the Tauri bundler is invoked only by the + build wrapper; `cargo build` / `nix build` never read `bundle`), and the commands + keep the truth contract (do the work → refresh-now → never touch the VM). Split + from 7d-2 because its real proof — the built `.app` driving the live install on + macOS — is machine-bound. See + [`docs/design/macos-self-install.md`](docs/design/macos-self-install.md). (Homebrew — + installing the same `.app` + binaries, with no competing `service` block — is the + next phase.) +- **7d-4 — macOS DNS privacy (demote + scope)**: the macOS backend reaches DNS + parity with Linux. The previous backend only *scoped* the corp domains via + `/etc/resolver`; that is insufficient against a VPN client that hijacks the + system **default** resolver (the corp resolver is the global default, scoped to + no `utun`, so non-corp DNS would also traverse the tunnel). This phase adds the + **demote**: snapshot the primary network service's DNS, overwrite it with an + off-tunnel fallback (the physical interface's own DHCP resolver, or a + configured `fallback_dns` override), and restore it on every revert path — + transactional and reversible (an on-disk snapshot survives an unclean exit). + Detection is rewritten to be **structural and vendor-neutral**: it reads the + per-service DNS model (a service whose resolver differs from the physical + link's is the VPN) rather than filtering `scutil --dns` by a `utun` — and reads + the VPN signal from the VPN's *own* service, not the global default Splitway + mutates, so the demote does not cause detection to oscillate. The state machine + is decoupled from `vpn_name` on macOS (gated via `reverts_globally()`); Linux + stays interface-keyed and unchanged. DNS only — no IP-route manipulation (the + client already splits IP; same boundary as Linux). The GUI interface-picker + becomes a benign no-op on macOS (removal is a later GUI phase). Implementation + + synthetic-fixture tests land here; the live packet-level / reconnect / revert + acceptance is machine-bound and verified separately. See + [`docs/design/macos-dns-privacy.md`](docs/design/macos-dns-privacy.md). ### Phase 8 — feature freeze + hardening diff --git a/docs/design/macos-dns-privacy.md b/docs/design/macos-dns-privacy.md new file mode 100644 index 0000000..e718803 --- /dev/null +++ b/docs/design/macos-dns-privacy.md @@ -0,0 +1,233 @@ +# macOS DNS privacy — demote the hijacked default + scope the corp domains + +The macOS backend used to do only half of split-DNS: it wrote +`/etc/resolver/` files so the configured corp domains resolve via the +corp DNS. That is the **scope** half. It is sufficient only when the VPN client +*scopes* its DNS to the tunnel. The observed corporate VPN does the opposite — a +**global DNS hijack**: it registers the corp resolver as the system **default**, +so every query that is *not* a configured corp domain would also go to the corp +resolver, over the tunnel. That is the privacy leak Splitway exists to prevent. + +This phase adds the missing half — **demote** — so macOS reaches parity with the +Linux build: the corp resolver sees only the configured corp domains; everything +else leaves via the normal (physical) path and never traverses the tunnel. + +> **Detection is structural and vendor-neutral.** Nothing in the code, tests, or +> this document names a VPN product or its services. Detection keys on the +> *shape* of the DNS configuration (a service whose resolver differs from the +> physical link's), never on a vendor string — so it generalises across clients. + +## What the established facts were (proven on the author's Mac) + +- **The corp resolver is the system default, not tunnel-scoped.** There is no + resolver scoped to any `utun`; the active tunnel `utun` index even varies + between sessions. So the previous interface-keyed detector (filter + `scutil --dns` by a chosen `utun`) found nothing, for any interface choice. +- **Demote holds.** Overwriting the primary network service's `ServerAddresses` + to a different resolver stuck (the client did not re-assert in a tight loop). +- **Demote is real, not cosmetic.** With the default demoted, no DNS for public + names traversed the tunnel — the client does not transparently intercept `:53`. +- **`/etc/resolver/` is immune.** The client never touches those files; a + scoped resolver there takes precedence over the default for its domain. +- **IP is already split by the client** — non-corp IPs route off-tunnel — so + Splitway does **no** IP-route manipulation on macOS (unlike Linux's + default-route demote). Splitway governs DNS only; see the boundary below. + +(Synthetic stand-ins used throughout: corp DNS `192.0.2.53`, physical DHCP DNS +`198.51.100.1`, public-resolver override `203.0.113.9`, corp domains +`corp.example.com` / `jira.example.com`, interfaces `en0` / `utun0`.) + +## The mechanism: demote + scope, transactional and reversible + +On apply (VPN up, with corp domains configured), the macOS backend does both: + +1. **Scope** — write `/etc/resolver/` → corp DNS for each corp domain + (unchanged from before; on-tunnel, intended). Transactional: a mid-write + failure restores every file to its prior bytes. +2. **Demote** — overwrite the primary network service's DNS + (`State:/Network/Service//DNS` `ServerAddresses`) with an off-tunnel + **fallback** resolver, so non-corp DNS resolves off-tunnel. The prior value is + **snapshotted to disk first** so it can be restored exactly. + +Net effect: corp domains → corp DNS (on-tunnel); everything else → fallback DNS +(off-tunnel, invisible to the corp resolver). + +The two steps are transactional **across both**: if the demote fails, the +resolver scope just written is rolled back, so the system is never left +half-changed (scoped but with the default still hijacked). The apply then +surfaces the error rather than recording success. + +On revert (VPN down / disable / stop / shutdown) the backend removes every +managed `/etc/resolver` file **and** restores the demoted default from the +snapshot (then clears it). Restore runs on every exit path — the daemon already +reverts on `SIGTERM` (what `launchctl bootout` sends). + +### The fallback resolver + +The off-tunnel fallback defaults to the **physical primary interface's own DHCP +resolver** (the resolver that interface would use absent any override), which the +detector discovers. A config override — `fallback_dns` in the daemon config — +pins a specific public resolver instead (e.g. `["203.0.113.9"]`). The state +machine folds the override (if set) over the detector's value before handing the +effective fallback to the backend. The override is a root-config-file-only field +(the GUI does not edit it — out of scope this phase). + +## Structural, vendor-neutral detection + +Detection reads the SystemConfiguration dynamic store (via `scutil` in script +mode) and decides **structurally**: + +- the **primary interface** — `State:/Network/Global/IPv4` `PrimaryInterface` + (e.g. `en0`); +- every network service's DNS entry — `State:/Network/Service/.*/DNS` + (`InterfaceName` + `ServerAddresses`). + +The **physical service** is anchored authoritatively by the **primary service +id** (`State:/Network/Global/IPv4` `PrimaryService`), falling back to the primary +interface name only when the id is unknown — because a VPN service can *also* +report the primary interface name, so matching on the interface alone could pick +the wrong service and invert corp/fallback. Its resolver is the demote-target. A +**VPN service** is any service *other than the physical one* (compared by id) +whose DNS differs from the physical resolver — i.e. a non-physical resolver is in +play. VPN is **up** iff such a service exists; its resolver is the corp DNS. The +decision is a set comparison (order-insensitive), and it never references a +`utun` name or a vendor string. + +When Splitway demotes the physical service, it **re-adds that service's +`InterfaceName`** (the demote write would otherwise drop it, since it rebuilds a +minimal DNS dict). Keeping `InterfaceName` — and anchoring on the service id — +means the physical service stays identifiable on the next detection round, so our +own demote never makes detection lose the physical service (which would invert +corp/fallback or undo the demote). + +### Why detection reads per-service DNS, not `State:/Network/Global/DNS` + +This is load-bearing for stability. Splitway's *demote* overwrites the physical +primary service's DNS, which changes the **global default**. A detector keyed on +`State:/Network/Global/DNS` would therefore see the global default become the +fallback the moment our demote took effect, conclude "no VPN" (global == the +physical resolver), revert → the VPN's default returns → re-demote → **oscillation**. + +Reading the VPN's corp DNS from its *own* service entry — which our demote does +**not** touch — keeps detection stable while the demote is in effect: the verdict +is unchanged before and after our own write. A unit test +(`detection_survives_our_own_demote_of_the_physical_service`) pins this. + +## Decoupling the state machine from `vpn_name` (macOS only) + +The state machine used to gate apply/revert on +`info.interface_name == config.vpn_name`. On macOS there is no stable, DNS-scoped +VPN interface to pin (the active `utun` varies), so that gate would never pass. +The gate is now branched on the backend's existing `reverts_globally()` seam: + +- **global-revert backend (macOS)** → the interface gate is skipped; apply is + driven by the DNS-model detection the detector already decided. The advisory + `interface_name` rides along in `VpnInfo` but nothing keys on it. +- **per-interface backend (Linux)** → unchanged: the gate still requires + `interface_name == vpn_name`. + +The same branch covers the two read-only projections that used the gate +(`status().detected_dns` and `routing_state()`), so the macOS status readout is +honest. Linux behaviour is byte-for-byte unchanged (its `MockBackend` defaults +`reverts_globally` to false; all existing Linux state tests pass untouched). + +## Reconcile on event + +Reconnect / Wi-Fi toggle / sleep-wake / re-auth can re-install the corp default +or change the physical DHCP resolver. The SCDynamicStore watch fires on the +relevant DNS keys; the detector re-reads the model and re-emits `Up` whenever the +corp DNS **or** the demote-target changed (the watcher dedups only genuine +no-ops). The applied snapshot now also tracks the demote-target, so a change to +it forces a re-apply (re-demote to the new fallback) rather than being treated as +already converged. This is purely event-driven — no busy loop, no timer; one +re-apply per genuine change. + +Because the observed client did **not** re-assert in a tight loop, no +sub-second re-apply guard is needed; a re-assert that arrives as a DNS-key change +is handled by the normal event path. The demote is idempotent (re-setting the +same fallback), and the snapshot is captured only on the *first* demote of a +given service, so a re-apply never overwrites the original prior state with our +own fallback. + +**Primary-service change.** If the primary network service itself changes while +the VPN stays up (e.g. Wi-Fi → Ethernet) and a demote snapshot already exists for +the *old* service, the demote first **restores the old service** from its +snapshot (so it is not left pinned to Splitway's fallback), then snapshots and +demotes the new primary. Exactly one service is ever demoted at a time, and no +previous primary is stranded on the fallback. + +## Reversibility (the operational-safety contract) + +- The pre-demote primary-service DNS is **snapshotted to disk** before the + overwrite (`/var/run/splitway/dns-demote.snapshot`), so an unclean exit — the + daemon `SIGKILL`ed between demote and a later revert — can still be undone on + the next start. The snapshot uses `atomic_write` (intact on a crash mid-write). +- Restore rewrites exactly the snapshotted servers, or removes the key entirely + when the service had no explicit prior DNS (so SystemConfiguration repopulates + it from the real source), then clears the snapshot. +- Every system-network mutation is captured-before-write and rolled back on any + failure path, mirroring the `/etc/resolver` apply's existing discipline — the + machine is never left with a broken or half-demoted resolver. + +## Testability + +All `scutil` contact goes through two seams so the logic is unit-tested without +touching the live system: + +- detection parses the dynamic-store dumps with **pure** functions over the + synthetic-fixture dump shapes; the structural decision is a separate pure step; +- the demote/restore go through an injected `ScutilRunner` (the real impl shells + out; tests inject a fake that **captures the exact script issued** and returns + canned state) and a `SnapshotStore` (real = on-disk; tests = in-memory). The + `apply_with` / `revert_with` wiring (including the rollback-on-demote-failure) + is tested with both seams faked. + +## Scope boundary — DNS only, not IP routing + +Splitway governs **DNS**, not IP routing. The observed client's split-tunnel +already keeps non-corp **IP** traffic off the tunnel, so macOS does no route +manipulation. A full-tunnel / include-routes VPN that carried IP traffic through +the corp tunnel is **out of scope** (the same boundary as Linux): Splitway would +still split the DNS, but the IP path is the VPN client's concern. This is +deliberate and documented so the boundary is not mistaken for a gap. + +## What is not built here + +- **GUI changes** (interface-picker removal, a corp-domains / fallback-DNS UI) — + a later phase. The macOS daemon no longer depends on the *value* of `vpn_name` + for correctness — the detector auto-detects and ignores whatever is picked — so + the picker is a value-agnostic no-op there. One residual dependency remains, + though: `vpn_name` is still load-bearing as the **arming switch**. `arm_watch` + bails out when `vpn_name` is empty (`DetectorHealth::Inactive`, no SCDynamicStore + watch, no initial sample), and a fresh macOS self-install boots exactly there + (`bootstrap.sh` starts the daemon, which self-creates its config with an empty + `vpn_name`), so no detection runs until the user picks *some* interface (any + value works). Today the GUI's platform-agnostic main stage pushes the user + through the picker, so the flow works in practice; a future picker-removal / + auto-detection phase must also branch `arm_watch`'s empty-name gate (e.g. on + `reverts_globally()`) or it would silently disable macOS detection. The state + machine is strictly daemon-side here. +- **Homebrew packaging** — a later phase; nothing ships until the live + acceptance below passes. +- **The live verification itself** — see below. This phase is implementation + + synthetic-fixture tests only. + +## Deferred — live acceptance (run on the real VPN, not in this phase) + +1. Packet-level, both interfaces, fresh random subdomains: public queries on the + physical interface, **zero on the tunnel**; a corp host still resolves (via + `/etc/resolver`). +2. Event robustness: reconnect / Wi-Fi toggle / sleep-wake → the demote re-holds. +3. Clean revert: stop / VPN-down restores the machine exactly as found. +4. `status` / `verify` / `check ` / `check ` report the + true state. + +## Links + +- [socket-group.md](socket-group.md) — the unprivileged-GUI access model. +- [macos-self-install.md](macos-self-install.md) — how the macOS daemon is + installed/run (the privileged bootstrap this DNS work runs under). +- [linux-default-route-catch-all.md](linux-default-route-catch-all.md) — the + Linux analogue (demote the link's DNS default-route so split-DNS holds). +- [architecture.md](../architecture.md) — the truth contract and the DNS-only + boundary. diff --git a/docs/design/macos-self-install.md b/docs/design/macos-self-install.md new file mode 100644 index 0000000..7427239 --- /dev/null +++ b/docs/design/macos-self-install.md @@ -0,0 +1,220 @@ +# macOS self-install — one-click privileged daemon bootstrap from Splitway.app + +The macOS daemon must run as **root** (it writes `/etc/resolver/` and +flushes the DNS cache). Before this, bringing it up meant a terminal ritual: +`sudo launchctl load …` plus a hand-written config. This feature makes the user +double-click `Splitway.app`, click one button, authenticate once via the native +macOS password dialog, and have the root daemon installed + running — no terminal. + +## The agreement + +`Splitway.app` is built locally from source and ships **unsigned** (ad-hoc +identity `-`): no Apple Developer account, no notarization, `.app` only (no +`.dmg`/`.pkg`). A locally-built `.app` carries no `com.apple.quarantine`, so it +launches from `/Applications` without Gatekeeper friction — which is exactly what +makes the no-signing path viable. + +The app installs the daemon itself through two Tauri commands keyed to the +existing health states: + +- **`install_service`** — offered when `Health == NotRunning` (no socket). It + escalates via `osascript`'s `do shell script … with administrator privileges` + (one native password prompt) to run the bundled `bootstrap.sh install` as root. + The script, idempotently: installs `splitway-daemon` + `splitway` from the app + bundle to `/usr/local/bin` (`755`, quarantine stripped); ensures a `splitway` + group and adds the console user; installs the GUI LaunchDaemon plist (carrying + `--socket-group splitway`) to `/Library/LaunchDaemons`; and + `launchctl bootout` → `bootstrap` → `enable`s it. +- **`disable_service`** — a discreet footer link once connected. Runs + `bootstrap.sh disable`: `launchctl bootout` (SIGTERM → the daemon reverts + `/etc/resolver` before exit) and removes the plist so it will not relaunch. + +Both keep the **truth contract** ([architecture.md](../architecture.md) §2): the +command does the privileged work, fires refresh-now, and returns a +`Result<(), String>` — it never touches the view-model. The real health +(`NotRunning` → `PermissionDenied`/`Connected`, or back to `NotRunning`) flows +back only through the next `view-model-changed`, exactly as for the mutation +commands. No optimistic flips. + +A third command, **`host_platform`**, lets the frontend branch the remediation +copy: macOS gets the Install button / sign-out guidance; Linux keeps its +`systemctl` / `usermod` copy-paste commands. This is frontend presentation only — +no view-model field is added, so the bindings contract is untouched. + +## Why this shape + +- **`osascript` admin escalation, not `SMAppService`.** `SMAppService` needs code + signing + notarization + a user-approved Login Item — all rejected here. + `osascript … with administrator privileges` is the supported, signing-free way + to get one password prompt and run a fixed root command. +- **Not the deprecated `AuthorizationExecuteWithPrivileges`**, and not a + `brew services` wrapper (a Finder-launched `.app` has a minimal `PATH`, and the + brew prefix differs Intel vs ARM — Homebrew is a later phase anyway). +- **The escalated command is inert.** It is a fixed `/bin/bash