…ed install/disable
Ship Splitway as a self-installing macOS app so a user double-clicks Splitway.app, clicks one button, authenticates once via the native password dialog, and the root split-DNS daemon is installed and running — no terminal.
- Bundle: an ad-hoc/unsigned `Splitway.app` (`.app` only — no signing, notarization, .dmg/.pkg, or SMAppService) embedding the splitway-daemon + splitway helpers, a GUI LaunchDaemon plist (carrying --socket-group splitway), and bootstrap.sh. The bundle path is additive: the Tauri bundler is invoked only by scripts/build-macos-app.sh via a tauri.bundle.macos.json overlay applied with --config, so `cargo build` / `nix build` / `nix flake check` never read it and the Linux build is untouched.
- Commands (bridge.rs): install_service / disable_service escalate via `osascript ... with administrator privileges` (one prompt) to run the inert, idempotent bootstrap.sh as root; host_platform lets the frontend branch its UI copy per platform. They keep the truth contract — do the work, fire refresh-now, never touch the VM; real health flows back through view-model-changed. No capability change (custom commands are not ACL-gated). The escalated command is fixed apart from the bundle-derived resource path, which is escape-safe + `quoted form of`'d (unit-tested via the pure build_admin_applescript).
- Frontend: the NotRunning blocker offers an Install button on macOS (vs the Linux systemctl line); PermissionDenied guides toward sign-out (not usermod); a discreet footer link disables via a two-click arm (WKWebView suppresses window.confirm). Platform-branched, no view-model shape change.
- bootstrap.sh hardening: pins a system PATH; installs the root-run daemon binary to a root:wheel 0755 /usr/local/bin and refuses if the dir cannot be made root-owned (closes the launchd unsafe-binary-location escalation on the Homebrew-on-Intel layout); settles + retries the bootout->bootstrap relaunch so a re-install never intermittently leaves the daemon stopped.
Verified end-to-end on macOS against a live VPN: install (socket 0660 root:splitway, no quarantine), GUI Connected, interface/domain mutations through the group socket, idempotent re-install, and disable (reverts /etc/resolver, removes the plist). The re-login group gotcha did not manifest here — a freshly launched app sees the new membership immediately (documented). Homebrew packaging (installing the same .app, no competing service block) is the next phase. See docs/design/macos-self-install.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
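The escalation above hinges on a pure function that turns the bundle-derived path into a fixed AppleScript command. Below is a minimal sketch of what such a builder could look like, assuming bootstrap.sh is run via `/bin/sh` with a hypothetical `install`/`disable` argument; the real signature of build_admin_applescript and the script's argument names are not shown in the commit. The path is first escaped into an AppleScript string literal (backslash and double quote are the two characters that need escaping there), and control characters are rejected rather than escaped.

```rust
/// Hypothetical actions; the real bootstrap.sh arguments are not given in the commit.
#[derive(Clone, Copy)]
pub enum Action {
    Install,
    Disable,
}

impl Action {
    fn arg(self) -> &'static str {
        match self {
            Action::Install => "install",
            Action::Disable => "disable",
        }
    }
}

/// Render `s` as a double-quoted AppleScript string literal.
/// Backslash and double quote are escaped; control characters are refused.
fn applescript_string_literal(s: &str) -> Result<String, String> {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            c if c.is_control() => return Err(format!("control character in path: {:?}", c)),
            c => out.push(c),
        }
    }
    out.push('"');
    Ok(out)
}

/// Everything except the escaped path is a fixed string, so the only
/// caller-influenced input reaches the shell through `quoted form of`.
pub fn build_admin_applescript(script_path: &str, action: Action) -> Result<String, String> {
    let lit = applescript_string_literal(script_path)?;
    Ok(format!(
        "do shell script \"/bin/sh \" & quoted form of {lit} & \" {arg}\" with administrator privileges",
        arg = action.arg()
    ))
}
```

Keeping the builder pure (string in, string out) is what makes it unit-testable without ever invoking osascript.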
…d app_id
Prepares the systemd unit and the egui GUI for the deb/rpm/pacman packages.
- splitway.service: ExecStart now passes `--config /var/lib/splitway/config.json` and the unit declares StateDirectory=splitway (0700), mirroring nix/module.nix. The daemon creates its config under the persistent, daemon-owned state dir on first run instead of falling back to /root/.config (and logging a warning).
- Add the same --config to the commented socket-group opt-in ExecStart override. The bare `ExecStart=` reset fully replaces the command, so without this an opt-in user would silently drop --config and reintroduce the /root/.config fallback this commit removes. (Minor deviation from the "keep the block untouched" plan, for correctness/consistency with the main ExecStart and nix/module.nix.)
- splitway-gui: set the ViewportBuilder app_id to io.github.stslex.splitway so Wayland compositors map the window to the packaged .desktop entry + hicolor icon (shipped under that basename by the GUI package).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018GKGiqawfPb5fzMGHisrrF
Adds the MIT LICENSE (the repo had none) and declares `license = "MIT"` on every workspace crate, then defines the deb + rpm metadata for the core `splitway` package on splitway-daemon. The package ships both binaries — `splitway-daemon` and the `splitway` CLI (from splitway-cli) — plus the systemd unit and README/LICENSE. It is built musl-static (*-unknown-linux-musl) so it has no shared-library dependencies and installs on any glibc/musl baseline; the desktop GUI is a separate package (next commit) that Depends on this one.
- [package.metadata.deb]: name=splitway, binaries to /usr/bin, copyright from the MIT text, Recommends network-manager + systemd-resolved (not Depends), empty Depends (static). An (empty) maintainer-scripts dir enables cargo-deb's systemd-units integration to generate the postinst/postrm: enable+start on install, restart on upgrade, stop on remove, daemon-reload.
- [package.metadata.generate-rpm]: same layout, auto-req disabled (static), weak-dep Recommends, raw /bin/sh systemd scriptlets (cargo-generate-rpm does not expand %systemd_* macros).
Asset-path note (verified by building both packages with cargo-deb 3.7.0 and cargo-generate-rpm 0.21.0): cargo-deb resolves non-`target/` asset paths relative to THIS crate's manifest dir (so workspace-root files use `../`), while cargo-generate-rpm resolves relative to the invocation dir (workspace root, bare paths) — hence the intentional path skew between the two blocks.
Version is stamped at the packaging layer (cargo deb --deb-version / cargo generate-rpm --set-metadata) so dev builds get <ver>~dev.<utc>.<sha> without a non-semver string in Cargo.toml. CI builds per-triple with --target (remaps target/release -> target/<triple>/release and stamps the arch).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018GKGiqawfPb5fzMGHisrrF
…opt-in
A separate `splitway-gui` package for the egui desktop GUI (glibc, dynamic), which Depends on the musl-static core `splitway` (>=, the IPC compat contract).
- .desktop (io.github.stslex.splitway.desktop, Exec=splitway-gui) + hicolor icons rasterized from assets/icon/splitway-icon.svg (8 PNG sizes + scalable SVG, basename = app_id) via packaging/icons/generate-hicolor.sh. The icon tree is committed so every packaging path ships it without a rasterizer; the .desktop passes desktop-file-validate.
- Dependencies (both formats, verified by building real packages): the eframe/glow windowing libs are DLOPEN'd by winit/glow at runtime, so they are absent from the ELF DT_NEEDED and neither cargo-deb's $auto nor cargo-generate-rpm's auto-req can detect them — they MUST be hardcoded. deb: libgl1, libx11-6, libxcursor1, libxi6, libxrandr2, libwayland-client0, libxkbcommon0, libc6 (>= 2.31 floor). rpm: mesa-libGL, libX11, libXcursor, libXi, libXrandr, libwayland-client, libxkbcommon (auto-req still pins the glibc floor from libc/libgcc sonames). Recommends an XDG desktop portal + backend for rfd's file dialog.
- Socket-group opt-in (security-sensitive): the maintainer scripts create an EMPTY `splitway` group and install a service drop-in switching the daemon to group-socket mode (0660 root:splitway, dir 0750). EMPTY-GROUP INVARIANT: with no members the posture is identical to the default 0600 root-only; the scripts NEVER add a user — the only grant is a human running `usermod -aG splitway <user>` + re-login. postinst installs the drop-in + reloads + restarts; postrm removes it, groupdel only if empty, reverting to root-only. A loud first-install message prints the exact opt-in one-liner.
Validated by building both packages with cargo-deb 3.7.0 / cargo-generate-rpm 0.21.0 and inspecting deps, files, and scriptlets.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018GKGiqawfPb5fzMGHisrrF
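The empty-group invariant is something a test can check mechanically from `/etc/group`. A minimal sketch, assuming the standard four-field `name:password:GID:user1,user2` format; the function names are hypothetical. Note it inspects only the supplementary member list: a user whose *primary* GID in `/etc/passwd` were `splitway` would not show up here, so a full check would also scan `/etc/passwd`.

```rust
/// Return the supplementary member list of `name` from /etc/group-formatted
/// text, or None when the group does not exist.
fn group_members(group_file: &str, name: &str) -> Option<Vec<String>> {
    for line in group_file.lines() {
        let fields: Vec<&str> = line.split(':').collect();
        if fields.len() == 4 && fields[0] == name {
            let members: Vec<String> = fields[3]
                .split(',')
                .filter(|m| !m.is_empty())
                .map(str::to_owned)
                .collect();
            return Some(members);
        }
    }
    None
}

/// Invariant: the `splitway` group exists and has no supplementary members,
/// so a 0660 root:splitway socket is reachable by root only.
fn empty_group_invariant_holds(group_file: &str) -> bool {
    matches!(group_members(group_file, "splitway"), Some(m) if m.is_empty())
}
```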
…tests
New packaging.yml (no secrets — gates the PR). The publish/sign job is added in the next commit.
- meta: compute the channel version once (compute-version.sh). Release (push to master) = clean <X.Y.Z>; dev/PR/dispatch = <X.Y.Z>~dev.<utc>.<sha>, which sorts below the release in dpkg and rpm.
- build-core (amd64 + arm64): `cross` builds splitway-daemon + splitway CLI musl-static, asserts `file` reports "statically linked", then cargo-deb / cargo-generate-rpm with --target (remaps target/release + sets arch) and the stamped version. Emits deb + rpm + tarball.
- build-gui (amd64 + arm64): builds the egui binary INSIDE debian:bullseye (glibc 2.31 floor, so the libc6 (>= 2.31) / rpm GLIBC requires are true), arm64 on a native arm64 runner (no QEMU). Rewrites the core-dependency floor to the built version for dev channels, then packages on the host.
- test-install (debian:bookworm, ubuntu:22.04, fedora:latest): installs the built artifacts directly; asserts the binaries run, the unit validates (systemd-analyze verify), the GUI pulls splitway + the GL deps, the empty `splitway` group exists with no members, and the .desktop validates.
- test-signed-repo: generates a THROWAWAY gpg key, builds + signs local apt (build-apt-repo.sh) and dnf (build-dnf-repo.sh) repos from the artifacts, serves them over localhost, and installs with signature verification ON — proving metadata + signing + verify end-to-end with no production secret. (Those two repo scripts are reused by the real publish job next commit.)
- test-arm64-smoke: best-effort arm64 deb install under QEMU.
Validated locally: actionlint clean (incl. shellcheck of run blocks); all helper scripts shellcheck-clean; the deb/rpm builds + --target/--deb-version/--set-metadata flags exercised against cargo-deb 3.7.0 / cargo-generate-rpm 0.21.0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018GKGiqawfPb5fzMGHisrrF
Adds the publish-pages job (secrets) and the signing plumbing reused by the
PR-time ephemeral-key test.
- publish-pages: runs on push to master (release) / dev (dev) and dispatch,
NEVER on pull_request, gated behind the build + test jobs. Serialized by a
single `pages-deploy` concurrency group (cancel-in-progress: false) so two
deploys queue rather than clobber. Imports the RSA signing key, checks out
(or bootstraps) the persistent gh-pages branch, drops the artifacts into the
correct channel pool, regenerates + signs ONLY that channel's apt + dnf
metadata, publishes the armored pubkey to splitway.gpg, renders index.html,
and commits + pushes — MERGE, never wipe, so old versions and the other
channel survive. Release additionally attaches the tarballs to the v<ver>
GitHub Release. A post-deploy smoke waits for Pages to go live, then installs
from the real repo with signature verification ON (apt + dnf).
- build-apt-repo.sh / build-dnf-repo.sh: optional SPLITWAY_GPG_PASSFILE feeds
the real key's passphrase via loopback (never on a command line); unset for
the passphrase-less ephemeral key.
- render-index.sh: the Pages landing page with per-distro, per-channel
add-repo snippets + the key fingerprint.
- RSA fix: the throwaway test key (and, by requirement, GPG_PRIVATE_KEY) is RSA
— rpm --addsign only produces a verifiable signature with RSA; an EdDSA key
silently yields no RPMTAG_RSAHEADER (found + fixed via local signing tests).
Validated locally: apt InRelease + Release.gpg verify Good; per-arch Packages
filtering correct; dnf repomd.xml.asc verifies; RSA-signed rpm passes rpm -K
("digests signatures OK"). actionlint + shellcheck clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018GKGiqawfPb5fzMGHisrrF
Closes Arch via a self-hosted, GPG-signed pacman repo (reusing the apt/dnf key + Pages), since AUR registration is disabled. x86_64 for the hosted repo; aarch64 users use the in-repo splitway-bin PKGBUILD.
PKGBUILDs in packaging/aur/ (also usable now via `makepkg -si`):
- splitway: source build of daemon + CLI from the release tag; makedepends cargo; ships both binaries + unit; splitway.install prints the enable hint (Arch policy: no auto-enable); optdepends networkmanager / systemd-resolvconf.
- splitway-bin: prebuilt from the release tarball (x86_64 + aarch64); provides/conflicts splitway.
- splitway-gui: source egui GUI; depends splitway + libglvnd/libxkbcommon/wayland/libx11/libxcursor/libxi/libxrandr; optdepends xdg-desktop-portal; splitway-gui.install mirrors the deb/rpm empty-group + drop-in invariant.
CI (packaging.yml):
- build-arch: archlinux container, non-root makepkg of the two source PKGBUILDs from THIS checkout (clean version — pacman has no ~dev channel because vercmp does not treat ~ as a pre-release marker), validates .SRCINFO + namcap (advisory). x86_64 .pkg.tar.zst artifacts.
- test-arch: pacman -U local install (asserts binaries + unit + empty group), then a throwaway-RSA-key signed repo — repo-add --sign, pacman-key --add/--lsign, SigLevel = Required DatabaseOptional, pacman -Sy with verification ON.
- publish-pages (release only): copies the .pkg.tar.zst into the persistent arch/release/x86_64 subtree, detach-signs each with the real key, repo-add (in a container) incrementally (old packages preserved), replaces Pages-hostile db/files symlinks with real signed copies, reuses splitway.gpg. Post-deploy: live `pacman -Sy splitway` with verification ON.
- render-index.sh: Arch section (signed repo primary, makepkg alternative, AUR pending).
DEFERRED (not here): the automated AUR ssh push — blocked on AUR registration reopening. The in-repo PKGBUILDs are the bridge.
Validated locally: all PKGBUILD/.install bash-syntax-clean; actionlint clean; index renders with $arch left literal for pacman.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018GKGiqawfPb5fzMGHisrrF
Docs only — no code.
- README: Install sections for apt / dnf (both channels) and Arch (signed pacman repo primary, in-repo PKGBUILD makepkg alternative; AUR pending), with the key-verification note. NixOS section left as-is.
- packaging/README.md: a Distribution packages (deb / rpm / pacman) section — the two-package split, dev vs release channels + ~dev versioning, the glibc 2.31 floor, the dlopen'd GL deps, the GUI socket-group drop-in + empty-group invariant, and the pacman specifics (x86_64-only, no ~dev channel, detached signing, repo-add incrementality).
- docs/design/linux-distro-packaging.md: the durable record — decisions 1-7, the channel/version topology, dep lists, signing + merge mechanics across all three formats (incl. the RSA-not-EdDSA and cargo-deb-vs-generate-rpm path-resolution gotchas), the two-layer test design, the pacman-now/AUR-later Arch strategy, and the signing key.
- ROADMAP.md: Phase 6 marked done (Linux), noting the two-package design supersedes the original one-package sketch; macOS Homebrew + the automated AUR push remain deferred.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018GKGiqawfPb5fzMGHisrrF
No behavior change for the current (validated) inputs — robustness only, flagged by an adversarial self-review of the diff:
- build-apt-repo.sh: `break` after the first Architecture line in the per-arch Packages filter (the canonical value; a valid stanza has exactly one).
- packaging.yml: `g` flag on the GUI core-dep-floor seds (defensive if more `splitway (>=` references are ever added).
- packaging.yml: comment that test-arm64-smoke is intentionally not a publish gate (best-effort under QEMU, continue-on-error).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018GKGiqawfPb5fzMGHisrrF
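The per-arch Packages filter with the first-match `break` can be sketched as follows. This assumes stanzas are separated by a blank line, and (as an assumption not stated in the commit) keeps architecture-independent `all` packages in every per-arch index; the function names are illustrative, not build-apt-repo.sh's.

```rust
/// First `Architecture:` value of a Packages stanza. A valid stanza has exactly
/// one; stopping at the first mirrors the `break` in the shell filter.
fn stanza_arch(stanza: &str) -> Option<&str> {
    stanza
        .lines()
        .find_map(|l| l.strip_prefix("Architecture:"))
        .map(str::trim)
}

/// Keep the stanzas for `arch` (plus, by assumption, `all`), blank-line separated.
fn filter_packages(packages: &str, arch: &str) -> String {
    packages
        .split("\n\n")
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .filter(|s| matches!(stanza_arch(s), Some(a) if a == arch || a == "all"))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```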
Resolves the Codex + self-review findings on the Phase 6 packaging PR.
Build-blocking CI fixes:
- static-linkage gate: assert no dynamic interpreter instead of requiring the literal "statically linked" (musl x86_64 is "static-pie linked") — unblocks build-core (amd64) and the whole deb/rpm pipeline
- test-arch: read the GPG fingerprint from fpr field 10, not 5 (5 is empty)
- AUR splitway-bin: rename splitway.install -> splitway-bin.install so install="$pkgname.install" resolves (makepkg no longer aborts)
- build-gui: CARGO_NET_RETRY + a retrying `cargo fetch --locked` before the offline build, hardening the transient arm64 crates.io failure
Supply-chain / correctness:
- apt: dearmor the published (armored) key into the binary keyring at every consumption site (works on every apt version; the published file stays armored for rpm --import / pacman-key)
- publish: fail-closed signature verification (apt InRelease+Release.gpg, dnf repomd + per-rpm RSA header, pacman db+pkg sigs) BEFORE the gh-pages push
- apt Release: Valid-Until (APT_VALID_DAYS, default 90d; LC_ALL=C) for freeze/replay protection
- dev-floor sed: anchor to the dependency lines (leave the comment), and assert the floor was actually stamped
- signed-repo tests (apt/dnf/pacman) now install splitway-gui too, exercising the GUI package under signature enforcement
Arch / consistency / docs:
- splitway-gui PKGBUILD: splitway>=$pkgver floor + hicolor-icon-theme, desktop-file-utils
- new check-pkgver-sync.sh (meta job): daemon version == every PKGBUILD pkgver
- render-index/README: dearmor --yes for apt, pacman-key --init, dev-channel keyring install
- splitway-bin SKIP digest documented + tracked for the deferred AUR-push phase
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
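The test-arch fix relies on the layout of `gpg --with-colons` output, where the `fpr` record carries the fingerprint in field 10 (1-indexed) and field 5 is empty. A minimal sketch of that read; the function name is hypothetical and the CI step itself is shell.

```rust
/// First fingerprint in `gpg --with-colons` output: field 10 of an `fpr`
/// record, i.e. index 9 after splitting on ':'.
fn first_fingerprint(colons: &str) -> Option<&str> {
    colons
        .lines()
        .filter(|l| l.starts_with("fpr:"))
        .filter_map(|l| l.split(':').nth(9))
        .find(|f| !f.is_empty())
}
```

Reading field 5 instead would always yield the empty string on an `fpr` line, which is exactly the failure the commit fixes.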
…oling
Follow-up to the second Codex pass on PR #36.
- splitway-bin: `provides=("splitway=$pkgver")` (was unversioned). pacman only satisfies a versioned dependency from a versioned provision, so splitway-gui's new `splitway>=$pkgver` floor could not install against the prebuilt core.
- test-signed-repo: install `rpm` in the tooling step. build-dnf-repo.sh needs `rpm --addsign`; it passes today only because ubuntu-latest pre-installs rpm — make it explicit (matches the publish job) so the dnf signing path is robust.
- packaging/README.md: document the one-time config relocation for users who ran the daemon by hand as root before packaging (old XDG fallback /root/.config/splitway -> /var/lib/splitway). No maintainer-script migration: no published package used the old path, it is config-not-read (not data loss), and the deb core postinst is cargo-deb-generated (#DEBHELPER#) from an empty dir, so a hand-written migration would risk breaking systemd enablement that the docker install-test (no systemd) could not catch.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The `meta` gate (check-pkgver-sync.sh) requires every packaging/aur/*/PKGBUILD `pkgver=` to equal the daemon version, but release.yml's bump-version job only bumped splitway-daemon/Cargo.toml. So the first post-release auto-bump would leave the three PKGBUILDs behind and fail every later packaging run in `meta` until a human hand-edited all three.
- Add packaging/ci/sync-pkgver.sh: the write side of the lockstep invariant (symmetric with check-pkgver-sync.sh, same daemon-version read). It stamps the daemon version into each PKGBUILD `pkgver=` and resets `pkgrel=1` (Arch convention on a version change). The `$pkgver`-derived fields (source URLs, provides, depends floor) follow automatically.
- release.yml bump-version: run sync-pkgver.sh after the Cargo.toml bump and `git add` the PKGBUILDs into the same commit.
Verified locally: no-op when already in sync (no diff), correct bump on a simulated 0.0.5 -> 0.0.6, and check-pkgver-sync.sh passes against the result.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
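The write side described above (stamp `pkgver=`, reset `pkgrel=1` only on a real change, leave `$pkgver`-derived lines alone) can be sketched as a pure text transform. This assumes unquoted `pkgver=`/`pkgrel=` assignments at the start of a line; sync-pkgver.sh itself is shell and may differ in detail.

```rust
/// Stamp `version` into `pkgver=` and reset `pkgrel=1`; if pkgver already
/// matches, return the input unchanged (no-op, no diff). Lines that merely
/// reference `$pkgver` are untouched and follow the new value automatically.
fn sync_pkgver(pkgbuild: &str, version: &str) -> String {
    let current = pkgbuild.lines().find_map(|l| l.strip_prefix("pkgver="));
    if current == Some(version) {
        return pkgbuild.to_string();
    }
    pkgbuild
        .lines()
        .map(|l| {
            if l.starts_with("pkgver=") {
                format!("pkgver={version}")
            } else if l.starts_with("pkgrel=") {
                "pkgrel=1".to_string()
            } else {
                l.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
        + "\n"
}
```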
…review)
Two findings from Codex's re-review of b326036:
- P2 (splitway-gui/Cargo.toml): the deb declared libgl1 but not libegl1, and the rpm required mesa-libGL but not mesa-libEGL. glow/glutin creates its GL context via EGL on Wayland (glutin_egl_sys is in Cargo.lock; GLX on X11), and libgl1 does NOT pull libegl1 — so a minimal Wayland-only install would succeed and then fail at GL-context creation. Add libegl1 (deb) and mesa-libEGL (rpm); both pull the libglvnd EGL loader, mirroring how libgl1/mesa-libGL pull GLX. Arch is unaffected (libglvnd already provides libGL + libEGL). Doc + comments updated.
- P1 (packaging.yml test-arch): the local `pacman -U` smoke preinstalled the GL/X11/wayland libs but not hicolor-icon-theme/desktop-file-utils, which the splitway-gui PKGBUILD declares as hard deps. `pacman -U` on local files only auto-resolves deps from a synced repo DB, so the smoke must not lean on that — preinstall the full declared set explicitly, matching the PKGBUILD.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iew round 3)
Codex re-review (1) + self-review (4) of 6de36d7:
- P2 (Codex, splitway-gui): add the X11 xkb runtime lib. winit's default `x11` feature has xkbcommon-dl dlopen libxkbcommon-x11.so (confirmed in Cargo.lock), a SEPARATE package from libxkbcommon on Debian (libxkbcommon-x11-0), Fedora (libxkbcommon-x11) AND Arch (libxkbcommon-x11 — verified via the Arch package API: it provides libxkbcommon-x11.so and is NOT bundled in the base libxkbcommon). Without it an X11 session can fail to load the lib before the window opens. Added to the deb Depends, rpm Requires, the Arch GUI PKGBUILD depends, and both Arch CI preinstall lists.
- P2 (postinst): the groupadd under `set -e` was the only unguarded mutating step. Keep the abort (the drop-in is meaningless without the group) but make it deliberate + diagnosable with an explicit error message + exit.
- P2 latent (version reads): `grep '^version' … | head -1` SIGPIPEs grep under `set -o pipefail` if a second match ever appears. Replaced with a SIGPIPE-free `awk -F'"' '/^version/{print $2; exit}'` in compute-version.sh, check-pkgver-sync.sh (+ the pkgver read), sync-pkgver.sh and release.yml (x3).
- P2 (splitway-bin): surface the same networkmanager / systemd-resolvconf optdepends as the source splitway PKGBUILD (identical daemon, same prereqs).
- P2 (README niri): the interim egui GUI now sets `.with_app_id(...)` and the packaged splitway-gui ships the matching .desktop + icons, so the app-id window rule is no longer Tauri-only; clarified the packaged GUI is the egui build.
Also quoted "$GITHUB_OUTPUT" in the release.yml blocks touched above (pre-existing SC2086).
actionlint + shellcheck clean; awk reads verified to yield 0.0.5 and check-pkgver-sync passes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
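For readers less familiar with awk field splitting, the replacement read `awk -F'"' '/^version/{print $2; exit}'` behaves like the sketch below: take the first line starting with `version`, split it on `"`, return field 2 (index 1), and stop. The function name is illustrative only.

```rust
/// Rust equivalent of `awk -F'"' '/^version/{print $2; exit}'`:
/// first matching line only, so a later `version...` line can never interfere.
fn first_version(cargo_toml: &str) -> Option<&str> {
    cargo_toml
        .lines()
        .find(|l| l.starts_with("version"))
        .and_then(|l| l.split('"').nth(1))
}
```

The "stop at first match" part is what removes the pipeline: there is no downstream `head -1` closing the pipe early, so nothing can receive SIGPIPE.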
…emon (Codex P2)
The post-release auto-bump stamped the AUR PKGBUILDs with the *next* (unreleased) daemon version, so `makepkg -si` from master pointed at a `v$pkgver` tag / release assets that do not exist yet.
- release.yml: run sync-pkgver.sh BEFORE the daemon bump so the PKGBUILDs pin the just-released version; bump the daemon afterwards for the next cycle.
- check-pkgver-sync.sh: validate the pinned pkgver names a release tag that EXISTS (all three PKGBUILDs agreeing), with a no-tags bootstrap fallback — instead of requiring equality with the in-tree daemon version.
- packaging.yml: fetch-depth: 0 on the meta checkout so the gate sees tags.
- docs/design: record the invariant and why the gate checks existence, not "latest" (avoids spurious dev/PR and release-window failures).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EKokH93URrCizdQdVpNCNM
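The revised gate logic (all pinned pkgvers agree, the shared value names an existing `v<pkgver>` tag, and a repo with no tags at all passes on agreement alone) can be sketched as a pure function. This is an illustration under those stated rules, with hypothetical names; check-pkgver-sync.sh reads the tag list from git, which is why the meta checkout needs `fetch-depth: 0`.

```rust
#[derive(Debug, PartialEq)]
enum Gate {
    Pass,
    Disagree,
    MissingTag(String),
}

/// `pkgvers`: the pkgver pinned by each checked PKGBUILD.
/// `tags`: every tag in the repository.
fn check_pkgver_sync(pkgvers: &[&str], tags: &[&str]) -> Gate {
    let first = match pkgvers.first() {
        Some(v) => *v,
        None => return Gate::Pass, // nothing to check
    };
    if pkgvers.iter().any(|v| *v != first) {
        return Gate::Disagree;
    }
    if tags.is_empty() {
        return Gate::Pass; // bootstrap: no release has been tagged yet
    }
    let wanted = format!("v{first}");
    if tags.iter().any(|t| *t == wanted) {
        Gate::Pass
    } else {
        Gate::MissingTag(wanted)
    }
}
```

Checking existence rather than "equals the latest tag" is what keeps the gate quiet on dev/PR runs and during the window between tagging and bumping.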
… daemon (Codex P2)
Follow-up to pinning the AUR pkgver to the released tag: the daemon version now intentionally runs ahead of the committed pkgver. build-arch pre-placed the checkout tarball as <pkg>-$VERSION.tar.gz (daemon version), but the PKGBUILD's source= expects <pkg>-$pkgver.tar.gz. When the two diverge makepkg can't find the local archive and silently downloads the old v$pkgver tag, building stale code while the smoke test still passes.
Read pkgver from each PKGBUILD and key both the archive filename and the git-archive --prefix on it, so the pre-placed checkout is always the source makepkg uses. Drop the now-unused VERSION env.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EKokH93URrCizdQdVpNCNM
…inned tag (Codex P1)
The committed PKGBUILD pkgver intentionally lags to the last released tag (so a user's `makepkg -si` resolves an existing tag), but the hosted pacman repo must ship the version being released. The prior approach keyed the build on the committed pkgver, so a release push would publish Arch packages under the PREVIOUS version and Arch users would never receive the new one.
build-arch now stamps the daemon/meta version into the EPHEMERAL PKGBUILD (never committed) and pre-places the matching <pkgname>-<VERSION>.tar.gz so makepkg builds THIS checkout as VERSION — no tag download, correct published version. The committed PKGBUILDs and the check-pkgver-sync.sh gate are untouched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EKokH93URrCizdQdVpNCNM
`pacman -Sy <pkg>` refreshes the sync databases without upgrading, leaving the DB ahead of installed packages and resolving deps from a partial-upgrade state. Use `-Syu` in the hosted-index and README install snippets. The CI smoke tests keep `-Sy` (throwaway containers, not user systems).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01EKokH93URrCizdQdVpNCNM
origin/master carried splitway-daemon Cargo.toml=0.0.6 with a stale Cargo.lock=0.0.5: release.yml's post-release auto-bump rewrote the toml but never relocked. The dev->master merge tree inherited that drift, so `cargo fetch --locked` (build-arch's makepkg prepare) and `cargo build --frozen` (build-gui) aborted with "cannot update the lock file ... because --locked was passed" — the red build-arch / build-gui checks on PR #37.
- Sync this branch's Cargo.lock to daemon 0.0.6 (merged from master), so the merge tree is internally consistent and the packaging jobs build.
- release.yml bump-version: add a toolchain, then `cargo update -p splitway-daemon --precise "$NEW_VERSION"` and stage Cargo.lock in the bump commit, so master never drifts again.
- The bot bump commit gets `[skip ci]`: a GITHUB_TOKEN push already does not re-trigger workflows, but the marker makes it explicit (and holds if the push ever moves to a PAT), so the next-cycle bump can never publish an unreleased X.Y.(Z+1) to the stable channels (Codex P1).
- ci.yml: a `cargo metadata --locked` guard fails loudly on any future Cargo.lock drift instead of letting it surface as a packaging abort.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The post-deploy steps poll the live Pages URL for 10 min and `exit 1` on timeout. On the first (root-commit) gh-pages deploy — Pages not yet enabled/propagated — the package is never served in the window, so the whole publish-pages job is reported failed even though the signed deploy already pushed and verified cleanly.
Downgrade both liveness timeouts (apt/dnf smoke and the pacman smoke) to `::warning` + `exit 0`, skipping the live install. The pre-push "Verify all signatures (fail closed)" step still gates integrity before the push, and the actual install stays fatal once the repo IS live — only the propagation-timeout path is now non-fatal, so a verified publish is never reported as failed.
Note: enabling GitHub Pages for the gh-pages branch is a one-time repo setting (Settings -> Pages -> Branch: gh-pages); nothing in CI can do it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
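The split between a non-fatal propagation timeout and a fatal install failure can be modelled as below. This is a sketch only: the probe and install are injected closures standing in for the HTTP check and the apt/dnf/pacman install, the attempt count and interval are parameters rather than the workflow's 10-minute window, and the names are hypothetical.

```rust
use std::thread::sleep;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Liveness {
    Live,
    TimedOut,
}

/// Poll `probe` up to `attempts` times, sleeping `interval` between tries.
fn wait_for_pages(mut probe: impl FnMut() -> bool, attempts: u32, interval: Duration) -> Liveness {
    for i in 0..attempts {
        if probe() {
            return Liveness::Live;
        }
        if i + 1 < attempts {
            sleep(interval);
        }
    }
    Liveness::TimedOut
}

/// Exit code for the smoke step: a timeout only warns (GitHub Actions
/// `::warning::` workflow command); once live, the install result decides.
fn smoke_exit_code(liveness: Liveness, install: impl FnOnce() -> bool) -> i32 {
    match liveness {
        Liveness::TimedOut => {
            println!("::warning::Pages not live yet; skipping live install smoke");
            0
        }
        Liveness::Live => if install() { 0 } else { 1 },
    }
}
```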
build-dnf-repo.sh: the single PASS_OPT string relied on word-splitting (SC2086 disabled) at the gpg call. Split it: pass_opt[] (array) for the direct gpg detach-sign — space-robust, mirroring build-apt-repo.sh's gpg_sign — and pass_macro (plain string, byte-identical to the original) for the rpm __gpg_sign_cmd macro, which rpm tokenizes itself and cannot take a bash array. The passphrase FILE path is mktemp-derived and never has spaces, so the macro keeps the proven form (zero regression on the real-key publish path, which has never run in CI).
compute-version.sh: append GITHUB_RUN_NUMBER to the dev pkgver -> `<ver>~dev.<utc>.<run>.<sha>`. Two pushes in the same UTC second now order monotonically (run number is compared numerically) instead of falling back to the non-monotonic lexical short-sha compare. Defaults to 0 outside Actions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
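Why the run number fixes the ordering is easiest to see with dpkg's comparison rules: digit runs compare numerically, non-digit runs compare by character weight, and `~` sorts before everything, even the end of the string. The sketch below ports dpkg's upstream-version comparison (no epoch or revision handling, since these dev strings carry neither); rpm's rpmvercmp also treats `~` as sorting first but is not reproduced here. The digit-only `<utc>` value in the usage is a hypothetical stand-in for whatever format compute-version.sh emits.

```rust
use std::cmp::Ordering;

/// dpkg character weight: '~' lowest, then end/digits (0), letters, other symbols.
fn order(c: Option<u8>) -> i32 {
    match c {
        None => 0,
        Some(b'~') => -1,
        Some(c) if c.is_ascii_digit() => 0,
        Some(c) if c.is_ascii_alphabetic() => c as i32,
        Some(c) => c as i32 + 256,
    }
}

/// Port of dpkg's verrevcmp for upstream-version strings.
fn verrevcmp(a: &str, b: &str) -> Ordering {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    let (mut i, mut j) = (0usize, 0usize);
    let digit = |s: &[u8], k: usize| k < s.len() && s[k].is_ascii_digit();
    while i < a.len() || j < b.len() {
        // Non-digit run: compare weights character by character.
        while (i < a.len() && !digit(a, i)) || (j < b.len() && !digit(b, j)) {
            let (ac, bc) = (order(a.get(i).copied()), order(b.get(j).copied()));
            if ac != bc {
                return ac.cmp(&bc);
            }
            i += 1;
            j += 1;
        }
        // Digit run: skip leading zeros, then compare numerically.
        while a.get(i) == Some(&b'0') { i += 1; }
        while b.get(j) == Some(&b'0') { j += 1; }
        let mut first_diff = 0i32;
        while digit(a, i) && digit(b, j) {
            if first_diff == 0 {
                first_diff = a[i] as i32 - b[j] as i32;
            }
            i += 1;
            j += 1;
        }
        if digit(a, i) { return Ordering::Greater; } // longer number is larger
        if digit(b, j) { return Ordering::Less; }
        if first_diff != 0 { return first_diff.cmp(&0); }
    }
    Ordering::Equal
}

/// Hypothetical helper producing the `<ver>~dev.<utc>.<run>.<sha>` shape.
fn dev_version(base: &str, utc: &str, run: u64, sha: &str) -> String {
    format!("{base}~dev.{utc}.{run}.{sha}")
}
```

With the run number in place, two builds from the same second compare on `41` vs `42` numerically; without it, the comparison reaches the short SHA, where an `f...` hash outranks a `0...` hash regardless of push order.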
The design doc's GUI dependency lists had drifted from the manifests:
- Debian: add libegl1 + libxkbcommon-x11-0
- Fedora: add mesa-libEGL + libxkbcommon-x11
- Arch: add libxkbcommon-x11
…with a note on why EGL is separate from GL (EGL context on Wayland) and xkbcommon-x11 separate from xkbcommon (winit x11 dlopen), and that the list is the windowing-lib subset (the PKGBUILD also carries the icon / desktop-file install-hook deps).
Also update the now-stale dev version format `<X.Y.Z>~dev.<utc>.<sha>` -> `<X.Y.Z>~dev.<utc>.<run>.<sha>` in the channel-topology table and in packaging/README.md to match compute-version.sh.
splitway-gui/Cargo.toml: document the rpm Recommends asymmetry — the deb offers gtk|wlr|kde but a cargo-generate-rpm `key="*"` table has no boolean-OR, so it names the GTK portal (a weak dep that works on most desktops); a passthrough rich-dep is left out until verified to emit correctly through cargo-generate-rpm.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The post-release auto-bump only updates master, so after each release dev falls one version behind (exactly the drift that left this branch at 0.0.5 while master was at 0.0.6). Add a `sync-dev` job (`needs: bump-version`) that opens — or refreshes — a PR merging master back into dev so dev tracks the next dev cycle automatically.
A PR, not a direct push: dev follows the branch→PR workflow and may be protected. The job no-ops when dev already contains master, reuses an open sync PR instead of duplicating, and enables auto-merge best-effort (lands once mergeable where the repo allows it, otherwise waits for a one-click merge). The PR is opened by GITHUB_TOKEN, which does not trigger pull_request CI on itself — documented inline, with the PAT escape hatch for anyone who wants required checks to gate the sync merge.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The new `Verify Cargo.lock` step resolves against the crates.io index on a cold runner, so a transient blip (observed: `OpenSSL ... unexpected eof` fetching the tauri-codegen index entry) could flake it — unlike the rest of this repo's network steps, which already retry. Wrap it to match that pattern: CARGO_NET_RETRY=10 plus an outer retry that rides out a transient registry error, while still failing FAST on a real drift (the `--locked` refusal, which no retry can fix) with a clear relock hint.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
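The retry policy described above hinges on telling a transient registry error apart from a genuine lockfile refusal. A minimal sketch, with the command abstracted as a closure returning (success, stderr). Matching on the substring `--locked was passed` is an assumption drawn from the error text quoted in the earlier Cargo.lock-drift commit; cargo's exact wording differs between versions, so the real step may match differently.

```rust
#[derive(Debug, PartialEq)]
enum StepResult {
    Ok,
    Drift,
    Transient,
}

fn classify(exit_ok: bool, stderr: &str) -> StepResult {
    if exit_ok {
        StepResult::Ok
    } else if stderr.contains("--locked was passed") {
        StepResult::Drift // a retry can never fix this
    } else {
        StepResult::Transient
    }
}

/// Retry transient failures up to `max_attempts`; fail fast on drift.
fn run_with_retry(mut attempt: impl FnMut() -> (bool, String), max_attempts: u32) -> Result<(), String> {
    for n in 1..=max_attempts {
        let (ok, stderr) = attempt();
        match classify(ok, &stderr) {
            StepResult::Ok => return Ok(()),
            StepResult::Drift => {
                return Err("Cargo.lock is out of date: relock locally and commit Cargo.lock".into())
            }
            StepResult::Transient if n < max_attempts => continue,
            StepResult::Transient => {
                return Err(format!("registry still failing after {n} attempts: {stderr}"))
            }
        }
    }
    Err("no attempts made".into())
}
```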
The source PKGBUILDs fetch the `v$pkgver` tag archive (generated the instant the tag exists), but splitway-bin fetches the release ASSET tarballs (splitway-$pkgver-linux-*.tar.gz) that packaging.yml attaches later and independently of release.yml's bump. So stamping splitway-bin at tag time could point it at not-yet (or, on a failed upload, never-) uploaded assets while check-pkgver-sync.sh still passes on tag existence alone (Codex P2).
- sync-pkgver.sh: skip splitway-bin — only advance the tag-archive source PKGBUILDs at release time.
- check-pkgver-sync.sh: exclude splitway-bin from the shared-version / tag check; it is allowed to lag, and tag-existence is necessary-but-not-sufficient for an asset-based fetch.
splitway-bin's pkgver + real sha256sums are both knowable only after the assets are published, so they are owned together by the deferred asset-aware AUR-push automation (already referenced in its PKGBUILD's TRUST ASSUMPTION note). splitway-bin is not on AUR yet, so there is no live 404 surface today.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…odex P2)
The GUI dlopens libwayland-egl.so.1 (glutin's Wayland EGL platform binds the GL surface to the wl_surface via wl_egl_window_*), but the manifest declared only libwayland-client0 / libwayland-client — and the client lib does NOT pull the EGL platform lib. So on a minimal Wayland host the package installs and passes the `command -v` smoke tests, then fails at GL-context creation. Verified the binary dlopens it: `strings splitway-gui | grep libwayland-egl` -> "Library libwayland-egl.so could not be loaded." + libwayland-egl.so.1.
- deb depends: add libwayland-egl1
- rpm requires: add libwayland-egl
- Arch is already covered: the `wayland` package ships libwayland-egl.so.1 alongside libwayland-client.
- Manifest comments + docs/design dep lists updated with the rationale (separate from libwayland-client0, same pattern as libegl1 vs libgl1).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…p (Codex P2) The GUI maintainer scripts run groupadd/groupdel to create and remove the opt-in `splitway` socket group, but neither package declared the provider: groupadd is in `shadow-utils` (Fedora/RHEL) and `passwd` (Debian). On a minimal image lacking it:
- rpm: %post fails NON-fatally, so the install "succeeds" but leaves the socket-group drop-in with no group; the daemon then exits on `--socket-group splitway` and users can't opt in (Codex P2).
- deb: the postinst `exit 1`s, so the install fails loudly instead.
Declare the dep on both for parity (present on normal systems, but now guaranteed on minimal ones):
- rpm `[requires]`: add `shadow-utils`
- deb `depends`: add `passwd`
Manifest comments + the design-doc socket-group note updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#   Cargo.lock
#   packaging/aur/splitway-gui/PKGBUILD
#   packaging/aur/splitway/PKGBUILD
#   splitway-daemon/Cargo.toml
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bfc4180a35
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
@codex review
💡 Codex Review
Reviewed commit: 5055580c3d
ROADMAP.md linked [`docs/design/X.md`](design/X.md) — the target dropped the docs/ prefix, so it resolved to a non-existent top-level design/ directory (a dead link). Codex flagged the macos-self-install entry; the same bug was in three sibling links (gui-core-extraction, tauri-mutations, tauri-design-window). Fix all four so the targets match the displayed docs/design/ paths. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Reviewed commit: eca3d2a08d
… (Codex P2) assert_root_only_path validated the LEXICAL parent of /usr/local/bin, so a symlinked install dir (e.g. /usr/local/bin -> /Users/alice/bin) passed the check while the real target lived in a user-writable tree — letting that user rename/replace the target and have launchd exec their binary as root. The lexical parent being root-only already stops a non-root user from swapping the symlink itself; this closes the other half by resolving the physical path (cd && pwd -P) and verifying the real target's ancestor chain too. A non-symlink layout resolves to the same path, so the extra check is a no-op there. Design doc updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
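The lexical-plus-physical check can be sketched in Rust to make the two halves concrete. This is an illustration, not the bootstrap.sh code (which is bash). The `DirInfo` type, the injected `resolve`/`lookup` closures, and the rule "owned by uid 0 and not group- or world-writable" are assumptions that stand in for the script's owner and mode checks. In a real binary, `resolve` would be `std::fs::canonicalize`, the analogue of `cd && pwd -P`.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Owner uid and mode bits for one directory (hypothetical helper type).
#[derive(Clone, Copy, Debug)]
pub struct DirInfo {
    pub uid: u32,
    pub mode: u32,
}

/// Every directory from `dir` up to `/` must be root-owned and not writable by
/// group or other (assumed policy, standing in for the script's checks).
fn check_chain(dir: &Path, lookup: &dyn Fn(&Path) -> io::Result<DirInfo>) -> Result<(), String> {
    for d in dir.ancestors() {
        let info = lookup(d).map_err(|e| format!("{}: {e}", d.display()))?;
        if info.uid != 0 {
            return Err(format!("{} is not root-owned", d.display()));
        }
        if info.mode & 0o022 != 0 {
            return Err(format!("{} is group- or world-writable", d.display()));
        }
    }
    Ok(())
}

/// Checks the lexical parent chain AND the physical (symlink-resolved) target.
pub fn assert_root_only_path(
    path: &Path,
    resolve: &dyn Fn(&Path) -> io::Result<PathBuf>,
    lookup: &dyn Fn(&Path) -> io::Result<DirInfo>,
) -> Result<(), String> {
    // Lexical parent: stops a non-root user from swapping the symlink itself.
    if let Some(parent) = path.parent() {
        check_chain(parent, lookup)?;
    }
    // Physical target: stops a symlink that points into a user-writable tree.
    let real = resolve(path).map_err(|e| format!("{}: {e}", path.display()))?;
    check_chain(&real, lookup)
}
```

For a non-symlink layout, `resolve` returns the same path, so the second chain only re-checks directories that the first chain and the target itself already cover.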
…ocker (Codex P2) The macOS PermissionDenied blocker assumed the in-app installer had run (user added to the splitway group, just needs a re-login) and offered only sign-out guidance. A user on the manual/sudo daemon — whose socket is root-only (no --socket-group splitway) — also lands in PermissionDenied, where signing out can never grant access; they were stuck with no path forward. The view-model can't tell the two cases apart, so macOS now offers BOTH: the sign-out guidance and the idempotent Install/repair action (re-running the installer migrates the manual daemon to the group-reachable socket). Body reworded to be accurate for both paths; stage test updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codex review
💡 Codex Review
Reviewed commit: 3a27b7cfa9
…corp domains The macOS backend only scoped the corp domains via /etc/resolver, which is insufficient against a VPN client that hijacks the system *default* resolver (corp DNS is the global default, scoped to no utun, so non-corp DNS would also traverse the tunnel). This adds the missing demote, reaching DNS parity with Linux: the corp resolver sees only the configured corp domains; everything else resolves off-tunnel.
- Detection is rewritten to be structural and vendor-neutral: it reads the per-service DNS model (a service whose resolver differs from the physical link's is the VPN) instead of filtering `scutil --dns` by a utun. It reads the VPN signal from the VPN's own service, NOT the global default Splitway mutates, so our own demote cannot make detection oscillate.
- apply_rules demotes the primary service's DNS to an off-tunnel fallback (the physical interface's DHCP resolver, or a configured `fallback_dns` override), snapshotting the prior value to disk first; revert_rules restores it on every exit path. Transactional across scope+demote (a demote failure rolls back the scope); the on-disk snapshot survives an unclean exit.
- The state machine is decoupled from `vpn_name` on macOS via the existing reverts_globally() seam; Linux stays interface-keyed and unchanged.
- New seams (ScutilRunner, SnapshotStore) make the demote/restore wiring unit-testable without touching the live system; detection parsing + decision are pure. Tests use synthetic fixtures only.
DNS only: no IP-route manipulation (the client already splits IP; same boundary as Linux). The GUI interface-picker becomes a benign no-op on macOS (removal is a later GUI phase). Live packet/reconnect/revert acceptance is machine-bound and verified separately. See docs/design/macos-dns-privacy.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…eholder The scrub guard's internal-TLD check (check #1 in scripts/check-no-leaked-infra.sh) flags any label followed by `.corp` (e.g. `jira.corp`), so the placeholder `jira.corp.example.com` tripped it in CI even though it is synthetic. Plain `corp.example.com` is fine (corp is the leading label, no preceding label+dot). Swap the second corp test domain to `jira.example.com`, which is equally synthetic and passes the guard. Test/doc-only; no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
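The rule as described (a `corp` label is flagged only when another label precedes it) can be sketched as a label check. This is a hypothetical Rust rendering for illustration only. The real guard is a regex in scripts/check-no-leaked-infra.sh and may match more broadly than this.

```rust
/// Hypothetical label-based reading of the internal-TLD rule: `corp` is flagged
/// only when some label precedes it (`jira.corp...`), never as the leading label.
pub fn flags_internal_tld(domain: &str) -> bool {
    domain.split('.').skip(1).any(|label| label == "corp")
}
```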
…a primary-service change (Codex P1/P2)
Two correctness fixes from PR review of the macOS demote:
P1 — the demote rebuilt the primary service's DNS dict with `d.init` +
`ServerAddresses` only, dropping `InterfaceName`. The detector identified the
physical service by `InterfaceName == PrimaryInterface`, so the next detection
round after our own demote could fail to find the physical service (or pick the
VPN service if it also reported the primary interface), inverting corp/fallback
or undoing the demote — the exact oscillation the per-service model exists to
prevent. Now:
- the demote re-adds the captured `InterfaceName` (`build_set_dns_script` takes
it; the snapshot stores it);
- detection anchors the physical service on the authoritative `PrimaryService`
id (falling back to the interface name only when the id is unknown) and
excludes it by id from the VPN-service search, so a VPN service that also
reports the primary interface can never be mistaken for physical.
P2 — when the primary network service changes while the VPN stays up (e.g. Wi-Fi
→ Ethernet) and a snapshot already existed, `demote` skipped snapshotting the new
key but still overwrote it; a later `restore` only replayed the old snapshot,
leaving the new primary pinned to the fallback. Now a demote whose primary key
differs from the snapshot restores the old service first, then snapshots and
demotes the new one — exactly one service demoted at a time, none stranded.
Snapshot format gains a self-describing `<tag>\t<value>` form (key/iface/server)
so the optional interface line is unambiguous. New unit tests cover interface
preservation, service-id anchoring (incl. a VPN sharing the primary interface),
the interface-name fallback, and the primary-service-change restore-then-snapshot
path. Tests/synthetic-fixtures only; four Cargo checks + scrub green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…+ DNS-dict preservation (Codex P1/P1/P2) Three correctness fixes from PR review of the macOS DNS privacy work.
P1 — `scutil list State:/Network/Service/.*/DNS` does NOT print bare keys: each match is a prefixed row (`subKey [n] = State:/.../DNS`). `read_dns_model` passed the whole row to `show`, so every per-service read returned `No such key`, the service model stayed empty, and `current_vpn_state()` reported `Down` even with the VPN DNS present; the detector was effectively inert. Parse the key out of each row (new `parse_list_key`, unit-tested) and filter to `State:/Network/Service/` keys before showing them.
P1 — `decide` treated any non-physical service whose DNS differs from the physical resolver as a VPN. On a multi-homed Mac (Ethernet primary while Wi-Fi stays associated with its own DHCP DNS) the secondary service differs, so with no VPN running the daemon would apply the corp domains to the secondary resolver and report VPN up. The differing service must now also be the default-resolver hijacker (`is_default_resolver_hijacker`): it rides the primary interface's default route, runs on a tunnel pseudo-interface (utun/ppp/ipsec/tun/tap), or is unscoped (no interface). A service bound to a distinct hardware interface (a second Ethernet/Wi-Fi/cellular link, a VM/Thunderbolt bridge) is a parallel network and is excluded. The hijacker filter precedes the difference check, so a real VPN is still found when a secondary network is also present, regardless of order. Matching is by interface kind (the stable BSD driver prefix), never a vendor string, so detection stays vendor-neutral and does not read the mutable global default (no oscillation under our own demote).
P2 — `build_set_dns_script` used `d.init`, starting from an empty dictionary, so the `set` replaced the whole service DNS dict and dropped DHCP/VPN-provided fields (`SearchDomains`, `DomainName`, `SupplementalMatchDomains`, `SearchOrder`) both while demoted and on restore, breaking local search-domain behaviour until the next network reconfiguration. The script now `get`s the service's current dict into the working buffer before overriding only `ServerAddresses`, so unmanaged fields ride through; `InterfaceName` is preserved by `get` (still re-added as a belt-and-suspenders). Restore of a service that had no prior `ServerAddresses` now `d.remove`s only our override after the `get` instead of removing the whole key, so the rest of the dict survives. New/updated unit tests cover list-row parsing, the hijacker predicate (incl. a secondary physical network, a tunnel VPN alongside one, and an unscoped hijacker), and the get-before-override / remove-only-our-servers script shapes. Tests/synthetic-fixtures only (RFC 5737 placeholders). Verified: fmt, Linux build/clippy/test, and x86_64-apple-darwin cargo check + clippy -D warnings (--all-targets) all green; macOS test execution runs on CI. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
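The three pure pieces this commit describes can be sketched in Rust. The function names come from the commit, but the signatures and bodies here are assumptions. `parse_list_key` takes everything after ` = ` in a `subKey [n] = State:/...` row. The hijacker test compares the interface's BSD driver prefix (its name minus the trailing unit number) against the tunnel kinds listed above, and it approximates "rides the primary interface's default route" as "is bound to the primary interface". The script builder emits standard `scutil` script commands (`get`, `d.add`, `d.remove`, `set`).

```rust
/// Tunnel pseudo-interface driver kinds named in the commit message.
const TUNNEL_KINDS: [&str; 5] = ["utun", "ppp", "ipsec", "tun", "tap"];

/// Extracts the key from one `scutil list` row such as
/// `  subKey [0] = State:/Network/Service/<id>/DNS`, keeping only per-service keys.
pub fn parse_list_key(row: &str) -> Option<&str> {
    let (_, key) = row.split_once(" = ")?;
    let key = key.trim();
    key.starts_with("State:/Network/Service/").then_some(key)
}

/// The BSD driver kind is the interface name without its unit number
/// (`utun3` -> `utun`, `en0` -> `en`).
fn driver_kind(iface: &str) -> &str {
    iface.trim_end_matches(|c: char| c.is_ascii_digit())
}

/// A differing service only counts as the VPN if it can hijack the default
/// resolver: unscoped, on the primary interface, or on a tunnel interface.
pub fn is_default_resolver_hijacker(iface: Option<&str>, primary_iface: &str) -> bool {
    match iface {
        None => true,                          // unscoped: no interface binding
        Some(i) if i == primary_iface => true, // shares the primary default route
        Some(i) => TUNNEL_KINDS.contains(&driver_kind(i)),
    }
}

/// Builds an `scutil` script that overrides only `ServerAddresses`: `get` loads
/// the service's current dictionary so unmanaged fields ride through `set`.
/// `servers == None` removes just our override (restore with no prior servers).
pub fn build_set_dns_script(key: &str, servers: Option<&[&str]>, iface: Option<&str>) -> String {
    let mut script = format!("get {key}\n");
    match servers {
        Some(list) => script.push_str(&format!("d.add ServerAddresses * {}\n", list.join(" "))),
        None => script.push_str("d.remove ServerAddresses\n"),
    }
    if let Some(name) = iface {
        // Belt-and-suspenders re-add; `get` normally preserves it already.
        script.push_str(&format!("d.add InterfaceName {name}\n"));
    }
    script.push_str(&format!("set {key}\n"));
    script
}
```

Keeping all three pure means the row parsing, the classification, and the exact script text can be unit-tested with synthetic fixtures, without a live `scutil`.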
…lush cache on restore-only revert (Codex P2/P2)
Two more P2 findings from the re-review of the macOS demote.
P2 — the same-service snapshot could go stale. When the same primary service
stayed active but its DHCP resolver changed (a new lease) and the watcher
re-emitted Up, `demote` kept the original snapshot, so a later `restore` wrote the
pre-change resolver back over the current DHCP one — disconnecting the VPN could
leave macOS on the previous network's DNS. The same-service branch now reads the
service's live DNS: if it differs from our fallback (the real resolver changed
under us) it refreshes the snapshot to the current resolver; if it still shows our
fallback it keeps the original (so we never snapshot our own fallback as the
"prior").
P2 — a restore-only revert skipped the cache flush. `revert_rules` flushed only
when resolver files were removed, but a revert also restores any demoted default.
If the files were already gone (e.g. a prior partial revert removed them before a
failed `scutil` restore, then a retry succeeds) the default DNS changed while
`removed == 0`, so the stale cache kept serving demoted/default answers.
`demote::restore` now reports whether it restored a snapshot, `revert_with`
returns `RevertOutcome { removed, restored_default }`, and `revert_rules` flushes
when either changed.
New/updated unit tests cover same-service keep-vs-refresh and a revert that
reports a restore with zero files removed. Tests/synthetic-fixtures only (RFC 5737
placeholders). Verified: fmt and x86_64-apple-darwin cargo check + clippy
-D warnings (--all-targets) green; macOS test execution runs on CI.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
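The flush decision reduces to a small predicate over the revert outcome. In this sketch, `revert_with` and `flush_dns_cache` are modeled as injected closures. That is an assumption, because the real functions take backend state that is not shown here.

```rust
/// What a revert changed, mirroring `RevertOutcome { removed, restored_default }`.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct RevertOutcome {
    pub removed: usize,
    pub restored_default: bool,
}

impl RevertOutcome {
    /// The cache is stale if resolver files went away OR the default DNS changed.
    pub fn needs_flush(&self) -> bool {
        self.removed > 0 || self.restored_default
    }
}

/// Hypothetical wrapper: run the revert, then flush only when something changed.
pub fn revert_rules(
    revert_with: impl FnOnce() -> Result<RevertOutcome, String>,
    flush_dns_cache: impl FnOnce(),
) -> Result<RevertOutcome, String> {
    let outcome = revert_with()?;
    if outcome.needs_flush() {
        flush_dns_cache();
    }
    Ok(outcome)
}
```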
…back + watch Global/IPv4 (Codex P2/P2) Two more P2 findings from the re-review.
P2 — a demote-failure rollback during a re-apply wiped the working scope. `apply_with`'s rollback removed every managed `/etc/resolver` file, so if a re-apply's scope write succeeded but the demote then failed (a transient `scutil` hiccup), the previously-working split DNS (for unchanged AND dropped domains) was deleted until a later retry, a half-configured state. `apply_to_dir` now returns the journal of every change it made (files overwritten, created, and, newly journaled, pruned), and `apply_with` rolls back exactly that on a demote failure, restoring the prior scope file-for-file. `remove_managed` gained an optional journal parameter so a prune is undoable along with the writes.
P2 — the watcher missed the Global IPv4 key that detection reads. `current_vpn_state` reads `State:/Network/Global/IPv4` for `PrimaryService`/`PrimaryInterface`, but the watch only armed Global DNS plus the per-service/interface patterns. macOS can switch the primary route between already-configured services by updating only `Global/IPv4` without touching any watched DNS key, leaving the daemon asleep with a stale primary-service decision and demote target. Add `State:/Network/Global/IPv4` to the watched keys.
New test: a re-apply whose demote fails restores the overwritten file's prior content and re-creates the pruned file. Tests/synthetic-fixtures only (RFC 5737 placeholders). Verified: fmt and x86_64-apple-darwin cargo check + clippy -D warnings (--all-targets) green; macOS test execution runs on CI. `watch.rs` is FFI glue (intentionally not unit-tested).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
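The journal-and-rollback idea can be sketched against an in-memory stand-in for `/etc/resolver`. `Change`, `FakeDir`, `write_file`, and `rollback` are hypothetical names. Only `apply_with`, `remove_managed`, and its optional journal parameter come from the commit. Undoing the journal newest-first restores the prior scope file-for-file, including pruned files.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the managed `/etc/resolver` directory: file name -> contents.
pub type FakeDir = HashMap<String, String>;

/// One undoable change made while applying the scope.
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    Overwritten { path: String, prior: String },
    Created { path: String },
    Pruned { path: String, prior: String },
}

/// Writes one resolver file and journals how to undo it.
pub fn write_file(dir: &mut FakeDir, journal: &mut Vec<Change>, path: &str, contents: &str) {
    match dir.insert(path.to_string(), contents.to_string()) {
        Some(prior) => journal.push(Change::Overwritten { path: path.to_string(), prior }),
        None => journal.push(Change::Created { path: path.to_string() }),
    }
}

/// Prunes a managed file; with a journal, the prune becomes undoable too.
pub fn remove_managed(dir: &mut FakeDir, journal: Option<&mut Vec<Change>>, path: &str) {
    if let Some(prior) = dir.remove(path) {
        if let Some(journal) = journal {
            journal.push(Change::Pruned { path: path.to_string(), prior });
        }
    }
}

/// Undoes exactly the journaled changes, newest first.
pub fn rollback(dir: &mut FakeDir, journal: Vec<Change>) {
    for change in journal.into_iter().rev() {
        match change {
            Change::Overwritten { path, prior } | Change::Pruned { path, prior } => {
                dir.insert(path, prior);
            }
            Change::Created { path } => {
                dir.remove(&path);
            }
        }
    }
}

/// Scope write + prune, then demote; a demote failure restores the prior scope.
pub fn apply_with(
    dir: &mut FakeDir,
    desired: &[(&str, &str)],
    stale: &[&str],
    demote: impl FnOnce() -> Result<(), String>,
) -> Result<(), String> {
    let mut journal = Vec::new();
    for &(path, contents) in desired {
        write_file(dir, &mut journal, path, contents);
    }
    for &path in stale {
        remove_managed(dir, Some(&mut journal), path);
    }
    if let Err(err) = demote() {
        rollback(dir, journal);
        return Err(err);
    }
    Ok(())
}
```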
…tity + reconcile an orphaned demote at startup (Codex P2/P2) Two more P2 findings from the re-review. P2 — the secondary-network filter read `InterfaceName` from the wrong key. The per-service DNS entity does not reliably carry `InterfaceName` (its schema is the DNS fields), so a secondary Wi-Fi/Ethernet service often had none in its DNS dict → `read_dns_model` stored `None` → `is_default_resolver_hijacker(None)` treated it as an unscoped hijacker → a multi-homed Mac with no VPN could be detected VPN-up and point the corp domains at the secondary DHCP resolver. Read the interface from the service's `State:/Network/Service/<id>/IPv4` (then IPv6) entity — the reliable binding — falling back to the DNS dict only if neither names one. P2 — an unclean exit could strand a demote. The on-disk demote snapshot survives a SIGKILL, but a suppressed initial `Down` sample (`last == None`) meant that if the daemon was killed while demoted and the VPN went down before restart, nothing restored the default resolver or cleared the snapshot — it stayed pinned to the off-tunnel fallback until the next VPN up→down cycle. `run_state` now reconciles orphaned persisted state once at startup, before arming the watch, for global-revert backends (macOS): an idempotent revert that clears any leftover snapshot and restores the prior default, after which the watch's initial `Up` (if any) re-applies. Per-interface backends (Linux) keep no such cross-restart state and skip it; `applied` stays `None` (this only clears a prior process's state). New daemon-core tests (run on Linux): the startup cleanup reverts for a global-revert backend and is a no-op for a per-interface one. The detector interface read is I/O glue (untested, like the rest of `read_dns_model`). Verified: fmt, Linux clippy + daemon tests (174 pass), and x86_64-apple-darwin clippy -D warnings (--all-targets). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…change keeps the real prior + retry a failed startup demote cleanup (Codex P2/P2) Two more P2 findings, both follow-ups on the previous re-review fixes. P2 — a `fallback_dns` change while demoted could pin the wrong default. The same-service snapshot refresh compared the service's live DNS against the NEW `fallback`. When `fallback_dns` changed (F1→F2) while the VPN was active, the service still showed our PREVIOUS fallback F1; since F1 != F2 the refresh captured F1 as `prior_servers`, so a later restore wrote our old fallback instead of the original DHCP resolver. `DemoteSnapshot` now records `installed_fallback` (the fallback we actually wrote, updated only after a successful write); the refresh compares the live DNS against that, so a previous fallback we installed is no longer mistaken for a DHCP update. A genuine DHCP change is still adopted. P2 — a failed startup cleanup was never retried. The startup reconcile of orphaned demote state only logged on failure; with `applied == None`, later reconciles take `revert()`'s no-op path, so a transient failure left the machine pinned to the fallback until a full VPN up→down cycle. A `pending_global_cleanup` flag now records the failure, and every `revert()` retries the global cleanup until it succeeds (cleared on success, or once an apply establishes our own state). New tests: a `fallback_dns` change keeps the original prior (macOS, darwin cross-check) and a failed startup cleanup is retried on the next reconcile (daemon core, runs on Linux). Verified: fmt, Linux clippy + daemon tests (175 pass), and x86_64-apple-darwin clippy -D warnings (--all-targets). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…al prior when the installed-fallback record was lost (Codex P2/P2) Two reversibility gaps in the demote path, both around the installed_fallback bookkeeping added in 4a8a1b9: - Removing (or changing) `fallback_dns` while the VPN stays up did not re-demote to the real DHCP resolver. After a demote the detector reads the demoted value back as `demote_target`, so `effective_demote_target`'s fallthrough keeps returning the override and `desired()` stays equal to the applied target. Treat a `fallback_dns` change as re-arm-worthy in `watch_settings_changed`: the re-arm reverts first (restoring the real prior DHCP resolver from the demote snapshot) then re-samples, so the next demote targets the correct resolver. Harmless on Linux (ignores fallback_dns). - A same-service re-demote could capture our own fallback as `prior_servers` when the post-write `installed_fallback` record had been lost (e.g. a transient /var/run write error after the `set` succeeded): the live servers then differ from the empty recorded fallback and look like a DHCP update. Also treat the live servers as ours when they equal the `fallback` about to be (re)installed, so a retry is recognised regardless of whether the installed_fallback record persisted. Adds a regression test for each. Verified with `cargo fmt` and `cargo clippy --target x86_64-apple-darwin -p splitway-daemon --all-targets -- -D warnings`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019HLLqoCWZb1fAqYPsw1hug
Follow-up on the startup orphaned-state retry: `pending_global_cleanup`'s retry was only wired through `revert()`, which `shutdown()` skipped when `applied` is `None`. So if the macOS startup cleanup failed transiently and the daemon was stopped before any later reconcile/Down/disable called `revert()`, shutdown reported the system clean while a stale demote snapshot / default DNS could remain. `shutdown()` now calls `revert()` when `applied` is set OR a global cleanup is pending — `revert()` runs the pending cleanup and returns `Err` (-> unclean) if it still fails, so shutdown no longer falsely reports clean. New daemon-core tests (run on Linux): shutdown runs the pending cleanup when nothing is applied, and reports unclean when it still fails. Verified: fmt, Linux clippy + daemon tests (177 pass), and x86_64-apple-darwin clippy -D warnings. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
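The cleanup bookkeeping from the last few commits can be modeled as a small state machine. The `Backend` trait, the `Daemon` struct, and the `String` stand-in for the applied target are hypothetical. The flag name, the `applied`/`revert()`/`shutdown()` split, and the retry rules follow the commit text. The sketch also folds in a later fix that clears the flag after a successful applied-path revert on a global-revert backend.

```rust
/// Minimal backend seam for the sketch.
pub trait Backend {
    /// macOS-style backends revert global state (the demote), not just per-interface rules.
    fn reverts_globally(&self) -> bool;
    fn revert_rules(&mut self) -> Result<(), String>;
}

pub struct Daemon<B: Backend> {
    pub backend: B,
    /// Hypothetical stand-in for whatever the daemon records as applied.
    pub applied: Option<String>,
    pub pending_global_cleanup: bool,
}

impl<B: Backend> Daemon<B> {
    /// Reconcile a previous process's orphaned state once, before arming the watch.
    pub fn startup_cleanup(&mut self) {
        if self.backend.reverts_globally() && self.backend.revert_rules().is_err() {
            self.pending_global_cleanup = true; // retried by every later revert()
        }
    }

    pub fn revert(&mut self) -> Result<(), String> {
        if self.applied.is_some() {
            self.backend.revert_rules()?;
            self.applied = None;
            if self.backend.reverts_globally() {
                // The global revert also cleared whatever the flag was tracking.
                self.pending_global_cleanup = false;
            }
        } else if self.pending_global_cleanup {
            self.backend.revert_rules()?;
            self.pending_global_cleanup = false;
        }
        Ok(())
    }

    /// Returns `true` only when the system is known to be clean.
    pub fn shutdown(&mut self) -> bool {
        if self.applied.is_some() || self.pending_global_cleanup {
            return self.revert().is_ok();
        }
        true
    }
}
```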
…the service IPv4/IPv6 keys (Codex P2/P2) - read_dns_model now propagates a `scutil show` error instead of skipping the service. A missing/vanished key is reported as `No such key` on stdout with exit 0 (→ Ok, then empty servers), so an Err here is a genuine command failure (spawn error or non-zero exit). Skipping it silently dropped a live service from the model — e.g. the VPN service read after a successful `list` — letting `decide` conclude "down" and revert the rules. Propagating makes the watcher keep the last known state on a transient failure. - The SCDynamicStore watch now also subscribes to `State:/Network/Service/.*/IPv4` and `/IPv6`. The detector reads each service's `InterfaceName` from those entities (`read_service_interface`) to classify physical vs. an unscoped default hijacker; a service gaining/losing its interface binding without a DNS or global change must re-trigger detection, else it stays classified from stale data (an unknown interface misread as a hijacker → false "VPN up"). Verified with `cargo fmt` and `cargo clippy --target x86_64-apple-darwin -p splitway-daemon --all-targets -- -D warnings`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019HLLqoCWZb1fAqYPsw1hug
…+ surface a pending demote-cleanup in status (Codex P2/P2) Two more P2s from the re-review.
P2 (parser) — when `PrimaryService` is absent, `decide` anchored the physical service on the FIRST service matching the primary interface. A VPN service can also report that interface, so if `scutil list` returned it first, the physical service became the corp resolver and the VPN service became the DHCP resolver, inverting corp/fallback (apply corp domains to the off-tunnel DNS, demote the default to the corp DNS: a leak). The no-`PrimaryService` fallback now anchors only when EXACTLY ONE DNS service reports the primary interface; more than one is ambiguous -> conservative Down.
P2 (state) — `routing_state()` ignored the new `pending_global_cleanup` flag, so a restart that left an orphaned macOS demote pending reported `VpnDown`/`Disabled`/`NoDomains` while the default DNS could still be demoted. The flag now joins `needs_resync`/`orphaned` in the out-of-sync check, so status reads `ApplyFailed` until the cleanup succeeds.
Tests: an ambiguous primary-interface fallback is Down while the unambiguous one (VPN on a tunnel) is still Up (parser, darwin cross-check); status reads `ApplyFailed` while a global cleanup is pending (daemon core, runs on Linux). Verified: fmt, Linux clippy + daemon tests (178 pass), and x86_64-apple-darwin clippy -D warnings.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
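The no-`PrimaryService` fallback can be sketched as a pure function that anchors only on an unambiguous match. `ServiceDns` and `anchor_physical` are hypothetical names. The sketch leaves out the separate hijacker and difference checks that `decide` also applies.

```rust
/// One service from the per-service DNS model (hypothetical shape).
#[derive(Debug, Clone)]
pub struct ServiceDns<'a> {
    pub id: &'a str,
    pub iface: Option<&'a str>,
}

/// Picks the physical service: by `PrimaryService` id when known, otherwise by
/// the primary interface, but only if exactly one service reports it.
pub fn anchor_physical<'a>(
    services: &[ServiceDns<'a>],
    primary_service: Option<&str>,
    primary_iface: &str,
) -> Option<&'a str> {
    if let Some(id) = primary_service {
        return services.iter().find(|s| s.id == id).map(|s| s.id);
    }
    let mut matches = services.iter().filter(|s| s.iface == Some(primary_iface));
    let first = matches.next()?;
    if matches.next().is_some() {
        return None; // ambiguous: the caller treats this as a conservative Down
    }
    Some(first.id)
}
```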
…util demote (P2) The configured `fallback_dns` override is folded into the demote-target and, on macOS, appended verbatim into the root `scutil` demote script. A non-IP entry (a typo, a hostname, whitespace, or a newline) would malform that script rather than fail cleanly — sitting against the "a failed apply must never leave the system half-configured" bar. - Add `splitway_shared::config::is_ip_literal` (strict `IpAddr` parse — rejects surrounding/embedded whitespace and newlines, so it doubles as an injection guard). Pure and unit-tested. - `effective_demote_target` now accepts the override only when every entry is an IP literal; otherwise it logs and falls back to the detector's demote-target (the physical DHCP resolver), so a bad config degrades to auto-detection rather than a broken/mangled demote. Adds tests for the helper and for the override-rejection path. Verified with `cargo fmt` and `cargo clippy --target x86_64-apple-darwin -p splitway-daemon -p splitway-shared --all-targets -- -D warnings`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_019HLLqoCWZb1fAqYPsw1hug
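Here is a sketch of the validation, assuming the override and the detector's target are both plain lists of strings. Rust's `IpAddr` parser already rejects leading, trailing, or embedded whitespace, so a single `parse` doubles as the injection guard. The `effective_demote_target` signature and the `eprintln!` that stands in for the daemon's logger are assumptions.

```rust
use std::net::IpAddr;

/// Strict IP-literal check: no hostnames, no whitespace, no newlines.
pub fn is_ip_literal(s: &str) -> bool {
    s.parse::<IpAddr>().is_ok()
}

/// Uses the `fallback_dns` override only when every entry is an IP literal;
/// otherwise degrades to the detector's demote target.
pub fn effective_demote_target(override_dns: &[String], detected: &[String]) -> Vec<String> {
    if !override_dns.is_empty() && override_dns.iter().all(|s| is_ip_literal(s)) {
        return override_dns.to_vec();
    }
    if !override_dns.is_empty() {
        eprintln!("ignoring fallback_dns: not every entry is an IP literal");
    }
    detected.to_vec()
}
```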
feat(daemon): macOS DNS privacy — demote the hijacked default + scope corp domains (Phase 7d-4)
stslex left a comment
Full review of the dev → master release diff (34 files, +4,610/−325: macOS DNS-privacy demote/detector rework, macOS self-install, GUI blockers). Findings were each independently verified against the head checkout (a58c2d4) before posting; candidates that did not survive adversarial verification (e.g. a suspected GUI hostPlatform race and a suspected scoped-VPN detection regression — both refuted by sequencing/is_tunnel_interface handling respectively) are not reported. Existing Codex threads on bootstrap.sh hardening were not re-litigated.
Inline comments, most severe first:
- `demote.rs` snapshot-refresh branch is dead in the default (no `fallback_dns`) config: a DHCP change while demoted later restores a stale resolver (P1).
- `demote.rs` changed-primary/restore re-creates a phantom `State:` DNS key for a torn-down service, which can produce a false `Up` pointing corp domains at a dead resolver (P1).
- `run_scutil` misses scutil's exit-0 stdout error mode: a failed `set` is recorded as success, and `restore()` then clears the snapshot after an unverified write (P2).
- A post-`set` bookkeeping failure in `demote()` makes `apply_with` roll back the resolver scope but not the demote, a half-configured state persisting until the next event (P2).
- A successful applied-path revert never clears `pending_global_cleanup`, leaving status stuck on "out of sync" (P3, self-healing).
- The design doc overstates the `vpn_name` decoupling: an empty `vpn_name` still disables macOS detection entirely (P3).
- Duplicated helpers in `demote.rs` (`same_server_set`, `run_scutil`) plus a hand-rolled snapshot format whose no-serde rationale is void (P3, cleanup).
- Two non-placeholder IPs (`1.1.1.1`, `1.2.3.4`) against the CLAUDE.md redaction rule (P3, conventions).
Minor notes (no inline thread):
- `backend.rs` `revert_with`: a `demote::restore` failure propagates after `remove_managed` already deleted resolver files, skipping `flush_dns_cache()` for that cycle. This is bounded (the retry flushes), but the old code had no fallible step between removal and flush.
- `render.ts`: the macOS `TransientError` blocker renders with neither `command` nor `action`, a dead-end panel if the daemon wedges (`NotRunning`/`PermissionDenied` both offer install/repair); its title/body also duplicate `notRunningBlocker`'s Linux copy verbatim.
- `demote()` re-loads the snapshot store from disk three times per apply; threading the value through a local would drop two reads.
- The launchd label, the `/usr/local/bin` path, and the `splitway` group are independently hard-coded across the plist, `bootstrap.sh`, `build-macos-app.sh`, and `tauri.bundle.macos.json`, held in sync only by comments; worth a single source of truth before these drift.
The startup global cleanup (revert-on-boot while a VPN is up) was examined and accepted as the documented, bounded trade-off — only note is that the crash-restart-while-up window is acknowledged in a code comment but not in the design doc.
Generated by Claude Code
… scutil seam (review P1/P1/P2/P2/P3) Five coupled review findings on the phase-7d demote (backend/macos): - P1 stale prior: the same-service refresh keyed on `current != fallback`, which is always false in the DEFAULT config (fallback == demote_target == the physical resolver), so a DHCP change while demoted never refreshed the snapshot and a later restore pinned a stale/dead resolver. The refresh now keys on the previously-recorded `installed_fallback` (using the about-to- install fallback only as a backstop when that record is empty), and rejects a `current` equal to the corp DNS (a hijacker rewriting the physical service between samples) -- corp_dns is threaded in from apply_with's scope servers. - P1 phantom restore: restoring a departed service (configd tore its keys down on a Wi-Fi->Ethernet switch) re-created a phantom State: DNS key the detector read as an unscoped hijacker -> a false "VPN up" at a dead resolver. restore_snapshot now skips the write when the key is gone (new ScutilRunner::key_exists), covering both restore() and the primary-changed branch. - P2 scutil stdout errors: scutil script mode reports a failed `set` on stdout with exit 0, so a failed demote was recorded as success. RealScutil::run_script now scans stdout (pure, tested `scutil_set_error`): success is silent, only the benign `No such key` from the first-time `get` is allowed; anything else fails. - P2 inverse half-state: the post-write installed_fallback save propagated Err after the demote's set already took effect, so apply_with rolled back only the /etc/resolver scope -> default demoted but corp domains unscoped (a leak). That save is now best-effort; the demote stays applied and the dedup hint is re-derived on the next re-demote (the same-service branch already tolerates a lost record). - P3 cleanup: share parser::same_set and detector::scutil_script (drop the demote copies, via pub(crate) re-exports) and serde-JSON the snapshot (drop the hand-rolled tab serializer). 
Also redacts the demote garbage-test fixture IP to RFC 5737. Tests added for each new behaviour; the macOS-gated code type-checks under cargo clippy --target x86_64-apple-darwin and its tests run on CI's macos runner. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Sn5f9GAgWkv6RSZVP5LDEv
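Two of the pure decisions above can be sketched in Rust. `scutil_set_error` follows the stated rule: silence is success, and `No such key` from the first-time `get` is the only tolerated line. The refresh predicate keys on the recorded `installed_fallback`, falls back to the about-to-install fallback only when that record is empty, and refuses to adopt the corp DNS. The `should_refresh_snapshot` name, the `String` server lists, and the order-insensitive comparison are assumptions.

```rust
use std::collections::BTreeSet;

/// scutil's script mode can exit 0 yet report a failure on stdout. Success is
/// silent; the only tolerated line is `No such key` from a first-time `get`.
/// Returns the first offending line, if any.
pub fn scutil_set_error(stdout: &str) -> Option<&str> {
    stdout
        .lines()
        .map(str::trim)
        .find(|line| !line.is_empty() && *line != "No such key")
}

/// Order-insensitive comparison of two resolver lists.
fn same_set(a: &[String], b: &[String]) -> bool {
    a.iter().collect::<BTreeSet<_>>() == b.iter().collect::<BTreeSet<_>>()
}

/// Decides whether a same-service re-demote should adopt the live servers as
/// the new restore target.
pub fn should_refresh_snapshot(
    current: &[String],
    installed_fallback: &[String],
    fallback: &[String],
    corp_dns: &[String],
) -> bool {
    // Key on what we actually wrote; the about-to-install fallback is only a
    // backstop when that record was lost.
    let ours = if installed_fallback.is_empty() { fallback } else { installed_fallback };
    if same_set(current, ours) {
        return false; // still showing our own fallback: keep the original prior
    }
    // A hijacker rewriting the physical service must never become the "prior".
    !same_set(current, corp_dns)
}
```

In the default config, `fallback` tracks the live physical resolver, so comparing against it would never detect a DHCP change. Comparing against the recorded `installed_fallback` is what lets the refresh fire.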
…h revert (review P3) An apply-fail -> revert-success sequence on a global-revert backend (macOS) left pending_global_cleanup set, so routing_state() reported ApplyFailed on a system the revert had actually cleaned, until some later event ran the applied==None branch. The applied-path revert-success arm now clears the flag when the backend reverts globally (its revert_rules restores the demote snapshot the flag tracks), mirroring the existing apply-success clear. Test added. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Sn5f9GAgWkv6RSZVP5LDEv
…sables detection (review P3) The "no longer depends on vpn_name" note was overstated: the value is ignored, but vpn_name is still load-bearing as arm_watch's arming switch (an empty name -> DetectorHealth::Inactive, no watch), and a fresh macOS self-install boots there. Reworded to state the residual dependency and the picker-removal follow-up caveat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Sn5f9GAgWkv6RSZVP5LDEv
…s (review P3) config/mod.rs doc used a real resolver; completed the redaction across the phase-7d macOS surface (detector/macos/state.rs, backend/macos/resolver.rs) so every fixture uses an RFC 5737 placeholder, per CLAUDE.md's redaction policy. (The demote.rs garbage-test fixture was redacted in the demote commit.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Sn5f9GAgWkv6RSZVP5LDEv
assert_root_only_path checked owner + mode bits but not macOS ACLs, which are invisible to the mode bits and survive chmod -- so a root:wheel 0755 ancestor with an ACL granting another user write/delete still lets them swap the root-run daemon directory. New acl_allows_nonroot_write parses `ls -lde` (splitting on the allow/deny keyword so a spaced principal name can't slip through) and flags any non-root/wheel `allow` entry carrying a write-class right, fail-closed. It now guards every ancestor; install strips BIN_DIR's own ACL with `chmod -N` and re-verifies. bash -n passes; the parser is checked against synthetic ls -lde samples. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01Sn5f9GAgWkv6RSZVP5LDEv
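The ACL check can be sketched in Rust to make the parsing rule concrete. The real helper is bash. The ACE line shape (` 0: user:alice allow add_file,delete_child`), the handling of an `inherited` marker, and the list of write-class rights are assumptions based on the right names documented in macOS `chmod(1)`, not taken from the script. The sketch splits on the last `allow`/`deny` keyword and treats any ACE line it cannot parse as unsafe.

```rust
/// Write-class ACL rights (assumed list, from the macOS chmod(1) right names).
const WRITE_RIGHTS: [&str; 10] = [
    "write", "append", "delete", "delete_child", "add_file",
    "add_subdirectory", "writeattr", "writeextattr", "writesecurity", "chown",
];
/// Principals allowed to hold write rights on the root-run binary's ancestors.
const TRUSTED: [&str; 2] = ["user:root", "group:wheel"];

/// Returns true if any ACE in `ls -lde` output grants a non-root/wheel
/// principal a write-class right. Fail-closed on unparseable ACE lines.
pub fn acl_allows_nonroot_write(ls_lde: &str) -> bool {
    for line in ls_lde.lines() {
        let Some((index, ace)) = line.trim_start().split_once(": ") else { continue };
        if index.is_empty() || !index.bytes().all(|b| b.is_ascii_digit()) {
            continue; // the `ls -l` header line, not an ACE
        }
        // Split on the LAST keyword so a spaced principal cannot fake one.
        let (at, keyword, is_allow) = match (ace.rfind(" allow "), ace.rfind(" deny ")) {
            (Some(a), Some(d)) if a > d => (a, " allow ", true),
            (Some(_), Some(d)) => (d, " deny ", false),
            (Some(a), None) => (a, " allow ", true),
            (None, Some(d)) => (d, " deny ", false),
            (None, None) => return true, // unparseable ACE: fail closed
        };
        if !is_allow {
            continue;
        }
        let principal = ace[..at].trim_end_matches(" inherited");
        if TRUSTED.contains(&principal) {
            continue;
        }
        let rights = &ace[at + keyword.len()..];
        if rights.split(',').any(|r| WRITE_RIGHTS.contains(&r.trim())) {
            return true;
        }
    }
    false
}
```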