feat(auth): direct-connect via JWT in Fabric Connect mode (+ fix #1333) #1400
dawsontoth wants to merge 6 commits into
Instead of routing every operation through the Fabric Connect proxy, mint a JWT via the proxy (the `create_authentication_tokens` operation, as used by `harper login`) and use it as a Bearer token to talk to the instance directly. The JWT is held in memory only (never persisted) and re-fetched via the proxy after a reload. When direct connect isn't reachable (e.g. CORS/mixed-content from cloud Studio to a local instance), we drop the token and fall back to routing everything through the proxy as before.

- getInstanceClient: route via Bearer token when one is held; precedence is basic-auth / forceFabricConnect, then token-direct, then proxy, then cookie.
- authStore: in-memory token store + establishFabricConnectAuth (proxy -> JWT -> direct, with proxy fallback and in-flight de-duplication). Cleared on logout / sign-out / flag-off.
- instanceLayoutRoute: re-establish direct connect on entry (covers reload and first visit) while preserving the existing "wait for app auth / redirect to sign-in" behavior.
- useInstanceLoginMutation: the Fabric Connect fallback now goes through the same establish path.

Refs #1398

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
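The routing precedence described in the commit above can be sketched as a small pure function. This is only an illustration under stated assumptions: `chooseRoute`, `RouteInputs`, and the field names are hypothetical and not taken from the PR's code, and the relative order of basic-auth and `forceFabricConnect` within the first tier is a guess, since the commit message lists them together.

```typescript
// Hypothetical shapes for illustration; the real getInstanceClient options may differ.
type Route = "basic" | "proxy" | "bearer-direct" | "cookie";

interface RouteInputs {
  basicAuth?: { username: string; password: string }; // explicit credentials
  forceFabricConnect?: boolean; // caller insists on going through the proxy
  operationToken?: string; // in-memory JWT, if one is currently held
  fabricConnect?: boolean; // Fabric Connect mode is enabled for this entity
}

// First tier: basic-auth / forceFabricConnect (order within the tier is assumed).
// Then token-direct, then proxy, then cookie.
function chooseRoute(inputs: RouteInputs): Route {
  if (inputs.forceFabricConnect) return "proxy";
  if (inputs.basicAuth) return "basic";
  if (inputs.operationToken) return "bearer-direct";
  if (inputs.fabricConnect) return "proxy";
  return "cookie";
}
```

With a held token and Fabric Connect enabled, `chooseRoute({ operationToken: "jwt", fabricConnect: true })` selects the direct Bearer route; dropping the token makes the same call fall back to the proxy.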
Code Review
This pull request introduces an in-memory Fabric Connect JWT authentication mechanism to allow direct Bearer token connections to instances, falling back to proxy routing if the direct connection is unreachable. It updates getInstanceClient to handle the token, integrates automatic connection establishment in instanceLayoutRoute, and adds corresponding unit tests. Feedback highlights two improvement opportunities: first, validating that the direct URL is defined and is not a proxy URL before attempting a direct connection to prevent redundant failures and security risks; second, tracking connection attempts in the route loader to avoid duplicate, expensive network requests when an instance is unreachable.
The cluster sign-in route guard decided whether to redirect away from the sign-in form by reading the `context.authentication` router-context snapshot. That snapshot lags the live authStore (authStore notifies its listeners asynchronously), so right after "Direct Sign In" cleared the connection it still showed the old user and bounced the user off the form and back into Fabric Connect.

- Read live authStore state in both sign-in guards instead of the lagged context snapshot, and extract them as testable named functions.
- On the sign-in page, drop the connected user (not just the Fabric Connect flag) so an abandoned sign-in isn't mislabeled as "Direct Connect".

Fixes #1333

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fixes found by an adversarial review pass over the #1398/#1333 work:

- HIGH: reset-password / Finish Setup broke when login fell back to Fabric Connect. The fallback no longer returned a working instanceClient, so the reset flow kept POSTing alter/add/delete user on a stale cookie-mode client (401). The Fabric Connect branch now hands back a client rebuilt from the established auth (Bearer-direct or proxy).
- MEDIUM: the sign-in page cleared the connected user inside a render-phase useMemo keyed on the 10s-polled cluster object, so a poll tick could wipe a just-signed-in user. Moved the user reset into a once-per-id effect; the (idempotent) Fabric Connect flag clear stays in render so the client builds direct.
- SECURITY (low): never send the instance Bearer JWT to the central-manager origin: only attempt direct connect when a concrete direct operations URL is known, otherwise go straight to the proxy. And stop logging the raw axios error (it carries the token in error.config.headers).
- The establish direct probe now forces the operation token so a stale basic-auth entry can't shadow it (new getInstanceClient forceOperationToken).
- The instance sign-in guard keys off the instance's own state, so the parent cluster's Fabric Connect flag no longer suppresses a redirect for a directly-connected instance.
- The route guard no longer attempts establishment twice in one beforeLoad pass.

Tests: cover logout token cleanup, the establish proxy-fallback / no-direct-URL / both-fail-cleanup paths, getInstanceClient disableFabricConnect + cluster proxy path + forceOperationToken, the instance-guard cluster-flag case, and the route-guard orchestration (incl. the no-double-establish guard).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
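The logging concern in the security item above can be illustrated with a small redaction helper. This is a sketch, not the PR's implementation: `HttpErrorLike` is a hypothetical, trimmed-down stand-in for an axios error, and `summarizeError` is an invented name. The idea is simply to log a summary that never includes `config.headers`.

```typescript
// Hypothetical minimal shape; a real axios error carries many more fields.
interface HttpErrorLike {
  message: string;
  response?: { status: number };
  config?: { url?: string; headers?: Record<string, string> };
}

// Build a log-safe summary: message and status only, no request config or headers.
function summarizeError(err: HttpErrorLike): { message: string; status: number | undefined } {
  const status = err.response ? err.response.status : undefined;
  return { message: err.message, status };
}

// Example error whose headers hold a (fake) Bearer token.
const exampleError: HttpErrorLike = {
  message: "Network Error",
  config: { headers: { Authorization: "Bearer fake-token-for-illustration" } },
};

// Safe to pass to a logger: the Authorization header is not part of the summary.
const safeToLog = JSON.stringify(summarizeError(exampleError));
```

Logging `safeToLog` rather than the error object keeps the token out of console output even when the request itself failed before a response arrived.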
Hardening from PR review: the direct-connect probe already only ran when an operations URL was provided, but a proxy-looking URL passed as operationsUrl would still have been used. Add isDirectOperationsUrl() so a URL containing the central-manager proxy paths (/HDBInstance/ or /Cluster/) can never receive the instance Bearer JWT; it falls straight through to proxy mode instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
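A guard of the kind this commit describes might look roughly like the following. Only the function name and the two path markers come from the commit message; the body is an assumed, minimal reading of it (reject a missing URL, reject any URL containing a proxy path), and the example hosts are made up.

```typescript
// Path fragments that identify central-manager proxy URLs, per the commit message.
const PROXY_PATH_MARKERS = ["/HDBInstance/", "/Cluster/"];

// Assumed behavior: a URL is "direct" only if it is defined and contains no proxy path.
function isDirectOperationsUrl(url: string | undefined): url is string {
  if (!url) return false;
  return !PROXY_PATH_MARKERS.some((marker) => url.indexOf(marker) !== -1);
}

// Hypothetical URLs, for illustration only.
const directOk = isDirectOperationsUrl("https://instance.example.com:9925");
const proxyRejected = isDirectOperationsUrl("https://manager.example.com/HDBInstance/abc/");
const missingRejected = isDirectOperationsUrl(undefined);
```

Because the function is a type guard, a caller that passes the check can use the URL as a `string` without a further undefined check, and every other case falls through to proxy mode.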
Previously an expired direct-connect operation token (Harper default ~1 day) would 401 every request until a full page reload re-minted it. Now:

- create_authentication_tokens captures the refresh token too, kept in memory alongside the operation token.
- A 401 from a direct Bearer request triggers recovery: exchange the refresh token for a new operation token directly at the instance (refresh_operation_token, no proxy round-trip), falling back to a fresh proxy mint if the refresh token is rejected. The request is then replayed once with the new token.
- Recovery is deduplicated across concurrent 401s and capped at a single retry per request to avoid loops.

Closes the deferred item from the PR review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
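The recovery flow above (refresh first, proxy re-mint as fallback, shared across concurrent 401s, one replay per request) can be sketched without any HTTP library. Everything here is hypothetical scaffolding: `createRecovery`, `sendWithRecovery`, `TokenSources`, and `Reply` are invented names, and the two token sources are stand-ins for the real `refresh_operation_token` call and the proxy mint.

```typescript
interface Reply { status: number }
type Send = (token: string) => Promise<Reply>;

interface TokenSources {
  refresh: () => Promise<string>; // stand-in for refresh_operation_token at the instance
  remint: () => Promise<string>;  // stand-in for a fresh mint through the proxy
}

// Returns a recover() that shares one in-flight recovery among concurrent callers.
function createRecovery(sources: TokenSources): () => Promise<string> {
  let inFlight: Promise<string> | null = null;
  return function recover(): Promise<string> {
    if (inFlight) return inFlight;
    // Try the refresh token first; if it is rejected, fall back to a proxy mint.
    const attempt = sources.refresh().catch(() => sources.remint());
    inFlight = attempt;
    const clear = () => { if (inFlight === attempt) inFlight = null; };
    attempt.then(clear, clear); // reset once settled, whether it succeeded or failed
    return attempt;
  };
}

// Sends a request; on a 401, recovers a token and replays exactly once.
async function sendWithRecovery(
  send: Send,
  token: string,
  recover: () => Promise<string>,
): Promise<Reply> {
  const first = await send(token);
  if (first.status !== 401) return first;
  const fresh = await recover(); // rejects if both refresh and re-mint fail
  return send(fresh);            // a second 401 is returned as-is, never retried
}
```

Capping the replay inside `sendWithRecovery` rather than in the recovery function keeps the loop guard per request, while the shared promise keeps the token exchange per session.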
The instance-layout guard checked "am I already authenticated?" against the context.authentication router snapshot, which lags the live authStore (listeners notify asynchronously). Right after a direct sign-in (which sets the connection synchronously) the snapshot still showed no user, so a manager (update perm) fell through to the auto-Fabric-Connect branch, clobbering the just-established direct session. The user then saw "Fabric Connect" instead of "Direct Connect" back on the cluster list.

Read live authStore state (authStore.getConnectionById) for the entity, matching the sign-in guards fixed for #1333.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
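The race behind this fix can be reproduced with a toy store. The sketch below is an assumption-laden model, not the real authStore: `TinyAuthStore` is hypothetical, only the method name `getConnectionById` is borrowed from the commit message, and listener notification is modeled with a microtask.

```typescript
type Connection = { user: string } | undefined;

// Toy store: state changes synchronously, listeners hear about it asynchronously.
class TinyAuthStore {
  private connections = new Map<string, Connection>();
  private listeners: Array<() => void> = [];

  getConnectionById(id: string): Connection {
    return this.connections.get(id);
  }

  setConnection(id: string, connection: Connection): void {
    this.connections.set(id, connection);
    // Deferred notification is what makes any listener-fed snapshot lag.
    Promise.resolve().then(() => this.listeners.forEach((listener) => listener()));
  }

  subscribe(listener: () => void): void {
    this.listeners.push(listener);
  }
}

const store = new TinyAuthStore();

// A snapshot kept up to date by a listener, like a router-context copy of auth state.
let snapshot: Connection = store.getConnectionById("instance-1");
store.subscribe(() => { snapshot = store.getConnectionById("instance-1"); });

// Direct sign-in sets the connection synchronously...
store.setConnection("instance-1", { user: "manager" });

// ...but a guard running right now sees two different answers.
const lagged = snapshot;                            // still the old value
const live = store.getConnectionById("instance-1"); // already the new connection
```

A guard that consults `lagged` would conclude nobody is signed in and take the auto-connect branch; one that consults `live` sees the direct session that was just established.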
What & why
#1398: direct-connect via JWT. Instead of routing every operation through the Fabric Connect proxy, Studio now:

1. Mints a JWT via the proxy (`create_authentication_tokens`, the op `harper login` uses).
2. Talks to the instance directly with `Authorization: Bearer <jwt>`.
3. On a 401, exchanges the refresh token for a new operation token at the instance (`refresh_operation_token`, no proxy round-trip), falling back to a fresh proxy mint, then replays the request once.

#1333: Direct Sign In no longer falls back to Fabric Connect too aggressively. The cluster sign-in guard read the `context.authentication` router snapshot, which lags the live authStore (listeners notify asynchronously); right after "Direct Sign In" cleared the connection, the snapshot still showed the old user → bounced off the form → auto-Fabric-Connected. Both sign-in guards now read live authStore state, and the sign-in page resets the connected user (not just the flag) so an abandoned sign-in isn't mislabeled as "Direct Connect".

Adversarial review
Ran a multi-agent adversarial review (6 lenses → skeptic verification); 15 confirmed findings, all addressed. Highlights:
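- The sign-in page cleared the connected user inside a render-phase `useMemo` keyed on the 10s-polled cluster object (could wipe a just-signed-in user); moved to a once-per-id effect.
- The instance Bearer JWT is never sent to the central-manager origin (`isDirectOperationsUrl`); stop logging the raw axios error (carried the token in `error.config.headers`).
- The route guard no longer attempts establishment twice in one `beforeLoad` pass.

Both automated (Gemini) review threads were addressed and resolved.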
Tests
~40 new tests: routing precedence (Bearer/proxy/basic/cookie/disableFabricConnect/cluster-proxy/forceOperationToken), establish direct/proxy-fallback/no-direct-URL/proxy-URL-guard/both-fail-cleanup/in-flight-dedupe, token recovery (refresh / proxy re-mint / both-fail / non-direct / dedupe) and the 401 interceptor (replay-once / no-token / non-401 / loop-guard), logout token cleanup, sign-in guards (live-state race + cluster-flag case + redirect target), and route-guard orchestration. Full suite green (1035 passing).
Verification & risk
Verified via typecheck + unit tests. Not browser-verified (needs a live cluster + Fabric Connect session). Main runtime assumption to confirm against a real cluster: the central-manager proxy forwards `create_authentication_tokens` / `refresh_operation_token` and returns the tokens. If an op allowlist excludes them, step 1 fails and we silently fall back to proxy mode (functional, just not the new direct path).

Fixes #1333
Refs #1398
🤖 Generated with Claude Code