
feat(auth): direct-connect via JWT in Fabric Connect mode (+ fix #1333) #1400

Open
dawsontoth wants to merge 6 commits into stage from claude/suspicious-joliot-b97cd5

@dawsontoth dawsontoth commented Jun 30, 2026

What & why

#1398 — direct-connect via JWT. Instead of routing every operation through the Fabric Connect proxy, Studio now:

  1. Mints a JWT (+ refresh token) via the proxy (create_authentication_tokens, the op harper login uses).
  2. Talks to the instance directly with Authorization: Bearer <jwt>.
  3. Holds the tokens in memory only — re-fetched via the proxy after a reload.
  4. Falls back to full proxy mode when direct connect isn't reachable (CORS / mixed-content / Safari-cloud-to-local). Because the JWT is a header (not a cookie), this also sidesteps the Safari cross-localhost cookie limitation.
  5. Recovers from mid-session token expiry: a 401 on a direct request exchanges the refresh token for a new operation token directly at the instance (refresh_operation_token, no proxy round-trip), falling back to a fresh proxy mint, then replays the request once.
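A minimal sketch of the establish flow in steps 1 to 4, assuming hypothetical names throughout (`establish`, `EstablishDeps`, `mintViaProxy`, `probeDirect`); the real Studio functions and types are not shown in this description, so treat this as an illustration of the control flow rather than the actual implementation:

```typescript
// Hypothetical shapes for illustration only.
interface Tokens {
  operationToken: string;
  refreshToken: string;
}

interface EstablishDeps {
  // Mints tokens through the proxy (the create_authentication_tokens operation).
  mintViaProxy: () => Promise<Tokens>;
  // Sends one request straight to the instance with the Bearer token;
  // rejects when the instance is unreachable (CORS, mixed content, ...).
  probeDirect: (directUrl: string, operationToken: string) => Promise<void>;
}

type Established =
  | { mode: 'direct'; directUrl: string; tokens: Tokens }
  | { mode: 'proxy' };

// In-flight attempts per instance, so concurrent callers share one mint + probe.
const inFlight = new Map<string, Promise<Established>>();

async function establish(
  instanceId: string,
  directUrl: string | undefined,
  deps: EstablishDeps,
): Promise<Established> {
  const pending = inFlight.get(instanceId);
  if (pending) return pending;

  const attempt = (async (): Promise<Established> => {
    // Without a concrete direct URL there is nothing to probe.
    if (!directUrl) return { mode: 'proxy' };
    try {
      const tokens = await deps.mintViaProxy();
      await deps.probeDirect(directUrl, tokens.operationToken);
      // Tokens live only in this returned object (memory), never in storage.
      return { mode: 'direct', directUrl, tokens };
    } catch {
      // Mint or probe failed: drop the token and route through the proxy.
      return { mode: 'proxy' };
    }
  })();

  inFlight.set(instanceId, attempt);
  try {
    return await attempt;
  } finally {
    inFlight.delete(instanceId);
  }
}
```

Because the result carries the tokens only in memory, a page reload loses them and the same function runs again on the next entry to the instance route.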

Harper 5.1.14 has no token→cookie login, so the issue's "drop the JWT, keep only an httpOnly cookie" ideal would need a coordinated Harper server change. This implements the in-memory-JWT variant, structured so a future cookie swap stays localized.

#1333: Direct Sign In no longer falls back to Fabric Connect too aggressively. The cluster sign-in guard read the context.authentication router snapshot, which lags the live authStore because listeners are notified asynchronously. Right after "Direct Sign In" cleared the connection, the snapshot still showed the old user, so the guard bounced the user off the form and auto-connected through Fabric Connect. Both sign-in guards now read live authStore state, and the sign-in page resets the connected user (not just the flag) so an abandoned sign-in isn't mislabeled as "Direct Connect".
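The snapshot lag can be reproduced in a few lines. This is a toy model, not the real authStore: `AuthStoreSketch`, `setUser`, `subscribe`, and `signInGuard` are hypothetical names, and only `getConnectionById` is borrowed from this PR (its signature here is assumed):

```typescript
interface Connection {
  user: string | null;
}

// Toy store: state changes synchronously, listeners are notified on a later tick.
class AuthStoreSketch {
  private connections = new Map<string, Connection>();
  private listeners: Array<() => void> = [];

  getConnectionById(id: string): Connection {
    return this.connections.get(id) ?? { user: null };
  }

  setUser(id: string, user: string | null): void {
    this.connections.set(id, { user }); // live state is updated immediately
    for (const listener of this.listeners) {
      void Promise.resolve().then(listener); // snapshots catch up asynchronously
    }
  }

  subscribe(listener: () => void): void {
    this.listeners.push(listener);
  }
}

// Guard that reads live state: returns a redirect target when a user is
// already connected, or null to stay on the sign-in form.
function signInGuard(store: AuthStoreSketch, id: string, redirectTo: string): string | null {
  return store.getConnectionById(id).user ? redirectTo : null;
}
```

A snapshot maintained through `subscribe` still shows the old user in the same tick in which `setUser(id, null)` runs, while `signInGuard` already sees the cleared connection and stays on the form.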

Adversarial review

Ran a multi-agent adversarial review (6 lenses → skeptic verification); 15 confirmed findings, all addressed. Highlights:

  • HIGH: reset-password / Finish Setup broke in the Fabric Connect fallback (stale client) — the fallback now returns a client rebuilt from the established auth.
  • MED: sign-in page cleared the user inside a render-phase useMemo keyed on the 10s-polled cluster object (could wipe a just-signed-in user) — moved to a once-per-id effect.
  • Security (low): never send the Bearer JWT to the central-manager origin (only go direct with a concrete direct URL, and reject proxy-looking URLs via isDirectOperationsUrl); stop logging the raw axios error (carried the token in error.config.headers).
  • The establish probe forces the token so a stale basic-auth entry can't shadow it; the instance sign-in guard keys off the instance's own state; no double-establish in one beforeLoad pass.
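For the logging item, one way to avoid leaking the token is to log a small allowlisted summary instead of the raw error. The sketch below assumes a hypothetical `summarizeForLog` helper and a structural `AxiosLikeError` type that mirrors only the fields of an axios error mentioned here; it is not the change made in this PR:

```typescript
// Minimal structural stand-in for the parts of an axios error used below.
interface AxiosLikeError {
  message: string;
  code?: string;
  config?: { url?: string; method?: string; headers?: Record<string, string> };
  response?: { status?: number };
}

interface LogSummary {
  message: string;
  code: string | undefined;
  method: string | undefined;
  url: string | undefined;
  status: number | undefined;
}

// Copies only fields that cannot carry the Authorization header.
// error.config.headers is deliberately never read.
function summarizeForLog(error: AxiosLikeError): LogSummary {
  return {
    message: error.message,
    code: error.code,
    method: error.config?.method,
    url: error.config?.url,
    status: error.response?.status,
  };
}
```

An allowlist is used rather than deleting the header from a copy, so any new sensitive field added to the error object later stays out of the log by default.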

Both automated (Gemini) review threads were addressed and resolved.

Tests

~40 new tests: routing precedence (Bearer/proxy/basic/cookie/disableFabricConnect/cluster-proxy/forceOperationToken), establish direct/proxy-fallback/no-direct-URL/proxy-URL-guard/both-fail-cleanup/in-flight-dedupe, token recovery (refresh / proxy re-mint / both-fail / non-direct / dedupe) and the 401 interceptor (replay-once / no-token / non-401 / loop-guard), logout token cleanup, sign-in guards (live-state race + cluster-flag case + redirect target), and route-guard orchestration. Full suite green (1035 passing).

Verification & risk

Verified via typecheck + unit tests. Not browser-verified (needs a live cluster + Fabric Connect session). Main runtime assumption to confirm against a real cluster: the central-manager proxy forwards create_authentication_tokens / refresh_operation_token and returns the tokens. If an op allowlist excludes them, step 1 fails and we silently fall back to proxy mode (functional, just not the new direct path).

Fixes #1333
Refs #1398

🤖 Generated with Claude Code

Instead of routing every operation through the Fabric Connect proxy, mint a
JWT via the proxy (the `create_authentication_tokens` operation, as used by
`harper login`) and use it as a Bearer token to talk to the instance directly.
The JWT is held in memory only — never persisted — and re-fetched via the proxy
after a reload. When direct connect isn't reachable (e.g. CORS/mixed-content
from cloud Studio to a local instance), we drop the token and fall back to
routing everything through the proxy as before.

- getInstanceClient: route via Bearer token when one is held; precedence is
  basic-auth / forceFabricConnect, then token-direct, then proxy, then cookie.
- authStore: in-memory token store + establishFabricConnectAuth (proxy -> JWT
  -> direct, with proxy fallback and in-flight de-duplication). Cleared on
  logout / sign-out / flag-off.
- instanceLayoutRoute: re-establish direct connect on entry (covers reload and
  first visit) while preserving the existing "wait for app auth / redirect to
  sign-in" behavior.
- useInstanceLoginMutation: the Fabric Connect fallback now goes through the
  same establish path.
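One reading of the getInstanceClient precedence above, as a sketch. `chooseRoute` and `RouteInput` are hypothetical names, the ordering of basic-auth versus forceFabricConnect within the first tier is an assumption, and `forceOperationToken` reflects the option added in a later commit of this PR:

```typescript
type Route = 'basic' | 'bearer' | 'proxy' | 'cookie';

interface RouteInput {
  hasBasicAuth: boolean;
  forceFabricConnect: boolean;
  fabricConnectEnabled: boolean;
  operationToken?: string;
  // Later addition: lets the establish probe bypass a stale basic-auth entry.
  forceOperationToken?: boolean;
}

function chooseRoute(input: RouteInput): Route {
  // A forced token wins outright, but only if a token is actually held.
  if (input.forceOperationToken && input.operationToken) return 'bearer';
  // First tier: explicit basic-auth credentials or a forced proxy route.
  if (input.hasBasicAuth) return 'basic';
  if (input.forceFabricConnect) return 'proxy';
  // Second tier: talk to the instance directly with the in-memory JWT.
  if (input.operationToken) return 'bearer';
  // Third tier: Fabric Connect without a token means the proxy.
  if (input.fabricConnectEnabled) return 'proxy';
  // Last resort: cookie-based session against the instance.
  return 'cookie';
}
```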

Refs #1398

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces an in-memory Fabric Connect JWT authentication mechanism to allow direct Bearer token connections to instances, falling back to proxy routing if the direct connection is unreachable. It updates getInstanceClient to handle the token, integrates automatic connection establishment in instanceLayoutRoute, and adds corresponding unit tests. Feedback highlights two improvement opportunities: first, validating that the direct URL is defined and is not a proxy URL before attempting a direct connection to prevent redundant failures and security risks; second, tracking connection attempts in the route loader to avoid duplicate, expensive network requests when an instance is unreachable.

Comment thread src/features/auth/store/authStore.ts Outdated
Comment thread src/features/instance/instanceLayoutRoute.ts
dawsontoth and others added 2 commits June 30, 2026 17:22

The cluster sign-in route guard decided whether to redirect away from the
sign-in form by reading the `context.authentication` router-context snapshot.
That snapshot lags the live authStore (authStore notifies its listeners
asynchronously), so right after "Direct Sign In" cleared the connection it
still showed the old user and bounced the user off the form and back into
Fabric Connect.

- Read live authStore state in both sign-in guards instead of the lagged
  context snapshot, and extract them as testable named functions.
- On the sign-in page, drop the connected user (not just the Fabric Connect
  flag) so an abandoned sign-in isn't mislabeled as "Direct Connect".

Fixes #1333

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fixes found by an adversarial review pass over the #1398/#1333 work:

- HIGH: reset-password / Finish Setup broke when login fell back to Fabric
  Connect. The fallback no longer returned a working instanceClient, so the
  reset flow kept POSTing alter/add/delete user on a stale cookie-mode client
  (401). The Fabric Connect branch now hands back a client rebuilt from the
  established auth (Bearer-direct or proxy).
- MEDIUM: the sign-in page cleared the connected user inside a render-phase
  useMemo keyed on the 10s-polled cluster object, so a poll tick could wipe a
  just-signed-in user. Moved the user reset into a once-per-id effect; the
  (idempotent) Fabric Connect flag clear stays in render so the client builds
  direct.
- SECURITY (low): never send the instance Bearer JWT to the central-manager
  origin — only attempt direct connect when a concrete direct operations URL is
  known, otherwise go straight to the proxy. And stop logging the raw axios
  error (it carries the token in error.config.headers).
- The establish direct probe now forces the operation token so a stale
  basic-auth entry can't shadow it (new getInstanceClient forceOperationToken).
- The instance sign-in guard keys off the instance's own state, so the parent
  cluster's Fabric Connect flag no longer suppresses a redirect for a
  directly-connected instance.
- The route guard no longer attempts establishment twice in one beforeLoad pass.

Tests: cover logout token cleanup, the establish proxy-fallback / no-direct-URL
/ both-fail-cleanup paths, getInstanceClient disableFabricConnect + cluster
proxy path + forceOperationToken, the instance-guard cluster-flag case, and the
route-guard orchestration (incl. the no-double-establish guard).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dawsontoth dawsontoth changed the title from "feat(auth): direct-connect via JWT in Fabric Connect mode" to "feat(auth): direct-connect via JWT in Fabric Connect mode (+ fix #1333)" Jun 30, 2026
dawsontoth and others added 2 commits June 30, 2026 17:43

Hardening from PR review: the direct-connect probe already only ran when an
operations URL was provided, but a proxy-looking URL passed as operationsUrl
would still have been used. Add isDirectOperationsUrl() so a URL containing the
central-manager proxy paths (/HDBInstance/ or /Cluster/) can never receive the
instance Bearer JWT — it falls straight through to proxy mode instead.
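A sketch of what such a check could look like. Only the function name and the two proxy path markers come from this commit message; the exact matching rules (plain case-sensitive substring match, rejecting empty or missing URLs) are assumptions:

```typescript
// Path fragments that identify a central-manager proxy URL.
const PROXY_PATH_MARKERS = ['/HDBInstance/', '/Cluster/'];

// True only for a concrete URL that does not look like a proxy route.
// Anything else must fall through to proxy mode and never receive the JWT.
function isDirectOperationsUrl(url: string | undefined): url is string {
  if (!url) return false;
  return !PROXY_PATH_MARKERS.some((marker) => url.includes(marker));
}
```

The type predicate return lets callers use the URL as a `string` after the check without a separate undefined guard.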

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Previously an expired direct-connect operation token (Harper default ~1 day)
would 401 every request until a full page reload re-minted it. Now:

- create_authentication_tokens captures the refresh token too, kept in memory
  alongside the operation token.
- A 401 from a direct Bearer request triggers recovery: exchange the refresh
  token for a new operation token directly at the instance
  (refresh_operation_token, no proxy round-trip), falling back to a fresh proxy
  mint if the refresh token is rejected. The request is then replayed once with
  the new token.
- Recovery is deduplicated across concurrent 401s and capped at a single retry
  per request to avoid loops.
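The recovery behaviour described above can be sketched without any HTTP library. `TokenSession`, `RecoveryDeps`, and `Send` are hypothetical names, and the real change hooks into an axios 401 interceptor rather than a wrapper class, so this only illustrates the refresh, proxy re-mint, dedupe, and replay-once logic:

```typescript
interface Tokens {
  operationToken: string;
  refreshToken: string;
}

interface RecoveryDeps {
  // Exchanges the refresh token directly at the instance (refresh_operation_token).
  refreshAtInstance: (refreshToken: string) => Promise<string>;
  // Fallback: mints a fresh token pair through the proxy.
  remintViaProxy: () => Promise<Tokens>;
}

// Sends one request with the given Bearer token and reports the HTTP status.
type Send = (token: string) => Promise<{ status: number }>;

class TokenSession {
  private operationToken: string;
  private refreshToken: string;
  private readonly deps: RecoveryDeps;
  private recovering: Promise<string> | null = null;

  constructor(tokens: Tokens, deps: RecoveryDeps) {
    this.operationToken = tokens.operationToken;
    this.refreshToken = tokens.refreshToken;
    this.deps = deps;
  }

  // Concurrent 401s share a single recovery promise.
  private recover(): Promise<string> {
    if (!this.recovering) {
      this.recovering = (async () => {
        try {
          this.operationToken = await this.deps.refreshAtInstance(this.refreshToken);
        } catch {
          // Refresh token rejected: fall back to a fresh proxy mint.
          const fresh = await this.deps.remintViaProxy();
          this.operationToken = fresh.operationToken;
          this.refreshToken = fresh.refreshToken;
        }
        return this.operationToken;
      })().finally(() => {
        this.recovering = null;
      });
    }
    return this.recovering;
  }

  async request(send: Send): Promise<{ status: number }> {
    const first = await send(this.operationToken);
    if (first.status !== 401) return first;
    const token = await this.recover();
    // Replay exactly once; a second 401 is returned to the caller as-is.
    return send(token);
  }
}
```

If both the refresh and the proxy re-mint fail, the rejection propagates to every caller waiting on the shared recovery, and the next 401 starts a new attempt.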

Closes the deferred item from the PR review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dawsontoth dawsontoth marked this pull request as ready for review July 1, 2026 15:02
@dawsontoth dawsontoth requested a review from a team as a code owner July 1, 2026 15:02

The instance-layout guard checked "am I already authenticated?" against the
context.authentication router snapshot, which lags the live authStore (listeners
notify asynchronously). Right after a direct sign-in — which sets the connection
synchronously — the snapshot still showed no user, so a manager (update perm)
fell through to the auto-Fabric-Connect branch, clobbering the just-established
direct session. The user then saw "Fabric Connect" instead of "Direct Connect"
back on the cluster list.

Read live authStore state (authStore.getConnectionById) for the entity, matching
the sign-in guards fixed for #1333.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@BboyAkers BboyAkers self-requested a review July 1, 2026 19:37
Development

Successfully merging this pull request may close these issues.

Direct Sign In Fallsback to Fabric Connect Too Aggressively
