Job detail page improvements #391

Merged: krokicki merged 48 commits into main from app-cleanup on Jun 20, 2026
@krokicki

Reworks the job detail page: a new Overview tab that summarizes a job at a glance, cleaner Parameters/log tabs, and polish on the running-service banner. Backend changes snapshot the runtime metadata the Overview needs onto each job at submit time.

Motivation

The job detail page opened on a raw Parameters dump, crammed status and timestamps into the header, and had no single place to answer "what was this job, how did it run, and how did it end?". It also showed null parameter/resource values and downloaded log files under synthetic names. The runtime info a user most wants — the declared tooling (nextflow / apptainer / pixi / conda), the base command, the run directory — was computed at submit time and then discarded, so it could not be shown on a finished job.

What changed

New Overview tab (default)

  • Added an Overview tab, now the default landing tab, with two responsive boxes that stack on small screens:
    • Status — runtime (live elapsed while running, total once finished), queue wait, submitted/started/finished timestamps, and exit code (with a human-readable meaning, e.g. 137 (killed (SIGKILL / out of memory))).
    • Execution — app name (links to the GitHub repo when parseable), entry point, type (Batch job / Service), declared runtime tooling as chips, base command, container / container args / conda env when set, and the cluster job id.
  • Added a Recent output panel showing the tail of stdout (last ~20 lines) with the trailing LSF job-summary footer stripped, plus a Recent errors panel that appears only when stderr is non-empty. Both link through to the full log tabs.
  • Slimmed the header down to title + status badge + actions.
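The exit-code row in the Status box pairs the raw code with a short meaning. The real helper is the TypeScript `exitCodeMeaning`; what follows is only a Python sketch of the usual shell convention (codes above 128 mean "terminated by signal N", where N is the code minus 128). The function names, the signal table, and every wording other than the 137 example are assumptions.

```python
# Hypothetical sketch of the exit-code convention; not the project's helper.
_SIGNAL_NAMES = {2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}  # small illustrative subset


def exit_code_meaning(code: int) -> str:
    """Return a short human-readable description for a process exit code."""
    if code == 0:
        return "success"
    if code > 128:
        # Shell convention: 128 + N means the process died from signal N.
        signum = code - 128
        name = _SIGNAL_NAMES.get(signum)
        if name == "SIGKILL":
            # Schedulers commonly send SIGKILL when a job exceeds its memory limit.
            return "killed (SIGKILL / out of memory)"
        if name is not None:
            return f"terminated by {name}"
        return f"terminated by signal {signum}"
    return "error"


def format_exit_code(code: int) -> str:
    """Render the code followed by its meaning in parentheses."""
    return f"{code} ({exit_code_meaning(code)})"
```

With this sketch, `format_exit_code(137)` produces the string shown in the Status example above.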

Parameters and log tabs

  • The Parameters tab now renders parameters, cluster resources, environment variables, and other launch fields (pre/post-run, container, container args), with null/undefined values stripped from both the display and the downloaded JSON.
  • Null parameter and cluster-resource values are filtered in submit_job so they are never persisted (no more cpus: null).
  • Moved "Export params" into the header next to Relaunch; renamed the in-tab button to "Download".
  • File tabs (Script / Output / Error) show just the filename with the full path on hover, and use a plain right-aligned download icon matching the file browser. Downloads now use the file's real on-disk basename instead of a hardcoded job-<id>-<tab> name. Added a download button to the Script tab to match stdout/stderr.
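A minimal sketch of the null filtering described above, assuming parameters and resources arrive as plain dictionaries. `strip_nulls` is a hypothetical name, not the code in submit_job.

```python
from __future__ import annotations

from typing import Any


def strip_nulls(values: dict[str, Any] | None) -> dict[str, Any]:
    """Drop keys whose value is None; keep falsy-but-meaningful values like 0 or ""."""
    return {key: value for key, value in (values or {}).items() if value is not None}
```

Filtering on `is not None` rather than on truthiness matters here: a resource such as `gpus: 0` is a real setting and should survive, while `cpus: null` should not.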

Running-service banner

  • Turned the "Open Service" link into a button.
  • The service URL truncates with an ellipsis instead of wrapping.

Backend: persist runtime metadata

  • Persist command, conda_env, and the merged requirements onto the job record at submit time, and expose the already-stored work_dir on the API model — these previously existed only transiently during submission.
  • Alembic migration c7d2f4a9e103_add_runtime_metadata_to_jobs adds the command, conda_env, and requirements columns. Old jobs return null for these and the UI omits the corresponding rows.
  • get_job_file_paths now returns a browse-linkable work_dir entry alongside script/stdout/stderr.

Implementation notes

  • New pure frontend helpers in frontend/src/utils/jobDisplay.ts (formatDuration, stripLsfFooter, tailLines, exitCodeMeaning), with unit tests in frontend/src/__tests__/unitTests/jobDisplay.test.ts.
  • The LSF footer is stripped only in the Overview preview; the full Output Log tab and downloads keep the complete file. The footer marker is a line of dashes immediately followed by a Sender: LSF System line; a non-match leaves the text untouched.
  • File-path hovers reuse the existing FgTooltip widget for styling consistent with the file browser.
  • The stdout/stderr queries are shared between the Overview and the log tabs (same query keys), so switching tabs does not refetch.
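The footer rule and the tail can be sketched as follows. The real helpers are TypeScript (`stripLsfFooter`, `tailLines`); this Python version only illustrates the logic described above. The minimum dash count, cutting at the first match, and the handling of trailing newlines are assumptions.

```python
import re

# Assumption: a "line of dashes" means three or more '-' characters and nothing else.
_DASH_LINE = re.compile(r"^-{3,}\s*$")


def strip_lsf_footer(text: str) -> str:
    """Cut the text at a dash line immediately followed by 'Sender: LSF System'."""
    lines = text.split("\n")
    for i in range(len(lines) - 1):
        if _DASH_LINE.match(lines[i]) and lines[i + 1].startswith("Sender: LSF System"):
            return "\n".join(lines[:i])
    # No marker found: leave the text untouched.
    return text


def tail_lines(text: str, n: int = 20) -> str:
    """Return the last n lines of text, ignoring trailing newlines."""
    if n <= 0:
        return ""
    lines = text.rstrip("\n").split("\n")
    return "\n".join(lines[-n:])
```

For the Overview preview the two would be composed as `tail_lines(strip_lsf_footer(stdout))`, so the footer never consumes part of the 20-line budget.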

Testing

  • Backend: pixi run -e test test-backend — all pass, including new get_job_file_paths work-dir tests in tests/test_apps.py.
  • Frontend: pixi run test-frontend — all pass, including the new jobDisplay unit tests.
  • pixi run node-eslint-check, pixi run node-check, and a production pixi run node-build all pass (no new TypeScript errors).
  • Migration applied via pixi run migrate; c7d2f4a9e103 is the single head.

@StephanPreibisch @JaneliaSciComp/fileglancer

krokicki and others added 30 commits June 5, 2026 11:14
Users can publish their apps to a catalog browsed by everyone, and add
listings to their own collection. Adding from the catalog creates an
independent UserApp; the listing stores only metadata (no manifest
snapshot) so there is no staleness or drift to manage. The Apps page's
new "Browse Catalog" and "Add from URL" buttons replace the old single
"Add App" entry point.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trash icon on the app card and Delete button in the info dialog both
open a confirmation dialog before calling the remove mutation, matching
the Stop Service confirmation pattern in JobDetail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the standalone /catalog route and the separate Catalog/Jobs
navbar entries with a tab bar on /apps that switches between My Apps,
App Catalog, and Jobs. The Apps navbar entry now carries the active-job
badge that used to live on the Jobs link.

Tabs are a small NavLink-based component (not Material Tailwind's Tabs)
since the latter is built around in-memory Tabs.Panel content rather
than URL routing; NavLink gets us automatic active state from the URL
and proper back/forward behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The server-side verify_requirements() check inspected the backend's PATH,
but jobs run on the compute node as the user with a different environment.
This produced false negatives (tool present on the cluster but missing on
the backend blocked submission) and false positives (passed on the backend,
failed at runtime).

Add build_requirements_check(), which generates a bash snippet from the same
requirement parser and tool registry. It runs inside the job after PATH/conda/
env setup, checks tool existence and version constraints (sort -V), aggregates
all failures to stderr, and exits non-zero. Failures surface as a FAILED job
with the message in stderr.log, shown by the existing JobDetail UI.

submit_job no longer hard-fails on the server; the check is embedded in the
job script before pre_run/command.
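A rough sketch of how such a snippet could be generated, assuming requirements are already parsed into (tool, minimum version) pairs and that only a ">=" constraint is supported. The function name, variable names, and messages are hypothetical; the real build_requirements_check uses the project's requirement parser and tool registry.

```python
from __future__ import annotations

import shlex


def build_requirements_check_sketch(requirements: list[tuple[str, str | None]]) -> str:
    """Generate a bash snippet that checks tools on PATH and optional minimum versions."""
    lines = ["__fg_failures=()"]
    for tool, min_version in requirements:
        t = shlex.quote(tool)
        lines.append(f"if ! command -v {t} >/dev/null 2>&1; then")
        lines.append(f"  __fg_failures+=({shlex.quote(tool + ' not found on PATH')})")
        if min_version is not None:
            v = shlex.quote(min_version)
            lines.append("else")
            # Pull the first dotted number out of the tool's --version output.
            lines.append(f"  __fg_have=$({t} --version 2>&1 | grep -oE '[0-9]+(\\.[0-9]+)+' | head -n1)")
            # sort -V puts the smaller version first; it must be the minimum.
            lines.append(f"  if [ \"$(printf '%s\\n' {v} \"$__fg_have\" | sort -V | head -n1)\" != {v} ]; then")
            lines.append(f"    __fg_failures+=({t}\" found $__fg_have, need >= \"{v})")
            lines.append("  fi")
        lines.append("fi")
    # Report every failure at once, then fail the job.
    lines.append("if [ ${#__fg_failures[@]} -gt 0 ]; then")
    lines.append("  printf '%s\\n' \"${__fg_failures[@]}\" >&2")
    lines.append("  exit 1")
    lines.append("fi")
    return "\n".join(lines) + "\n"
```

Collecting failures in an array instead of exiting on the first one means a user sees every missing or outdated tool in a single stderr.log rather than fixing them one submission at a time.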

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The mutation hooks expose a single shared isPending flag, so passing it to
every ListingCard caused all add/unshare buttons to animate when any one was
in flight. Scope each card's pending state to the listing being acted on by
matching the mutation's in-flight variables (listing_id).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the inline GitHub URL link from catalog cards and add an info button
that opens a ListingInfoDialog, mirroring the "my apps" AppInfoDialog. The
dialog shows the URL, branch, and description in the same table layout, plus
a "Shared by" row, and surfaces Add/Unshare actions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a checkbox to the catalog that filters out listings already in the user's
apps, reusing the existing myAppKeys set. Updates the empty state to explain
when all shared apps are already installed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex <codex@openai.com>
The loading state swapped the label for a larger spinner plus loading text,
which grew the button in both dimensions. Now the label stays in normal flow
(hidden in place while loading) and a smaller spinner is overlaid, so the
button keeps a constant size and its label stays vertically centered.

loadingText is now used as the spinner's accessible label. Spinner gains an
optional sizeClasses prop (default unchanged) and skips empty text.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex <codex@openai.com>
Share/unshare is now only available in the app info dialog. The card keeps
its info, launch, and remove actions; the "Shared" badge still indicates
shared apps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex <codex@openai.com>
The trash can icon meant "remove from my apps" on the Apps page but
"unshare" in the catalog. Use users-slash (FaUsersSlash) for unshare and
users (FaUsers) for share, in the catalog card/info dialog and the app info
dialog, to distinguish sharing actions from removal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
- Share/unshare no longer close the app info or catalog info dialogs.
- Fold the share form into AppInfoDialog as an inline view instead of a
  separate stacked dialog, so the dialog stays open throughout sharing
  (stacked Material Tailwind dialogs dismissed the one underneath). Removes
  the now-unused ShareAppDialog.
- Rename the app's "Delete" action to "Remove".
- Add tooltips to every button in both info dialogs and keep the two dialogs
  in sync (layout, button styling, wording).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex <codex@openai.com>
Apps are versioned on their own terms (package.json, pixi.toml, etc.), so a
manifest-level version is meaningless. Drop it from the AppManifest model and
TS type, the pixi adapter, the info dialog and launch form, and the docs.
Legacy manifests that still include `version:` are accepted but the field is
ignored (Pydantic extra='ignore').

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
Co-authored-by: Codex <codex@openai.com>
Requirement checking moved from submit-time on the server to job runtime
in the execution environment, but the Job Execution and extra_paths
sections still described the old server-side behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
submit_job stopped calling verify_requirements when requirement checking
moved to job runtime (build_requirements_check); the function was left
exported and exercised only by tests. Remove it, its sole helper
(_augmented_path), the now-unused shutil/subprocess/packaging imports,
and the orphaned test class. The conda-binary and version-comparison
cases it covered are exercised by the build_requirements_check tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_find_manifests_in_repo raised on the first adapter conversion failure,
so a single adapter's error could mask a later adapter that would have
handled the repo. Collect all adapter errors, return as soon as any
adapter succeeds (logging the rest), and only raise an aggregated error
when no adapter produced a manifest. Add tests covering the
one-fails-one-succeeds, all-fail-aggregated, and none-handle paths.
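The three paths can be sketched like this, assuming each adapter is a callable that returns a manifest, returns None when it does not handle the repo, or raises on a conversion failure. All names here are hypothetical and do not mirror _find_manifests_in_repo.

```python
from __future__ import annotations

import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

Adapter = Callable[[str], Optional[dict]]


class ManifestDiscoveryError(Exception):
    """Raised when adapters failed and none produced a manifest."""


def find_manifest(repo_dir: str, adapters: dict[str, Adapter]) -> Optional[dict]:
    errors: dict[str, Exception] = {}
    for name, adapter in adapters.items():
        try:
            manifest = adapter(repo_dir)
        except Exception as exc:  # keep going; a later adapter may still succeed
            errors[name] = exc
            continue
        if manifest is not None:
            # Success: earlier failures are logged, not raised.
            for failed, exc in errors.items():
                logger.warning("adapter %s failed on %s: %s", failed, repo_dir, exc)
            return manifest
    if errors:
        details = "; ".join(f"{name}: {exc}" for name, exc in errors.items())
        raise ManifestDiscoveryError(f"no adapter produced a manifest ({details})")
    return None  # no adapter handled the repo and none failed
```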

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
model.py and apps/core.py each defined near-identical requirement-spec
patterns that had to be kept in sync by hand. Make model._REQUIREMENT_PATTERN
the single source of truth (groups: tool, operator, version) and import it
into core. This also lets build_requirements_check read the operator and
version straight from the match, dropping the separate _REQ_OP_PATTERN
split, and tidies merge_requirements to match each spec once.
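To illustrate the single-pattern idea, here is a sketch of a requirement regex with the three named groups. The exact character classes and operator set are assumptions; the real model._REQUIREMENT_PATTERN may differ.

```python
from __future__ import annotations

import re

# Hypothetical pattern: a tool name, optionally followed by an operator and a version.
REQUIREMENT_PATTERN = re.compile(
    r"^\s*(?P<tool>[A-Za-z0-9_.+-]+)\s*"
    r"(?:(?P<operator>>=|<=|==|!=|>|<)\s*(?P<version>[0-9][0-9A-Za-z.]*))?\s*$"
)


def parse_requirement(spec: str) -> tuple[str, str | None, str | None]:
    """Split a requirement spec into (tool, operator, version) with one match."""
    match = REQUIREMENT_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"invalid requirement: {spec!r}")
    return match.group("tool"), match.group("operator"), match.group("version")
```

Because the operator and version come from the same match as the tool name, callers do not need a second pattern to split the constraint.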

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
published_at comes back from the API as a naive UTC datetime (no tz
marker), but the catalog cards/dialog parsed it with a raw
new Date(...).toLocaleDateString(), which interprets it as local time
and can show the wrong day near midnight. Use the shared formatDateString
helper (already used for job/ticket/link dates), which normalizes naive
strings to UTC before formatting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The FgButton loading rework stopped rendering loadingText as visible text
and instead exposes it as the spinner overlay's aria-label (role="status"),
keeping the button size constant. The Loading and LoadingWithHref stories
still asserted the text with getByText, so their play functions threw and
Chromatic reported 2 component errors. Query by role/accessible name to
match the current a11y contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
krokicki and others added 18 commits June 16, 2026 22:23
TestBuildRequirementsCheck executes the generated requirement-check
snippet through `bash -c` and depends on POSIX tools (grep -oE, sort -V,
arrays) plus chmod-based executable shims. That environment isn't
available/consistent on Windows, so all 12 cases failed on the
windows-latest CI runner. The snippet only ever runs on Linux compute
nodes, so skip the class on win32 (matching test_worker/test_filestore).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the ability to download all job parameters (parameters, environment,
and cluster settings) as a JSON file and to upload such a file to populate
the launch form. A file may target any subset of the three tabs.

- AppLaunchForm: "Export params" and "Upload params file" buttons next to
  Submit; partial files applied field-by-field
- JobDetail: "Download params" button on /apps/jobs/:id to download the
  parameters used for a completed job
- Add AppLaunchParamsFile type + parseAppLaunchParamsFile validator and a
  shared downloadTextFile util (reused by JobDetail log downloads)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The generated runtime requirements-check snippet always included all four
bash helper functions (__fg_check_tool, __fg_extract_version, __fg_ver_le,
__fg_check_version) even when the checks never called them. Split the
preamble into per-helper blocks and emit only those the generated checks
reference: version-only helpers are dropped for name-only requirements, and
none are emitted when every requirement is invalid.
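A small sketch of the selection step, assuming each requirement has been parsed to a (tool, operator, version) tuple, with None standing in for an invalid spec. Treating the three helpers other than __fg_check_tool as version-only is an assumption, as is the function name.

```python
from __future__ import annotations

# Emission order is kept stable so generated scripts stay diffable.
HELPER_ORDER = ["__fg_check_tool", "__fg_extract_version", "__fg_ver_le", "__fg_check_version"]
VERSION_ONLY = {"__fg_extract_version", "__fg_ver_le", "__fg_check_version"}  # assumed split


def helpers_needed(requirements: list[tuple[str, str | None, str | None] | None]) -> list[str]:
    """Return the helper names the generated checks would reference."""
    needed: set[str] = set()
    for req in requirements:
        if req is None:  # invalid spec: generates no check, so needs no helper
            continue
        _tool, operator, _version = req
        needed.add("__fg_check_tool")
        if operator is not None:
            needed |= VERSION_ONLY
    return [name for name in HELPER_ORDER if name in needed]
```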

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Opening a job fired the script, stdout, and stderr file-content queries on
mount, making the first load slow even though the default Parameters tab
needs none of them. Gate each file query on its tab being active via a new
enabled flag on useJobFileQuery, so file content is fetched only when its
tab is viewed (and cached thereafter).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Opening a job was slow because GET /api/jobs/{id} dispatched to the per-user
worker to compute file paths, which globbed the work dir for the script and
stat'd the logs over NFS on every read. The script filename is already
returned by cluster-api at submit time and was simply discarded.

- Add a script_path column to jobs (migration f3a9c41d76e2) and persist the
  value cluster-api returns from submit
- get_job_file_paths now derives paths from work_dir + stored script_path with
  no filesystem access, inferring existence from job state (script once
  submitted, logs once started, service_url only while running)
- read_job_file reads the stored script_path, falling back to a glob for
  legacy jobs
- get_job resolves file info in-process, removing the worker roundtrip; drop
  the now-unused get_job_file_paths worker action
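A sketch of deriving file info purely from stored fields, with existence inferred from job state. The status names, the stdout.log filename, and the return shape are assumptions for illustration; only work_dir, script_path, and stderr.log come from this PR's text.

```python
from __future__ import annotations

import posixpath

# Hypothetical set of states in which the job has started at some point.
_STARTED_STATES = {"RUNNING", "DONE", "FAILED", "KILLED"}


def job_file_paths(work_dir: str, script_path: str | None, status: str) -> dict[str, dict]:
    """Build script/stdout/stderr entries without touching the filesystem."""
    started = status in _STARTED_STATES
    files: dict[str, dict] = {}
    if script_path is not None:
        # A stored script_path implies the job was submitted, so the script exists.
        files["script"] = {"path": posixpath.join(work_dir, script_path), "exists": True}
    # Log files only appear once the scheduler has started the job.
    files["stdout"] = {"path": posixpath.join(work_dir, "stdout.log"), "exists": started}
    files["stderr"] = {"path": posixpath.join(work_dir, "stderr.log"), "exists": started}
    return files
```

Nothing here stats or globs a path, so the read cost no longer depends on NFS latency.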

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The job detail endpoint resolved browse links by realpath'ing every file
share mount, three times per request (once per file). On a cold server those
realpaths trigger NFS automounts, making the first job load take ~6s,
synchronously on the async handler.

All of a job's files live under one work dir, so its browse-link base is
resolvable once at submit time in the user-context worker (where mounts are
warm). Persist it and build per-file browse links from the DB by string
concatenation, with no filesystem access on read.

- Add work_dir_fsp_name / work_dir_subpath columns (migration b1c4e7f29a83)
- Worker submit resolves and returns the work dir's browse-link base;
  submit_job persists it
- get_job_file_paths builds browse links from the stored base (drop the
  per-read realpath and the FSP fetch); remove _resolve_browse_path

Legacy jobs (no stored base) show file paths without browse links; their
content tabs still work via the glob fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keep stored job paths as raw strings when building metadata so POSIX cluster paths are not rewritten with local OS separators on Windows.

Co-authored-by: Codex <codex@openai.com>
core.py had grown to 1701 lines mixing nine concerns behind a flat list
of module-level functions. Split it into four single-purpose modules and
delete core.py (clean break):

- manifest.py  — worker dispatch + git repo cache + manifest discovery/loading
- command.py   — requirement checks + path/param validation + build_command
- jobs.py      — poll loop + job submission/cancel
- jobfiles.py  — work-dir path construction + job file access

Dependencies form a DAG (jobs imports the other three; the rest are
leaves), so there's no circular-import risk. __init__.py re-exports the
same public API, so server.py is unchanged. Repointed user_worker.py
imports, a model.py comment, and test imports/patch targets to the new
modules. Pure reorganization — no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Commit 589ba27 ("Fix job file paths on Windows") modified the
job-file-path logic in the old core.py. That logic now lives in
jobfiles.py after the split, so re-apply it there: keep stored job
paths as raw strings (via _stored_work_dir_path / _join_stored_path /
_stored_path_basename) so POSIX cluster paths are not rewritten with
local OS separators on Windows.
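The problem and one way around it can be shown with the standard library alone. The helper names below are hypothetical stand-ins, not the private helpers named above; the sketch assumes stored paths are always POSIX-style strings.

```python
import posixpath
from pathlib import PureWindowsPath


def join_stored_path(work_dir: str, name: str) -> str:
    """Join with '/' regardless of the OS the server runs on."""
    return posixpath.join(work_dir, name)


def stored_path_basename(path: str) -> str:
    """Take the final '/'-separated component regardless of the local OS."""
    return posixpath.basename(path)


# What a Windows-flavored path object does to a cluster path:
rewritten = str(PureWindowsPath("/groups/lab/jobs/42") / "stdout.log")
# rewritten == r"\groups\lab\jobs\42\stdout.log"
```

`posixpath` behaves identically on every platform, whereas `os.path` and `pathlib.Path` follow the local OS, which is what rewrote the separators when the backend ran on Windows.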

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Mirrors the stdout/stderr tabs, reusing handleDownload/downloadTextFile
to save the script as job-<id>-script.sh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Parameters tab now renders parameters, cluster resources, environment
  variables, and other launch fields (pre/post_run, container, args).
- Strip null/undefined parameter values from the tab and the downloaded
  JSON; filter them in submit_job so they are never persisted.
- Rename the parameters "Download params" button to "Download".

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Filter nulls from job.resources in the parameters tab display and the
downloaded JSON, and stop persisting null resource values in submit_job
so entries like "cpus: null" no longer appear.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Move "Export params" to the header, left of Relaunch.
- File tabs show just the filename (full path on hover) and keep the link.
- Replace labeled file download buttons with a plain right-aligned
  download icon matching the file browser.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Derive the download filename from the file's on-disk basename instead of
a hardcoded job-<id>-<tab> name, so it matches the displayed link.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add an Overview tab (now the default) to the job detail page summarizing
status, timeline, what ran, environment/runtime, files, and a recent stdout
tail. Slim the header down to title + status badge + actions.

Persist the runtime metadata the Overview needs onto the job at submit time
(command, conda_env, requirements) plus expose work_dir, since these were
previously computed and discarded. Adds an alembic migration and a work_dir
browse-link entry in get_job_file_paths.

Frontend adds pure helpers (formatDuration, stripLsfFooter, tailLines,
exitCodeMeaning) with unit tests, and reuses FgTooltip for file-path hovers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Merge the timeline into the Status box and drop the separate Files and
Environment boxes; rename "What ran" to "Execution" and fold the runtime
chips, command, and container/conda details into it. Remove the redundant
status pill from the Status box (already in the header) and move the exit
code to the bottom. Two boxes stack responsively on small screens.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Turn the "Open Service" link into a button and truncate the service URL
with an ellipsis instead of wrapping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@krokicki krokicki merged commit 78da8d1 into main Jun 20, 2026
5 checks passed
@krokicki krokicki deleted the app-cleanup branch June 20, 2026 13:38