WIP - WAL #5149

Draft
denik wants to merge 46 commits into main from denik/wal-review1

@denik denik commented Apr 30, 2026

Changes

Why

Tests

@denik denik temporarily deployed to test-trigger-is April 30, 2026 14:29 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is May 1, 2026 11:49 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is May 1, 2026 14:04 — with GitHub Actions Inactive
@denik denik force-pushed the denik/wal-review1 branch from 4e30122 to 864a2b9 Compare May 1, 2026 14:12
@denik denik temporarily deployed to test-trigger-is May 1, 2026 14:13 — with GitHub Actions Inactive
Varun Deep Saini and others added 21 commits May 4, 2026 13:58
Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
Signed-off-by: Varun Deep Saini <deepsainivarun@gmail.com>
fix Open() calls; replace Finalize() with Close(); close state file in plan
Move state open/close management to process.go so the lifecycle is
transparent. process.go opens state for read (with WAL recovery) after
PullResourcesState and defers close. Deploy/destroy upgrade to write
mode via the new UpgradeToWrite() method which initializes the WAL
without re-reading state JSON.

Internal functions (CalculatePlan, ExportState, InitForApply,
ValidatePlanAgainstState) no longer manage their own open/close —
they expect state to already be open. Self-managed callers (bind,
migrate, yaml_sync, diff) handle their own state lifecycle.

Plan command uses ProcessBundleRetWithPlan to compute the plan while
state is still open for read inside processBundleRetInternal.
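
The lifecycle described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `State` type, its fields, `OpenRead`, and the `process` helper are hypothetical stand-ins, and only the names `UpgradeToWrite`, `Close`, and `CalculatePlan` come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode is a hypothetical enum describing how the state is currently opened.
type Mode int

const (
	Closed Mode = iota
	ReadOnly
	ReadWrite
)

// State is a stand-in for the deployment state; it is not the real type.
type State struct {
	mode      Mode
	recovered bool // assumption: set when a leftover WAL was replayed on open
	walReady  bool // assumption: set once the WAL is initialized for writes
}

// OpenRead opens state for reading. With withRecovery set, a leftover WAL
// would be replayed here (the replay itself is omitted in this sketch).
func (s *State) OpenRead(withRecovery bool) error {
	if s.mode != Closed {
		return errors.New("state already open")
	}
	s.mode = ReadOnly
	s.recovered = withRecovery
	return nil
}

// UpgradeToWrite switches an open state to write mode and initializes the
// WAL without re-reading the state JSON.
func (s *State) UpgradeToWrite() error {
	if s.mode != ReadOnly {
		return errors.New("state must be open for read before upgrading")
	}
	s.mode = ReadWrite
	s.walReady = true
	return nil
}

// Close releases the state regardless of the mode it was opened in.
func (s *State) Close() error {
	s.mode = Closed
	s.walReady = false
	return nil
}

// calculatePlan no longer opens or closes state; it expects an open state.
func calculatePlan(s *State) (string, error) {
	if s.mode == Closed {
		return "", errors.New("state is not open")
	}
	return "plan", nil
}

// process owns the lifecycle: open for read with recovery, defer close,
// compute the plan, and upgrade to write mode only for deploy/destroy.
func process(s *State, deploy bool) (string, error) {
	if err := s.OpenRead(true); err != nil {
		return "", err
	}
	defer s.Close()

	plan, err := calculatePlan(s)
	if err != nil {
		return "", err
	}
	if deploy {
		if err := s.UpgradeToWrite(); err != nil {
			return "", err
		}
	}
	return plan, nil
}

func main() {
	s := &State{}
	plan, err := process(s, true)
	fmt.Println(plan, err, s.mode == Closed)
}
```

Keeping open and close in one caller means the internal functions can fail fast on a closed state instead of each one managing its own handle.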

Co-authored-by: Isaac
Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
WAL is recovered on next run via WithRecovery open in process.go;
deployCore already calls Finalize+Open explicitly before PushResourcesState.

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 10, 2026 13:01 — with GitHub Actions Inactive
Flush WAL to local state while the state DB is still open,
before remote files are deleted.

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 10, 2026 13:09 — with GitHub Actions Inactive
Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 10, 2026 13:13 — with GitHub Actions Inactive
Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 10, 2026 17:56 — with GitHub Actions Inactive
3 jobs exercise the same DAG + partial-WAL recovery path with
3x fewer output lines.

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 10, 2026 17:59 — with GitHub Actions Inactive
Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 10, 2026 18:01 — with GitHub Actions Inactive
- drop corrupted-wal-middle (same code path as corrupted-wal-entry)
- drop multiple-crashes (covered by crash-after-create)
- drop summary-after-crash (incomplete output; crash coverage in crash-after-create)
- fix empty-wal echo: (unexpected) -> (expected)
- fix parent test.toml: exit code 137 -> [KILLED] only; errors show Exit code: 1

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 10, 2026 18:04 — with GitHub Actions Inactive
On Linux, KillCaller (SIGKILL) may produce exit code 1 instead of 137.
Add a context-sensitive replacement to normalise exit code 1 only when it
directly follows [PROCESS_KILLED], so genuine error exits (exit code 1 from
cat/jq) remain visible as Exit code: 1 in the output.
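
A context-sensitive replacement of this kind might look like the sketch below. The exact output layout is an assumption: it supposes the marker `[PROCESS_KILLED]` sits on its own line directly above an `Exit code: 1` line, and that the normalised form is `Exit code: [KILLED]`. The function name `normalizeKilledExit` is hypothetical, and the real test harness may express the rule differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// killedExit matches "Exit code: 1" only when the line directly above it is
// the kill marker. Group 1 keeps the marker line; group 2 keeps the line end.
// The output layout matched here is an assumption, not taken from the PR.
var killedExit = regexp.MustCompile(`(\[PROCESS_KILLED\]\n)Exit code: 1(\n|$)`)

// normalizeKilledExit rewrites the exit code that follows a kill marker and
// leaves every other "Exit code: 1" line untouched.
func normalizeKilledExit(out string) string {
	return killedExit.ReplaceAllString(out, "${1}Exit code: [KILLED]${2}")
}

func main() {
	out := "[PROCESS_KILLED]\nExit code: 1\njq: error\nExit code: 1\n"
	fmt.Print(normalizeKilledExit(out))
}
```

Anchoring the pattern on the marker line is what keeps genuine failures from cat or jq visible: an `Exit code: 1` with any other preceding line does not match.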

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 10, 2026 18:58 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is May 10, 2026 19:40 — with GitHub Actions Inactive
- chain-3-jobs: fix stale echo "job_10" -> "job_03"
- corrupted-wal-entry, future-serial-wal, lineage-mismatch, stale-wal,
  wal-with-delete: commit static fixture files (resources.json,
  resources.json.wal) instead of creating them inline in script;
  wal-with-delete: commit databricks.yml as resources: {} instead of
  overwriting it at runtime

Co-authored-by: Denis Bilenko <denis.bilenko@databricks.com>
@denik denik temporarily deployed to test-trigger-is May 11, 2026 08:42 — with GitHub Actions Inactive
2 participants