[Bug]: busy_timeout = 0 on .forge.db causes 'database is locked' in every long session #3550

Description

@KooshaPari

Bug: busy_timeout = 0 on ~/forge/.forge.db causes database is locked under any concurrent forge process

Summary

Forge's local SQLite database (~/forge/.forge.db, 6.6 GB) is opened with
PRAGMA busy_timeout = 0. WAL mode (journal_mode = WAL) still allows only one
writer at a time, so whenever two forge processes (main + child
renderer/worker/event-source) write at the same time, the second writer gets
SQLITE_BUSY ("database is locked") immediately, with no retry. The user sees
the error in the chat prompt:

● [18:31:56] ERROR: database is locked

This fires in every long-running session, because chat persistence
(conversations table) and the FTS5 triggers
(conversations_fts_insert / conversations_fts_update) race with the
content script's webview writes.

Environment

  • forge CLI: v2.13.13
  • Host: macOS (Apple Silicon), ~/forge/ directory
  • DB path: ~/forge/.forge.db
  • DB size: 6.6 GB; WAL: 3.5 GB; SHM: 7.1 MB
  • 11,996 rows in conversations (Diesel ORM, FTS5 enabled)
  • Wall-clock observed: ≥ 1× per ~30 min interactive session

Reproduction

  1. Run any long interactive forge session (≥ 30 min).
  2. Have at least one other forge process or subagent active (e.g. --with-subagent,
    background /loop, the forgecode MCP server, or the renderer when streaming).
  3. During a chat response that triggers conversations UPDATE (every turn does),
    the race fires and the prompt shows ● [HH:MM:SS] ERROR: database is locked.

Evidence

Direct PRAGMA reads from the running DB:

$ sqlite3 ~/forge/.forge.db "PRAGMA busy_timeout; PRAGMA journal_mode; PRAGMA auto_vacuum; PRAGMA wal_autocheckpoint;"
0          # <-- BUG: should be ≥ 5000
wal
0          # <-- no auto-vacuum; DB grows monotonically
1000       # default, not the issue

WAL state:

$ sqlite3 ~/forge/.forge.db "PRAGMA wal_checkpoint(PASSIVE);"
0|739|739  # columns are busy|log|checkpointed: 739 frames in WAL, all 739 checkpointed; PASSIVE does not shrink the WAL file
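
For scale, a back-of-the-envelope sketch of how much WAL space 739 frames need. It uses the documented WAL layout (32-byte file header, 24-byte header per frame) and assumes 4096-byte pages, which this report does not confirm; wal_bytes is a hypothetical helper.

```rust
/// Bytes a WAL file needs to hold `frames` valid frames.
/// Layout per the SQLite file format: a 32-byte WAL header, then for each
/// frame a 24-byte frame header followed by one database page.
fn wal_bytes(frames: u64, page_size: u64) -> u64 {
    32 + frames * (24 + page_size)
}

fn main() {
    // ASSUMPTION: 4096-byte pages; the report does not show PRAGMA page_size.
    let needed = wal_bytes(739, 4096);
    println!("739 frames need about {:.1} MB", needed as f64 / 1e6);
}
```

Under that assumption the live frames account for only a few MB, so most of the 3.5 GB WAL file would be space left over from earlier write bursts that was never truncated.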

Page-level health:

$ sqlite3 ~/forge/.forge.db "PRAGMA page_count; PRAGMA freelist_count;"
1628401
1395       # 0.086% — DB is not bloated from deletes, just from write volume
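
The percentage in the comment above can be reproduced from the two PRAGMA values. This is a small sketch with hypothetical helper names; the 4096-byte page size is an assumption, since PRAGMA page_size was not captured.

```rust
/// Share of pages sitting on the freelist, as a percentage.
fn freelist_percent(page_count: u64, freelist_count: u64) -> f64 {
    if page_count == 0 {
        return 0.0;
    }
    freelist_count as f64 * 100.0 / page_count as f64
}

/// Size of the main database file in bytes for a given page size.
fn db_bytes(page_count: u64, page_size: u64) -> u64 {
    page_count * page_size
}

fn main() {
    // Values copied from the PRAGMA output above.
    let (page_count, freelist_count) = (1_628_401_u64, 1_395_u64);
    // ASSUMPTION: 4096-byte pages; the report does not show PRAGMA page_size.
    let page_size = 4096_u64;
    println!("free pages: {:.3}%", freelist_percent(page_count, freelist_count));
    println!("main file: {:.2} GB", db_bytes(page_count, page_size) as f64 / 1e9);
}
```

With 4096-byte pages the page count works out to about 6.67 GB, which is in line with the reported DB size.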

Root cause

In forge's connection bootstrap (wherever
diesel::sqlite::SqliteConnection::establish is called), PRAGMA busy_timeout
is being set to 0 (or never set, which leaves the connection default of 0).
The standard recommendation for any SQLite-with-WAL app with more than one
connection is busy_timeout = 5000 (or higher) so a blocked writer waits
briefly for the current writer to release the write lock instead of failing
immediately.
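
The difference between failing immediately and waiting can be illustrated with a toy model. This sketch uses a std Mutex as a stand-in for SQLite's single write lock; try_write and the timings are hypothetical and are not SQLite or forge internals.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Toy model of a second writer. Polls the lock until `busy_timeout` has
/// elapsed. Returns true if the write lock was obtained, false for the
/// case where the caller would see "database is locked".
fn try_write(lock: &Mutex<()>, busy_timeout: Duration) -> bool {
    let deadline = Instant::now() + busy_timeout;
    loop {
        if let Ok(_guard) = lock.try_lock() {
            return true; // lock acquired; the write would happen here
        }
        if Instant::now() >= deadline {
            return false; // gave up waiting
        }
        thread::sleep(Duration::from_millis(5));
    }
}

fn main() {
    let lock = Arc::new(Mutex::new(()));
    let holder = Arc::clone(&lock);
    let (held_tx, held_rx) = mpsc::channel();
    // First writer: holds the lock for 100 ms.
    let first_writer = thread::spawn(move || {
        let _guard = holder.lock().unwrap();
        held_tx.send(()).unwrap(); // signal that the lock is now held
        thread::sleep(Duration::from_millis(100));
    });
    held_rx.recv().unwrap();
    // Second writer with timeout 0 fails at once; with 5000 ms it waits.
    println!("timeout 0 ms succeeded: {}", try_write(&lock, Duration::ZERO));
    println!("timeout 5000 ms succeeded: {}", try_write(&lock, Duration::from_millis(5000)));
    first_writer.join().unwrap();
}
```

In this model a timeout of 0 turns every overlap between two writers into an error, while a few seconds of patience absorbs a 100 ms hold without the caller noticing.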

A 6.6 GB database with a 3.5 GB WAL file and roughly 12k conversation
rows is not an edge case; it's the normal running state. The busy_timeout = 0
setting was most likely a default that nobody re-evaluated as the
schema and write volume grew.

Expected

PRAGMA busy_timeout = 5000;   -- or 10000 for heavy write sessions

Under normal contention the error should not reach the user; SQLite should
wait for the lock and retry transparently until the timeout expires.

Actual

The SQLITE_BUSY error escapes to the chat prompt and the conversation
update either fails silently or retries from scratch on the next turn.

Proposed fix (minimal, in forge-db crate or wherever the connection is set up)

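// diesel-sqlite (or whatever the equivalent is for forge's stack)
use diesel::connection::SimpleConnection;
use diesel::prelude::*;

// In Diesel 2.x batch_execute takes &mut self, so the binding must be mutable.
let mut conn = SqliteConnection::establish(&db_path)?;
conn.batch_execute("
    PRAGMA busy_timeout = 5000;      -- <-- THE FIX; set first so later PRAGMAs can wait on a lock
    PRAGMA journal_mode = WAL;
    PRAGMA synchronous = NORMAL;     -- already 1
    PRAGMA temp_store = MEMORY;
    PRAGMA mmap_size = 268435456;    -- 256 MB
")?;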

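Because busy_timeout is per-connection, the batch has to run on every connection forge opens, including pooled ones. One way to keep that in a single place is a small builder for the PRAGMA string. This is a std-only sketch; SqliteTuning and bootstrap_pragmas are hypothetical names, not part of forge or Diesel.

```rust
/// Hypothetical settings struct; field names are illustrative only.
struct SqliteTuning {
    busy_timeout_ms: u32,
    mmap_size_bytes: u64,
}

/// Builds the PRAGMA batch to run on each new connection.
/// busy_timeout comes first so the statements after it can wait on a lock.
fn bootstrap_pragmas(t: &SqliteTuning) -> String {
    format!(
        "PRAGMA busy_timeout = {};\n\
         PRAGMA journal_mode = WAL;\n\
         PRAGMA synchronous = NORMAL;\n\
         PRAGMA temp_store = MEMORY;\n\
         PRAGMA mmap_size = {};\n",
        t.busy_timeout_ms, t.mmap_size_bytes
    )
}

fn main() {
    let tuning = SqliteTuning {
        busy_timeout_ms: 5000,
        mmap_size_bytes: 256 * 1024 * 1024,
    };
    // The resulting string is what would be passed to batch_execute.
    print!("{}", bootstrap_pragmas(&tuning));
}
```
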
Optionally, schedule a PRAGMA wal_checkpoint(TRUNCATE) on session end to
drain the WAL into the main DB and reset the 3.5 GB WAL file to zero bytes.
PRAGMA incremental_vacuum reclaims free pages only when
auto_vacuum = INCREMENTAL; with auto_vacuum = 0 (current setting) the DB
file never shrinks. Consider switching to incremental (on an existing DB
this needs a one-time VACUUM to take effect) and running
PRAGMA incremental_vacuum periodically, or at least expose a
forge db compact admin command for users with multi-GB DBs.

Recommended user-side workaround (until fix lands)

# Drains and truncates the WAL. Most reliable when forge is not running.
# busy_timeout here applies only to this sqlite3 invocation: it lets the
# checkpoint wait for locks instead of failing immediately.
sqlite3 ~/forge/.forge.db "PRAGMA busy_timeout = 5000; PRAGMA wal_checkpoint(TRUNCATE);"
# busy_timeout is per-connection and is not stored in the DB file, so this
# does not change the value forge uses; the application must set it on each
# new connection.

This shrinks the WAL but does not fix the root cause: forge's own
connections still open with busy_timeout = 0.

Related

All three errors cluster in the same long-running session and look like
the "shared-resource races under load" family.
