Bug: busy_timeout = 0 on ~/forge/.forge.db causes "database is locked" under any concurrent forge process
Summary
Forge's local SQLite database (~/forge/.forge.db, 6.6 GB) is opened with
PRAGMA busy_timeout = 0. journal_mode = WAL still allows only one writer at
a time, so whenever two forge processes (main + child
renderer/worker/event-source) write at the same time, the second writer gets
SQLITE_BUSY ("database is locked") immediately, with no retry. The user sees
the error in the chat prompt:
● [18:31:56] ERROR: database is locked
This fires in every long-running session, because chat persistence
(conversations table) and the FTS5 triggers
(conversations_fts_insert / conversations_fts_update) race with the
content script's webview writes.
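The failure mode can be reproduced outside forge. The following is a minimal sketch, assuming Python's standard sqlite3 module as a stand-in for forge's own stack and a throwaway temp database rather than ~/forge/.forge.db; the function name second_writer_error is hypothetical, and only the table name conversations comes from this report.

```python
import os
import sqlite3
import tempfile


def second_writer_error(busy_timeout_ms: int = 0) -> str:
    """Return the error the second writer sees, or "ok" if it got the lock."""
    # Throwaway database (assumption: stands in for ~/forge/.forge.db).
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None disables Python's implicit transactions.
    # timeout=0 keeps Python from installing its own 5 s busy timeout.
    first = sqlite3.connect(path, timeout=0, isolation_level=None)
    first.execute("PRAGMA journal_mode = WAL")
    first.execute("CREATE TABLE conversations (id INTEGER PRIMARY KEY, body TEXT)")

    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    second.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")

    first.execute("BEGIN IMMEDIATE")  # first writer takes the single write lock
    first.execute("INSERT INTO conversations (body) VALUES ('turn 1')")
    try:
        second.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
        second.execute("ROLLBACK")
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        first.execute("ROLLBACK")
        first.close()
        second.close()


if __name__ == "__main__":
    print(second_writer_error(0))
```

With busy_timeout = 0 the second BEGIN IMMEDIATE fails at once with "database is locked", which is the same text forge surfaces in the chat prompt.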
Environment
- forge CLI: v2.13.13
- Host: macOS (Apple Silicon), ~/forge/ directory
- DB path: ~/forge/.forge.db
- DB size: 6.6 GB; WAL: 3.5 GB; SHM: 7.1 MB
- 11,996 rows in conversations (Diesel ORM, FTS5 enabled)
- Observed frequency: at least once per ~30 min interactive session
Reproduction
- Run any long interactive forge session (≥ 30 min).
- Have at least one other forge process or subagent active (e.g. --with-subagent, background /loop, the forgecode MCP server, or the renderer when streaming).
- During a chat response that triggers a conversations UPDATE (every turn does), the race fires and the prompt shows ● [HH:MM:SS] ERROR: database is locked.
Evidence
Direct PRAGMA reads from the running DB:
$ sqlite3 ~/forge/.forge.db "PRAGMA busy_timeout; PRAGMA journal_mode; PRAGMA auto_vacuum; PRAGMA wal_autocheckpoint;"
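0 # <-- BUG: should be ≥ 5000 (per-connection value: this is the sqlite3 CLI's own default, the same default forge gets unless it sets one)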
wal
0 # <-- no auto-vacuum; DB grows monotonically
1000 # default, not the issue
WAL state:
$ sqlite3 ~/forge/.forge.db "PRAGMA wal_checkpoint(PASSIVE);"
0|739|739 # busy=0 | 739 frames in WAL | 739 checkpointed; the WAL file stays at 3.5 GB because PASSIVE checkpoints never truncate it
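For reading that output: SQLite documents the three columns of PRAGMA wal_checkpoint as a busy flag, the number of frames in the WAL, and the number of frames checkpointed. A small sketch on a throwaway database (the helper name passive_checkpoint_result is hypothetical) shows the shape of the result:

```python
import os
import sqlite3
import tempfile


def passive_checkpoint_result(rows: int = 50) -> tuple:
    """Write some rows in WAL mode, then return (busy, wal_frames, checkpointed)."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")  # throwaway DB
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("CREATE TABLE conversations (id INTEGER PRIMARY KEY, body TEXT)")
    for i in range(rows):
        conn.execute("INSERT INTO conversations (body) VALUES (?)", (f"turn {i}",))

    # Column order per the SQLite docs: busy flag, frames in WAL, frames checkpointed.
    busy, wal_frames, checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(PASSIVE)"
    ).fetchone()
    conn.close()
    return busy, wal_frames, checkpointed


if __name__ == "__main__":
    print(passive_checkpoint_result())
```

With no concurrent readers, the second and third values are equal, meaning every frame in the WAL was copied into the main database file. A PASSIVE checkpoint does not shrink the WAL file on disk.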
Page-level health:
$ sqlite3 ~/forge/.forge.db "PRAGMA page_count; PRAGMA freelist_count;"
1628401
1395 # 0.086% — DB is not bloated from deletes, just from write volume
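The two page counters can be cross-checked against the reported file size. This sketch assumes the default 4096-byte page size, which the report does not show (PRAGMA page_size would confirm it); the helper names are hypothetical.

```python
# Assumption: SQLite's default page size; the report does not include PRAGMA page_size.
PAGE_SIZE = 4096


def db_size_bytes(page_count: int, page_size: int = PAGE_SIZE) -> int:
    """Size of the main database file implied by its page count."""
    return page_count * page_size


def freelist_percent(freelist_count: int, page_count: int) -> float:
    """Share of pages sitting unused on the freelist, in percent."""
    return 100.0 * freelist_count / page_count


if __name__ == "__main__":
    print(f"{db_size_bytes(1628401) / 1e9:.2f} GB")       # 6.67 GB
    print(f"{freelist_percent(1395, 1628401):.3f} %")     # 0.086 %
```

Under that assumption the page count matches the roughly 6.6 GB file, and the freelist is a negligible fraction of it.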
Root cause
In forge's connection bootstrap (wherever the SqliteConnection or
diesel::sqlite::SqliteConnection::establish is called), PRAGMA busy_timeout
is being set to 0 (or never set, which inherits the connection-default 0).
A common recommendation for any SQLite-with-WAL app with more than one
connection is busy_timeout = 5000 (or higher), so that a writer waits briefly
for the current writer to release the write lock instead of failing
immediately. In WAL mode readers do not block writers; the contention here is
writer against writer.
A 6.6 GB database with a 3.5 GB WAL file and ~12k conversation rows is not
an edge case; it's the normal running state. The busy_timeout = 0
choice was almost certainly a default that nobody re-evaluated as the
schema and write volume grew.
Expected
PRAGMA busy_timeout = 5000; -- or 10000 for heavy write sessions
The error should never be visible to the user; it should be transparently
retried by SQLite.
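The expected behavior can be sketched with two writers on a throwaway database. This assumes Python's standard sqlite3 module in place of forge's stack; the function name second_writer_outcome and the 0.3 s hold time are illustrative choices, not measurements from forge.

```python
import os
import sqlite3
import tempfile
import threading
import time


def second_writer_outcome(busy_timeout_ms: int, hold_seconds: float = 0.3) -> str:
    """Return "ok" if the second writer commits, else the SQLite error text."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")  # throwaway DB
    setup = sqlite3.connect(path, isolation_level=None)
    setup.execute("PRAGMA journal_mode = WAL")
    setup.execute("CREATE TABLE conversations (id INTEGER PRIMARY KEY, body TEXT)")
    setup.close()

    lock_held = threading.Event()

    def first_writer() -> None:
        conn = sqlite3.connect(path, timeout=0, isolation_level=None)
        conn.execute("BEGIN IMMEDIATE")  # take the write lock
        conn.execute("INSERT INTO conversations (body) VALUES ('turn 1')")
        lock_held.set()
        time.sleep(hold_seconds)  # simulate a slow write
        conn.execute("COMMIT")
        conn.close()

    worker = threading.Thread(target=first_writer)
    worker.start()
    lock_held.wait(timeout=5)

    # timeout=0 so that only the PRAGMA below decides how long we wait.
    second = sqlite3.connect(path, timeout=0, isolation_level=None)
    second.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    try:
        second.execute("BEGIN IMMEDIATE")  # waits up to busy_timeout_ms for the lock
        second.execute("INSERT INTO conversations (body) VALUES ('turn 2')")
        second.execute("COMMIT")
        return "ok"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        second.close()
        worker.join()


if __name__ == "__main__":
    print(second_writer_outcome(0))
    print(second_writer_outcome(5000))
```

With a timeout of 0 the second writer reports "database is locked"; with 5000 it waits for the first writer to commit and then succeeds, so nothing reaches the user.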
Actual
The SQLITE_BUSY error escapes to the chat prompt and the conversation
update either fails silently or retries from scratch on the next turn.
Proposed fix (minimal, in forge-db crate or wherever the connection is set up)
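// Diesel + SQLite (or whatever the equivalent is for forge's stack)
use diesel::connection::SimpleConnection; // provides batch_execute
use diesel::prelude::*;

let mut conn = SqliteConnection::establish(&db_path)?;
// Set busy_timeout first so the remaining PRAGMAs also wait on a busy database.
conn.batch_execute("
    PRAGMA busy_timeout = 5000;    -- <-- THE FIX
    PRAGMA journal_mode = WAL;
    PRAGMA synchronous = NORMAL;   -- already 1
    PRAGMA temp_store = MEMORY;
    PRAGMA mmap_size = 268435456;  -- 256 MB
")?;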
Optionally, schedule a PRAGMA wal_checkpoint(TRUNCATE) on session end to
reset the 3.5 GB WAL file to zero bytes. With auto_vacuum = 0 (current
setting), PRAGMA incremental_vacuum is a no-op and the DB file never
shrinks. Consider switching to auto_vacuum = incremental (on an existing
database this takes effect only after a one-time VACUUM) and running
PRAGMA incremental_vacuum periodically, or at least expose a
forge db compact admin command for users with multi-GB DBs.
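A rough sketch of what such a maintenance step could do, run against a throwaway database with Python's standard sqlite3 module. The helper names build_demo_db and compact are hypothetical and do not correspond to any existing forge command.

```python
import os
import sqlite3
import tempfile


def build_demo_db() -> str:
    """Create a small WAL-mode database with some deleted rows."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode = WAL")
    conn.execute("CREATE TABLE conversations (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO conversations (body) VALUES (?)",
        [("x" * 2000,) for _ in range(500)],
    )
    conn.execute("DELETE FROM conversations WHERE id % 2 = 0")
    conn.close()
    return path


def compact(path: str) -> dict:
    """Switch to incremental auto-vacuum, then truncate the WAL."""
    conn = sqlite3.connect(path, isolation_level=None)
    # On an existing database the new auto_vacuum mode applies only after VACUUM.
    conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
    conn.execute("VACUUM")
    # No-op right after VACUUM; this is the call to repeat periodically later.
    conn.execute("PRAGMA incremental_vacuum").fetchall()
    wal_before = os.path.getsize(path + "-wal")
    busy = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]
    result = {
        "auto_vacuum": conn.execute("PRAGMA auto_vacuum").fetchone()[0],
        "busy": busy,
        "wal_bytes_before": wal_before,
        "wal_bytes_after": os.path.getsize(path + "-wal"),
    }
    conn.close()
    return result


if __name__ == "__main__":
    print(compact(build_demo_db()))
```

An auto_vacuum value of 2 means incremental mode is active, and a successful TRUNCATE checkpoint leaves the WAL file at zero bytes. On a multi-GB database the VACUUM step rewrites the whole file, so it needs free disk space and should run while no other forge process is writing.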
Recommended user-side workaround (until fix lands)
# One-shot: truncates the oversized WAL. Quit forge first so the checkpoint is not blocked.
sqlite3 ~/forge/.forge.db "PRAGMA busy_timeout = 5000; PRAGMA wal_checkpoint(TRUNCATE);"
# Then restart forge.
# (The busy_timeout above only makes this sqlite3 invocation wait for locks.
# busy_timeout is per-connection and is not stored in the DB file, so the
# application must set it on each new connection.)
This buys time by shrinking the WAL but does not fix the root cause: forge's
own connections still start with busy_timeout = 0 until the application's
connection code sets it.
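The difference between file-level and connection-level PRAGMAs can be checked on a throwaway database. This is a sketch with Python's standard sqlite3 module; pragma_scope_demo is a hypothetical name, and timeout=0 is passed because Python would otherwise install its own 5-second busy timeout on every connection.

```python
import os
import sqlite3
import tempfile


def pragma_scope_demo() -> dict:
    """Set PRAGMAs on one connection and read them back from another."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")  # throwaway DB

    a = sqlite3.connect(path, timeout=0, isolation_level=None)
    a.execute("PRAGMA journal_mode = WAL")   # recorded in the database file
    a.execute("CREATE TABLE conversations (id INTEGER PRIMARY KEY, body TEXT)")
    a.execute("PRAGMA busy_timeout = 5000")  # held only by connection a

    # A second connection, like a separate process opening the same file.
    b = sqlite3.connect(path, timeout=0, isolation_level=None)
    result = {
        "a_busy_timeout": a.execute("PRAGMA busy_timeout").fetchone()[0],
        "b_busy_timeout": b.execute("PRAGMA busy_timeout").fetchone()[0],
        "b_journal_mode": b.execute("PRAGMA journal_mode").fetchone()[0],
    }
    a.close()
    b.close()
    return result


if __name__ == "__main__":
    print(pragma_scope_demo())
```

Connection b sees journal_mode = wal, because that setting lives in the file, but its busy_timeout is still 0: the value set on connection a never leaves that connection.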
Related
All three errors cluster in the same long-running session and look like
the "shared-resource races under load" family.