Disable snapshots for get_analytics queries#35
Conversation
|
I guess I didn't explain this well: this isn't implemented for |
Want to see if I understand. You're saying that we should:
|
|
Claude's take on picking this up: The one-liner now conflicts with Findings
Proposed plan
Questions for you, Kris:
|
|
|
@cb1kenobi mind weighing in on this PR too? Specifically the ideas in my last comment around RocksDB. |
Reimplements PR #35 on current main (the original one-liner now conflicts and was a no-op, since `snapshot: false` was never plumbed past the request). `snapshot: false` on a search request now means: read against the latest committed data without holding a consistent read snapshot open for the whole iteration, so a long-running scan doesn't pin a snapshot that blocks compaction / ties up resources. - RequestTarget: declare the `snapshot` option (default true). - Table.search: pass `snapshot === false` to `txn.useReadTxn(...)`. - DatabaseTransaction (RocksDB): create the read txn with `{ disableSnapshot: true }` when requested — the backend lever rocksdb-js actually exposes. - LMDBTransaction: accept the flag for parity but no-op it (lmdb's only lever is `getRange({snapshot:false})`, which throws on the dupSort secondary-index stores we use). LMDB-backed reads keep a snapshot for now. - analytics/read.ts get(): request `snapshot: false`. Passing the disableSnapshot RocksDB transaction down as the normal read transaction means no `getRange` snapshot option is needed, which sidesteps the lmdb dupSort throw entirely. Adds a regression test covering primary-key, dupSort secondary-index, and combined scans on both engines, and documents the behavior in resources/DESIGN.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
332f2ff to
dd9201a
Compare
|
Pushed the implementation (force-pushed this branch onto current One deviation from the plan above, in a good way. Plan steps 4–5 had us threading Net result vs. the three questions:
What changed
Verification: clean typecheck / prettier / oxlint; 84 analytics unit tests pass; new test green on both engines. Docs: HarperFast/documentation#557 adds a "Read behavior" note to the |
kriszyp
left a comment
There was a problem hiding this comment.
Approving — nice perf win: dropping the read snapshot stops a long hdb_analytics scan from pinning an MVCC snapshot that blocks RocksDB compaction. Zero default-path cost (opt-in boolean, default takes the identical path), scoped to the single read-only get_analytics site, and torn reads are fine for approximate analytics. Regression test included. Thanks @cap10morgan!
— 🤖 KrAIs (Kris's review assistant)
Address review feedback: pass `{ disableSnapshot }` directly instead of
branching. When undefined (the default read path) it is equivalent to the
`false` default, so behavior is unchanged.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
I'm assuming Studio is the main consumer of |
kriszyp
left a comment
There was a problem hiding this comment.
Approving — real perf win: dropping the read snapshot stops a long hdb_analytics scan from pinning an MVCC snapshot that blocks RocksDB compaction. Opt-in with zero default-path cost, read-only scope, regression test included; the latest push is a cosmetic collapse to a direct {disableSnapshot} pass-through. Thanks @cap10morgan!
— 🤖 KrAIs (Kris's review assistant)
get_analytics queries now read against the latest committed data without holding a read snapshot open for the duration of the scan, so long-running analytics queries stay easier on the rest of the system (no pinned snapshot blocking compaction). Documents the behavior and its consistency trade-off. Implements the docs follow-up for HarperFast/harper#35. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Disables read snapshots for
get_analyticsqueries so long-running analytics scans are easier on the rest of the system — a scan no longer pins a consistent read snapshot, which would otherwise block storage compaction for its duration.What
snapshot: falsemeansA search request can now set
snapshot: falseto read against the latest committed data without holding a consistent read snapshot open for the whole iteration. The trade-off is consistency: rows written while the query iterates may be observed. Default remainstrue(a stable snapshot is held).How it's wired
The backends expose snapshot control differently, so the flag is honored at the layer that actually supports it:
hdb_analytics): the read transaction is created with{ disableSnapshot: true }inDatabaseTransaction.getReadTxn. Passing that transaction down as the normal read txn is sufficient — nogetRangeoption is needed, which sidesteps the lmdb "can't disable snapshot on a dupSort store" throw entirely (every secondary index is dupSort).getRange({ snapshot: false }), which throws on dupSort stores). The flag is accepted for parity but is a documented no-op; LMDB-backed reads keep a snapshot for now.Changes
RequestTarget— declaressnapshot?: boolean(defaulttrue).Table.search—txn.useReadTxn(target.snapshot === false).DatabaseTransaction.getReadTxn/useReadTxn(RocksDB) — build the read txn with{ disableSnapshot: true }when requested.LMDBTransaction.useReadTxn— accepts the flag, no-ops it (commented).resources/analytics/read.tsget()— requestssnapshot: false.unitTests/resources/snapshotFalse.test.js— regression test (primary-key, dupSort secondary-index, and combined scans).resources/DESIGN.md— documents the read-snapshot behavior.Testing
Docs
Companion docs PR: HarperFast/documentation#557 (adds a "Read behavior" note to the
get_analyticsreference).