
FEATURE: expose V8 ScriptCompiler::CachedData via Context#compile #413

Open: ursm wants to merge 1 commit into rubyjs:main from ursm:feature/cached-data-411

ursm commented May 18, 2026

Summary

Implements #411: exposes V8's ScriptCompiler::CachedData so callers can persist a per-script bytecode cache and skip re-parsing large bundles in subsequent processes.

# First process: produce the cache
ctx = MiniRacer::Context.new
script = ctx.compile(File.read("bundle.js"), filename: "bundle.js")
File.binwrite("bundle.js.cache", script.cached_data) if script.cached_data
script.run

# Later process: restore from blob, skip the parse step
cached = File.binread("bundle.js.cache")
ctx = MiniRacer::Context.new
script = ctx.compile(File.read("bundle.js"), filename: "bundle.js", cached_data: cached)
File.binwrite("bundle.js.cache", script.cached_data) if script.cache_rejected?
script.run
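
The two snippets above can be folded into one load-or-produce helper. This is a minimal sketch, not part of the PR: `compile_with_cache` is a hypothetical name, and `ctx` is assumed to respond to `#compile` with the keyword arguments proposed here. `cached_data:` is only passed when a blob exists on disk, since the PR does not say whether an explicit `nil` is accepted.

```ruby
# Hypothetical helper (not part of this PR). `ctx` is assumed to respond to
# #compile(source, filename:, cached_data:) as proposed in this PR.
def compile_with_cache(ctx, source_path, cache_path)
  source = File.read(source_path)

  opts = { filename: File.basename(source_path) }
  # Only pass cached_data: when a blob is actually present on disk.
  opts[:cached_data] = File.binread(cache_path) if File.exist?(cache_path)

  script = ctx.compile(source, **opts)

  # Per the API notes, cached_data is nil when the supplied blob was accepted,
  # so a non-nil value means there is a fresh blob worth persisting.
  blob = script.cached_data
  File.binwrite(cache_path, blob) if blob
  script
end
```

A caller would then use `compile_with_cache(ctx, "bundle.js", "bundle.js.cache").run` in both the first and later processes.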

API surface:

  • MiniRacer::Context#compile(source, filename:, cached_data:) → MiniRacer::Script
  • Script#run — executes the compiled script; safe to call multiple times
  • Script#cached_data — returns the bytecode blob (nil if the supplied cached_data: was accepted, so callers can skip a redundant disk write)
  • Script#cache_rejected? — boolean signal for cache-key invalidation telemetry
  • Script#dispose / Script#disposed? — eager handle release
  • MiniRacer::V8_CACHED_DATA_VERSION_TAG — module-level constant (populated on first Context.new) wrapping v8::ScriptCompiler::CachedDataVersionTag(); mix into cache keys so a libv8-node bump invalidates blobs automatically
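
One way to "mix the tag into cache keys" is to put it in the cache file name together with a digest of the source. The sketch below is an illustration only: `cache_file_name` and its `prefix:` parameter are hypothetical, and the version tag is taken as a plain argument so the helper does not depend on the constant being defined.

```ruby
require "digest/sha2"

# Hypothetical helper (not part of this PR): builds a cache file name that
# changes whenever the source text or the V8 cached-data version tag changes.
def cache_file_name(source, version_tag, prefix: "bundle")
  digest = Digest::SHA256.hexdigest(source)
  # A 16-hex-character digest prefix keeps names short; use the full digest
  # if the risk of a collision matters for your deployment.
  "#{prefix}-#{version_tag}-#{digest[0, 16]}.cache"
end
```

Under the API in this PR, the second argument would be `MiniRacer::V8_CACHED_DATA_VERSION_TAG`, read after the first `Context.new`.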

Design notes

Context dispose ordering (per @SamSaffron's flag): State::~State() walks st.scripts and resets each v8::Persistent<v8::Script> under the existing Locker/Isolate::Scope before isolate->Dispose(). Handle table is owned per-State.

Concurrency: compile/run/dispose RPCs go through the existing rendezvous mutex path; the handle table is only touched from the V8 thread. No new lock surface. Reentrancy isn't a concern here since compile/run don't trigger JS→Ruby callbacks themselves (those go through the existing v8_api_callback path).

GC finalizer trade-off: script_free does NOT send a dispose RPC — taking rr_mtx from a Ruby finalizer thread risks deadlock. Handles freed via finalizer rely on State::~State() walking the table at isolate teardown. Long-lived Contexts with many short-lived Scripts will accumulate handles until Context#dispose. Documented in README; Script#dispose is available for eager release.
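
For long-lived Contexts, callers can avoid the accumulation described above by disposing each Script as soon as they are done with it. A minimal sketch, assuming the `Context#compile`, `Script#dispose`, and `Script#disposed?` methods proposed in this PR; `with_script` is a hypothetical helper name.

```ruby
# Hypothetical helper (not part of this PR): compiles a script, yields it,
# and releases the handle eagerly even if the block raises.
def with_script(ctx, source, filename:)
  script = ctx.compile(source, filename: filename)
  yield script
ensure
  # Release the V8 handle now rather than waiting for Context#dispose.
  script.dispose if script && !script.disposed?
end
```

Usage would look like `with_script(ctx, src, filename: "job.js") { |s| s.run }`, and the method returns whatever the block returns.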

CachedData buffer policy: input blob uses BufferNotOwned pointing at the ValueDeserializer's ArrayBuffer backing store (valid for the v8_compile call), avoiding a copy of potentially MB-sized blobs.

Packet protocol: new tags 'K' (compile), 'R' (run), 'D' (dispose) added to dispatch1. 'C' was already taken by call, hence 'K' for compile.

TruffleRuby: GraalJS has no per-script bytecode cache reachable from Polyglot::InnerContext#eval (Source.Builder.cached(true) is in-process/Engine-lifetime only; cross-process cache requires GraalVM Enterprise Auxiliary Engine Caching with Native Image, which mini_racer's truffleruby shim explicitly rejects). Shim falls back to source replay — same observable semantics, no compile reuse. cached_data: silently ignored, Script#cached_data returns nil, V8_CACHED_DATA_VERSION_TAG = 0.
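
Because `Script#cached_data` returns nil both when V8 accepts a blob and under the TruffleRuby shim, telemetry code may want to tell the two apart. The sketch below uses the tag value 0 as the "unsupported" signal described above. It assumes real V8 never reports a tag of 0, which the PR does not state; `cache_outcome` and its symbols are hypothetical.

```ruby
# Hypothetical classifier (not part of this PR). `supplied` says whether the
# caller passed cached_data: to compile; `version_tag` is the value of
# V8_CACHED_DATA_VERSION_TAG (0 under the TruffleRuby shim, per this PR).
def cache_outcome(script, supplied:, version_tag:)
  # Assumption: a zero tag only ever comes from the no-op shim.
  return :unsupported if version_tag.zero?

  if supplied
    script.cache_rejected? ? :rejected : :accepted
  else
    # No blob was supplied, so the only question is whether one was produced.
    script.cached_data ? :produced : :none
  end
end
```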

Out of scope (deferred follow-ups)

  • GC finalizer that sends V8 dispose (needs lock-free queue design to avoid the rr_mtx deadlock)
  • Lazy produce_cached_data! RPC (current API keeps the blob on the Script struct; lazy path would help when only some scripts need persistence)
  • UnboundScript round-trip to move Scripts between Contexts
  • kEagerCompile option

Test plan

  • 12 new test cases in test/mini_racer_test.rb covering compile/run roundtrip, cached_data save/restore across Contexts, rejection path with corrupt blob, encoding validation, dispose semantics (script + cross-context), filename propagation in ParseError, interleave with Context#eval
  • Full test suite: 116 runs, 227 assertions, 0 failures, 3 skips (existing TruffleRuby skips)
  • Stress: 500 compile/run/dispose cycles + 100-script Context dispose walk — no crashes/leaks
  • BufferNotOwned change verified via cached_data round-trip (accepted → cached_data nil; rejected → fresh blob)
  • TruffleRuby smoke test (no local TruffleRuby; relying on CI). The shim mirrors the established Snapshot.load / warmup_unsafe! no-op pattern in truffleruby.rb, so it should behave consistently.

Maintainer questions

  1. TruffleRuby parity policy: shim is no-op (no equivalent API in GraalJS). The established pattern (e.g. FEATURE: add Ruby-to-JS Uint8Array support #406) is to ship parity in the same PR — this shim is the closest equivalent we can offer. Happy to adjust if a different shape is preferred.
  2. V8_CACHED_DATA_VERSION_TAG initialization: defined lazily inside context_initialize (first Context.new) so Platform.set_flags! still applies to the tag. Constant doesn't exist before the first Context — acceptable trade-off, but happy to switch to a MiniRacer.cached_data_version_tag method if a constant-with-deferred-init is too quirky.
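
To illustrate the deferred-init quirk in question 2: code that may run before the first `Context.new` would need to guard its read of the constant. A minimal sketch using only core `Module` methods; `cached_data_version_tag` here is a hypothetical caller-side helper, not the method floated above.

```ruby
# Hypothetical caller-side guard (not part of this PR): returns the tag if the
# constant has been defined on `mod`, or nil if no Context has been created yet.
def cached_data_version_tag(mod)
  # `false` restricts the lookup to `mod` itself, skipping ancestors.
  return nil unless mod.const_defined?(:V8_CACHED_DATA_VERSION_TAG, false)

  mod.const_get(:V8_CACHED_DATA_VERSION_TAG, false)
end
```

With the constant as proposed, `cached_data_version_tag(MiniRacer)` would return nil until a Context has been created.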

🤖 Generated with Claude Code

Adds Context#compile returning a MiniRacer::Script handle that can be run
multiple times and exposes V8's per-script bytecode cache. Callers pass
`cached_data:` to skip re-parsing on subsequent processes and read
`script.cached_data` to persist the produced blob.

The MiniRacer::V8_CACHED_DATA_VERSION_TAG constant exposes V8's
CachedDataVersionTag() so callers can invalidate their cache when libv8-node
is bumped.

TruffleRuby ships a shim that falls back to source replay since GraalJS has
no equivalent per-script cache reachable from Polyglot::InnerContext.

Refs rubyjs#411.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

ursm commented May 18, 2026

For sequencing context: #412 (Module API) is the next planned PR but I'm holding it back until this one lands. The two share a lot of C++ surface (handle table, packet protocol, dispose ordering) so iterating patterns here once will be cheaper than rebasing #412 twice. Flagging in case it helps frame the review.
