perf: allocation-free Val.Str extractor (~25% faster match) #885
Open
He-Pin wants to merge 1 commit into
Motivation: `Val.Str.unapply` returned `Some((pos, str))`, so every `case Val.Str(p, s)` match (115 sites across the evaluator, stdlib, and materializer) went through an `Option` + `Tuple2` layer. Even though JVM C2 escape analysis scalar-replaces those short-lived objects in tight loops (so heap allocation is already ~0), the extra `Option`/`Tuple2` indirection still costs instructions per match.

Modification:
- Rewrite `Str.unapply` as an allocation-free name-based extractor returning a value class `StrExtract(self: Str)` (`isEmpty`/`get`), with `_1`/`_2` accessors on `Str`. The `AnyVal` result is consumed by the match desugaring without allocation, and the `Str` type test keeps the match refutable so the `AsciiSafeStr` subclass is matched exactly as before. All 115 call sites are unchanged. `StrExtract`/`unapply` are `private[sjsonnet]`.
- Add `StrMatchBenchmark`, a JMH micro that isolates the extractor in a tight loop (mixing `AsciiSafeStr`) as a regression guard.

Result: Isolated micro (1024 matches/op, `-f4`, 60 samples): 440.7 ± 2.8 ns/op -> 331.9 ± 2.9 ns/op, a reproducible ~25% (1.33x) speedup. Both baseline and new allocate ~0 B/op: EA already removed the heap object, so the win is instruction count. End-to-end (`MainBenchmark`) is within noise, since `Val.Str` matching is a small fraction of total parse + eval + materialize work. Compiles on Scala 3.3.7 / 2.13.18 / 2.12.21; full JVM test suite green; zero behavior change.
Motivation

`Val.Str.unapply` returned `Some((pos, str))`, so every `case Val.Str(p, s)` match (115 sites across the evaluator, stdlib, and materializer) went through an `Option` + `Tuple2` layer. JVM C2 escape analysis already scalar-replaces those short-lived objects in tight loops (heap allocation is ~0), but the extra `Option`/`Tuple2` indirection still costs instructions per match.

Modification

- Rewrite `Str.unapply` as an allocation-free name-based extractor returning a value class `StrExtract(self: Str)` with `isEmpty`/`get`, plus `_1`/`_2` accessors on `Str`. The `AnyVal` result is consumed by the match desugaring without allocation, and the `Str` type test keeps the match refutable, so the `AsciiSafeStr` subclass is matched exactly as before. All 115 call sites are unchanged. `StrExtract`/`unapply` are `private[sjsonnet]`.
- Add `StrMatchBenchmark`, a JMH micro that isolates the extractor in a tight loop (mixing the `AsciiSafeStr` subclass) as a regression guard.

Result

Isolated micro (`StrMatchBenchmark`, 1024 matches/op, `-f4 -wi10 -i15 -r2 -prof gc`, 60 samples):

440.7 ± 2.8 ns/op (baseline) -> 331.9 ± 2.9 ns/op (new)

A reproducible ~25% (1.33×) speedup on the match operation. Note that both baseline and new allocate ~0 B/op: C2 EA already removed the heap object, so the win is instruction count, not allocation.

End-to-end (`MainBenchmark`, `stdlib.jsonnet`) is within noise, since `Val.Str` matching is a small fraction of total parse + eval + materialize work. The per-op win is real but diluted.

Compiles on Scala 3.3.7 / 2.13.18 / 2.12.21; full JVM test suite green; zero behavior change.
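For readers unfamiliar with name-based extractors, the shape described under Modification can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Position`, `Num`, `describe`, and the wrapper object `StrExtractSketch` are hypothetical stand-ins, the class bodies are simplified, and the `private[sjsonnet]` modifiers are omitted so the snippet stands alone.

```scala
object StrExtractSketch {
  // Hypothetical, simplified stand-ins for the real sjsonnet types.
  final class Position(val offset: Int)

  abstract class Val { def pos: Position }

  class Str(val pos: Position, val str: String) extends Val {
    // Accessors the compiler uses to bind `p` and `text` in `case Str(p, text)`.
    def _1: Position = pos
    def _2: String = str
  }

  // Subclass that must keep matching `case Str(...)` exactly like `Str` itself.
  final class AsciiSafeStr(p0: Position, s0: String) extends Str(p0, s0)

  final class Num(val pos: Position, val value: Double) extends Val

  // Value class: the extractor result is just the `Str` reference, so no
  // `Option` or `Tuple2` is created. `isEmpty`/`get` satisfy the
  // name-based extractor protocol.
  final class StrExtract(val self: Str) extends AnyVal {
    def isEmpty: Boolean = false
    def get: Str = self
  }

  object Str {
    // The parameter type `Str` makes the compiler emit a type test when the
    // scrutinee is a plain `Val`, so non-string values fall through.
    def unapply(s: Str): StrExtract = new StrExtract(s)
  }

  // A call site in the same style as the existing matches (assumed usage).
  def describe(v: Val): String = v match {
    case Str(p, text) => "str@" + p.offset + ":" + text
    case _            => "other"
  }
}
```

With these definitions, `StrExtractSketch.describe` takes the first branch for both `Str` and `AsciiSafeStr` values and the fallback branch for a `Num`, which mirrors the claim that call sites stay unchanged while the `Option` + `Tuple2` layer disappears.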