Process inline RBS comments natively without the require-hook rewriter #2639

Draft
paracycle wants to merge 10 commits into main from uk-no-sorbet-runtime

@paracycle (Member)

Summary

  • Tapioca's pipeline now reads inline #: / # @... RBS comments directly from source instead of relying on a load-time rewriter that produced sig {} blocks at boot.
  • Removes lib/tapioca/rbs/rewriter.rb, the require-hooks dependency, the bootsnap shim, dsl --only-bootsnap-rbs-cache, and the TAPIOCA_RBS_CACHE plumbing. No more Bootsnap iseq cache to manage.
  • The gem pipeline always builds a Rubydex::Graph (seeded with core/stdlib RBS) so constants in inline sigs can be resolved to their fully-qualified names. A matching path on the DSL side picks up #: sigs for arbitrary host-app methods.

Why

The require-hook rewriter rewrote every .rb file at load time so sorbet-runtime would track #: comments as sig {} blocks. It worked, but it forced require-hooks into the dependency tree, made boot slower, and shipped its own bootsnap iseq cache so large apps could survive the cost. With Tapioca now able to parse RBS straight from source (via Rubydex), all of that goes away.
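For readers unfamiliar with the comment syntax, here is a minimal illustration of the two spellings discussed above. `Greeter` and `greet` are made-up names. The `sig` shown in the trailing comment is the usual Sorbet equivalent, left commented out so the snippet runs without `sorbet-runtime`.

```ruby
# Hypothetical host-app class, for illustration only.
class Greeter
  # Inline RBS form: a plain comment, inert at runtime.
  #: (String name) -> String
  def greet(name)
    "Hello, #{name}!"
  end

  # Equivalent Sorbet runtime form (needs `extend T::Sig` and sorbet-runtime):
  #   sig { params(name: String).returns(String) }
end

puts Greeter.new.greet("Ada")
```

Because the `#:` line is only a comment, nothing has to be rewritten at load time for the file to run; the signature is recovered by reading the source.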

What changed

New modules under lib/tapioca/rbs/

  • comments.rb — parses a stream of [comment_string, line] tuples (as obtained from Rubydex's Definition#comments) into signatures and annotations classified into class-level / method-level groups.
  • type_qualifier.rb — walks an RBI::Type tree and emits a fully-qualified string. User constants resolve through a Rubydex graph (so Bar inside Foo becomes ::Foo::Bar); Sorbet's own T:: constants are prefixed with ::T::; user-defined generics (Sorbet T::Generic) are resolved but emitted without a leading :: to match Sorbet's runtime serializer convention.
  • dsl_signatures.rb — DSL-side RBS lookup. Builds a per-process Rubydex graph from the host workspace + already-loaded source files + core/stdlib RBS, looks up declarations by qualified name (or by source location for anonymous classes built with Class.new), and synthesizes an RBI::Sig. Lexical nesting for anonymous classes is recovered via a small Prism visitor.
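The qualification step described for type_qualifier.rb can be sketched roughly as follows. This is not the code in this PR: `KNOWN_CONSTANTS` and `qualify` are hypothetical, a plain `Set` stands in for the Rubydex graph, and the lookup only walks lexical prefixes (real Ruby constant resolution also consults ancestors).

```ruby
require "set"

# Hypothetical stand-in for the declarations a Rubydex graph would know about.
KNOWN_CONSTANTS = Set["Foo", "Foo::Bar", "Baz", "String"].freeze

# Resolve `name` from inside `nesting` (outermost first) and return a
# fully-qualified string, or nil when nothing matches.
def qualify(name, nesting)
  return name if name.start_with?("::") # already absolute

  # Try the innermost scope first, then walk outwards to the top level.
  nesting.length.downto(0) do |depth|
    candidate = (nesting.first(depth) + [name]).join("::")
    return "::#{candidate}" if KNOWN_CONSTANTS.include?(candidate)
  end
  nil
end

p qualify("Bar", ["Foo"]) # resolved inside Foo
p qualify("Baz", ["Foo"]) # falls back to the top level
```

Returning `nil` for unknown names leaves the caller free to decide how to render an unresolvable reference.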

Gem pipeline (lib/tapioca/gem/...)

  • Pipeline builds the graph unconditionally (previously only when include_doc: true) and exposes gem_graph, rbs_comments_for_constant, and rbs_comments_for_method.
  • Listeners::Methods + MethodNodeAdded carry an optional Pipeline::RBSMethodLookup when no Sorbet runtime sig was found.
  • Listeners::SorbetSignatures synthesizes a sig from the RBS lookup when present (with attr-vs-method-vs-overload handling and method-level annotations).
  • Listeners::SorbetHelpers picks up # @abstract, # @final, # @sealed, # @interface.
  • Listeners::SorbetRequiredAncestors picks up # @requires_ancestor:.
  • Listeners::SorbetTypeVariables picks up class-level #: [A, B] (with variance / upper: / fixed: blocks).
  • Listeners::Documentation filters Rubydex definitions down to ones inside the gem so the now-always-on graph doesn't leak core-RBS docs.
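To make the comment kinds the listeners consume more concrete, here is a rough classifier over `[comment_string, line]` tuples. It is only a sketch under assumptions: `ParsedComments` and `classify_comments` are hypothetical names, and the regexes are naive heuristics (for example, a generic method signature that both starts with `[` and ends with `]` would be misread as class-level type parameters).

```ruby
# Hypothetical result container; not Tapioca's actual data structure.
ParsedComments = Struct.new(:signatures, :annotations, :type_params)

# Sort [comment_string, line] tuples into the three kinds of inline RBS
# comments mentioned above. Ordinary comments are ignored.
def classify_comments(tuples)
  result = ParsedComments.new([], [], [])
  tuples.each do |text, line|
    case text
    when /\A#:\s*\[(.*)\]\s*\z/ # class-level generics, e.g. `#: [A, B]`
      names = Regexp.last_match(1).split(",").map(&:strip)
      result.type_params << [names, line]
    when /\A#:\s*(.+)\z/ # method signature, e.g. `#: (String) -> Integer`
      result.signatures << [Regexp.last_match(1), line]
    when /\A#\s*@(\w+)(?::\s*(.+))?\z/ # annotation, e.g. `# @abstract`
      result.annotations << [Regexp.last_match(1), Regexp.last_match(2), line]
    end
  end
  result
end

parsed = classify_comments([
  ["#: [A, B]", 1],
  ["# @abstract", 2],
  ["# @requires_ancestor: Kernel", 3],
  ["#: (String) -> Integer", 5],
])
p parsed.annotations
```

Keeping the line number next to each entry is what lets a caller decide whether a comment belongs to the class body or to the method that follows it.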

DSL pipeline (lib/tapioca/dsl/...)

  • Dsl::Compiler#compile_method_*_to_rbi falls back to DslSignatures.build when no Sorbet runtime sig is registered. Compiler itself now explicitly extends T::Generic and declares ConstantType = type_member.
  • ActiveModelTypeHelper.type_for also falls back to RBS comments for deserialize / cast / cast_value / serialize.

Cleanup

  • Deletes lib/tapioca/rbs/rewriter.rb, sorbet/rbi/shims/bootsnap.rbi, sorbet/rbi/gems/require-hooks@0.4.0.rbi.
  • Drops require-hooks from tapioca.gemspec and Gemfile.lock.
  • Removes dsl --only-bootsnap-rbs-cache and the related CLI / command code.
  • Removes the "Rewriting RBS comments to Sorbet signatures" + "Caching rewrites with Bootsnap" + "Priming the cache from CI" sections from the README, replacing them with a short "Inline RBS comments" note.
  • internal.rb keeps Module.include(T::Sig) so bare sig {} calls in user/gem code (and our specs) still work without extend T::Sig.

Test status

bin/test runs 788 tests with 0 failures, 0 errors, 2 pre-existing skips.

A few tests had to move with the design:

  • spec/tapioca/runtime/reflection_spec.rb now uses a real sig { ... } instead of relying on the rewriter to translate #: -> String to a runtime sig.
  • spec/tapioca/gem/pipeline_spec.rb updates the T.proc / T::Array[::String] / requires_ancestor { Kernel } expectations to the new ::T and ::Kernel qualification convention.
  • spec/tapioca/cli/dsl_spec.rb drops the bootsnap- and --only-bootsnap-rbs-cache-specific tests, renames the "preserves RBS comment rewriting" case to "exposes inline RBS method signatures to DSL compilers", and broadens a Ruby-version-sensitive stack trace regex.

Tapioca used to discover method signatures purely from Sorbet's runtime
reflection. To support inline `#:` / `# @...` RBS comments, it shipped a
require-hook (`lib/tapioca/rbs/rewriter.rb`) that, at every load, rewrote
sources into `sig {}` blocks so `sorbet-runtime` would track them. That
detour required `require-hooks`, made boot slower, and needed a separate
bootsnap cache to stay remotely usable on large apps.

Read RBS straight from source instead. The gem pipeline always builds a
`Rubydex::Graph` (with core/stdlib RBS seeded for constant resolution),
and the listeners that needed a runtime signature now also accept inline
RBS comments. A matching path on the DSL side picks up `#:` sigs for
arbitrary host-app methods so DSL compilers see the same signatures
they used to see through the rewriter.

Highlights:

  - New `Tapioca::RBS::Comments`, `Tapioca::RBS::TypeQualifier`, and
    `Tapioca::RBS::DslSignatures` modules handle parsing, fully-qualified
    type rendering, and DSL-side lookup.
  - `Gem::Pipeline` exposes `gem_graph`, `rbs_comments_for_constant`,
    and `rbs_comments_for_method`. Listeners (`SorbetSignatures`,
    `SorbetHelpers`, `SorbetRequiredAncestors`, `SorbetTypeVariables`)
    surface `#: ...`, `# @abstract`, `# @requires_ancestor:`, and
    `#: [A, B]` from source.
  - `Dsl::Compiler#compile_method_*_to_rbi` and
    `ActiveModelTypeHelper.type_for` fall back to `DslSignatures.build`
    when no Sorbet runtime sig exists.
  - All `T::` and `T.*` constants are emitted fully qualified (`::T::Array`,
    `::T.proc`, ...). User-defined constants are resolved through Rubydex
    so relative references like `Bar` inside `Foo` become
    `::Foo::Bar`. Lexical nesting for anonymous classes is recovered from
    source via a small Prism visitor.
  - Removes `lib/tapioca/rbs/rewriter.rb`, the `require-hooks` dependency,
    the bootsnap shim, `dsl --only-bootsnap-rbs-cache`, and the
    `TAPIOCA_RBS_CACHE` README section.

paracycle added 2 commits May 28, 2026 22:52

The DSL pipeline memoizes its per-process Rubydex graph and snapshots the
list of `$LOADED_FEATURES` paths it was indexed against. Test suites
that don't fork between tests (e.g. `DslSpec`) end up sharing one
graph across tests, but each test `require`s its own freshly-written
fixture file under a different `tmp_path/lib/...`. The cached graph
never picks those up, so `DslSignatures.build` returns nil for any
method defined in the new file and the DSL compiler falls back to
`T.untyped`.

Track which paths the graph has already indexed and, on each `graph`
call, incrementally index whatever showed up in `$LOADED_FEATURES`
since last time. Rubydex's `Graph#index_all` + `Graph#resolve` is
incremental, so this is cheap: the no-new-files path is one set diff
and an early return.
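
The incremental indexing described above could look roughly like this. It is a sketch, not the code in this commit: `FakeGraph` is a hypothetical stand-in that only records what it is fed, `GraphCache` is a made-up name, and the `loaded_features:` lambda exists purely so the example can be exercised without touching the real `$LOADED_FEATURES`.

```ruby
require "set"

# Hypothetical stand-in for a Rubydex graph; it only records what it was fed.
class FakeGraph
  attr_reader :indexed, :resolve_calls

  def initialize
    @indexed = []
    @resolve_calls = 0
  end

  def index_all(paths)
    @indexed.concat(paths)
  end

  def resolve
    @resolve_calls += 1
  end
end

class GraphCache
  def initialize(loaded_features: -> { $LOADED_FEATURES })
    @loaded_features = loaded_features
    @graph = FakeGraph.new
    @indexed_paths = Set.new
  end

  # Returns the memoized graph, indexing only files loaded since the last call.
  def graph
    ruby_files = @loaded_features.call.select { |path| path.end_with?(".rb") }
    new_paths = ruby_files.reject { |path| @indexed_paths.include?(path) }
    return @graph if new_paths.empty? # nothing new: one diff and an early return

    @graph.index_all(new_paths)
    @graph.resolve
    @indexed_paths.merge(new_paths)
    @graph
  end
end

features = ["/tmp/a.rb"]
cache = GraphCache.new(loaded_features: -> { features })
cache.graph
features << "/tmp/b.rb" # a test requires a freshly written fixture
p cache.graph.indexed
```

The set of already-indexed paths is the only extra state, which is why a call with no new files stays cheap.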

Two related changes that simplify and tighten DSL-side RBS resolution
now that Rubydex (on the `expose-definition-lexical-nesting` branch)
ships `Definition#lexical_owner` and `Definition#lexical_nesting`:

  - `Tapioca::RBS::DslSignatures.nesting_for` reads the lexical nesting
    straight off the matching `Rubydex::Definition` instead of parsing
    the source again with Prism. The transformation into the shape
    `Graph#resolve_constant` wants — short names, outermost first, with
    `::Foo` markers for compound or absolute openings — is done in one
    place and covers plain nesting, `class Foo::Bar` compound paths,
    and `module ::Bar` absolute paths uniformly. The Prism-based
    `NestingVisitor` is gone.

  - `Static::SymbolLoader.graph_from_paths` now also accepts the gem's
    `.rbi` stub files (collected from `rbi/` in the gem directory) and
    feeds them in through `Rubydex::Graph#index_source`. RBI is plain
    Ruby, so the indexer just needs to see the content under a `.rb`
    URI. This recovers constants that only exist in a gem's
    native-code shim (e.g. `Rubydex::ConstantReference`), which used
    to resolve through runtime reflection under the old rewriter
    path and would otherwise produce unresolvable references in the
    generated RBI.

Also picks up the new `Definition#lexical_owner`/`lexical_nesting`
RBI surface in `sorbet/rbi/gems/rubydex@*.rbi`, points the Gemfile at
the in-flight Rubydex branch, fixes a few Sorbet errors my earlier
commits introduced (`added_any` typing, nilable `Definition#declaration`,
`RBI::Extend#names` vs `name`, the `Module` upper bound on
`Dsl::Compiler::ConstantType`), and switches the `T.must` cast in
`DslSignatures.graph` to an `#: as !nil` inline RBS so we stop using
`T.xxx` calls in the new code paths.

paracycle added the enhancement (New feature or request) label May 28, 2026

paracycle added 7 commits May 29, 2026 00:47

Shopify/rubydex#832 has landed on main, so the dedicated branch is gone
and the lexical-nesting API ships from main going forward.

Also re-runs `tapioca gem rubydex` to refresh the gem RBI against the
post-merge commit.

The inline RBS `#: [ConstantType < Module[top]]` annotation on
`Tapioca::Dsl::Compiler` is the source of truth for the class's
generic shape; the explicit `extend T::Generic` / `ConstantType =
type_member` lines I previously added are a redundant restatement
of the same thing in a different idiom.

DSL compiler subclasses that need a refined `ConstantType` either
keep using the inline `#: [ConstantType = ...]` form (no runtime
`type_member` needed, since `constant` is a plain instance variable)
or, if they prefer the explicit runtime form, declare `extend
T::Generic` and `ConstantType = type_member { { fixed: ... } }`
themselves. Both styles work.

Updates the `compiler_spec.rb` fixtures to use the inline RBS form
(they relied on the parent being generic at runtime, which is
no longer the case) and adds the same Rubydex pin to
`MockProject#tapioca_gemfile` so subprocess test runs pick up the
`Definition#lexical_owner`/`lexical_nesting` API.

`Runtime::Reflection.signature_of` used to leak Sorbet's raw
`T::Private::Methods::Signature` to every caller. That meant
`SorbetSignatures#compile_signature`, `Dsl::Compiler`,
`ActiveModelTypeHelper`, and `GraphqlTypeHelper` all reached into
the same set of internals (`arg_types` / `kwarg_types` /
`rest_type` / `block_name` / `mode` / `owner` / `method_name`)
and reimplemented the same "build a positional type list",
"sanitize the return type", "is this signature final?" logic in
slightly different shapes.

Introduces `Tapioca::Runtime::Signature` as a small abstract type
with one initial concrete implementation, `SorbetSignature`, that wraps
the old object. The interface is deliberately high-level and exposes
only what callers actually need:

  - `method`: the canonical `UnboundMethod` the sig is attached
    to.
  - `parameter_type_strings`: positional type strings,
    post-sanitization. Encapsulates the arg/kwarg/rest/keyrest/block
    plumbing that used to live in both `compile_signature` and
    `Dsl::Compiler#parameters_types_from_signature`.
  - `return_type_string`: sanitized return type.
  - `valid_return_type_string`: same, but `nil` when the type
    string is meaningless (`void`, `T.untyped`, `T.noreturn`,
    `<NOT-TYPED>`, ...).
  - `valid_first_arg_type_string`: first positional argument's
    sanitized type, or `nil` when meaningless. Replaces the only
    surviving `arg_types.dig(0, 1)` consumer.
  - `compile_to_rbi_sig(parameters, &push_symbol)`: emits an
    `RBI::Sig`. The body of the old `SorbetSignatures#compile_signature`
    plus the final-method lookup lifted onto the type itself.
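
The shape of that interface can be sketched without Sorbet at all. Everything below is illustrative: the `Sketch` module and `StringBackedSignature` are hypothetical, and the backend wraps pre-rendered strings rather than a real `T::Private::Methods::Signature`. Only the method names and the list of meaningless type strings are taken from the description above.

```ruby
module Sketch
  # Type strings that carry no useful information for callers.
  MEANINGLESS_TYPE_STRINGS = ["void", "T.untyped", "T.noreturn", "<NOT-TYPED>"].freeze

  # Abstract interface: callers depend on strings, not on backend internals.
  class Signature
    def parameter_type_strings
      raise NotImplementedError
    end

    def return_type_string
      raise NotImplementedError
    end

    # Shared filtering lives on the base class so every backend gets it.
    def valid_return_type_string
      type = return_type_string
      MEANINGLESS_TYPE_STRINGS.include?(type) ? nil : type
    end

    def valid_first_arg_type_string
      type = parameter_type_strings.first
      return nil if type.nil? || MEANINGLESS_TYPE_STRINGS.include?(type)

      type
    end
  end

  # Hypothetical backend holding already-sanitized strings.
  class StringBackedSignature < Signature
    attr_reader :parameter_type_strings, :return_type_string

    def initialize(parameter_type_strings, return_type_string)
      super()
      @parameter_type_strings = parameter_type_strings
      @return_type_string = return_type_string
    end
  end
end

sig = Sketch::StringBackedSignature.new(["::String", "::Integer"], "void")
p sig.valid_first_arg_type_string
p sig.valid_return_type_string
```

With the filtering on the base class, a second backend only has to supply the two raw string accessors.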

Caller migrations:

  - `SorbetSignatures#on_method` collapses into a single
    `signature.compile_to_rbi_sig(event.parameters) { |sym| @pipeline.push_symbol(sym) }`
    call. `compile_signature` and `signature_final?` are gone.
  - `Methods#compile_method`'s writer-method detection no longer
    inspects `signature.arg_types.size`; it was a redundant cross-check
    against `method.parameters.size`, which we already test.
  - `Dsl::Compiler#parameters_types_from_signature` keeps the same
    public shape and delegates to `signature.parameter_type_strings`.
    `compile_method_return_type_to_rbi` delegates to
    `signature.return_type_string`.
  - `ActiveModelTypeHelper#lookup_return_type_of_method` becomes
    `signature.valid_return_type_string`. `lookup_arg_type_of_method`
    becomes `signature.valid_first_arg_type_string`. The
    `MEANINGLESS_TYPES` / `MEANINGLESS_TYPE_STRINGS` filtering
    moves onto `Signature` (`MEANINGLESS_TYPE_STRINGS` is now a
    shared constant; the runtime-type sentinels stay private to
    `SorbetSignature`).
  - `GraphqlTypeHelper` swaps `signature&.return_type` +
    `valid_return_type?` checks for `signature&.valid_return_type_string`.
    The Scalar branch's `T::Utils.unwrap_nilable` becomes
    `RBIHelper.as_non_nilable_type` on the resulting string, which is
    the same transformation but expressed at string level.

Types tightened: `MethodNodeAdded#signature` and `Pipeline#push_method`'s
`signature` parameter both move from `untyped` to
`Tapioca::Runtime::Signature?`.

No behavioural change for downstream callers; this is purely a
refactor that prepares the ground for an `RbsSignature`
implementation to land in a follow-up.

The DSL pipeline used to translate inline `#: ...` comments into a
bare `RBI::Sig` via `Tapioca::RBS::DslSignatures.build` and then have
every consumer (`Dsl::Compiler#rbs_*`, `ActiveModelTypeHelper`'s
fallback branches) reach into the resulting `RBI::Sig` directly —
`params.first.type.to_s`, `return_type.to_s`, the meaningless-type
filter, etc. — duplicating the same surface that `SorbetSignature`
already encapsulates.

Wrap the parsed sig in a new `Tapioca::Runtime::RbsSignature`
subclass of `Tapioca::Runtime::Signature`. It carries the original
method, the qualified `RBI::Sig`, and the RBS method-level
annotations (`# @abstract`, `# @override`,
`# @without_runtime`, ...). The interface is the same one
`SorbetSignature` already exposes:

  - `method`
  - `parameter_type_strings`
  - `return_type_string` / `valid_return_type_string`
  - `valid_first_arg_type_string`
  - `compile_to_rbi_sig(parameters) { |sym| ... }`

`compile_to_rbi_sig` is where the RBS-specific bits — annotation
application, the `method_added` / `singleton_method_added`
`without_runtime` rule — finally live in one place instead of
being inlined into each consumer.

`# @without_runtime` is back to driving `sig.without_runtime = true`
on the emitted RBI sig (rather than dropping the sig entirely, which
was a holdover from the rewriter days that didn't make sense for
static RBI generation). The spec that previously asserted the
without-runtime method had no sig is updated to expect the
`T::Sig::WithoutRuntime.sig` form Sorbet's static checker wants.

Caller migrations:

  - `Tapioca::RBS::DslSignatures.build` returns `RbsSignature?` and
    folds annotation harvesting into the construction.
  - `Dsl::Compiler#rbs_parameter_types_for` and
    `Dsl::Compiler#rbs_return_type_for` delegate to
    `signature.parameter_type_strings` / `return_type_string`.
  - `ActiveModelTypeHelper#lookup_return_type_of_method` and
    `lookup_arg_type_of_method` collapse from two branches into one
    `signature&.valid_…_string` call sourced from a single
    `lookup_signature_of_method` that picks Sorbet sig first and
    RBS sig as fallback.
  - `Signature#method` widens to `(Method | UnboundMethod)` to
    accommodate the DSL-side `obj.method(:foo)` call sites; the gem
    pipeline narrows back to `UnboundMethod` at its call site via
    `Method#unbind`.
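
The "Sorbet sig first, RBS sig as fallback" rule and the `Method#unbind` narrowing can be sketched as below. The two lookups are hypothetical callables and `FakeSig` is a made-up struct; only the method names `lookup_signature_of_method`, `lookup_return_type_of_method`, and `valid_return_type_string` come from the description above.

```ruby
# Hypothetical signature object exposing the one accessor this sketch needs.
FakeSig = Struct.new(:source, :valid_return_type_string)

# Prefer the runtime (Sorbet) signature; fall back to the RBS one.
def lookup_signature_of_method(method, sorbet_lookup:, rbs_lookup:)
  sorbet_lookup.call(method) || rbs_lookup.call(method)
end

# A single safe-navigation call replaces the two separate branches.
def lookup_return_type_of_method(method, **lookups)
  lookup_signature_of_method(method, **lookups)&.valid_return_type_string
end

no_sorbet_sig = ->(_method) { nil }
rbs_sig = ->(_method) { FakeSig.new(:rbs, "::String") }

bound = "abc".method(:upcase) # DSL-side call sites hold a bound Method
p lookup_return_type_of_method(bound, sorbet_lookup: no_sorbet_sig, rbs_lookup: rbs_sig)

# The gem pipeline wants an UnboundMethod, which Method#unbind provides.
p bound.unbind.class
```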

The gem-pipeline-side `MethodNodeAdded#rbs_lookup` /
`Pipeline::RBSMethodLookup` / `SorbetSignatures#compile_rbs_lookup`
path stays as-is for now — that's the next commit. This commit
just lays the polymorphic groundwork so the DSL side already runs
through it.

The pre-existing `T.must` typecheck error in
`Dsl::Compiler#compile_method_parameters_to_rbi` is also gone as a
side effect: `parameters_types_from_signature` now returns a
concrete `Array[String]`, so `method_types[index]` is `String?`
(not `T.untyped`) and the `T.must` is no longer redundant.

Both the gem and DSL paths now build an `RbsSignature` via the shared
`Tapioca::RBS::SignatureBuilder`: parse the `#:` comments, translate to
`RBI::Sig`, qualify every constant against a Rubydex graph for the
declaration's lexical scope. They differ only in which graph they pass
in — workspace vs. gem.

`Reflection.signature_of` now takes an optional block as the RBS lookup
override: callers that need a non-default scope (the gem-RBI pipeline)
pass one; everything else gets the workspace-scoped `DslSignatures.build`
by default. `compile_to_rbi_sig` returns `Array[RBI::Sig]` so RBS
overloads survive the polymorphic interface.
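
One way to picture the optional-block override is the sketch below. It is an assumption-laden illustration, not the implementation: `ReflectionSketch`, `RUNTIME_SIGS`, and `DEFAULT_RBS_LOOKUP` are hypothetical, and plain strings stand in for signature objects. It also assumes the runtime sig is consulted before any RBS lookup.

```ruby
module ReflectionSketch
  # Hypothetical registries standing in for sorbet-runtime and for the
  # default workspace-scoped RBS lookup.
  RUNTIME_SIGS = {}
  DEFAULT_RBS_LOOKUP = ->(method) { "workspace sig for #{method.name}" }

  # Returns the runtime sig if one exists; otherwise asks the RBS lookup,
  # which callers may override by passing a block.
  def self.signature_of(method, &rbs_lookup)
    RUNTIME_SIGS[method.name] || (rbs_lookup || DEFAULT_RBS_LOOKUP).call(method)
  end
end

upcase = String.instance_method(:upcase)

# Default: workspace-scoped lookup.
puts ReflectionSketch.signature_of(upcase)

# Override: a caller with a different scope supplies its own lookup.
puts ReflectionSketch.signature_of(upcase) { |m| "gem sig for #{m.name}" }
```

Using a block keeps the common call site argument-free while still letting one caller swap in a differently scoped graph.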

This deletes ~120 lines of duplicated translate/qualify/annotate code
from `SorbetSignatures` and `DslSignatures`, drops the `RBSMethodLookup`
wrapper and the `MethodNodeAdded#rbs_lookup` plumbing, and lets the gem
listener collapse to a single `signature.compile_to_rbi_sig` call for
both backends.

Rubydex 0.2.5 ships `Definition#lexical_owner` and `Definition#lexical_nesting`,
which is the only Rubydex API this branch needed from the unreleased
main. Now that it's out, we can drop the github pins from the dev
Gemfile and from MockProject's subprocess Gemfile and bump the gemspec
floor.