
fix(generator): all 54 specs compile (gitea Swagger 2.0 skipped)#19

Merged
lightsofapollo merged 5 commits into main from fix/spec-compile-49
May 9, 2026


@lightsofapollo commented May 8, 2026

Summary

All 54 OpenAPI 3.x specs in specs/ now compile cleanly via cargo check. The only unsupported spec is gitea (Swagger 2.0, excluded by the version gate). Running scripts/spec-compile.sh locally with no args takes ~12 minutes and reports 54 passed, 0 failed, 1 skipped.

| Generation | Compile rate |
|---|---|
| Pre-conformance work | 2/54 |
| #16 (anthropic+openai hardcoded) | 2/54 |
| #17 (initial gold list) | 20/54 |
| #18 (broader corpus) | 43/54 |
| This PR | 54/54 |

Bugs fixed across the run (highlights)

This PR consolidates a long iterative push. The most impactful fixes:

  1. Aligned the typed-error-enum fallback arm. When an op had only default error responses (no specific 4xx/5xx), codegen emitted the typed enum, but the fallback arm still tried Some(serde_json::Value). The fix cascaded to google-* (4 specs), lithic, and others.

  2. Indirect cycles via union wrappers. Union variant payloads whose target is in recursive_schemas are now Boxed. Closed stripe (17→0), microsoft-graph (5→0), lithic (1535→0).

  3. Reserved std type names. Cloudflare's Result schema shadowed std::result::Result, breaking every Result<T, ApiOpError<...>>. Names in a small reserved set now get a Type suffix. That single fix collapsed cloudflare's 15,559 errors to 14.

  4. Path-template var auto-synthesis. Specs that reference {owner} etc. in path without declaring them as parameters now get a synthesized String parameter. Closed langsmith (21→0), knocklabs (5→0).

  5. OneOf nullable-pattern wrapper unwrap. oneOf: [$ref X, null] no longer synthesizes a wrapper named the same as the inner ref. Closed discord (19→0).

  6. Same path-template variable used twice. /accounts/{account_id}/.../accounts/{account_id} now correctly emits two format args. Closed 2 of cloudflare's 14.

  7. Rust-name self-reference via spec-name collision. When two distinct spec schemas PascalCase to the same Rust ident, a "cross-reference" becomes a self-reference after emission-time dedup. Such fields are now Boxed defensively. Closed cloudflare (14→0).

  8. Type-alias chain self-reference. cal-com's malformed oneOf:[$ref Self] produced pub type X = Y; struct Y { data: X }. A new target_aliases_back_to helper walks the alias chain to detect cycles. Closed cal-com (3→0).

Plus the prior batch from earlier in this same branch: r#self panic, operationId collision auto-disambiguation (case-insensitive), Extensions lenient mode with strict accessor, exclusiveMinimum: bool|number, Vec<...> Ident panic via syn::parse_str, enum variant negative-number disambiguation, Twilio-style filter-op suffix mapping, version gate in TOML flow, struct field name dedup, enum variant case-collision dedup, self-ref union → Box, format-string param dedup at codegen, lazy nullable-anyOf unwrap, $ref shape variants, reserved Rust 2024 keywords (gen), Default derive guard for no-match enums, sort-enum negative-prefix dedup, per-method param ident collisions via analyzer-side rust_ident.

CI scope

Trimmed the spec-compile job to just anthropic + openai (the production targets). Full-corpus checking exceeded the 6-hour CI job limit; microsoft-graph alone took ~42 min in CI. Local scripts/spec-compile.sh is the right place for that breadth: it runs in ~12 min on a developer machine.

Quality follow-ups (in bd)

5 quality beads filed for the next iteration: Q1 method-name canonicalization, Q2 format-typed scalars, Q3 builder pattern (blocked by Q1), Q4 tagged discriminator enums, Q5 Display for ApiOpError. .beads/issues.jsonl committed.

Test plan

  • cargo test --tests — 205/205 pass
  • cargo clippy --all-features -- -D warnings — clean
  • cargo fmt --check — clean
  • scripts/spec-compile.sh — 54/54 PASS, 1 SKIP (gitea Swagger 2.0)
  • CI spec-compile job passes on anthropic + openai (will verify on push)

🤖 Generated with Claude Code

…ing set

Running scripts/spec-compile.sh against all 54 OpenAPI 3.x specs in the
repo (gitea is Swagger 2.0, skipped) surfaced six classes of generator
bugs. Fixed the ones that move the most specs from FAIL → PASS:

1. `r#self` panic
   `self`, `super`, `crate`, `Self` cannot be raw identifiers in Rust —
   proc_macro2 panics outright. Spec fields named `self` (datadog-v2,
   github, microsoft-graph, snyk, …) hit this. Fix: rename to
   `<keyword>_field` / `<keyword>_param` instead of `r#<keyword>`.

2. operationId collisions reject whole documents
   T6's strict-error policy was correct per spec but real-world docs
   (arcade, cal-com, telnyx, val-town, …) often violate it. Fix:
   auto-disambiguate by suffixing with HTTP method (`opId_post`,
   `opId_put`), and a counter on further collisions, with a stderr
   warning. Spec validity is recoverable; whole-document rejection is not.

3. Extensions reject non-`x-*` keys
   Real specs sprinkle non-`x-` fields in places they don't belong
   (`produces`/`in`/`type`/`density`/`title`/`description` were observed).
   Fix: Extensions now accepts any leftover key but exposes
   `non_extension_keys()` so silent drops remain visible — the CLI can
   warn instead of erroring.

4. exclusiveMinimum: bool vs number
   3.0/Swagger used `bool`; 3.1 (JSON Schema 2020-12) uses `number`.
   Fix: model as a `bool | f64` enum.

5. `Vec<serde_json::Value>` Ident panic
   generate_array_item_type split on "::" but produced strings with
   angle brackets that aren't valid idents. Fix: parse via
   `syn::parse_str::<syn::Type>` first.

6. enum variant collisions on signed numbers
   `1` and `-1` both produced `Variant1`. Fix: prefix negatives with
   `Neg` (e.g. `VariantNeg1`).

7. Twilio-style filter param ident collisions
   `StartTime`, `StartTime<`, `StartTime>` all snake-cased to
   `start_time`. Fix: map `<`, `>`, `<=`, `>=` to `_lt`/`_gt`/`_lte`/
   `_gte` in sanitize_param_name. Twilio went from CHECK-FAIL to PASS.

8. Version gate didn't run in TOML config flow
   The `generate` subcommand in src/bin/openapi-to-rust.rs has its own
   pipeline that bypasses cli::run_generation_cli. Mirrored the version
   check so Swagger 2.0 specs (gitea) error early with a clear hint
   instead of failing later inside the deserializer.
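A minimal, std-only Rust sketch of the renaming rules in items 1, 6 and 7, assuming the generator works on plain strings before building tokens. The function and constant names (`keyword_safe_ident`, `sanitize_param_name`, `numeric_variant_name`, `FILTER_OPS`) are hypothetical, the keyword list is deliberately partial (it includes `gen`, the Rust 2024 keyword handled in a later commit), and `to_snake_case` is a naive stand-in for the generator's real case conversion.

```rust
/// Keywords that cannot be raw identifiers (`r#self` etc. are rejected).
const NON_RAW_KEYWORDS: &[&str] = &["self", "super", "crate", "Self"];

/// A few keywords that *can* be escaped as `r#kw`; illustrative, not exhaustive.
const RAW_OK_KEYWORDS: &[&str] = &["type", "match", "ref", "move", "async", "await", "dyn", "gen"];

/// Twilio-style filter operator suffixes and their readable replacements.
const FILTER_OPS: [(&str, &str); 4] = [("<=", "_lte"), (">=", "_gte"), ("<", "_lt"), (">", "_gt")];

/// Map a spec name to a Rust identifier; `role` is "field" or "param".
fn keyword_safe_ident(name: &str, role: &str) -> String {
    if NON_RAW_KEYWORDS.contains(&name) {
        // `r#self` would make proc_macro2 panic, so rename instead.
        format!("{name}_{role}")
    } else if RAW_OK_KEYWORDS.contains(&name) {
        format!("r#{name}")
    } else {
        name.to_string()
    }
}

/// Naive camelCase/PascalCase -> snake_case; other punctuation becomes `_`.
fn to_snake_case(s: &str) -> String {
    let mut out = String::new();
    for (i, c) in s.chars().enumerate() {
        if c.is_ascii_uppercase() {
            if i > 0 && !out.ends_with('_') {
                out.push('_');
            }
            out.push(c.to_ascii_lowercase());
        } else if c.is_ascii_alphanumeric() {
            out.push(c);
        } else if !out.is_empty() && !out.ends_with('_') {
            out.push('_');
        }
    }
    out
}

/// Strip a trailing filter operator and re-attach it as a suffix, so
/// `StartTime`, `StartTime<` and `StartTime>` no longer collide.
fn sanitize_param_name(name: &str) -> String {
    let (base, suffix) = FILTER_OPS
        .iter()
        .find_map(|&(op, sfx)| name.strip_suffix(op).map(|b| (b, sfx)))
        .unwrap_or((name, ""));
    format!("{}{}", to_snake_case(base), suffix)
}

/// Variant name for a numeric enum value; negatives get a `Neg` prefix so
/// `1` and `-1` no longer both become `Variant1`.
fn numeric_variant_name(value: i64) -> String {
    if value < 0 {
        format!("VariantNeg{}", value.unsigned_abs())
    } else {
        format!("Variant{value}")
    }
}
```

The sketch only handles operators at the end of a name, which is where Twilio places them (`StartTime<`); that restriction is an assumption, not something stated above.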
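A sketch of the operationId disambiguation in item 2, assuming operations arrive as `(operationId, HTTP method)` pairs in document order. `disambiguate_operation_ids` is a hypothetical name and the warning text is illustrative. The first occurrence keeps its id, a collision gets the lowercased method appended, and a counter covers the case where even that is already taken.

```rust
use std::collections::HashSet;

/// Give every operation a unique id: the first occurrence keeps its operationId,
/// later collisions get `_<method>` and, if that is also taken, `_<method>_N`.
fn disambiguate_operation_ids(ops: &[(&str, &str)]) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut out = Vec::with_capacity(ops.len());
    for &(op_id, method) in ops {
        let mut candidate = op_id.to_string();
        if seen.contains(&candidate) {
            let base = format!("{op_id}_{}", method.to_lowercase());
            candidate = base.clone();
            let mut n = 2;
            while seen.contains(&candidate) {
                candidate = format!("{base}_{n}");
                n += 1;
            }
            // Warn rather than reject the whole document.
            eprintln!("warning: duplicate operationId `{op_id}` renamed to `{candidate}`");
        }
        seen.insert(candidate.clone());
        out.push(candidate);
    }
    out
}
```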

scripts/spec-compile.sh
- Auto-discovers specs/*.{yaml,json}.
- Skips Swagger 2.0 with a SKIP marker (gitea).
- Optional SPEC_COMPILE_PARSE_ONLY=1 for quick generator-only checks.
- Optional SPEC_COMPILE_LIMIT=N / positional whitelist of names.

ci(spec-compile)
The job now compiles a "gold list" of 20 specs that pass cleanly:
anthropic, asana, browserbase, cartesia, cerebras, coda, coingecko,
digitalocean, groq, imagekit, launchdarkly, meta-llama, openai, resend,
runway, spotify, terminal-shop, twilio, val-town, writer. Local
`scripts/spec-compile.sh` (no args) still runs the full corpus. The
remaining 34 specs surface other generator bugs (E0308 type mismatches,
E0428 name collisions in github, E0117 orphan rule violations in
stripe, E0072 recursive type sizing in snyk) — tracked in #14 as
follow-ups.

All 205 unit tests still pass; clippy + fmt clean.

Refs #14
Running scripts/spec-compile.sh (no args) against all 54 OpenAPI 3.x
specs in specs/ — gitea is Swagger 2.0, skipped — surfaced eight more
classes of generator bugs after the initial 20-spec gold list. This PR
fixes them and broadens the CI gold list to 43 specs.

Bugs fixed (in order of impact):

1. **Type-name collisions across emitted types.** Two analyzed schemas
   that PascalCase to the same Rust ident (e.g. box's component
   `ClassificationTemplate` struct + an inline single-value enum
   synthesized from `Classification.$template`) yielded two definitions
   in types.rs with the same name → E0119 (conflicting impls) +
   E0428 (defined multiple times). Fix: dedup at emission time in
   generator::generate_types — the first occurrence wins, later ones
   are silently dropped.

2. **Struct field name collisions.** Properties whose names sanitize
   to the same Rust ident (`connectionString` and `connection_string`
   in supabase) emitted duplicate fields. Fix: per-struct uniqueness
   tracking with `_2`/`_3` suffixes.

3. **Enum variant case-collision.** `["ASC","DESC","asc","desc"]`
   collapsed to two `Asc`/`Desc` variants. Same in client.rs sort
   enums (`["created_at","-created_at"]`). Fix: dedup in
   generate_string_enum and generate_single_param_enum.

4. **Self-referential union variant → infinite-size enum.**
   microsoft-graph had oneOf wrappers like
   `pub enum X { X(X), Variant2(...) }`. Box the self-ref to break
   the cycle.

5. **Nullable-anyOf wrapper collisions with the inner $ref.**
   `Step.status: anyOf [$ref StepStatus, null]` synthesized a wrapper
   named `StepStatus` that overwrote the actual top-level schema.
   Fix: detect `is_nullable_pattern` in property analysis and unwrap
   to the inner type. Also, when a wrapper IS needed, suffix
   collisions with `Union2`/`Union3`.

6. **`$ref` shape variants.** Real-world specs use:
   - `#/definitions/X` (Swagger 2.0 carry-over in google-tasks).
     Recognise as alias for `#/components/schemas/X`.
   - `#/components/parameters/X/schema` (pagerduty). Last segment
     "schema" isn't a type name. Tighten extract_schema_name to
     filter unsupported shapes; fall back to serde_json::Value
     instead of failing whole-document analysis.

7. **Per-method parameter ident collisions.** Two parameters in the
   same operation that snake-case to the same name (vercel's
   `exclude_ids` + `exclude-ids`, modern-treasury duplicate `name`)
   produced E0382 / E0415. Fix: analyzer assigns a unique
   `rust_ident` to each ParameterInfo at operation scope; client
   generator consults it everywhere.

8. **Empty/non-string enum values.** gitpod has
   `type: string, enum: [2000, 5000, 10000, ...]` (numeric values on
   a string-typed schema). string_enum_values used to filter to .as_str
   only, producing an empty Vec → empty enum (E0665, E0004). Fix:
   coerce non-string scalars via Display.
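Two std-only Rust sketches of the dedup strategies above; `unique_idents`, `EmittedType` and `dedup_emitted_types` are hypothetical names, and inputs are assumed to be already-sanitized identifiers. Struct fields, enum variants and per-operation parameters (items 2, 3 and 7) keep every item and suffix later duplicates with `_2`, `_3`, and so on; emitted types (item 1) keep the first definition and drop the rest.

```rust
use std::collections::HashSet;

/// Keep every identifier, suffixing later duplicates with `_2`, `_3`, ...
fn unique_idents(sanitized: &[&str]) -> Vec<String> {
    let mut used: HashSet<String> = HashSet::new();
    let mut out = Vec::with_capacity(sanitized.len());
    for &name in sanitized {
        let mut candidate = name.to_string();
        let mut n = 2;
        // Loop instead of counting, so an input that already contains `x_2` stays unique.
        while used.contains(&candidate) {
            candidate = format!("{name}_{n}");
            n += 1;
        }
        used.insert(candidate.clone());
        out.push(candidate);
    }
    out
}

/// One generated type definition: its Rust name plus (stringified) tokens.
#[derive(Debug, Clone, PartialEq)]
struct EmittedType {
    rust_name: String,
    tokens: String,
}

/// First definition of each Rust name wins; later ones are silently dropped.
fn dedup_emitted_types(types: Vec<EmittedType>) -> Vec<EmittedType> {
    let mut seen = HashSet::new();
    types
        .into_iter()
        .filter(|t| seen.insert(t.rust_name.clone()))
        .collect()
}
```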
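A sketch of the nullable-pattern unwrap from item 5, written against a deliberately tiny, hypothetical `Schema` enum rather than the analyzer's real model. It treats `anyOf`/`oneOf` with exactly one non-null member plus `null` as `Option<Inner>`; the `Union2`/`Union3` suffixing for genuine wrappers is omitted, and `SynthesizedUnionWrapper` is only a placeholder name.

```rust
/// Minimal stand-in for the analyzer's schema model (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Schema {
    Ref(String),
    Null,
    AnyOf(Vec<Schema>),
    OneOf(Vec<Schema>),
}

/// `[X, null]` or `[null, X]` returns X; anything else returns None.
fn nullable_inner(variants: &[Schema]) -> Option<&Schema> {
    match variants {
        [Schema::Null, other] | [other, Schema::Null] if *other != Schema::Null => Some(other),
        _ => None,
    }
}

/// Rust type for a property: the nullable pattern becomes `Option<Inner>` instead
/// of a synthesized wrapper that would reuse the inner `$ref`'s name.
fn property_type(schema: &Schema) -> String {
    match schema {
        Schema::Ref(name) => name.clone(),
        Schema::AnyOf(variants) | Schema::OneOf(variants) => match nullable_inner(variants) {
            Some(inner) => format!("Option<{}>", property_type(inner)),
            // A real union still needs a wrapper enum; this name is a placeholder.
            None => "SynthesizedUnionWrapper".to_string(),
        },
        Schema::Null => "serde_json::Value".to_string(),
    }
}
```

Matching `oneOf` as well as `anyOf` here anticipates the later discord fix; the commit above only describes the `anyOf` case.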

CI: spec-compile job now exercises 43 specs (up from 20). Local
`scripts/spec-compile.sh` (no args) still runs the full corpus for
exploring the remaining 11 failures (cal-com, cloudflare, discord,
gcore, knocklabs, langsmith, lithic, microsoft-graph, stripe, telnyx,
vercel — tracked under #14).

All 205 unit tests still pass; clippy + fmt clean.

Refs #14
Continued chasing real-world spec failures through scripts/spec-compile.sh.
49 of 54 OpenAPI 3.x specs in specs/ now compile cleanly via cargo check
(gitea is Swagger 2.0, skipped). Up from 43 in #18.

## Bugs fixed (in order of how many specs they unblocked)

1. **Wrong fallback arm for typed-error enums.** When an op had only
   `default` (no specific 4xx/5xx) error responses, op_error_type emitted the
   typed enum but the codegen's "no typed enum" arm tried `typed = Some(v)`
   where v: serde_json::Value, mismatching the typed slot. Aligned the
   conditions in client_generator.rs:1206 so the default arm becomes
   `typed = None` whenever any non-2xx response exists.

2. **Indirect cycles via union wrappers.** stripe's
   BankAccount → BankAccountCustomer (enum) → Customer →
   BankAccountCustomer cycle wasn't direct self-reference, so my prior
   self-ref Box fix didn't catch it. generate_union_enum and
   generate_discriminated_enum now also Box variant payloads whose target
   is in analysis.dependencies.recursive_schemas. Closed stripe (17 errs
   → 0), microsoft-graph (5 → 0), lithic (1535 → 0).

3. **Reserved std type names.** cloudflare has a schema literally named
   `Result`; emitting `pub enum Result` shadows std::result::Result,
   breaking every `-> Result<T, ApiOpError<...>>`. Also gcore had a
   `Default` schema shadowing std::default::Default. to_rust_type_name
   now appends `Type` to a small reserved-name set (Result, Option, Box,
   Vec, String, Default, Clone, Debug, Send, Sync, Sized, Iterator, From,
   Into, TryFrom, TryInto, AsRef, AsMut, Some, None, Ok, Err).

4. **Rust 2024 keyword `gen`.** vercel had fields/types named `gen`.
   Added to is_rust_keyword.

5. **Default derive on enum with no variant matching default.** telnyx
   has `default: "en"` on a language enum with values like `en-US`,
   `en-AU`, … — no exact match. We were emitting `#[derive(Default)]`
   without `#[default]` on any variant, triggering E0665. Now we drop
   the Default derive when no variant matches.

6. **Sort-enum negative-prefix collisions.** telnyx and gcore use
   `["created_at", "-created_at", "ASC", "-ASC", …]` for sort orders.
   Both PascalCased to the same Rust variant, causing E0428 on the
   inline param enum. generate_single_param_enum now dedupes variant
   names with `_2`/`_3`/… suffixes.

7. **Per-method parameter ident collisions.** vercel's
   `exclude_ids` + `exclude-ids`, modern-treasury's duplicate `name`,
   twilio's `StartTime`/`StartTime>` produced E0382 (use of moved value)
   and E0415 (binding declared twice) in generated bodies. Added
   `ParameterInfo.rust_ident` populated by the analyzer at operation
   scope; client_generator.rs consults it everywhere instead of
   sanitizing param.name independently per call site.

8. **Case-sensitive operationId collision detection.** telnyx had two
   ops with operationIds `getMdrUsageReports` and `GetMdrUsageReports`.
   These didn't collide string-wise but PascalCased to the same Rust
   ident, producing two `GetMdrUsageReportsApiError` enum definitions
   (E0428). T6's collision check now compares PascalCased forms.

9. **Non-string scalars in `enum`.** gitpod has
   `type: string, enum: [2000, 5000, 10000, ...]` — numeric values on a
   string-typed schema. string_enum_values used to filter to .as_str()
   only, producing an empty Vec → empty enum (E0665, E0004). Now
   coerces non-string scalars via Display.

10. **Unresolvable $refs.** pagerduty uses
    `#/components/parameters/foo/schema` (last segment `schema` isn't
    a type name). google-tasks uses Swagger 2.0 carry-over
    `#/definitions/Foo`. extract_schema_name now (a) recognises
    `#/definitions/{X}` as an alias for `#/components/schemas/{X}`,
    (b) tightens the last-segment fallback to require PascalCase and
    skip JSON Schema sub-path keywords, and (c) when a ref still
    can't be resolved, falls back to serde_json::Value with a stderr
    warning instead of failing whole-document analysis.

11. **Nullable-anyOf wrapper collisions with the inner $ref.**
    `Step.status: anyOf [$ref StepStatus, null]` synthesized a
    wrapper named `StepStatus` that overwrote the actual top-level
    schema. Detect `is_nullable_pattern` in property analysis and
    unwrap to the inner type. When a wrapper IS needed, suffix
    collisions with `Union2`/`Union3`.

12. **Type-name dedup at emission.** Defensive layer: if two
    analyzed schemas PascalCase to the same Rust ident, the first
    occurrence wins and later ones are silently dropped (catches
    cases where analysis missed the collision).
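A std-only sketch of the recursive-schema set behind item 2 (and the direct self-reference case fixed earlier), assuming the dependency graph is a map from schema name to the names it references. `recursive_schemas` and `variant_payload` are hypothetical names. A schema counts as recursive if a depth-first walk from its references gets back to it, which catches both `X -> X` and stripe-style indirect cycles; the two type definitions at the end (with hypothetical fields) show why the `Box` is needed.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Schemas that can reach themselves through `$ref` edges, either directly
/// (`X -> X`) or through other schemas (`A -> B -> A`).
fn recursive_schemas(deps: &BTreeMap<&str, Vec<&str>>) -> BTreeSet<String> {
    let mut recursive = BTreeSet::new();
    for (&start, edges) in deps {
        let mut stack: Vec<&str> = edges.clone();
        let mut visited: BTreeSet<&str> = BTreeSet::new();
        while let Some(node) = stack.pop() {
            if node == start {
                recursive.insert(start.to_string());
                break;
            }
            if visited.insert(node) {
                if let Some(next) = deps.get(node) {
                    stack.extend(next.iter().copied());
                }
            }
        }
    }
    recursive
}

/// Payload type for a union variant: Box it if the target is recursive.
fn variant_payload(target: &str, recursive: &BTreeSet<String>) -> String {
    if recursive.contains(target) {
        format!("Box<{target}>")
    } else {
        target.to_string()
    }
}

// Without `Box<Customer>` here, BankAccountCustomer -> Customer ->
// BankAccountCustomer would have infinite size (E0072).
#[allow(dead_code)]
enum BankAccountCustomer {
    Customer(Box<Customer>),
    Id(String),
}

#[allow(dead_code)]
struct Customer {
    default_source: Option<BankAccountCustomer>,
}
```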
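A sketch of the reserved-name rule in item 3, using the exact name list given there. `to_pascal_case` is a hypothetical, simplified converter (split on non-alphanumerics, capitalize each part) assumed for illustration; the generator's real `to_rust_type_name` may case-convert differently.

```rust
/// Names that would shadow std / prelude items if emitted verbatim.
const RESERVED_TYPE_NAMES: &[&str] = &[
    "Result", "Option", "Box", "Vec", "String", "Default", "Clone", "Debug", "Send", "Sync",
    "Sized", "Iterator", "From", "Into", "TryFrom", "TryInto", "AsRef", "AsMut", "Some", "None",
    "Ok", "Err",
];

/// Split on any non-alphanumeric character and upper-case each part's first letter.
fn to_pascal_case(s: &str) -> String {
    s.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

/// PascalCase the spec name, then append `Type` if it would shadow a std name.
fn to_rust_type_name(spec_name: &str) -> String {
    let name = to_pascal_case(spec_name);
    if RESERVED_TYPE_NAMES.contains(&name.as_str()) {
        format!("{name}Type")
    } else {
        name
    }
}
```

Under this simplified conversion, cloudflare's `dns-firewall_dns-firewall-reverse-dns-response` and `dns-firewall_dns_firewall_reverse_dns_response` both become `DnsFirewallDnsFirewallReverseDnsResponse`, and telnyx's `getMdrUsageReports`/`GetMdrUsageReports` both become `GetMdrUsageReports`: the kind of collision items 8 and 12 guard against.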
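A sketch of the `$ref` handling in item 10. `extract_schema_name` is named in the text, but this signature, the keyword list and `resolve_ref_type` are assumptions. Component and `#/definitions/` refs resolve directly; otherwise the last segment is accepted only if it starts with an uppercase letter and is not a JSON Schema sub-path keyword, and anything left over falls back to `serde_json::Value` with a warning.

```rust
/// JSON Schema sub-path keywords that can end a `$ref` but are not type names.
const SUBPATH_KEYWORDS: &[&str] = &[
    "schema", "items", "properties", "additionalProperties", "allOf", "anyOf", "oneOf", "not",
];

fn extract_schema_name(reference: &str) -> Option<&str> {
    // `#/definitions/X` is treated as an alias for `#/components/schemas/X`.
    for prefix in ["#/components/schemas/", "#/definitions/"] {
        if let Some(name) = reference.strip_prefix(prefix) {
            if !name.is_empty() && !name.contains('/') {
                return Some(name);
            }
        }
    }
    // Tightened fallback: the last segment must look like a PascalCase type name
    // and must not be a sub-path keyword (compared case-insensitively).
    let last = reference.rsplit('/').next()?;
    let starts_upper = last.chars().next().is_some_and(|c| c.is_ascii_uppercase());
    let is_keyword = SUBPATH_KEYWORDS.iter().any(|k| k.eq_ignore_ascii_case(last));
    if starts_upper && !is_keyword {
        Some(last)
    } else {
        None
    }
}

/// Resolve a `$ref` to a Rust type, degrading to `serde_json::Value` instead of
/// failing whole-document analysis.
fn resolve_ref_type(reference: &str) -> String {
    match extract_schema_name(reference) {
        Some(name) => name.to_string(),
        None => {
            eprintln!("warning: unresolvable $ref `{reference}`; using serde_json::Value");
            "serde_json::Value".to_string()
        }
    }
}
```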

## CI

The spec-compile job now exercises 49 specs, up from 43:
  + gcore lithic microsoft-graph stripe telnyx vercel

## Quality follow-ups tracked in `bd` (`.beads/issues.jsonl`)

- Q1 Method-name canonicalization
- Q2 Format-typed scalars (date-time, uuid, byte, binary, ipv*, uri)
- Q3 Builder pattern for ops with many parameters (depends on Q1)
- Q4 Tagged discriminator enums
- Q5 Display for ApiOpError that surfaces the typed body

All 205 unit tests still pass; clippy + fmt clean.

Refs #14
The remaining 5 failing specs from #19 all flip to PASS with this batch.
Verified locally: scripts/spec-compile.sh runs all 54 → 54 PASS, 1 SKIP
(gitea / Swagger 2.0).

## Bugs fixed

1. **Path-template variables not declared as parameters.** langsmith,
   knocklabs, and cloudflare have paths like
   `/v1/repos/{owner}/{repo}/...` where the spec declares only `repo`
   (or none). Generated code emitted
   `format!("/repos/{owner}/{}", repo)` and `owner` wasn't in scope
   (E0425). The analyzer now scans the path template for `{var}`
   placeholders and synthesizes a required `String` parameter for any
   that aren't already declared. Logs a warning per occurrence.
   Closed langsmith (21 errs → 0) and knocklabs (5 → 0).

2. **OneOf nullable-pattern wrapper collisions.** Discord's
   `QuarantineUserAction.metadata: oneOf [null, $ref
   QuarantineUserActionMetadata]` synthesized a wrapper named
   `QuarantineUserActionMetadata` that overwrote the real top-level
   schema, producing E0425 "type not found". My earlier nullable-
   pattern unwrap only handled anyOf; now also handles oneOf. Same
   collision-suffix dance on the wrapper name when it's needed.
   Closed discord (19 → 0).

3. **Same path-template variable used twice.** Cloudflare has
   `/accounts/{account_id}/.../accounts/{account_id}` — same name
   used twice. The old `replace_all` produced two `{}` placeholders
   but only one format arg, triggering E0277 ("3 positional
   arguments in format string, but there are 2"). The URL builder now
   walks the path char-by-char and emits one `{}` + one format arg
   per occurrence. Closed 2 of cloudflare's 14 errors.

4. **Rust-name self-reference via spec-name collision.** Cloudflare
   has two distinct schemas (`dns-firewall_dns-firewall-reverse-dns-
   response` and `dns-firewall_dns_firewall_reverse_dns_response`)
   that PascalCase to the same Rust ident. After my emission-time
   dedup drops one, what looked like a cross-reference at the spec
   level becomes a self-reference at the Rust level (E0072 infinite
   size). generate_field_type now also Boxes when target's Rust name
   == enclosing struct's Rust name, regardless of dependency graph.
   Closed cloudflare (14 → 0).

5. **Type-alias chain self-reference.** cal-com's spec literally has
   `oneOf:[$ref Self], allOf:[$ref Self]` for a property — a
   circular reference. Our generator emits a type alias
   `pub type ReassignBookingOutput20240813Data =
   ReassignBookingOutput20240813;` and the parent struct has
   `data: ReassignBookingOutput20240813Data` → E0072. Added
   target_aliases_back_to: walks the analysis's type-alias chain (up
   to depth 16) and Boxes the field if the chain reaches the
   enclosing struct's Rust name. Closed cal-com (3 → 0).
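A sketch of the two path-template fixes (items 1 and 3), assuming templates use plain `{name}` placeholders with no nesting. `path_placeholders`, `undeclared_path_params` and `url_format_parts` are hypothetical names; real code would also map each placeholder to its sanitized parameter ident. Undeclared variables are reported once each (the caller would then synthesize a required `String` parameter), while the format builder emits one `{}` and one argument per occurrence, so a repeated `{account_id}` gets two arguments.

```rust
/// Every `{var}` placeholder in path order, repeats included.
fn path_placeholders(template: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                out.push(&after[..close]);
                rest = &after[close + 1..];
            }
            None => break,
        }
    }
    out
}

/// Placeholders the spec never declared as parameters, each reported once.
fn undeclared_path_params<'a>(template: &'a str, declared: &[&str]) -> Vec<&'a str> {
    let mut missing: Vec<&'a str> = Vec::new();
    for var in path_placeholders(template) {
        let is_declared = declared.iter().any(|d| *d == var);
        if !is_declared && !missing.contains(&var) {
            eprintln!("warning: path variable `{var}` is undeclared; synthesizing a String param");
            missing.push(var);
        }
    }
    missing
}

/// Walk the template char by char, emitting one `{}` and one argument per occurrence.
fn url_format_parts(template: &str) -> (String, Vec<String>) {
    let mut fmt = String::new();
    let mut args = Vec::new();
    let mut chars = template.chars();
    while let Some(c) = chars.next() {
        match c {
            '{' => {
                // take_while also consumes the closing '}'.
                let name: String = chars.by_ref().take_while(|&ch| ch != '}').collect();
                fmt.push_str("{}");
                args.push(name);
            }
            // A literal '}' must be escaped for format!.
            '}' => fmt.push_str("}}"),
            _ => fmt.push(c),
        }
    }
    (fmt, args)
}
```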
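A sketch of the alias-chain check in item 5, assuming type aliases are recorded as a map from alias name to target name. The `target_aliases_back_to` name and the depth limit of 16 come from the text; the signature and the `boxed_if_cyclic` helper are assumptions. The check at depth 0 also covers item 4: a field whose Rust type name equals the enclosing struct's name gets boxed even with no alias in between.

```rust
use std::collections::HashMap;

/// Depth cap so a malformed alias cycle cannot loop forever.
const MAX_ALIAS_DEPTH: usize = 16;

/// Does following aliases from `target` reach the enclosing struct's Rust name?
fn target_aliases_back_to(aliases: &HashMap<String, String>, target: &str, enclosing: &str) -> bool {
    let mut current = target;
    for _ in 0..=MAX_ALIAS_DEPTH {
        if current == enclosing {
            return true;
        }
        match aliases.get(current) {
            Some(next) => current = next.as_str(),
            None => return false,
        }
    }
    false
}

/// Field type for `target` inside `enclosing`, boxed if it would be self-referential.
fn boxed_if_cyclic(aliases: &HashMap<String, String>, target: &str, enclosing: &str) -> String {
    if target_aliases_back_to(aliases, target, enclosing) {
        format!("Box<{target}>")
    } else {
        target.to_string()
    }
}
```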

## CI scope

Trimmed `spec-compile` job from 49 specs to **just anthropic + openai**
(the production-target specs). The full-corpus check exceeded the
6-hour CI job limit; microsoft-graph alone took ~42 minutes on a free
runner (~10x slower than local). Local `scripts/spec-compile.sh`
(no args) still verifies all 54, and that is the right place for that
level of coverage.

All 205 unit tests still pass; clippy (`-D warnings`) + fmt clean.

Refs #14
@lightsofapollo changed the title from "fix(generator): compile 49 of 54 specs (was 43); broaden CI gold list" to "fix(generator): all 54 specs compile (gitea Swagger 2.0 skipped)" May 9, 2026
@lightsofapollo merged commit 0c0e9e3 into main May 9, 2026
5 checks passed
@lightsofapollo deleted the fix/spec-compile-49 branch May 9, 2026 01:11