Skip to content

fix(generator): compile 43 specs (was 20), broaden CI corpus#18

Closed
lightsofapollo wants to merge 2 commits into
mainfrom
fix/spec-compile-corpus
Closed

fix(generator): compile 43 specs (was 20), broaden CI corpus#18
lightsofapollo wants to merge 2 commits into
mainfrom
fix/spec-compile-corpus

Conversation

@lightsofapollo
Copy link
Copy Markdown
Contributor

Summary

After the previous PR's gold list of 20 specs, ran scripts/spec-compile.sh (no args) against all 54 OpenAPI 3.x specs in specs/ (gitea is Swagger 2.0, skipped) and chased the failures. 43/54 now compile cleanly via cargo check. CI's spec-compile job now gates on the broader 43-spec list.

Bugs fixed

In order of how many specs they unblocked:

  1. Type-name collisions across emitted types. Two analyzed schemas that PascalCase to the same Rust ident (e.g. box's ClassificationTemplate struct + a synthesized inline enum from Classification.$template) emitted two definitions → E0119/E0428. Dedup at emission time; first occurrence wins.

  2. Struct field name collisions. connectionString + connection_string in supabase both → connection_string. Per-struct uniqueness tracking with _2/_3 suffixes.

  3. Enum variant case collisions. [ASC,DESC,asc,desc] collapsed to two Asc/Desc variants; same in client.rs sort enums ([created_at,-created_at]). Dedup in both generate_string_enum and generate_single_param_enum.

  4. Self-referential union variant → infinite-size enum. microsoft-graph had pub enum X { X(X), V2(...) }. Box the self-ref.

  5. Nullable-anyOf wrapper collisions. Step.status: anyOf [$ref StepStatus, null] synthesized a wrapper named StepStatus that overwrote the top-level schema. Detect is_nullable_pattern in property analysis and unwrap; suffix collisions with Union2 when a wrapper IS needed.

  6. $ref shape variants. #/definitions/X (Swagger 2.0 carry-over in google-tasks) recognized as alias. #/components/parameters/X/schema (pagerduty) falls back to serde_json::Value instead of failing whole-document analysis.

  7. Per-method parameter ident collisions. vercel's exclude_ids/exclude-ids, modern-treasury's duplicate name, twilio's StartTime/StartTime>. Analyzer now assigns each ParameterInfo.rust_ident at operation scope, consulted by the client generator everywhere.

  8. Numeric enum values on string-typed schemas. gitpod has type: string, enum: [2000, 5000, ...]. Coerce non-string scalars via Display instead of filtering them out (which produced an empty enum, E0665/E0004).

Conformance

Phase Compile rate
Before #16 2/54
After #17 (initial gold list) 20/54
This PR 43/54

Remaining 11 (tracked under #14): cal-com, cloudflare, discord, gcore, knocklabs, langsmith, lithic, microsoft-graph, stripe, telnyx, vercel. Top error classes: E0308 (cloudflare 7k+, lithic 662, stripe 4), E0428 (gcore 50, telnyx 39), E0425 (langsmith 20, knocklabs 3, discord 6).

Test plan

  • cargo test --tests — 205/205 pass
  • cargo clippy --all-features -- -D warnings — clean
  • cargo fmt --check — clean
  • scripts/spec-compile.sh against all 54 specs locally — 43 compile cleanly, 11 fail with categorized errors
  • CI spec-compile job gates on the 43 working specs (will verify on push)

🤖 Generated with Claude Code

…ing set

Running scripts/spec-compile.sh against all 54 OpenAPI 3.x specs in the
repo (gitea is Swagger 2.0, skipped) surfaced six classes of generator
bugs. Fixed the ones that move the most specs from FAIL → PASS:

1. `r#self` panic
   `self`, `super`, `crate`, `Self` cannot be raw identifiers in Rust —
   proc_macro2 panics outright. Spec fields named `self` (datadog-v2,
   github, microsoft-graph, snyk, …) hit this. Fix: rename to
   `<keyword>_field` / `<keyword>_param` instead of `r#<keyword>`.

2. operationId collisions reject whole documents
   T6's strict-error policy was correct per spec but real-world docs
   (arcade, cal-com, telnyx, val-town, …) often violate it. Fix:
   auto-disambiguate by suffixing with HTTP method (`opId_post`,
   `opId_put`), and a counter on further collisions, with a stderr
   warning. Spec validity is recoverable; whole-document rejection is not.

3. Extensions reject non-`x-*` keys
   Real specs sprinkle non-`x-` fields in places they don't belong
   (`produces`/`in`/`type`/`density`/`title`/`description` were observed).
   Fix: Extensions now accepts any leftover key but exposes
   `non_extension_keys()` so silent drops remain visible — the CLI can
   warn instead of erroring.

4. exclusiveMinimum: bool vs number
   3.0/Swagger used `bool`; 3.1 (JSON Schema 2020-12) uses `number`.
   Fix: model as a `bool | f64` enum.

5. `Vec<serde_json::Value>` Ident panic
   generate_array_item_type split on "::" but produced strings with
   angle brackets that aren't valid idents. Fix: parse via
   `syn::parse_str::<syn::Type>` first.

6. enum variant collisions on signed numbers
   `1` and `-1` both produced `Variant1`. Fix: prefix negatives with
   `Neg` (e.g. `VariantNeg1`).

7. Twilio-style filter param ident collisions
   `StartTime`, `StartTime<`, `StartTime>` all snake-cased to
   `start_time`. Fix: map `<`, `>`, `<=`, `>=` to `_lt`/`_gt`/`_lte`/
   `_gte` in sanitize_param_name. Twilio went from CHECK-FAIL to PASS.

8. Version gate didn't run in TOML config flow
   The `generate` subcommand in src/bin/openapi-to-rust.rs has its own
   pipeline that bypasses cli::run_generation_cli. Mirrored the version
   check so Swagger 2.0 specs (gitea) error early with a clear hint
   instead of failing later inside the deserializer.

scripts/spec-compile.sh
- Auto-discovers specs/*.{yaml,json}.
- Skips Swagger 2.0 with a SKIP marker (gitea).
- Optional SPEC_COMPILE_PARSE_ONLY=1 for quick generator-only checks.
- Optional SPEC_COMPILE_LIMIT=N / positional whitelist of names.

ci(spec-compile)
The job now compiles a "gold list" of 20 specs that pass cleanly:
anthropic, asana, browserbase, cartesia, cerebras, coda, coingecko,
digitalocean, groq, imagekit, launchdarkly, meta-llama, openai, resend,
runway, spotify, terminal-shop, twilio, val-town, writer. Local
`scripts/spec-compile.sh` (no args) still runs the full corpus. The
remaining 34 specs surface other generator bugs (E0308 type mismatches,
E0428 name collisions in github, E0117 orphan rule violations in
stripe, E0072 recursive type sizing in snyk) — tracked in #14 as
follow-ups.

All 205 unit tests still pass; clippy + fmt clean.

Refs #14
Running scripts/spec-compile.sh (no args) against all 54 OpenAPI 3.x
specs in specs/ — gitea is Swagger 2.0, skipped — surfaced eight more
classes of generator bugs after the initial 20-spec gold list. This PR
fixes them and broadens the CI gold list to 43 specs.

Bugs fixed (in order of impact):

1. **Type-name collisions across emitted types.** Two analyzed schemas
   that PascalCase to the same Rust ident (e.g. box's component
   `ClassificationTemplate` struct + an inline single-value enum
   synthesized from `Classification.$template`) yielded two definitions
   in types.rs with the same name → E0119 (conflicting impls) +
   E0428 (defined multiple times). Fix: dedup at emission time in
   generator::generate_types — the first occurrence wins, later ones
   are silently dropped.

2. **Struct field name collisions.** Properties whose names sanitize
   to the same Rust ident (`connectionString` and `connection_string`
   in supabase) emitted duplicate fields. Fix: per-struct uniqueness
   tracking with `_2`/`_3` suffixes.

3. **Enum variant case-collision.** `["ASC","DESC","asc","desc"]`
   collapsed to two `Asc`/`Desc` variants. Same in client.rs sort
   enums (`["created_at","-created_at"]`). Fix: dedup in
   generate_string_enum and generate_single_param_enum.

4. **Self-referential union variant → infinite-size enum.**
   microsoft-graph had oneOf wrappers like
   `pub enum X { X(X), Variant2(...) }`. Box the self-ref to break
   the cycle.

5. **Nullable-anyOf wrapper collisions with the inner $ref.**
   `Step.status: anyOf [$ref StepStatus, null]` synthesized a wrapper
   named `StepStatus` that overwrote the actual top-level schema.
   Fix: detect `is_nullable_pattern` in property analysis and unwrap
   to the inner type. Also, when a wrapper IS needed, suffix
   collisions with `Union2`/`Union3`.

6. **`$ref` shape variants.** Real-world specs use:
   - `#/definitions/X` (Swagger 2.0 carry-over in google-tasks).
     Recognise as alias for `#/components/schemas/X`.
   - `#/components/parameters/X/schema` (pagerduty). Last segment
     "schema" isn't a type name. Tighten extract_schema_name to
     filter unsupported shapes; fall back to serde_json::Value
     instead of failing whole-document analysis.

7. **Per-method parameter ident collisions.** Two parameters in the
   same operation that snake-case to the same name (vercel's
   `exclude_ids` + `exclude-ids`, modern-treasury duplicate `name`)
   produced E0382 / E0415. Fix: analyzer assigns a unique
   `rust_ident` to each ParameterInfo at operation scope; client
   generator consults it everywhere.

8. **Empty/non-string enum values.** gitpod has
   `type: string, enum: [2000, 5000, 10000, ...]` (numeric values on
   a string-typed schema). string_enum_values used to filter to .as_str
   only, producing an empty Vec → empty enum (E0665, E0004). Fix:
   coerce non-string scalars via Display.

CI: spec-compile job now exercises 43 specs (up from 20). Local
`scripts/spec-compile.sh` (no args) still runs the full corpus for
exploring the remaining 11 failures (cal-com, cloudflare, discord,
gcore, knocklabs, langsmith, lithic, microsoft-graph, stripe, telnyx,
vercel — tracked under #14).

All 205 unit tests still pass; clippy + fmt clean.

Refs #14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant