
feat(generator): Q2 typed-scalar formats with opt-out config #20

Merged: lightsofapollo merged 7 commits into main from feat/q2-typed-scalars on May 9, 2026
@lightsofapollo

Summary

Generates better Rust types out of OpenAPI specs by honoring format
hints and vendor extensions instead of collapsing everything to
String / serde_json::Value. Default-on for common cases, opt-out
per feature via a new [generator.types] TOML block (or globally via
--types-conservative).

What changed

| Commit | What |
|--------|------|
| Q2.0 (ce94ff6) | TypeMapper chokepoint in src/type_mapping.rs: the single point for every (openapi_type, format) → Rust-type decision. No behavior change on its own. |
| Q2 (fa17c50) | String-format scalars: date-time → chrono::DateTime<Utc>; uuid → uuid::Uuid; byte → Vec<u8> + inlined base64 codec; binary → bytes::Bytes; ipv4/ipv6 → std::net::Ip*Addr; uri → url::Url. Codec hints are threaded through SchemaType::Primitive.serde_with to the field-emission site. Adds the --types-conservative CLI flag for bisecting. |
| Q2.8 (ddfcf3f) | REQUIRED_DEPS.toml written next to generated code, listing exactly which optional crates the generated module references. The same summary is printed to stderr at the end of generation so the artifact is discoverable. |
| Q2.7 (ed1dda5) | anyOf of primitives now produces the same clean #[serde(untagged)] pub enum X { String(String), Integer(i64) } shape that oneOf already had; no more synthetic per-variant type aliases. |
| Q2.1+Q2.2+Q2.3 (05d555b) | uint32/uint64 → u32/u64; built-in format aliases (uuid4 → uuid, unix-time → int64); additionalProperties: <schema> → BTreeMap<String, T> (typed map instead of a serde_json::Value map). |
| Q2.4+Q2.6 (19493d0) | OpenAPI constraint annotations (minimum/pattern/etc.) emitted as /// Constraint: … doc comments. x-enum-varnames / x-enum-descriptions vendor extensions honored for nicer enum variant names + per-variant docs. |

Defaults

Everything is on. Opt out per format via the relevant strategy in
[generator.types], or pass --types-conservative on the CLI to
collapse the entire surface back to pre-Q2 behavior. Email is the
one format that stays off by default (email_address crate is more
opinionated than the wire usually warrants).

No client-side validation. OpenAPI constraints surface as doc
comments only — no validator crate, no #[validate(...)]. The
server is the source of truth; client SDKs stay thin.

Generator config schema

[generator.types]
date_time = "chrono"   # | "time" | "string"
uuid      = "uuid"     # | "string"
byte      = "base64"   # | "vec_u8" | "string"
binary    = "bytes"    # | "vec_u8" | "string"
ipv4      = "std"      # | "string"
uri       = "url"      # | "string"
unsigned  = true       # uint32/uint64 → u32/u64

[generator.types.shape]
additional_properties_typed = true
primitive_unions            = true

[generator.types.constraints]
mode = "doc"   # | "off"

[generator.types.enums]
x_enum_varnames     = true
x_enum_descriptions = true

[generator.types.format_aliases]
"my-vendor-uuid" = "uuid"

Test plan

  • 21 lib unit tests in src/type_mapping.rs (TypeMapper config, dep requirements, format aliases, conservative mode).
  • 14 integration tests in tests/typed_scalars_test.rs (date-time, uuid, byte+codec, REQUIRED_DEPS write/skip).
  • 6 integration tests in tests/primitive_unions_test.rs (oneOf/anyOf parity).
  • 8 integration tests in tests/integer_formats_test.rs (uint32/64, alias normalization).
  • 7 integration tests in tests/additional_properties_typed_test.rs (typed BTreeMap, opt-out, false/true/schema).
  • 8 integration tests in tests/constraint_doc_test.rs (doc emission, pattern escaping, no-validate guarantee).
  • 6 integration tests in tests/x_enum_varnames_test.rs (varname override, descriptions, length-mismatch fallback).
  • Full integration suite passes with no regressions outside the intentional Q2 changes; ~10 snapshots updated to reflect those shape changes.
  • scripts/spec-compile.sh: 54/54 specs compile cleanly under the new defaults (1 skipped: gitea, baseline).
  • Anthropic + OpenAI smoke-tested manually with REQUIRED_DEPS.toml confirmed end-to-end.

Deferred follow-ups

  • format: duration → chrono::Duration needs a custom ISO 8601 codec; currently stays as String.
  • uniqueItems: true → BTreeSet<T> opt-in is its own bead (Q2.5, P3); kept off by default because of API-shape churn.

🤖 Generated with Claude Code

lightsofapollo and others added 7 commits May 9, 2026 00:42
… (Q2.0)

Introduce src/type_mapping.rs as the single chokepoint for every
(openapi_type, format) → Rust-type decision. Pre-refactor the same
logic lived in two places (openapi_type_to_rust_type and
get_number_rust_type) plus inline "String".to_string() literals in
the Typed/TypedMulti arm. Adding format-aware mappings (chrono, uuid,
url, …) without a chokepoint would mean touching every site for every
format; with TypeMapper each future Q2.* issue edits one method.
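
The chokepoint idea can be sketched with the standard library alone. This is a minimal illustration, not the code in src/type_mapping.rs: the config fields, the `map` and `used_crates` methods, and the exact mapping table are assumptions, and the typed arms stand in for what the later Q2.* commits add.

```rust
use std::collections::BTreeSet;

/// Hypothetical, trimmed-down config: one switch per feature.
#[derive(Clone, Copy)]
pub struct TypeMappingConfig {
    pub typed_date_time: bool,
    pub typed_uuid: bool,
    pub unsigned: bool,
}

impl TypeMappingConfig {
    /// Everything typed, as the PR describes for the defaults.
    pub fn typed() -> Self {
        Self { typed_date_time: true, typed_uuid: true, unsigned: true }
    }

    /// Rough analogue of `--types-conservative`: every format back to its fallback.
    pub fn conservative() -> Self {
        Self { typed_date_time: false, typed_uuid: false, unsigned: false }
    }
}

/// Single place where (openapi_type, format) becomes a Rust type string.
pub struct TypeMapper {
    config: TypeMappingConfig,
    used_crates: BTreeSet<&'static str>,
}

impl TypeMapper {
    pub fn new(config: TypeMappingConfig) -> Self {
        Self { config, used_crates: BTreeSet::new() }
    }

    /// Illustrative mapping table; records which optional crates were needed.
    pub fn map(&mut self, openapi_type: &str, format: Option<&str>) -> String {
        let rust_type = match (openapi_type, format) {
            ("string", Some("date-time")) if self.config.typed_date_time => {
                self.used_crates.insert("chrono");
                "chrono::DateTime<chrono::Utc>"
            }
            ("string", Some("uuid")) if self.config.typed_uuid => {
                self.used_crates.insert("uuid");
                "uuid::Uuid"
            }
            ("string", _) => "String",
            ("integer", Some("uint32")) if self.config.unsigned => "u32",
            ("integer", Some("uint64")) if self.config.unsigned => "u64",
            ("integer", Some("int32")) => "i32",
            ("integer", _) => "i64",
            ("boolean", _) => "bool",
            ("number", Some("float")) => "f32",
            ("number", _) => "f64",
            _ => "serde_json::Value",
        };
        rust_type.to_string()
    }

    /// Sorted list of optional crates referenced so far.
    pub fn used_crates(&self) -> Vec<&'static str> {
        self.used_crates.iter().copied().collect()
    }
}
```

With this shape, adding a new format means adding one match arm, which is the property the commit message argues for.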

Wiring:
- TypeMapper holds TypeMappingConfig + UsedFeatures; defaults preserve
  pre-refactor behavior bit-for-bit.
- GeneratorConfig.types carries the config; ConfigFile parses
  [generator.types] from TOML and threads it through
  into_generator_config().
- SchemaAnalyzer gains a type_mapper field; new() defaults it,
  with_type_mapper() takes a caller-built mapper. The TOML-config
  path in src/bin/openapi-to-rust.rs uses with_type_mapper so user
  config drives type generation.
- openapi_type_to_rust_type, get_number_rust_type, and the
  Typed/TypedMulti arm at analysis.rs:1151 now delegate to TypeMapper.

Verification:
- 18 lib unit tests pass (incl. 5 new TypeMapper tests).
- Full integration suite: zero snapshot diffs.
- scripts/spec-compile.sh: 54 passed, 0 failed, 1 skipped (gitea, baseline).

Closes openapi-generator-r36 (Q2.0). Unblocks Q2 (quq) and Q2.1–Q2.8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`format: date-time` / `uuid` / `uri` / `binary` / `byte` / `ipv4` / `ipv6`
on a `string` property now produces typed Rust scalars by default
instead of bare `String`. Opt out per format in `[generator.types]`
or globally via `--types-conservative`.

## Defaults

| format      | rust type                                 | crate        |
|-------------|-------------------------------------------|--------------|
| date-time   | chrono::DateTime<Utc>                     | chrono+serde |
| date        | chrono::NaiveDate                         | chrono+serde |
| time        | chrono::NaiveTime                         | chrono+serde |
| uuid        | uuid::Uuid                                | uuid+serde   |
| uri / url   | url::Url                                  | url+serde    |
| ipv4 / ipv6 | std::net::Ipv*Addr                        | std (no dep) |
| binary      | bytes::Bytes                              | bytes+serde  |
| byte        | Vec<u8> + #[serde(with = "base64_serde")] | base64       |
`email` and `duration` stay as String for now (less universal /
needs ISO 8601 codec; both follow-ups).

## Wiring

- `TypeMappingConfig` switched from `Option<String>` placeholders to
  proper `DateStrategy`/`UuidStrategy`/`ByteStrategy`/etc enums; each
  defaults to its typed strategy.
- `TypeMapper.string_format()` dispatches on the normalized format and
  records used crates in `UsedFeatures` (consumed by Q2.8 later).
- `SchemaType::Primitive` gained a `serde_with: Option<String>` field
  carrying the codec hint; threaded from `MappedType` through analysis
  to the generator's field-attr emission.
- `analyze_property_schema_with_context`'s String non-enum arm now
  routes through `TypeMapper` (Q2.0 only got the top-level Typed arm).
- `SchemaAnalysis.used_type_features` snapshots the mapper's used
  crates after analysis; the generator emits `mod base64_serde` only
  when `format: byte` was actually referenced.
- `base64_serde` includes an `option` submodule so nullable
  `Option<Vec<u8>>` fields use `with = "base64_serde::option"` —
  serde dispatches on field type and the base codec only handles
  `Vec<u8>`.
- `type_lacks_default()` extended for chrono / url / time / iso8601 /
  email_address types so `#[serde(default)]` is suppressed where the
  scalar has no `Default` impl.
- `type_name_to_variant_name` + `generate_union_enum` handle qualified
  / generic Rust paths in primitive oneOf variants
  (`bytes::Bytes`, `chrono::DateTime<chrono::Utc>`, …) — without
  these, oneOf variants like `Vec<u8>+VideoReferenceInputParam`
  produced `BytesBytes(BytesBytes)` and refused to compile.
- `generate_type_alias` and `generate_field_type` now use a single
  `parse_rust_type()` helper backed by `syn::parse_str` instead of
  ad-hoc `::`-splitting that choked on generics.
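
To illustrate the variant-naming problem with qualified and generic paths, here is a std-only stand-in. It is not the `syn`-based `parse_rust_type()` helper described above; the function name `variant_name_for_type` and the rule "last path segment, generics dropped" are assumptions made for the sketch.

```rust
/// Derive an enum variant identifier from a Rust type string.
/// Hypothetical rule: drop generic arguments, keep the last path segment,
/// and uppercase the first character.
pub fn variant_name_for_type(rust_type: &str) -> String {
    // `chrono::DateTime<chrono::Utc>` -> `chrono::DateTime`
    let without_generics = rust_type.split('<').next().unwrap_or(rust_type);
    // `chrono::DateTime` -> `DateTime`
    let last_segment = without_generics
        .rsplit("::")
        .next()
        .unwrap_or(without_generics)
        .trim();
    // `i64` -> `I64`, so primitives also yield valid variant names.
    let mut chars = last_segment.chars();
    match chars.next() {
        Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}
```

Under this rule `bytes::Bytes` yields the variant `Bytes` rather than `BytesBytes`, and `chrono::DateTime<chrono::Utc>` yields `DateTime`.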

## CLI

- `openapi-to-rust generate --types-conservative` — overrides
  `[generator.types]` to set every format back to "string", useful
  for bisecting regressions caused by typed-scalar adoption.

## Verification

- 33 lib + integration unit tests (10 new typed-scalar end-to-end +
  7 new TypeMapper tests).
- spec-compile gate: 54 passed, 0 failed, 1 skipped (gitea, baseline).
  `--types-conservative` not directly gated yet — the conservative
  mapper is exercised by the dedicated unit/integration tests.
- Bumps test-rig Cargo.toml templates (spec-compile, test_helpers,
  fixture_tests, multi_response_client_test) with the new optional
  deps so the gates exercise the typed-scalar path.

Closes openapi-generator-quq (Q2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Q2 turned typed-scalar formats on by default, which means generated
code now references chrono/uuid/url/bytes/base64 even though the
generator doesn't own the consuming crate's Cargo.toml. Without an
advisory, users hit "use of unresolved module `chrono`" on first
build with no clear pointer to the fix.

This change surfaces required deps via three mechanisms:

1. `GenerationResult.required_deps: Vec<DepRequirement>` — programmatic
   access for library consumers.
2. `<output_dir>/REQUIRED_DEPS.toml` — copy-pasteable file with a
   `[dependencies]` block, written by `write_files()` only when
   the generated code references at least one optional crate.
3. CLI `openapi-to-rust generate` prints the same summary to stderr
   and ends with the file path so the artifact is discoverable.

## Wiring

- `TypeFeature::dep_requirement()` — canonical (crate, version,
  features) per feature; single source of truth so the spec-compile
  gate, test harnesses, and end-user advisory can't drift.
- `DepRequirement::to_toml_line()` — picks the most compact valid
  `[dependencies]` form (string version when no features, inline
  table when features are needed).
- `collect_dep_requirements()` snapshots `UsedFeatures` as a
  sorted, de-duplicated list — output is deterministic for diffs.
- `render_required_deps_toml()` returns `None` when input is empty
  so callers can skip writing the file (no clutter for pure-string
  specs).
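
The helpers above can be sketched roughly as follows. Function names follow the commit message, but the struct fields, the slice-based signatures, and the exact TOML layout are assumptions; the real code snapshots `UsedFeatures` rather than taking a slice.

```rust
/// One optional crate the generated code needs (field names are hypothetical).
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct DepRequirement {
    pub krate: &'static str,
    pub version: &'static str,
    pub features: Vec<&'static str>,
}

impl DepRequirement {
    /// Most compact valid `[dependencies]` form: a plain version string
    /// without features, an inline table with them.
    pub fn to_toml_line(&self) -> String {
        if self.features.is_empty() {
            format!("{} = \"{}\"", self.krate, self.version)
        } else {
            let feats: Vec<String> =
                self.features.iter().map(|f| format!("\"{}\"", f)).collect();
            format!(
                "{} = {{ version = \"{}\", features = [{}] }}",
                self.krate,
                self.version,
                feats.join(", ")
            )
        }
    }
}

/// Sorted, de-duplicated list so the output is deterministic for diffs.
pub fn collect_dep_requirements(used: &[DepRequirement]) -> Vec<DepRequirement> {
    let mut deps = used.to_vec();
    deps.sort();
    deps.dedup();
    deps
}

/// `None` for an empty list, so the caller can skip writing the file.
pub fn render_required_deps_toml(deps: &[DepRequirement]) -> Option<String> {
    if deps.is_empty() {
        return None;
    }
    let mut out = String::from("[dependencies]\n");
    for dep in deps {
        out.push_str(&dep.to_toml_line());
        out.push('\n');
    }
    Some(out)
}
```

Deriving `Ord` on the struct sorts by crate name first, which is what keeps the rendered file stable between runs.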

## Verification

- 5 new unit tests (dep_requirement rendering, sorted/deduped
  collection, empty-vs-populated render).
- 4 new end-to-end tests (required_deps populated from real
  analysis, REQUIRED_DEPS.toml written/skipped correctly).
- Smoke test against anthropic spec: stderr advisory + on-disk file
  both produced as expected (chrono + base64).
- Full integration suite passes (28 lib + 14 typed-scalar tests).
- spec-compile gate: 54 passed, 1 skipped (gitea, baseline).

Closes openapi-generator-fbn (Q2.8).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-Q2.7 the `oneOf` and `anyOf` paths diverged on primitive
variants. Same input, different output:

  oneOf: [string, integer]   →   pub enum X { String(String), Integer(i64) }
  anyOf: [string, integer]   →   pub type XString = String;
                                 pub type XIntegerVariant1 = i64;
                                 pub enum X { XString(XString), XIntegerVariant1(XIntegerVariant1) }

Both are #[serde(untagged)] so they round-trip the same JSON, but
the anyOf shape leaked synthetic type aliases into the generated
module and gave callers worse-named variants. The original Q2.7
bead description claimed primitive unions fell back to
`serde_json::Value`; that was stale — primitives have always
become Union variants. The real gap was alias bloat on the anyOf
path.

This change makes `analyze_anyof_union`'s primitive branch mirror
`analyze_untagged_oneof_union`: route the variant schema through
TypeMapper, push the resulting Rust type directly as the variant
target. The generator's `generate_union_enum` already knew how
to render bare primitive types as variants (the `bool|i32|String`
match at line 1319) so no generator-side change was needed.
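
A minimal sketch of the unified shape, assuming a hypothetical `render_primitive_union` helper that emits source text directly. The real generator works on analyzed schema types and goes through TypeMapper; the `Number` and `Boolean` variant names are assumptions, while `String(String)` and `Integer(i64)` come from the example above.

```rust
/// Map a primitive OpenAPI type to (variant name, Rust type).
/// Returns `None` for anything that is not a primitive.
fn primitive_variant(openapi_type: &str) -> Option<(&'static str, &'static str)> {
    match openapi_type {
        "string" => Some(("String", "String")),
        "integer" => Some(("Integer", "i64")),
        "number" => Some(("Number", "f64")),
        "boolean" => Some(("Boolean", "bool")),
        _ => None,
    }
}

/// Render the same untagged enum for oneOf and anyOf primitive unions:
/// the Rust type is the variant payload, with no intermediate type aliases.
pub fn render_primitive_union(name: &str, variant_types: &[&str]) -> Option<String> {
    let mut out = format!("#[serde(untagged)]\npub enum {} {{\n", name);
    for ty in variant_types {
        let (variant, rust_type) = primitive_variant(ty)?;
        out.push_str(&format!("    {}({}),\n", variant, rust_type));
    }
    out.push_str("}\n");
    Some(out)
}
```

Calling it with `["string", "integer"]` produces the `String(String)` / `Integer(i64)` enum regardless of which keyword the spec used.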

Toggle:
  [generator.types.shape]
  primitive_unions = false   # restore pre-Q2.7 alias shape

Default `true`. The opt-out exists for users with snapshot
checks that depend on the aliased variant names.

Verification:
- 6 new tests in tests/primitive_unions_test.rs covering oneOf,
  anyOf (default + opt-out), 3-variant unions, and explicit-null
  filtering.
- 7 existing snapshots updated to reflect the cleaner shape:
  content_union_structured, discriminator_array_standalone,
  inline_variant_naming, multi_array_variants, nested_union_array,
  property_underscore_types, union_array_naming.
- Full integration suite passes.
- spec-compile gate: 54 passed, 1 skipped (gitea, baseline).

Closes openapi-generator-j6n (Q2.7).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… Q2.2, Q2.3)

Three small Q2 follow-ups, all default-on, opt-out per-feature.

## Q2.1 — uint32/uint64 → u32/u64

`format: uint32` / `uint64` now map to `u32` / `u64` instead of
degrading to `i64`. ~288 usages across the spec corpus. `[generator.types]
unsigned = false` reverts to pre-Q2.1 i64 fallback.
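
The cost of the old fallback is easy to show: the upper half of the `uint64` range has no `i64` representation. The helper name below is hypothetical.

```rust
use std::convert::TryFrom;

/// True when a wire value declared as `uint64` would also fit an `i64` field.
pub fn fits_in_i64(value: u64) -> bool {
    i64::try_from(value).is_ok()
}
```

Any value above `i64::MAX` fails this check, so a field typed as `i64` could not hold it, while `u64` covers the whole declared range.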

## Q2.2 — built-in format aliases

Vendor-specific format names normalize to canonical ones before
the standard format dispatch. Built-in aliases:
  uuid4, uuid_v4, UUID  → uuid
  unix-time, unix_time, unixtime, timestamp → int64

User-supplied [generator.types.format_aliases] entries win on
collision so users can override built-ins (e.g. force `uuid4`
back to plain string).
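
The lookup order can be sketched as below. The built-in alias list is taken from this commit message; the function name `normalize_format` and the `HashMap`-based signature are assumptions.

```rust
use std::collections::HashMap;

/// Resolve a format name to its canonical form.
/// User-supplied aliases are consulted first, so they win on collision.
pub fn normalize_format<'a>(
    format: &'a str,
    user_aliases: &'a HashMap<String, String>,
) -> &'a str {
    if let Some(target) = user_aliases.get(format) {
        return target.as_str();
    }
    match format {
        // Built-in aliases listed in the commit message.
        "uuid4" | "uuid_v4" | "UUID" => "uuid",
        "unix-time" | "unix_time" | "unixtime" | "timestamp" => "int64",
        // Anything else is already canonical (or unknown) and passes through.
        other => other,
    }
}
```

A user entry such as `"my-vendor-uuid" = "uuid"` adds a new alias, and an entry keyed on `uuid4` shadows the built-in one because the user map is checked first.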

## Q2.3 — typed BTreeMap from additionalProperties: <schema>

Pre-Q2.3 `additionalProperties: <schema>` collapsed to
`BTreeMap<String, serde_json::Value>`, dropping the value-type
information. Now the schema is analyzed and the emitted field is
`BTreeMap<String, T>` where T is the resolved type (including
typed scalars from Q2 — e.g. `additionalProperties: { format:
uuid }` produces `BTreeMap<String, uuid::Uuid>`). Implementation:
  - `SchemaType::Object.additional_properties: bool` →
    `ObjectAdditionalProperties` enum (Forbidden / Untyped / Typed).
  - Generator emits the BTreeMap field with the right value type.
  - `[generator.types.shape] additional_properties_typed = false`
    reverts to the pre-Q2.3 untyped behavior.
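
A rough sketch of the three-state enum and how the toggle could feed into the emitted field type. The variant names come from the list above; the `String` payload on `Typed`, the helper name, and the `typed_enabled` parameter are assumptions.

```rust
/// Replaces the old `additional_properties: bool`.
#[derive(Debug, Clone, PartialEq)]
pub enum ObjectAdditionalProperties {
    /// `additionalProperties: false`
    Forbidden,
    /// `additionalProperties: true` (any JSON value)
    Untyped,
    /// `additionalProperties: <schema>`, carrying the resolved Rust value type
    Typed(String),
}

/// Field type for the catch-all map, or `None` when extra keys are forbidden.
/// `typed_enabled` mirrors `additional_properties_typed` in the config.
pub fn additional_properties_field_type(
    ap: &ObjectAdditionalProperties,
    typed_enabled: bool,
) -> Option<String> {
    let untyped = "BTreeMap<String, serde_json::Value>".to_string();
    match ap {
        ObjectAdditionalProperties::Forbidden => None,
        ObjectAdditionalProperties::Untyped => Some(untyped),
        ObjectAdditionalProperties::Typed(value_type) if typed_enabled => {
            Some(format!("BTreeMap<String, {}>", value_type))
        }
        // Opt-out: fall back to the pre-Q2.3 untyped map.
        ObjectAdditionalProperties::Typed(_) => Some(untyped),
    }
}
```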

## Verification

- 21 lib unit tests (added 6 for Q2.1/Q2.2 alias and unsigned coverage).
- 8 new integration tests in tests/integer_formats_test.rs.
- 7 new integration tests in tests/additional_properties_typed_test.rs.
- 1 snapshot update (nested_inline_objects_test) reflecting the
  typed BTreeMap shape from Q2.3.
- spec-compile gate: previously verified 54/54 pass under Q2.1+Q2.2;
  Q2.3 changes have no spec-corpus regressions in local checks.

Closes openapi-generator-bw1 (Q2.1), openapi-generator-gub (Q2.2),
openapi-generator-61h (Q2.3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two doc-comment-emitting features. Both default-on, both feed
non-binding human-readable hints to callers without adding any
runtime crate dependencies.

## Q2.4 — constraint annotations as doc comments

Pre-Q2.4 the generator parsed minimum/maximum/min_length/
max_length/pattern/multiple_of/min_items/max_items/uniqueItems
into SchemaDetails but never emitted them. Real specs use these
heavily (13k+ uniqueItems and 4k+ min/max occurrences across the
corpus); dropping them was a real loss for callers trying to
understand the contract.

Now each property with at least one constraint gets a
`/// Constraint: <key>=<value>, …` doc comment. Pattern strings
are escaped so `///` and `*/` substrings can't terminate the
surrounding doc comment.
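
One way to emit and escape such a line is sketched below. The zero-width-space approach follows the `\u{200B}` mention in the cleanup commit, but where exactly the character is inserted, the `escape_pattern` helper, and the slice-of-pairs signature are assumptions.

```rust
/// Break up `*/` and `///` with zero-width spaces so a pattern cannot
/// close or restart the surrounding comment.
pub fn escape_pattern(pattern: &str) -> String {
    pattern
        .replace("*/", "*\u{200B}/")
        .replace("///", "/\u{200B}/\u{200B}/")
}

/// Render `/// Constraint: key=value, ...`, or `None` when there is nothing to say.
pub fn format_constraints_doc(constraints: &[(&str, String)]) -> Option<String> {
    if constraints.is_empty() {
        return None;
    }
    let parts: Vec<String> = constraints
        .iter()
        .map(|(key, value)| {
            // Only pattern strings carry arbitrary user text that needs escaping.
            let value = if *key == "pattern" {
                escape_pattern(value)
            } else {
                value.clone()
            };
            format!("{}={}", key, value)
        })
        .collect();
    Some(format!("/// Constraint: {}", parts.join(", ")))
}
```

A property with `minimum: 1` and `maximum: 10` would render as `/// Constraint: minimum=1, maximum=10` under this sketch.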

Toggle: `[generator.types.constraints] mode = "doc"` (default) /
`"off"` (suppress entirely).

**No client-side validation** by design. The generator never
emits `#[validate(...)]` attributes or pulls in the `validator`
crate. OpenAPI constraints belong on the wire contract; the
server is the source of truth. Doc comments give callers
visibility without the SDK duplicating server logic and going
brittle when rules drift. The `no_validate_attribute_is_ever_emitted`
test pins this guarantee.

Implementation:
- `PropertyConstraints` struct in analysis.rs captures the
  relevant SchemaDetails fields per property.
- `PropertyInfo` carries the constraints alongside the schema type.
- Generator emits the doc line via `generate_constraint_doc()`
  + `format_constraints_doc()` helper.

## Q2.6 — x-enum-varnames / x-enum-descriptions

Common vendor extensions for enum schemas: arrays of Rust-friendly
variant identifiers and per-variant descriptions, parallel to the
spec's `enum` array. Used by arcade.yaml, datadog-v2.yaml, and
others in the corpus.

When `x-enum-varnames` is present and length-matches the enum
array, the generator uses those identifiers for variant names
instead of the default PascalCase heuristic. Wire format is
preserved via `#[serde(rename = "<original-value>")]`. When
`x-enum-descriptions` is present, each entry becomes the variant's
doc comment.

Length-mismatched extensions are dropped at analysis time with a
stderr warning; the generator falls back to the default
heuristic.
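
The override-with-fallback logic might look like the sketch below. The `enum_variants` and `pascal_case` helpers are hypothetical, the PascalCase heuristic is simplified, and the result is returned as (variant name, serde attribute) pairs purely for illustration.

```rust
/// Simplified default heuristic: split on non-alphanumerics, capitalize each word.
fn pascal_case(value: &str) -> String {
    value
        .split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|word| !word.is_empty())
        .map(|word| {
            let mut chars = word.chars();
            let first = chars.next().unwrap().to_ascii_uppercase();
            format!("{}{}", first, chars.as_str())
        })
        .collect()
}

/// Pair each enum value with its variant name and the rename attribute that
/// preserves the wire format. `varnames` is the `x-enum-varnames` array, if any.
pub fn enum_variants(values: &[&str], varnames: Option<&[&str]>) -> Vec<(String, String)> {
    let overrides = match varnames {
        Some(names) if names.len() == values.len() => Some(names),
        Some(names) => {
            // Length mismatch: warn and fall back to the heuristic.
            eprintln!(
                "warning: x-enum-varnames has {} entries but enum has {}; ignoring",
                names.len(),
                values.len()
            );
            None
        }
        None => None,
    };
    values
        .iter()
        .enumerate()
        .map(|(i, value)| {
            let variant = match overrides {
                Some(names) => names[i].to_string(),
                None => pascal_case(value),
            };
            (variant, format!("#[serde(rename = \"{}\")]", value))
        })
        .collect()
}
```

Because the rename attribute always carries the original value, the chosen variant name never affects what goes over the wire.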

Toggles: `[generator.types.enums]` `x_enum_varnames` /
`x_enum_descriptions` (both default true).

Implementation:
- `EnumExtensions` struct in analysis.rs holds the validated
  varnames + descriptions.
- `SchemaAnalysis.enum_extensions` side-channel keyed by analyzed-
  schema name (avoided extending every StringEnum constructor).
- `extract_enum_extensions()` populates after analyze() by reading
  `original` JSON.
- `generate_string_enum` + `generate_extensible_enum` accept an
  `Option<&EnumExtensions>` and apply overrides when toggles allow.

## Verification

- 8 new tests in tests/constraint_doc_test.rs (Q2.4).
- 6 new tests in tests/x_enum_varnames_test.rs (Q2.6).
- 1 snapshot updated (union_array_naming) where a real spec
  field with a `pattern` got its constraint doc surfaced.
- Full integration suite passes; spec-compile gate verification
  pending in next commit.

Closes openapi-generator-d8y (Q2.4) and openapi-generator-4mu (Q2.6).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- fmt: rustfmt across new/modified files (test helpers, generator
  bin, ConstraintMode wiring).
- clippy:
  - Convert `impl Default for <Strategy>` blocks to
    `#[derive(Default)]` + `#[default]` per-variant for all eight
    type-mapping strategy enums and ConstraintMode (clippy
    `derivable_impls`).
  - Replace literal U+200B chars in `format_constraints_doc`'s
    pattern escaping with the `\u{200B}` Rust escape (clippy
    `invisible_characters`).
- doc: wrap `Vec<u8>` in backticks in the SchemaType::Primitive
  docstring (rustdoc `invalid_html_tags` treated `<u8>` as an
  unclosed HTML tag).
- test: add `serde_with: ..` to a SchemaType::Primitive pattern
  match in `examples/number_formats.rs`, and add the new `types`
  field to the GeneratorConfig literal in `examples/complete_workflow.rs`.

No behavior change. All four gates pass locally.
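
For reference, the `derivable_impls` rewrite has this general shape. The enum names match those mentioned in this PR, but the variant sets are inferred from the config options (`"chrono" | "time" | "string"`, `"doc" | "off"`) and are assumptions.

```rust
/// Strategy for `format: date-time`; the default is picked by `#[default]`
/// instead of a hand-written `impl Default`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum DateStrategy {
    #[default]
    Chrono,
    Time,
    String,
}

/// Whether constraints are surfaced as doc comments or suppressed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ConstraintMode {
    #[default]
    Doc,
    Off,
}
```

`DateStrategy::default()` then returns `DateStrategy::Chrono`, the same value the manual impl would have produced.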

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@lightsofapollo lightsofapollo merged commit 33de627 into main May 9, 2026
5 checks passed