Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
2d800e5
Add AGENTS.md
schani Jun 26, 2026
54c796a
Add comment injection schema repros
schani Jun 26, 2026
02c6cae
Add tree-sitter comment injection coverage
schani Jun 26, 2026
add8601
Normalize description line endings
schani Jun 26, 2026
4c56a4f
Escape C-style description comments
schani Jun 26, 2026
28d1ca5
Escape triple-quoted description comments
schani Jun 26, 2026
b85afc5
Escape curly-dash description comments
schani Jun 26, 2026
cfb6a58
Document comment escaping strategy
schani Jun 26, 2026
19f35c7
Centralize comment escaping in emitters
schani Jun 26, 2026
e6a47b5
Run comment injection fixtures in CI
schani Jun 26, 2026
15e38c4
Remove PHP comment injection fixture from CI
schani Jun 26, 2026
b4384a2
Address comment escaping review feedback
schani Jun 26, 2026
7f7376b
Add nested comment injection fixture
schani Jun 26, 2026
a7ffcd1
Add Kotlin comment injection parser coverage
schani Jun 28, 2026
fa0428a
Add Elixir comment injection parser coverage
schani Jun 28, 2026
aa684d3
Add Pike comment injection parser coverage
schani Jun 28, 2026
d9247a4
Add Dart comment injection parser coverage
schani Jun 28, 2026
474de7a
Add Swift comment injection parser coverage
schani Jun 28, 2026
eec7ba7
Document vendored parser WASMs
schani Jun 28, 2026
7e809f9
Bump version to 23.3.0
schani Jul 2, 2026
13b12f3
Harden comment-injection escaping for residual holes
schani Jul 3, 2026
39b0d0c
Extend comment-injection test coverage
schani Jul 3, 2026
edd8c34
Run Objective-C comment-injection in CI; update docs
schani Jul 3, 2026
1e3a736
Don't run Objective-C comment-injection in CI
schani Jul 3, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions .github/workflows/test-pr.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ jobs:
- php
- scala3,schema-scala3
- elixir,schema-elixir,graphql-elixir
- comment-injection-treesitter,comment-injection-typescript,comment-injection-typescript-zod,comment-injection-typescript-effect-schema

# Partially working
# - schema-typescript # TODO Unify with typescript once fixed
Expand Down Expand Up @@ -68,6 +69,9 @@ jobs:

# - fixture: objective-c # segfault on compiled test cmd
# runs-on: macos-latest
# comment-injection-objective-c also hits that runtime
# segfault (the generated .m compiles but ./test crashes
# with SIGSEGV), so it stays out of CI too.

name: ${{ matrix.fixture }}
steps:
Expand Down
64 changes: 64 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
# AGENTS.md

Notes for coding agents working in this repository.

## Environment

- This is a TypeScript/npm monorepo using npm workspaces.
- Prefer the Node version from `.nvmrc` (`nvm use`; currently Node 22.14.0). `package.json` requires Node >= 18.12.0.
- Install dependencies with `npm ci`.

## Build and run

- Build everything with:

```bash
npm run build
```

This runs `npm run clean`, builds all workspaces that have a `build` script, then runs the root `tsc`.

- After building, the CLI entry point is `dist/index.js`, for example:

```bash
node dist/index.js --version
node dist/index.js --help
```

- For live rebuild/re-run while developing renderer output, use `npm start -- "<quicktype args>"`.

## Tests

- The test runner is `script/test`, exposed as:

```bash
npm test
```

- The full suite runs all fixtures and needs external language toolchains for many targets (`dotnet`, Java/Maven, Go, Rust, Python/mypy, PHP, Ruby, Kotlin, Scala, Elixir, etc.). On a machine without those tools, plain `npm test` will fail when it reaches the first missing toolchain.

- For local focused testing, use fixture filters. Fixture names are in `test/languages.ts` and `test/fixtures.ts`; comma-separated fixture groups are supported:

```bash
QUICKTEST=true FIXTURE=javascript npm test
QUICKTEST=true FIXTURE=typescript npm test -- test/inputs/json/samples/pokedex.json
CPUs=2 QUICKTEST=true FIXTURE=javascript npm test
```

`QUICKTEST=true` skips the large miscellaneous JSON input set. Extra arguments after `--` are sample files or directories to run.

- GitHub Actions uses the same pattern, e.g. `QUICKTEST=true FIXTURE=${{ matrix.fixture }} npm test`, after installing toolchain dependencies for each fixture group in `.github/workflows/test-pr.yaml`.

## Validation performed

The following commands were run successfully in this workspace:

```bash
npm ci
npm run build
node dist/index.js --version
CPUs=2 QUICKTEST=true FIXTURE=javascript npm test
QUICKTEST=true FIXTURE=typescript npm test -- test/inputs/json/samples/pokedex.json
```

Also observed: `npm test` without fixture filters started the full 70-fixture suite and failed on this machine because `dotnet` is not installed. `npm run lint` currently fails because ESLint cannot find a configuration file.
79 changes: 79 additions & 0 deletions COMMENT-INJECTION.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
# JSON Schema comment injection

## TypeScript reproduction

Added a focused schema fixture:

- `test/inputs/schema/comment-injection.schema`
- `test/inputs/schema/comment-injection.1.json`

The schema puts comment-closing text in both an object `description` and a property `description`:

- `*/` and `/*` for C-style block comments (the opener matters for Kotlin and Scala 3, whose block comments nest)
- `{-` and `-}` for Elm/Haskell comments
- `"""` on its own line for Python/Elixir docstrings/heredocs
- `</summary> & <br>` for C# XML doc comments
- `\r}`, `\u0085}`, `\u2028}`, and `\u2029}` for line-comment outputs whose lexers treat CR, NEL, LINE SEPARATOR, or PARAGRAPH SEPARATOR as line terminators
- property descriptions ending in `\`, `"`, and `"""` for docstring/heredoc delimiter boundaries and C-family line splicing

Run:

```bash
CPUs=1 QUICKTEST=true FIXTURE=schema-typescript npm test -- test/inputs/schema/comment-injection.schema
```

Expected result: the generated TypeScript validates `comment-injection.1.json` and prints equivalent JSON.

Before escaping was added, this test failed before validation because `TopLevel.ts` was syntactically invalid; the schema `description` escaped the generated `/** ... */` comment.

## Schema fields that can reach comments

In JSON Schema input, `packages/quicktype-core/src/attributes/Description.ts` collects `description` into type and property-description attributes. Renderers then emit those attributes as documentation comments.

Observed comment sinks:

- schema/type `description` on objects/classes
- property `description` for object properties/fields
- `description` on enum schemas
- `description` on union schemas and other named types when a renderer emits docs for those named types

`title` is different in the inspected path: JSON Schema input uses it for type/top-level naming, not as raw documentation text. The generated JSON Schema renderer can output JSON `title` fields, but those are JSON string values, not source comments.

Other non-schema inputs can also supply descriptions or leading comments, but the reproduction here is limited to JSON Schema `description`.

## Potentially affected outputs and triggers

Outputs are affected when raw schema descriptions are placed into comments/docstrings without escaping that target's comment delimiter or line terminators. The shared fix now normalizes description line endings and escapes delimiter text at comment-emission time.

- C-style doc comments `/** ... */`: TypeScript, Flow, JavaScript when descriptions are emitted, Java, C (`cjson`), C++, PHP, Kotlin, Scala 3, Smithy4s. Trigger with `*/`; escaped as `* /`. The opener `/*` is escaped as `/ *`, too, because Kotlin and Scala 3 nest block comments, so an unmatched opener would swallow the rest of the file.
- Elm/Haskell doc comments `{-| ... -}`: Elm, Haskell. Trigger with `{-` or `-}`; escaped as `{ -` and `- }`.
- Triple-quoted docstrings/heredocs: Python, Elixir. These are string literals, so three separate escapes apply: `"""` is escaped as `\"\"\"`, backslashes are doubled (otherwise a description ending in `\` swallows the first quote of the closing delimiter), and a line-ending unescaped `"` directly before an inline closing `"""` is escaped as `\"` (otherwise four quotes in a row leave a stray quote after the closing delimiter).
- XML doc comments `/// <summary>`: C#. Trigger with `</summary>`, `<`, or `&`; escaped as XML entities (`&lt;`, `&gt;`, `&amp;`) in both the regular and `--density dense` paths.
- Line comments (`//`, `///`): C#, Go, Rust, Ruby, Swift, Objective-C, Dart, Pike, Crystal, and enum comments in TypeScript-Zod/TypeScript-Effect-Schema. Trigger with a carriage return (`\r`) — or U+0085/U+2028/U+2029, which JavaScript and C# lexers also treat as line terminators — when descriptions are split only on `\n`; fixed by normalizing all of them to `\n` before comment emission.
- C-family line splicing: C, C++, and Objective-C splice a backslash-newline into one line even inside comments, so a description ending in `\` pulls the next generated line into the comment (and trips `-Wcomment` under `-Werror`). Fixed by appending `.` after a comment line's trailing backslash in those renderers.

Plain JavaScript output did not emit the tested schema descriptions into model comments, so the object/property reproduction does not affect it the same way. TypeScript-Zod and TypeScript-Effect-Schema also did not emit the object/property descriptions from `comment-injection.schema`; they only emitted the enum description from `comment-injection-enum.schema`.

The tree-sitter fixture includes Go, Rust, and Ruby even though their grammars did not reproduce a syntax break with the CR line-comment payload; this keeps parser coverage in place for those generated outputs and future payload/escaping changes.

Still not covered by the tree-sitter fixture: Objective-C, Crystal, and Elm. Objective-C's available tree-sitter grammar reports baseline errors on generated `.m` output; Crystal and Elm have usable grammars but are not included in this parser-coverage pass.

## Test cases added

- `test/inputs/schema/comment-injection.schema` covers object and property descriptions, including both Elm/Haskell nested-comment delimiters, XML doc metacharacters, Unicode line terminators, and descriptions ending in `\`, `"`, and `"""`.
- `test/inputs/schema/comment-injection-enum.schema` covers enum descriptions via an enum-valued property.
- `test/inputs/schema/comment-injection-nested-comment.schema` specifically covers unmatched nested-comment openers (`{-` and `/*`) in object and property descriptions.
- `test/inputs/schema/comment-injection-enum-nested-comment.schema` is the enum-description variant of the nested-comment payload; the tree-sitter fixture substitutes it for the enum-only targets so each sample run does distinct work.

The existing `JSONSchemaFixture` instances pick these samples up for schema-based language tests. Additional narrow `comment-injection-*` fixtures cover affected outputs that did not already have full schema fixtures: Objective-C uses all four samples; TypeScript-Zod and TypeScript-Effect-Schema use the two enum-description samples. The `comment-injection-objective-c` fixture is registered but not run in CI: like the full `objective-c` fixture, the generated `.m` compiles cleanly with clang but the compiled `./test` binary segfaults at runtime (SIGSEGV) on the sample JSON — a pre-existing Objective-C harness issue unrelated to comment injection. It can be run locally on macOS.

A parser-only fixture, `comment-injection-treesitter`, generates all configured targets and parses them with tree-sitter WASM grammars. It currently covers TypeScript, TypeScript-Zod, TypeScript-Effect-Schema, Swift, C#, Java, Dart, C (`cjson`), C++, PHP, Kotlin, Go, Pike, Rust, Ruby, Python, Elixir, Scala 3, and Haskell. It is intentionally one fixture/test that loops over all configured languages and reports all parse failures together. The Swift, Dart, Kotlin, Pike, and Elixir grammars are vendored under `test/tree-sitter-wasms` to avoid npm peer/native dependency issues in CI. Loaded grammars are cached per WASM path across targets and samples.

Two grammar quirks require per-target workarounds in the fixture:

- tree-sitter-dart wrongly parses a block-comment opener inside a line comment (`// /*`) as an ERROR, whereas real Dart runs line comments to the end of the line; the Dart target neutralizes `/*` inside line comments before parsing.
- tree-sitter-c parses preprocessor conditionals structurally and cannot match the `extern "C" {` brace across the two `#ifdef __cplusplus` blocks that `cjson` output uses as a guard, reporting a spurious `MISSING #endif` even for benign output; the cjson target strips those two guard blocks (which only wrap the file) before parsing, so a real injection that leaves a MISSING node is still caught.

The fixture also scans the raw generated bytes for injection classes the grammars can't detect (their lexers are more lenient than the real compilers): no output may contain a surviving U+0085/U+2028/U+2029 line terminator, and per-target `forbiddenSubstrings` catch markup that parses as comment text but is dangerous downstream (C# rejects the raw `</summary> & <br>` payload, verifying the XML entity escaping).

These are regression tests and should pass with the shared comment escaping/sanitization in place.
Loading
Loading