pxf: TableReader streaming + scan / BindRow#38
Merged
Conversation
Companion to protowire-go v0.74 (pxf.TableReader streaming) and v0.75
(TableReader.scan + pxf.BindRow). Reads rows from an InputStream one
at a time with working-set memory bounded by the size of the largest
single row.
API additions:
- TableReader(InputStream) — header parsed at construction; reader
positioned at the first row
- type(), columns(), directives() — header accessors
- tail() — InputStream yielding buffered + remaining bytes; use to
chain a second TableReader for multi-table documents
- next() — returns the next Ast.TableRow or null at EOF; errors
are sticky
- scan(Message.Builder) — reads next row and binds cells to msg
fields by column name via BindRow
- BindRow.bindRow(Message.Builder, List<String>, Ast.TableRow) —
standalone helper for callers iterating Result.tables()
Implementation: byte-level row-boundary scanner pulls bytes from the
source InputStream on demand and slices one ( ... ) row range at a
time, handed to Parser.parseTableRow for cell decoding. The scanner
is string / bytes / line-comment / block-comment aware so embedded
parens don't trip it. Header parsing reuses Parser.parse() on the
buffered header prefix; the standalone constraint, dotted-column
rejection, and arity enforcement come from there. Header byte
budget caps at 64 KiB.
BindRow strategy mirrors protowire-go's table_bind.go: render each
non-null cell back to PXF text (`<column> = <value>\n`), concat as
synthetic body, run UnmarshalOptions.withSkipValidate(true).unmarshal.
Reuses every existing decoder branch (WKTs, wrappers, enums,
defaults, required, oneof) for zero new switch arms.
Tests in TableReaderTest (24 cases): basic streaming, three cell
states, side-channel directives before header, sticky errors,
list/block cells rejected mid-stream, strings + block + line
comments with embedded parens, byte-at-a-time InputStream
(adversarial for buffer boundaries), multi-table via tail(),
equivalence with materializing path, oversized-header rejection,
scan happy path, scan empty-cell-leaves-field-at-zero,
null-on-wrapper clearing, WKT timestamp binding, BindRow against
materializing path, arity mismatch, non-leaf-cell rejection.
Parser gained a package-private parseTableRow(byte[], int) static
helper so TableReader can decode each row's byte slice without
re-running the full document grammar.
CodeQL "Unread local variable" flagged three pattern-match bindings in writeCellValue's switch (NullVal n, ListVal l, BlockVal bv) — all three cases just appended a literal string or threw, with no use of the bound variable. Java 21 standard pattern matching requires a binding on every `case` label (unnamed-pattern `_` is preview in 21 and stable in 22; this project doesn't enable preview), so the names couldn't be dropped. Refactor: handle NullVal / ListVal / BlockVal via `instanceof` checks before the switch (no binding needed), and keep the switch for the value-using cases that actually read the bound variable. Switch is no longer exhaustive over the sealed Ast.Value so a default arm guards against any future leaf-value type we miss adding. No behavior change. All BindRow / TableReader tests still pass.
Same CodeQL "Unread local variable" fix as the previous commit applied to BindRow. Only NullVal had the issue here — ListVal and BlockVal both use their bound variables (l.elements() / bv.entries()). Java 21 standard pattern matching requires a binding on every `case` label, and unnamed-pattern `_` is preview-only. Routing the no-binding case out via instanceof before the switch is the cleanest fix. Default arm guards against any future leaf-value type we miss adding.
trendvidia
added a commit
that referenced
this pull request
May 12, 2026
Promote [Unreleased] entries to [0.75.0] in advance of tagging v0.75.0. First tagged version after v0.70.0; consolidates four merged batches that brought the Java port up to the v0.75.0 protowire spec: - #35 parser-side support for v0.72 named directives + v0.73 @entry prefix list + @table directive - #36 schema reserved-name check (draft §3.13) - #37 Result.directives() / Result.tables() accessors - #38 TableReader streaming + scan(Message.Builder) + BindRow Aligns protowire-java's version number with the rest of the protowire-* stack, which is at v0.75.0 across protowire and protowire-go. Wire format unchanged across all four batches. After merge + tag, the JetBrains plugin jar refresh in the spec repo (scripts/refresh_jetbrains_parser_jar.sh) picks up the new parser and unblocks the red squiggles on @<name>/@entry/@table documents in the IDE.
trendvidia
added a commit
that referenced
this pull request
May 12, 2026
CodeQL was failing with "could not process any code written in Java/Kotlin" on PRs that touched no Java source — the README PR (#40) and the release-cut commit (#39, which only edits CHANGELOG.md). Diagnosis: gradle/actions/setup-gradle@v6 restores the build cache on every run. README- and CHANGELOG-only changes leave every input to the JavaCompile task identical, so Gradle reports the `classes` task as UP-TO-DATE and runs zero `javac` invocations. CodeQL's Java tracer wraps `javac`; no invocations means no extracted classes, which surfaces as the empty-database error at the analyze step. Confirmed by the run-duration pattern in `gh run list --workflow=codeql.yml`: - failing runs (#39 release-cut, #40 README) finished in ~1 min - successful runs (#35-#38 feature PRs that touched src/) took ~2 min The minute of "missing" wall-clock is exactly the javac step that didn't run. Fix: `--no-build-cache clean classes` forces a recompile every CodeQL invocation. Drops a `clean` task ahead of `classes` to guarantee no UP-TO-DATE skip, and `--no-build-cache` is belt-and-braces against the setup-gradle action's cache restore. Slightly slower CodeQL runs (~30s more on cold compile) but the analyze step gets real javac output to extract from.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Companion to protowire-go v0.74 (
pxf.TableReaderstreaming) and v0.75 (TableReader.scan+pxf.BindRow). Reads rows from anInputStreamone at a time with working-set memory bounded by the size of the largest single row — the shape consumers need for CSV-replacement datasets that don't fit in memory.```java
try (var tr = new TableReader(in)) {
while (true) {
var b = AllTypes.newBuilder();
if (!tr.scan(b)) break;
process(b.build());
}
}
```
Cell-state semantics match
BindRow: anullcell leaves the field absent (pxf.defaultapplied,pxf.requirederrors), anAst.NullValcell clears wrappers / optional / oneof per §3.9, any other value sets the field. WKT timestamps + durations, enum-by-name, proto3 wrappers all bind correctly becauseBindRowre-uses the existing unmarshal pipeline (format-and-reparse).API additions
Implementation
)inside literals don't trip the scan. Mirrorsprotowire-go'sfindNextRowintable_stream.go.Parser.parse(). Buffered header prefix (up through the column-list)) is parsed by the existing AST parser, so the standalone constraint, dotted-column rejection, and arity enforcement all flow through unchanged.Parser.parseTableRow. New package-private static helperParser.parseTableRow(byte[], int)letsTableReaderdecode each row's byte slice without re-running the full document grammar.BindRowstrategy. Render each non-null cell back to PXF text (<column> = <value>\n), concat as synthetic body, runUnmarshalOptions.withSkipValidate(true).unmarshal. Reuses every existing decoder branch (WKTs, wrappers, enums, defaults, required, oneof) for zero new switch arms. Same tradeprotowire-gomade.TableReaderpointed at a giant body-only document with no@tableever.Tradeoffs
next()orscanthrows, subsequent calls rethrow the same exception. Matches the Go port's contract.tail()semantics. Must only be called afternext()returnsnull. Calling earlier returns bytes the current reader still intends to consume, which desynchronizes the next reader.TableReaderdoes NOT implementAutoCloseable. The underlyingInputStreamis the caller's responsibility; closing the reader would close `src` and preventtail()chaining. Documented in javadoc.ChunkedStreamreader is adversarial for any buffer-boundary bug; protowire-go's equivalent caught real issues there.Stacking
Fourth of five Java port PRs:
Test plan