57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,63 @@ format changes.

### Added

- **`TableReader` streaming + `scan(Message.Builder)` + `BindRow` helper.**
Companion to `protowire-go` v0.74 (`pxf.TableReader`) and v0.75
(`TableReader.scan` / `BindRow`). Reads rows from an
`java.io.InputStream` one at a time with working-set memory bounded
by the size of the largest single row — the shape consumers need for
CSV-replacement datasets that don't fit in memory.

```java
try (var tr = new TableReader(in)) {
    while (true) {
        var b = AllTypes.newBuilder();
        if (!tr.scan(b)) break;
        process(b.build());
    }
}
```

Cell-state semantics match `BindRow`: a `null` cell leaves the field
absent (`pxf.default` applied, `pxf.required` errors), an `Ast.NullVal`
cell clears wrappers / optional / oneof per §3.9, any other value sets
the field. WKT timestamps and durations, enum-by-name, proto3 wrappers
all bind correctly because `BindRow` re-uses the existing unmarshal
pipeline (format-and-reparse).
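
As a rough illustration of the three cell states and the format step, here is a minimal, self-contained sketch. It does not use the library's `Ast` types: `CellStatesSketch`, `EXPLICIT_NULL`, and `renderBody` are hypothetical names, and cells are assumed to already be PXF-formatted text.

```java
import java.util.List;

final class CellStatesSketch {
    /** Hypothetical stand-in for an explicit `null` cell (Ast.NullVal in the real API). */
    static final Object EXPLICIT_NULL = new Object();

    /**
     * Renders one row as "<column> = <value>" lines.
     * Java null      -> empty cell, no line emitted (field stays absent).
     * EXPLICIT_NULL  -> "<column> = null" (field is cleared).
     * anything else  -> "<column> = <text>" (field is set).
     */
    static String renderBody(List<String> columns, List<Object> cells) {
        if (columns.size() != cells.size()) {
            throw new IllegalArgumentException(
                columns.size() + " columns vs " + cells.size() + " cells");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            Object cell = cells.get(i);
            if (cell == null) continue; // empty cell: emit nothing
            sb.append(columns.get(i)).append(" = ");
            sb.append(cell == EXPLICIT_NULL ? "null" : cell.toString());
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

The resulting text would then be handed to the ordinary unmarshal path, which is where defaults, required checks, and wrapper clearing are actually applied.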

Implementation: a byte-level row-boundary scanner pulls bytes from the
source `InputStream` on demand and slices one `( ... )` row range at a
time, which is then handed to `Parser.parseTableRow` for cell decoding.
The scanner is string / bytes-literal / line-comment / block-comment
aware, so parentheses embedded in literals or comments don't trip it.
Header parsing reuses `Parser.parse()` against the buffered header prefix,
so the standalone constraint and dotted-column rejection get the same
enforcement the materializing path uses. The header byte budget is capped
at 64 KiB, so a `TableReader` pointed at a large body-only document that
never contains `@table` fails fast.
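
The row-boundary idea can be sketched without the library. The following is a simplified illustration, not the shipped scanner: `RowScannerSketch` and `nextRow` are hypothetical names, and the lexical rules (double-quoted literals with backslash escapes, `#` line comments, C-style block comments) are assumptions rather than something the changelog states. It also reads one byte at a time and never over-reads, whereas the real reader buffers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class RowScannerSketch {
    private static final int CODE = 0, STRING = 1, LINE_COMMENT = 2, BLOCK_COMMENT = 3;

    /**
     * Returns the next "( ... )" row read from in, or null at end of input.
     * Assumed lexical rules (not taken from the PXF spec): double-quoted
     * literals with backslash escapes, '#' line comments, C-style block comments.
     */
    static String nextRow(InputStream in) throws IOException {
        ByteArrayOutputStream row = new ByteArrayOutputStream();
        int state = CODE;
        int depth = 0;            // paren nesting outside literals and comments
        int prev = -1;            // previous byte, for two-byte comment delimiters
        boolean escaped = false;  // true right after a backslash inside a literal
        int c;
        while ((c = in.read()) != -1) {
            if (depth > 0) row.write(c); // bytes inside a row are kept verbatim
            switch (state) {
                case STRING:
                    if (escaped) escaped = false;
                    else if (c == '\\') escaped = true;
                    else if (c == '"') state = CODE;
                    break;
                case LINE_COMMENT:
                    if (c == '\n') state = CODE;
                    break;
                case BLOCK_COMMENT:
                    if (prev == '*' && c == '/') {
                        state = CODE;
                        prev = -1; // the closing slash cannot open a new comment
                        continue;
                    }
                    break;
                default:
                    if (c == '"') {
                        state = STRING;
                    } else if (c == '#') {
                        state = LINE_COMMENT;
                    } else if (prev == '/' && c == '*') {
                        state = BLOCK_COMMENT;
                        prev = -1; // so the opening star cannot also close the comment
                        continue;
                    } else if (c == '(') {
                        if (depth++ == 0) row.write(c); // opening paren of a new row
                    } else if (c == ')' && depth > 0 && --depth == 0) {
                        return new String(row.toByteArray(), StandardCharsets.UTF_8);
                    }
            }
            prev = c;
        }
        if (depth > 0) throw new IOException("unterminated row at end of input");
        return null;
    }
}
```

Only the bytes of the current row are held at any time, which is the property that keeps working-set memory bounded by the largest single row.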

Multi-table documents chain via `tr.tail()`, which returns an
`InputStream` yielding the bytes the reader buffered but didn't consume
followed by the remaining source.
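
One way to build such a stream with only the JDK is `java.io.SequenceInputStream`. This is a sketch of the idea, not the actual `tail()` implementation: `TailSketch` and its `buf` / `pos` / `limit` parameters are hypothetical stand-ins for the reader's internal buffer state.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;

final class TailSketch {
    /**
     * Returns a stream that first yields the buffered-but-unconsumed bytes
     * buf[pos..limit) and then continues with whatever is left in source.
     */
    static InputStream tail(byte[] buf, int pos, int limit, InputStream source) {
        InputStream leftover = new ByteArrayInputStream(buf, pos, limit - pos);
        return new SequenceInputStream(leftover, source);
    }
}
```

A second reader constructed over the returned stream would then see the next table exactly where the first reader stopped.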

Public API additions:
- `TableReader(InputStream)`, `type()`, `columns()`, `directives()`,
`tail()`, `next()`, `scan(Message.Builder)`
- `BindRow.bindRow(Message.Builder, List<String>, Ast.TableRow)`

Errors are sticky: once `next()` or `scan` throws, subsequent calls
rethrow the same exception (matches the Go port's contract).
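
The sticky-error contract can be illustrated with a small wrapper. This is a generic sketch under assumed names (`StickySketch`, a `Supplier` standing in for the row source) and it only handles unchecked exceptions; it is not how `TableReader` is written.

```java
import java.util.function.Supplier;

final class StickySketch<T> {
    private final Supplier<T> source; // hypothetical row source; may throw
    private RuntimeException sticky;  // first failure, replayed on later calls

    StickySketch(Supplier<T> source) {
        this.source = source;
    }

    /** Returns the next item, or rethrows the first failure without touching the source again. */
    T next() {
        if (sticky != null) throw sticky;
        try {
            return source.get();
        } catch (RuntimeException e) {
            sticky = e;
            throw e;
        }
    }
}
```

Because the source is never consulted again after a failure, a caller that swallows one exception cannot accidentally resume from a corrupted position.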

Tests in `TableReaderTest` (24 cases): basic streaming, three cell
states, side-channel directives before header, sticky errors,
list/block cells rejected mid-stream, strings / block + line comments
with embedded parens, byte-at-a-time `InputStream` (adversarial for
buffer boundaries), multi-table via `tail()`, equivalence with the
materializing path, oversized-header rejection, `scan` happy path,
`scan` empty-cell-leaves-field-at-zero, `null`-on-wrapper clearing,
WKT timestamp binding, `BindRow` against the materializing path,
arity mismatch, non-leaf-cell rejection.
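
For readers who want to reproduce the byte-at-a-time case, a stream that never returns more than one byte per bulk read can be built on `java.io.FilterInputStream`. `OneByteInputStream` is a hypothetical name; the actual test helper in `TableReaderTest` may look different.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class OneByteInputStream extends FilterInputStream {
    OneByteInputStream(InputStream in) {
        super(in);
    }

    /** Hands back at most one byte per call, forcing callers to cope with short reads. */
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        return super.read(b, off, 1);
    }
}
```

Wrapping the source this way makes every token, literal, and row boundary straddle a refill, which is what makes it adversarial for buffer-boundary handling.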

- **`Result.directives()` / `Result.tables()` accessors.** `FastDecoder`
used to consume named directives and `@table` directives at the
document head without storing them (PR #35 parser-side port did the
125 changes: 125 additions & 0 deletions pxf/src/main/java/org/protowire/pxf/BindRow.java
@@ -0,0 +1,125 @@
// SPDX-License-Identifier: MIT
// Copyright (c) 2026 TrendVidia, LLC.
package org.protowire.pxf;

import com.google.protobuf.Message;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.List;

/**
 * Per-row proto-binding helper for {@code @table} rows. Sits atop the
 * streaming {@link TableReader} (via {@link TableReader#scan}) and is also
 * exported as a standalone helper for callers that iterate the
 * materializing path's {@link Result#tables()} rows.
 *
 * <p>Implementation strategy: convert each non-{@code null} cell back to
 * its PXF text representation, concatenate as a {@code <column> = <value>\n}
 * body, and run through the existing unmarshal pipeline with
 * {@link UnmarshalOptions#skipValidate()} on. That reuses every branch of
 * the existing decoder — WKT timestamps and durations, wrapper-type
 * nullability, enum-by-name resolution, {@code pxf.required} /
 * {@code pxf.default}, oneof handling — instead of growing a parallel
 * Value-to-FieldDescriptor switch with ~50 arms. The cost is a small
 * format-and-reparse per row; that's an acceptable trade for a streaming
 * convenience API whose consumers have already opted into the convenience
 * tier. Same trade {@code protowire-go} made in {@code table_bind.go}.
 */
public final class BindRow {
    private BindRow() {}

    /**
     * Bind the cells of {@code row} to the fields of {@code builder} by
     * column name. The {@code columns} list MUST have the same length as
     * {@code row.cells()}; mismatch raises {@link IllegalArgumentException}.
     *
     * <p>Cell-state semantics (mirrors draft §3.4.4):
     * <ul>
     *   <li>A {@code null} cell ⇒ field absent ({@code pxf.default} is
     *       applied if declared; {@code pxf.required} errors otherwise).</li>
     *   <li>An {@link Ast.NullVal} cell ⇒ field cleared, per §3.9
     *       (clears wrappers / optional / oneof).</li>
     *   <li>Any other {@link Ast.Value} ⇒ field set.</li>
     * </ul>
     *
     * <p>{@code builder}'s descriptor MUST contain fields whose names
     * appear in {@code columns}; a column referring to an unknown field
     * surfaces as a "field not found" error from the underlying unmarshal
     * call (unless {@link UnmarshalOptions#discardUnknown} is set).
     */
    public static void bindRow(Message.Builder builder, List<String> columns, Ast.TableRow row) {
        if (columns.size() != row.cells().size()) {
            throw new IllegalArgumentException(
                    "BindRow: " + columns.size() + " columns vs " + row.cells().size() + " cells");
        }
        byte[] body = rowToPxfBody(columns, row);
        // Run the synthetic body through the standard unmarshal pipeline.
        // SkipValidate avoids re-running the reserved-name check per row
        // (the caller's TableReader / unmarshalFull already validated the
        // descriptor once at bind time).
        UnmarshalOptions.defaults().withSkipValidate(true).unmarshal(body, builder);
    }

    /**
     * Render a row as a PXF body: one {@code <column> = <value>} entry per
     * non-{@code null} cell, in column order. Empty cells produce no
     * entry — the field stays absent from the decoder's perspective.
     */
    static byte[] rowToPxfBody(List<String> columns, Ast.TableRow row) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.cells().size(); i++) {
            Ast.Value cell = row.cells().get(i);
            if (cell == null) continue;
            sb.setLength(0);
            sb.append(columns.get(i)).append(" = ");
            writeCellValue(sb, cell);
            sb.append('\n');
            out.writeBytes(sb.toString().getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /**
     * Format a single cell value as PXF text. v1 {@code @table} cells are
     * scalar-shaped (no list, no block), so only the leaf-value variants
     * appear; list and block AST nodes are unreachable here because
     * {@code parseTableRow} / {@code consumeRowCell} rejects them before
     * the streaming reader hands them to {@code bindRow}. Hand-constructed
     * TableRow values bypass that check, so guard defensively.
     *
     * <p>The {@code NullVal} / {@code ListVal} / {@code BlockVal} cases
     * don't need to read the bound variable, so they're checked via
     * {@code instanceof} before the value-using switch. Java 21 standard
     * pattern matching requires a variable binding on every {@code case}
     * label; routing the no-binding cases out keeps the switch tidy and
     * sidesteps CodeQL's "unused local variable" check.
     */
    static void writeCellValue(StringBuilder sb, Ast.Value v) {
        if (v instanceof Ast.NullVal) {
            sb.append("null");
            return;
        }
        if (v instanceof Ast.ListVal || v instanceof Ast.BlockVal) {
            throw new IllegalArgumentException(
                    "BindRow: unexpected " + (v instanceof Ast.ListVal ? "list" : "block")
                            + " value in cell (v1 @table cells are scalar-shaped)");
        }
        switch (v) {
            case Ast.StringVal s ->
                    sb.append('"').append(Format.escape(s.value())).append('"');
            case Ast.IntVal i -> sb.append(i.raw());
            case Ast.FloatVal f -> sb.append(f.raw());
            case Ast.BoolVal b -> sb.append(b.value() ? "true" : "false");
            case Ast.BytesVal by ->
                    sb.append("b\"").append(Base64.getEncoder().encodeToString(by.value())).append('"');
            case Ast.IdentVal id -> sb.append(id.name());
            case Ast.TimestampVal t -> sb.append(t.raw());
            case Ast.DurationVal d -> sb.append(d.raw());
            default -> throw new IllegalArgumentException(
                    "BindRow: unexpected cell value type " + v.getClass().getSimpleName());
        }
    }
}
14 changes: 12 additions & 2 deletions pxf/src/main/java/org/protowire/pxf/Format.java
@@ -103,13 +103,21 @@ private static void formatEntries(StringBuilder sb, List<Ast.Entry> entries, int
     }

     private static void formatValue(StringBuilder sb, Ast.Value v, int level) {
+        // NullVal has no payload to format — the bound variable would be
+        // unused, which CodeQL flags. Java 21 standard pattern matching
+        // requires a binding on every `case` label (unnamed `_` is a
+        // preview feature this project doesn't enable), so we route the
+        // no-binding case out before the switch.
+        if (v instanceof Ast.NullVal) {
+            sb.append("null");
+            return;
+        }
         switch (v) {
-            case Ast.StringVal s -> { sb.append('"').append(escape(s.value())).append('"'); }
+            case Ast.StringVal s -> sb.append('"').append(escape(s.value())).append('"');
             case Ast.IntVal i -> sb.append(i.raw());
             case Ast.FloatVal f -> sb.append(f.raw());
             case Ast.BoolVal b -> sb.append(b.value() ? "true" : "false");
             case Ast.BytesVal by -> sb.append("b\"").append(Base64.getEncoder().encodeToString(by.value())).append('"');
-            case Ast.NullVal n -> sb.append("null");
             case Ast.IdentVal id -> sb.append(id.name());
             case Ast.TimestampVal t -> sb.append(t.raw());
             case Ast.DurationVal d -> sb.append(d.raw());
@@ -130,6 +138,8 @@ private static void formatValue(StringBuilder sb, Ast.Value v, int level) {
                 writeIndent(sb, level);
                 sb.append('}');
             }
+            default -> throw new IllegalStateException(
+                    "Format: unexpected value type " + v.getClass().getSimpleName());
         }
     }

13 changes: 13 additions & 0 deletions pxf/src/main/java/org/protowire/pxf/Parser.java
@@ -30,6 +30,19 @@ public static Ast.Document parse(String input) {
         return parse(input.getBytes(StandardCharsets.UTF_8));
     }

+    /**
+     * Parse a single {@code ( cell, cell, ... )} tuple as a {@code @table}
+     * row. Used by {@link TableReader} to decode each row's byte slice
+     * without re-running the full document grammar. {@code input} MUST
+     * start with {@code (} and contain a balanced row tuple.
+     *
+     * @param input row bytes including the surrounding parens
+     * @param expected expected cell count (column arity)
+     */
+    static Ast.TableRow parseTableRow(byte[] input, int expected) {
+        return new Parser(input).parseTableRow(expected);
+    }
+
     private void advance() {
         while (true) {
             current = lex.next();