diff --git a/CHANGELOG.md b/CHANGELOG.md index 0b146e9..bb3f5cb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -18,6 +18,41 @@ format changes. ## [Unreleased] +### v1.0 spec changes + +Three one-time spec changes from the protowire v1.0 freeze line +(STABILITY.md). **Breaking** — there is no alias period; v1.0 is itself +the major bump. + +- `@table` directive renamed to `@dataset` (draft §3.4.4). Public API + follows: `Ast.TableDirective` → `Ast.DatasetDirective`, `Ast.TableRow` + → `Ast.DatasetRow`, `TableReader` → `DatasetReader`, + `Document.tables()` → `Document.datasets()`, `Result.tables()` → + `Result.datasets()`. Source files `TableReader.java`, + `TableReaderTest.java`, `TableParserTest.java` renamed accordingly. + Decoder semantics unchanged. + +- `@proto` directive added (draft §3.4.5). New `Ast.ProtoDirective` + record + `Ast.ProtoShape` enum (`ANONYMOUS`, `NAMED`, `SOURCE`, + `DESCRIPTOR`). Four body shapes lexically distinguished: + `@proto { ... }` (anonymous), `@proto pkg.Type { ... }` (named), + `@proto """..."""` (source), `@proto b"..."` (descriptor). Exposed + via `Document.protos()` and `Result.protos()`. Descriptor form is + the MUST-support shape per spec; this port supports all four. + +- Reserved directive names expanded from 5 to 13 (draft §3.4.6). + Decoder rejects `@table`, `@datasource`, `@view`, `@procedure`, + `@function`, `@permissions` as spec-reserved (future-allocated). + `SchemaValidator.FUTURE_RESERVED_DIRECTIVES` exposes the set. + +`@dataset`'s row message type is now optional in the AST. When +omitted, the directive consumes the typed binding of a preceding +anonymous `@proto` per draft §3.4.4 Anonymous binding. + +`Lexer.repositionTo(int)` added for the parser's `@proto` brace-body +skip (interior is protobuf source, not PXF, so the lexer hops past +the body rather than tokenising it). + ## [0.75.0] Catch-up release. 
First tagged version after v0.70.0; brings the Java diff --git a/README.md b/README.md index 7eb5d73..dd2651e 100644 --- a/README.md +++ b/README.md @@ -98,9 +98,9 @@ byte[] formatted = Pxf.formatDocument(doc); The AST is a sealed hierarchy of records (`Ast.Document`, `Ast.Entry`, `Ast.Value`); pattern matching is used throughout the formatter and decoder. -### Directives and `@table` (Result accessors) +### Directives and `@dataset` (Result accessors) -PXF documents can carry [`@` directives, `@entry` bundles, and `@table` rows](https://github.com/trendvidia/protowire#directives) at the document root alongside (or instead of) a message body. `unmarshalFull` captures all three on `Result`: +PXF documents can carry [`@` directives, `@entry` bundles, and `@dataset` rows](https://github.com/trendvidia/protowire#directives) at the document root alongside (or instead of) a message body. `unmarshalFull` captures all three on `Result`: ```java Result r = Pxf.unmarshalFull(pxfBytes, b); @@ -112,8 +112,8 @@ for (Ast.Directive d : r.directives()) { // chosen message, chameleon's @header pattern. } -for (Ast.TableDirective t : r.tables()) { - // t.type(), t.columns(), t.rows() List. +for (Ast.DatasetDirective t : r.datasets()) { + // t.type(), t.columns(), t.rows() List. // Each row.cells().get(i) is: // null — empty cell (field absent, pxf.default applies) // Ast.NullVal — explicit null (field cleared per §3.9) @@ -121,42 +121,42 @@ for (Ast.TableDirective t : r.tables()) { } ``` -`r.directives()` excludes `@type` and `@table` (those have their own accessors). Order is preserved. +`r.directives()` excludes `@type` and `@dataset` (those have their own accessors). Order is preserved. 
-### `TableReader`: streaming `@table` consumption +### `DatasetReader`: streaming `@dataset` consumption For datasets too large to materialize, read rows from an `InputStream` with working-set memory bounded by the size of the largest single row — not by the row sequence: ```java try (var in = Files.newInputStream(Path.of("trades.pxf"))) { - var tr = new TableReader(in); + var tr = new DatasetReader(in); String typ = tr.type(); List cols = tr.columns(); - List hdrs = tr.directives(); // side-channel directives before the @table header + List hdrs = tr.directives(); // side-channel directives before the @dataset header - Ast.TableRow row; + Ast.DatasetRow row; while ((row = tr.next()) != null) { // row.cells(): List with the three-state mapping above. } } ``` -`NewTableReader` throws `NoSuchElementException` if the input ends before any `@table` directive. Multi-table documents chain via `tr.tail()`, which returns an `InputStream` of the buffered-but-unconsumed bytes followed by the remaining source: +`NewDatasetReader` throws `NoSuchElementException` if the input ends before any `@dataset` directive. Multi-table documents chain via `tr.tail()`, which returns an `InputStream` of the buffered-but-unconsumed bytes followed by the remaining source: ```java -var tr1 = new TableReader(src); +var tr1 = new DatasetReader(src); // ... iterate tr1.next() until it returns null ... -var tr2 = new TableReader(tr1.tail()); +var tr2 = new DatasetReader(tr1.tail()); ``` Per-row arity and v1 cell-grammar errors (`[...]` / `{...}` cells, dotted columns) surface as the offending row is consumed, not deferred to end-of-input — see [draft §3.4.4 "Streaming consumption"](https://github.com/trendvidia/protowire/blob/main/docs/draft-trendvidia-protowire-00.txt). 
### `scan` and `BindRow`: per-row binding -`TableReader.scan(builder)` reads the next row and binds its cells to the message by column name; returns `false` when the row sequence is exhausted: +`DatasetReader.scan(builder)` reads the next row and binds its cells to the message by column name; returns `false` when the row sequence is exhausted: ```java -var tr = new TableReader(in); +var tr = new DatasetReader(in); while (true) { var b = Trade.newBuilder(); if (!tr.scan(b)) break; @@ -164,12 +164,12 @@ while (true) { } ``` -`BindRow.bindRow(builder, columns, row)` is the same logic exposed standalone, for callers iterating `Result.tables()[i].rows()` on the materializing path: +`BindRow.bindRow(builder, columns, row)` is the same logic exposed standalone, for callers iterating `Result.datasets()[i].rows()` on the materializing path: ```java Ast.Document doc = Pxf.parse(pxfBytes); -for (Ast.TableDirective tbl : doc.tables()) { - for (Ast.TableRow row : tbl.rows()) { +for (Ast.DatasetDirective tbl : doc.datasets()) { + for (Ast.DatasetRow row : tbl.rows()) { var b = Trade.newBuilder(); BindRow.bindRow(b, tbl.columns(), row); process(b.build()); @@ -223,9 +223,9 @@ PXF (`:pxf`): - ✅ AST-preserving `formatDocument`. - ✅ **`@` named directives** at document root with raw-body extraction (`Ast.Directive`, `Result.directives()`). - ✅ **`@entry` bundle directive** (zero-or-more prefix list; four permitted shapes per draft §3.4.3). -- ✅ **`@table` directive** (the protowire-native CSV replacement) — `Ast.TableDirective`, `Ast.TableRow`, three-state cells, parser enforces row arity + dotted-column rejection + list/block-cell rejection + standalone-constraint. -- ✅ **Streaming `TableReader`** over `InputStream` for datasets too large to materialize. Working-set memory bounded by largest single row. -- ✅ **Per-row binding** via `TableReader.scan(Message.Builder)` and standalone `BindRow.bindRow(...)`. 
+- ✅ **`@dataset` directive** (the protowire-native CSV replacement) — `Ast.DatasetDirective`, `Ast.DatasetRow`, three-state cells, parser enforces row arity + dotted-column rejection + list/block-cell rejection + standalone-constraint. +- ✅ **Streaming `DatasetReader`** over `InputStream` for datasets too large to materialize. Working-set memory bounded by largest single row. +- ✅ **Per-row binding** via `DatasetReader.scan(Message.Builder)` and standalone `BindRow.bindRow(...)`. - ✅ **Schema reserved-name check** (`SchemaValidator.validateFile` / `validateDescriptor`) catches schemas declaring fields/oneofs/enum values named `null`/`true`/`false`. Runs by default on every `unmarshal*` call; `UnmarshalOptions.withSkipValidate(true)` opts out. SBE (`:sbe`): diff --git a/pxf/src/main/java/org/protowire/pxf/Ast.java b/pxf/src/main/java/org/protowire/pxf/Ast.java index fb417a4..e151d91 100644 --- a/pxf/src/main/java/org/protowire/pxf/Ast.java +++ b/pxf/src/main/java/org/protowire/pxf/Ast.java @@ -19,24 +19,28 @@ public record Comment(Position pos, String text) {} * * @param typeUrl body's message type from {@code @type}; may be empty * @param directives side-channel {@code @} directives at document - * root in source order (excludes {@code @type} and - * {@code @table}) - * @param tables {@code @table} directives in source order. Per draft - * §3.4.4 a document with any table MUST NOT also have - * a {@code typeUrl} or body entries; the parser - * enforces this + * root in source order (excludes spec-defined + * directives: {@code @type}, {@code @dataset}, + * {@code @proto}, {@code @entry}) + * @param datasets {@code @dataset} directives in source order. 
Per + * draft §3.4.4 a document with any dataset MUST + * NOT also have a {@code typeUrl} or body entries; + * the parser enforces this + * @param protos {@code @proto} directives in source order + * (draft §3.4.5) * @param entries message body entries * @param leadingComments comments before any directive or body entry */ public record Document( String typeUrl, List directives, - List tables, + List datasets, + List protos, List entries, List leadingComments) { public static Document of(String typeUrl, List entries) { - return new Document(typeUrl, List.of(), List.of(), List.copyOf(entries), List.of()); + return new Document(typeUrl, List.of(), List.of(), List.of(), List.copyOf(entries), List.of()); } } @@ -74,31 +78,78 @@ public record Directive( List leadingComments) {} /** - * A {@code @table ( col1, col2, ... ) row*} directive at document - * root (draft §3.4.4). Carries many instances of one message type — the - * protowire-native CSV replacement. + * A {@code @dataset ( col1, col2, ... ) row*} directive at + * document root (draft §3.4.4). Carries many instances of one message + * type — the protowire-native CSV replacement. * *

<p>v1 cell-grammar restrictions enforced by the parser: cells exclude * list and block values; column entries are unqualified field names * (no dotted paths); row arity equals column count; documents with any - * {@code @table} MUST NOT carry {@code @type} or body field entries. + * {@code @dataset} MUST NOT carry {@code @type} or body field entries. + * + *

<p>{@code type} MAY be empty when an anonymous {@code @proto} + * directive (Section 3.4.5) precedes the dataset in document order; + * the anonymous schema is consumed as the row message type. */ - public record TableDirective( + public record DatasetDirective( Position pos, String type, List<String> columns, - List<TableRow> rows, + List<DatasetRow> rows, List<Comment> leadingComments) {} /** - * One parenthesized cell tuple in a {@link TableDirective}. The cells - * list has the same length as the containing table's column list. + * One parenthesized cell tuple in a {@link DatasetDirective}. The cells + * list has the same length as the containing dataset's column list. * A {@code null} entry in cells denotes an absent field (the "empty * cell" between two commas); a {@link NullVal} denotes a present-but- * null field; any other {@link Value} denotes a present field with that * value. */ - public record TableRow(Position pos, List<Value> cells) {} + public record DatasetRow(Position pos, List<Value> cells) {} + + /** + * Shape of a {@link ProtoDirective}'s body (draft §3.4.5). + */ + public enum ProtoShape { + /** {@code @proto { ... }} — defines an unnamed message. */ + ANONYMOUS, + /** {@code @proto pkg.Type { ... }} — single named message. */ + NAMED, + /** {@code @proto """..."""} — complete .proto source file. */ + SOURCE, + /** {@code @proto b"..."} — compiled descriptor. */ + DESCRIPTOR; + } + + /** + * A {@code @proto } directive at document root (draft §3.4.5). + * Carries an embedded protobuf schema, making the PXF document + * self-describing. The shape distinguishes the four lexically-determined + * body forms. + * + *

<p>{@code body} carries raw bytes per shape: + * <ul> + * <li>{@link ProtoShape#ANONYMOUS}, {@link ProtoShape#NAMED}: bytes + * between the opening {@code {} and matching {@code }} (both + * exclusive). The bytes are protobuf message-body source.</li> + * <li>{@link ProtoShape#SOURCE}: contents of the triple-quoted string + * (with leading-LF / dedent applied). The bytes are a complete + * {@code .proto} source file.</li> + * <li>{@link ProtoShape#DESCRIPTOR}: base64-decoded bytes of the + * bytes literal. The bytes are a serialised + * {@code google.protobuf.FileDescriptorSet}.</li> + * </ul> + * + * <p>
{@code typeName} is non-empty only when {@code shape} is + * {@link ProtoShape#NAMED}. + */ + public record ProtoDirective( + Position pos, + ProtoShape shape, + String typeName, + byte[] body, + List leadingComments) {} /** A single entry inside a message body: assignment, map entry, or block. */ public sealed interface Entry permits Assignment, MapEntry, Block { diff --git a/pxf/src/main/java/org/protowire/pxf/BindRow.java b/pxf/src/main/java/org/protowire/pxf/BindRow.java index 1b2e488..6c96092 100644 --- a/pxf/src/main/java/org/protowire/pxf/BindRow.java +++ b/pxf/src/main/java/org/protowire/pxf/BindRow.java @@ -10,8 +10,8 @@ import java.util.List; /** - * Per-row proto-binding helper for {@code @table} rows. Sits atop the - * streaming {@link TableReader} (via {@link TableReader#scan}) and is also + * Per-row proto-binding helper for {@code @dataset} rows. Sits atop the + * streaming {@link DatasetReader} (via {@link DatasetReader#scan}) and is also * exported as a standalone helper for callers that iterate the * materializing path's {@link Result#tables()} rows. * @@ -49,7 +49,7 @@ private BindRow() {} * surfaces as a "field not found" error from the underlying unmarshal * call (unless {@link UnmarshalOptions#discardUnknown} is set). */ - public static void bindRow(Message.Builder builder, List columns, Ast.TableRow row) { + public static void bindRow(Message.Builder builder, List columns, Ast.DatasetRow row) { if (columns.size() != row.cells().size()) { throw new IllegalArgumentException( "BindRow: " + columns.size() + " columns vs " + row.cells().size() + " cells"); @@ -57,7 +57,7 @@ public static void bindRow(Message.Builder builder, List columns, Ast.Ta byte[] body = rowToPxfBody(columns, row); // Run the synthetic body through the standard unmarshal pipeline. 
// SkipValidate avoids re-running the reserved-name check per row - // (the caller's TableReader / unmarshalFull already validated the + // (the caller's DatasetReader / unmarshalFull already validated the // descriptor once at bind time). UnmarshalOptions.defaults().withSkipValidate(true).unmarshal(body, builder); } @@ -67,7 +67,7 @@ public static void bindRow(Message.Builder builder, List columns, Ast.Ta * non-{@code null} cell, in column order. Empty cells produce no * entry — the field stays absent from the decoder's perspective. */ - static byte[] rowToPxfBody(List columns, Ast.TableRow row) { + static byte[] rowToPxfBody(List columns, Ast.DatasetRow row) { ByteArrayOutputStream out = new ByteArrayOutputStream(); StringBuilder sb = new StringBuilder(); for (int i = 0; i < row.cells().size(); i++) { @@ -83,12 +83,12 @@ static byte[] rowToPxfBody(List columns, Ast.TableRow row) { } /** - * Format a single cell value as PXF text. v1 {@code @table} cells are + * Format a single cell value as PXF text. v1 {@code @dataset} cells are * scalar-shaped (no list, no block), so only the leaf-value variants * appear; list and block AST nodes are unreachable here because - * {@code parseTableRow} / {@code consumeRowCell} rejects them before + * {@code parseDatasetRow} / {@code consumeRowCell} rejects them before * the streaming reader hands them to {@code bindRow}. Hand-constructed - * TableRow values bypass that check, so guard defensively. + * DatasetRow values bypass that check, so guard defensively. * *

The {@code NullVal} / {@code ListVal} / {@code BlockVal} cases * don't need to read the bound variable, so they're checked via @@ -105,7 +105,7 @@ static void writeCellValue(StringBuilder sb, Ast.Value v) { if (v instanceof Ast.ListVal || v instanceof Ast.BlockVal) { throw new IllegalArgumentException( "BindRow: unexpected " + (v instanceof Ast.ListVal ? "list" : "block") - + " value in cell (v1 @table cells are scalar-shaped)"); + + " value in cell (v1 @dataset cells are scalar-shaped)"); } switch (v) { case Ast.StringVal s -> diff --git a/pxf/src/main/java/org/protowire/pxf/TableReader.java b/pxf/src/main/java/org/protowire/pxf/DatasetReader.java similarity index 79% rename from pxf/src/main/java/org/protowire/pxf/TableReader.java rename to pxf/src/main/java/org/protowire/pxf/DatasetReader.java index ddcab76..db2500e 100644 --- a/pxf/src/main/java/org/protowire/pxf/TableReader.java +++ b/pxf/src/main/java/org/protowire/pxf/DatasetReader.java @@ -13,9 +13,9 @@ import java.util.NoSuchElementException; /** - * Streaming consumption for the {@code @table} directive (draft §3.4.4 + * Streaming consumption for the {@code @dataset} directive (draft §3.4.4 * "Streaming consumption"). Pulls bytes from an {@link InputStream} on - * demand and yields one {@link Ast.TableRow} per {@link #next()} call, + * demand and yields one {@link Ast.DatasetRow} per {@link #next()} call, * with working-set memory bounded by the size of the largest single row. * Use it for datasets too large to materialize via {@link Pxf#unmarshal} / * {@link Pxf#unmarshalFull}. @@ -24,24 +24,24 @@ * cell-grammar rule on each row as it is consumed (not deferred to end of * input), and MUST yield rows in source order. Both invariants fall out * of the implementation here: the row-boundary scanner produces one - * {@code ( ... )} byte slice at a time, and {@link Parser#parseTableRow} + * {@code ( ... )} byte slice at a time, and {@link Parser#parseDatasetRow} * decodes it. * - *

A TableReader is positioned at the first row after - * {@link #TableReader(InputStream)} returns. Call {@link #next()} in a + *

A DatasetReader is positioned at the first row after + * {@link #DatasetReader(InputStream)} returns. Call {@link #next()} in a * loop until it returns {@code null}; the row sequence is exhausted at - * that point. For documents containing multiple {@code @table} - * directives, construct a second TableReader from {@link #tail()}. + * that point. For documents containing multiple {@code @dataset} + * directives, construct a second DatasetReader from {@link #tail()}. * - *

A TableReader is NOT safe for concurrent use. + *

A DatasetReader is NOT safe for concurrent use. */ -public final class TableReader { +public final class DatasetReader { /** - * Cap on the byte budget for the {@code @table} header (leading - * directives + {@code @table TYPE (col1, col2, ...)}). Real headers - * are tiny; the cap exists to fail-fast on misuse — a TableReader - * pointed at a multi-gigabyte document with no {@code @table} + * Cap on the byte budget for the {@code @dataset} header (leading + * directives + {@code @dataset TYPE (col1, col2, ...)}). Real headers + * are tiny; the cap exists to fail-fast on misuse — a DatasetReader + * pointed at a multi-gigabyte document with no {@code @dataset} * directive shouldn't OOM trying to find one. */ private static final int DEFAULT_HEADER_MAX_BYTES = 64 * 1024; @@ -64,43 +64,43 @@ public final class TableReader { /** * Consume any leading directives ({@code @type}, {@code @}, etc.) - * and the {@code @table TYPE ( cols )} header, returning a reader + * and the {@code @dataset TYPE ( cols )} header, returning a reader * positioned at the first row. * * @throws NoSuchElementException if the input ends before any - * {@code @table} directive is seen + * {@code @dataset} directive is seen * @throws PxfException on a malformed header * @throws IOException on an underlying {@link InputStream} * read failure */ - public TableReader(InputStream src) throws IOException { + public DatasetReader(InputStream src) throws IOException { this.src = src; readHeader(); } - /** The row message type declared by the {@code @table} header. */ + /** The row message type declared by the {@code @dataset} header. */ public String type() { return type; } - /** The column field names declared by the {@code @table} header, in source order. */ + /** The column field names declared by the {@code @dataset} header, in source order. 
*/ public List columns() { return columns; } /** * The side-channel directives ({@code @} / {@code @entry} / - * etc., NOT {@code @type} or {@code @table}) that appeared before the - * {@code @table} header. Stable for the reader's lifetime. + * etc., NOT {@code @type} or {@code @dataset}) that appeared before the + * {@code @dataset} header. Stable for the reader's lifetime. */ public List directives() { return directives; } /** * Returns an {@link InputStream} that yields the bytes the reader has * buffered but not consumed, followed by any remaining bytes from the - * underlying source. Use it to chain a second TableReader for - * documents containing multiple {@code @table} directives: + * underlying source. Use it to chain a second DatasetReader for + * documents containing multiple {@code @dataset} directives: * *

<pre>{@code
-     * var tr1 = new TableReader(src);
+     * var tr1 = new DatasetReader(src);
      * // ... iterate tr1.next() until null ...
-     * var tr2 = new TableReader(tr1.tail());
+     * var tr2 = new DatasetReader(tr1.tail());
      * }</pre>
* *

<p>MUST only be called after {@link #next()} has returned {@code null}. @@ -117,10 +117,10 @@ public InputStream tail() { * sequence is exhausted. After {@code null} (or any other error), all * subsequent calls return {@code null} or rethrow the sticky error. * - *

The returned {@link Ast.TableRow}'s cells list is freshly + *

The returned {@link Ast.DatasetRow}'s cells list is freshly * allocated; reading the next row does not invalidate it. */ - public Ast.TableRow next() throws IOException { + public Ast.DatasetRow next() throws IOException { if (stickyError != null) throw stickyError; if (finished) return null; for (;;) { @@ -129,9 +129,9 @@ public Ast.TableRow next() throws IOException { int start = rowRange[0]; int end = rowRange[1]; byte[] rowBytes = sliceBytes(pending, start, end + 1); - Ast.TableRow row; + Ast.DatasetRow row; try { - row = Parser.parseTableRow(rowBytes, columns.size()); + row = Parser.parseDatasetRow(rowBytes, columns.size()); } catch (PxfException e) { stickyError = e; throw e; @@ -158,7 +158,7 @@ public Ast.TableRow next() throws IOException { * value sets the field. */ public boolean scan(Message.Builder builder) throws IOException { - Ast.TableRow row = next(); + Ast.DatasetRow row = next(); if (row == null) return false; BindRow.bindRow(builder, columns, row); return true; @@ -171,18 +171,18 @@ private void readHeader() throws IOException { int headerEnd = scanHeaderEnd(pending); if (headerEnd >= 0) { // Parse the header prefix as a (rowless) PXF document. - // Parser is happy with an @table directive that has no + // Parser is happy with an @dataset directive that has no // rows yet, and validates everything we care about - // (leading-directive shape, @type/@table conflict, + // (leading-directive shape, @type/@dataset conflict, // dotted columns, etc.). byte[] headerBytes = sliceBytes(pending, 0, headerEnd + 1); Ast.Document doc = Parser.parse(headerBytes); - if (doc.tables().isEmpty()) { - // Should not happen — scanHeaderEnd found an @table — + if (doc.datasets().isEmpty()) { + // Should not happen — scanHeaderEnd found an @dataset — // but defensive. 
- throw new NoSuchElementException("pxf: no @table directive in stream"); + throw new NoSuchElementException("pxf: no @dataset directive in stream"); } - Ast.TableDirective tbl = doc.tables().get(0); + Ast.DatasetDirective tbl = doc.datasets().get(0); this.type = tbl.type(); this.columns = tbl.columns(); this.directives = doc.directives(); @@ -190,12 +190,12 @@ private void readHeader() throws IOException { return; } if (srcEof) { - throw new NoSuchElementException("pxf: no @table directive in stream"); + throw new NoSuchElementException("pxf: no @dataset directive in stream"); } if (pending.length >= DEFAULT_HEADER_MAX_BYTES) { throw new PxfException(Position.UNKNOWN, - "pxf: @table header exceeds " + DEFAULT_HEADER_MAX_BYTES + " bytes; " - + "check that the input begins with `@table TYPE (cols)`"); + "pxf: @dataset header exceeds " + DEFAULT_HEADER_MAX_BYTES + " bytes; " + + "check that the input begins with `@dataset TYPE (cols)`"); } pull(STREAM_PULL_SIZE); } @@ -224,7 +224,7 @@ private void pull(int n) throws IOException { /** * Search {@code input} for the first complete - * {@code @table TYPE ( cols )} directive and return the index of the + * {@code @dataset TYPE ( cols )} directive and return the index of the * {@code )} that closes its column list. Returns -1 if the input * ends before the header is complete (caller should pull more bytes). * Throws {@link PxfException} on malformed string/comment. @@ -232,34 +232,39 @@ private void pull(int n) throws IOException { static int scanHeaderEnd(byte[] input) { int atIdx = findAtTable(input); if (atIdx < 0) return -1; - int lparen = findNextChar(input, atIdx + "@table".length(), '('); + int lparen = findNextChar(input, atIdx + "@dataset".length(), '('); if (lparen < 0) return -1; return findMatchingParen(input, lparen); } /** - * Return the byte offset of the next {@code @table} keyword outside + * Return the byte offset of the next {@code @dataset} keyword outside * strings/comments. 
The match must be followed by a non-identifier - * byte so we don't false-match {@code @tableau}. Returns -1 when not + * byte so we don't false-match {@code @datasetau}. Returns -1 when not * found or when the input ends mid-construct. */ static int findAtTable(byte[] input) { + final byte[] needle = {'@', 'd', 'a', 't', 'a', 's', 'e', 't'}; int i = 0; while (i < input.length) { int j = skipStringOrComment(input, i); if (j == NEED_MORE) return -1; if (j != i) { i = j; continue; } - if (input[i] == '@' && i + 6 <= input.length - && input[i + 1] == 't' && input[i + 2] == 'a' && input[i + 3] == 'b' - && input[i + 4] == 'l' && input[i + 5] == 'e') { - int after = i + 6; - if (after == input.length) { - // `@table` followed by more bytes we haven't seen yet - // — be conservative. - return -1; + if (input[i] == '@' && i + needle.length <= input.length) { + boolean match = true; + for (int k = 1; k < needle.length; k++) { + if (input[i + k] != needle[k]) { match = false; break; } } - if (!isIdentPart(input[after])) { - return i; + if (match) { + int after = i + needle.length; + if (after == input.length) { + // `@dataset` followed by more bytes we haven't seen + // yet — be conservative. + return -1; + } + if (!isIdentPart(input[after])) { + return i; + } } } i++; diff --git a/pxf/src/main/java/org/protowire/pxf/FastDecoder.java b/pxf/src/main/java/org/protowire/pxf/FastDecoder.java index 8f38486..1d5321f 100644 --- a/pxf/src/main/java/org/protowire/pxf/FastDecoder.java +++ b/pxf/src/main/java/org/protowire/pxf/FastDecoder.java @@ -59,7 +59,7 @@ void decode(Message.Builder b) { } // Drain leading directives. @type populates the type binding; - // @ and @table are side-channel — for the v0.72/v0.73 + // @ and @dataset are side-channel — for the v0.72/v0.73 // parser-side port we consume + discard so the body decode path // still works on documents that carry them. Full directive // recording on Result + per-table accessor land in a follow-up. 
@@ -72,7 +72,7 @@ void decode(Message.Builder b) { case AT_TYPE -> { if (sawTable) { throw new PxfException(current.pos(), - "@table directive cannot coexist with @type (draft §3.4.4)"); + "@dataset directive cannot coexist with @type (draft §3.4.4)"); } sawType = true; advance(); @@ -86,22 +86,26 @@ void decode(Message.Builder b) { Ast.Directive dir = consumeNamedDirective(); if (trackPresence) result.addDirective(dir); } - case AT_TABLE -> { + case AT_DATASET -> { if (sawType) { throw new PxfException(current.pos(), - "@table directive cannot coexist with @type (draft §3.4.4)"); + "@dataset directive cannot coexist with @type (draft §3.4.4)"); } if (firstTablePos == null) firstTablePos = current.pos(); sawTable = true; - Ast.TableDirective tbl = consumeTableDirective(); - if (trackPresence) result.addTable(tbl); + Ast.DatasetDirective tbl = consumeDatasetDirective(); + if (trackPresence) result.addDataset(tbl); + } + case AT_PROTO -> { + Ast.ProtoDirective pd = consumeProtoDirective(); + if (trackPresence) result.addProto(pd); } default -> { break directives; } } } if (sawTable && current.kind() != TokenKind.EOF) { throw new PxfException(firstTablePos, - "@table directive cannot coexist with top-level field entries (draft §3.4.4)"); + "@dataset directive cannot coexist with top-level field entries (draft §3.4.4)"); } decodeFields(b, false); if (trackPresence) postDecode(b, ""); @@ -116,6 +120,10 @@ void decode(Message.Builder b) { private Ast.Directive consumeNamedDirective() { Position pp = current.pos(); String name = current.value(); + if (SchemaValidator.FUTURE_RESERVED_DIRECTIVES.contains(name)) { + throw new PxfException(pp, + "@" + name + " is a spec-reserved directive name with no v1 semantics (draft §3.4.6)"); + } advance(); // consume @ // Zero or more prefix identifiers, with one-token lookahead so an @@ -160,29 +168,28 @@ private Ast.Directive consumeNamedDirective() { } /** - * Consume a {@code @table ( cols ) ( vals )...} directive and - * return an 
{@link Ast.TableDirective} record. AT_TABLE is current on + * Consume a {@code @dataset ( cols ) ( vals )...} directive and + * return an {@link Ast.DatasetDirective} record. AT_DATASET is current on * entry. The same parser-tier enforcement applies: row arity, dotted- * column rejection, list/block-cell rejection. */ - private Ast.TableDirective consumeTableDirective() { + private Ast.DatasetDirective consumeDatasetDirective() { Position pp = current.pos(); - advance(); // consume @table + advance(); // consume @dataset - if (current.kind() != TokenKind.IDENT) { - throw new PxfException(current.pos(), - "expected row message type after @table, got " + current.kind()); + String type = ""; + if (current.kind() == TokenKind.IDENT) { + type = current.value(); + advance(); } - String type = current.value(); - advance(); if (current.kind() != TokenKind.LPAREN) { throw new PxfException(current.pos(), - "expected '(' to start @table column list, got " + current.kind()); + "expected '(' to start @dataset column list, got " + current.kind()); } advance(); if (current.kind() != TokenKind.IDENT) { throw new PxfException(current.pos(), - "@table column list must contain at least one field name, got " + current.kind()); + "@dataset column list must contain at least one field name, got " + current.kind()); } java.util.List columns = new java.util.ArrayList<>(); while (true) { @@ -193,28 +200,28 @@ private Ast.TableDirective consumeTableDirective() { String colName = current.value(); if (colName.indexOf('.') >= 0) { throw new PxfException(current.pos(), - "@table column \"" + colName + "\": dotted column paths are not supported in v1 (draft §3.4.4)"); + "@dataset column \"" + colName + "\": dotted column paths are not supported in v1 (draft §3.4.4)"); } columns.add(colName); advance(); if (current.kind() == TokenKind.COMMA) { advance(); continue; } if (current.kind() == TokenKind.RPAREN) break; throw new PxfException(current.pos(), - "expected ',' or ')' in @table column list, 
got " + current.kind()); + "expected ',' or ')' in @dataset column list, got " + current.kind()); } advance(); // consume ) - java.util.List rows = new java.util.ArrayList<>(); + java.util.List rows = new java.util.ArrayList<>(); while (current.kind() == TokenKind.LPAREN) { - rows.add(consumeTableRow(columns.size())); + rows.add(consumeDatasetRow(columns.size())); } - return new Ast.TableDirective(pp, type, + return new Ast.DatasetDirective(pp, type, java.util.List.copyOf(columns), java.util.List.copyOf(rows), java.util.List.of()); } - private Ast.TableRow consumeTableRow(int expected) { + private Ast.DatasetRow consumeDatasetRow(int expected) { Position pp = current.pos(); advance(); // consume ( @@ -226,20 +233,88 @@ private Ast.TableRow consumeTableRow(int expected) { } if (current.kind() != TokenKind.RPAREN) { throw new PxfException(current.pos(), - "expected ',' or ')' in @table row, got " + current.kind()); + "expected ',' or ')' in @dataset row, got " + current.kind()); } advance(); if (cells.size() != expected) { throw new PxfException(pp, - "@table row has " + cells.size() + " cells, expected " + expected + " (column count)"); + "@dataset row has " + cells.size() + " cells, expected " + expected + " (column count)"); } // Cells legitimately contain null entries; List.copyOf rejects nulls. - return new Ast.TableRow(pp, + return new Ast.DatasetRow(pp, java.util.Collections.unmodifiableList(new java.util.ArrayList<>(cells))); } /** - * Consume one cell of a @table row. Returns {@code null} for an empty + * Consume a {@code @proto } directive (draft §3.4.5). AT_PROTO + * is current on entry. Mirrors {@link Parser#parseProtoDirective}: the + * four body shapes (anonymous / named / source / descriptor) are + * distinguished lexically by the next token after {@code @proto}. 
+ */ + private Ast.ProtoDirective consumeProtoDirective() { + Position pp = current.pos(); + advance(); // consume @proto + + switch (current.kind()) { + case LBRACE -> { + byte[] body = captureBraceBody("@proto (anonymous form)"); + return new Ast.ProtoDirective(pp, Ast.ProtoShape.ANONYMOUS, "", body, java.util.List.of()); + } + case IDENT -> { + String typeName = current.value(); + advance(); + if (current.kind() != TokenKind.LBRACE) { + throw new PxfException(current.pos(), + "expected '{' after @proto " + typeName + ", got " + current.kind()); + } + byte[] body = captureBraceBody("@proto " + typeName); + return new Ast.ProtoDirective(pp, Ast.ProtoShape.NAMED, typeName, body, java.util.List.of()); + } + case STRING -> { + byte[] body = current.value().getBytes(java.nio.charset.StandardCharsets.UTF_8); + advance(); + return new Ast.ProtoDirective(pp, Ast.ProtoShape.SOURCE, "", body, java.util.List.of()); + } + case BYTES -> { + String raw = current.value(); + byte[] decoded; + try { + decoded = java.util.Base64.getDecoder().decode(raw); + } catch (IllegalArgumentException e1) { + try { + decoded = java.util.Base64.getUrlDecoder().decode(raw); + } catch (IllegalArgumentException e2) { + throw new PxfException(current.pos(), + "@proto descriptor body: invalid base64: " + e1.getMessage()); + } + } + advance(); + return new Ast.ProtoDirective(pp, Ast.ProtoShape.DESCRIPTOR, "", decoded, java.util.List.of()); + } + default -> throw new PxfException(current.pos(), + "expected '{', dotted identifier, triple-quoted string, or b\"...\" after @proto, got " + current.kind()); + } + } + + /** + * Slice raw bytes between {@code {} and the matching {@code }} (both + * exclusive) without decoding the body as PXF. LBRACE is current on + * entry. Repositions the lexer past the closing brace. 
+ */ + private byte[] captureBraceBody(String label) { + int open = current.pos().offset(); + int close = lex.findMatchingBrace(open); + if (close < 0) { + throw new PxfException(current.pos(), label + ": unmatched '{'"); + } + byte[] body = lex.sliceBytes(open + 1, close); + lex.repositionTo(close + 1); + advance(); + return body; + } + + /** + * Consume one cell of a @dataset row. Returns {@code null} for an empty * cell (no value between commas, or at row start/end). Rejects list * and block values per v1 cell-grammar (draft §3.4.4). */ @@ -247,9 +322,9 @@ private Ast.Value consumeRowCell() { switch (current.kind()) { case COMMA, RPAREN -> { return null; } case LBRACKET -> throw new PxfException(current.pos(), - "@table cells cannot contain list values in v1 (draft §3.4.4)"); + "@dataset cells cannot contain list values in v1 (draft §3.4.4)"); case LBRACE -> throw new PxfException(current.pos(), - "@table cells cannot contain block values in v1 (draft §3.4.4)"); + "@dataset cells cannot contain block values in v1 (draft §3.4.4)"); default -> { /* fall through to value consumer */ } } return consumeAstValue(); @@ -259,7 +334,7 @@ private Ast.Value consumeRowCell() { * Consume one PXF leaf value from the current token stream and * return the matching {@link Ast.Value} record. Mirrors * {@link Parser#parseValue()} for the subset of values that can - * appear in a @table cell (list and block values are rejected at + * appear in a @dataset cell (list and block values are rejected at * {@link #consumeRowCell} before reaching here). * *
<p>
Duplicates the switch arms in Parser.parseValue. The diff --git a/pxf/src/main/java/org/protowire/pxf/Format.java b/pxf/src/main/java/org/protowire/pxf/Format.java index 983a342..86b624a 100644 --- a/pxf/src/main/java/org/protowire/pxf/Format.java +++ b/pxf/src/main/java/org/protowire/pxf/Format.java @@ -19,7 +19,7 @@ public static String formatDocument(Ast.Document doc) { formatDirective(sb, d); sb.append('\n'); } - for (Ast.TableDirective t : doc.tables()) { + for (Ast.DatasetDirective t : doc.datasets()) { formatTableDirective(sb, t); sb.append('\n'); } @@ -39,14 +39,14 @@ private static void formatDirective(StringBuilder sb, Ast.Directive d) { sb.append('\n'); } - private static void formatTableDirective(StringBuilder sb, Ast.TableDirective t) { - sb.append("@table ").append(t.type()).append(" ("); + private static void formatTableDirective(StringBuilder sb, Ast.DatasetDirective t) { + sb.append("@dataset ").append(t.type()).append(" ("); for (int i = 0; i < t.columns().size(); i++) { if (i > 0) sb.append(", "); sb.append(t.columns().get(i)); } sb.append(")\n"); - for (Ast.TableRow row : t.rows()) { + for (Ast.DatasetRow row : t.rows()) { sb.append('('); for (int i = 0; i < row.cells().size(); i++) { if (i > 0) sb.append(", "); diff --git a/pxf/src/main/java/org/protowire/pxf/Lexer.java b/pxf/src/main/java/org/protowire/pxf/Lexer.java index 43ebd63..88acc6e 100644 --- a/pxf/src/main/java/org/protowire/pxf/Lexer.java +++ b/pxf/src/main/java/org/protowire/pxf/Lexer.java @@ -62,6 +62,22 @@ int[] lineColAt(int off) { return new int[] {l, c}; } + /** + * Reposition the lexer to a byte offset, recomputing line/col so + * subsequent error messages stay accurate. Used by Parser to skip + * past an {@code @proto} brace-bounded body whose interior is + * protobuf source (not PXF) without lexing through it. 
+ */ + void repositionTo(int target) { + if (target < 0 || target > input.length) { + throw new IllegalArgumentException("repositionTo: out of bounds " + target); + } + int[] lc = lineColAt(target); + this.pos = target; + this.line = lc[0]; + this.col = lc[1]; + } + /** Slice a raw byte range from the input, copy-on-read. */ byte[] sliceBytes(int from, int to) { if (from < 0 || to > input.length || from > to) { @@ -426,7 +442,8 @@ private Token lexDirective(Position pp) { String name = slice(start, pos); if (name.isEmpty()) return new Token(TokenKind.ILLEGAL, "@", pp); if ("type".equals(name)) return new Token(TokenKind.AT_TYPE, "@type", pp); - if ("table".equals(name)) return new Token(TokenKind.AT_TABLE, "@table", pp); + if ("dataset".equals(name)) return new Token(TokenKind.AT_DATASET, "@dataset", pp); + if ("proto".equals(name)) return new Token(TokenKind.AT_PROTO, "@proto", pp); return new Token(TokenKind.AT_DIRECTIVE, name, pp); } diff --git a/pxf/src/main/java/org/protowire/pxf/Parser.java b/pxf/src/main/java/org/protowire/pxf/Parser.java index 9b0ce02..9c663b9 100644 --- a/pxf/src/main/java/org/protowire/pxf/Parser.java +++ b/pxf/src/main/java/org/protowire/pxf/Parser.java @@ -31,16 +31,16 @@ public static Ast.Document parse(String input) { } /** - * Parse a single {@code ( cell, cell, ... )} tuple as a {@code @table} - * row. Used by {@link TableReader} to decode each row's byte slice + * Parse a single {@code ( cell, cell, ... )} tuple as a {@code @dataset} + * row. Used by {@link DatasetReader} to decode each row's byte slice * without re-running the full document grammar. {@code input} MUST * start with {@code (} and contain a balanced row tuple. 
* * @param input row bytes including the surrounding parens * @param expected expected cell count (column arity) */ - static Ast.TableRow parseTableRow(byte[] input, int expected) { - return new Parser(input).parseTableRow(expected); + static Ast.DatasetRow parseDatasetRow(byte[] input, int expected) { + return new Parser(input).parseDatasetRow(expected); } private void advance() { @@ -66,11 +66,12 @@ private Ast.Document parseDocument() { List leading = flushComments(); String typeUrl = ""; List directives = new ArrayList<>(); - List tables = new ArrayList<>(); + List datasets = new ArrayList<>(); + List protos = new ArrayList<>(); - // Top-of-document directives. @type, @, and @table may - // interleave in any order; @type populates typeUrl, others append - // to their respective lists. + // Top-of-document directives. @type, @, @dataset, and @proto + // may interleave in any order; @type populates typeUrl, others + // append to their respective lists. directives: while (true) { switch (current.kind()) { @@ -84,7 +85,8 @@ private Ast.Document parseDocument() { advance(); } case AT_DIRECTIVE -> directives.add(parseDirective()); - case AT_TABLE -> tables.add(parseTableDirective()); + case AT_DATASET -> datasets.add(parseDatasetDirective()); + case AT_PROTO -> protos.add(parseProtoDirective()); default -> { break directives; } @@ -92,16 +94,16 @@ private Ast.Document parseDocument() { } // Standalone constraint (draft §3.4.4): a document containing any - // @table directive MUST NOT also carry @type or top-level field - // entries — the @table header IS the document's type declaration. - if (!tables.isEmpty()) { + // @dataset directive MUST NOT also carry @type or top-level field + // entries — the @dataset header IS the document's type declaration. 
+ if (!datasets.isEmpty()) { if (!typeUrl.isEmpty()) { - throw new PxfException(tables.get(0).pos(), - "@table directive cannot coexist with @type; the @table header declares the document's type (draft §3.4.4)"); + throw new PxfException(datasets.get(0).pos(), + "@dataset directive cannot coexist with @type; the @dataset header declares the document's type (draft §3.4.4)"); } if (current.kind() != TokenKind.EOF) { throw new PxfException(current.pos(), - "@table directive cannot coexist with top-level field entries; the document's payload is the @table rows (draft §3.4.4)"); + "@dataset directive cannot coexist with top-level field entries; the document's payload is the @dataset rows (draft §3.4.4)"); } } @@ -112,8 +114,8 @@ private Ast.Document parseDocument() { // reserved for the inside of a `{ ... }` block. entries.add(parseEntry(false)); } - return new Ast.Document(typeUrl, List.copyOf(directives), List.copyOf(tables), - List.copyOf(entries), leading); + return new Ast.Document(typeUrl, List.copyOf(directives), List.copyOf(datasets), + List.copyOf(protos), List.copyOf(entries), leading); } /** @@ -130,6 +132,10 @@ private Ast.Directive parseDirective() { List leading = flushComments(); Position pp = current.pos(); String name = current.value(); + if (SchemaValidator.FUTURE_RESERVED_DIRECTIVES.contains(name)) { + throw new PxfException(pp, + "@" + name + " is a spec-reserved directive name with no v1 semantics (draft §3.4.6)"); + } advance(); // consume AT_DIRECTIVE List prefixes = new ArrayList<>(); @@ -161,30 +167,33 @@ private Ast.Directive parseDirective() { } /** - * Parse a {@code @table ( col1, col2, ... ) row*} directive. - * AT_TABLE is current on entry (draft §3.4.4). + * Parse a {@code @dataset ( col1, col2, ... ) row*} directive. + * AT_DATASET is current on entry (draft §3.4.4). + * + *
<p>
The row message type MAY be omitted when an anonymous + * {@code @proto} directive precedes the dataset (draft §3.4.4 + * Anonymous binding). */ - private Ast.TableDirective parseTableDirective() { + private Ast.DatasetDirective parseDatasetDirective() { List leading = flushComments(); Position pp = current.pos(); - advance(); // consume @table + advance(); // consume @dataset - if (current.kind() != TokenKind.IDENT) { - throw new PxfException(current.pos(), - "expected row message type after @table, got " + current.kind()); + String type = ""; + if (current.kind() == TokenKind.IDENT) { + type = current.value(); + advance(); } - String type = current.value(); - advance(); if (current.kind() != TokenKind.LPAREN) { throw new PxfException(current.pos(), - "expected '(' to start @table column list, got " + current.kind()); + "expected '(' to start @dataset column list, got " + current.kind()); } advance(); if (current.kind() != TokenKind.IDENT) { throw new PxfException(current.pos(), - "@table column list must contain at least one field name, got " + current.kind()); + "@dataset column list must contain at least one field name, got " + current.kind()); } List columns = new ArrayList<>(); while (true) { @@ -197,7 +206,7 @@ private Ast.TableDirective parseTableDirective() { // reserved for a future revision. if (colName.indexOf('.') >= 0) { throw new PxfException(current.pos(), - "@table column \"" + colName + "\": dotted column paths are not supported in v1 (draft §3.4.4)"); + "@dataset column \"" + colName + "\": dotted column paths are not supported in v1 (draft §3.4.4)"); } columns.add(colName); advance(); @@ -207,19 +216,19 @@ private Ast.TableDirective parseTableDirective() { } if (current.kind() == TokenKind.RPAREN) break; throw new PxfException(current.pos(), - "expected ',' or ')' in @table column list, got " + current.kind()); + "expected ',' or ')' in @dataset column list, got " + current.kind()); } advance(); // consume ) // Zero or more rows. 
- List<Ast.TableRow> rows = new ArrayList<>(); + List<Ast.DatasetRow> rows = new ArrayList<>(); while (current.kind() == TokenKind.LPAREN) { - rows.add(parseTableRow(columns.size())); + rows.add(parseDatasetRow(columns.size())); } - return new Ast.TableDirective(pp, type, List.copyOf(columns), List.copyOf(rows), leading); + return new Ast.DatasetDirective(pp, type, List.copyOf(columns), List.copyOf(rows), leading); } - private Ast.TableRow parseTableRow(int expected) { + private Ast.DatasetRow parseDatasetRow(int expected) { Position pp = current.pos(); advance(); // consume ( @@ -231,20 +240,94 @@ private Ast.TableRow parseTableRow(int expected) { } if (current.kind() != TokenKind.RPAREN) { throw new PxfException(current.pos(), - "expected ',' or ')' in @table row, got " + current.kind()); + "expected ',' or ')' in @dataset row, got " + current.kind()); } advance(); if (cells.size() != expected) { throw new PxfException(pp, - "@table row has " + cells.size() + " cells, expected " + expected + " (column count)"); + "@dataset row has " + cells.size() + " cells, expected " + expected + " (column count)"); } // Row cells legitimately contain null for empty cells. List.copyOf // rejects nulls, so wrap an ArrayList copy via Collections instead. - return new Ast.TableRow(pp, java.util.Collections.unmodifiableList(new ArrayList<>(cells))); + return new Ast.DatasetRow(pp, java.util.Collections.unmodifiableList(new ArrayList<>(cells))); + } + + /** + * Parse a {@code @proto } directive (draft §3.4.5). AT_PROTO is + * current on entry. Four body shapes are lexically distinguished: + * anonymous ({@code { ... }}), named ({@code pkg.Type { ... }}), + * source-form ({@code """..."""}) and descriptor ({@code b"..."}). + * + *
<p>
For the brace-bounded shapes the body is captured as raw bytes + * between {@code {} and the matching {@code }} (both exclusive); the + * contents are protobuf source and are NOT decoded as PXF entries. + */ + private Ast.ProtoDirective parseProtoDirective() { + List leading = flushComments(); + Position pp = current.pos(); + advance(); // consume @proto + + switch (current.kind()) { + case LBRACE -> { + byte[] body = captureBraceBody("@proto (anonymous form)"); + return new Ast.ProtoDirective(pp, Ast.ProtoShape.ANONYMOUS, "", body, leading); + } + case IDENT -> { + String typeName = current.value(); + advance(); + if (current.kind() != TokenKind.LBRACE) { + throw new PxfException(current.pos(), + "expected '{' after @proto " + typeName + ", got " + current.kind()); + } + byte[] body = captureBraceBody("@proto " + typeName); + return new Ast.ProtoDirective(pp, Ast.ProtoShape.NAMED, typeName, body, leading); + } + case STRING -> { + byte[] body = current.value().getBytes(java.nio.charset.StandardCharsets.UTF_8); + advance(); + return new Ast.ProtoDirective(pp, Ast.ProtoShape.SOURCE, "", body, leading); + } + case BYTES -> { + String raw = current.value(); + byte[] decoded; + try { + decoded = java.util.Base64.getDecoder().decode(raw); + } catch (IllegalArgumentException e1) { + try { + decoded = java.util.Base64.getUrlDecoder().decode(raw); + } catch (IllegalArgumentException e2) { + throw new PxfException(current.pos(), + "@proto descriptor body: invalid base64: " + e1.getMessage()); + } + } + advance(); + return new Ast.ProtoDirective(pp, Ast.ProtoShape.DESCRIPTOR, "", decoded, leading); + } + default -> throw new PxfException(current.pos(), + "expected '{', dotted identifier, triple-quoted string, or b\"...\" after @proto, got " + current.kind()); + } + } + + /** + * Slice the raw bytes between {@code {} and the matching {@code }} + * (both exclusive) without decoding the contents as PXF entries. + * LBRACE is current on entry. 
Repositions the lexer past the closing + * {@code }} and primes the parser to that next token. + */ + private byte[] captureBraceBody(String label) { + int open = current.pos().offset(); + int close = lex.findMatchingBrace(open); + if (close < 0) { + throw new PxfException(current.pos(), label + ": unmatched '{'"); + } + byte[] body = lex.sliceBytes(open + 1, close); + lex.repositionTo(close + 1); + advance(); // prime current token past `}` + return body; } /** - * Consume one cell of a @table row. Returns {@code null} for an empty + * Consume one cell of a @dataset row. Returns {@code null} for an empty * cell (no value between commas, or at row start/end). Rejects list * and block values per v1 cell-grammar (draft §3.4.4). */ @@ -254,9 +337,9 @@ private Ast.Value parseRowCell() { return null; } case LBRACKET -> throw new PxfException(current.pos(), - "@table cells cannot contain list values in v1 (draft §3.4.4)"); + "@dataset cells cannot contain list values in v1 (draft §3.4.4)"); case LBRACE -> throw new PxfException(current.pos(), - "@table cells cannot contain block values in v1 (draft §3.4.4)"); + "@dataset cells cannot contain block values in v1 (draft §3.4.4)"); default -> { /* fall through */ } } return parseValue(); diff --git a/pxf/src/main/java/org/protowire/pxf/Result.java b/pxf/src/main/java/org/protowire/pxf/Result.java index 9e68a1e..3c2d0fa 100644 --- a/pxf/src/main/java/org/protowire/pxf/Result.java +++ b/pxf/src/main/java/org/protowire/pxf/Result.java @@ -11,14 +11,15 @@ * Field-presence metadata + side-channel directives produced by * {@link Pxf#unmarshalFull}. Tracks set, null, and absent fields by * dotted path (e.g. {@code "name"}, {@code "nested.value"}), plus the - * {@code @} directives and {@code @table} directives the decoder + * {@code @} directives and {@code @dataset} directives the decoder * saw at the document root. 
*/ public final class Result { private final Set nullFields = new HashSet<>(); private final Set presentFields = new HashSet<>(); private final List directives = new ArrayList<>(); - private final List tables = new ArrayList<>(); + private final List datasets = new ArrayList<>(); + private final List protos = new ArrayList<>(); Result() {} @@ -32,7 +33,8 @@ void markPresent(String path) { } void addDirective(Ast.Directive d) { directives.add(d); } - void addTable(Ast.TableDirective t) { tables.add(t); } + void addDataset(Ast.DatasetDirective t) { datasets.add(t); } + void addProto(Ast.ProtoDirective p) { protos.add(p); } public boolean isNull(String path) { return nullFields.contains(path); } public boolean isAbsent(String path) { return !presentFields.contains(path); } @@ -43,7 +45,7 @@ void markPresent(String path) { /** * Returns the {@code @ *(prefix) [{ ... }]} directives the decoder * saw at the document root, in source order. Excludes the {@code @type} - * and {@code @table} directives (which have their own accessors). + * and {@code @dataset} directives (which have their own accessors). * Callers typically iterate and hand each {@link Ast.Directive#body()} * back to {@link Pxf#unmarshalFull} against a chosen message — * chameleon's {@code @header} consumption pattern. @@ -51,14 +53,23 @@ void markPresent(String path) { public List directives() { return List.copyOf(directives); } /** - * Returns the {@code @table} directives the decoder saw at the + * Returns the {@code @dataset} directives the decoder saw at the * document root, in source order. Per draft §3.4.4 a document with * any table MUST NOT carry {@code @type} or top-level body entries * — the parser and decoder enforce that. 
Each - * {@link Ast.TableDirective#rows()} entry is one cell-tuple; cells + * {@link Ast.DatasetDirective#rows()} entry is one cell-tuple; cells * may be {@code null} (empty cell ⇒ field absent), {@link Ast.NullVal} * (explicit null ⇒ field cleared), or any other {@link Ast.Value} * (field set). */ - public List tables() { return List.copyOf(tables); } + public List datasets() { return List.copyOf(datasets); } + + /** + * Returns the {@code @proto} directives the decoder saw at the + * document root, in source order (draft §3.4.5). Each directive + * carries one of four body shapes (anonymous, named, source, + * descriptor); callers inspect {@link Ast.ProtoDirective#shape()} + * and decode {@link Ast.ProtoDirective#body()} accordingly. + */ + public List protos() { return List.copyOf(protos); } } diff --git a/pxf/src/main/java/org/protowire/pxf/SchemaValidator.java b/pxf/src/main/java/org/protowire/pxf/SchemaValidator.java index b026f1a..69f3bcf 100644 --- a/pxf/src/main/java/org/protowire/pxf/SchemaValidator.java +++ b/pxf/src/main/java/org/protowire/pxf/SchemaValidator.java @@ -40,9 +40,30 @@ private SchemaValidator() {} * Reserved-name set per draft §3.13. Case-sensitive — {@code NULL}, * {@code True}, {@code FALSE} lex as ordinary identifiers and are * accepted. + * + *
<p>
The reserved-directive-name set (13 names; draft §3.4.6) is + * separate from this schema-element constraint; its future-allocated + * subset lives in {@link #FUTURE_RESERVED_DIRECTIVES}. Schema-element + * name collisions with directive names are not problematic because + * field names and directive names live in disjoint lexical contexts. */ static final Set<String> RESERVED_NAMES = Set.of("null", "true", "false"); + /** + * Directive names the spec reserves for future allocation (draft + * §3.4.6). v1 decoders MUST reject these as unknown reserved + * directives so applications cannot squat the names before the spec + * allocates semantics to them. + * + *
<p>
The names with their own production ({@code type}, + * {@code dataset}, {@code proto}) don't appear here — they're + * handled directly by the lexer. The spec-registered {@code entry} + * doesn't appear either — it's a valid named-directive with + * documented shape (draft §3.4.3). + */ + public static final Set FUTURE_RESERVED_DIRECTIVES = Set.of( + "table", "datasource", "view", "procedure", "function", "permissions"); + /** Which kind of schema element a {@link Violation} refers to. */ public enum Kind { FIELD("message field"), diff --git a/pxf/src/main/java/org/protowire/pxf/TokenKind.java b/pxf/src/main/java/org/protowire/pxf/TokenKind.java index 465d8b6..e375bef 100644 --- a/pxf/src/main/java/org/protowire/pxf/TokenKind.java +++ b/pxf/src/main/java/org/protowire/pxf/TokenKind.java @@ -22,15 +22,16 @@ public enum TokenKind { RBRACE("}"), LBRACKET("["), RBRACKET("]"), - LPAREN("("), // @table column list and row tuples + LPAREN("("), // @dataset column list and row tuples RPAREN(")"), EQUALS("="), COLON(":"), COMMA(","), AT_TYPE("@type"), - AT_TABLE("@table"), // bulk-row directive (draft §3.4.4) - AT_DIRECTIVE("@directive"); // @ for name != "type"/"table" + AT_DATASET("@dataset"), // row-oriented bulk-data directive (draft §3.4.4) + AT_PROTO("@proto"), // embedded protobuf schema (draft §3.4.5) + AT_DIRECTIVE("@directive"); // @ for any non-reserved name private final String display; diff --git a/pxf/src/test/java/org/protowire/pxf/TableParserTest.java b/pxf/src/test/java/org/protowire/pxf/DatasetParserTest.java similarity index 75% rename from pxf/src/test/java/org/protowire/pxf/TableParserTest.java rename to pxf/src/test/java/org/protowire/pxf/DatasetParserTest.java index 1623d14..d0b94da 100644 --- a/pxf/src/test/java/org/protowire/pxf/TableParserTest.java +++ b/pxf/src/test/java/org/protowire/pxf/DatasetParserTest.java @@ -13,20 +13,20 @@ import static org.junit.jupiter.api.Assertions.assertTrue; /** - * Parser-side tests for the {@code @table} directive 
(draft §3.4.4). + * Parser-side tests for the {@code @dataset} directive (draft §3.4.4). * Mirrors the Go-port tests in encoding/pxf/table_test.go. */ -class TableParserTest { +class DatasetParserTest { @Test void basicTable() { var doc = Parser.parse(""" - @table trades.v1.Trade (symbol, price, qty) + @dataset trades.v1.Trade (symbol, price, qty) ("AAPL", 192.34, 100) ("MSFT", 410.10, 50) """); - assertEquals(1, doc.tables().size()); - var t = doc.tables().get(0); + assertEquals(1, doc.datasets().size()); + var t = doc.datasets().get(0); assertEquals("trades.v1.Trade", t.type()); assertEquals(List.of("symbol", "price", "qty"), t.columns()); assertEquals(2, t.rows().size()); @@ -35,10 +35,10 @@ void basicTable() { @Test void emptyTable() { - var doc = Parser.parse("@table trades.v1.Trade (symbol, price)"); - assertEquals(1, doc.tables().size()); - assertEquals(List.of("symbol", "price"), doc.tables().get(0).columns()); - assertEquals(0, doc.tables().get(0).rows().size()); + var doc = Parser.parse("@dataset trades.v1.Trade (symbol, price)"); + assertEquals(1, doc.datasets().size()); + assertEquals(List.of("symbol", "price"), doc.datasets().get(0).columns()); + assertEquals(0, doc.datasets().get(0).rows().size()); } // --- Three cell states --- @@ -46,12 +46,12 @@ void emptyTable() { @Test void threeCellStates() { var doc = Parser.parse(""" - @table trades.v1.Trade (symbol, price, qty) + @dataset trades.v1.Trade (symbol, price, qty) ("AAPL", 192.34, 100) ("MSFT", null, 50) ("GOOG", , ) """); - var rows = doc.tables().get(0).rows(); + var rows = doc.datasets().get(0).rows(); // Row 1: all present. for (Ast.Value c : rows.get(0).cells()) assertNotNull(c); // Row 2: middle is *NullVal. 
@@ -67,10 +67,10 @@ void threeCellStates() { @Test void leadingEmptyCell() { var doc = Parser.parse(""" - @table T (a, b) + @dataset T (a, b) ( , 192.34) """); - var row = doc.tables().get(0).rows().get(0); + var row = doc.datasets().get(0).rows().get(0); assertNull(row.cells().get(0)); assertNotNull(row.cells().get(1)); } @@ -78,10 +78,10 @@ void leadingEmptyCell() { @Test void allEmptyRow() { var doc = Parser.parse(""" - @table T (a, b, c) + @dataset T (a, b, c) (,,) """); - var row = doc.tables().get(0).rows().get(0); + var row = doc.datasets().get(0).rows().get(0); assertEquals(3, row.cells().size()); for (Ast.Value c : row.cells()) assertNull(c); } @@ -91,7 +91,7 @@ void allEmptyRow() { @Test void arityShortRejected() { var ex = assertThrows(PxfException.class, () -> Parser.parse(""" - @table T (symbol, price, qty) + @dataset T (symbol, price, qty) ("AAPL", 1.0) """)); assertTrue(ex.getMessage().contains("2 cells, expected 3")); @@ -100,7 +100,7 @@ void arityShortRejected() { @Test void arityLongRejected() { var ex = assertThrows(PxfException.class, () -> Parser.parse(""" - @table T (symbol, price) + @dataset T (symbol, price) ("AAPL", 1.0, 100) """)); assertTrue(ex.getMessage().contains("3 cells, expected 2")); @@ -111,7 +111,7 @@ void arityLongRejected() { @Test void listCellRejected() { var ex = assertThrows(PxfException.class, () -> Parser.parse(""" - @table T (symbol, tags) + @dataset T (symbol, tags) ("AAPL", ["tech", "blue-chip"]) """)); assertTrue(ex.getMessage().contains("list values")); @@ -120,7 +120,7 @@ void listCellRejected() { @Test void blockCellRejected() { var ex = assertThrows(PxfException.class, () -> Parser.parse(""" - @table T (symbol, meta) + @dataset T (symbol, meta) ("AAPL", { exchange = "NASDAQ" }) """)); assertTrue(ex.getMessage().contains("block values")); @@ -131,7 +131,7 @@ void blockCellRejected() { @Test void dottedColumnRejected() { var ex = assertThrows(PxfException.class, () -> Parser.parse(""" - @table T (symbol, 
meta.exchange) + @dataset T (symbol, meta.exchange) ("AAPL", "NASDAQ") """)); assertTrue(ex.getMessage().contains("dotted column paths")); @@ -139,7 +139,7 @@ void dottedColumnRejected() { @Test void emptyColumnListRejected() { - var ex = assertThrows(PxfException.class, () -> Parser.parse("@table T ()")); + var ex = assertThrows(PxfException.class, () -> Parser.parse("@dataset T ()")); assertTrue(ex.getMessage().contains("at least one field name")); } @@ -149,7 +149,7 @@ void emptyColumnListRejected() { void atTypeWithAtTableRejected() { var ex = assertThrows(PxfException.class, () -> Parser.parse(""" @type trades.v1.Wrapper - @table trades.v1.Trade (symbol) + @dataset trades.v1.Trade (symbol) ("AAPL") """)); assertTrue(ex.getMessage().contains("@type")); @@ -158,7 +158,7 @@ void atTypeWithAtTableRejected() { @Test void atTableWithBodyEntriesRejected() { var ex = assertThrows(PxfException.class, () -> Parser.parse(""" - @table trades.v1.Trade (symbol) + @dataset trades.v1.Trade (symbol) ("AAPL") extra = "stray" """)); @@ -170,17 +170,17 @@ void atTableWithBodyEntriesRejected() { @Test void multipleTablesOrderPreserved() { var doc = Parser.parse(""" - @table events.v1.Created (id) + @dataset events.v1.Created (id) ("e-1") ("e-2") - @table events.v1.Deleted (id) + @dataset events.v1.Deleted (id) ("e-9") """); - assertEquals(2, doc.tables().size()); - assertEquals("events.v1.Created", doc.tables().get(0).type()); - assertEquals("events.v1.Deleted", doc.tables().get(1).type()); - assertEquals(2, doc.tables().get(0).rows().size()); - assertEquals(1, doc.tables().get(1).rows().size()); + assertEquals(2, doc.datasets().size()); + assertEquals("events.v1.Created", doc.datasets().get(0).type()); + assertEquals("events.v1.Deleted", doc.datasets().get(1).type()); + assertEquals(2, doc.datasets().get(0).rows().size()); + assertEquals(1, doc.datasets().get(1).rows().size()); } // --- Cell variants (smoke check that timestamp, duration, bytes, etc. 
land correctly) --- @@ -188,10 +188,10 @@ void multipleTablesOrderPreserved() { @Test void cellVariants() { var doc = Parser.parse(""" - @table t.T (s, i, f, b, by, ts, d, e, n) + @dataset t.T (s, i, f, b, by, ts, d, e, n) ("hi", 42, 3.14, true, b"aGVsbG8=", 2026-05-12T10:00:00Z, 1h30m, ENUM_VAL, null) """); - var cells = doc.tables().get(0).rows().get(0).cells(); + var cells = doc.datasets().get(0).rows().get(0).cells(); assertTrue(cells.get(0) instanceof Ast.StringVal); assertTrue(cells.get(1) instanceof Ast.IntVal); assertTrue(cells.get(2) instanceof Ast.FloatVal); diff --git a/pxf/src/test/java/org/protowire/pxf/TableReaderTest.java b/pxf/src/test/java/org/protowire/pxf/DatasetReaderTest.java similarity index 82% rename from pxf/src/test/java/org/protowire/pxf/TableReaderTest.java rename to pxf/src/test/java/org/protowire/pxf/DatasetReaderTest.java index 7479358..acdd9ce 100644 --- a/pxf/src/test/java/org/protowire/pxf/TableReaderTest.java +++ b/pxf/src/test/java/org/protowire/pxf/DatasetReaderTest.java @@ -24,10 +24,10 @@ import static org.junit.jupiter.api.Assertions.fail; /** - * Streaming {@code @table} consumption tests. Mirrors protowire-go's + * Streaming {@code @dataset} consumption tests. Mirrors protowire-go's * table_stream_test.go. 
*/ -class TableReaderTest { +class DatasetReaderTest { private static InputStream s(String in) { return new ByteArrayInputStream(in.getBytes(StandardCharsets.UTF_8)); @@ -37,8 +37,8 @@ private static InputStream s(String in) { @Test void basicStreaming() throws IOException { - var tr = new TableReader(s(""" - @table trades.v1.Trade (symbol, price, qty) + var tr = new DatasetReader(s(""" + @dataset trades.v1.Trade (symbol, price, qty) ("AAPL", 192.34, 100) ("MSFT", 410.10, 50) ("GOOG", 142.00, 25)""")); @@ -46,8 +46,8 @@ void basicStreaming() throws IOException { assertEquals(List.of("symbol", "price", "qty"), tr.columns()); assertEquals(List.of(), tr.directives()); - List rows = new ArrayList<>(); - for (Ast.TableRow r; (r = tr.next()) != null; ) rows.add(r); + List rows = new ArrayList<>(); + for (Ast.DatasetRow r; (r = tr.next()) != null; ) rows.add(r); assertEquals(3, rows.size()); var sv = (Ast.StringVal) rows.get(0).cells().get(0); assertEquals("AAPL", sv.value()); @@ -55,7 +55,7 @@ void basicStreaming() throws IOException { @Test void emptyTableReturnsNullImmediately() throws IOException { - var tr = new TableReader(s("@table trades.v1.Trade (symbol, price)")); + var tr = new DatasetReader(s("@dataset trades.v1.Trade (symbol, price)")); assertNull(tr.next()); assertNull(tr.next()); // sticky } @@ -64,8 +64,8 @@ void emptyTableReturnsNullImmediately() throws IOException { @Test void cellStates() throws IOException { - var tr = new TableReader(s(""" - @table t.T (a, b, c) + var tr = new DatasetReader(s(""" + @dataset t.T (a, b, c) ("x", 1, true) (null, , 3) (, "y", null)""")); @@ -93,9 +93,9 @@ void cellStates() throws IOException { @Test void sideChannelDirectivesBeforeHeader() throws IOException { - var tr = new TableReader(s(""" + var tr = new DatasetReader(s(""" @header meta.v1.H { generated_at = 2026-05-12T10:00:00Z } - @table trades.v1.Trade (symbol) + @dataset trades.v1.Trade (symbol) ("AAPL") ("MSFT")""")); @@ -113,32 +113,32 @@ void 
sideChannelDirectivesBeforeHeader() throws IOException { @Test void rejectsAtTypeWithAtTable() { var ex = assertThrows(PxfException.class, () -> - new TableReader(s(""" + new DatasetReader(s(""" @type some.Other - @table trades.v1.Trade (symbol) + @dataset trades.v1.Trade (symbol) ("AAPL")"""))); assertTrue(ex.getMessage().contains("@type")); } - // --- No @table --- + // --- No @dataset --- @Test void noTableInStream() { assertThrows(NoSuchElementException.class, () -> - new TableReader(s("string_field = \"x\""))); + new DatasetReader(s("string_field = \"x\""))); } @Test void emptyInput() { - assertThrows(NoSuchElementException.class, () -> new TableReader(s(""))); + assertThrows(NoSuchElementException.class, () -> new DatasetReader(s(""))); } // --- Errors mid-stream are sticky --- @Test void errorsAreSticky() throws IOException { - var tr = new TableReader(s(""" - @table T (a, b, c) + var tr = new DatasetReader(s(""" + @dataset T (a, b, c) ("x", 1, 2) ("y", 1)""")); // arity mismatch assertNotNull(tr.next()); @@ -150,8 +150,8 @@ void errorsAreSticky() throws IOException { @Test void rejectsListCellMidStream() throws IOException { - var tr = new TableReader(s(""" - @table T (a, b) + var tr = new DatasetReader(s(""" + @dataset T (a, b) ("ok", 1) ("bad", [1, 2])""")); assertNotNull(tr.next()); @@ -163,8 +163,8 @@ void rejectsListCellMidStream() throws IOException { @Test void stringWithParens() throws IOException { - var tr = new TableReader(s(""" - @table T (note, n) + var tr = new DatasetReader(s(""" + @dataset T (note, n) ("contains (paren) inside", 1) ("normal", 2)""")); var r1 = tr.next(); @@ -175,8 +175,8 @@ void stringWithParens() throws IOException { @Test void blockCommentBetweenRows() throws IOException { - var tr = new TableReader(s(""" - @table T (a) + var tr = new DatasetReader(s(""" + @dataset T (a) ("x") /* this comment ) has ( parens spanning multiple lines */ @@ -188,8 +188,8 @@ void blockCommentBetweenRows() throws IOException { @Test void 
lineCommentBetweenRows() throws IOException { - var tr = new TableReader(s(""" - @table T (a) + var tr = new DatasetReader(s(""" + @dataset T (a) ("x") # this is a comment, with ( a paren ) inside ("y") @@ -223,8 +223,8 @@ public int read(byte[] b, int off, int len) { @Test void handlesByteAtATimeReader() throws IOException { - var tr = new TableReader(new ChunkedStream(""" - @table T (a, b, c) + var tr = new DatasetReader(new ChunkedStream(""" + @dataset T (a, b, c) ("hello", 42, true) ("world", 99, false) ("end", 0, null)""".getBytes(StandardCharsets.UTF_8))); @@ -237,18 +237,18 @@ void handlesByteAtATimeReader() throws IOException { @Test void multipleTablesViaTail() throws IOException { - var tr1 = new TableReader(s(""" - @table events.v1.Created (id, ts) + var tr1 = new DatasetReader(s(""" + @dataset events.v1.Created (id, ts) ("e-1", 2026-05-12T10:00:00Z) ("e-2", 2026-05-12T10:00:01Z) - @table events.v1.Deleted (id, ts) + @dataset events.v1.Deleted (id, ts) ("e-9", 2026-05-12T10:00:02Z)""")); assertEquals("events.v1.Created", tr1.type()); int c1 = 0; while (tr1.next() != null) c1++; assertEquals(2, c1); - var tr2 = new TableReader(tr1.tail()); + var tr2 = new DatasetReader(tr1.tail()); assertEquals("events.v1.Deleted", tr2.type()); int c2 = 0; while (tr2.next() != null) c2++; @@ -260,20 +260,20 @@ void multipleTablesViaTail() throws IOException { @Test void equivalentToMaterializingPath() throws IOException { String in = """ - @table t.T (a, b, c) + @dataset t.T (a, b, c) ("alpha", 1, true) ("beta", null, false) (, , ) ("gamma", 99, true)"""; // Materializing. var doc = Parser.parse(in); - assertEquals(1, doc.tables().size()); - var mat = doc.tables().get(0).rows(); + assertEquals(1, doc.datasets().size()); + var mat = doc.datasets().get(0).rows(); // Streaming. 
- var tr = new TableReader(s(in)); - List<Ast.TableRow> stream = new ArrayList<>(); - for (Ast.TableRow r; (r = tr.next()) != null; ) stream.add(r); + var tr = new DatasetReader(s(in)); + List<Ast.DatasetRow> stream = new ArrayList<>(); + for (Ast.DatasetRow r; (r = tr.next()) != null; ) stream.add(r); assertEquals(mat.size(), stream.size()); for (int i = 0; i < mat.size(); i++) { @@ -297,7 +297,7 @@ void rejectsOversizedHeader() { // 70 KiB identifier > 64 KiB cap. String long_ = "a".repeat(70 * 1024); var ex = assertThrows(PxfException.class, () -> - new TableReader(s("@table " + long_ + ".T (col)\n(1)"))); + new DatasetReader(s("@dataset " + long_ + ".T (col)\n(1)"))); assertTrue(ex.getMessage().contains("header exceeds")); } @@ -305,8 +305,8 @@ void rejectsOversizedHeader() { @Test void scanHappyPath() throws IOException { - var tr = new TableReader(s(""" - @table test.v1.AllTypes (string_field, int32_field, bool_field, enum_field) + var tr = new DatasetReader(s(""" + @dataset test.v1.AllTypes (string_field, int32_field, bool_field, enum_field) ("alpha", 1, true, STATUS_ACTIVE) ("beta", 2, false, STATUS_INACTIVE) ("gamma", 3, true, STATUS_UNSPECIFIED)""")); @@ -322,8 +322,8 @@ void scanHappyPath() throws IOException { @Test void scanReturnsFalseOnEof() throws IOException { - var tr = new TableReader(s(""" - @table test.v1.AllTypes (string_field) + var tr = new DatasetReader(s(""" + @dataset test.v1.AllTypes (string_field) ("x")""")); var b1 = DynamicMessage.newBuilder(AllTypes.getDescriptor()); assertTrue(tr.scan(b1)); @@ -333,8 +333,8 @@ void scanReturnsFalseOnEof() throws IOException { @Test void scanEmptyCellLeavesFieldUnset() throws IOException { - var tr = new TableReader(s(""" - @table test.v1.AllTypes (string_field, int32_field) + var tr = new DatasetReader(s(""" + @dataset test.v1.AllTypes (string_field, int32_field) ("present", 7) (, 99) ("set", )""")); @@ -359,8 +359,8 @@ void scanEmptyCellLeavesFieldUnset() throws IOException { @Test void scanNullOnWrapperClears() throws
IOException { - var tr = new TableReader(s(""" - @table test.v1.AllTypes (string_field, nullable_int) + var tr = new DatasetReader(s(""" + @dataset test.v1.AllTypes (string_field, nullable_int) ("with-value", 42) ("nullified", null)""")); var nullableIntFd = AllTypes.getDescriptor().findFieldByName("nullable_int"); @@ -376,8 +376,8 @@ void scanNullOnWrapperClears() throws IOException { @Test void scanWellKnownTimestamp() throws IOException { - var tr = new TableReader(s(""" - @table test.v1.AllTypes (string_field, ts_field) + var tr = new DatasetReader(s(""" + @dataset test.v1.AllTypes (string_field, ts_field) ("first", 2026-05-12T10:30:00Z)""")); var tsFd = AllTypes.getDescriptor().findFieldByName("ts_field"); @@ -394,10 +394,10 @@ void scanWellKnownTimestamp() throws IOException { @Test void bindRowAgainstMaterializingPath() { var doc = Parser.parse(""" - @table test.v1.AllTypes (string_field, int32_field) + @dataset test.v1.AllTypes (string_field, int32_field) ("alpha", 1) ("beta", 2)"""); - var tbl = doc.tables().get(0); + var tbl = doc.datasets().get(0); for (int i = 0; i < tbl.rows().size(); i++) { var b = DynamicMessage.newBuilder(AllTypes.getDescriptor()); BindRow.bindRow(b, tbl.columns(), tbl.rows().get(i)); @@ -407,7 +407,7 @@ void bindRowAgainstMaterializingPath() { @Test void bindRowArityMismatch() { var b = DynamicMessage.newBuilder(AllTypes.getDescriptor()); - var row = new Ast.TableRow(Position.UNKNOWN, + var row = new Ast.DatasetRow(Position.UNKNOWN, java.util.Collections.singletonList(new Ast.StringVal(Position.UNKNOWN, "x"))); var ex = assertThrows(IllegalArgumentException.class, () -> BindRow.bindRow(b, List.of("a", "b"), row)); @@ -417,10 +417,10 @@ void bindRowArityMismatch() { @Test void bindRowRejectsNonLeafCell() { // Hand-construct a row with a ListVal cell — the parser rejects - // these earlier, but a caller that builds a TableRow manually + // these earlier, but a caller that builds a DatasetRow manually // bypasses that check. 
var b = DynamicMessage.newBuilder(AllTypes.getDescriptor()); - var row = new Ast.TableRow(Position.UNKNOWN, + var row = new Ast.DatasetRow(Position.UNKNOWN, java.util.Collections.singletonList( new Ast.ListVal(Position.UNKNOWN, List.of(new Ast.StringVal(Position.UNKNOWN, "x"))))); diff --git a/pxf/src/test/java/org/protowire/pxf/ProtoDirectiveTest.java b/pxf/src/test/java/org/protowire/pxf/ProtoDirectiveTest.java new file mode 100644 index 0000000..9a2a504 --- /dev/null +++ b/pxf/src/test/java/org/protowire/pxf/ProtoDirectiveTest.java @@ -0,0 +1,182 @@ +// SPDX-License-Identifier: MIT +// Copyright (c) 2026 TrendVidia, LLC. +package org.protowire.pxf; + +import org.junit.jupiter.api.Test; + +import java.nio.charset.StandardCharsets; +import java.util.Base64; + +import static org.junit.jupiter.api.Assertions.assertArrayEquals; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.assertTrue; + +/** + * Parser tests for the {@code @proto} directive (draft §3.4.5). + * Four body shapes lexically distinguished: anonymous, named, source, + * descriptor. Plus reserved-directive-name rejection (draft §3.4.6). 
+ */ +class ProtoDirectiveTest { + + @Test + void anonymous() { + var doc = Parser.parse(""" + @proto { + string symbol = 1; + double price = 2; + } + """); + assertEquals(1, doc.protos().size()); + var pd = doc.protos().get(0); + assertEquals(Ast.ProtoShape.ANONYMOUS, pd.shape()); + assertEquals("", pd.typeName()); + String body = new String(pd.body(), StandardCharsets.UTF_8); + assertTrue(body.contains("string symbol = 1;")); + assertTrue(body.contains("double price = 2;")); + } + + @Test + void named() { + var doc = Parser.parse(""" + @proto trades.v1.Trade { + string symbol = 1; + double price = 2; + } + """); + var pd = doc.protos().get(0); + assertEquals(Ast.ProtoShape.NAMED, pd.shape()); + assertEquals("trades.v1.Trade", pd.typeName()); + assertTrue(new String(pd.body(), StandardCharsets.UTF_8).contains("string symbol = 1;")); + } + + @Test + void source() { + var doc = Parser.parse(""" + @proto \""" + syntax = "proto3"; + package trades.v1; + message Trade { string symbol = 1; } + \""" + """); + var pd = doc.protos().get(0); + assertEquals(Ast.ProtoShape.SOURCE, pd.shape()); + String body = new String(pd.body(), StandardCharsets.UTF_8); + assertTrue(body.contains("syntax = \"proto3\"")); + assertTrue(body.contains("message Trade")); + } + + @Test + void descriptor() { + byte[] raw = {0x0a, 0x05, 'h', 'e', 'l', 'l', 'o'}; + String b64 = Base64.getEncoder().encodeToString(raw); + var doc = Parser.parse("@proto b\"" + b64 + "\""); + var pd = doc.protos().get(0); + assertEquals(Ast.ProtoShape.DESCRIPTOR, pd.shape()); + assertArrayEquals(raw, pd.body()); + } + + @Test + void multipleProtos() { + var doc = Parser.parse(""" + @proto trades.v1.Trade { string symbol = 1; } + @proto orders.v1.Order { string id = 1; } + """); + assertEquals(2, doc.protos().size()); + assertEquals("trades.v1.Trade", doc.protos().get(0).typeName()); + assertEquals("orders.v1.Order", doc.protos().get(1).typeName()); + } + + @Test + void anonymousFollowedByDataset() { + // One-shot 
binding: anonymous @proto types the next directive that + // requires a typed binding — here, an untyped @dataset. + var doc = Parser.parse(""" + @proto { + string symbol = 1; + double price = 2; + } + @dataset (symbol, price) + ("AAPL", 192.34) + ("MSFT", 410.10) + """); + assertEquals(1, doc.protos().size()); + assertEquals(Ast.ProtoShape.ANONYMOUS, doc.protos().get(0).shape()); + var ds = doc.datasets().get(0); + assertEquals("", ds.type()); + assertEquals(2, ds.rows().size()); + } + + @Test + void braceNestingInBody() { + // captureBraceBody must find the matching `}` across nested + // `message Side { ... }` braces in the proto body. + var doc = Parser.parse(""" + @proto { + message Side { + string label = 1; + } + Side side = 1; + } + """); + String body = new String(doc.protos().get(0).body(), StandardCharsets.UTF_8); + assertTrue(body.contains("message Side")); + assertTrue(body.contains("Side side = 1;")); + } + + @Test + void rejectsBadShape() { + var ex = assertThrows(PxfException.class, () -> Parser.parse("@proto 42")); + assertTrue(ex.getMessage().contains("after @proto")); + } + + @Test + void rejectsNamedMissingBrace() { + var ex = assertThrows(PxfException.class, () -> Parser.parse("@proto trades.v1.Trade 42")); + assertTrue(ex.getMessage().contains("'{'")); + } + + @Test + void rejectsAnonymousUnmatchedBrace() { + var ex = assertThrows(PxfException.class, + () -> Parser.parse("@proto { string symbol = 1;")); + assertTrue(ex.getMessage().contains("unmatched")); + } + + @Test + void coexistsWithType() { + var doc = Parser.parse(""" + @type some.pkg.Foo + @proto some.pkg.Foo { + string name = 1; + } + """); + assertEquals("some.pkg.Foo", doc.typeUrl()); + assertEquals(1, doc.protos().size()); + assertEquals(Ast.ProtoShape.NAMED, doc.protos().get(0).shape()); + } + + // --- Reserved directive names (draft §3.4.6) --- + + @Test + void rejectsReservedDirectiveNames() { + for (String name : new String[]{"table", "datasource", "view", "procedure", 
"function", "permissions"}) { + var ex = assertThrows(PxfException.class, + () -> Parser.parse("@" + name + " { x = 1 }"), + "@" + name + " should be rejected"); + assertTrue(ex.getMessage().contains("spec-reserved"), + "@" + name + " error should mention spec-reserved"); + } + } + + // --- ProtoShape enum coverage --- + + @Test + void protoShapeValues() { + assertEquals(4, Ast.ProtoShape.values().length); + assertEquals(Ast.ProtoShape.ANONYMOUS, Ast.ProtoShape.valueOf("ANONYMOUS")); + assertEquals(Ast.ProtoShape.NAMED, Ast.ProtoShape.valueOf("NAMED")); + assertEquals(Ast.ProtoShape.SOURCE, Ast.ProtoShape.valueOf("SOURCE")); + assertEquals(Ast.ProtoShape.DESCRIPTOR, Ast.ProtoShape.valueOf("DESCRIPTOR")); + } +} diff --git a/pxf/src/test/java/org/protowire/pxf/ResultAccessorsTest.java b/pxf/src/test/java/org/protowire/pxf/ResultAccessorsTest.java index 0378ea7..132873a 100644 --- a/pxf/src/test/java/org/protowire/pxf/ResultAccessorsTest.java +++ b/pxf/src/test/java/org/protowire/pxf/ResultAccessorsTest.java @@ -91,31 +91,31 @@ void zeroPrefixesAnonymousDirective() { void recordsTablesInOrder() { var b = DynamicMessage.newBuilder(AllTypes.getDescriptor()); var result = UnmarshalOptions.defaults().unmarshalFull(""" - @table events.v1.Created (id) + @dataset events.v1.Created (id) ("e-1") ("e-2") - @table events.v1.Deleted (id) + @dataset events.v1.Deleted (id) ("e-9") """.getBytes(), b); - assertEquals(2, result.tables().size()); - assertEquals("events.v1.Created", result.tables().get(0).type()); - assertEquals(2, result.tables().get(0).rows().size()); - assertEquals("events.v1.Deleted", result.tables().get(1).type()); - assertEquals(1, result.tables().get(1).rows().size()); + assertEquals(2, result.datasets().size()); + assertEquals("events.v1.Created", result.datasets().get(0).type()); + assertEquals(2, result.datasets().get(0).rows().size()); + assertEquals("events.v1.Deleted", result.datasets().get(1).type()); + assertEquals(1, 
result.datasets().get(1).rows().size()); } @Test void tableCellStatesRoundTrip() { var b = DynamicMessage.newBuilder(AllTypes.getDescriptor()); var result = UnmarshalOptions.defaults().unmarshalFull(""" - @table t.T (a, b, c) + @dataset t.T (a, b, c) ("x", 1, true) (null, , 3) (, "y", null) """.getBytes(), b); - var rows = result.tables().get(0).rows(); + var rows = result.datasets().get(0).rows(); // Row 1: all present, distinct types. assertTrue(rows.get(0).cells().get(0) instanceof Ast.StringVal); assertTrue(rows.get(0).cells().get(1) instanceof Ast.IntVal); @@ -136,10 +136,10 @@ void allCellVariants() { // PXF leaf value type that v1 cell-grammar permits. var b = DynamicMessage.newBuilder(AllTypes.getDescriptor()); var result = UnmarshalOptions.defaults().unmarshalFull(""" - @table t.T (s, i, f, b, by, ts, d, e, n) + @dataset t.T (s, i, f, b, by, ts, d, e, n) ("hi", 42, 3.14, true, b"aGVsbG8=", 2026-05-12T10:00:00Z, 1h30m, ENUM_VAL, null) """.getBytes(), b); - var cells = result.tables().get(0).rows().get(0).cells(); + var cells = result.datasets().get(0).rows().get(0).cells(); assertTrue(cells.get(0) instanceof Ast.StringVal); assertTrue(cells.get(1) instanceof Ast.IntVal); assertTrue(cells.get(2) instanceof Ast.FloatVal); @@ -157,6 +157,6 @@ void directivesAndTablesEmptyForBodyOnlyDocs() { var result = UnmarshalOptions.defaults().unmarshalFull( "string_field = \"x\"".getBytes(), b); assertEquals(java.util.List.of(), result.directives()); - assertEquals(java.util.List.of(), result.tables()); + assertEquals(java.util.List.of(), result.datasets()); } }