pxf: TableReader streaming + bind_row per-row binding#7

Merged: trendvidia merged 2 commits into main from table-reader-and-bind-row on May 12, 2026
@trendvidia

Owner

Summary

Second of three PRs in the Python v0.72–v0.75 catch-up. Wraps `protowire-cpp`'s `TableReader` (cpp #13) so consumers can stream `@table` rows from an in-memory buffer instead of materializing the whole row sequence up front.

Python surface (in `src/protowire/pxf.py`):

```python
reader = pxf.TableReader.from_bytes(data)
print(reader.type, reader.columns)
for row in reader:
    msg = TradeMsg()
    pxf.bind_row(msg, reader.columns, row)
    handle(msg)

# Optional: chain a second @table from the same input
tr2 = pxf.TableReader.from_bytes(reader.tail())
```

  • `from_bytes(data)` accepts `bytes` or `str`. (A file-like / chunked-IO bridge is a possible follow-up.)
  • `type` / `columns` / `directives` properties expose the parsed header.
  • Standard iterator protocol + a `next_or_none()` non-raising variant.
  • `tail()` returns the unconsumed buffer for multi-`@table` chaining.
  • `scan(msg)` is `Next` + `bind_row` in one call.
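
To illustrate how the iterator protocol and the non-raising `next_or_none()` variant can share one code path, here is a minimal pure-Python sketch. `SketchReader` is a hypothetical stand-in written for this example, not the `pxf.TableReader` class, and it takes pre-split rows rather than parsing a buffer.

```python
from typing import Optional, Sequence, Tuple


class SketchReader:
    """Hypothetical stand-in for a row reader; not the protowire class."""

    def __init__(self, columns: Sequence[str], rows: Sequence[Tuple]) -> None:
        self.columns = tuple(columns)
        self._rows = iter(rows)
        self.done = False  # becomes True once the row sequence is exhausted

    def next_or_none(self) -> Optional[Tuple]:
        """Return the next row, or None at EOF instead of raising."""
        if self.done:
            return None
        row = next(self._rows, None)
        if row is None:
            self.done = True
        return row

    def __iter__(self) -> "SketchReader":
        # The reader is its own iterator, so `for row in reader` works.
        return self

    def __next__(self) -> Tuple:
        # The raising variant is a thin wrapper over next_or_none().
        row = self.next_or_none()
        if row is None:
            raise StopIteration
        return row


reader = SketchReader(("px", "qty"), [("100.25", "10"), ("100.5", "20")])
first = reader.next_or_none()      # pull one row explicitly
rest = list(reader)                # drain the remainder via the iterator protocol
after_eof = reader.next_or_none()  # stays None after EOF, never raises
```

Mixing the two styles on one reader is safe in this sketch because both advance the same underlying iterator.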

`pxf.bind_row(msg, columns, row, *, skip_validate=True)` is exported as a free function for callers iterating `Result.tables[i].rows` from the materializing path. Strategy is format-and-reparse — matches the cpp port's `BindRow`. Cells render as a synthetic PXF body and run through `unmarshal`, reusing every branch of the existing decoder (WKT timestamps / durations, wrapper-nullability, enum-by-name, `pxf.required` / `pxf.default`, oneof).
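
The format-and-reparse idea can be sketched in a few lines of pure Python. Everything below is an assumption made for illustration: the `name = value` text is a toy format rather than real PXF syntax, `toy_unmarshal` stands in for the existing decoder, and `ABSENT`, `Trade`, and `FIELD_TYPES` are hypothetical names. The point is only that the binder renders text and delegates all type handling to the one decoder.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

ABSENT = object()  # hypothetical marker for a cell that carries no value

# Hypothetical schema: field name -> converter used by the toy decoder.
FIELD_TYPES = {"symbol": str, "qty": int, "note": str}


@dataclass
class Trade:  # hypothetical message type
    symbol: str = ""
    qty: int = 0
    note: Optional[str] = "n/a"


def toy_unmarshal(text: str, msg: Trade) -> None:
    """Toy decoder: applies `name = value` lines to msg (not real PXF)."""
    for line in text.splitlines():
        name, _, raw = line.partition(" = ")
        if name not in FIELD_TYPES:
            raise KeyError(f"unknown field: {name}")
        setattr(msg, name, None if raw == "null" else FIELD_TYPES[name](raw))


def toy_bind_row(msg: Trade, columns: Sequence[str], row: Sequence[object]) -> None:
    """Render the row as a synthetic body, then reuse the decoder."""
    if len(columns) != len(row):
        raise ValueError("row length does not match column count")
    lines = []
    for name, cell in zip(columns, row):
        if cell is ABSENT:
            continue  # absent: emit nothing, so the field keeps its default
        lines.append(f"{name} = {'null' if cell is None else cell}")
    toy_unmarshal("\n".join(lines), msg)


bound = Trade()
toy_bind_row(bound, ("symbol", "qty", "note"), ("ACME", 5, ABSENT))

cleared = Trade(note="x")
toy_bind_row(cleared, ("note",), (None,))  # a null cell clears the field
```

The toy format cannot tell the string `"null"` apart from a null cell; a real decoder would rely on quoting for that distinction.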

FFI (`src/_protowire/module.cc`):

  • `PyTableReader` class wraps `protowire::pxf::TableReader`. The wrapped `istringstream` is held alongside the reader so its lifetime is bound to the Python object (cpp `TableReader` takes a non-owning `std::istream*`).
  • Cells marshal through the same `CellToPyTuple` helper introduced in PR #6 (pxf: cpp pin to v0.75.0 + Result.directives/tables + validate_descriptor).
  • `__iter__` returns self; `__next__` raises `StopIteration` at EOF.

Test plan

  • 22 new tests in `tests/test_pxf_table_reader.py` covering header parsing (happy path, str input, no-@table, empty input, leading directives, 64 KiB cap), iteration (ordered rows, zero-rows-stops, next_or_none EOF, three-state cells, sticky arity, parens-in-strings, comments between rows), `tail` chaining, `bind_row` + `scan` (binding by name, absent-leaves-default, null-clears-wrapper, bytes cell round-trip, mismatched-columns / unknown-column errors, string escape)
  • All 84 tests pass locally (62 on main → +22 new)
  • CI green (Linux / macOS / Windows × Python 3.10–3.13, codeql)

Second PR of the Python v0.72-v0.75 catch-up. Wraps protowire-cpp's
TableReader (PR cpp#13) so consumers can stream @table rows from an
in-memory buffer instead of materializing the whole row sequence up
front.

API surface in src/protowire/pxf.py:

  pxf.TableReader.from_bytes(data) → TableReader
    .type      — row message type, e.g. "trades.v1.Trade"
    .columns   — tuple of column field names
    .directives — side-channel @<name> directives before the header
    .done      — True once the row sequence is exhausted
    .next_or_none() — next row, or None at EOF
    __iter__/__next__ — standard Python iterator protocol
    .tail()    — bytes for chaining a second TableReader
    .scan(msg) — Next + bind_row in one call

  pxf.bind_row(msg, columns, row, *, skip_validate=True) → None
    Free function for callers iterating Result.tables[i].rows from
    the materializing path. Strategy: format-and-reparse — render
    cells as a synthetic PXF body and run through unmarshal. This
    matches protowire-cpp's BindRow and reuses every branch of the
    existing decoder (WKT, wrappers, enums, oneof, pxf.required /
    pxf.default).

FFI (src/_protowire/module.cc):
  - PyTableReader class wraps protowire::pxf::TableReader. The
    wrapped istringstream is held alongside the reader so its
    lifetime is bound to the Python object (the cpp TableReader
    takes a non-owning std::istream*).
  - Cells marshal through the same CellToPyTuple helper used by
    PxfUnmarshalFull's Result.tables.
  - __iter__ returns self; __next__ raises StopIteration at EOF.

Tests (22 new in tests/test_pxf_table_reader.py, 84 total):
  - Header parsing: happy path, str input, no-@table error, empty
    input, leading directives, 64 KiB header cap
  - Iteration: ordered rows, zero-rows-stops, next_or_none EOF,
    three-state cells, sticky arity error, parens-in-strings,
    comments between rows
  - Tail chaining to a second @table
  - bind_row + scan: column-by-name binding, absent-leaves-default,
    null-clears-wrapper (StringValue), bytes cell round-trip,
    mismatched-columns error, unknown-column error, string escape
Comment thread src/protowire/pxf.py Fixed
@trendvidia trendvidia merged commit 2fa3207 into main May 12, 2026
17 checks passed
@trendvidia trendvidia deleted the table-reader-and-bind-row branch May 12, 2026 10:58
@trendvidia trendvidia mentioned this pull request May 12, 2026
3 tasks