feat(core): support null merge results in key-value readers#269
Open
zjw1111 wants to merge 2 commits intoalibaba:mainfrom
Open
feat(core): support null merge results in key-value readers#269zjw1111 wants to merge 2 commits intoalibaba:mainfrom
zjw1111 wants to merge 2 commits intoalibaba:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the key-value reader iterator contract to support merge functions that may produce std::nullopt (e.g., delete-only groups when delete messages are ignored). It moves merge/prefetch work into MergedKeyValueRecordReader::Iterator::HasNext() and changes KeyValueRecordReader::Iterator::HasNext() to return Result<bool> so prefetch/merge errors can be surfaced before Next().
Changes:
- Change
KeyValueRecordReader::Iterator::HasNext()frombooltoResult<bool>and update call sites to handle errors. - Rework
MergedKeyValueRecordReaderiterator to prepare/skip mergednulloptresults duringHasNext()(instead of failing inNext()). - Update merge-tree components/tests and add coverage for delete-message scenarios and
nulloptmerge results.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/paimon/testing/utils/read_result_collector.h | Updates collection loop to handle Result<bool> HasNext(). |
| src/paimon/core/mergetree/write_buffer_test.cpp | Adapts iteration loops to the new HasNext() contract. |
| src/paimon/core/mergetree/spill_reader.h | Updates SpillReader::Iterator::HasNext() signature to Result<bool>. |
| src/paimon/core/mergetree/spill_reader.cpp | Implements Result<bool> HasNext() for spill iteration. |
| src/paimon/core/mergetree/spill_reader_writer_test.cpp | Updates tests to assert/unwrap Result<bool> from HasNext(). |
| src/paimon/core/mergetree/sort_buffer_test.cpp | Updates iteration loops to unwrap Result<bool> from HasNext(). |
| src/paimon/core/mergetree/lookup_levels.cpp | Propagates Result<bool> HasNext() handling when scanning KV data files. |
| src/paimon/core/mergetree/compact/sort_merge_reader_with_min_heap.h | Updates min-heap element advancement to unwrap Result<bool> HasNext(). |
| src/paimon/core/mergetree/compact/sort_merge_reader_with_min_heap.cpp | Updates reader initialization logic to handle Result<bool> HasNext(). |
| src/paimon/core/mergetree/compact/sort_merge_reader_test.cpp | Adds delete-message coverage and threads ignore-delete option through helpers. |
| src/paimon/core/mergetree/compact/loser_tree.h | Updates loser-tree advancement to unwrap Result<bool> HasNext(). |
| src/paimon/core/io/merged_key_value_record_reader.h | Refactors iterator state to support HasNext()-driven merge/prefetch and nullopt skipping. |
| src/paimon/core/io/merged_key_value_record_reader.cpp | Implements new merge/prefetch flow in HasNext() and supports nullopt merge results. |
| src/paimon/core/io/merged_key_value_record_reader_test.cpp | Adds test ensuring HasNext() skips merged nullopt safely. |
| src/paimon/core/io/key_value_record_reader.h | Changes iterator HasNext() contract to Result<bool>. |
| src/paimon/core/io/key_value_in_memory_record_reader.h | Updates in-memory reader iterator HasNext() signature. |
| src/paimon/core/io/key_value_data_file_record_reader.h | Updates data-file reader iterator HasNext() signature. |
| src/paimon/core/io/key_value_data_file_record_reader.cpp | Updates HasNext() return type; removes debug assert in Next(). |
| src/paimon/core/io/key_value_data_file_record_reader_test.cpp | Updates test iteration to unwrap Result<bool> HasNext(). |
Comments suppressed due to low confidence (1)
src/paimon/core/io/key_value_data_file_record_reader.cpp:80
KeyValueDataFileRecordReader::Iterator::Next()no longer verifies that there is a next element (the previousassert(HasNext())was removed). IfNext()is called whencursor_is at/after the end, this will read Arrow arrays out of bounds; additionally, for non-full selection bitmaps, skippingHasNext()meanscursor_may not advance to the next selected row. Consider explicitly callingHasNext()(and returning an Invalid status when it’s false) or adding a bounds/selection check insideNext()to avoid UB and enforce the iterator contract in release builds too.
Result<KeyValue> KeyValueDataFileRecordReader::Iterator::Next() {
// key is only used in merge sort; key context does not hold parent struct array
auto key = std::make_unique<ColumnarRowRef>(reader_->key_ctx_, cursor_);
// value is used in merge sort and projection (maybe async and multi-thread), so value context
// holds parent struct array to ensure data remains valid
auto value = std::make_unique<ColumnarRowRef>(reader_->value_ctx_, cursor_);
PAIMON_ASSIGN_OR_RAISE(const RowKind* row_kind,
RowKind::FromByteValue(reader_->row_kind_array_->Value(cursor_)));
int64_t sequence_number = reader_->sequence_number_array_->Value(cursor_);
cursor_++;
return KeyValue(row_kind, sequence_number, reader_->level_, std::move(key), std::move(value));
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
lxy-9602
reviewed
May 10, 2026
| Status MergedKeyValueRecordReader::Iterator::LoadNextRawKeyValue() const { | ||
| if (next_raw_key_value_.has_value() || eof_) { | ||
| return Status::OK(); | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
Linked issue: N/A
Move calculate merged key-value operation into
MergedKeyValueRecordReader::Iterator::HasNext()so merge functions that returnnulloptcan be skipped safely beforeNext()is called.This fixes the iterator contract for delete-only groups when delete messages are ignored, and propagates the new
Result<bool>HasNext()contract through key-value readers and related merge-tree call sites.Tests
TEST_F(MergedKeyValueRecordReaderTest, TestSkipMergedNulloptResultInHasNext)
TEST_F(SortMergeReaderTest, TestSortMergeWithDeleteMessages)
API and Format
Changes the internal
KeyValueRecordReader::Iterator::HasNext()contract to returnResult<bool>so iterator prefetch errors can be returned directly.No public API, storage format and protocol change.
Documentation
No user-facing documentation change.
Generative AI tooling
Generated-by: Codex (GPT-5.5)