
#845 Fix in-place processing to retain bytes from start and end offsets of the original file when the processing strategy is 'InPlace' #849

Merged

yruslan merged 3 commits into master from feature/845-in-place-processing-retain-start-end-offsets on May 14, 2026

Conversation

@yruslan
Collaborator

yruslan commented May 14, 2026

Summary by CodeRabbit

  • Bug Fixes

    • In-place processing now preserves and emits unmodified leading and trailing bytes when file start/end offsets are used, ensuring correct output composition.
  • Improvements

    • Stream handling updated to track skipped regions so boundary bytes are read and written reliably across single-node and distributed workflows.
  • Tests

    • Updated tests to validate preserved boundary bytes when start/end offsets are applied.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 14, 2026

Warning

Rate limit exceeded

@yruslan has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 48 minutes and 46 seconds before requesting another review.


⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e03f56a2-d22c-433c-8f63-5bb187339fe6

📥 Commits

Reviewing files that changed from the base of the PR and between 8234cee and 78b6660.

📒 Files selected for processing (1)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala

Walkthrough

The PR extends in-place COBOL file processing to preserve unmodified file start- and end-offset boundary bytes. FSStream gains boundary tracking via a skipped flag and methods to retrieve skipped bytes; CobolProcessorInPlace conditionally emits those bytes around stream processing; SparkCobolProcessor threads a retainStartAndEndOffsets flag through distributed file processing and performs parallel boundary writes.
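The flow described in the walkthrough can be sketched as a minimal, self-contained model. This is an illustration only: the names `SimpleBoundaryStream`, `readEffectiveBytes`, and `processInPlace` are stand-ins for the actual Cobrix classes, and the real implementation operates on files rather than byte arrays. The idea is the same: leading bytes are emitted verbatim, only the effective region is transformed, and trailing bytes are appended unchanged.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

// Hypothetical, simplified model of the fix: when a file is processed in place
// with file_start_offset / file_end_offset, the bytes outside the effective
// region must be copied to the output unmodified, while only the middle
// region is transformed.
class SimpleBoundaryStream(data: Array[Byte], startOffset: Int, endOffset: Int) {
  private var skipped = false
  private val in = new ByteArrayInputStream(data)

  /** Returns the leading bytes once; subsequent calls return nothing. */
  def getSkippedStartBytes: Array[Byte] = {
    if (!skipped) {
      skipped = true
      val b = new Array[Byte](startOffset)
      in.read(b, 0, startOffset)
      b
    } else Array.empty[Byte]
  }

  /** Reads the effective (middle) region of the data. */
  def readEffectiveBytes: Array[Byte] = {
    val effectiveLen = data.length - startOffset - endOffset
    val b = new Array[Byte](effectiveLen)
    in.read(b, 0, effectiveLen)
    b
  }

  /** Returns the trailing bytes after the effective region is consumed. */
  def getSkippedEndBytes: Array[Byte] = {
    val b = new Array[Byte](endOffset)
    val actual = in.read(b, 0, endOffset)
    if (actual <= 0) Array.empty[Byte] else b.take(actual)
  }
}

object InPlaceSketch {
  /** Copies boundary bytes verbatim and transforms only the middle region. */
  def processInPlace(data: Array[Byte], startOffset: Int, endOffset: Int)(transform: Byte => Byte): Array[Byte] = {
    val stream = new SimpleBoundaryStream(data, startOffset, endOffset)
    val out = new ByteArrayOutputStream()
    out.write(stream.getSkippedStartBytes)              // header preserved as-is
    out.write(stream.readEffectiveBytes.map(transform)) // effective region transformed
    out.write(stream.getSkippedEndBytes)                // trailer preserved as-is
    out.toByteArray
  }
}
```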

Changes

Preserve file start/end offset bytes in in-place COBOL processing

| Layer / File(s) | Summary |
|---|---|
| **FSStream boundary byte tracking and retrieval**<br>`cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala` | FSStream introduces a mutable `skipped` flag to record whether start-offset bytes have been consumed, and implements `getSkippedStartBytes` and `getSkippedEndBytes` to conditionally retrieve unmodified boundary bytes. Start-offset skipping in `next` occurs only once, guarded by `!skipped`, and `skipFully` sets the flag to prevent redundant skipping. |
| **CobolProcessorInPlace conditional boundary output for FSStream**<br>`cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/impl/CobolProcessorInPlace.scala` | CobolProcessorInPlace pattern-matches on the input stream and, when it is an FSStream, writes the skipped-start bytes before calling `StreamProcessor.processStreamInPlace` and the skipped-end bytes after, preserving boundary regions in the output. |
| **SparkCobolProcessor distributed boundary preservation flag and wiring**<br>`spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/SparkCobolProcessor.scala`, `spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala` | SparkCobolProcessor derives a `retainStartAndEndOffsets` flag from `CobolProcessingStrategy`, threads it through the RDD-based file processing pipeline, and conditionally performs extra boundary-byte writes in parallel tasks before and after main processing when the flag is enabled. |
| **Test assertions for boundary byte preservation**<br>`cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/CobolProcessorBuilderSuite.scala`, `spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/SparkCobolProcessorSuite.scala` | Test assertions are updated to expect the complete output including unmodified boundary bytes, and the Spark reader test configuration is extended to supply `file_start_offset` and `file_end_offset` options for round-trip verification. |
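The "skip only once" semantics described for FSStream can be illustrated with a simplified reader. All names here are stand-ins for the real FSStream internals: the point is that the start-offset bytes are consumed at most once, whether they are first retrieved explicitly via `getSkippedStartBytes` or first skipped implicitly by a read, so the two paths cannot double-skip.

```scala
// Illustrative sketch of the skip-once guard: a mutable `skipped` flag
// ensures the start-offset bytes are consumed exactly once, regardless of
// which method touches them first. This is not the actual FSStream code.
class SkipOnceReader(data: Array[Byte], startOffset: Int) {
  private var skipped = false
  private var pos = 0

  private def skipStartIfNeeded(): Unit = {
    if (!skipped) {        // guard: skip the start offset at most once
      pos += startOffset
      skipped = true
    }
  }

  /** Retrieves the leading bytes and marks them as consumed. */
  def getSkippedStartBytes: Array[Byte] = {
    if (!skipped) {
      skipped = true
      pos = startOffset
      data.slice(0, startOffset)
    } else Array.empty[Byte]
  }

  /** Reads n bytes of the effective region, skipping the header first if needed. */
  def next(n: Int): Array[Byte] = {
    skipStartIfNeeded()
    val b = data.slice(pos, pos + n)
    pos += b.length
    b
  }
}
```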

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related PRs

  • AbsaOSS/cobrix#844: Both PRs implement honoring file_start_offset/file_end_offset for in-place processing by extending/modifying FSStream and wiring those offsets into the in-place processing flow (including boundary byte handling).
  • AbsaOSS/cobrix#800: Modifies CobolProcessorInPlace.scala's in-place stream handling; this PR builds on that foundation to add boundary-byte preservation for FSStream.
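The Spark-side wiring mentioned in the walkthrough, deriving a `retainStartAndEndOffsets` flag from the processing strategy, can be sketched as follows. The `ProcessingStrategySketch` ADT is a hypothetical stand-in for the real `CobolProcessingStrategy`; only the in-place strategy needs the original boundary bytes kept.

```scala
// Hypothetical sketch: only the in-place strategy requires boundary
// retention, so the flag is derived once and threaded through the pipeline.
// The real Cobrix type and the Spark/RDD wiring differ.
sealed trait ProcessingStrategySketch
case object InPlaceStrategy extends ProcessingStrategySketch
case object CopyStrategy extends ProcessingStrategySketch

object BoundaryFlag {
  /** True only for in-place processing, where the output is the same file. */
  def retainStartAndEndOffsets(strategy: ProcessingStrategySketch): Boolean =
    strategy match {
      case InPlaceStrategy => true
      case CopyStrategy    => false
    }
}
```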

Poem

A rabbit hops through offset streams,
I tuck the skipped bytes neat between,
Start and end returned once more,
Files stitched back as they were before,
Hooray for preserved boundary dreams! 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main objective: fixing in-place processing to retain bytes from start and end offsets, which is directly reflected in changes across multiple files. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@github-actions

github-actions Bot commented May 14, 2026

JaCoCo code coverage report - 'cobol-parser'

| | Coverage | Δ |
|---|---|---|
| Overall Project | 90.98% | -0.1% 🍏 |
| Files changed | 69.63% | 🍏 |

| File | Coverage | Δ |
|---|---|---|
| CobolProcessorInPlace.scala | 100% | 🍏 |
| FSStream.scala | 73.06% | -16.84% 🍏 |

@github-actions

github-actions Bot commented May 14, 2026

JaCoCo code coverage report - 'spark-cobol'

| | Coverage | Δ |
|---|---|---|
| Overall Project | 83.4% | -0.04% 🍏 |
| Files changed | 90.7% | 🍏 |

| File | Coverage | Δ |
|---|---|---|
| FileStreamer.scala | 99.07% | 🍏 |
| SparkCobolProcessor.scala | 96.36% | -0.56% 🍏 |

Contributor

coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala (1)

53-66: ⚡ Quick win

Clarify the semantics of getSkippedEndBytes or add defensive checks.

The method closes the stream (line 63) when not at the effective end, which could cause silent failures if called prematurely. The else-branch semantics are unclear: should this method be called only when byteIndex >= effectiveSize, or is the close a defensive cleanup?

Consider either:

  1. Documenting that this method must only be called after reading all effective bytes, or
  2. Throwing an exception instead of silently closing when called prematurely.
♻️ Option: Add explicit precondition check
```diff
 def getSkippedEndBytes: Array[Byte] = {
-  if (byteIndex >= effectiveSize && !isClosed) {
+  if (byteIndex < effectiveSize) {
+    throw new IllegalStateException(s"Cannot retrieve end bytes before reaching effective end (at $byteIndex of $effectiveSize)")
+  }
+  if (!isClosed) {
     val b = new Array[Byte](fileEndOffset.toInt)
     val actual = bytesStream.read(b, 0, fileEndOffset.toInt)
     if (actual <= 0) {
       Array.empty[Byte]
     } else {
       b.take(actual)
     }
   } else {
-    close()
     Array.empty[Byte]
   }
 }
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala`
around lines 53 - 66, The getSkippedEndBytes method currently closes the stream
and returns empty bytes when called before byteIndex >= effectiveSize, which can
hide incorrect usage; update getSkippedEndBytes to enforce a clear precondition:
check that byteIndex >= effectiveSize and if not, either throw an
IllegalStateException with a descriptive message (e.g., referencing
getSkippedEndBytes, byteIndex and effectiveSize) or document the method contract
at its declaration so callers know it must be called only after reading all
effective bytes; ensure the code references the existing symbols
(getSkippedEndBytes, byteIndex, effectiveSize, close, bytesStream,
fileEndOffset) so the precondition is implemented in place of silently closing
the stream.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala`:
- Around line 38-51: Validate fileStartOffset/fileEndOffset before casting to
Int in getSkippedStartBytes and getSkippedEndBytes: check that each offset is
within 0 to Int.MaxValue and reject (e.g., throw IllegalArgumentException with a
clear message) if out of bounds to avoid silent overflow and huge array
allocation; only perform the toInt cast after this validation and then proceed
with array allocation and reads as before. Ensure the same guard (non-negative
and <= Int.MaxValue) is applied to both getSkippedStartBytes and
getSkippedEndBytes and include the offset value in the exception message for
debugging.
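The validation the review asks for can be sketched as a small helper. The name `toValidatedInt` is hypothetical; the point is simply that a `Long` offset must be range-checked before narrowing to `Int`, because `toInt` silently overflows and the subsequent array allocation would then be wrong.

```scala
// Illustrative guard: validate a Long offset before narrowing to Int to
// avoid silent overflow and incorrect array allocation. Helper name is
// hypothetical, not part of the Cobrix API.
object OffsetGuard {
  def toValidatedInt(offset: Long, name: String): Int = {
    require(offset >= 0 && offset <= Int.MaxValue,
      s"$name must be between 0 and ${Int.MaxValue}, got $offset")
    offset.toInt // safe: range was checked above
  }
}
```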


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c89094d1-0b52-4fa4-a511-9611bbbcd248

📥 Commits

Reviewing files that changed from the base of the PR and between e7a09b5 and 0b65967.

📒 Files selected for processing (6)
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/processor/impl/CobolProcessorInPlace.scala
  • cobol-parser/src/main/scala/za/co/absa/cobrix/cobol/reader/stream/FSStream.scala
  • cobol-parser/src/test/scala/za/co/absa/cobrix/cobol/processor/CobolProcessorBuilderSuite.scala
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/SparkCobolProcessor.scala
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala
  • spark-cobol/src/test/scala/za/co/absa/cobrix/spark/cobol/SparkCobolProcessorSuite.scala
💤 Files with no reviewable changes (1)
  • spark-cobol/src/main/scala/za/co/absa/cobrix/spark/cobol/source/streaming/FileStreamer.scala

@yruslan
Collaborator Author

yruslan commented May 14, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai Bot commented May 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@yruslan force-pushed the feature/845-in-place-processing-retain-start-end-offsets branch from 8234cee to 78b6660 on May 14, 2026 at 09:19
@yruslan merged commit d307a35 into master on May 14, 2026
7 checks passed
@yruslan deleted the feature/845-in-place-processing-retain-start-end-offsets branch on May 14, 2026 at 10:07
