Add native OCR to screenshot editor#1799
Open
richiemcilroy wants to merge 1 commit into
Comment on lines +1099 to +1105
let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
    &buffer,
    BitmapPixelFormat::Bgra8,
    width,
    height,
    BitmapAlphaMode::Premultiplied,
)
The BGRA pixel data copied from the source is straight (non-premultiplied) alpha, but `BitmapAlphaMode::Premultiplied` tells Windows the RGB channels have already been multiplied by alpha. If the screenshot contains any semi-transparent pixels, the OCR engine may internally un-premultiply the RGB values (dividing by alpha), producing inflated and incorrect colour values. `BitmapAlphaMode::Straight` (or `BitmapAlphaMode::Ignore`, since OCR doesn't need transparency) is the correct choice.
Suggested change
- let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
-     &buffer,
-     BitmapPixelFormat::Bgra8,
-     width,
-     height,
-     BitmapAlphaMode::Premultiplied,
- )
+ let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
+     &buffer,
+     BitmapPixelFormat::Bgra8,
+     width,
+     height,
+     BitmapAlphaMode::Straight,
+ )
Adds native OCR to the screenshot editor using macOS Vision and Windows Media OCR.
Selected regions are cropped from the source image and processed off the UI thread before the recognized text is copied.
Validated with Rust, Biome, diff checks, and a Windows OCR API compile check.
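As a rough illustration of the cropping step described above, the sketch below clamps a requested region to the image bounds and copies it out of a tightly packed RGBA buffer. The `Region` type and both function names are hypothetical; the PR's actual types and layout assumptions are not shown on this page.

```rust
/// Hypothetical selection rectangle in source-image pixels.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Region {
    x: u32,
    y: u32,
    width: u32,
    height: u32,
}

/// Clamp a region to the image bounds; returns None if nothing remains.
fn clamp_region(r: Region, img_w: u32, img_h: u32) -> Option<Region> {
    if r.x >= img_w || r.y >= img_h {
        return None;
    }
    let width = r.width.min(img_w - r.x);
    let height = r.height.min(img_h - r.y);
    if width == 0 || height == 0 {
        return None;
    }
    Some(Region { x: r.x, y: r.y, width, height })
}

/// Copy the clamped region out of a tightly packed RGBA buffer (4 bytes per pixel).
/// Returns None if the buffer size does not match the stated dimensions.
fn crop_rgba(src: &[u8], img_w: u32, img_h: u32, r: Region) -> Option<Vec<u8>> {
    let expected = (img_w as usize).checked_mul(img_h as usize)?.checked_mul(4)?;
    if src.len() != expected {
        return None;
    }
    let r = clamp_region(r, img_w, img_h)?;
    let row_bytes = r.width as usize * 4;
    let mut out = Vec::with_capacity(row_bytes * r.height as usize);
    for row in r.y..r.y + r.height {
        let start = (row as usize * img_w as usize + r.x as usize) * 4;
        out.extend_from_slice(&src[start..start + row_bytes]);
    }
    Some(out)
}
```

Clamping before indexing means an oversized or partly off-image selection yields a smaller crop rather than a panic, and a selection entirely outside the image yields `None`.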
Greptile Summary
This PR wires native OCR into the screenshot editor, adding a drag-to-select "Copy Text" tool that crops the chosen region from the source image and runs recognition off the UI thread — macOS Vision on macOS and Windows Media OCR on Windows.
- `recognize_screenshot_text` Tauri command handles region clamping, RGBA→BGRA conversion with full bounds checking, and platform-specific dispatch via `spawn_blocking`. The macOS Vision path sets up a `CVPixelBuffer` with a custom release callback and correctly converts bottom-left-origin normalized rects to pixel coordinates. The Windows path initializes the WinRT runtime per-call with a `Drop` guard, then constructs a `SoftwareBitmap` and invokes `OcrEngine`.
- `OcrSelectionOverlay` component renders an SVG drag-selection over the image, maps CSS coordinates back to source-image pixels (accounting for the active crop), and calls the backend command on pointer-up. The `AnnotationLayer` is guarded to skip annotation drawing while the OCR tool is active.
Confidence Score: 4/5
Safe to merge; both findings are minor quality issues that don't affect correctness for typical fully-opaque screenshots.
The Windows OCR path declares straight-alpha pixel data as premultiplied, which is technically wrong but harmless for fully-opaque screenshots. The OcrSelectionOverlay invokes the backend with raw invoke rather than the generated type-safe commands wrapper, duplicating the result type locally and leaving the call unprotected against future signature changes.
apps/desktop/src-tauri/src/screenshot_editor.rs (Windows alpha mode) and apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx (raw invoke vs generated commands).
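Two of the conversions the summary mentions can be sketched briefly: the RGBA→BGRA channel swap and the mapping from Vision-style normalized rectangles (origin at the bottom-left) to top-left-origin pixel rectangles. Both function names are hypothetical, and this is an illustration of the technique under those assumptions, not the PR's code.

```rust
/// Swap the R and B channels in place, turning RGBA into BGRA.
/// Returns false (leaving the buffer untouched) if the length is not a
/// whole number of 4-byte pixels.
fn rgba_to_bgra_in_place(pixels: &mut [u8]) -> bool {
    if pixels.len() % 4 != 0 {
        return false;
    }
    for px in pixels.chunks_exact_mut(4) {
        px.swap(0, 2);
    }
    true
}

/// Convert a normalized rectangle with a bottom-left origin (all values in 0.0..=1.0)
/// into a pixel rectangle (x, y, width, height) with a top-left origin.
fn normalized_to_pixels(
    nx: f64,
    ny: f64,
    nw: f64,
    nh: f64,
    img_w: u32,
    img_h: u32,
) -> (f64, f64, f64, f64) {
    let (w, h) = (img_w as f64, img_h as f64);
    let x = nx * w;
    let width = nw * w;
    let height = nh * h;
    // The rect's top edge sits at ny + nh in bottom-left space; flip it
    // to measure from the top of the image instead.
    let y = (1.0 - ny - nh) * h;
    (x, y, width, height)
}
```

For a 200×100 image, a normalized rect at (0.25, 0.5) with size 0.5×0.25 maps to a pixel rect at (50, 25) with size 100×25. Only the vertical coordinate needs flipping; x, width, and height scale directly.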
Important Files Changed
- `commands` wrapper and redefines `ScreenshotOcrResult` locally.
- `recognize_screenshot_text` Tauri command in the invoke handler; change is minimal and correct.
- `ScreenshotEditorTool` union type adding `"ocr"` to the tool set; straightforward and correct.
- `ToolButton` prop type to `ScreenshotEditorTool`; no issues.
- `OcrSelectionOverlay` with correct props including bounds, image rect, original image size, and crop; no issues.
- `"vn"` (Vision framework) feature to the cidre dependency for macOS OCR; correct.
Comments Outside Diff (1)
apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx, line 777-780
`invoke` used instead of the generated type-safe wrapper. `recognize_screenshot_text` is registered with `#[specta::specta]` and added to the invoke handler, so the binding generator should produce a typed `commands.recognizeScreenshotText(...)`. Using raw `invoke` means `ScreenshotOcrResult` is duplicated locally in the component, so any signature change on the Rust side won't be caught at compile time. The same file already imports `commands` from `~/utils/tauri` for `writeClipboardString`; the bindings need to be regenerated to include the new command and then used here.
Reviews (1): Last reviewed commit: "feat: add native OCR to screenshot edito..."