Add native OCR to screenshot editor #1799

Open
richiemcilroy wants to merge 1 commit into main from codex/screenshot-editor-ocr

@richiemcilroy (Member) commented May 11, 2026

Adds native OCR to the screenshot editor using macOS Vision and Windows Media OCR.
Selected regions are cropped from the source image and processed off the UI path before copying recognized text.
Validated with Rust, Biome, diff checks, and a Windows OCR API compile check.

Greptile Summary

This PR wires native OCR into the screenshot editor, adding a drag-to-select "Copy Text" tool that crops the chosen region from the source image and runs recognition off the UI thread — macOS Vision on macOS and Windows Media OCR on Windows.

  • Rust backend: A new recognize_screenshot_text Tauri command handles region clamping, RGBA→BGRA conversion with full bounds checking, and platform-specific dispatch via spawn_blocking. The macOS Vision path sets up a CVPixelBuffer with a custom release callback and correctly converts bottom-left-origin normalized rects to pixel coordinates. The Windows path initializes the WinRT runtime per-call with a Drop guard, then constructs a SoftwareBitmap and invokes OcrEngine.
  • Frontend: A new OcrSelectionOverlay component renders an SVG drag-selection over the image, maps CSS coordinates back to source-image pixels (accounting for the active crop), and calls the backend command on pointer-up. The AnnotationLayer is guarded to skip annotation drawing while the OCR tool is active.
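The region clamping and RGBA→BGRA conversion described for the Rust backend can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: `PixelRect`, `clamp_region`, and `crop_rgba_to_bgra` are hypothetical names, and the source image is assumed to be a tightly packed RGBA8 buffer.

```rust
/// A pixel rectangle in source-image coordinates (top-left origin).
/// Hypothetical type, used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PixelRect {
    pub x: u32,
    pub y: u32,
    pub width: u32,
    pub height: u32,
}

/// Clamp a requested region to the image bounds.
/// Returns `None` if no part of the region lies inside the image.
pub fn clamp_region(req: PixelRect, image_width: u32, image_height: u32) -> Option<PixelRect> {
    if req.x >= image_width || req.y >= image_height {
        return None;
    }
    let width = req.width.min(image_width - req.x);
    let height = req.height.min(image_height - req.y);
    if width == 0 || height == 0 {
        return None;
    }
    Some(PixelRect { x: req.x, y: req.y, width, height })
}

/// Copy `region` out of a tightly packed RGBA8 image into a new BGRA8 buffer.
/// Returns `None` if the source buffer is too small or the region is out of bounds.
pub fn crop_rgba_to_bgra(
    rgba: &[u8],
    image_width: u32,
    image_height: u32,
    region: PixelRect,
) -> Option<Vec<u8>> {
    let (iw, ih) = (image_width as usize, image_height as usize);
    let expected_len = iw.checked_mul(ih)?.checked_mul(4)?;
    if rgba.len() < expected_len {
        return None;
    }
    let (x, y) = (region.x as usize, region.y as usize);
    let (w, h) = (region.width as usize, region.height as usize);
    if x.checked_add(w)? > iw || y.checked_add(h)? > ih {
        return None;
    }
    let mut out = Vec::with_capacity(w.checked_mul(h)?.checked_mul(4)?);
    for row in y..y + h {
        let start = (row * iw + x) * 4;
        // `get` keeps the slice access checked even if the arithmetic above is wrong.
        let src = rgba.get(start..start + w * 4)?;
        for px in src.chunks_exact(4) {
            // Swap the R and B channels; G and A keep their positions.
            out.extend_from_slice(&[px[2], px[1], px[0], px[3]]);
        }
    }
    Some(out)
}
```

Returning `Option` instead of panicking keeps a bad selection from crashing the command; the real command may report errors differently.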

Confidence Score: 4/5

Safe to merge; both findings are minor quality issues that don't affect correctness for typical fully-opaque screenshots.

The Windows OCR path declares straight-alpha pixel data as premultiplied, which is technically wrong but harmless for fully-opaque screenshots. The OcrSelectionOverlay invokes the backend with raw invoke rather than the generated type-safe commands wrapper, duplicating the result type locally and leaving the call unprotected against future signature changes.

Files that need attention: apps/desktop/src-tauri/src/screenshot_editor.rs (Windows alpha mode) and apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx (raw invoke vs generated commands).

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/desktop/src-tauri/src/screenshot_editor.rs | Adds ~460 lines implementing platform-specific OCR (macOS Vision, Windows Media OCR) with safe pixel buffer management, coordinate conversion, and region clamping; Windows path incorrectly declares straight-alpha pixel data as premultiplied. |
| apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx | New SolidJS component for drag-to-select OCR; correctly maps CSS coordinates back to source image pixels, but bypasses the generated type-safe commands wrapper and redefines ScreenshotOcrResult locally. |
| apps/desktop/src-tauri/src/lib.rs | Registers the new recognize_screenshot_text Tauri command in the invoke handler; change is minimal and correct. |
| apps/desktop/src/routes/screenshot-editor/context.tsx | Exports new ScreenshotEditorTool union type adding "ocr" to the tool set; straightforward and correct. |
| apps/desktop/src/routes/screenshot-editor/AnnotationLayer.tsx | Guards the annotation drawing path against the OCR tool; correct early-return prevents unwanted annotation creation when OCR mode is active. |
| apps/desktop/src/routes/screenshot-editor/AnnotationTools.tsx | Adds OCR tool button and updates the ToolButton prop type to ScreenshotEditorTool; no issues. |
| apps/desktop/src/routes/screenshot-editor/Preview.tsx | Mounts OcrSelectionOverlay with correct props including bounds, image rect, original image size, and crop; no issues. |
| Cargo.toml | Adds "vn" (Vision framework) feature to the cidre dependency for macOS OCR; correct. |
| apps/desktop/src-tauri/Cargo.toml | Adds Windows Media OCR and WinRT foundation features to the desktop crate's Windows dependency block; correct and complete. |
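
The two coordinate conversions mentioned in the overview can be illustrated with a short sketch. Vision reports bounding boxes as normalized rectangles with a bottom-left origin, so the y axis has to be flipped when converting to top-left pixel coordinates. The second function shows one plausible way to map a pointer position in the displayed image back to source pixels through the active crop. `RectF`, `vision_rect_to_pixels`, and `display_point_to_source` are hypothetical names, and the linear mapping is an assumption about how the overlay works, not the PR's code.

```rust
/// Floating-point rectangle; reused here for normalized, displayed, and pixel rects.
/// Hypothetical type, used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RectF {
    pub x: f64,
    pub y: f64,
    pub width: f64,
    pub height: f64,
}

/// Convert a normalized, bottom-left-origin rect (as Vision reports it)
/// into top-left-origin pixel coordinates for an image of the given size.
pub fn vision_rect_to_pixels(normalized: RectF, image_width: u32, image_height: u32) -> RectF {
    let (w, h) = (image_width as f64, image_height as f64);
    RectF {
        x: normalized.x * w,
        // Flip the y axis: the rect's top edge sits at (1 - y - height) from the top.
        y: (1.0 - normalized.y - normalized.height) * h,
        width: normalized.width * w,
        height: normalized.height * h,
    }
}

/// Map a point in display (CSS) coordinates to source-image pixels.
/// `displayed` is where the cropped image is drawn on screen; `crop` is the
/// visible region of the source image, in source pixels.
/// Returns `None` if the displayed rect has no area.
pub fn display_point_to_source(point: (f64, f64), displayed: RectF, crop: RectF) -> Option<(f64, f64)> {
    if displayed.width <= 0.0 || displayed.height <= 0.0 {
        return None;
    }
    let sx = crop.x + (point.0 - displayed.x) * crop.width / displayed.width;
    let sy = crop.y + (point.1 - displayed.y) * crop.height / displayed.height;
    Some((sx, sy))
}
```

For example, a Vision rect of `(0.25, 0.5, 0.5, 0.25)` in a 200×100 image covers pixels starting at `(50, 25)` with size 100×25.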

Comments Outside Diff (1)

  1. apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx, line 777-780

    P2 Raw `invoke` used instead of the generated type-safe wrapper. `recognize_screenshot_text` is registered with `#[specta::specta]` and added to the invoke handler, so the binding generator should produce a typed `commands.recognizeScreenshotText(...)`. Using raw `invoke` means `ScreenshotOcrResult` is duplicated locally in the component, so any signature change on the Rust side won't be caught at compile time. The same file already imports `commands` from `~/utils/tauri` for `writeClipboardString`; the bindings need to be regenerated to include the new command and then used here.


Reviews (1): Last reviewed commit: "feat: add native OCR to screenshot edito..."

Greptile also left 1 inline comment on this PR.

@richiemcilroy richiemcilroy marked this pull request as ready for review May 11, 2026 19:38
@brin-security-scanner (bot) added the labels contributor:verified (Contributor passed trust analysis.) and pr:verified (PR passed security analysis.) May 11, 2026
Comment on lines +1099 to +1105

    let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
        &buffer,
        BitmapPixelFormat::Bgra8,
        width,
        height,
        BitmapAlphaMode::Premultiplied,
    )

P2 The BGRA pixel data copied from the source is straight (non-premultiplied) alpha, but BitmapAlphaMode::Premultiplied tells Windows the RGB channels have already been multiplied by alpha. If the screenshot contains any semi-transparent pixels, the OCR engine may internally un-premultiply the RGB values (dividing by alpha), producing inflated and incorrect colour values. BitmapAlphaMode::Straight (or BitmapAlphaMode::Ignore since OCR doesn't need transparency) is the correct choice.

Suggested change

    - let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
    -     &buffer,
    -     BitmapPixelFormat::Bgra8,
    -     width,
    -     height,
    -     BitmapAlphaMode::Premultiplied,
    - )
    + let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
    +     &buffer,
    +     BitmapPixelFormat::Bgra8,
    +     width,
    +     height,
    +     BitmapAlphaMode::Straight,
    + )
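
To see why the alpha mode matters, the arithmetic can be worked through per channel. The sketch below uses the standard premultiply and un-premultiply formulas on 8-bit channels. `premultiply` and `unpremultiply` are hypothetical helper names, and whether the Windows OCR engine actually un-premultiplies internally is not confirmed here; the code only shows what would happen if it did.

```rust
/// Convert a straight-alpha channel value to premultiplied form (rounded).
pub fn premultiply(channel: u8, alpha: u8) -> u8 {
    ((channel as u32 * alpha as u32 + 127) / 255) as u8
}

/// Recover a straight-alpha channel value from a premultiplied one.
/// The result saturates at 255; a fully transparent pixel yields 0.
pub fn unpremultiply(channel: u8, alpha: u8) -> u8 {
    if alpha == 0 {
        return 0;
    }
    let value = (channel as u32 * 255 + alpha as u32 / 2) / alpha as u32;
    value.min(255) as u8
}
```

With a straight red value of 200 at alpha 128, `unpremultiply(200, 128)` saturates at 255, because 200 × 255 / 128 is well above the 8-bit range. The channel reads as much brighter than the source pixel. At alpha 255 both functions return the input unchanged, which is why fully opaque screenshots are unaffected.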
