
Fix video preprocessing bug in OpenCV loader #522

Open
OrangeSodahub wants to merge 1 commit into facebookresearch:main from OrangeSodahub:main


@OrangeSodahub

Summary

This PR fixes a bug in the OpenCV video loader used for video-file inputs. Specifically, the function load_video_frames_from_video_file_using_cv2 in

sam3/sam3/model/io_utils.py

Lines 332 to 345 in 44ef224

# Convert to tensor
frames_np = np.stack(frames, axis=0).astype(np.float32) # (T, H, W, C)
video_tensor = torch.from_numpy(frames_np).permute(0, 3, 1, 2) # (T, C, H, W)
img_mean = torch.tensor(img_mean, dtype=torch.float16).view(1, 3, 1, 1)
img_std = torch.tensor(img_std, dtype=torch.float16).view(1, 3, 1, 1)
if not offload_video_to_cpu:
    video_tensor = video_tensor.cuda()
    img_mean = img_mean.cuda()
    img_std = img_std.cuda()
# normalize by mean and std
video_tensor -= img_mean
video_tensor /= img_std
return video_tensor, original_height, original_width

where decoded video frames are normalized without first being scaled from [0, 255] to [0, 1], even though the normalization parameters assume [0, 1] inputs. This produces incorrectly scaled model inputs during video inference.
By contrast, the image-folder loader correctly applies the / 255.0 scaling:
img_np = img_np / 255.0

The torchcodec loader does the same:
frame_resized /= 255
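
The effect of the missing scaling can be checked with plain arithmetic. The sketch below is illustrative only: it assumes a per-channel mean and std of 0.5, a value inferred from the ranges reported in this PR rather than read from the repository, and the helper names are hypothetical.

```python
# Assumed normalization constants (hypothetical; inferred from the ranges
# reported in this PR, not read from the repository).
IMG_MEAN = 0.5
IMG_STD = 0.5


def normalize(pixel, scale_to_unit):
    """Normalize one 8-bit intensity, optionally scaling it to [0, 1] first."""
    value = pixel / 255.0 if scale_to_unit else float(pixel)
    return (value - IMG_MEAN) / IMG_STD


def output_range(scale_to_unit):
    """Return (min, max) of the normalized output over all 8-bit intensities."""
    values = [normalize(p, scale_to_unit) for p in range(256)]
    return min(values), max(values)


buggy_range = output_range(scale_to_unit=False)  # mean/std applied to [0, 255]
fixed_range = output_range(scale_to_unit=True)   # mean/std applied to [0, 1]
print("without /255:", buggy_range)
print("with /255:   ", fixed_range)
```

Under that assumption, the unscaled path gives the [-1, 509] range described in the Validation section, and the scaled path gives [-1, 1].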

Changes

  • divide decoded OpenCV video frames by 255.0 before mean/std normalization
  • convert video tensor to torch.float16 to align with other loading approaches
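
The two changes can be sketched as follows. This is a minimal NumPy stand-in for the torch code, not the actual diff: the function name normalize_frames is hypothetical, the mean/std values of 0.5 are assumed, and the point at which the float16 cast happens is an assumption, since the PR text does not show it.

```python
import numpy as np


def normalize_frames(frames, img_mean, img_std):
    """Hypothetical stand-in for the normalization step of the cv2 loader.

    frames: list of (H, W, C) uint8 arrays, as decoded by OpenCV.
    Returns a (T, C, H, W) float16 array.
    """
    frames_np = np.stack(frames, axis=0).astype(np.float32)  # (T, H, W, C)
    frames_np /= 255.0  # scale [0, 255] -> [0, 1] before mean/std normalization
    video = frames_np.transpose(0, 3, 1, 2)  # (T, C, H, W)
    mean = np.asarray(img_mean, dtype=np.float32).reshape(1, 3, 1, 1)
    std = np.asarray(img_std, dtype=np.float32).reshape(1, 3, 1, 1)
    video = (video - mean) / std
    # Assumption: cast to float16 after normalizing, to match the other loaders.
    return video.astype(np.float16)


# Two synthetic 4x4 RGB frames covering both extremes of the uint8 range.
frames = [
    np.zeros((4, 4, 3), dtype=np.uint8),
    np.full((4, 4, 3), 255, dtype=np.uint8),
]
video = normalize_frames(frames, img_mean=(0.5, 0.5, 0.5), img_std=(0.5, 0.5, 0.5))
print(video.shape, video.dtype, float(video.min()), float(video.max()))
```

Normalizing in float32 and casting at the end avoids half-precision rounding during the subtraction and division; the real loader may order these steps differently.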

Validation

  • Without the fix, the images tensor produced in init_state by the call below lies in [-1, 509] when an mp4 video is loaded through OpenCV, which is incorrect, and I have also seen unusual segmentation results
    images, orig_height, orig_width = load_resource_as_video_frames(
        resource_path=resource_path,
        image_size=self.image_size,
        offload_video_to_cpu=offload_video_to_cpu,
        img_mean=self.image_mean,
        img_std=self.image_std,
        async_loading_frames=async_loading_frames,
        video_loader_type=video_loader_type,
    )
  • With the fix, the images tensor always lies in [-1, 1], and the segmentation results are good

@meta-cla meta-cla Bot added the CLA Signed label Apr 16, 2026