Add Marengo embedding predictor for zero-shot moment retrieval by mohit-twelvelabs · Pull Request #76 · line/lighthouse

mohit-twelvelabs · 2026-06-25T21:01:20Z

Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).

This PR adds an optional, zero-shot moment retrieval predictor backed by TwelveLabs Marengo embeddings.

What it adds

lighthouse/marengo_predictor.py — a MarengoPredictor that segments and embeds a video server-side, embeds the text query into the same 512-dim Marengo space, and ranks clips by cosine similarity. Its output matches the existing predictors: {"pred_relevant_windows": [[start, end, score], ...]}.
Re-exported from lighthouse.models alongside the existing predictors.
api_example/marengo_demo.py mirroring the existing demo.py / amr_demo.py.
A README section, a marengo optional extra in setup.py, a mypy.ini entry, and tests.
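
As an illustration of the ranking step described above, here is a minimal sketch, not the PR's actual code. It assumes the clip embeddings and their time spans have already been obtained; `cosine_similarity` and `rank_windows` are hypothetical names, and only the `pred_relevant_windows` output shape comes from the description.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical clip record: (start_seconds, end_seconds, embedding).
Clip = Tuple[float, float, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_windows(clips: Sequence[Clip], query_embedding: Sequence[float]) -> Dict[str, List[list]]:
    """Score every clip against the query and return windows sorted best-first."""
    windows = [
        [start, end, cosine_similarity(embedding, query_embedding)]
        for start, end, embedding in clips
    ]
    windows.sort(key=lambda window: window[2], reverse=True)
    return {"pred_relevant_windows": windows}
```

With no clips, the sketch returns `{"pred_relevant_windows": []}`, which is one plausible reading of the empty-input path mentioned under testing.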

Why it helps Lighthouse

A training-free baseline that needs no local checkpoint, feature files, or GPU.
Not bound by the 150s benchmark limit, since embedding happens server-side — handy for longer videos.

Opt-in and non-breaking

The twelvelabs SDK is an optional extra (pip install 'lighthouse[marengo]') and is only imported when MarengoPredictor is instantiated, so existing users pay no import or dependency cost. No defaults or existing predictors change.
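
The deferred import could look roughly like the sketch below. This is an assumption about the pattern, not a copy of `MarengoPredictor`: the class name `LazySDKPredictor` and the `SDK_MODULE` attribute are hypothetical, and nothing from the SDK is called beyond importing the module.

```python
import importlib


class LazySDKPredictor:
    """Hypothetical predictor that imports its optional SDK only on instantiation."""

    # Module name of the optional dependency (the twelvelabs SDK in this PR).
    SDK_MODULE = "twelvelabs"

    def __init__(self) -> None:
        try:
            # The import happens here, so importing this file stays dependency-free.
            self._sdk = importlib.import_module(self.SDK_MODULE)
        except ImportError as exc:
            raise ImportError(
                f"The optional dependency '{self.SDK_MODULE}' is not installed. "
                "Install the matching optional extra to use this predictor."
            ) from exc
```

Users who never construct the predictor never trigger the import, so a missing SDK only surfaces as a clear error at the point of use.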

How it was tested

No-network unit tests in tests/test_marengo.py cover the cosine-similarity ranking and the empty-input path.
A live test (skipped unless TWELVELABS_API_KEY is set) asserts a Marengo text embedding is 512-dim — verified passing locally against the API.
ruff and mypy pass on the new files.
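
For readers unfamiliar with environment-gated tests, the live check might be structured like this sketch. It uses the standard library's `unittest` rather than whatever runner the PR uses, and `embed_text` is a hypothetical placeholder that would need to be wired to the real embedding call; only the `TWELVELABS_API_KEY` variable and the 512 dimension come from the description.

```python
import os
import unittest

EXPECTED_DIM = 512  # embedding size stated in the PR description


def embed_text(query: str) -> list:
    """Hypothetical placeholder for the real text-embedding call."""
    raise NotImplementedError("wire this to the real embedding client")


class LiveEmbeddingTest(unittest.TestCase):
    def setUp(self) -> None:
        # Skip at run time so machines without a key never touch the network.
        if not os.environ.get("TWELVELABS_API_KEY"):
            self.skipTest("TWELVELABS_API_KEY is not set")

    def test_text_embedding_dimension(self) -> None:
        embedding = embed_text("a person opens a door")
        self.assertEqual(len(embedding), EXPECTED_DIM)
```

Running it with `python -m unittest` reports the test as skipped when no key is present, which keeps CI runs offline by default.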

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

Per the contributing note, happy to open a tracking issue first if you'd prefer to discuss before reviewing — opening the PR so the proposed change is concrete.

awkrail · 2026-06-29T02:08:47Z

Thank you for your contribution. We will look into the code today.

awkrail · 2026-06-29T13:38:26Z

+Install the extra and set your API key (a free key with a generous free tier is available at
+[twelvelabs.io](https://twelvelabs.io)):
+```
+pip install 'lighthouse[marengo]'


Does this command work as written? As the README shows, lighthouse is installed directly from the GitHub repo. So shouldn't the correct command be pip install 'lighthouse[marengo] @ git+https://github.com/line/lighthouse.git'?

In addition, we would like to reconsider the naming. TwelveLabs currently offers Marengo as its video embedding model, but it may release other video embedding models in the future, right? If so, I would prefer lighthouse[twelvelabs] over naming the extra after Marengo itself.

awkrail · 2026-06-29T13:45:07Z

@mohit-twelvelabs I added some comments to your code. I will check the logit part tomorrow by running the code...

Addresses review feedback on line#76:
- Rename the optional dependency extra from [marengo] to [twelvelabs] so it is vendor-scoped and future-proof for other TwelveLabs embedding models.
- Fix the README install command to the git form that actually works, since lighthouse is installed from GitHub (not PyPI): pip install 'lighthouse[twelvelabs] @ git+https://github.com/line/lighthouse.git'

mohit-twelvelabs · 2026-06-29T14:37:09Z

Thanks for the careful review, @awkrail! Both great points — addressed in f3ee766:

Install command — you're right, since lighthouse installs from GitHub (not PyPI), pip install 'lighthouse[marengo]' wouldn't resolve. Fixed the README to the git form you suggested:
```
pip install 'lighthouse[twelvelabs] @ git+https://github.com/line/lighthouse.git'
```
Naming — fully agree on future-proofing. I renamed the optional extra from [marengo] to [twelvelabs] (it installs the twelvelabs SDK, which will cover future embedding models too), so the install is now lighthouse[twelvelabs].

I left the predictor class as MarengoPredictor since it's specific to the Marengo embedding model, but happy to rename it (e.g. TwelveLabsPredictor with a model_name arg) if you'd prefer the vendor-scoped naming there as well — just say the word.

No rush on the logit part — thanks for offering to run it, and let me know if anything comes up when you do.

Add Marengo embedding predictor for zero-shot moment retrieval

30d4510

awkrail reviewed Jun 29, 2026

mohit-twelvelabs wants to merge 2 commits into line:main from mohit-twelvelabs:feat/twelvelabs-integration
