
Add Marengo embedding predictor for zero-shot moment retrieval #76

Open
mohit-twelvelabs wants to merge 2 commits into line:main from mohit-twelvelabs:feat/twelvelabs-integration

@mohit-twelvelabs

Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).

This PR adds an optional, zero-shot moment retrieval predictor backed by TwelveLabs Marengo embeddings.

What it adds

  • lighthouse/marengo_predictor.py — a MarengoPredictor that segments and embeds a video server-side, embeds the text query into the same 512-dim Marengo space, and ranks clips by cosine similarity. Its output matches the existing predictors: {"pred_relevant_windows": [[start, end, score], ...]}.
  • Re-exported from lighthouse.models alongside the existing predictors.
  • api_example/marengo_demo.py mirroring the existing demo.py / amr_demo.py.
  • A README section, a marengo optional extra in setup.py, a mypy.ini entry, and tests.
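For readers unfamiliar with the ranking step, here is a minimal sketch of cosine-similarity ranking that produces the `pred_relevant_windows` shape described above. It is not the code in `lighthouse/marengo_predictor.py`: the names `cosine_similarity` and `rank_clips` and the `(start, end, embedding)` clip tuple are assumptions made for illustration, and the embeddings are taken as already computed rather than fetched from the API.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical clip representation: (start_sec, end_sec, embedding).
Clip = Tuple[float, float, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_clips(query_embedding: Sequence[float], clips: List[Clip]) -> Dict[str, List[List[float]]]:
    """Score every clip against the query and sort by descending similarity."""
    windows = [
        [start, end, cosine_similarity(query_embedding, embedding)]
        for start, end, embedding in clips
    ]
    windows.sort(key=lambda window: window[2], reverse=True)
    return {"pred_relevant_windows": windows}


if __name__ == "__main__":
    # Toy 2-dim vectors stand in for the 512-dim embeddings.
    toy_clips = [(0.0, 5.0, [0.0, 1.0]), (5.0, 10.0, [1.0, 0.0])]
    print(rank_clips([1.0, 0.0], toy_clips))
```

An empty clip list yields `{"pred_relevant_windows": []}`, which corresponds to the empty-input path mentioned under testing.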

Why it helps Lighthouse

  • A training-free baseline that needs no local checkpoint, feature files, or GPU.
  • Not bound by the 150s benchmark limit, since embedding happens server-side — handy for longer videos.

Opt-in and non-breaking

  • The twelvelabs SDK is an optional extra (pip install 'lighthouse[marengo]') and is only imported when MarengoPredictor is instantiated, so existing users pay no import or dependency cost. No defaults or existing predictors change.
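The deferred import described above can be sketched roughly as follows. This is an illustration of the general pattern, not the PR's implementation: `load_optional` and `LazyPredictor` are hypothetical names, and the install hint reuses the git-form command that appears later in this thread.

```python
import importlib
from types import ModuleType

# Install command taken from later in this thread; adjust if the extra is renamed.
INSTALL_HINT = "pip install 'lighthouse[twelvelabs] @ git+https://github.com/line/lighthouse.git'"


def load_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import a module on demand and point to the install command if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Optional dependency '{module_name}' is not installed. "
            f"Install it with: {install_hint}"
        ) from exc


class LazyPredictor:
    """Hypothetical stand-in for a predictor backed by an optional SDK."""

    def __init__(self, module_name: str = "twelvelabs") -> None:
        # The import runs here, at instantiation, not when this file is imported,
        # so users who never construct the class never need the SDK.
        self._sdk = load_optional(module_name, INSTALL_HINT)
```

Because nothing at module level touches the SDK, importing the package stays free of the extra dependency, and a missing SDK surfaces as an actionable error only when the predictor is constructed.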

How it was tested

  • No-network unit tests in tests/test_marengo.py cover the cosine-similarity ranking and the empty-input path.
  • A live test (skipped unless TWELVELABS_API_KEY is set) asserts a Marengo text embedding is 512-dim — verified passing locally against the API.
  • ruff and mypy pass on the new files.
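As a rough illustration of gating a live test on `TWELVELABS_API_KEY`, the sketch below uses the standard-library `unittest` module. The PR's `tests/test_marengo.py` may use a different framework and structure. `embed_text_live` is a hypothetical placeholder, not a twelvelabs SDK call, and it must be wired to a real client before the test can pass with a key set.

```python
import os
import unittest
from typing import List

EXPECTED_DIM = 512  # embedding size stated in the PR description


def embed_text_live(query: str) -> List[float]:
    """Hypothetical placeholder for the real API call; replace with an actual client."""
    raise NotImplementedError("connect this to the real embedding client")


class LiveEmbeddingTest(unittest.TestCase):
    def test_text_embedding_is_512_dim(self) -> None:
        # Skip at run time so machines without a key never hit the network.
        if not os.environ.get("TWELVELABS_API_KEY"):
            self.skipTest("TWELVELABS_API_KEY is not set")
        self.assertEqual(len(embed_text_live("a dog runs on the beach")), EXPECTED_DIM)


if __name__ == "__main__":
    unittest.main()
```

Without the environment variable the test is reported as skipped rather than failed, which keeps the default test run network-free.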

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

Per the contributing note, happy to open a tracking issue first if you'd prefer to discuss before reviewing — opening the PR so the proposed change is concrete.

@awkrail

awkrail commented Jun 29, 2026

Contributor

Thank you for your contribution. We will look into the code today.

Comment thread README.md Outdated
Install the extra and set your API key (a free key with a generous free tier is available at
[twelvelabs.io](https://twelvelabs.io)):
```
pip install 'lighthouse[marengo]'
```

Contributor

Does this work? As the README shows, lighthouse is installed directly from the GitHub repo, so shouldn't the command be pip install 'lighthouse[marengo] @ git+https://github.com/line/lighthouse.git'?

Contributor

In addition, we would like to reconsider the naming. TwelveLabs currently offers Marengo as its video embedding model, but it may release other video embedding models in the future, right? If so, I would prefer lighthouse[twelvelabs] over naming the extra after Marengo itself.

@awkrail

awkrail commented Jun 29, 2026

Contributor

@mohit-twelvelabs I added some comments to your code. I will check the logit part tomorrow by running the code...

Addresses review feedback on line#76:
- Rename the optional dependency extra from [marengo] to [twelvelabs] so it
  is vendor-scoped and future-proof for other TwelveLabs embedding models.
- Fix the README install command to the git form that actually works, since
  lighthouse is installed from GitHub (not PyPI):
  pip install 'lighthouse[twelvelabs] @ git+https://github.com/line/lighthouse.git'
@mohit-twelvelabs

Author

Thanks for the careful review, @awkrail! Both great points — addressed in f3ee766:

  1. Install command — you're right, since lighthouse installs from GitHub (not PyPI), pip install 'lighthouse[marengo]' wouldn't resolve. Fixed the README to the git form you suggested:

    pip install 'lighthouse[twelvelabs] @ git+https://github.com/line/lighthouse.git'
    
  2. Naming — fully agree on future-proofing. I renamed the optional extra from [marengo] to [twelvelabs] (it installs the twelvelabs SDK, which will cover future embedding models too), so the install is now lighthouse[twelvelabs].

I left the predictor class as MarengoPredictor since it's specific to the Marengo embedding model, but happy to rename it (e.g. TwelveLabsPredictor with a model_name arg) if you'd prefer the vendor-scoped naming there as well — just say the word.

No rush on the logit part — thanks for offering to run it, and let me know if anything comes up when you do.
