Add Marengo embedding predictor for zero-shot moment retrieval#76
mohit-twelvelabs wants to merge 2 commits into
Thank you for your contribution. We will look into the code today.
Install the extra and set your API key (a free key with a generous free tier is available at
[twelvelabs.io](https://twelvelabs.io)):
```
pip install 'lighthouse[marengo]'
```
Does this work? As the README shows, lighthouse is installed directly from the GitHub repo, so would the correct command be `pip install 'lighthouse[marengo] @ git+https://github.com/line/lighthouse.git'`?
In addition, we would like to think about the naming. TwelveLabs currently offers Marengo as its video embedding model, but it may release other video embedding models in the future, right? If so, I would prefer to use `lighthouse[twelvelabs]` rather than naming the extra after Marengo itself.
@mohit-twelvelabs I added some comments to your code. I will check the logit part tomorrow by running the code...
Addresses review feedback on line#76:
- Rename the optional dependency extra from [marengo] to [twelvelabs] so it is vendor-scoped and future-proof for other TwelveLabs embedding models.
- Fix the README install command to the git form that actually works, since lighthouse is installed from GitHub (not PyPI): pip install 'lighthouse[twelvelabs] @ git+https://github.com/line/lighthouse.git'
Thanks for the careful review, @awkrail! Both great points — addressed in f3ee766:
I left the predictor class as
No rush on the logit part. Thanks for offering to run it, and let me know if anything comes up when you do.
Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).
This PR adds an optional, zero-shot moment retrieval predictor backed by TwelveLabs Marengo embeddings.
What it adds
- `lighthouse/marengo_predictor.py`: a `MarengoPredictor` that segments and embeds a video server-side, embeds the text query into the same 512-dim Marengo space, and ranks clips by cosine similarity. Its output matches the existing predictors: `{"pred_relevant_windows": [[start, end, score], ...]}`.
- `lighthouse.models` alongside the existing predictors.
- `api_example/marengo_demo.py` mirroring the existing `demo.py` / `amr_demo.py`.
- `marengo` optional extra in `setup.py`, a `mypy.ini` entry, and tests.

Why it helps Lighthouse
Opt-in and non-breaking
The `twelvelabs` SDK is an optional extra (`pip install 'lighthouse[marengo]'`) and is only imported when `MarengoPredictor` is instantiated, so existing users pay no import or dependency cost. No defaults or existing predictors change.

How it was tested
- `tests/test_marengo.py` cover the cosine-similarity ranking and the empty-input path.
- `TWELVELABS_API_KEY` is set) asserts a Marengo text embedding is 512-dim, verified passing locally against the API.
- `ruff` and `mypy` pass on the new files.

You can grab a free API key at https://twelvelabs.io; there's a generous free tier.
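For readers who want a concrete picture of the ranking step and the empty-input path mentioned above, here is a minimal sketch. It is not the code in this PR: the function names `cosine` and `rank_clips`, the `(start, end, embedding)` clip tuples, and the choice to return an empty window list for empty input are all assumptions made for illustration, and the toy vectors are 2-dimensional rather than 512-dimensional.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical clip representation: (start_sec, end_sec, embedding).
Clip = Tuple[float, float, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_clips(query_emb: Sequence[float], clips: List[Clip]) -> Dict[str, list]:
    """Score each clip against the query and sort by descending similarity.

    The return shape mirrors the format quoted in the PR description:
    {"pred_relevant_windows": [[start, end, score], ...]}.
    An empty clip list yields an empty window list (an assumption).
    """
    windows = [[start, end, cosine(query_emb, emb)] for start, end, emb in clips]
    windows.sort(key=lambda w: w[2], reverse=True)
    return {"pred_relevant_windows": windows}
```

In this sketch the embeddings are passed in directly, so the ranking logic can be exercised without an API key; how the actual predictor obtains clip and query embeddings is not shown here.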
Per the contributing note, happy to open a tracking issue first if you'd prefer to discuss before reviewing — opening the PR so the proposed change is concrete.
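As an illustration of the opt-in import behavior described under "Opt-in and non-breaking", the pattern might look roughly like the sketch below. The helper `require_optional` and the class `LazyPredictor` are hypothetical names invented for this example and do not appear in the PR; the only idea taken from the description is that the SDK is imported at instantiation time rather than at module import time.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise an ImportError naming the extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this predictor; "
            f"install the optional extra '{extra}' to enable it."
        ) from exc


class LazyPredictor:
    """Hypothetical stand-in for a predictor that depends on an optional SDK."""

    def __init__(self, module_name: str = "twelvelabs", extra: str = "marengo") -> None:
        # The import happens here, at instantiation, so merely importing the
        # package that defines this class never touches the optional SDK.
        self._sdk = require_optional(module_name, extra)
```

With this shape, users who never construct the predictor never trigger the import, and users who do so without the extra installed get an error that names the extra to install.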