Move examples from cog-examples to cog#3055
Conversation
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
This PR moves example models from the external cog-examples repo into the main cog repo and updates the test harness to support base_dir for local models. Good cleanup overall.
I found two actionable issues:
- HIGH: The
managed-weightsweights.lockwas regenerated from incomplete/stub weights. The parakeetmodel.safetensorsis listed as 135 bytes instead of ~2.5GB. - MEDIUM: The
hello-trainmodel lost itstrain_testsentry in the manifest, removing training test coverage even though the example still supports training.
|
I'm Bonk, and I've done a quick review of your PR. Summary: Moves example models from Issues found:
|
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.
Issues found:
-
HIGH —
examples/managed-weights/weights.lockwas regenerated from incomplete/stub weights. The parakeetmodel.safetensorsis listed as 135 bytes instead of ~2.5GB, and the total size dropped from ~2.5GB to ~1.1MB. Please regenerate from the actual weights or revert this file. -
MEDIUM —
hello-trainlost itstrain_testsentry intools/test-harness/manifest.yamleven though the example still containstrain.pyand atrain:field incog.yaml. This removes training test coverage. I've posted a suggestion to restore it. -
MEDIUM —
examples/z-image-turbo/run.pyusestempfile.mktemp()which creates a predictable temporary file path (flagged by CodeQL). Replace withtempfile.NamedTemporaryFile(delete=False)for safety. -
MEDIUM —
examples/hello-concurrency/run.pyhas an emptyexcept FileNotFoundError: passblock. Add a logging statement to document the intent.
|
I'm Bonk, and I've done a quick review of your PR. Summary: Moves example models from Issues found:
|
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.
Issues found:
- MEDIUM —
examples/blur/run.pyreturns the built-ininputfunction instead of theimageparameter whenblur == 0. This is a logic bug. - MEDIUM —
hello-trainis still missing itstrain_testsentry intools/test-harness/manifest.yamleven though the example containstrain.pyand atrain:field incog.yaml. This removes training test coverage.
|
I'm Bonk, and I've done a quick review of your PR. Summary: Moves example models from Issues found:
Both issues have suggestions posted as review comments. |
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.
Issues found:
- MEDIUM —
hello-trainis still missing itstrain_testsentry intools/test-harness/manifest.yamleven though the example containstrain.pyand atrain:field incog.yaml. The test harness still fully supportstrain_tests(seemanifest.goandrunner.go), and there is no visible replacement mechanism or deprecation marker in the codebase. Removing the only training test without a documented alternative creates a coverage gap.
|
I'm Bonk, and I've done a quick review of your PR. Summary: Moves example models from Issues found:
Posted as review comment on PR #3055. |
There was a problem hiding this comment.
I'm Bonk, and I've done a quick review of your PR.
Summary: Moves example models from cog-examples into the main repo, updates the test harness for base_dir support, and adds Secret type coercion in coglet.
Issues found:
-
HIGH — The new Secret coercion in
crates/coglet-python/src/input.rscallsSecret(value)in Python, but the PythonSecretclass inpython/cog/types.pydoes not define an__init__that accepts arguments. This will raiseTypeError: Secret() takes no argumentsat runtime when any predictor declares aSecret-typed input. -
MEDIUM —
hello-trainis still missing itstrain_testsentry intools/test-harness/manifest.yamleven though the example containstrain.pyand atrain:field incog.yaml. The test harness still processestrain_tests(seemanifest.goandrunner.go), and there is no deprecation marker or replacement mechanism visible in the code. Removing the only training test creates a coverage gap.
|
Done. I posted a review on PR #3055 with two actionable issues:
|
- Document list[Secret] coercion exclusion in coglet input handling - Fix stale 'predict.py' comments in example cog.yaml files - Correct hello-train README to use 'cog predict' - Add trailing newlines to example cog.yaml files - Only pass --setup-timeout for 'cog predict' (not 'cog train') - Re-add train_tests for hello-train
|
@anish-sahoo Bonk workflow was cancelled. View workflow run · To retry, trigger Bonk again. |
markphelps
left a comment
There was a problem hiding this comment.
Non-Secret review comments from a focused pass on the example migration and test harness. I left Secret/coglet scope comments out separately.
| repo: local | ||
| base_dir: ../../examples | ||
| path: managed-weights | ||
| gpu: false |
There was a problem hiding this comment.
This makes managed-weights part of the default CPU harness run, but that example depends on manually imported/git-ignored weights. A fresh contributor running the default manifest can fail before they have populated those weights. If we want to keep this in the manifest, can we gate it explicitly?
| gpu: false | |
| gpu: false | |
| requires_env: | |
| - COG_MANAGED_WEIGHTS_READY |
Alternatively, remove it from the default manifest or replace it with a lightweight managed-weights fixture using checked-in test data.
| @@ -0,0 +1,3 @@ | |||
| ## hello-context | |||
|
|
|||
| A simple model that takes no inputs but will echo back any context provided with the prediction as the output. | |||
There was a problem hiding this comment.
This README says the model takes no inputs, but run.py requires a text input and the harness supplies one. Can we make the README match the actual example?
| A simple model that takes no inputs but will echo back any context provided with the prediction as the output. | |
| A simple model that echoes the `text` input and any prediction context in the output. |
| ```yaml | ||
| concurrency: | ||
| max: 32 | ||
| ``` | ||
|
|
||
| This combined with the async setup and predict methods in the predict.py allows Cog to run up to | ||
| 32 concurrent predictions. If cog reaches the max concurrency threshold it will reject subsequent | ||
| predictions with a `409 Conflict` response. |
There was a problem hiding this comment.
This snippet/prose is stale after the migration: cog.yaml has max: 4, and the implementation is run.py:Runner.run rather than predict.py.
| ```yaml | |
| concurrency: | |
| max: 32 | |
| ``` | |
| This combined with the async setup and predict methods in the predict.py allows Cog to run up to | |
| 32 concurrent predictions. If cog reaches the max concurrency threshold it will reject subsequent | |
| predictions with a `409 Conflict` response. | |
| ```yaml | |
| concurrency: | |
| max: 4 | |
| ``` | |
| This combined with the async setup and run methods in `run.py` allows Cog to run up to | |
| 4 concurrent predictions. If Cog reaches the max concurrency threshold it will reject subsequent | |
| predictions with a `409 Conflict` response. |
| It will then start sending events to the `cog-model` data source. You can configure this by | ||
| editing the `OTEL_SERVICE_NAME`. If you use a custom endpoint this can be configured via `OTEL_EXPORTER_OTLP_ENDPOINT`. | ||
|
|
||
| Lastly, there is a section in predict.py that can be uncommented to run telemetry locally and print events to the console for debugging. |
There was a problem hiding this comment.
One more stale filename reference from the predict.py -> run.py migration.
| Lastly, there is a section in predict.py that can be uncommented to run telemetry locally and print events to the console for debugging. | |
| Lastly, there is a section in `run.py` that can be uncommented to run telemetry locally and print events to the console for debugging. |
| cog push | ||
| ``` | ||
|
|
||
| This builds the model image and pushes it to the registry specified by `image:` |
There was a problem hiding this comment.
This still refers to the old image: field, but this PR changed the example config to use model:.
| This builds the model image and pushes it to the registry specified by `image:` | |
| This builds the model image and pushes it as the model named by `model:` |
| - Change `source.uri` to your HuggingFace repo (or a local path) | ||
| - Adjust `exclude` patterns for the formats you don't need | ||
| - Set `target` to wherever your code expects to find the weights | ||
| - Set `image` to your registry destination (required for `cog push`) |
There was a problem hiding this comment.
Same stale image: guidance here; users following this would edit a field the example no longer uses.
| - Set `image` to your registry destination (required for `cog push`) | |
| - Set `model` to your model name (required for `cog push`) |
| This model tells you what's in an image. It's a good example of a deep | ||
| learning model that's small enough to run without a GPU if you're demoing it. | ||
|
|
||
| Use this as a starting point for packaging a real model with managed weights. | ||
| It uses ResNet50 with the ImageNet weights that ship with torchvision, so | ||
| there are no weight files to download or import -- torchvision fetches them | ||
| the first time the model runs. Takes an image, returns the top-3 ImageNet |
There was a problem hiding this comment.
This description is internally inconsistent with the checked-in example: cog.yaml marks it GPU-enabled, and torchvision fetches/caches the ResNet checkpoint at setup rather than shipping it in the repo/image. Can we describe the actual runtime behavior?
| This model tells you what's in an image. It's a good example of a deep | |
| learning model that's small enough to run without a GPU if you're demoing it. | |
| Use this as a starting point for packaging a real model with managed weights. | |
| It uses ResNet50 with the ImageNet weights that ship with torchvision, so | |
| there are no weight files to download or import -- torchvision fetches them | |
| the first time the model runs. Takes an image, returns the top-3 ImageNet | |
| This model tells you what's in an image. It's configured as a GPU example in `cog.yaml`. | |
| It uses ResNet50 with ImageNet weights from torchvision. Torchvision fetches and caches the checkpoint the first time the model starts, so startup requires network access unless the checkpoint is already cached. Takes an image, returns the top-3 ImageNet |
markphelps
left a comment
There was a problem hiding this comment.
I think the coglet Secret input support should be pulled out into a separate PR. This PR is already large and is primarily about moving/modernizing examples; adding new runtime input coercion behavior changes the review surface and deserves its own focused tests/review.
For this PR, can we either:
- remove the coglet Secret changes and any example/test-harness dependency on them, or
- keep the hello-replicate example only if it works with existing released behavior / reads REPLICATE_API_TOKEN from env rather than requiring new coglet Secret coercion?
Then a follow-up PR can add Secret, Optional[Secret], and any list/edge-case behavior with dedicated coglet tests.
Moves the example models from the separate
replicate/cog-examplesrepo into this repo underexamples/, migrates them to theBaseRunner/runAPI, and addscog.Secretinput support to coglet.run.py/BaseRunner:blur,canary,hello-concurrency,hello-context,hello-image,hello-replicate,hello-train,hello-world,notebook,z-image-turbostreaming-textexample torun.py/BaseRunnerexamples/resnetwith the classic torchvision ResNet50 classifier (BaseRunner, ImageNet weights bundled with torchvision -- no managed weights or import step), ported and modernized fromreplicate/cog-examplesexamples/experimental/resnet-managed-weights(renamedmodel: resnet-managed-weights)managed-weightscog.yamlto use themodel:field (replacingimage: <your-registry>/...) and refresh itsweights.lockhello-replicatedemonstratesSecretusage by calling the Replicate APIcoglet
cog.Secretinput support: classify and coerceSecret,Optional[Secret], andSecret | Nonefields, wrapping string values incog.types.Secret(list[Secret]is intentionally not coerced)cargo-deny(RUSTSEC-2026-0176,RUSTSEC-2026-0177) with justification; pinned due to numpy/pyo3-async-runtimes lag, and neither code path is reachable in coglettest harness
base_dirfield forrepo: localmodels so local models resolve relative to the manifest (defaults tofixtures/models/)manifest.yamlat the localexamples/directory instead ofreplicate/cog-examplespredict.py/BasePredictortorun.py/BaseRunner--setup-timeouttocog predict(cog traindoes not support it)