Move examples from cog-examples to cog #3055

Open
anish-sahoo wants to merge 7 commits into main from update-examples

@anish-sahoo

@anish-sahoo anish-sahoo commented Jun 11, 2026

Collaborator

Moves the example models from the separate replicate/cog-examples repo into this repo under examples/, migrates them to the BaseRunner/run API, and adds cog.Secret input support to coglet.

  • Add example models using run.py/BaseRunner: blur, canary, hello-concurrency, hello-context, hello-image, hello-replicate, hello-train, hello-world, notebook, z-image-turbo
  • Migrate the existing streaming-text example to run.py/BaseRunner
  • Replace examples/resnet with the classic torchvision ResNet50 classifier (BaseRunner, ImageNet weights bundled with torchvision -- no managed weights or import step), ported and modernized from replicate/cog-examples
  • Move the previous managed-weights ResNet example to examples/experimental/resnet-managed-weights (renamed model: resnet-managed-weights)
  • Update managed-weights cog.yaml to use the model: field (replacing image: <your-registry>/...) and refresh its weights.lock
  • hello-replicate demonstrates Secret usage by calling the Replicate API

coglet

  • Add cog.Secret input support: classify and coerce Secret, Optional[Secret], and Secret | None fields, wrapping string values in cog.types.Secret (list[Secret] is intentionally not coerced)
  • Ignore two pyo3 0.27 advisories in cargo-deny (RUSTSEC-2026-0176, RUSTSEC-2026-0177) with justification; pinned due to numpy/pyo3-async-runtimes lag, and neither code path is reachable in coglet
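The coercion rule described above can be sketched in Python. This is only an illustration under stated assumptions: the real logic lives in coglet's Rust input handling, and the `Secret` class, `is_secret_annotation`, and `coerce_input` below are hypothetical stand-ins, not the cog API.

```python
import types
from dataclasses import dataclass
from typing import Any, Optional, Union, get_args, get_origin


@dataclass(frozen=True)
class Secret:
    """Hypothetical stand-in for cog.types.Secret."""
    secret_value: Optional[str] = None


# typing.Union covers Optional[X]; types.UnionType covers `X | None` on Python 3.10+.
UNION_ORIGINS = (Union,) + ((types.UnionType,) if hasattr(types, "UnionType") else ())


def is_secret_annotation(annotation: Any) -> bool:
    """True for Secret, Optional[Secret], and Secret | None; False for list[Secret]."""
    if annotation is Secret:
        return True
    if get_origin(annotation) in UNION_ORIGINS:
        non_none = [arg for arg in get_args(annotation) if arg is not type(None)]
        return non_none == [Secret]
    return False


def coerce_input(annotation: Any, value: Any) -> Any:
    """Wrap plain strings in Secret when the field is Secret-typed; pass others through."""
    if is_secret_annotation(annotation) and isinstance(value, str):
        return Secret(value)
    return value
```

Under these assumptions, `coerce_input(Optional[Secret], "token")` returns a wrapped value, while a `list[Secret]` field is passed through unchanged, mirroring the intentional exclusion noted in the description.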

test harness

  • Add a base_dir field for repo: local models so local models resolve relative to the manifest (defaults to fixtures/models/)
  • Point all example models in manifest.yaml at the local examples/ directory instead of replicate/cog-examples
  • Migrate fixture models from predict.py/BasePredictor to run.py/BaseRunner
  • Only pass --setup-timeout to cog predict (cog train does not support it)
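As a rough sketch of the `base_dir` behavior described above: a `repo: local` entry resolves against the directory containing the manifest, falling back to `fixtures/models/`. The harness itself is not Python, and `resolve_local_model_dir` is a hypothetical name used only for illustration.

```python
import os
from pathlib import Path

DEFAULT_BASE_DIR = "fixtures/models"  # default named in the PR description


def resolve_local_model_dir(manifest_path: str, entry: dict) -> Path:
    """Resolve a `repo: local` manifest entry relative to the manifest's directory."""
    if entry.get("repo") != "local":
        raise ValueError("only `repo: local` entries are resolved on disk")
    manifest_dir = Path(manifest_path).parent
    base_dir = entry.get("base_dir", DEFAULT_BASE_DIR)
    # normpath collapses ".." segments without touching the filesystem.
    return Path(os.path.normpath(manifest_dir / base_dir / entry["path"]))
```

With the manifest at `tools/test-harness/manifest.yaml`, an entry with `base_dir: ../../examples` and `path: managed-weights` would resolve to `examples/managed-weights` in this sketch.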

Comment thread examples/z-image-turbo/run.py Fixed
Comment thread examples/hello-concurrency/run.py Fixed

@ask-bonk ask-bonk Bot left a comment

Contributor

I'm Bonk, and I've done a quick review of your PR.

This PR moves example models from the external cog-examples repo into the main cog repo and updates the test harness to support base_dir for local models. Good cleanup overall.

I found two actionable issues:

  1. HIGH: The managed-weights weights.lock was regenerated from incomplete/stub weights. The parakeet model.safetensors is listed as 135 bytes instead of ~2.5GB.
  2. MEDIUM: The hello-train model lost its train_tests entry in the manifest, removing training test coverage even though the example still supports training.

Comment thread examples/managed-weights/weights.lock Outdated
Comment thread tools/test-harness/manifest.yaml
@ask-bonk

ask-bonk Bot commented Jun 11, 2026

Contributor

I'm Bonk, and I've done a quick review of your PR.

Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.

Issues found:

  1. HIGH: examples/managed-weights/weights.lock was regenerated from incomplete/stub weights. The parakeet model.safetensors is listed as 135 bytes instead of ~2.5GB, and the total size dropped from ~2.5GB to ~1.1MB. Please regenerate from the actual weights or revert this file.

  2. MEDIUM: hello-train lost its train_tests entry in tools/test-harness/manifest.yaml even though the example still contains train.py and a train: field in cog.yaml. This removes training test coverage. I've posted a suggestion to restore it.

github run

@ask-bonk ask-bonk Bot left a comment

Contributor

I'm Bonk, and I've done a quick review of your PR.

Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.

Issues found:

  1. HIGH: examples/managed-weights/weights.lock was regenerated from incomplete/stub weights. The parakeet model.safetensors is listed as 135 bytes instead of ~2.5GB, and the total size dropped from ~2.5GB to ~1.1MB. Please regenerate from the actual weights or revert this file.

  2. MEDIUM: hello-train lost its train_tests entry in tools/test-harness/manifest.yaml even though the example still contains train.py and a train: field in cog.yaml. This removes training test coverage. I've posted a suggestion to restore it.

  3. MEDIUM: examples/z-image-turbo/run.py uses tempfile.mktemp(), which creates a predictable temporary file path (flagged by CodeQL). Replace with tempfile.NamedTemporaryFile(delete=False) for safety.

  4. MEDIUM: examples/hello-concurrency/run.py has an empty except FileNotFoundError: pass block. Add a logging statement to document the intent.
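Issues 3 and 4 could be addressed along these lines. The actual contents of the two `run.py` files are not shown here, so `make_output_path` and `remove_if_present` are hypothetical helpers that only illustrate the suggested pattern.

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def make_output_path(suffix: str = ".png") -> str:
    """Create the temp file up front and return its path, instead of tempfile.mktemp()."""
    # delete=False keeps the file on disk after the handle closes,
    # so the caller can write to it and return it as output.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        return handle.name


def remove_if_present(path: str) -> None:
    """Delete a file, logging instead of silently passing when it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        logger.debug("%s was already removed; nothing to clean up", path)
```

Because the file exists as soon as its name is known, another process cannot claim the path first, which is the race that `mktemp()` leaves open.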

Comment thread examples/managed-weights/weights.lock
Comment thread tools/test-harness/manifest.yaml
Comment thread examples/z-image-turbo/run.py Outdated
Comment thread examples/hello-concurrency/run.py
@ask-bonk ask-bonk Bot left a comment

Contributor

I'm Bonk, and I've done a quick review of your PR.

Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.

Issues found:

  1. MEDIUM: examples/blur/run.py returns the built-in input function instead of the image parameter when blur == 0. This is a logic bug.
  2. MEDIUM: hello-train is still missing its train_tests entry in tools/test-harness/manifest.yaml even though the example contains train.py and a train: field in cog.yaml. This removes training test coverage.
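The first issue is easy to miss because `input` is always defined in Python. The sketch below reproduces the class of bug with plain strings; the function names and signature are assumptions, since the real `examples/blur/run.py` is not shown in this thread.

```python
def run_buggy(image: str, blur: float) -> object:
    """Reproduces the reported bug: the zero-blur branch returns the wrong name."""
    if blur == 0:
        return input  # the built-in function, not the `image` parameter
    return f"blurred({image}, radius={blur})"  # stand-in for the real blur call


def run_fixed(image: str, blur: float) -> str:
    """Returns the caller's image unchanged when no blur is requested."""
    if blur == 0:
        return image
    return f"blurred({image}, radius={blur})"  # stand-in for the real blur call
```

No `NameError` is raised in the buggy version, so the mistake only surfaces when the output is serialized or inspected.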

Comment thread examples/blur/run.py Outdated
Comment thread tools/test-harness/manifest.yaml
@ask-bonk

ask-bonk Bot commented Jun 12, 2026

Contributor

I'm Bonk, and I've done a quick review of your PR.

Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.

Issues found:

  1. MEDIUM: examples/blur/run.py returns the built-in input function instead of the image parameter when blur == 0. This is a logic bug.
  2. MEDIUM: hello-train is still missing its train_tests entry in tools/test-harness/manifest.yaml even though the example contains train.py and a train: field in cog.yaml. This removes training test coverage.

Both issues have suggestions posted as review comments.

github run

@ask-bonk ask-bonk Bot left a comment

Contributor

I'm Bonk, and I've done a quick review of your PR.

Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.

Issues found:

  1. MEDIUM: hello-train is still missing its train_tests entry in tools/test-harness/manifest.yaml even though the example contains train.py and a train: field in cog.yaml. The test harness still fully supports train_tests (see manifest.go and runner.go), and there is no visible replacement mechanism or deprecation marker in the codebase. Removing the only training test without a documented alternative creates a coverage gap.

Comment thread tools/test-harness/manifest.yaml
@ask-bonk

ask-bonk Bot commented Jun 12, 2026

Contributor

I'm Bonk, and I've done a quick review of your PR.

Summary: Moves example models from cog-examples into the main repo and updates the test harness to support base_dir for local models.

Issues found:

  1. MEDIUM: hello-train is still missing its train_tests entry in tools/test-harness/manifest.yaml even though the example contains train.py and a train: field in cog.yaml. The test harness still fully supports train_tests (see manifest.go and runner.go), and there is no visible replacement mechanism or deprecation marker in the codebase. Removing the only training test without a documented alternative creates a coverage gap.

Posted as review comment on PR #3055.

github run

@ask-bonk ask-bonk Bot left a comment

Contributor

I'm Bonk, and I've done a quick review of your PR.

Summary: Moves example models from cog-examples into the main repo, updates the test harness for base_dir support, and adds Secret type coercion in coglet.

Issues found:

  1. HIGH: The new Secret coercion in crates/coglet-python/src/input.rs calls Secret(value) in Python, but the Python Secret class in python/cog/types.py does not define an __init__ that accepts arguments. This will raise TypeError: Secret() takes no arguments at runtime when any predictor declares a Secret-typed input.

  2. MEDIUM: hello-train is still missing its train_tests entry in tools/test-harness/manifest.yaml even though the example contains train.py and a train: field in cog.yaml. The test harness still processes train_tests (see manifest.go and runner.go), and there is no deprecation marker or replacement mechanism visible in the code. Removing the only training test creates a coverage gap.

Comment thread crates/coglet-python/src/input.rs
Comment thread tools/test-harness/manifest.yaml
@ask-bonk

ask-bonk Bot commented Jun 15, 2026

Contributor

Done. I posted a review on PR #3055 with two actionable issues:

  1. HIGH: The new Secret coercion in crates/coglet-python/src/input.rs:288 calls Secret(value), but Python's Secret class has no __init__ and will raise TypeError. I suggested creating the instance with call0() and setting the secret_value attribute directly.

  2. MEDIUM: hello-train still lacks train_tests coverage despite having train.py and a train: field, with no deprecation marker in the codebase.
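The failure mode in the first issue, and the suggested workaround, can be shown in pure Python. `BareSecret` is a hypothetical stand-in; whether the real `Secret` class in python/cog/types.py actually lacks a constructor that accepts arguments is the review's claim, not something this sketch verifies.

```python
class BareSecret:
    """Hypothetical stand-in: a class with an attribute but no custom __init__."""

    secret_value = None


def wrap_with_constructor(value: str) -> BareSecret:
    """Mirrors Secret(value): raises TypeError because the class accepts no arguments."""
    return BareSecret(value)


def wrap_with_attribute(value: str) -> BareSecret:
    """Mirrors the suggested fix: construct with no arguments, then set the attribute."""
    secret = BareSecret()  # analogous to calling the class with call0() from Rust
    secret.secret_value = value
    return secret
```

If the real class does accept a positional value, the original `Secret(value)` call would work and the workaround would be unnecessary, so the claim is worth checking against python/cog/types.py before changing the coercion.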

github run

- Document list[Secret] coercion exclusion in coglet input handling
- Fix stale 'predict.py' comments in example cog.yaml files
- Correct hello-train README to use 'cog predict'
- Add trailing newlines to example cog.yaml files
- Only pass --setup-timeout for 'cog predict' (not 'cog train')
- Re-add train_tests for hello-train
@anish-sahoo anish-sahoo marked this pull request as ready for review June 15, 2026 18:13
@anish-sahoo anish-sahoo requested a review from a team as a code owner June 15, 2026 18:13
@ask-bonk

ask-bonk Bot commented Jun 15, 2026

Contributor

@anish-sahoo Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.

@markphelps markphelps left a comment

Collaborator

Non-Secret review comments from a focused pass on the example migration and test harness. I'm leaving the Secret/coglet scope comments for a separate review.

repo: local
base_dir: ../../examples
path: managed-weights
gpu: false

Collaborator

This makes managed-weights part of the default CPU harness run, but that example depends on manually imported/git-ignored weights. A fresh contributor running the default manifest can fail before they have populated those weights. If we want to keep this in the manifest, can we gate it explicitly?

Suggested change
gpu: false
gpu: false
requires_env:
- COG_MANAGED_WEIGHTS_READY

Alternatively, remove it from the default manifest or replace it with a lightweight managed-weights fixture using checked-in test data.
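One way the explicit gate could behave is sketched below. It assumes that `requires_env` lists environment variables that must all be set and non-empty for the entry to run; the semantics of that field in the real harness are not confirmed in this thread, and `missing_env` and `should_run` are hypothetical names.

```python
import os
from typing import List, Mapping, Optional, Sequence


def missing_env(required: Sequence[str], environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


def should_run(entry: dict, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Run an entry only when every variable in its `requires_env` list is present."""
    return not missing_env(entry.get("requires_env", []), environ)
```

Entries without `requires_env` always run in this sketch, so a fresh checkout would skip only the managed-weights model until the contributor opts in.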

@@ -0,0 +1,3 @@
## hello-context

A simple model that takes no inputs but will echo back any context provided with the prediction as the output.

Collaborator

This README says the model takes no inputs, but run.py requires a text input and the harness supplies one. Can we make the README match the actual example?

Suggested change
A simple model that takes no inputs but will echo back any context provided with the prediction as the output.
A simple model that echoes the `text` input and any prediction context in the output.

Comment on lines +8 to +15
```yaml
concurrency:
max: 32
```

This combined with the async setup and predict methods in the predict.py allows Cog to run up to
32 concurrent predictions. If cog reaches the max concurrency threshold it will reject subsequent
predictions with a `409 Conflict` response.

@markphelps markphelps Jun 15, 2026

Collaborator

This snippet/prose is stale after the migration: cog.yaml has max: 4, and the implementation is run.py:Runner.run rather than predict.py.

Suggested change
```yaml
concurrency:
max: 32
```
This combined with the async setup and predict methods in the predict.py allows Cog to run up to
32 concurrent predictions. If cog reaches the max concurrency threshold it will reject subsequent
predictions with a `409 Conflict` response.
```yaml
concurrency:
max: 4
```
This combined with the async setup and run methods in `run.py` allows Cog to run up to
4 concurrent predictions. If Cog reaches the max concurrency threshold it will reject subsequent
predictions with a `409 Conflict` response.
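To make the described behavior concrete, here is a small asyncio sketch of an admission gate that admits up to `max` in-flight calls and answers the rest with 409. It is not how Cog implements concurrency; `ConcurrencyGate` and `demo` are hypothetical and only model the behavior the README text describes.

```python
import asyncio
from typing import List, Tuple

MAX_CONCURRENCY = 4  # matches `concurrency.max` in the suggested README text


class ConcurrencyGate:
    """Hypothetical gate: admit up to `limit` in-flight calls, reject the rest."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0

    async def handle(self, text: str) -> Tuple[int, str]:
        if self.in_flight >= self.limit:
            return 409, "Conflict"
        self.in_flight += 1
        try:
            await asyncio.sleep(0.01)  # stand-in for awaiting an async run() method
            return 200, f"hello {text}"
        finally:
            self.in_flight -= 1


async def demo() -> List[int]:
    """Fire six overlapping requests and return their status codes in order."""
    gate = ConcurrencyGate(MAX_CONCURRENCY)
    results = await asyncio.gather(*(gate.handle(str(i)) for i in range(6)))
    return [status for status, _ in results]
```

Running `asyncio.run(demo())` should give four 200 statuses followed by two 409s, because the first four requests are still awaiting when the last two arrive.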

It will then start sending events to the `cog-model` data source. You can configure this by
editing the `OTEL_SERVICE_NAME`. If you use a custom endpoint this can be configured via `OTEL_EXPORTER_OTLP_ENDPOINT`.

Lastly, there is a section in predict.py that can be uncommented to run telemetry locally and print events to the console for debugging.

Collaborator

One more stale filename reference from the predict.py -> run.py migration.

Suggested change
Lastly, there is a section in predict.py that can be uncommented to run telemetry locally and print events to the console for debugging.
Lastly, there is a section in `run.py` that can be uncommented to run telemetry locally and print events to the console for debugging.

```
cog push
```

This builds the model image and pushes it to the registry specified by `image:`

Collaborator

This still refers to the old image: field, but this PR changed the example config to use model:.

Suggested change
This builds the model image and pushes it to the registry specified by `image:`
This builds the model image and pushes it as the model named by `model:`

- Change `source.uri` to your HuggingFace repo (or a local path)
- Adjust `exclude` patterns for the formats you don't need
- Set `target` to wherever your code expects to find the weights
- Set `image` to your registry destination (required for `cog push`)

Collaborator

Same stale image: guidance here; users following this would edit a field the example no longer uses.

Suggested change
- Set `image` to your registry destination (required for `cog push`)
- Set `model` to your model name (required for `cog push`)

Comment thread examples/resnet/README.md
Comment on lines +3 to +8
This model tells you what's in an image. It's a good example of a deep
learning model that's small enough to run without a GPU if you're demoing it.

Use this as a starting point for packaging a real model with managed weights.
It uses ResNet50 with the ImageNet weights that ship with torchvision, so
there are no weight files to download or import -- torchvision fetches them
the first time the model runs. Takes an image, returns the top-3 ImageNet

Collaborator

This description is internally inconsistent with the checked-in example: cog.yaml marks it GPU-enabled, and torchvision fetches/caches the ResNet checkpoint at setup rather than shipping it in the repo/image. Can we describe the actual runtime behavior?

Suggested change
This model tells you what's in an image. It's a good example of a deep
learning model that's small enough to run without a GPU if you're demoing it.
Use this as a starting point for packaging a real model with managed weights.
It uses ResNet50 with the ImageNet weights that ship with torchvision, so
there are no weight files to download or import -- torchvision fetches them
the first time the model runs. Takes an image, returns the top-3 ImageNet
This model tells you what's in an image. It's configured as a GPU example in `cog.yaml`.
It uses ResNet50 with ImageNet weights from torchvision. Torchvision fetches and caches the checkpoint the first time the model starts, so startup requires network access unless the checkpoint is already cached. Takes an image, returns the top-3 ImageNet

@markphelps markphelps left a comment

Collaborator

I think the coglet Secret input support should be pulled out into a separate PR. This PR is already large and is primarily about moving/modernizing examples; adding new runtime input coercion behavior changes the review surface and deserves its own focused tests/review.

For this PR, can we either:

  1. remove the coglet Secret changes and any example/test-harness dependency on them, or
  2. keep the hello-replicate example only if it works with existing released behavior / reads REPLICATE_API_TOKEN from env rather than requiring new coglet Secret coercion?

Then a follow-up PR can add Secret, Optional[Secret], and any list/edge-case behavior with dedicated coglet tests.
