
feat(testing): add public testing surface under nemoguardrails.testing #1860

Open

Pouyanpi wants to merge 3 commits into develop from feat/public-testing-surface

Conversation

@Pouyanpi Pouyanpi (Collaborator) commented May 7, 2026

Description

Promote FakeLLMModel and TestChat from tests/utils.py to a public nemoguardrails.testing subpackage so downstream users can test their guardrails configurations without copying internal test helpers (dogfooding).

  • nemoguardrails.testing.fake_model.FakeLLMModel: framework-agnostic fake LLM that implements the LLMModel protocol.
  • nemoguardrails.testing.test_chat.TestChat: ergonomic helper for user/bot conversation assertions.
  • nemoguardrails.testing.fixtures: opt-in pytest plugin exposing fake_llm, make_fake_llm, and make_test_chat fixtures.

tests/utils.py is reduced to a thin compatibility shim that re-exports from the new package, so the 100+ existing tests that import from there continue to work unchanged.

Adds tests/testing/test_public_surface.py covering the new exports and fixtures, plus docs/user-guides/testing-your-config.md describing the three supported usage patterns.

Related Issue(s)

testing subpackage referenced in #1857

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced testing utilities for NeMo Guardrails configurations, including deterministic fake LLM support and ergonomic multi-turn conversation testing.
    • Added pytest fixtures and plugin integration for seamless test setup and assertions.
  • Documentation

    • Added comprehensive user guide on testing configurations, covering error handling, streaming validation, structured outputs, and fixture-based testing approaches.

@Pouyanpi Pouyanpi added this to the v0.22.0 milestone May 7, 2026
@Pouyanpi Pouyanpi self-assigned this May 7, 2026
@Pouyanpi Pouyanpi added the enhancement New feature or request label May 7, 2026
@github-actions Bot (Contributor) commented May 7, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1860

@greptile-apps Bot (Contributor) commented May 7, 2026

Greptile Summary

This PR promotes FakeLLMModel and TestChat from tests/utils.py into a new public nemoguardrails.testing subpackage, and reduces tests/utils.py to a thin compatibility shim so the 100+ existing internal tests continue to work unchanged. Several previously reported issues are addressed in the new code: stream_async now safely handles content=None via `or ""`, asyncio.sleep is reduced to 0, the llm_exception-only constructor path now correctly instantiates FakeLLMModel, and TestChat.__init__ narrows its config annotation to RailsConfig.

  • New nemoguardrails.testing package: fake_model.py, test_chat.py, and fixtures.py (opt-in pytest plugin) form the public surface; __init__.py re-exports both classes and declares __all__.
  • Backwards-compatible shim: tests/utils.py now simply imports from the new package, keeping all existing test imports valid without changes.
  • New test and docs: tests/testing/test_public_surface.py exercises imports, ordering, streaming, round-trips, and all three fixtures; docs/user-guides/testing-your-config.md covers all documented usage patterns.

Confidence Score: 5/5

Safe to merge; adds a new public testing subpackage while keeping all existing internal test imports working via a compatibility shim.

The core functional bugs raised in earlier rounds are all addressed in this iteration. The new package has no side effects on the runtime path and the shim preserves full backward compatibility for the existing test suite.

Remaining minor items: nemoguardrails/testing/fake_model.py for the self.responses initialization, and nemoguardrails/testing/fixtures.py for the Union[str, RailsConfig] annotation mismatch (previously flagged).

Important Files Changed

  • nemoguardrails/testing/fake_model.py: New FakeLLMModel with fixed stream_async (content or ""), asyncio.sleep(0), and generate_async copy. One minor invariant issue: self.responses populated from llm_responses.content can introduce None entries.
  • nemoguardrails/testing/test_chat.py: TestChat promoted to the public surface; the llm_exception-only path now correctly creates FakeLLMModel, and the config parameter is narrowed to RailsConfig only.
  • nemoguardrails/testing/fixtures.py: Pytest fixtures for FakeLLMModel and TestChat. The _factory config parameter is still typed Union[str, RailsConfig] while TestChat only accepts RailsConfig (previously flagged).
  • tests/utils.py: Reduced to a thin compatibility shim re-exporting FakeLLMModel and TestChat; backwards-compatible.
  • tests/testing/test_public_surface.py: New test suite covering public-surface imports, shim compatibility, FakeLLMModel ordering/streaming, TestChat round-trip, and all three fixtures.
Prompt To Fix All With AI
Fix the following code review issue, proposing a concise fix.

---

### Issue 1 of 1
nemoguardrails/testing/fake_model.py:69
`self.responses` is silently populated with `None` entries when `llm_responses` is the source of truth. When any `LLMResponse` in `llm_responses` has `content=None` (e.g. a tool-call-only response), the list comprehension produces `[None, ...]`. Downstream code that inspects `fake.responses` relies on this attribute containing valid strings. A cleaner invariant is to leave `self.responses` empty when `llm_responses` is provided so that callers who inspect it for plain-string debugging don't see a misleading mix of strings and Nones.

```suggestion
        self.responses = responses if responses is not None else []
```

Reviews (4): Last reviewed commit: "apply review suggestions"

@coderabbitai Bot commented May 7, 2026

📝 Walkthrough

This PR introduces a new public testing package nemoguardrails.testing containing FakeLLMModel for deterministic scripted LLM responses and TestChat for ergonomic conversation testing. Both utilities support Colang v1 and v2, are exposed via pytest fixtures, and are documented in a new user guide. The implementations are migrated from tests/utils.py to establish them as a stable public API.

Changes

Testing Public Surface

  • Public API Definition (nemoguardrails/testing/__init__.py): Defines the nemoguardrails.testing package public surface with __all__ = ["FakeLLMModel", "TestChat"] and an Apache-2.0 license header.
  • FakeLLMModel Implementation (nemoguardrails/testing/fake_model.py): Implements a deterministic fake LLM with scripted responses or llm_responses, optional token usage, exception injection, and streaming via LLMResponseChunk chunks. Tracks call order and raises on exhaustion.
  • TestChat Implementation (nemoguardrails/testing/test_chat.py): Implements an ergonomic conversation-testing helper supporting both Colang v1 (history-based, generate) and v2 (event-driven, process_events). Provides user(), bot(), and bot_async() methods and operator overloads (>>, <<).
  • Pytest Fixtures & Integration (nemoguardrails/testing/fixtures.py): Exposes the test utilities as pytest fixtures: fake_llm (preconfigured with "Hello!"), make_fake_llm (factory), and make_test_chat (factory accepting a RailsConfig or path).
  • User Guide Documentation (docs/user-guides/index.rst, docs/user-guides/testing-your-config.md): Documents testing strategies including FakeLLMModel, TestChat, structured responses via llm_responses, streaming via stream_async, and pytest fixture opt-in, with examples.
  • Test Infrastructure & Validation (tests/testing/__init__.py, tests/testing/conftest.py, tests/testing/test_public_surface.py): Establishes the test package, enables nemoguardrails.testing.fixtures via pytest_plugins, and validates the public surface through export checks, FakeLLMModel behavior (ordering, streaming, exceptions), TestChat round-trip assertions, and fixture factory tests.
  • Migration from tests/utils.py (tests/utils.py): Removes the FakeLLMModel and TestChat implementations; reduces imports and updates __all__ to maintain backward-compatible shims while preserving the event helper functions and _init_state.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 16.13%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and concisely describes the main change: introduction of a public testing surface under the nemoguardrails.testing package.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Test Results For Major Changes: ✅ Passed. This major feature PR includes comprehensive testing: 9 test functions covering FakeLLMModel, TestChat, and the fixtures, plus a 294-line user guide with examples.



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/user-guides/testing-your-config.md (1)

295-295: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add missing newline at end of file.

The CI pipeline failed because the end-of-file-fixer pre-commit hook detected a missing trailing newline. Add a blank line after the closing code fence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user-guides/testing-your-config.md` at line 295, The file
docs/user-guides/testing-your-config.md is missing a trailing newline after the
closing code fence; open the file and add a single blank line (a newline
character) after the final ``` closing fence so the file ends with a newline to
satisfy the end-of-file-fixer hook.
🧹 Nitpick comments (1)
tests/utils.py (1)

28-40: 💤 Low value

Good backward-compatibility shim.

Re-exporting from the new canonical locations preserves existing imports across 100+ tests.

Static analysis flags that __all__ is unsorted. Consider sorting alphabetically for consistency:

♻️ Optional: Sort __all__
 __all__ = [
     "FakeLLMModel",
     "TestChat",
+    "any_event_conforms",
     "clean_events",
     "event_conforms",
     "event_sequence_conforms",
-    "any_event_conforms",
     "is_data_in_events",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/utils.py` around lines 28 - 40, The __all__ export list is unsorted;
please alphabetize the string entries in the __all__ list (e.g., reorder items
so "FakeLLMModel", "TestChat", "any_event_conforms", "clean_events",
"event_conforms", "event_sequence_conforms", "is_data_in_events" or whichever
canonical alphabetical order you choose) to keep exports consistent and satisfy
static analysis — update the __all__ variable in tests/utils.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoguardrails/testing/fake_model.py`:
- Around line 118-121: The stream_async implementation unconditionally splits
response.content which will crash if a scripted LLMResponse is tool-call-only
(content is None/non-string); update stream_async to guard the value returned by
_next_response(): check response.content is a non-empty string (e.g.,
isinstance(response.content, str) and response.content) before calling .split("
"), and otherwise treat it as no-text (set chunks = [] or an equivalent no-op
stream) so the method safely handles tool-call-only responses; keep references
to stream_async and _next_response when making the change.

In `@nemoguardrails/testing/test_chat.py`:
- Around line 34-35: The zip of events and event_data should be explicit about
length equality to satisfy Ruff B905: update the loop `for event, data in
zip(events, event_data):` to call zip with strict=True (i.e., `zip(events,
event_data, strict=True)`) so it will raise if lengths differ; this is safe
because length equality is already being checked and makes the intent explicit
for the variables events and event_data.
- Around line 61-63: The constructor parameter config for TestChat is typed
Union[str, RailsConfig] but later code accesses config.models and
self.config.colang_version; normalize config to a RailsConfig at the start of
__init__ by checking isinstance(config, str) and calling
RailsConfig.from_path(config) (or leave as-is if already a RailsConfig) so that
self.config is always a RailsConfig before any access; update the __init__ code
paths that reference self.config.models and self.config.colang_version to rely
on this normalized self.config.
- Around line 159-169: The test is enqueuing the same UtteranceBotActionStarted
event twice, causing duplicate state transitions; in the method where
self.input_events is appended (look for the block that calls
new_event_dict("UtteranceBotActionStarted", action_uid=event["action_uid"])
inside the test helper in nemoguardrails/testing/test_chat.py), remove the
duplicated append so the UtteranceBotActionStarted event is only added once to
self.input_events (keep a single call that constructs
new_event_dict("UtteranceBotActionStarted", action_uid=event["action_uid"]) and
delete the second identical append).


📥 Commits

Reviewing files that changed from the base of the PR and between c69efe5 and dd310e7.

📒 Files selected for processing (10)
  • docs/user-guides/index.rst
  • docs/user-guides/testing-your-config.md
  • nemoguardrails/testing/__init__.py
  • nemoguardrails/testing/fake_model.py
  • nemoguardrails/testing/fixtures.py
  • nemoguardrails/testing/test_chat.py
  • tests/testing/__init__.py
  • tests/testing/conftest.py
  • tests/testing/test_public_surface.py
  • tests/utils.py

@codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 91.13924% with 14 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
nemoguardrails/testing/test_chat.py 83.75% 13 Missing ⚠️
nemoguardrails/testing/fake_model.py 98.21% 1 Missing ⚠️


Pouyanpi added 2 commits May 7, 2026 12:49
Promote FakeLLMModel and TestChat from tests/utils.py to a public
nemoguardrails.testing subpackage so downstream users can test their
guardrails configurations without copying internal test helpers.

- nemoguardrails.testing.fake_model.FakeLLMModel: framework-agnostic
  fake LLM that implements the LLMModel protocol.
- nemoguardrails.testing.test_chat.TestChat: ergonomic helper for
  user/bot conversation assertions.
- nemoguardrails.testing.fixtures: opt-in pytest plugin exposing
  fake_llm, make_fake_llm, and make_test_chat fixtures.

tests/utils.py is reduced to a thin compatibility shim that re-exports
from the new package, so the 100+ existing tests that import from there
continue to work unchanged.

Adds tests/testing/test_public_surface.py covering the new exports and
fixtures, plus docs/user-guides/testing-your-config.md describing the
three supported usage patterns.
@Pouyanpi Pouyanpi force-pushed the feat/public-testing-surface branch from c27e8e4 to dd2b046 on May 7, 2026 at 10:49
@tgasser-nv tgasser-nv (Collaborator) left a comment
Looks good! Mostly nits and cleanups with some renaming needed before merging.

At a higher-level, what's the purpose of making these test fixtures more publicly accessible? Is it for external contributors to write their own LLMModel-compliant classes?

"""

def _factory(
config: Union[str, RailsConfig],

The config here has type Union[str, RailsConfig], but it's passed into the TestChat init, which expects a RailsConfig only. If you want to support string configs, the string needs to be detected and converted to a RailsConfig before creating the TestChat.


"""

__test__ = False

I know this removes the file from pytest's automatic test discovery. But anyone using this would expect a test_chat.py file to contain unit tests for chat. Recommend renaming to anything without a Test prefix: ChatHarness, or maybe FakeChat (for symmetry with FakeLLM)?

else:
self._llm_responses = []
self.responses = responses or [response.content for response in self._llm_responses]
self.i = 0

Can you rename i to something more descriptive, like inference_count? On a public interface it's not obvious from the name alone what this is counting.

config: RailsConfig,
llm_completions: Optional[List[str]] = None,
streaming: bool = False,
llm_exception: Optional[Exception] = None,

nit: Should this be renamed to exception to match the `exception` field in fake_model.py? They're both the same type, and it gets passed into the FakeLLMModel init as-is without any changes.

streaming: bool = False,
llm_exception: Optional[Exception] = None,
token_usage: Optional[List[Dict[str, int]]] = None,
llm: Optional[Any] = None,

Should this be typed FakeLLMModel rather than Any? It gets assigned to self.llm, which has type FakeLLMModel on line 88.

content = chunk + " " if chunk_index < len(chunks) - 1 else chunk
await asyncio.sleep(0)
yield LLMResponseChunk(delta_content=content)
await asyncio.sleep(0)

Could you re-add the comment about why the final sleep is needed (from here)?

from nemoguardrails.testing.fake_model import FakeLLMModel
from nemoguardrails.testing.test_chat import TestChat

__all__ = ["FakeLLMModel", "TestChat"]

Could you add the other fixtures here too, so import * works? (["fake_llm", "make_fake_llm", "make_test_chat"])

"""
if llm is not None:
self.llm = llm
elif llm_completions is not None or llm_exception is not None:

Could you add a unit-test where llm_completions is None but llm_exception isn't to check the second part of this elif?


@pytest.mark.asyncio
async def test_fake_llm_model_streaming_yields_chunks():
llm = FakeLLMModel(responses=["one two three"])

This llm object has streaming set to False (the default). As in earlier comment, either remove the streaming arg and let clients call stream_async() or generate_async() however they want, or keep it and raise if they try to stream when it's not enabled

@@ -0,0 +1,110 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

nit: Is there a better name for this file? Public Surface doesn't make much sense. Maybe test_fake_fixtures.py (hints at FakeLLM or FakeChat if it's renamed to that) or something similar
