
feat(testing): add public testing surface under nemoguardrails.testing #1860

Open

Pouyanpi wants to merge 3 commits into develop from feat/public-testing-surface

Conversation

@Pouyanpi Pouyanpi (Collaborator) commented May 7, 2026

Description

Promote FakeLLMModel and TestChat from tests/utils.py to a public nemoguardrails.testing subpackage so downstream users can test their guardrails configurations without copying internal test helpers (dogfooding).

  • nemoguardrails.testing.fake_model.FakeLLMModel: framework-agnostic fake LLM that implements the LLMModel protocol.
  • nemoguardrails.testing.test_chat.TestChat: ergonomic helper for user/bot conversation assertions.
  • nemoguardrails.testing.fixtures: opt-in pytest plugin exposing fake_llm, make_fake_llm, and make_test_chat fixtures.

tests/utils.py is reduced to a thin compatibility shim that re-exports from the new package, so the 100+ existing tests that import from there continue to work unchanged.

Adds tests/testing/test_public_surface.py covering the new exports and fixtures, plus docs/user-guides/testing-your-config.md describing the three supported usage patterns.

Related Issue(s)

testing subpackage referenced in #1857

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced testing utilities for NeMo Guardrails configurations, including deterministic fake LLM support and ergonomic multi-turn conversation testing.
    • Added pytest fixtures and plugin integration for seamless test setup and assertions.
  • Documentation

    • Added comprehensive user guide on testing configurations, covering error handling, streaming validation, structured outputs, and fixture-based testing approaches.

@Pouyanpi Pouyanpi added this to the v0.22.0 milestone May 7, 2026
@Pouyanpi Pouyanpi self-assigned this May 7, 2026
@Pouyanpi Pouyanpi added the enhancement New feature or request label May 7, 2026
@github-actions Bot (Contributor) commented May 7, 2026

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1860

@greptile-apps Bot (Contributor) commented May 7, 2026

Greptile Summary

This PR promotes FakeLLMModel and TestChat from tests/utils.py into a new public nemoguardrails.testing subpackage, and reduces tests/utils.py to a thin compatibility shim so the 100+ existing internal tests continue to work unchanged. Several previously reported issues are addressed in the new code: stream_async now safely handles content=None via `or ""`, asyncio.sleep is reduced to 0, the llm_exception-only constructor path now correctly instantiates FakeLLMModel, and TestChat.__init__ narrows its config annotation to RailsConfig.

  • New nemoguardrails.testing package: fake_model.py, test_chat.py, and fixtures.py (opt-in pytest plugin) form the public surface; __init__.py re-exports both classes and declares __all__.
  • Backwards-compatible shim: tests/utils.py now simply imports from the new package, keeping all existing test imports valid without changes.
  • New test and docs: tests/testing/test_public_surface.py exercises imports, ordering, streaming, round-trips, and all three fixtures; docs/user-guides/testing-your-config.md covers all documented usage patterns.

Confidence Score: 5/5

Safe to merge; adds a new public testing subpackage while keeping all existing internal test imports working via a compatibility shim.

The core functional bugs raised in earlier rounds are all addressed in this iteration. The new package has no side effects on the runtime path and the shim preserves full backward compatibility for the existing test suite.

Remaining minor items: nemoguardrails/testing/fake_model.py for the self.responses initialization, and nemoguardrails/testing/fixtures.py for the Union[str, RailsConfig] annotation mismatch (previously flagged).

Important Files Changed

  • nemoguardrails/testing/fake_model.py: New FakeLLMModel with fixed stream_async (content or ""), asyncio.sleep(0), and generate_async copy. One minor invariant issue: self.responses populated from llm_responses.content can introduce None entries.
  • nemoguardrails/testing/test_chat.py: TestChat promoted to the public surface; the llm_exception-only path now correctly creates FakeLLMModel, and the config parameter is narrowed to RailsConfig only.
  • nemoguardrails/testing/fixtures.py: Pytest fixtures for FakeLLMModel and TestChat. The _factory config parameter is still typed Union[str, RailsConfig] while TestChat only accepts RailsConfig (previously flagged).
  • tests/utils.py: Reduced to a thin compatibility shim re-exporting FakeLLMModel and TestChat; backwards-compatible.
  • tests/testing/test_public_surface.py: New test suite covering public-surface imports, shim compatibility, FakeLLMModel ordering/streaming, TestChat round-trip, and all three fixtures.
Prompt To Fix All With AI
Fix the following code review issue, proposing a concise fix.

---

### Issue 1 of 1
nemoguardrails/testing/fake_model.py:69
`self.responses` is silently populated with `None` entries when `llm_responses` is the source of truth. When any `LLMResponse` in `llm_responses` has `content=None` (e.g. a tool-call-only response), the list comprehension produces `[None, ...]`. Downstream code that inspects `fake.responses` relies on this attribute containing valid strings. A cleaner invariant is to leave `self.responses` empty when `llm_responses` is provided so that callers who inspect it for plain-string debugging don't see a misleading mix of strings and Nones.

```suggestion
        self.responses = responses if responses is not None else []
```

Reviews (4): Last reviewed commit: "apply review suggestions"

@coderabbitai Bot commented May 7, 2026

📝 Walkthrough

This PR introduces a new public testing package nemoguardrails.testing containing FakeLLMModel for deterministic scripted LLM responses and TestChat for ergonomic conversation testing. Both utilities support Colang v1 and v2, are exposed via pytest fixtures, and are documented in a new user guide. The implementations are migrated from tests/utils.py to establish them as a stable public API.

Changes

Testing Public Surface

  • Public API Definition (nemoguardrails/testing/__init__.py): Defines the nemoguardrails.testing package public surface with __all__ = ["FakeLLMModel", "TestChat"] and an Apache-2.0 license header.
  • FakeLLMModel Implementation (nemoguardrails/testing/fake_model.py): Implements a deterministic fake LLM with scripted responses or llm_responses, optional token usage, exception injection, and streaming via LLMResponseChunk chunks. Tracks call order and raises on exhaustion.
  • TestChat Implementation (nemoguardrails/testing/test_chat.py): Implements an ergonomic conversation-testing helper supporting both Colang v1 (history-based, generate) and v2 (event-driven, process_events). Provides user(), bot(), and bot_async() methods and operator overloads (>>, <<).
  • Pytest Fixtures & Integration (nemoguardrails/testing/fixtures.py): Exposes the test utilities as pytest fixtures: fake_llm (preconfigured with "Hello!"), make_fake_llm (factory), and make_test_chat (factory accepting a RailsConfig or path).
  • User Guide Documentation (docs/user-guides/index.rst, docs/user-guides/testing-your-config.md): Documents testing strategies including FakeLLMModel, TestChat, structured responses via llm_responses, streaming via stream_async, and pytest fixture opt-in, with examples.
  • Test Infrastructure & Validation (tests/testing/__init__.py, tests/testing/conftest.py, tests/testing/test_public_surface.py): Establishes the test package, enables nemoguardrails.testing.fixtures via pytest_plugins, and validates the public surface through export checks, FakeLLMModel behavior (ordering, streaming, exceptions), TestChat round-trip assertions, and fixture factory tests.
  • Migration from tests/utils.py (tests/utils.py): Removes the FakeLLMModel and TestChat implementations; reduces imports and updates __all__ to maintain backward-compatible shims while preserving the event helper functions and _init_state.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 16.13%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and concisely describes the main change: introduction of a public testing surface under the nemoguardrails.testing package.
  • Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Test Results For Major Changes: ✅ Passed. This major feature PR includes comprehensive testing: 9 test functions covering FakeLLMModel, TestChat, and the fixtures, plus a 294-line user guide with examples.



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/user-guides/testing-your-config.md (1)

295-295: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add missing newline at end of file.

The CI pipeline failed because the end-of-file-fixer pre-commit hook detected a missing trailing newline. Add a blank line after the closing code fence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/user-guides/testing-your-config.md` at line 295, The file
docs/user-guides/testing-your-config.md is missing a trailing newline after the
closing code fence; open the file and add a single blank line (a newline
character) after the final ``` closing fence so the file ends with a newline to
satisfy the end-of-file-fixer hook.
🧹 Nitpick comments (1)
tests/utils.py (1)

28-40: 💤 Low value

Good backward-compatibility shim.

Re-exporting from the new canonical locations preserves existing imports across 100+ tests.

Static analysis flags that __all__ is unsorted. Consider sorting alphabetically for consistency:

♻️ Optional: Sort __all__
 __all__ = [
     "FakeLLMModel",
     "TestChat",
+    "any_event_conforms",
     "clean_events",
     "event_conforms",
     "event_sequence_conforms",
-    "any_event_conforms",
     "is_data_in_events",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/utils.py` around lines 28 - 40, The __all__ export list is unsorted;
please alphabetize the string entries in the __all__ list (e.g., reorder items
so "FakeLLMModel", "TestChat", "any_event_conforms", "clean_events",
"event_conforms", "event_sequence_conforms", "is_data_in_events" or whichever
canonical alphabetical order you choose) to keep exports consistent and satisfy
static analysis — update the __all__ variable in tests/utils.py accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nemoguardrails/testing/fake_model.py`:
- Around line 118-121: The stream_async implementation unconditionally splits
response.content which will crash if a scripted LLMResponse is tool-call-only
(content is None/non-string); update stream_async to guard the value returned by
_next_response(): check response.content is a non-empty string (e.g.,
isinstance(response.content, str) and response.content) before calling .split("
"), and otherwise treat it as no-text (set chunks = [] or an equivalent no-op
stream) so the method safely handles tool-call-only responses; keep references
to stream_async and _next_response when making the change.

In `@nemoguardrails/testing/test_chat.py`:
- Around line 34-35: The zip of events and event_data should be explicit about
length equality to satisfy Ruff B905: update the loop `for event, data in
zip(events, event_data):` to call zip with strict=True (i.e., `zip(events,
event_data, strict=True)`) so it will raise if lengths differ; this is safe
because length equality is already being checked and makes the intent explicit
for the variables events and event_data.
- Around line 61-63: The constructor parameter config for TestChat is typed
Union[str, RailsConfig] but later code accesses config.models and
self.config.colang_version; normalize config to a RailsConfig at the start of
__init__ by checking isinstance(config, str) and calling
RailsConfig.from_path(config) (or leave as-is if already a RailsConfig) so that
self.config is always a RailsConfig before any access; update the __init__ code
paths that reference self.config.models and self.config.colang_version to rely
on this normalized self.config.
- Around line 159-169: The test is enqueuing the same UtteranceBotActionStarted
event twice, causing duplicate state transitions; in the method where
self.input_events is appended (look for the block that calls
new_event_dict("UtteranceBotActionStarted", action_uid=event["action_uid"])
inside the test helper in nemoguardrails/testing/test_chat.py), remove the
duplicated append so the UtteranceBotActionStarted event is only added once to
self.input_events (keep a single call that constructs
new_event_dict("UtteranceBotActionStarted", action_uid=event["action_uid"]) and
delete the second identical append).


📥 Commits

Reviewing files that changed from the base of the PR and between c69efe5 and dd310e7.

📒 Files selected for processing (10)
  • docs/user-guides/index.rst
  • docs/user-guides/testing-your-config.md
  • nemoguardrails/testing/__init__.py
  • nemoguardrails/testing/fake_model.py
  • nemoguardrails/testing/fixtures.py
  • nemoguardrails/testing/test_chat.py
  • tests/testing/__init__.py
  • tests/testing/conftest.py
  • tests/testing/test_public_surface.py
  • tests/utils.py

@codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 91.13924% with 14 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
nemoguardrails/testing/test_chat.py 83.75% 13 Missing ⚠️
nemoguardrails/testing/fake_model.py 98.21% 1 Missing ⚠️


Pouyanpi added 2 commits May 7, 2026 12:49
Promote FakeLLMModel and TestChat from tests/utils.py to a public
nemoguardrails.testing subpackage so downstream users can test their
guardrails configurations without copying internal test helpers.

- nemoguardrails.testing.fake_model.FakeLLMModel: framework-agnostic
  fake LLM that implements the LLMModel protocol.
- nemoguardrails.testing.test_chat.TestChat: ergonomic helper for
  user/bot conversation assertions.
- nemoguardrails.testing.fixtures: opt-in pytest plugin exposing
  fake_llm, make_fake_llm, and make_test_chat fixtures.

tests/utils.py is reduced to a thin compatibility shim that re-exports
from the new package, so the 100+ existing tests that import from there
continue to work unchanged.

Adds tests/testing/test_public_surface.py covering the new exports and
fixtures, plus docs/user-guides/testing-your-config.md describing the
three supported usage patterns.
@Pouyanpi Pouyanpi force-pushed the feat/public-testing-surface branch from c27e8e4 to dd2b046 on May 7, 2026 at 10:49
@tgasser-nv tgasser-nv (Collaborator) left a comment
Looks good! Mostly nits and cleanups with some renaming needed before merging.

At a higher-level, what's the purpose of making these test fixtures more publicly accessible? Is it for external contributors to write their own LLMModel-compliant classes?

"""

def _factory(
config: Union[str, RailsConfig],

The config here has type Union[str, RailsConfig], but it's passed into the TestChat init, which expects a RailsConfig only. If you want to support string configs, the string needs to be detected and converted to a RailsConfig before creating the TestChat.


"""

__test__ = False

I know this removes the file from pytest's automatic test discovery. But anyone using this would expect a test_chat.py file to contain unit tests for chat. Recommend renaming to anything without a Test prefix: ChatHarness, or maybe FakeChat (for symmetry with FakeLLM)?

else:
self._llm_responses = []
self.responses = responses or [response.content for response in self._llm_responses]
self.i = 0

Can you rename i to something more descriptive, like inference_count? On a public interface it's not obvious from the name alone what this is counting.

config: RailsConfig,
llm_completions: Optional[List[str]] = None,
streaming: bool = False,
llm_exception: Optional[Exception] = None,

nit: Should this be renamed to exception to match the `exception` field in fake_model.py? They're both the same type, and it gets passed into the FakeLLMModel init as-is without any changes.

streaming: bool = False,
llm_exception: Optional[Exception] = None,
token_usage: Optional[List[Dict[str, int]]] = None,
llm: Optional[Any] = None,

Should this be typed FakeLLMModel rather than Any? It gets assigned to self.llm, which has type FakeLLMModel on line 88.

content = chunk + " " if chunk_index < len(chunks) - 1 else chunk
await asyncio.sleep(0)
yield LLMResponseChunk(delta_content=content)
await asyncio.sleep(0)

Could you re-add the comment about why the final sleep is needed (from here)?

from nemoguardrails.testing.fake_model import FakeLLMModel
from nemoguardrails.testing.test_chat import TestChat

__all__ = ["FakeLLMModel", "TestChat"]

Could you add the other fixtures here too, so import * works? (["fake_llm", "make_fake_llm", "make_test_chat"])

"""
if llm is not None:
self.llm = llm
elif llm_completions is not None or llm_exception is not None:

Could you add a unit-test where llm_completions is None but llm_exception isn't to check the second part of this elif?


@pytest.mark.asyncio
async def test_fake_llm_model_streaming_yields_chunks():
llm = FakeLLMModel(responses=["one two three"])

This llm object has streaming set to False (the default). As in earlier comment, either remove the streaming arg and let clients call stream_async() or generate_async() however they want, or keep it and raise if they try to stream when it's not enabled

@@ -0,0 +1,110 @@
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

nit: Is there a better name for this file? Public Surface doesn't make much sense. Maybe test_fake_fixtures.py (hints at FakeLLM or FakeChat if it's renamed to that) or something similar
