feat(testing): add public testing surface under nemoguardrails.testing #1860
**Greptile Summary**

This PR promotes `FakeLLMModel` and `TestChat` from `tests/utils.py` to a public `nemoguardrails.testing` subpackage.
| Filename | Overview |
|---|---|
| nemoguardrails/testing/fake_model.py | New FakeLLMModel with fixed stream_async (content or ""), asyncio.sleep(0), and generate_async copy. One minor invariant issue: self.responses populated from llm_responses.content can introduce None entries. |
| nemoguardrails/testing/test_chat.py | TestChat promoted to public surface; llm_exception-only path now correctly creates FakeLLMModel, config parameter narrowed to RailsConfig only. |
| nemoguardrails/testing/fixtures.py | Pytest fixtures for FakeLLMModel and TestChat. The _factory config parameter is still typed Union[str, RailsConfig] while TestChat only accepts RailsConfig (previously flagged). |
| tests/utils.py | Reduced to a thin compatibility shim re-exporting FakeLLMModel and TestChat; backwards-compatible. |
| tests/testing/test_public_surface.py | New test suite covering public surface imports, shim compatibility, FakeLLMModel ordering/streaming, TestChat round-trip, and all three fixtures. |
**Prompt To Fix All With AI**

Fix the following code review issue, proposing a concise fix.
---
### Issue 1 of 1
nemoguardrails/testing/fake_model.py:69
`self.responses` is silently populated with `None` entries when `llm_responses` is the source of truth. When any `LLMResponse` in `llm_responses` has `content=None` (e.g. a tool-call-only response), the list comprehension produces `[None, ...]`. Downstream code that inspects `fake.responses` relies on this attribute containing valid strings. A cleaner invariant is to leave `self.responses` empty when `llm_responses` is provided so that callers who inspect it for plain-string debugging don't see a misleading mix of strings and Nones.
```suggestion
self.responses = responses if responses is not None else []
```
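The invariant the suggestion establishes can be sketched in isolation. `FakeLLMModel` and `LLMResponse` below are simplified stand-ins for the PR's classes, not the actual source:

```python
from typing import List, Optional


class LLMResponse:
    """Stand-in for the PR's LLMResponse; content may be None for tool-call-only replies."""

    def __init__(self, content: Optional[str] = None):
        self.content = content


class FakeLLMModel:
    """Simplified sketch showing the suggested constructor invariant."""

    def __init__(
        self,
        responses: Optional[List[str]] = None,
        llm_responses: Optional[List[LLMResponse]] = None,
    ):
        self._llm_responses = list(llm_responses) if llm_responses is not None else []
        # Suggested fix: when llm_responses is the source of truth, leave
        # self.responses empty instead of copying content values (which may be None).
        self.responses = responses if responses is not None else []


# A tool-call-only scripted response no longer leaks None into `responses`.
fake = FakeLLMModel(llm_responses=[LLMResponse(content=None)])
print(fake.responses)  # → []
```

With the original list comprehension, the same call would have produced `responses == [None]`, which downstream string-only debugging code would misinterpret.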
Reviews (4): last reviewed commit "apply review suggestions".
**📝 Walkthrough:** This PR introduces a new public testing package.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 5 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/user-guides/testing-your-config.md (1)
**Line 295:** ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win: Add missing newline at end of file.

The CI pipeline failed because the `end-of-file-fixer` pre-commit hook detected a missing trailing newline. Add a blank line after the closing code fence.

🤖 Prompt for AI Agents:

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/user-guides/testing-your-config.md` at line 295: the file is missing a trailing newline after the closing code fence; add a single newline character after the final closing fence so the file ends with a newline and satisfies the `end-of-file-fixer` hook.
🧹 Nitpick comments (1)
tests/utils.py (1)
**Lines 28-40:** 💤 Low value: Good backward-compatibility shim.

Re-exporting from the new canonical locations preserves existing imports across 100+ tests.

Static analysis flags that `__all__` is unsorted. Consider sorting alphabetically for consistency:

♻️ Optional: sort `__all__`

```diff
 __all__ = [
     "FakeLLMModel",
     "TestChat",
+    "any_event_conforms",
     "clean_events",
     "event_conforms",
     "event_sequence_conforms",
-    "any_event_conforms",
     "is_data_in_events",
 ]
```

🤖 Prompt for AI Agents:

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/utils.py` around lines 28-40: alphabetize the string entries in the `__all__` list to keep exports consistent and satisfy static analysis.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nemoguardrails/testing/fake_model.py`:
- Around line 118-121: The stream_async implementation unconditionally splits
response.content which will crash if a scripted LLMResponse is tool-call-only
(content is None/non-string); update stream_async to guard the value returned by
_next_response(): check response.content is a non-empty string (e.g.,
isinstance(response.content, str) and response.content) before calling .split("
"), and otherwise treat it as no-text (set chunks = [] or an equivalent no-op
stream) so the method safely handles tool-call-only responses; keep references
to stream_async and _next_response when making the change.
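A minimal sketch of that guard, using a simplified `LLMResponseChunk` and a plain response object in place of the PR's internals:

```python
import asyncio
from typing import AsyncIterator, List, Optional


class LLMResponseChunk:
    """Stand-in for the PR's chunk type."""

    def __init__(self, delta_content: str):
        self.delta_content = delta_content


class Response:
    def __init__(self, content: Optional[str]):
        self.content = content


async def stream_async(response: Response) -> AsyncIterator[LLMResponseChunk]:
    # Guard: only split when content is a non-empty string, so a
    # tool-call-only response (content=None) streams no text chunks.
    if isinstance(response.content, str) and response.content:
        chunks = response.content.split(" ")
    else:
        chunks = []
    for i, chunk in enumerate(chunks):
        content = chunk + " " if i < len(chunks) - 1 else chunk
        await asyncio.sleep(0)  # yield control, as the fake model does
        yield LLMResponseChunk(delta_content=content)


async def collect(response: Response) -> List[str]:
    return [c.delta_content async for c in stream_async(response)]


print(asyncio.run(collect(Response("one two"))))  # → ['one ', 'two']
print(asyncio.run(collect(Response(None))))       # → []
```

Without the `isinstance` check, `response.content.split(" ")` raises `AttributeError` on `None`.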
In `@nemoguardrails/testing/test_chat.py`:
- Around line 34-35: The zip of events and event_data should be explicit about
length equality to satisfy Ruff B905: update the loop `for event, data in
zip(events, event_data):` to call zip with strict=True (i.e., `zip(events,
event_data, strict=True)`) so it will raise if lengths differ; this is safe
because length equality is already being checked and makes the intent explicit
for the variables events and event_data.
- Around line 61-63: The constructor parameter config for TestChat is typed
Union[str, RailsConfig] but later code accesses config.models and
self.config.colang_version; normalize config to a RailsConfig at the start of
__init__ by checking isinstance(config, str) and calling
RailsConfig.from_path(config) (or leave as-is if already a RailsConfig) so that
self.config is always a RailsConfig before any access; update the __init__ code
paths that reference self.config.models and self.config.colang_version to rely
on this normalized self.config.
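A sketch of that normalization, using a stub in place of `RailsConfig` (the real class in nemoguardrails provides a `from_path` loader; the rest here is illustrative):

```python
from typing import List, Optional, Union


class StubRailsConfig:
    """Stand-in for nemoguardrails.RailsConfig."""

    def __init__(self, models: Optional[List] = None, colang_version: str = "1.0"):
        self.models = models or []
        self.colang_version = colang_version

    @classmethod
    def from_path(cls, path: str) -> "StubRailsConfig":
        # The real RailsConfig.from_path parses the config directory at `path`.
        return cls()


class ChatHelper:
    def __init__(self, config: Union[str, StubRailsConfig]):
        # Normalize once at the top of __init__ so every later access
        # (self.config.models, self.config.colang_version) sees a config object.
        if isinstance(config, str):
            config = StubRailsConfig.from_path(config)
        self.config = config


chat = ChatHelper("examples/bots/abc")
print(chat.config.colang_version)  # → 1.0
```

Normalizing at the boundary keeps every downstream attribute access free of `isinstance` checks.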
- Around line 159-169: The test is enqueuing the same UtteranceBotActionStarted
event twice, causing duplicate state transitions; in the method where
self.input_events is appended (look for the block that calls
new_event_dict("UtteranceBotActionStarted", action_uid=event["action_uid"])
inside the test helper in nemoguardrails/testing/test_chat.py), remove the
duplicated append so the UtteranceBotActionStarted event is only added once to
self.input_events (keep a single call that constructs
new_event_dict("UtteranceBotActionStarted", action_uid=event["action_uid"]) and
delete the second identical append).
---
Outside diff comments:
In `@docs/user-guides/testing-your-config.md`:
- Line 295: The file docs/user-guides/testing-your-config.md is missing a
trailing newline after the closing code fence; open the file and add a single
blank line (a newline character) after the final ``` closing fence so the file
ends with a newline to satisfy the end-of-file-fixer hook.
---
Nitpick comments:
In `@tests/utils.py`:
- Around line 28-40: The __all__ export list is unsorted; please alphabetize the
string entries in the __all__ list (e.g., reorder items so "FakeLLMModel",
"TestChat", "any_event_conforms", "clean_events", "event_conforms",
"event_sequence_conforms", "is_data_in_events" or whichever canonical
alphabetical order you choose) to keep exports consistent and satisfy static
analysis — update the __all__ variable in tests/utils.py accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 502f34b0-e293-4d83-9ecf-6c5817f8558d
📒 Files selected for processing (10):

- docs/user-guides/index.rst
- docs/user-guides/testing-your-config.md
- nemoguardrails/testing/__init__.py
- nemoguardrails/testing/fake_model.py
- nemoguardrails/testing/fixtures.py
- nemoguardrails/testing/test_chat.py
- tests/testing/__init__.py
- tests/testing/conftest.py
- tests/testing/test_public_surface.py
- tests/utils.py
**Codecov Report:** ❌ Patch coverage is
Promote `FakeLLMModel` and `TestChat` from `tests/utils.py` to a public `nemoguardrails.testing` subpackage so downstream users can test their guardrails configurations without copying internal test helpers.

- `nemoguardrails.testing.fake_model.FakeLLMModel`: framework-agnostic fake LLM that implements the `LLMModel` protocol.
- `nemoguardrails.testing.test_chat.TestChat`: ergonomic helper for user/bot conversation assertions.
- `nemoguardrails.testing.fixtures`: opt-in pytest plugin exposing `fake_llm`, `make_fake_llm`, and `make_test_chat` fixtures.

`tests/utils.py` is reduced to a thin compatibility shim that re-exports from the new package, so the 100+ existing tests that import from there continue to work unchanged.

Adds `tests/testing/test_public_surface.py` covering the new exports and fixtures, plus `docs/user-guides/testing-your-config.md` describing the three supported usage patterns.
Branch history: c27e8e4 → dd2b046 → 24d6c65.
**tgasser-nv** left a comment:
Looks good! Mostly nits and cleanups with some renaming needed before merging.
At a higher-level, what's the purpose of making these test fixtures more publicly accessible? Is it for external contributors to write their own LLMModel-compliant classes?
| """ | ||
|
|
||
| def _factory( | ||
| config: Union[str, RailsConfig], |
There was a problem hiding this comment.
The config here has type of Union[str, RailsConfig], but it's passed into the TestChat init which expects a RailsConfig only. If you want to support string configs then it needs to be detected and converted to a RailsConfig before creating the TestChat
```python
__test__ = False
```

I know this removes the file from pytest automatic test discovery. But anyone using this would expect a `test_chat.py` file to contain unit tests for chat. Recommend renaming to anything without a `Test` prefix: `ChatHarness`, or maybe `FakeChat` (for symmetry with `FakeLLM`)?
```python
else:
    self._llm_responses = []
self.responses = responses or [response.content for response in self._llm_responses]
self.i = 0
```

Can you rename `i` to something more descriptive, like `inference_count`? On a public interface it's not obvious what this is counting from the name alone.
```python
config: RailsConfig,
llm_completions: Optional[List[str]] = None,
streaming: bool = False,
llm_exception: Optional[Exception] = None,
```

nit: Should this be renamed `exception` to match the `exception` field in fake_model.py? They're both the same type, and it gets passed into the `FakeLLMModel` init as-is without any changes.
```python
streaming: bool = False,
llm_exception: Optional[Exception] = None,
token_usage: Optional[List[Dict[str, int]]] = None,
llm: Optional[Any] = None,
```

Should this be type `FakeLLMModel` rather than `Any`? It gets assigned to `self.llm`, which has type `FakeLLMModel` on line 88.
```python
content = chunk + " " if chunk_index < len(chunks) - 1 else chunk
await asyncio.sleep(0)
yield LLMResponseChunk(delta_content=content)
await asyncio.sleep(0)
```

Could you re-add the comment about why the final sleep is needed (from here)?
```python
from nemoguardrails.testing.fake_model import FakeLLMModel
from nemoguardrails.testing.test_chat import TestChat

__all__ = ["FakeLLMModel", "TestChat"]
```

Could you add the other fixtures here too, so `import *` works? (`["fake_llm", "make_fake_llm", "make_test_chat"]`)
| """ | ||
| if llm is not None: | ||
| self.llm = llm | ||
| elif llm_completions is not None or llm_exception is not None: |
There was a problem hiding this comment.
Could you add a unit-test where llm_completions is None but llm_exception isn't to check the second part of this elif?
```python
`@pytest.mark.asyncio`
async def test_fake_llm_model_streaming_yields_chunks():
    llm = FakeLLMModel(responses=["one two three"])
```

This `llm` object has `streaming` set to False (the default). As in the earlier comment, either remove the `streaming` arg and let clients call `stream_async()` or `generate_async()` however they want, or keep it and raise if they try to stream when it's not enabled.
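One way to implement the "raise if not enabled" option, sketched with a stub rather than the PR's actual class:

```python
from typing import List


class FakeLLMStub:
    """Sketch of guarding streaming behind the streaming flag."""

    def __init__(self, responses: List[str], streaming: bool = False):
        self.responses = list(responses)
        self.streaming = streaming

    def stream_chunks(self) -> List[str]:
        if not self.streaming:
            raise RuntimeError(
                "streaming is disabled; construct the fake with streaming=True"
            )
        return self.responses[0].split(" ")


try:
    FakeLLMStub(["one two three"]).stream_chunks()
    raised = False
except RuntimeError:
    raised = True
print(raised)  # → True

print(FakeLLMStub(["one two three"], streaming=True).stream_chunks())
# → ['one', 'two', 'three']
```

This makes a mismatch between construction and usage an immediate, explicit failure instead of a silently ignored flag.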
```python
# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
```

nit: Is there a better name for this file? "Public surface" doesn't make much sense. Maybe `test_fake_fixtures.py` (hints at `FakeLLM`, or `FakeChat` if it's renamed to that) or something similar.
Description
Promote FakeLLMModel and TestChat from tests/utils.py to a public nemoguardrails.testing subpackage so downstream users can test their guardrails configurations without copying internal test helpers (dogfooding).
tests/utils.py is reduced to a thin compatibility shim that re-exports from the new package, so the 100+ existing tests that import from there continue to work unchanged.
Adds tests/testing/test_public_surface.py covering the new exports and fixtures, plus docs/user-guides/testing-your-config.md describing the three supported usage patterns.
Related Issue(s)
testing subpackage referenced in #1857