3 changes: 3 additions & 0 deletions .gitignore
@@ -103,6 +103,9 @@ CLAUDE.local.md
.claude/settings.local.json
ai/tmp/

# Claude worktrees
.claude/worktrees/

# Anonymizer execution artifacts
.anonymizer-artifacts/
docs/notebook_source/data/synth_bios_sample10_anonymized.csv
120 changes: 120 additions & 0 deletions AGENTS.md
@@ -0,0 +1,120 @@
<!-- SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->

# AGENTS.md

This file is for agents **developing** NeMo Anonymizer — the codebase you are working in.
If you are an agent helping a user **anonymize data**, use the [product documentation](https://nvidia-nemo.github.io/Anonymizer/) instead.

**NeMo Anonymizer** detects and protects PII through context-aware entity replacement and LLM-powered rewriting. Users supply a text dataset and a strategy; Anonymizer detects entities and transforms the text.

## Module Map

`nemo-anonymizer` is a single package with three modules:

- **`anonymizer.config`** — user-facing configuration: `AnonymizerConfig`, `AnonymizerInput`, replace strategies (`Substitute`, `Redact`, `Annotate`, `Hash`), and rewrite config (`Rewrite`, `EvaluationCriteria`, `RiskTolerance`). New user-facing knobs go here.
- **`anonymizer.engine`** — internal pipeline implementation: detection, replacement, and rewrite sub-workflows, the NDD adapter, prompt utilities, and all `COL_*` column constants. Never imported directly by users.
- **`anonymizer.interface`** — user-facing entry points: the `Anonymizer` class, CLI, `AnonymizerResult`, `PreviewResult`, and canonical error types. Thin layer that wires config → engine and exposes results.

NeMo Anonymizer wraps [DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) (NDD) for LLM column generation. `NddAdapter` is the only place this dependency crosses — engine sub-workflows declare NDD column configs and hand them to the adapter, which manages DataDesigner internally.
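To make the layering concrete, here is a minimal sketch of how the three modules meet from a caller's perspective. The class names come from the module map above; the constructor and `run()` call are assumptions for illustration, not the verified API.

```python
import pandas as pd

from anonymizer.config import AnonymizerConfig, Substitute
from anonymizer.interface import Anonymizer

input_df = pd.DataFrame({"text": ["Alice lives in Paris."]})

# Hypothetical wiring: the constructor and run() call are assumptions,
# not the verified API; check anonymizer.interface for the real entry point.
config = AnonymizerConfig(replace=Substitute())  # config layer: declare intent
anonymizer = Anonymizer(config)                  # interface layer: wires config to engine
result = anonymizer.run(input_df)                # engine executes; returns an AnonymizerResult
```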

## Core Concepts

- **Entity** — a detected span of text with a label (e.g. `"Alice"` → `first_name`) and character offsets
- **Latent entity** — an entity detected in rewrite mode that is sensitive but not directly named; used to guide rewriting without explicit replacement
- **Replacement map** — a per-record dict mapping entity text → substitute value, built by `LlmReplaceWorkflow` and injected into rewrite prompts
- **Leakage mass** — a weighted score measuring how much sensitive information survives in a rewritten record; drives the repair loop
- **Utility score** — a 0–1 score measuring how much semantic content the rewritten record preserves
- **RiskTolerance** — a preset (`minimal` / `low` / `moderate` / `high`) that bundles the leakage threshold, repair behavior, and human-review flags into a single user-facing knob
- **Repair loop** — the evaluate → repair → re-evaluate cycle in `RewriteWorkflow`; runs up to `max_repair_iterations` times on failing rows
- **FailedRecord** — a record that was dropped by an NDD workflow; surfaced explicitly rather than silently lost
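The **Entity** concept above maps naturally onto a small frozen container. A hypothetical sketch of the shape (it illustrates the concept; the real engine type may differ):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical shape for illustration, not the actual engine class."""

    text: str   # detected span, e.g. "Alice"
    label: str  # entity label, e.g. "first_name"
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
```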

## Pipelines

### Replace mode — `AnonymizerConfig(replace=...)`

```
input_df
→ EntityDetectionWorkflow.run() # engine/detection/detection_workflow.py
GLiNER detection
→ parse + tag
→ LLM augmentation (add entities GLiNER missed)
→ LLM validation (keep / drop candidates)
→ merge + finalize → COL_DETECTED_ENTITIES, COL_FINAL_ENTITIES
→ ReplacementWorkflow.run() # engine/replace/replace_runner.py
Redact / Annotate / Hash → applied locally, no LLM
Substitute → LlmReplaceWorkflow → NddAdapter
→ output: {text_col}_replaced, {text_col}_with_spans, final_entities
```

### Rewrite mode — `AnonymizerConfig(rewrite=...)`

```
input_df
→ EntityDetectionWorkflow.run() # same as above, plus latent entity tagging
→ RewriteWorkflow.run() # engine/rewrite/rewrite_workflow.py
LlmReplaceWorkflow.generate_map_only() # build replacement map for prompt
→ single NDD adapter call (pipeline_columns):
DomainClassificationWorkflow → _domain, _domain_supplement
SensitivityDispositionWorkflow → _sensitivity_disposition
QAGenerationWorkflow → _quality_qa, _privacy_qa
RewriteGenerationWorkflow → _rewritten_text
→ evaluate-repair loop (up to max_repair_iterations):
EvaluateWorkflow → leakage_mass, utility_score, _needs_repair
RepairWorkflow → _rewritten_text (failing rows only)
→ FinalJudgeWorkflow (non-critical) → _judge_evaluation, needs_human_review
→ output: {text_col}_rewritten, utility_score, leakage_mass, needs_human_review, …
```

Records with no detected entities skip all LLM sub-workflows and pass through with default metrics (utility=1.0, leakage=0.0).
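From the user side, turning these knobs is a short config declaration. A hedged sketch (field names beyond `max_repair_iterations` are assumptions; check `config/rewrite.py` for the real types):

```python
from anonymizer.config import AnonymizerConfig, Rewrite

config = AnonymizerConfig(
    rewrite=Rewrite(
        risk_tolerance="low",     # assumed spelling; RiskTolerance may be an enum
        max_repair_iterations=3,  # user-facing knob (default 3)
    )
)
```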

## Config Pattern

`AnonymizerConfig.rewrite` is the user-facing `Rewrite` model. The engine never receives `Rewrite` directly — it receives `EvaluationCriteria` via the `Rewrite.evaluation` property.

`Rewrite` and `EvaluationCriteria` both hold `max_repair_iterations`. They must stay in sync:

- `Rewrite.max_repair_iterations` is the user-facing field (default 3)
- `Rewrite.evaluation` constructs `EvaluationCriteria(risk_tolerance=..., max_repair_iterations=self.max_repair_iterations)`
- **Never construct `EvaluationCriteria` with hardcoded values** — always go through `Rewrite.evaluation`
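A simplified sketch of this pattern (abbreviated from `config/rewrite.py`; imports, defaults, and other fields are omitted or assumed):

```python
class Rewrite(BaseModel):
    risk_tolerance: RiskTolerance  # assumed field; the preset the user picks
    max_repair_iterations: int = 3

    @property
    def evaluation(self) -> EvaluationCriteria:
        # Single construction point keeps both max_repair_iterations fields in sync.
        return EvaluationCriteria(
            risk_tolerance=self.risk_tolerance,
            max_repair_iterations=self.max_repair_iterations,
        )
```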

Leakage thresholds and repair parameters are derived from `RiskTolerance` via `_RiskToleranceBundle` in `config/rewrite.py`. Don't hardcode them elsewhere.

## NDD Adapter

`NddAdapter.run_workflow()` (`engine/ndd/adapter.py`) wraps a DataFrame slice + NDD column configs into a DataDesigner run and returns `WorkflowRunResult(dataframe, failed_records)`. Records missing from the output surface as `FailedRecord` objects rather than silently disappearing. Never access DataDesigner directly from engine workflows — always go through `NddAdapter`.
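A sketch of the boundary contract from a sub-workflow's point of view. The keyword names and surrounding variables are illustrative; only the return shape (`WorkflowRunResult` with `.dataframe` and `.failed_records`) is documented above.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative call: keyword names are assumptions, not the real signature.
result = adapter.run_workflow(dataframe=batch_df, columns=ndd_column_configs)

processed_df = result.dataframe  # rows DataDesigner produced
for failed in result.failed_records:
    logger.warning("Record dropped by NDD workflow: %r", failed)  # never silently lost
```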

## Prompt Conventions

All column references in NDD prompt templates go through `_jinja()` (`engine/constants.py`) — never format column names directly into strings. Dynamic prompt values use `substitute_placeholders()` (`engine/prompt_utils.py`) with `<<PLACEHOLDER>>` markers; see its docstring for the substitution contract. Prompts are inline triple-quoted strings in the workflow file that uses them; there is no separate registry.
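Putting both conventions together, a hypothetical prompt fragment might look like this. The placeholder name, dict-key format, and import paths are assumptions; the `substitute_placeholders` docstring defines the actual contract.

```python
from anonymizer.engine.constants import COL_TEXT, _jinja  # assumed import path
from anonymizer.engine.prompt_utils import substitute_placeholders

TEMPLATE = f"""Rewrite the record below.

Record text: {_jinja(COL_TEXT)}
Replacement map: <<REPLACEMENT_MAP>>
"""

prompt = substitute_placeholders(TEMPLATE, {"REPLACEMENT_MAP": "{'Alice': 'Dana'}"})
```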

## Structural Invariants

- `from __future__ import annotations` in every Python file
- Absolute imports only (enforced by ruff `TID`)
- Type annotations on all functions, methods, and class attributes
- SPDX license header on every file
- All column names defined in `engine/constants.py` — never use string literals for column names
- `COL_TEXT` is the internal name for the input text column; renamed to the user's original column name in final output
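A minimal file skeleton that satisfies all of these invariants at once (the import path is inferred from the layout above, and the function is made up for illustration):

```python
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

from __future__ import annotations

import pandas as pd

from anonymizer.engine.constants import COL_TEXT  # absolute import, no string literal


def count_text_records(df: pd.DataFrame) -> int:
    """Count records that carry input text."""
    return int(df[COL_TEXT].notna().sum())
```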

## What NOT To Do

- **Don't bypass `Rewrite.evaluation`** — don't construct `EvaluationCriteria` with hardcoded thresholds
- **Don't call DataDesigner directly** — always go through `NddAdapter.run_workflow()`
- **Don't use string literals for column names** — use `COL_*` constants from `engine/constants.py`
- **Don't add a domain to only one supplement map** — see `engine/rewrite/domain_classification.py` for the sync invariant
- **Don't hardcode `gliner_threshold`** — it belongs in `Detect` config (default 0.3)

## Development

```bash
make test # run all tests
make bootstrap # install dev dependencies
make format # ruff format + sort imports
make format-check # read-only lint check (used in CI)
make typecheck # ty type check (advisory)
make docs-serve # local MkDocs server at http://127.0.0.1:8000
```

For contributor workflow and branch naming see [CONTRIBUTING.md](CONTRIBUTING.md).
For code style and naming conventions see [STYLEGUIDE.md](STYLEGUIDE.md).
6 changes: 6 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,6 @@
<!-- SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->

# In ./CLAUDE.md

@AGENTS.md
11 changes: 11 additions & 0 deletions CONTRIBUTING.md
@@ -208,6 +208,17 @@ The `main` branch has the following protections:
- All `src` and `tests` files: `@NVIDIA-NeMo/anonymizer-reviewers`
- All remaining files (`pyproject.toml`, `uv.lock`, `SECURITY.md`, `LICENSE`, `.github/`, etc.): `@NVIDIA-NeMo/anonymizer-maintainers`

### Agent-Assisted Development

If you use Claude Code, Cursor, Codex, or another coding agent, follow the standard [Pull Request Process](#pull-request-process) plus these additions:

1. **For non-trivial changes, draft a plan first.** Non-trivial includes: changes spanning more than one of the `config` / `engine` / `interface` subsystems, introducing a new public API, or modifying an invariant called out in [AGENTS.md](AGENTS.md) or [STYLEGUIDE.md](STYLEGUIDE.md).
- Write a markdown file detailing the approach, trade-offs considered, affected subsystems, and delivery strategy — enough for reviewers to evaluate the design before implementation begins. (Have the agent draft it; review and refine before submitting.)
- Save it at `plans/<issue-number>/<short-name>.md` and submit it as its own PR for review.
- Once the plan is approved, implement it in a follow-up PR.

2. **Implement following [AGENTS.md](AGENTS.md) and [STYLEGUIDE.md](STYLEGUIDE.md).** Both capture pipeline structure, naming conventions, and invariants ruff and ty cannot enforce. The agent should read these before non-trivial changes.

## Issues and Discussions

### Issue Templates
174 changes: 174 additions & 0 deletions STYLEGUIDE.md
@@ -0,0 +1,174 @@
<!-- SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->

# Style Guide

Conventions for NeMo Anonymizer that ruff and ty cannot enforce. Read before adding a new module, workflow, or config class.

NeMo Anonymizer wraps [DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) (NDD) for LLM column generation. References to NDD below mean that library.

For architecture and pipeline identity, see [AGENTS.md](AGENTS.md).
For contribution workflow and branch naming, see [CONTRIBUTING.md](CONTRIBUTING.md).

---

## Pydantic vs Dataclasses

**Pydantic** for config, validation, and serialization. **Dataclasses** for simple typed containers in the engine.

| Need | Use |
|------|-----|
| User-facing config, validation, JSON schema | `BaseModel` |
| Internal result type, frozen value object | `@dataclass(frozen=True)` |

```python
# Config — Pydantic
class Detect(BaseModel):
gliner_threshold: float = Field(default=0.3, ge=0.0, le=1.0)

# Internal result — dataclass
@dataclass(frozen=True)
class WorkflowRunResult:
dataframe: pd.DataFrame
failed_records: list[FailedRecord]
```

Use `Field()` only when you need constraints (`ge`, `le`), descriptions, or `default_factory`. Use bare defaults for simple flags and strings.

---

## Error Handling

Wrap exceptions from NDD and other third-party calls at module boundaries into canonical types from `interface/errors.py`. Callers should never see raw NDD exceptions.

Preserve the traceback:

```python
# Good
try:
run_results = self._data_designer.create(...)
except Exception as exc:
raise AnonymizerWorkflowError(f"Workflow failed: {exc}") from exc

# Bad — swallows the traceback
except Exception as exc:
raise AnonymizerWorkflowError("Workflow failed")
```

Don't use defensive `try/except` on trusted internal calls that shouldn't fail — only catch at module boundaries. The final judge step is the intentional exception: it's explicitly non-critical and catches broadly, logging with `exc_info=True` and substituting safe defaults.
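As a sketch of that one sanctioned pattern (the helper and column names are hypothetical; see the final judge workflow for the real code):

```python
try:
    df = self._run_final_judge(df)  # hypothetical helper name
except Exception:
    logger.error("Final judge failed; substituting safe defaults", exc_info=True)
    df[COL_NEEDS_HUMAN_REVIEW] = True  # conservative fallback: flag rows for review
```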

**Error messages** must identify the actual bad value. Use `!r` to make interpolated values unambiguous:

```python
# Good
raise ValueError(f"Unsupported strategy: {strategy!r}")

# Bad
raise ValueError("Invalid strategy")
```

**No `assert` for validation** — `assert` statements are stripped when Python runs with `-O`. Use `if/raise` instead:

```python
# Good
if not isinstance(config, AnonymizerConfig):
raise TypeError(f"Expected AnonymizerConfig, got {type(config)!r}")

# Bad
assert isinstance(config, AnonymizerConfig)
```

---

## Column Names

All column names are constants in `engine/constants.py`. Never use string literals for column names.

```python
# Good
df[COL_DETECTED_ENTITIES]

# Bad
df["_detected_entities"]
```

Internal (intermediate) columns are prefixed with `_`. User-facing output columns use clean names (`final_entities`, `utility_score`). The input text column is always `COL_TEXT` internally and renamed to the user's original column name in `Anonymizer._rename_output_columns()`.
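So a slice of `engine/constants.py` plausibly looks like this (values are illustrative; read the file for the real ones):

```python
COL_TEXT = "_text"                            # internal: underscore prefix
COL_DETECTED_ENTITIES = "_detected_entities"  # internal intermediate column
COL_UTILITY_SCORE = "utility_score"           # user-facing output: clean name
```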

---

## Prompt Construction

**`_jinja(col, key=None)`** from `engine/constants.py` — use for NDD prompt template column references. Never format column names directly into prompt strings; `_jinja` keeps column references grep-able.

```python
# Good
f"The text is: {_jinja(COL_TEXT)}"

# Bad
f"The text is: {{{{ {COL_TEXT} }}}}"
```

**`substitute_placeholders(template, replacements)`** from `engine/prompt_utils.py` — use for dynamic prompt values. The `<<PLACEHOLDER>>` format avoids collisions with Jinja2 syntax. Never use f-strings or `.format()` for prompt templates with dynamic values; single-pass substitution prevents a replacement value from being interpreted as a placeholder.
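A small usage sketch, assuming bare placeholder names as dict keys and an inferred import path (the docstring defines the actual contract):

```python
from anonymizer.engine.prompt_utils import substitute_placeholders  # assumed path

template = "Domain: <<DOMAIN>>. Known replacements: <<REPLACEMENT_MAP>>."

# Single pass: even if a replacement value itself contains "<<DOMAIN>>",
# it is inserted verbatim and never re-expanded.
prompt = substitute_placeholders(
    template,
    {"DOMAIN": "medical", "REPLACEMENT_MAP": "{'Alice': 'Dana'}"},
)
```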

Prompts live as inline triple-quoted strings in the workflow file that uses them. There is no separate prompt registry.

---

## Type Annotations

Type annotations are required on all functions, methods, and class attributes, including in tests.

Use `TYPE_CHECKING` blocks for imports needed *only* in type annotations. This prevents circular imports and avoids loading heavy libraries at import time:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
import pandas as pd
```

If a module uses `pandas` at runtime — calls `pd.DataFrame`, indexes a DataFrame in a function body, etc. — import it at the top level. A `TYPE_CHECKING` import raises `NameError` if you reference it at runtime. `pandas` is import-time expensive, so keep top-level imports of it limited to modules that genuinely need it.
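A minimal demonstration of the pitfall:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import pandas as pd


def make_frame() -> pd.DataFrame:  # fine: annotations are evaluated lazily
    return pd.DataFrame()          # NameError: pd exists only for the type checker
```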

---

## Code Organization

- Public functions and methods before private (`_`-prefixed) ones within a module or class
- Define helpers at module or class level — avoid nested functions. Nested functions hide logic, make testing harder, and complicate stack traces. The only acceptable use is a closure that genuinely needs to capture local state.

---

## Naming

- Functions and variables: `snake_case`
- Classes: `PascalCase`
- Constants: `UPPER_SNAKE_CASE`
- Function names start with a verb: `run_workflow`, `build_entity_id`, not `entity_id` or `workflow`

---

## Comments

Only add a comment when the WHY is non-obvious — a hidden constraint, a subtle invariant, a workaround for a specific bug. Don't narrate what the code already says:

```python
# Good — explains a non-obvious invariant
# uuid5 is deterministic so input/output IDs match for missing-record tracking.

# Bad — narrates what the code does
# Loop through the records and append to list
for record in records:
results.append(record)
```

---

## Future Annotations

Every Python file must include `from __future__ import annotations` after the license header. This defers annotation evaluation, enables forward references, and keeps behavior consistent across the codebase.

---

## Docstrings

Google style (`Args:`, `Returns:`, `Raises:`). Public API classes and methods get docstrings; private helpers (`_`-prefixed) only when the logic is non-obvious. Don't restate the signature — explain why or what, not what the type annotation already says.
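A sketch of the expected shape, reusing the `build_entity_id` name from the naming section (the signature and body are hypothetical):

```python
import uuid


def build_entity_id(entity_text: str, record_index: int) -> str:
    """Build a stable identifier for a detected entity.

    Args:
        entity_text: The detected span, e.g. "Alice".
        record_index: Position of the owning record in the input frame.

    Returns:
        A deterministic ID, stable across reruns on the same input.

    Raises:
        ValueError: If record_index is negative.
    """
    if record_index < 0:
        raise ValueError(f"record_index must be >= 0, got {record_index!r}")
    return str(uuid.uuid5(uuid.NAMESPACE_OID, f"{record_index}:{entity_text}"))
```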