feat: add LLM semantic security review for SKILL.md scanner (QEMETRICS-2415) by Benkapner · Pull Request #43 · RHEcosystemAppEng/ABEvalFlow

Benkapner · 2026-06-25T06:35:44Z

Summary

Adds LLM-based semantic security review to the SKILL.md scanner, complementing the deterministic regex checks merged in PR #42. Adapted from harness-eval-lab's security review rubric.

What the LLM checks

Three checks that catch attacks regex cannot detect:

Check	What it catches
Anti-jailbreak	Self-declared safety claims, evaluation bypass attempts
Semantic attacks	Polite reframings of jailbreaks, gradual escalation, split-instruction attacks, conditional triggers
Description-behavior mismatch	Stated purpose vs actual instruction behavior

How it works

Deterministic scan runs first (always)
LLM review runs after, appending findings to the same JSON with source: \"llm\"
Uses llm_client.chat_completion() (same as quality review)
On by default, matching Cisco's use-llm=true convention
Disable with --no-llm flag
If the LLM call fails, deterministic results are preserved

Changes

File	What
`abevalflow/security/skillmd_scanner.py`	Added `SECURITY_REVIEW_PROMPT`, `_extract_json()`, `llm_security_review()`
`abevalflow/gates/security/base.py`	Added `.strip()` to severity parsing to prevent block bypass
`scripts/skillmd_security_scan.py`	Added `--no-llm` flag, LLM review runs by default
`pipeline/tasks/phases/test.yaml`	Updated step 3 with LLM env vars and use-llm flag
`tests/test_skillmd_scanner.py`	Added 4 mock tests for LLM review path

Changes since previous review (PR #39)

This replaces PR #39 (moved from fork to org branch). All review feedback from Guy has been addressed:

Must Fix:

Rebased on latest main (includes merged PR feat: add SKILL.md security scanner gate (QEMETRICS-2414) #42 and lint cleanup fix: resolve ruff linting issues across codebase #41)
Added _extract_json() to strip markdown fences from LLM JSON response before parsing
Added .strip() to severity parsing in base.py to prevent block bypass from trailing whitespace
Added 4 mock tests: valid findings, fenced JSON, LLM failure, invalid JSON

Should Fix:
5. Added content size limit (MAX_TOTAL_CHARS = 40_000) with truncation warning for large submissions
6. Expanded pattern allowlists: added Italian/Arabic/Hindi/Russian/Dutch/Swedish/Turkish to translate evasion; added shields.io and raw.githubusercontent to markdown image

Test plan

76 tests pass (46 deterministic + 4 LLM mock + 26 existing gate registry)
ruff lint + format clean
Run pipeline test script (./scripts/misc/trigger_test_runs.sh)

Ref: QEMETRICS-2415

Replaces PR #39 (moved from fork to org branch per review feedback).">

…S-2415) Add 3 LLM-based security checks adapted from harness-eval-lab rubric: - Anti-jailbreak: self-declared safety claims, evaluation bypass attempts - Semantic attacks: polite reframings, gradual escalation, split-instruction - Description-behavior mismatch: stated purpose vs actual instruction Addresses review feedback: - Strip markdown fences from LLM JSON response - Normalize severity values with .strip() to prevent block bypass - Add content size limit (40k chars) for large submissions - Expand pattern allowlists (Italian/Arabic/Hindi, shields.io, raw.githubusercontent) - Add 4 mock tests for LLM review path (valid, fenced, failure, invalid JSON)

Benkapner assigned GuyZivRH Jun 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add LLM semantic security review for SKILL.md scanner (QEMETRICS-2415)#43

feat: add LLM semantic security review for SKILL.md scanner (QEMETRICS-2415)#43
Benkapner wants to merge 1 commit into
mainfrom
QEMETRICS-2415/security-llm-review

Benkapner commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Benkapner commented Jun 25, 2026

Summary

What the LLM checks

How it works

Changes

Changes since previous review (PR #39)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants