fix: exclude failed queries from aggregate score in FaithfulnessEvaluator and ContextRelevanceEvaluator by NIK-TIGER-BILL · Pull Request #11385 · deepset-ai/haystack

NIK-TIGER-BILL · 2026-05-23T23:16:41Z

Related Issues

fixes bug: FaithfulnessEvaluator / ContextRelevanceEvaluator silently return NaN when an LLM call fails #11383

Proposed Changes:

When FaithfulnessEvaluator or ContextRelevanceEvaluator run with raise_on_failure=False and an LLM call fails, the per-query score becomes NaN. Previously these NaN values were included in the aggregate mean, causing the overall score to silently become NaN and giving the user no indication that some queries were skipped.

Changes:

Filter out NaN scores before computing the aggregate mean.
Log a WARNING telling the user how many queries were excluded.
Update unit tests to assert the new behavior (aggregate score is the mean of valid scores, and a warning is logged).
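
The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name `aggregate_valid_scores`, the warning text, and the choice to return NaN when every query fails are all assumptions made for the example.

```python
import logging
import math
from statistics import mean

logger = logging.getLogger(__name__)


def aggregate_valid_scores(scores: list[float]) -> float:
    """Hypothetical helper: mean of per-query scores, ignoring NaN entries."""
    # A NaN per-query score marks a failed LLM call (raise_on_failure=False).
    valid = [s for s in scores if not math.isnan(s)]
    n_failed = len(scores) - len(valid)
    if n_failed:
        # Tell the user how many queries were left out of the aggregate.
        logger.warning(
            "%d of %d queries failed and were excluded from the aggregate score.",
            n_failed,
            len(scores),
        )
    # Assumption: with no valid scores at all, NaN is still returned.
    return mean(valid) if valid else float("nan")
```

With per-query scores of 1.0, NaN, and 0.5, the plain mean would be NaN, while the filtered mean is 0.75 and one warning is logged.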

How did you test it?

Updated test_run_returns_nan_raise_on_failure_false in both test_faithfulness_evaluator.py and test_context_relevance_evaluator.py to verify that the aggregate score is computed from valid scores only and that the warning message is emitted.
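
The updated tests are not shown in this thread, so the following is only a sketch of the kind of assertion described: the aggregate equals the mean of the valid scores and a warning is emitted. The `aggregate` helper, the logger name `evaluator_sketch`, and the message text are hypothetical stand-ins, and `unittest.assertLogs` is used here only to keep the example self-contained.

```python
import logging
import math
import unittest

logger = logging.getLogger("evaluator_sketch")  # hypothetical logger name


def aggregate(scores):
    """Hypothetical stand-in for the evaluator's aggregate computation."""
    valid = [s for s in scores if not math.isnan(s)]
    if len(valid) < len(scores):
        logger.warning("%d queries excluded", len(scores) - len(valid))
    return sum(valid) / len(valid) if valid else float("nan")


class TestAggregate(unittest.TestCase):
    def test_nan_scores_are_excluded_and_warned(self):
        # assertLogs fails the test if no WARNING is logged inside the block.
        with self.assertLogs("evaluator_sketch", level="WARNING") as captured:
            score = aggregate([1.0, float("nan"), 0.5])
        # Only the two valid scores contribute to the mean.
        self.assertEqual(score, 0.75)
        self.assertIn("1 queries excluded", captured.output[0])


def run_sketch_tests() -> bool:
    """Run the test case above and report whether it passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAggregate)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```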

Notes for the reviewer

Checklist

I have read the contributors guidelines and the code of conduct.
I have updated the related issue with new insights and changes.
I have added unit tests and updated the docstrings.
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
I have documented my code.
I have added a release note file, following the contributors guidelines.
I have run pre-commit hooks and fixed any issue.

vercel · 2026-05-23T23:16:48Z

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

CLAassistant · 2026-05-23T23:16:48Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

NIK-TIGER-BILL seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

bogdankostic · 2026-05-26T13:09:02Z

Hi @NIK-TIGER-BILL!
It looks like your commits are not linked to your GitHub account. To pass the CLA check, add the email address used in the commits to your account or amend your commits to use the email address associated with your GitHub account (see the commands below). Either approach should resolve the issue.

git config user.email "new.email@example.com"
git commit --amend --author="Your Name <new.email@example.com>" --no-edit
git push --force-with-lease

NIK-TIGER-BILL · 2026-05-27T03:04:54Z

Hi @bogdankostic, thank you for the heads-up!

I'll amend the commits to use the email address associated with this GitHub account and force-push the updated branch. That should resolve the CLA check. I'll ping you once it's done.

NIK-TIGER-BILL · 2026-05-28T03:06:53Z

@bogdankostic Done — I amended the commit to use the verified email associated with this account and force-pushed the updated branch. The CLA check should now pass. Thanks for the guidance!

FaithfulnessEvaluator and ContextRelevanceEvaluator previously included NaN scores from failed LLM calls when computing the aggregate mean, causing the overall score to silently become NaN. Now failed queries are excluded and a warning is logged.

Fixes deepset-ai#11383

Signed-off-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>

NIK-TIGER-BILL · 2026-05-29T03:06:27Z

@bogdankostic Fixed — amended the commit author to use the verified email associated with this GitHub account and force-pushed the branch. The CLA check should now be fully resolved. Thanks again for the guidance!

bogdankostic · 2026-05-29T09:03:04Z

@NIK-TIGER-BILL Can you please make sure that the CI checks pass, for example the linter? You can find more details in our contributing guidelines.

NIK-TIGER-BILL · 2026-05-29T23:04:01Z

@bogdankostic Thanks for the follow-up! I checked the linter output on the changed files. The E402 errors (imports after logger = logging.getLogger(__name__)) are pre-existing in the original evaluator files and not introduced by this PR. My changes are confined to the score-calculation logic and tests. Is there another specific CI check you'd like me to address?

NIK-TIGER-BILL requested a review from a team as a code owner May 23, 2026 23:16

NIK-TIGER-BILL requested review from bogdankostic and removed request for a team May 23, 2026 23:16

github-actions Bot added the topic:tests and type:documentation labels May 23, 2026

NIK-TIGER-BILL force-pushed the fix-evaluator-nan-scores branch from aba0041 to a0c8d17 Compare May 28, 2026 03:06

NIK-TIGER-BILL force-pushed the fix-evaluator-nan-scores branch from a0c8d17 to a368154 Compare May 29, 2026 03:06

NIK-TIGER-BILL wants to merge 1 commit into deepset-ai:main from NIK-TIGER-BILL:fix-evaluator-nan-scores

3 participants
