
fix: exclude failed queries from aggregate score in FaithfulnessEvaluator and ContextRelevanceEvaluator #11385

Open
NIK-TIGER-BILL wants to merge 1 commit into deepset-ai:main from NIK-TIGER-BILL:fix-evaluator-nan-scores

Conversation

@NIK-TIGER-BILL

Related Issues

Proposed Changes:

When FaithfulnessEvaluator or ContextRelevanceEvaluator runs with raise_on_failure=False and an LLM call fails, the score for that query becomes NaN. Previously, these NaN values were included in the aggregate mean, so the overall score silently became NaN and the user had no indication that some queries had failed.
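To make the failure mode concrete, here is a minimal sketch with made-up per-query scores, assuming a failed query is recorded as `float("nan")` as described above. Plain `sum() / len()` stands in for a NaN-unaware mean; it is not the evaluators' actual aggregation code, but it shows how a single NaN poisons the whole aggregate.

```python
import math

# Made-up per-query scores; the second query's LLM call failed and was recorded as NaN.
per_query_scores = [0.75, float("nan"), 1.0]

# A NaN-unaware mean: any NaN in the input makes the sum, and therefore the mean, NaN.
naive_aggregate = sum(per_query_scores) / len(per_query_scores)

print(math.isnan(naive_aggregate))  # True
```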

Changes:

  • Filter out NaN scores before computing the aggregate mean.
  • Log a WARNING telling the user how many queries were excluded.
  • Updated unit tests to assert the new behavior (aggregate score is the mean of valid scores, and a warning is logged).
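The two code changes listed above can be sketched as follows. This is a hedged illustration, not the code in this PR: `aggregate_valid_scores` is a hypothetical helper, the warning text is illustrative, and returning NaN when every query failed (instead of dividing by zero) is an assumption of this sketch.

```python
import logging
import math

logger = logging.getLogger(__name__)


def aggregate_valid_scores(scores: list[float]) -> float:
    """Hypothetical helper: mean of the non-NaN scores, NaN if no valid score remains."""
    # Keep only the queries whose score was actually computed.
    valid = [s for s in scores if not math.isnan(s)]
    excluded = len(scores) - len(valid)
    if excluded:
        # Surface the skipped queries instead of letting them disappear silently.
        logger.warning(
            "Excluded %d of %d queries from the aggregate score because their LLM call failed.",
            excluded,
            len(scores),
        )
    if not valid:
        # Assumption: with nothing left to average, return NaN rather than raise ZeroDivisionError.
        return float("nan")
    return sum(valid) / len(valid)
```

With this sketch, `aggregate_valid_scores([0.75, float("nan"), 1.0])` returns `0.875` (the mean of the two valid scores) and emits one warning. Filtering before averaging keeps the aggregate meaningful, while the warning preserves the signal that some queries were skipped.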

How did you test it?

Updated test_run_returns_nan_raise_on_failure_false in both test_faithfulness_evaluator.py and test_context_relevance_evaluator.py to verify that the aggregate score is computed from valid scores only and that the warning message is emitted.
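The updated tests themselves are not reproduced here. As an illustration of the two assertions they are described as making (the aggregate equals the mean of the valid scores, and a warning is logged), here is a self-contained `unittest` sketch. It runs against a hypothetical stand-in helper and logger name, not the evaluators' real API, and the project's own tests may use pytest fixtures such as `caplog` instead of `assertLogs`.

```python
import logging
import math
import unittest

# Hypothetical logger name; the real evaluators log under their own module names.
logger = logging.getLogger("hypothetical_evaluator")


def aggregate_valid_scores(scores):
    # Hypothetical stand-in for the aggregation step: drop NaN scores, warn, average the rest.
    valid = [s for s in scores if not math.isnan(s)]
    if len(valid) < len(scores):
        logger.warning(
            "Excluded %d of %d queries from the aggregate score.",
            len(scores) - len(valid),
            len(scores),
        )
    return sum(valid) / len(valid) if valid else float("nan")


class TestAggregateExcludesFailedQueries(unittest.TestCase):
    def test_nan_scores_are_excluded_and_warned(self):
        # assertLogs fails the test if no WARNING-or-higher record reaches this logger.
        with self.assertLogs("hypothetical_evaluator", level="WARNING") as captured:
            score = aggregate_valid_scores([1.0, float("nan"), 0.0])
        self.assertEqual(score, 0.5)
        self.assertEqual(len(captured.records), 1)
        self.assertIn("Excluded 1 of 3 queries", captured.output[0])

    def test_all_valid_scores_give_plain_mean(self):
        self.assertEqual(aggregate_valid_scores([1.0, 0.0]), 0.5)
```

pytest also collects `unittest.TestCase` subclasses, so the sketch runs under either runner.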

Notes for the reviewer

Checklist

  • I have read the contributors guidelines and the code of conduct.
  • I have updated the related issue with new insights and changes.
  • I have added unit tests and updated the docstrings.
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I have documented my code.
  • I have added a release note file, following the contributors guidelines.
  • I have run pre-commit hooks and fixed any issue.

@NIK-TIGER-BILL NIK-TIGER-BILL requested a review from a team as a code owner May 23, 2026 23:16
@NIK-TIGER-BILL NIK-TIGER-BILL requested review from bogdankostic and removed request for a team May 23, 2026 23:16
@vercel

vercel Bot commented May 23, 2026

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLAassistant commented May 23, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


NIK-TIGER-BILL seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions github-actions Bot added the topic:tests and type:documentation (Improvements on the docs) labels May 23, 2026
@bogdankostic
Contributor

Hi @NIK-TIGER-BILL!
It looks like your commits are not linked to your GitHub account. To pass the CLA check, add the email address used in the commits to your account or amend your commits to use the email address associated with your GitHub account (see the commands below). Either approach should resolve the issue.

git config user.email "new.email@example.com"
git commit --amend --author="Your Name <new.email@example.com>" --no-edit
git push --force-with-lease

@NIK-TIGER-BILL
Author

Hi @bogdankostic, thank you for the heads-up!

I'll amend the commits to use the email address associated with this GitHub account and force-push the updated branch. That should resolve the CLA check. I'll ping you once it's done.

@NIK-TIGER-BILL NIK-TIGER-BILL force-pushed the fix-evaluator-nan-scores branch from aba0041 to a0c8d17 May 28, 2026 03:06
@NIK-TIGER-BILL
Author

@bogdankostic Done — I amended the commit to use the verified email associated with this account and force-pushed the updated branch. The CLA check should now pass. Thanks for the guidance!

FaithfulnessEvaluator and ContextRelevanceEvaluator previously included
NaN scores from failed LLM calls when computing the aggregate mean,
causing the overall score to silently become NaN. Now failed queries
are excluded and a warning is logged.

Fixes deepset-ai#11383

Signed-off-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
@NIK-TIGER-BILL NIK-TIGER-BILL force-pushed the fix-evaluator-nan-scores branch from a0c8d17 to a368154 May 29, 2026 03:06
@NIK-TIGER-BILL
Author

@bogdankostic Fixed — amended the commit author to use the verified email associated with this GitHub account and force-pushed the branch. The CLA check should now be fully resolved. Thanks again for the guidance!

@bogdankostic
Contributor

@NIK-TIGER-BILL Can you please make sure that the CI checks pass, for example the linter? You can find more details in our contributing guidelines.

@NIK-TIGER-BILL
Author

@bogdankostic Thanks for the follow-up! I checked the linter output on the changed files. The E402 errors (module-level imports placed after logger = logging.getLogger(__name__)) are pre-existing in the original evaluator files and were not introduced by this PR. My changes are confined to the score-calculation logic and the tests. Is there another specific CI check you'd like me to address?
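For context on what E402 ("module level import not at top of file") flags, here is a simplified sketch using the standard-library `ast` module. It reports module-level imports that come after the first non-import statement, such as a `logger = logging.getLogger(__name__)` assignment. It is an approximation, not the actual pycodestyle or ruff implementation, which allow further exceptions beyond the module docstring handled here; `find_late_imports` is a hypothetical name.

```python
import ast

# Sample module: the second import follows the logger assignment, so E402 would flag it.
SOURCE = '''"""Module docstring."""
import logging

logger = logging.getLogger(__name__)

import math  # imported after the logger assignment
'''


def find_late_imports(source: str) -> list[int]:
    """Return line numbers of module-level imports that follow a non-import statement."""
    late_lines = []
    seen_code = False
    for index, node in enumerate(ast.parse(source).body):
        # A leading string expression is the module docstring and does not count as code.
        is_docstring = (
            index == 0
            and isinstance(node, ast.Expr)
            and isinstance(node.value, ast.Constant)
            and isinstance(node.value.value, str)
        )
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            if seen_code:
                late_lines.append(node.lineno)
        elif not is_docstring:
            seen_code = True
    return late_lines


print(find_late_imports(SOURCE))  # [6]
```

Moving every import above the logger assignment makes the sketch (and E402) report nothing for that file.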


Labels

topic:tests type:documentation Improvements on the docs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: FaithfulnessEvaluator / ContextRelevanceEvaluator silently return NaN when an LLM call fails

3 participants