fix: prevent N×M reply explosion in HuggingFaceLocalGenerator with multiple stop_words#11413

Open
NIK-TIGER-BILL wants to merge 1 commit into deepset-ai:main from NIK-TIGER-BILL:fix-hf-local-stop-words-cross-product

@NIK-TIGER-BILL

Related Issues

Proposed Changes:

The list comprehension in HuggingFaceLocalGenerator.run() used two for clauses to remove stop words from replies:

replies = [reply.replace(stop_word, "").rstrip() for reply in replies for stop_word in self.stop_words]

This creates a cross-product: with N replies and M stop words, the output contains N×M replies instead of N. Each of those replies has only a single stop word removed, so any other stop word present in the original reply survives, and downstream components receive an unexpected number of replies.
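To make the count concrete, the snippet below isolates the original comprehension in a hypothetical helper, `strip_stop_words_buggy` (not a Haystack function), and runs it on made-up input with 2 replies and 2 stop words:

```python
def strip_stop_words_buggy(replies: list[str], stop_words: list[str]) -> list[str]:
    # Two `for` clauses: the inner clause runs once per stop word for EVERY reply,
    # so the result has len(replies) * len(stop_words) entries.
    return [reply.replace(stop_word, "").rstrip() for reply in replies for stop_word in stop_words]


# Made-up example data, for illustration only.
replies = ["Paris is the capital.<END>", "Berlin is the capital.###"]
stop_words = ["<END>", "###"]

result = strip_stop_words_buggy(replies, stop_words)
print(len(result))  # 4, not 2
# result == [
#     "Paris is the capital.",        # "<END>" pass: stop word removed
#     "Paris is the capital.<END>",   # "###" pass: "<END>" survives
#     "Berlin is the capital.###",    # "<END>" pass: "###" survives
#     "Berlin is the capital.",       # "###" pass: stop word removed
# ]
```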

The fix replaces the comprehension with an explicit outer loop over stop_words, matching the already-correct implementation in HuggingFaceChatLocalGenerator (chat/hugging_face_local.py, line 654):

for stop_word in self.stop_words:
    replies = [reply.replace(stop_word, "").rstrip() for reply in replies]
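For comparison, here is a minimal sketch of the sequential loop on made-up input. `strip_stop_words_fixed` is a hypothetical helper name used only for illustration; it wraps the two lines above and is not part of Haystack's API:

```python
def strip_stop_words_fixed(replies: list[str], stop_words: list[str]) -> list[str]:
    # Outer loop over stop words; each pass rebuilds the list with exactly one
    # entry per reply, so the number of replies never changes.
    for stop_word in stop_words:
        replies = [reply.replace(stop_word, "").rstrip() for reply in replies]
    return replies


# Made-up example data, for illustration only.
replies = ["Paris is the capital.<END>", "Berlin is the capital.###"]
stop_words = ["<END>", "###"]

result = strip_stop_words_fixed(replies, stop_words)
print(result)  # ['Paris is the capital.', 'Berlin is the capital.']
```

Because `replies` is only rebound inside the function, the caller's list is not mutated. Each pass also calls `.rstrip()`, so trailing whitespace exposed by removing one stop word is trimmed before the next stop word is processed.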

How did you test it?

  • Added a new unit test test_run_multiple_stop_words_removal that mocks a pipeline returning 2 replies with 2 stop words configured.
  • Before the fix the test would produce 4 replies; after the fix it correctly returns 2 replies with both stop words stripped.
  • Verified the exact logic with a standalone Python snippet.
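The added test itself is not reproduced in this description. As a rough, self-contained approximation of what it checks, the sketch below mocks a text-generation pipeline with `unittest.mock.MagicMock` and feeds it through a tiny stand-in class. `ToyLocalGenerator` is hypothetical: it only mimics the stop-word stripping step of `HuggingFaceLocalGenerator.run()`, and the `[{"generated_text": ...}]` return shape is an assumption modeled on the Hugging Face `transformers` text-generation pipeline output.

```python
from unittest.mock import MagicMock


class ToyLocalGenerator:
    """Hypothetical stand-in that only reproduces the reply post-processing step."""

    def __init__(self, pipeline, stop_words):
        self.pipeline = pipeline
        self.stop_words = stop_words

    def run(self, prompt: str) -> dict:
        output = self.pipeline(prompt)
        replies = [item["generated_text"] for item in output]
        # Sequential stripping, as in the fix: one pass per stop word.
        for stop_word in self.stop_words:
            replies = [reply.replace(stop_word, "").rstrip() for reply in replies]
        return {"replies": replies}


def test_toy_multiple_stop_words_removal():
    # Mocked pipeline returning 2 replies, each ending in a different stop word.
    pipeline = MagicMock(
        return_value=[
            {"generated_text": "First answer <STOP_A>"},
            {"generated_text": "Second answer <STOP_B>"},
        ]
    )
    generator = ToyLocalGenerator(pipeline, stop_words=["<STOP_A>", "<STOP_B>"])

    replies = generator.run("some prompt")["replies"]

    assert len(replies) == 2  # the two-clause comprehension would have returned 4
    assert replies == ["First answer", "Second answer"]
    pipeline.assert_called_once_with("some prompt")


if __name__ == "__main__":
    test_toy_multiple_stop_words_removal()
```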

Notes for the reviewer

This is a minimal, surgical change. The sibling chat generator already uses the same sequential approach, so this only brings the non-chat generator in line with the existing pattern.

Checklist

This PR was fully generated with an AI assistant. I have reviewed the changes and run the relevant tests.

…ltiple stop_words

The list comprehension
  [reply.replace(stop_word, "").rstrip() for reply in replies for stop_word in self.stop_words]
creates a cross-product: N replies × M stop_words produces N×M outputs.
Each output has only one stop word removed, so any other stop words in the
reply survive, and downstream code receives an unexpected number of replies.

Replace the comprehension with an explicit outer loop over stop_words so
that each stop_word is stripped from every reply sequentially, preserving
the invariant N replies → N replies out.

Fixes deepset-ai#11409

Signed-off-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
@NIK-TIGER-BILL requested a review from a team as a code owner May 26, 2026 23:09
@NIK-TIGER-BILL requested review from bogdankostic and removed request for a team May 26, 2026 23:09
@vercel

vercel Bot commented May 26, 2026

Someone is attempting to deploy a commit to the deepset Team on Vercel.

A member of the Team first needs to authorize it.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


NIK-TIGER-BILL seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@sjrl requested a review from julian-risch May 27, 2026 06:06
@julian-risch
Member

@NIK-TIGER-BILL Thank you for opening this pull request. Would you please agree to our CLA? Otherwise we can't merge this pull request. #11413 (comment)

@NIK-TIGER-BILL
Author

Done!
https://cla-assistant.io/deepset-ai/haystack - "You have agreed to the CLA for deepset-ai/haystack"

@julian-risch
Member

@NIK-TIGER-BILL Your commit in this pull request appears not to be linked to your user account. Could you please fix that? Here are instructions: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/troubleshooting-commits/why-are-my-commits-linked-to-the-wrong-user#commits-are-not-linked-to-any-user

@NIK-TIGER-BILL
Author

@julian-risch The commit author is already set to NIK-TIGER-BILL <nik.tiger.bill@github.com>. Could the issue be that this email address needs to be verified in my GitHub account settings? I have amended the commits with this email previously. Please let me know if there is anything else I should adjust.


Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: HuggingFaceLocalGenerator returns N×M replies instead of N when stop_words has multiple entries

3 participants