@lruanova lruanova commented Jun 6, 2025

Beam search reversed the characters of multi-character tokens when reconstructing the decoded sequence from the beam search path. The decoded string was built by reversing, character by character, the output of the suffix tree traversal, which is only correct when every label is a single character. With vocabularies that contain multi-character tokens (e.g. "TH", "XY"), this produces incorrect output such as "HT" instead of "TH". This minimal patch keeps multi-letter labels in their original order.

Changes:

  • In search.rs and duplex.rs, removed the .chars().rev() step when building the final decoded sequence from the beam search path. The sequence is now reconstructed directly from the ordered list of labels.
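
To illustrate the difference, here is a minimal, self-contained sketch. It is not the code from search.rs or duplex.rs: the function names and the `labels_leaf_to_root` slice are hypothetical, and it assumes the traversal yields labels from the last token back to the first. Under that assumption, one way to keep each label intact is to reverse the order of the labels rather than the characters of the joined string.

```rust
/// Hypothetical reproduction of the bug: join the labels in traversal
/// order (last token first), then reverse the result character by character.
/// This also reverses the characters inside every multi-character label.
fn decode_reversing_chars(labels_leaf_to_root: &[&str]) -> String {
    let joined: String = labels_leaf_to_root.concat();
    joined.chars().rev().collect()
}

/// Hypothetical fix: reverse the order of the labels themselves and
/// concatenate them, so each label keeps its internal character order.
fn decode_reversing_labels(labels_leaf_to_root: &[&str]) -> String {
    labels_leaf_to_root.iter().rev().copied().collect()
}
```

For the labels `["XY", "TH"]` (traversal order, so the intended sequence is "THXY"), `decode_reversing_chars` returns "HTYX" while `decode_reversing_labels` returns "THXY". With single-character labels such as `["B", "A"]`, both return "AB", which is why the problem only shows up with multi-character vocabularies.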
