Modified beam_search() to work with multi-character tokens #53

lruanova · 2025-06-06T17:08:05Z

Beam search reversed the characters inside multi-character tokens when reconstructing the decoded sequence from the beam search path. The decoded string was built by reversing, character by character, the output of the suffix tree traversal, which is only correct when each label is a single character. With vocabularies that contain multi-character tokens (e.g. "TH", "XY"), this produced incorrect output such as "HT" instead of "TH". This is a minimal patch that keeps multi-letter labels in their original order.

Changes:

In search.rs and duplex.rs, removed the .chars().rev() step when building the final decoded sequence from the beam search path. The sequence is now reconstructed directly from the ordered list of labels.
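
For reviewers, here is a minimal sketch of the failure mode and of a label-level reconstruction. It is not the code from search.rs or duplex.rs: the function names are hypothetical, and it assumes the traversal yields labels from leaf to root, which may differ from how the actual path is stored.

```rust
/// Hypothetical illustration of the old behaviour: join the labels,
/// then reverse the result character by character.
/// Assumes `path_leaf_to_root` lists labels from the last token to the first.
fn decode_char_reversed(path_leaf_to_root: &[&str]) -> String {
    path_leaf_to_root.concat().chars().rev().collect()
}

/// Hypothetical illustration of a label-level reconstruction: reverse the
/// order of the labels, but leave the characters inside each label untouched.
fn decode_label_ordered(path_leaf_to_root: &[&str]) -> String {
    path_leaf_to_root.iter().rev().copied().collect()
}
```

With single-character labels both functions agree, which is why the problem only shows up with vocabularies such as "TH" and "XY". For the leaf-to-root path `["XY", "TH"]`, the character-level version scrambles each token, while the label-level version returns the tokens intact and in sequence order.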

Modified beam_search() to work with multi-character tokens

134398b
