Project input #311

Closed
@FredHaa FredHaa commented Aug 14, 2025

What does this PR do?

This PR adds a feature that projects the input text into the output string.

For example, with `project_input` enabled, running ITN on the string `the road is one kilometer long` produces the output `the road is [1 km][one kilometer] long`. The left square bracket holds the inverse-normalized output, and the right bracket holds the input span that produced it.
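
The bracket format described above can be consumed downstream with a small parser. The following is a minimal sketch, not code from this PR: the name `parse_projected` is hypothetical, and it assumes that neither side of a pair contains square brackets and that unbracketed words pass through unchanged.

```python
import re

# One "[output][input]" pair, as described in the PR text.
# Assumption: neither side contains "[" or "]".
PAIR = re.compile(r"\[([^\[\]]*)\]\[([^\[\]]*)\]")


def parse_projected(text):
    """Split projected ITN output into (output_text, input_text) segments."""
    segments = []
    pos = 0
    for match in PAIR.finditer(text):
        # Words outside a bracket pair are assumed to map to themselves.
        for word in text[pos:match.start()].split():
            segments.append((word, word))
        segments.append((match.group(1), match.group(2)))
        pos = match.end()
    # Trailing words after the last bracket pair.
    for word in text[pos:].split():
        segments.append((word, word))
    return segments


segments = parse_projected("the road is [1 km][one kilometer] long")
normalized = " ".join(out for out, _ in segments)  # output side only
original = " ".join(src for _, src in segments)    # input side only
```

Joining the first element of each segment gives the normalized text, and joining the second gives back the original input.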

This is useful in, for example, speech pipelines where ITN post-processes the output of an ASR model and the processed output needs correct word-level timestamps. Although the input can be aligned with the output using the fst_alignment script in the repo, that is less robust than computing the input together with the output directly from the FST.
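
To illustrate the timestamp use case, here is a hedged sketch of how word-level ASR timestamps might be carried over to the normalized tokens once the input/output pairs are known. The function `project_timestamps` and the `(word, start, end)` tuple layout are assumptions made for illustration; they are not part of this PR or of any ASR toolkit API.

```python
def project_timestamps(segments, word_times):
    """Assign a (start, end) time to each output segment.

    segments:   list of (output_text, input_text) pairs, in order.
    word_times: list of (word, start, end) for the input words, in order
                (hypothetical layout, assumed for this sketch).
    """
    result = []
    i = 0
    for out_text, in_text in segments:
        words = in_text.split()
        if not words:
            # An output with no input words has no time span; skipped here.
            continue
        span = word_times[i:i + len(words)]
        if [w for w, _, _ in span] != words:
            raise ValueError(f"input words {words} do not match timestamps at index {i}")
        # The output covers the first input word's start to the last one's end.
        result.append((out_text, span[0][1], span[-1][2]))
        i += len(words)
    if i != len(word_times):
        raise ValueError("unused timestamps remain after the last segment")
    return result


segments = [("the", "the"), ("road", "road"), ("is", "is"),
            ("1 km", "one kilometer"), ("long", "long")]
word_times = [("the", 0.0, 0.2), ("road", 0.2, 0.6), ("is", 0.6, 0.8),
              ("one", 0.8, 1.0), ("kilometer", 1.0, 1.6), ("long", 1.6, 2.0)]
timed = project_timestamps(segments, word_times)
```

In this example the token `1 km` inherits the span from the start of `one` to the end of `kilometer`, which is the alignment the projected output makes explicit.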

Not all tests currently pass. I have focused on Swedish and English, since those are the languages I need myself, but I will go through the remaining languages if the PR is considered valuable and doing so would allow it to be merged.

The method currently relies on a custom input tag, which Sparrowhawk does not support. I would like to add Sparrowhawk support but am not yet sure how.

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unittests finish successfully before sending PR?
    1. pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
    2. Sparrowhawk tests bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: Have you added test cases for both pytest and Sparrowhawk here.
  • Have you added __init__.py for every folder and subfolder, including data folder which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py your header's second line should be Copyright 2015 and onwards Google, Inc.. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature please update the NeMo documentation (lives in different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py.

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items you can still open "Draft" PR.

FredHaa and others added 13 commits August 14, 2025 17:26
Signed-off-by: Frederik Haarslev <[email protected]>
Signed-off-by: Frederik Haarslev <[email protected]>
Signed-off-by: Frederik Haarslev <[email protected]>
Signed-off-by: Frederik Haarslev <[email protected]>
Signed-off-by: Frederik Haarslev <[email protected]>
Signed-off-by: Frederik Haarslev <[email protected]>
…upport input_projection

Signed-off-by: Frederik Haarslev <[email protected]>
* Future Implementations for classes - Measure, Money, and Date (NVIDIA#258)

* Future Implementations for classes - Measure, Money, and Date

Signed-off-by: Namrata Gachchi <[email protected]>

* Resolved the conflicts with mm_yyyy and date ranges and added the previously removed failing test cases.

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed the unused empty string implementation

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* minor fixes for the tagger files

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reformatted decimal final graph

Signed-off-by: Namrata Gachchi <[email protected]>

* incorporated the suggestion for decimal graph

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Century implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* Working on the yyyy format for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* reverted yyyy code

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on future implementations

Signed-off-by: Namrata Gachchi <[email protected]>

* working on improving the date class accuracy

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added year prefix for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on the comma cases for date class

Signed-off-by: Namrata Gachchi <[email protected]>

* minor fixes

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* implemented mixed fractions

Signed-off-by: Namrata Gachchi <[email protected]>

* rectified the test case

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* working on quarterly measurements

Signed-off-by: Namrata Gachchi <[email protected]>

* reformatted the prefixes and suffixes for date tagger class

Signed-off-by: Namrata Gachchi <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* replaced text tag with era tag for the date class

Signed-off-by: Namrata Gachchi <[email protected]>

* Removed the text tag reference from date class verbalizer

Signed-off-by: Namrata Gachchi <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update jenkins cache

Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Potential fix for code scanning alert no. 821: Unused local variable

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Mariana <[email protected]>

---------

Signed-off-by: Namrata Gachchi <[email protected]>
Signed-off-by: Mariana Graterol Fuenmayor <[email protected]>
Signed-off-by: Mariana <[email protected]>
Co-authored-by: Namrata Gachchi <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Signed-off-by: Frederik Haarslev <[email protected]>
for more information, see https://pre-commit.ci

Signed-off-by: Frederik Haarslev <[email protected]>
@FredHaa FredHaa closed this Aug 14, 2025
@FredHaa FredHaa deleted the project-input branch August 14, 2025 15:33
FredHaa commented Aug 14, 2025

Closed because of missing commit signatures. I will create a new PR.
