Merge Hindi ITN v2 from staging #318

Open: wants to merge 5 commits into base: main

mgrafu (Collaborator) commented Aug 19, 2025

What does this PR do?

New classes, class improvements

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unit tests finish successfully before sending the PR?
    1. Run pytest or, if your machine does not have a GPU, pytest --cpu from the root folder (provided you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
    2. Run the Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: have you added test cases for both pytest and Sparrowhawk?
  • Have you added __init__.py to every folder and subfolder, including the data folder that holds the .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports? (The report is at the bottom of the PR, in the GitHub review box.)
  • Have you added the correct license header, Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved., to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py, your header's second line should be Copyright 2015 and onwards Google, Inc.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature, please update the NeMo documentation (it lives in a different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items you can still open "Draft" PR.

tarushi2k2 and others added 5 commits April 3, 2025 15:46
* Addition of whitelist and word classes

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updation of Jenkins date

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Updation

Signed-off-by: Tarushi V <[email protected]>

* Updation

Signed-off-by: Tarushi V <[email protected]>

* Future implementations for date

Signed-off-by: Tarushi V <[email protected]>

* pushing rough date code for ref

Signed-off-by: Tarushi V <[email protected]>

* Future implementations date.py

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updation of Jenkinsfile

Signed-off-by: Tarushi V <[email protected]>

* Telephone.py-hindi itn

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Telephone.py - Hindi ITN

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Telephone modified tagger and verbalizer

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* telephone tagger with 3,4,5 digit std codes

Signed-off-by: Tarushi V <[email protected]>

* Further additions - telephone.py

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Jenkins update

Signed-off-by: Tarushi V <[email protected]>

* Telephone.py

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated tagger-telephone.py

Signed-off-by: Tarushi V <[email protected]>

* Telephone and Jenkinsfile cleanup

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update Jenkins

Signed-off-by: Tarushi V <[email protected]>

---------

Signed-off-by: Tarushi V <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Anand Joseph <[email protected]>
Signed-off-by: Anand Joseph <[email protected]>
…306)

* Addition of whitelist and word classes

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updation of Jenkins date

Signed-off-by: Tarushi V <[email protected]>

* Cleanup

Signed-off-by: Tarushi V <[email protected]>

* Updation

Signed-off-by: Tarushi V <[email protected]>

* Updation

Signed-off-by: Tarushi V <[email protected]>

* Hindi 2.0

Signed-off-by: Tarushi V <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Tarushi V <[email protected]>
Signed-off-by: tarushi2k2 <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@@ -16,21 +16,21 @@
 import pynini
 from pynini.lib import pynutil
 
-from nemo_text_processing.inverse_text_normalization.hi.utils import get_abs_path
-from nemo_text_processing.text_normalization.en.graph_utils import (
+from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'apply_fst' is not used.
Import of 'get_abs_path' is not used.

Copilot Autofix

AI 5 days ago

To fix the problem, the unused import statement should be removed from the file. Specifically, delete the line from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path (line 19). This will clean up the code, remove unnecessary dependencies, and improve readability. No other changes are required, as the removal of this import does not affect any functionality in the file.

Suggested changeset 1
nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py b/nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py
--- a/nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py
+++ b/nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py
@@ -16,7 +16,6 @@
 import pynini
 from pynini.lib import pynutil
 
-from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path
 from nemo_text_processing.text_normalization.en.utils import load_labels
 from nemo_text_processing.text_normalization.hi.graph_utils import (
     INPUT_CASED,
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
-from nemo_text_processing.inverse_text_normalization.hi.utils import get_abs_path
-from nemo_text_processing.text_normalization.en.graph_utils import (
+from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path
+from nemo_text_processing.text_normalization.en.utils import load_labels

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'load_labels' is not used.

Copilot Autofix

AI 5 days ago

To fix the problem, simply remove the unused import statement from the file. Specifically, delete the line from nemo_text_processing.text_normalization.en.utils import load_labels (line 20) in nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py. No other changes are required, as this will not affect the functionality of the code.

Suggested changeset 1
nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py b/nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py
--- a/nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py
+++ b/nemo_text_processing/inverse_text_normalization/hi/taggers/fraction.py
@@ -17,7 +17,6 @@
 from pynini.lib import pynutil
 
 from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path
-from nemo_text_processing.text_normalization.en.utils import load_labels
 from nemo_text_processing.text_normalization.hi.graph_utils import (
     INPUT_CASED,
     INPUT_LOWER_CASED,
EOF
@@ -23,7 +23,7 @@
     delete_space,
     insert_space,
 )
-from nemo_text_processing.inverse_text_normalization.hi.utils import get_abs_path
+from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'apply_fst' is not used.

Copilot Autofix

AI 5 days ago

To fix the problem, simply remove the unused import statement from the file. Specifically, delete line 26: from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path. This will clean up the code, remove unnecessary dependencies, and improve readability. No other changes are required, as the imported names are not used elsewhere in the shown code.

Suggested changeset 1
nemo_text_processing/inverse_text_normalization/hi/taggers/measure.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo_text_processing/inverse_text_normalization/hi/taggers/measure.py b/nemo_text_processing/inverse_text_normalization/hi/taggers/measure.py
--- a/nemo_text_processing/inverse_text_normalization/hi/taggers/measure.py
+++ b/nemo_text_processing/inverse_text_normalization/hi/taggers/measure.py
@@ -23,7 +23,6 @@
     delete_space,
     insert_space,
 )
-from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path
 
 
 class MeasureFst(GraphFst):
EOF
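
Note that CodeQL flags only apply_fst in measure.py, while the suggested patch deletes the whole import line, including get_abs_path. Before applying a patch like this, it can help to confirm which imported names are really unused. The following is a minimal sketch using Python's standard ast module. The sample source and the data path in it are hypothetical stand-ins (the body of measure.py is not shown here), and unused_imported_names is an illustrative helper, not part of NeMo or CodeQL.

```python
import ast


def unused_imported_names(source):
    """Return names bound by import statements that are never read in `source`."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `from m import x as y` binds `y`.
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)
    return imported - used


# Hypothetical stand-in for measure.py: the path below is invented for illustration.
sample = '''
from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst, get_abs_path

unit_path = get_abs_path("data/measure/units.tsv")
'''

result = unused_imported_names(sample)
print(result)
```

In this hypothetical sample only apply_fst is reported, so the safer edit would be to drop that one name and keep get_abs_path. The sketch ignores star imports, __all__, and names referenced only inside strings, so treat its output as a hint rather than a verdict.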
self.paise = pynutil.insert("fractional_part: \"") + cardinal_graph + pynutil.insert("\"")
self.fraction = decimal_graph
self.currency = pynutil.insert("currency: \"") + currency_graph + pynutil.insert("\" ")
aur = pynutil.delete("और")
delete_hundred = pynutil.delete("सौ")

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable delete_hundred is not used.

Copilot Autofix

AI 5 days ago

To fix the problem, we should remove the assignment to the unused variable delete_hundred on line 55. Since the right-hand side of the assignment (pynutil.delete("सौ")) does not have side effects, it is safe to delete the entire line. No other changes are necessary, as the variable is not referenced elsewhere in the code. The fix should be made in the file nemo_text_processing/inverse_text_normalization/hi/taggers/money.py, specifically on line 55.

Suggested changeset 1
nemo_text_processing/inverse_text_normalization/hi/taggers/money.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo_text_processing/inverse_text_normalization/hi/taggers/money.py b/nemo_text_processing/inverse_text_normalization/hi/taggers/money.py
--- a/nemo_text_processing/inverse_text_normalization/hi/taggers/money.py
+++ b/nemo_text_processing/inverse_text_normalization/hi/taggers/money.py
@@ -52,7 +52,6 @@
         self.fraction = decimal_graph
         self.currency = pynutil.insert("currency: \"") + currency_graph + pynutil.insert("\" ")
         aur = pynutil.delete("और")
-        delete_hundred = pynutil.delete("सौ")
         delete_lakh = pynutil.delete("लाख")
         delete_hazar = pynutil.delete("हजार") | pynutil.delete("हज़ार")
         delete_crore = pynutil.delete("करोड़") | pynutil.delete("करोड़")
EOF
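
A side observation on the patch context: the two alternatives in delete_crore render identically, whereas the two in delete_hazar visibly differ by a nukta. One plausible reading, which is an assumption since the underlying bytes are not visible in this excerpt, is that the crore variants are meant to cover the precomposed letter U+095C and the sequence U+0921 + U+093C. The sketch below shows how Python's standard unicodedata module can tell such spellings apart and bring them to a single form.

```python
import unicodedata

# Two encodings of the Hindi word for "crore" (illustrative; the bytes used in
# the PR itself cannot be confirmed from this page).
precomposed = "\u0915\u0930\u094b\u095c"       # ends in U+095C DEVANAGARI LETTER DDDHA
decomposed = "\u0915\u0930\u094b\u0921\u093c"  # ends in U+0921 DA + U+093C NUKTA


def normalize_hi(text):
    """Normalize to NFC; U+095C is excluded from composition, so it decomposes."""
    return unicodedata.normalize("NFC", text)


differ_as_code_points = precomposed != decomposed
equal_after_nfc = normalize_hi(precomposed) == normalize_hi(decomposed)
print(differ_as_code_points, equal_after_nfc)
```

If both alternatives in the grammar turn out to be the same code-point sequence, one of them is redundant. If they differ, normalizing the input first is another way to cover both spellings, though whether that fits this grammar is a design decision for the authors.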
self.paise = pynutil.insert("fractional_part: \"") + cardinal_graph + pynutil.insert("\"")
self.fraction = decimal_graph
self.currency = pynutil.insert("currency: \"") + currency_graph + pynutil.insert("\" ")
aur = pynutil.delete("और")
delete_hundred = pynutil.delete("सौ")
delete_lakh = pynutil.delete("लाख")
delete_hazar = pynutil.delete("हजार") | pynutil.delete("हज़ार")

Check notice

Code scanning / CodeQL

Unused local variable Note

Variable delete_hazar is not used.

Copilot Autofix

AI 5 days ago

To fix the problem, the unused variable assignment should be removed. This means deleting the line that assigns a value to delete_hazar (line 57) in nemo_text_processing/inverse_text_normalization/hi/taggers/money.py. This change will not affect any existing functionality, as the variable is not used anywhere in the code. No additional imports, methods, or definitions are required for this fix.

Suggested changeset 1
nemo_text_processing/inverse_text_normalization/hi/taggers/money.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo_text_processing/inverse_text_normalization/hi/taggers/money.py b/nemo_text_processing/inverse_text_normalization/hi/taggers/money.py
--- a/nemo_text_processing/inverse_text_normalization/hi/taggers/money.py
+++ b/nemo_text_processing/inverse_text_normalization/hi/taggers/money.py
@@ -54,7 +54,6 @@
         aur = pynutil.delete("और")
         delete_hundred = pynutil.delete("सौ")
         delete_lakh = pynutil.delete("लाख")
-        delete_hazar = pynutil.delete("हजार") | pynutil.delete("हज़ार")
         delete_crore = pynutil.delete("करोड़") | pynutil.delete("करोड़")
 
         graph_currency_decimal = self.fraction + delete_extra_space + self.currency
EOF
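
The two money.py notes were raised separately, and the patch context suggests there may be further assignments of the same kind nearby. A quick local check for variables that are assigned but never read can be sketched with the standard ast module. The function build and its delete parameter below are hypothetical stand-ins for the tagger's constructor and pynutil.delete; this is not how CodeQL implements its check.

```python
import ast


def unused_locals(source, func_name):
    """Return names assigned inside `func_name` that are never read there."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            stored, loaded = set(), set()
            for child in ast.walk(node):
                if isinstance(child, ast.Name):
                    if isinstance(child.ctx, ast.Store):
                        stored.add(child.id)
                    elif isinstance(child.ctx, ast.Load):
                        loaded.add(child.id)
            return stored - loaded
    return set()


# Hypothetical function shaped like the flagged code; `delete` stands in for pynutil.delete.
sample = '''
def build(delete):
    aur = delete("और")
    delete_hundred = delete("सौ")
    delete_lakh = delete("लाख")
    return aur + delete_lakh
'''

flagged = unused_locals(sample, "build")
print(flagged)
```

As the Autofix text points out, deleting such a line is only safe when the right-hand side has no side effects, which this sketch does not verify.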
Comment on lines +18 to +25
from nemo_text_processing.inverse_text_normalization.hi.graph_utils import (
DEVANAGARI_DIGIT,
GraphFst,
delete_extra_space,
delete_space,
insert_space,
integer_to_devanagari,
)

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'DEVANAGARI_DIGIT' is not used.
Import of 'insert_space' is not used.
Import of 'delete_extra_space' is not used.

Copilot Autofix

AI 5 days ago

To fix the problem, we should remove the unused import of DEVANAGARI_DIGIT from the import statement on line 18. This can be done by simply deleting DEVANAGARI_DIGIT, from the list of imported symbols. No other changes are necessary, as this will not affect the functionality of the code.

Suggested changeset 1
nemo_text_processing/inverse_text_normalization/hi/taggers/time.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo_text_processing/inverse_text_normalization/hi/taggers/time.py b/nemo_text_processing/inverse_text_normalization/hi/taggers/time.py
--- a/nemo_text_processing/inverse_text_normalization/hi/taggers/time.py
+++ b/nemo_text_processing/inverse_text_normalization/hi/taggers/time.py
@@ -16,7 +16,6 @@
 from pynini.lib import pynutil
 
 from nemo_text_processing.inverse_text_normalization.hi.graph_utils import (
-    DEVANAGARI_DIGIT,
     GraphFst,
     delete_extra_space,
     delete_space,
EOF
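
The CodeQL note for time.py lists three unused names (DEVANAGARI_DIGIT, insert_space, delete_extra_space), but the patch above removes only DEVANAGARI_DIGIT. Assuming the other two really are unused, all three can be dropped in one pass. The sketch below rewrites the import statement with the standard ast module (ast.unparse requires Python 3.9 or later); drop_imported_names is an illustrative helper, not an existing tool.

```python
import ast


def drop_imported_names(import_stmt, names_to_drop):
    """Return a `from ... import ...` statement without the given names."""
    node = ast.parse(import_stmt).body[0]
    if not isinstance(node, ast.ImportFrom):
        raise ValueError("expected a single 'from ... import ...' statement")
    node.names = [alias for alias in node.names if alias.name not in names_to_drop]
    if not node.names:
        return ""  # nothing left to import, so the whole statement can go
    return ast.unparse(node)


original = '''
from nemo_text_processing.inverse_text_normalization.hi.graph_utils import (
    DEVANAGARI_DIGIT,
    GraphFst,
    delete_extra_space,
    delete_space,
    insert_space,
    integer_to_devanagari,
)
'''

cleaned = drop_imported_names(
    original, {"DEVANAGARI_DIGIT", "insert_space", "delete_extra_space"}
)
print(cleaned)
```

ast.unparse emits the statement on a single line, so the project's formatter (the pre-commit hooks mentioned in the commit log) would need to rewrap it afterwards.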
@@ -16,6 +16,7 @@
 import pynini
 from pynini.lib import pynutil
 
+from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'apply_fst' is not used.

Copilot Autofix

AI 5 days ago

To fix the problem, the unused import statement should be removed from the file. Specifically, delete the line from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst (line 19). This will clean up the code, remove an unnecessary dependency, and improve readability. No other changes are required, as the rest of the code does not depend on this import.

Suggested changeset 1
nemo_text_processing/inverse_text_normalization/hi/verbalizers/fraction.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo_text_processing/inverse_text_normalization/hi/verbalizers/fraction.py b/nemo_text_processing/inverse_text_normalization/hi/verbalizers/fraction.py
--- a/nemo_text_processing/inverse_text_normalization/hi/verbalizers/fraction.py
+++ b/nemo_text_processing/inverse_text_normalization/hi/verbalizers/fraction.py
@@ -16,7 +16,6 @@
 import pynini
 from pynini.lib import pynutil
 
-from nemo_text_processing.inverse_text_normalization.hi.utils import apply_fst
 from nemo_text_processing.text_normalization.en.graph_utils import NEMO_NOT_QUOTE, NEMO_SPACE, GraphFst, delete_space
 
 
EOF
3 participants