Commit fe12dbc

Merge pull request #164 from BelenSantamaria/encoding-flashtext
Encoding flashtext
2 parents ea92f35 + 01a3562 commit fe12dbc

2 files changed: +5 -1 lines changed

docs/docs/extractors/flashtext.md

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ use the parameter `non_word_boundaries`
 - **entity_name**: the name of the entity to attach to the message
 - **case_sensitive**: whether to consider case when matching entities. `False` by default.
 - **non_word_boundaries**: characters which shouldn't be considered word boundaries.
+- **encoding**: the name of the encoding used to read the lookup text file.

 ## Base Usage
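
The new `encoding` option names the codec used to decode the lookup file. A minimal sketch of why that matters, assuming a hypothetical lookup file `cities.txt` saved as Latin-1 (the file name and its contents are illustrative and not part of this commit):

```python
import pathlib
import tempfile

# Hypothetical lookup file with one entity per line, saved as Latin-1.
with tempfile.TemporaryDirectory() as tmp:
    lookup = pathlib.Path(tmp) / "cities.txt"
    lookup.write_bytes("Málaga\nKöln".encode("latin-1"))

    # Reading with the matching encoding recovers the original words.
    words = lookup.read_text(encoding="latin-1").split("\n")

    # Decoding the same bytes as UTF-8 raises, because the Latin-1 byte
    # for "á" is not a valid UTF-8 sequence in this position.
    try:
        lookup.read_text(encoding="utf-8")
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
```

Passing the encoding the file was actually written with avoids both decode errors and silently mangled keywords.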

rasa_nlu_examples/extractors/flashtext_entity_extractor.py

Lines changed: 4 additions & 1 deletion
@@ -45,6 +45,7 @@ def get_default_config() -> Dict[Text, Any]:
             "non_word_boundaries": "",
             "path": None,
             "entity_name": None,
+            "encoding": None,
         }

     def __init__(
@@ -63,7 +64,9 @@ def __init__(
         )
         for non_word_boundary in config["non_word_boundaries"]:
             self.keyword_processor.add_non_word_boundary(non_word_boundary)
-        words = pathlib.Path(self.path).read_text().split("\n")
+        words = (
+            pathlib.Path(self.path).read_text(encoding=config["encoding"]).split("\n")
+        )
         if len(words) == 0:
             rasa.shared.utils.io.raise_warning(
                 f"No words found in the {pathlib.Path(self.path)} file."
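
The changed line can be exercised in isolation. The sketch below is not the extractor's code: `read_lookup_words` is a hypothetical helper that mirrors the new `read_text(encoding=...)` call. It suggests that the `None` default keeps the previous behaviour (Python falls back to its platform-dependent default encoding), and that an empty file still produces a one-element list after `split("\n")`.

```python
import pathlib
import tempfile
from typing import List, Optional


def read_lookup_words(path: str, encoding: Optional[str] = None) -> List[str]:
    """Hypothetical helper mirroring the changed line in the diff.

    encoding=None lets Python pick its platform-dependent default,
    which is what read_text() did before the parameter was added.
    """
    return pathlib.Path(path).read_text(encoding=encoding).split("\n")


with tempfile.TemporaryDirectory() as tmp:
    lookup = pathlib.Path(tmp) / "lookup.txt"

    # Explicit encoding on both write and read, so the result is portable.
    lookup.write_text("São Paulo\nZürich", encoding="utf-8")
    words = read_lookup_words(str(lookup), encoding="utf-8")

    # An empty file still yields one (empty) string after split("\n").
    lookup.write_text("", encoding="utf-8")
    empty_words = read_lookup_words(str(lookup), encoding="utf-8")
```

Under these assumptions, setting `encoding` explicitly in the component configuration makes the loaded keywords independent of the machine's locale.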
