Labels: enhancement (New feature or request)
Description
We would like to use these issues to gauge user interest.
The BERT tokenizer is intended as an identical reimplementation of the original BERT tokenization. However, it is possible to replace `bert.tokenizer.internal.BasicTokenizer` with a tokenizer built on `tokenizedDocument`.
We expect this to have little effect on the model. The WordPiece encoding stays the same, and the WordPiece sub-tokens it produces are what the model actually takes as input.
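To illustrate why swapping the basic tokenizer should matter little, here is a minimal Python sketch of greedy longest-match-first WordPiece applied to an already split list of words. It is not the MATLAB implementation in this repository. The functions `wordpiece` and `encode` and the toy vocabulary `VOCAB` are hypothetical and exist only for illustration. The WordPiece stage sees nothing but the word list, so two basic tokenizers yield identical sub-tokens wherever they agree on word boundaries. Output changes only where the boundaries differ, for example when trailing punctuation stays attached to a word.

```python
def wordpiece(word, vocab, unk="[UNK]", max_chars=100):
    """Greedy longest-match-first WordPiece split of one pre-tokenized word."""
    if len(word) > max_chars:
        return [unk]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until it is in the vocab.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the "##" prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word maps to the unknown token
        pieces.append(match)
        start = end
    return pieces


def encode(words, vocab):
    """Apply WordPiece to each word produced by any basic tokenizer, then flatten."""
    return [p for w in words for p in wordpiece(w, vocab)]


# Hypothetical toy vocabulary, for illustration only.
VOCAB = {"the", "token", "##izer", "##s", "is", "fast", "."}

# Basic tokenizer output that splits off the final period.
print(encode(["the", "tokenizers", "is", "fast", "."], VOCAB))
# A tokenizer that leaves "fast." as one word changes the result for that word only.
print(encode(["the", "tokenizers", "is", "fast."], VOCAB))
```

The first call prints `['the', 'token', '##izer', '##s', 'is', 'fast', '.']`. In the second call, "fast." becomes `[UNK]` because `##.` is not in the toy vocabulary. Any replacement for `BasicTokenizer` therefore needs to split punctuation the same way the original does.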
The advantages are that `tokenizedDocument` is considerably faster than `BasicTokenizer` and may integrate better with other Text Analytics Toolbox functionality.