
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech" that changes the model input from phoneme IDs to phonological features


DongJiashu/FastSpeech2_phonological_features

 
 


FastSpeech 2 - PyTorch Implementation

This repository is an extended PyTorch implementation of Microsoft's FastSpeech 2: Fast and High-Quality End-to-End Text to Speech. It was initially based on xcmyz's implementation, and its core code structure is derived from ming024's FastSpeech2 implementation.
We introduce several modifications that enable training and inference with phonological features instead of phoneme IDs, targeting cross-lingual and low-resource speech synthesis scenarios. Because phonemes from different languages are described with a shared set of features, this representation is intended to make training more linguistically informed and to generalize better across languages. Using this version, we trained a German baseline TTS model and then applied transfer learning with a small amount of English data to obtain an English model.

Our method is inspired by the concept of using cross-lingual phonological information as described in the paper:

"Cross-lingual Transfer of Phonological Features for Low-resource Speech Synthesis"
SSW11 Paper PDF

We also refer to the PHOIBLE database for phonological feature definitions and mappings.
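To make the idea of a phonological feature vector concrete, the following minimal sketch encodes phonemes as numeric vectors. It assumes PHOIBLE-style feature values (`+`, `-`, `0`) and a numeric encoding of 1, -1, and 0; the feature subset, the per-phoneme values, and the names `FEATURES`, `FEATURE_TABLE`, and `encode_phoneme` are hypothetical and chosen for illustration, not taken from this repository or copied from PHOIBLE.

```python
from typing import List

# Hypothetical subset of features; a real inventory is much larger.
FEATURES = ["syllabic", "consonantal", "sonorant", "continuant", "nasal", "voice"]

# Illustrative values only, one symbol per feature in FEATURES order.
FEATURE_TABLE = {
    "p": "- + - - - -",
    "b": "- + - - - +",
    "m": "- + + - + +",
    "s": "- + - + - -",
    "a": "+ - + + - +",
}

# Assumed numeric encoding of the three feature values.
VALUE_MAP = {"+": 1.0, "-": -1.0, "0": 0.0}


def encode_phoneme(phoneme: str) -> List[float]:
    """Return the feature vector for one phoneme."""
    symbols = FEATURE_TABLE[phoneme].split()
    if len(symbols) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values for {phoneme!r}")
    return [VALUE_MAP[s] for s in symbols]


def encode_sequence(phonemes: List[str]) -> List[List[float]]:
    """Return one feature vector per phoneme in the sequence."""
    return [encode_phoneme(p) for p in phonemes]


vectors = encode_sequence(["b", "a", "m"])
```

In this sketch `p` and `b` differ only in the `voice` dimension, which is the property that lets a model share information between related sounds, including sounds from different languages.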

The overall training and synthesis pipeline still follows the structure of ming024's FastSpeech2 implementation. However, we have made the following key modifications to support phonological feature-based modeling:

  • text/ folder: contains several modified files to support phonological feature data preparation.
  • transformer/models.py: updated to accept phonological feature vectors as model input instead of phoneme IDs.
  • synthesis.py: modified to support inference using phonological features as input.
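One common way to feed feature vectors to a Transformer encoder is to replace the phoneme-ID embedding lookup with a linear projection. The sketch below shows that pattern under stated assumptions: the class name `FeatureInput`, its arguments, and the padding handling are hypothetical and are not the code in transformer/models.py.

```python
import torch
import torch.nn as nn


class FeatureInput(nn.Module):
    """Hypothetical input layer: projects feature vectors to the encoder width."""

    def __init__(self, n_features: int, d_model: int):
        super().__init__()
        # Plays the role that nn.Embedding plays for phoneme IDs.
        self.proj = nn.Linear(n_features, d_model)

    def forward(self, feats: torch.Tensor, lengths: torch.Tensor):
        # feats: (batch, max_len, n_features); lengths: (batch,)
        max_len = feats.size(1)
        positions = torch.arange(max_len, device=feats.device)
        # True where a position lies beyond the sequence length (padding).
        pad_mask = positions.unsqueeze(0) >= lengths.unsqueeze(1)
        x = self.proj(feats)
        # Zero out padded positions so the projection bias does not leak into them.
        x = x.masked_fill(pad_mask.unsqueeze(-1), 0.0)
        return x, pad_mask


layer = FeatureInput(n_features=6, d_model=8)
feats = torch.zeros(2, 3, 6)        # two sequences, padded to length 3
lengths = torch.tensor([3, 2])      # the second sequence has one padded step
out, pad_mask = layer(feats, lengths)
```

Because the projection weights are shared across all phonemes, a phoneme that never appeared in training still receives an input built from features the model has already seen.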

References
