One TTS Alignment To Rule Them All

Badlani, Rohan; Łancucki, Adrian; Shih, Kevin J.; Valle, Rafael; Ping, Wei; Catanzaro, Bryan

Computer Science > Sound

arXiv:2108.10447 (cs)

[Submitted on 23 Aug 2021]

Title:One TTS Alignment To Rule Them All

Authors:Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro

View PDF

Abstract:Speech-to-text alignment is a critical component of neural textto-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive endto-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in RAD-TTS as a generic alignment learning framework, easily applicable to a variety of neural TTS models. The framework combines forward-sum algorithm, the Viterbi algorithm, and a simple and efficient static prior. In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed of existing attention-based mechanisms, simplifies the training pipeline, and makes the models more robust to errors on long utterances. Most importantly, the framework improves the perceived speech synthesis quality, as judged by human evaluators.

Subjects:	Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2108.10447 [cs.SD]
	(or arXiv:2108.10447v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2108.10447

Submission history

From: Rohan Badlani [view email]
[v1] Mon, 23 Aug 2021 23:45:48 UTC (1,806 KB)

Computer Science > Sound

Title:One TTS Alignment To Rule Them All

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Sound

Title:One TTS Alignment To Rule Them All

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators