Stable-TTS

Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting under Limited Target Samples



Main Experiments (Fine-tuning)

All results were generated by models fine-tuned on the target speaker's samples.

─ VCTK Samples ─

Speaker | Reference | Text | Ground Truth | Stable-TTS (Ours) w/ Prior Prompt | Stable-TTS (Ours) w/ Self Prompt | Stable-TTS w/o PLM | UnitSpeech | Grad-StyleSpeech

p227

It would create a Scottish secretary with a lot of weight.

p236

Halifax has also been mentioned as a likely predator.

p261

The Greeks used to imagine that it was a sign from the gods to foretell war or heavy rain.

p264

Therefore, this type of aircraft is completely safe.

─ LibriTTS Samples ─

Speaker | Reference | Text | Ground Truth | Stable-TTS (Ours) w/ Prior Prompt | Stable-TTS (Ours) w/ Self Prompt | Stable-TTS w/o PLM | UnitSpeech | Grad-StyleSpeech

237

It gives itself ungrudgingly to the moods of the season, holding nothing back.

1089

He recognized their speech collectively before he distinguished their faces.

5105

Her sea going qualities were excellent, and would have amply sufficed for a circumnavigation of the globe.

─ VoxCeleb Samples ─

Speaker | Reference | Text | Stable-TTS (Ours) w/ Prior Prompt | Stable-TTS (Ours) w/ Self Prompt | Stable-TTS w/o PLM | UnitSpeech | Grad-StyleSpeech

0hQYDwSjLyU

These take the shape of a long round arch, with its path high above, and its two ends apparently beyond the horizon.

E-vRMuA-ak

However, he held out little hope that the Government would take action.

7RA36uU4WM

It won't be much, but I'm grateful to find a friend. I'm guilty, you know, and there's no one to blame but myself.