Amazon has created the largest text-to-speech model to date

By: Bohdan Kaminskyi | 20.02.2024, 18:46

Christian Wiediger/Unsplash

Amazon's artificial intelligence research group has announced the largest text-to-speech model to date. Here, "largest" refers to both the number of parameters and the amount of training data.

Here's What We Know

The model, called BASE TTS, has 980 million parameters. It was trained on 100,000 hours of speech recordings from public sources, mostly in English.

The system was also shown examples of spoken phrases in other languages so that it could correctly pronounce common expressions.

During tests on smaller datasets, the Amazon team observed a "jump" in speech synthesis quality once the model reached 150 million parameters. At that point, the model also began to demonstrate a number of new language capabilities.

Experts note that BASE TTS will not be released publicly, to avoid unethical use. Instead, it will serve as a training base for improving existing solutions in this area.

Source: TechXplore