Vall-E: Microsoft's new AI model mimics any human voice from just a 3-second sample

By: Elena Shcherban | 11.01.2023, 00:47

Microsoft has introduced a new artificial intelligence model called Vall-E. It builds on EnCodec, a neural audio codec that Meta announced in October 2022.

Details

Microsoft calls Vall-E a "neural codec language model." The model can mimic any human voice after hearing just 3 seconds of the original speaker. It breaks the sample down into components and synthesizes how that voice would sound saying other phrases, which lets it accurately reproduce the speaker's timbre and emotional tone.

To train Vall-E, Microsoft used 60,000 hours of speech recorded by more than 7,000 real people, drawn mostly from audiobooks in the LibriVox library.

Samples of voices generated by Vall-E can be heard on GitHub.

Microsoft says that Vall-E could be used as a text-to-speech tool, as a way to edit speech, and, when connected to other generative AI models, as an audio creation system.

Source: Vall-E