Vall-E, Microsoft's new AI model that mimics any human voice from just a 3-second sample
Microsoft introduced a new artificial intelligence model called Vall-E. It is based on EnCodec technology, which Meta announced in October 2022.
Details
Microsoft calls Vall-E a "neural codec language model." The model can mimic any human voice after hearing only a 3-second sample of the original speaker. It breaks the audio down into discrete components and uses them to synthesize how that voice would sound speaking other phrases, which allows it to accurately reproduce the speaker's timbre and emotional tone.
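The idea of breaking audio into discrete components can be illustrated with a toy example. The sketch below is not Microsoft's or Meta's code: it is a minimal, hand-written residual quantizer, the general kind of technique that neural codecs such as EnCodec build on, applied to single sample values. The codebook values and the function names (encode_sample, decode_sample, encode, decode) are hypothetical and chosen only for illustration. A real codec learns its codebooks and works on vectors produced by a neural network.

```python
from typing import List, Sequence

# Hypothetical toy codebooks (illustrative values, not learned ones).
# Stage 1 gives a coarse approximation; stage 2 refines what stage 1 missed.
CODEBOOKS: List[List[float]] = [
    [-0.75, -0.25, 0.25, 0.75],
    [-0.1875, -0.0625, 0.0625, 0.1875],
]


def encode_sample(x: float, codebooks: Sequence[Sequence[float]]) -> List[int]:
    """Turn one sample into a list of discrete codes, one per codebook."""
    codes: List[int] = []
    residual = x
    for book in codebooks:
        # Pick the codeword closest to what is still unexplained.
        idx = min(range(len(book)), key=lambda i: abs(residual - book[i]))
        codes.append(idx)
        residual -= book[idx]
    return codes


def decode_sample(codes: Sequence[int], codebooks: Sequence[Sequence[float]]) -> float:
    """Rebuild an approximate sample by summing the chosen codewords."""
    return sum(book[i] for i, book in zip(codes, codebooks))


def encode(signal: Sequence[float]) -> List[List[int]]:
    return [encode_sample(x, CODEBOOKS) for x in signal]


def decode(tokens: Sequence[Sequence[int]]) -> List[float]:
    return [decode_sample(c, CODEBOOKS) for c in tokens]


if __name__ == "__main__":
    signal = [0.3, -0.6, 0.9, -0.1]
    tokens = encode(signal)
    print("tokens:", tokens)
    print("reconstruction:", decode(tokens))
```

Once audio is expressed as short lists of integers like these, a language model can in principle predict such tokens the same way it predicts words, and that is the sense in which the term "neural codec language model" can be read.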
To train Vall-E, Microsoft used 60,000 hours of speech recorded by more than 7,000 real speakers. Most of it comes from audiobooks in the LibriVox library.
Examples of voices simulated by Vall-E can be heard on GitHub.
Microsoft says that Vall-E could be used as a text-to-speech tool, as a way to edit recorded speech, and, when connected to other generative AI models, as an audio creation system.
Source: Vall-E