A new Google speech synthesizer talks like a real person

By: Eugene Sherban | 04.01.2018, 10:02

Google has introduced Tacotron 2, the second generation of its speech synthesizer. Through skillful use of AI, the company's engineers managed to synthesize a human voice that is hard to distinguish from the original.

What Tacotron 2 can do

Thanks to AI, the speech synthesizer has learned to speak like a living person rather than a Dalek from "Doctor Who." It pauses after commas and periods, emphasizes the beginning of a sentence, observes word stress and keeps a natural pace. A comparison with a real human voice is available via the link. Try to guess which is which without looking. Unusual words still sometimes trip it up.

How it works

Google split speech synthesis into two stages and assigned them to two neural networks. The first network turns the text into an audio spectrogram, a kind of screenshot of an equalizer showing a clear sequence of sound frequencies. The second, a WaveNet neural network, interprets this spectrogram and turns it into speech. Because of this, the synthesizer needs an internet connection to talk.
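The two-stage split can be pictured with a toy sketch in Python. It is not Google's code, and neither function is a neural network. The names text_to_spectrogram, spectrogram_to_waveform and synthesize, along with the sample rate, frame size and number of frequency bins, are all assumptions made for illustration. The sketch only shows the hand-off: stage one produces a sequence of frequency frames, and stage two turns those frames into audio samples.

```python
import math

# All constants are assumed toy values, not taken from Tacotron 2.
SAMPLE_RATE = 8000     # audio samples per second
FRAME_SAMPLES = 80     # audio samples covered by one spectrogram frame
N_BINS = 8             # frequency bins per frame
BASE_FREQ_HZ = 200.0   # bin b is rendered as a tone at BASE_FREQ_HZ * (b + 1)


def text_to_spectrogram(text):
    """Stage 1 stand-in: map each character to one frame of bin energies.

    A real system uses a trained network here; this toy version lights up
    a single bin per character and leaves whitespace silent (a pause).
    """
    frames = []
    for ch in text:
        frame = [0.0] * N_BINS
        if not ch.isspace():
            frame[ord(ch) % N_BINS] = 1.0
        frames.append(frame)
    return frames


def spectrogram_to_waveform(frames):
    """Stage 2 stand-in: turn frames into audio samples.

    A real system uses a neural vocoder such as WaveNet; this toy version
    just sums one sine wave per active bin.
    """
    samples = []
    for i, frame in enumerate(frames):
        for n in range(FRAME_SAMPLES):
            t = (i * FRAME_SAMPLES + n) / SAMPLE_RATE
            value = sum(
                amp * math.sin(2 * math.pi * BASE_FREQ_HZ * (b + 1) * t)
                for b, amp in enumerate(frame)
            )
            samples.append(value)
    return samples


def synthesize(text):
    """Run both stages in order: text -> spectrogram -> waveform."""
    return spectrogram_to_waveform(text_to_spectrogram(text))
```

Calling synthesize("hi there") returns a plain list of floating-point samples, with a stretch of silence where the space is. The point of the design is that the two stages only share the spectrogram, so either one can be swapped out without touching the other.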

What's next

Google has not yet disclosed its plans for deploying Tacotron 2. However, you don't need to be a genius to assume that, if everything works, it will soon appear in Google products such as the voice assistant, the translator or Google Maps.