OpenAI announces new technology for cloning a voice from a 15-second audio sample

By: Nastya Bobkova | 30.03.2024, 03:47

OpenAI has introduced Voice Engine, a new tool that can clone a person's voice from a 15-second audio sample.

Here's What We Know

Voice Engine analyses a short audio sample and generates natural-sounding speech with "emotional and realistic voices". The technology, which is based on OpenAI's existing speech synthesis API, could be useful for a variety of purposes: audiobooks, language translation, and helping people with speech disorders.

OpenAI acknowledges that the technology carries serious risks, including misuse by bad actors. The company says it is working to ensure privacy and security and is implementing safeguards such as watermarking and proactive monitoring of how the system is used.

According to the announcement, Voice Engine remains at the preview stage, but the company has already run pilot programmes that demonstrate its potential. One such pilot took place at Brown University, where the tool was used to help patients with speech impairments.

According to OpenAI, Voice Engine will be rolled out while the company collects feedback from partners, under a policy that prohibits cloning a person's voice without their consent. The company also plans to create a "list of prohibited voices" to prevent abuse.

How Much Does It Cost?

The estimated cost of using Voice Engine is about $15 per million characters, which corresponds to roughly 162,500 words.
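
To make the pricing figures above easier to apply, here is a minimal Python sketch that estimates the cost of a given amount of text. It assumes the quoted rate scales linearly and that the article's 162,500-word figure implies an average of about 6.15 characters per word. The function names are hypothetical and are not part of any OpenAI API.

```python
# Figures quoted in the article; treated here as estimates, not official pricing.
PRICE_PER_MILLION_CHARS_USD = 15.0
WORDS_PER_MILLION_CHARS = 162_500

# Average characters per word implied by the two figures above (about 6.15).
CHARS_PER_WORD = 1_000_000 / WORDS_PER_MILLION_CHARS


def cost_for_characters(num_chars: float) -> float:
    """Estimated cost in USD for a given number of characters (assumes linear pricing)."""
    return num_chars / 1_000_000 * PRICE_PER_MILLION_CHARS_USD


def cost_for_words(num_words: float) -> float:
    """Estimated cost in USD for a given number of words, using the implied average word length."""
    return cost_for_characters(num_words * CHARS_PER_WORD)


if __name__ == "__main__":
    print(f"1,000,000 characters: ${cost_for_characters(1_000_000):.2f}")
    print(f"80,000 words: ${cost_for_words(80_000):.2f}")
```

Under these assumptions, an 80,000-word audiobook script would cost about $7.38 to synthesise.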

Source: Engadget