Microsoft unveiled deepfake and voice cloning tools

By: Bohdan Kaminskyi | 16.11.2023, 16:35

Microsoft

At the Ignite conference, Microsoft announced a service for creating photorealistic avatars of people whose lips are animated to match a given text. It also showed a tool for cloning a voice from an audio sample.

Here's What We Know

The new Azure AI Speech text to speech avatar service allows you to upload a photo of a person and compose a script. The service then generates a video of the avatar speaking that script.

The digital doppelgangers can speak several languages. They can also use artificial intelligence models such as OpenAI's GPT-3.5 to answer customer questions that fall outside the script.

Another feature, Personal voice, can recreate a user's voice in seconds. It requires a one-minute audio recording.

The company suggests using Personal voice to create personalised voice assistants, dub content into different languages and produce custom narration for stories, audiobooks and podcasts.

According to Microsoft, both tools will be available to a limited number of users and only for certain scenarios. In addition, customers must give explicit consent for their voice and image to be used.

This is intended to limit the potential misuse of the technology to create deepfakes without people's knowledge. Microsoft says it is taking a responsible approach to AI ethics.

Source: Microsoft, Microsoft