NVIDIA has created a "Swiss Army knife for sound": a new AI model can create and edit audio from text prompts

By: Vlad Cherevko | 26.11.2024, 11:41

Nvidia has announced Fugatto, an AI-powered audio generator that can create and edit music, speech and sounds based on text prompts.

Here's What We Know

Fugatto (short for Foundational Generative Audio Transformer Opus), described as a "Swiss Army knife for sound", can generate new audio and modify existing music, voice and sound files based on text commands. The model was developed by an international team of researchers, which, according to Nvidia, strengthened its multilingual and multi-accent capabilities.

The tool can modify a voice by adding an accent or changing its tone, and can edit music by isolating vocals, adding instruments or replacing melodies. Nvidia says Fugatto was trained on millions of audio samples and can handle a wide range of tasks without additional training data.

However, the company has not said when, or whether, the tool will be made available to the public. Fugatto stands out from AI audio tools made by companies such as Stability AI and OpenAI thanks to its ability to create entirely new sounds.

Source: NVIDIA