Google has unveiled VLOGGER, an artificial intelligence that can bring still photos to life

By: Bohdan Kaminskyi | 19.03.2024, 20:13


Google researchers have developed a new artificial intelligence system called VLOGGER that can generate realistic videos of people moving and talking from just one photo.

Here's What We Know

VLOGGER takes a photo of a person and an audio track as input, then synthesises a video of that person speaking the words in the audio, with matching facial expressions, gestures and head movements. While the generated videos are not perfect, they show significant progress in bringing static images to life.

To train the model, the developers assembled a large dataset called MENTOR, which contains more than 800,000 people and 2,200 hours of video. Training on this data taught VLOGGER to generate videos of people of different ages and ethnic backgrounds in a variety of environments.

The technology has many potential applications, including automatic video dubbing, video editing, filling in missing footage, and creating full videos from a single photo. It could be useful in the entertainment industry, virtual reality and training programmes, and for creating AI-powered virtual assistants.

However, there is a risk that VLOGGER could be used to create deepfakes: synthetic media in which a real person's likeness is convincingly faked. As such videos become more sophisticated, they could exacerbate the problems of misinformation and fakery on the internet.

The developers acknowledge that VLOGGER has limitations. The generated videos are relatively short and have static backgrounds, and the people in them do not move through a 3D environment. Nevertheless, the researchers call the model a milestone in AI research.

Source: VentureBeat