Google's Gemini Omni Flash turns any input into video — with a 10-second leash

By: Anton Kratiuk | yesterday, 22:19
Gemini Omni Flash generates video from text, images, audio, and existing clips, all in a single multimodal model. Source: Google

Google unveiled Gemini Omni Flash at I/O 2026 on May 19, giving Plus, Pro, and Ultra subscribers access to a new multimodal model that generates video from any combination of text, images, audio, and existing clips. The rollout is global and immediate for paid tiers, with free access arriving on YouTube Shorts and the YouTube Create app this week. API access is expected in the coming weeks.

What Omni Flash actually does

The core pitch is "anything in, video out." Omni Flash combines Gemini's text reasoning with Google's video and simulation research — including the Veo generator and the Genie game engine — into a single model. Feed it a handful of photos, a voice memo, and a text prompt at the same time, and it generates a single coherent clip.

The standout feature is conversational editing. Instead of re-prompting from scratch, you can tell the model things like "shift the camera angle" or "move the scene to a beach" and it reworks the video while keeping characters and object physics intact. Digital avatar creation is also live today: the model can synthesize a visual and vocal likeness that speaks on your behalf, no re-recording needed.

One feature that isn't live yet: voice and speech editing. Google says it is still testing that capability separately before releasing it — a deliberate safety call, not a technical gap.

Every clip Omni Flash produces carries a SynthID watermark, Google's invisible digital signature now adopted by OpenAI, Nvidia, ElevenLabs, and Kakao. The standard is designed to make AI-generated content verifiable and harder to weaponize as deepfakes.

The catches worth knowing

The 10-second clip limit is a deployment choice, not a model ceiling. Nicole Brichtova of Google DeepMind confirmed the cap exists to manage compute demand at scale — a higher-end Pro variant is planned for later. That trade-off matters because independent testers find that ByteDance's Seedance 2.0 and Alibaba's Wan 2.7 still outperform Omni Flash on raw generation quality benchmarks, even if Omni's conversational editing holds its own.

The competitive context: OpenAI pulled Sora back to API-only in April 2026, ceding the consumer video space. Google is moving into that gap with free YouTube access, a price cut on Ultra from $250 to $200 a month, and a model that prioritizes reach over peak quality — at least for now.

For most users, the entry point is the Gemini app or Google Flow, Google's AI filmmaking platform. If you're already a paid subscriber, Omni Flash should be available to you today, according to Google's rollout blog post.