Google just unveiled Gemini Omni! It creates breathtaking videos

  • At the I/O 2026 conference, Google introduced Gemini Omni, a new generative model currently focused on video creation
  • The model accepts any combination of inputs (text, image, video, and audio track) and turns them into a single clip
  • The Omni Flash variant is available today to subscribers on Google AI Plus and higher tiers, and for free on YouTube Shorts

Jakub Kárník
20. 5. 2026 04:30

Last September, Google showcased the Nano Banana model, which quickly became one of the most widely used AI-powered photo editing tools. People used it to restore old family photos, convert sketches into photorealistic images, or visualize design ideas. This year, Google is taking that principle a step further with Gemini Omni, a model that applies the same logic to video.

Omni is a new family of generative models that Google introduced at the Google I/O 2026 conference. Its first member, Gemini Omni Flash, is rolling out starting today to users of the Gemini app and the Google Flow platform, as well as to creators on YouTube Shorts. For now, Google has only mentioned a second, more powerful model called Omni Pro, and it plans to release details gradually.

Video editing by conversation, not sliders

The main novelty of Omni is not video generation itself, which competing models such as OpenAI Sora, Runway, or Meta Movie Gen have offered for some time. Google is instead emphasizing conversational editing. The user supplies a video clip and describes in plain language what should happen to it: change the environment, add a character, adjust the camera movement. Each subsequent instruction builds on the previous one: the scene retains its context, and characters remain visually consistent.

In the published examples, the command “make that statue out of bubbles” transforms a marble sculpture into a floating structure of soap bubbles, with no manual masking required. In another example, the touch of a hand turns a mirror into a moving liquid and the character’s arm into a reflective material. Work like this was previously handled by studio specialists, often with budgets in the thousands of dollars for a single shot.

Physics, knowledge, and visual explainers

During the Omni presentation, Google repeatedly emphasized that the model not only builds visually convincing scenes but also understands how they should behave. According to the company, its handling of gravity, kinetic energy, and fluid dynamics has improved. In one example, a ball rolls along a winding path in the style of a chain reaction, and the movement and sound of each bounce match real-world behavior.

The second pillar is the connection with what Gemini “knows” about the world. Omni can produce short visual explainers of complex topics. One of the presented examples is a claymation-style stop-motion animation that explains the process of protein folding. A short text prompt can thus yield content that would take days to produce by hand.

This approach follows from Google’s long-term effort to build a so-called world model, one that understands the world as a coherent whole rather than as a sequence of random pixels. The company applies the same philosophy to its experimental model Genie, which generates interactive game environments. Genie, however, remains available only to subscribers of the highest tier, AI Ultra.

Personal avatar and caution with voice

Omni can also insert a user’s digital twin into videos. The Avatar feature creates a digital version of a person from provided samples, and that version speaks with the person’s voice in the resulting videos. OpenAI took a similar path last year with its standalone Sora application, which has since been canceled.

However, Google has deliberately limited audio manipulation for now. Editing spoken words in a video, meaning rewriting what someone says, is technically possible, but Google has not included it in the first version. The stated reason is that it first needs to establish rules to prevent misuse, typically deepfakes. Audio currently works as an input reference only in the form of voice samples; further audio inputs are expected in the coming months.

All videos created by the Omni model also carry SynthID, an invisible digital watermark. Its presence can be verified in the Gemini app, in Gemini integrated into the Chrome browser, and via Google Search. The goal is to let anyone quickly recognize whether content is AI-generated.

Where and when to try Omni

Gemini Omni Flash is available starting today to all Google AI Plus, Pro, and Ultra subscribers, both in the Gemini app and on the Google Flow platform. For users of YouTube Shorts and the YouTube Create app, the model is free, with a gradual rollout beginning this week. Google will open API access to developers and enterprise customers within weeks.

Will you try Gemini Omni to create your own videos?

Source: Google Blog
