Google releases new AI video model Veo 3.1 in Flow and API: what it means for enterprises

1 5 minutes read


As expected after days of online leaks and rumours, Google has unveiled Veo 3.1, its latest AI video generation model, which offers a host of creative and technical upgrades aimed at improving narrative control, audio integration and realism in AI-generated video.

While the updates expand the possibilities for hobbyists and content creators with Google’s online AI creation app, Flow, the release also signals a growing opportunity for enterprises, developers and creative teams looking for scalable and customizable video tools.

The quality is higher, the physics are better, the price is the same as before, and the control and editing features are more powerful and diverse.

My initial tests showed that it is a powerful, high-performance model that immediately pleases with every generation. However, its default look is more cinematic, polished and a little more "industrial" than that of competitors like OpenAI’s new Sora 2, released late last month, which may or may not be what a particular user is after (Sora excels at mobile and… "sincere"-style videos).

Expanded control over narrative and audio

Veo 3.1 builds on its predecessor, Veo 3 (released in May 2025), with improved support for dialogue, surround sound, and other sound effects.

Native audio generation is now available across several key features in Flow, including "Frames to Video," "Ingredients to Video" and "Extend," which respectively give users the ability to: turn still images into video; use elements, characters and objects from multiple images in a single video; and create clips longer than the initial 8 seconds, running to more than 30 seconds or even a minute or more when continuing from the final frame of a previous clip.

Previously, you had to add audio manually after using these features.

This addition gives users greater control over tone, emotion, and storytelling, capabilities that previously required post-production work.

In enterprise contexts, this level of control may reduce the need for separate audio pipelines, providing an integrated way to create training content, marketing videos, or digital experiences with synchronized audio and visuals.

Google noted in a blog post that the updates reflect user feedback calling for deeper technical control and improved audio support. Gallegos stresses the importance of making adjustments and improvements directly in Flow, without reworking scenes from scratch.

Richer input and editing capabilities

With Veo 3.1, Google is introducing support for multiple input types and more precise control over the generated output. The model accepts text prompts, images and videos as input, and also supports:

Reference images (up to three): guide the look and feel of the final output
First and last frame interpolation: creates seamless scenes between fixed endpoints
Scene extension: continues the action or movement of a clip beyond its current duration

These tools aim to give enterprise users a way to fine-tune the look and feel of their content, which is useful for achieving brand consistency or adhering to creative briefs.
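The input options listed above can be captured in a small pre-flight check before a request is sent. The following is a minimal sketch, not Google's request schema: the `VideoRequest` class and its field names are hypothetical, and the rule that a last frame needs a matching first frame is an assumption drawn from how interpolation between two endpoints is described. Only the limit of three reference images comes from the article.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_REFERENCE_IMAGES = 3  # "up to three", per the article


@dataclass
class VideoRequest:
    """Hypothetical container for one generation request (not Google's schema)."""

    prompt: str
    reference_images: List[str] = field(default_factory=list)  # file paths or URLs
    first_frame: Optional[str] = None
    last_frame: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the request breaks a known or assumed limit."""
        if len(self.reference_images) > MAX_REFERENCE_IMAGES:
            raise ValueError(
                f"at most {MAX_REFERENCE_IMAGES} reference images are allowed"
            )
        # Assumption: interpolation needs both endpoints, so a last frame
        # without a first frame is rejected here.
        if self.last_frame is not None and self.first_frame is None:
            raise ValueError("last_frame requires first_frame")
```

Checking these limits locally is a design choice for teams that batch many requests: a rejected request is caught before any network call is made.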

Additional capabilities such as “Insert” (adding objects to scenes) and “Remove” (deleting objects or characters) are also offered, although not all of them are immediately available through the Gemini API.

Publishing across platforms

Veo 3.1 can be accessed through several Google AI services:

Flow: Google’s own interface for AI-assisted filmmaking
Gemini API: aimed at developers who build video capabilities into apps
Vertex AI: where enterprise integrations will soon support Veo’s “Scene Extension” and other key features

Availability through these platforms allows enterprise customers to choose the right environment – GUI-based or automated – based on their teams and workflows.

Pricing and access

The Veo 3.1 model is currently in preview and is available only on the paid tier of the Gemini API. The cost structure is the same as for Veo 3, Google’s previous-generation AI video model.

Standard model: $0.40 per second of video
Fast model: $0.15 per second of video

There is no free tier, and users are charged only if the video is generated successfully. This pricing model is consistent with previous Veo releases and gives budget-conscious enterprise teams predictable costs.
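For budgeting, the per-second rates above translate directly into a per-clip estimate. The sketch below is illustrative only: the function name and the tier labels `"standard"` and `"fast"` are hypothetical, and the two prices are the ones quoted in this article, which may change while the model is in preview.

```python
from decimal import Decimal

# Per-second prices in USD, as quoted in the article.
PRICE_PER_SECOND = {
    "standard": Decimal("0.40"),
    "fast": Decimal("0.15"),
}


def estimate_cost(seconds: int, tier: str = "standard", succeeded: bool = True) -> Decimal:
    """Estimate the charge for one clip of the given length."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    if tier not in PRICE_PER_SECOND:
        raise ValueError(f"unknown tier: {tier!r}")
    # The article states that failed generations are not billed.
    if not succeeded:
        return Decimal("0.00")
    return PRICE_PER_SECOND[tier] * seconds
```

At these rates an 8-second clip comes to $3.20 at the higher price and $1.20 at the lower one. `Decimal` is used rather than floats so that totals summed over many clips stay exact to the cent.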

Technical specifications and output control

Veo 3.1 outputs video at 720p or 1080p resolution, with a frame rate of 24 fps.

Duration options include 4, 6 or 8 seconds from a text prompt or uploaded images, with the ability to extend clips to as long as 148 seconds (just under two and a half minutes) when using the “Extend” feature.
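The output limits described above are easy to encode as constants for a pipeline that needs to reject impossible requests early. This is a sketch under stated assumptions: the function names are hypothetical, and the numbers (720p or 1080p, 24 fps, base clips of 4, 6 or 8 seconds, 148 seconds after extension) are taken from this article rather than from any API response.

```python
from typing import Optional

FPS = 24                        # frame rate quoted in the article
RESOLUTIONS = {"720p", "1080p"}
BASE_DURATIONS = {4, 6, 8}      # seconds, for a fresh generation
MAX_EXTENDED_SECONDS = 148      # upper bound after extension


def frame_count(seconds: int) -> int:
    """Number of frames in a clip of the given length at 24 fps."""
    return seconds * FPS


def check_clip(resolution: str, base_seconds: int,
               total_seconds: Optional[int] = None) -> int:
    """Validate a planned clip and return its total frame count."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if base_seconds not in BASE_DURATIONS:
        raise ValueError("base duration must be 4, 6 or 8 seconds")
    total = base_seconds if total_seconds is None else total_seconds
    if not base_seconds <= total <= MAX_EXTENDED_SECONDS:
        raise ValueError("total length must lie between the base length and 148 seconds")
    return frame_count(total)
```

An 8-second clip contains 192 frames, and a clip extended to the full 148 seconds contains 3,552.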

New functionality also includes tighter control over themes and environments. For example, organizations can upload a product image or visual reference, and Veo 3.1 will create scenes that maintain their look and stylistic cues throughout the video. This can streamline creative production lines for retail, advertising and virtual content production teams.

Initial reactions

The broader creator and developer community has responded to the launch of Veo 3.1 with a mixture of optimism and mild criticism – especially when compared to competing models like OpenAI’s Sora 2.

Matt Shumer, founder of Otherside AI/Hyperwrite and an early user, described his initial reaction as “disappointment,” noting that Veo 3.1 is “significantly worse than the Sora 2” and also “a little bit more expensive.”

However, he acknowledged that Google’s tools, such as reference support and scene extension, are a bright spot in the release.

Travis Davids, a 3D digital artist and AI-powered content creator, echoed some of that sentiment. While he noted improvements in audio quality, especially in sound effects and dialogue, he raised concerns about limitations that still exist in the system.

These include a lack of custom audio support, the inability to directly choose generated sounds, and the continued cap of 8 seconds per generation, despite some general claims about longer outputs.

Davids also noted that character consistency across changing camera angles still requires careful prompting, whereas other models like Sora 2 handle this more automatically. He questioned the lack of 1080p resolution for users on paid tiers like Flow Pro and expressed doubts about feature parity.

On the more positive side, @kimmonismus, an AI newsletter writer, stated that “Veo 3.1 is great,” though he concluded that OpenAI’s latest model is still the overall favorite.

Collectively, these early impressions suggest that while Veo 3.1 offers useful tool improvements and new creative control features, expectations have shifted as competitors raise the bar on quality and ease of use.

Adoption and scale

Since launching Flow five months ago, Google says more than 275 million videos have been created across its various Veo models.

The pace of adoption indicates significant interest not only from individuals but also from developers and companies experimenting with automated content creation.

Thomas Iljic, director of product management at Google Labs, highlights that Veo 3.1 brings the capabilities closer to how human filmmakers plan and shoot. These include scene composition, continuity across shots, and orchestrated audio – all of which are areas organizations are increasingly looking to automate or simplify.

Safety and responsible use of artificial intelligence

Videos created with Veo 3.1 are watermarked with Google’s SynthID technology, which embeds an imperceptible identifier to indicate that the content was generated by artificial intelligence.

Google implements security and moderation filters across its APIs to help reduce privacy and copyright risks. The content created is cached and deleted after two days unless downloaded.

For developers and enterprises, these features provide reassurance around provenance and compliance – critical in regulated or brand-sensitive industries.
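Because generated content is deleted after two days unless it is downloaded, an automated pipeline may want to track that window explicitly. The sketch below is an assumption-laden illustration: the function names and the six-hour safety margin are hypothetical, and it assumes the two-day window starts at creation time, which the article does not specify.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=2)  # "deleted after two days", per the article


def deletion_deadline(created_at: datetime) -> datetime:
    """Assumed deletion time: two days after the video was created."""
    return created_at + RETENTION


def should_download(created_at: datetime, now: datetime,
                    safety_margin: timedelta = timedelta(hours=6)) -> bool:
    """True once the clip is within the safety margin of its assumed deadline."""
    return now >= deletion_deadline(created_at) - safety_margin


# Example usage with an arbitrary timestamp.
created = datetime(2025, 10, 15, 18, 50, tzinfo=timezone.utc)
deadline = deletion_deadline(created)  # 2025-10-17 18:50 UTC
```

Using timezone-aware timestamps throughout avoids off-by-hours errors when the scheduling job and the generation job run in different regions.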

Where Veo 3.1 stands in a crowded field of AI video models

Veo 3.1 isn’t just an iteration on previous models; it represents a deeper integration of multimedia inputs, storytelling control, and enterprise-level tools. While creative professionals may see immediate benefits in editing workflows and fidelity, companies exploring automation in training, advertising, or virtual experiences may find greater value in the model’s configurability and API support.

Early user feedback highlights that while Veo 3.1 provides valuable tools, expectations around realism, voice control, and generation length are evolving rapidly. As Google expands its reach with Vertex AI and continues to improve Veo, its competitive position in enterprise video creation will hinge on how quickly it addresses these user pain points.


2025-10-15 18:50:00
