Google Rolls Out Gemini 2.5 for Web and Mobile Control

Google DeepMind has made public a new AI model, Gemini 2.5 Computer Use, capable of interacting with web and mobile user interfaces by mimicking human actions such as clicking, typing and scrolling. The model is now available in preview through the Gemini API, via Google AI Studio and Vertex AI, allowing developers to build agents that can directly operate interfaces otherwise inaccessible via backend APIs.

The model rests on the visual reasoning and comprehension capacity built into Gemini 2.5 Pro, extended with a specialized “computer_use” tool. Developers feed it a prompt, a screenshot of the interface state, and the history of previous actions; the model returns discrete UI actions, which are executed by client-side code, triggering a new visual state loop. The process continues until the task is completed, an error arises or a safety halts execution. Google says this architecture delivers lower latency in web and mobile benchmarks compared with alternatives.

Gemini 2.5 Computer Use supports actions such as navigation to URLs, domain-level clicking, drag-and-drop, dropdown manipulation, scrolls and text entry. It can also conditionally request user confirmation for risky actions such as purchases or system changes. Google emphasises multi-layered safety guardrails: a per-step safety service checks each action before execution, and developers can impose system instructions limiting or disabling certain actions.

Internal Google teams have already deployed this model in tools like Project Mariner, the Firebase Testing Agent, and AI Mode within Search. Early testers from outside the company report gains in speed and reliability. One automation platform noted performance improvements up to 18 percent in complex tasks; another described it as often operating 50 percent faster than competing systems in interface interactions.

The competitive pressure on agentic AI architectures is significant. Anthropic introduced a Computer Use capability for its Claude model last year, and OpenAI’s ChatGPT Agent now runs with virtual computer-level control including code execution. Google’s approach limits control to the browser/mobile layer rather than full OS access, narrowing attack surface but potentially restricting versatility.

Benchmark results, partly self-reported and validated via the Browserbase evaluation suite, show Gemini 2.5 Computer Use outpaces rivals on metrics such as Online-Mind2Web and WebVoyager. In tests, it achieved notably higher success rates than Claude and OpenAI agents at equivalent latency. The model does not yet support direct file system operations or desktop OS control.

Complementary advancements in open research also signal rising competition. A new technical report describes UI-Venus, an open-source UI agent developed using reinforcement tuning on a multimodal model backbone; it achieves state-of-the-art grounding and navigation success without requiring massive training datasets, underscoring that UI agent research is accelerating.

Yet challenges remain. Real-world digital environments can present dynamic layouts, CAPTCHAs, session timeouts and unpredictable UX changes, which can break agent loops. An evaluation from Carnegie Mellon earlier this year found that even top-tier AI agents struggle with robust business automation tasks in messy real-world settings. Some industry observers caution that deployment viability in complex workflows still faces hurdles in error tracking, fallback logic and interpretability.

Follow Arabian Post

Select Arabian Post as your preferred source on Google and MSN News for trusted business news and Arab politics and updates.

Arabian Post on MSN News Arabia business and politics

Follow Arabian Post on Google News business coverage

Arabian Post Telegram channel Dubai news updates

Arabian Post Medium articles business insights

Notice an issue?

Arabian Post strives to deliver the most accurate and reliable information to its readers. If you believe you have identified an error or inconsistency in this article, please don't hesitate to contact our editorial team at editor[at]thearabianpost[dot]com. We are committed to promptly addressing any concerns and ensuring the highest level of journalistic integrity.

Search

Google Rolls Out Gemini 2.5 for Web and Mobile Control

Follow Arabian Post

Notice an issue?

CleanCore’s DOGE Hoard Reaches 710 Million Tokens

Trade247 Introduces Primetime Trading Technology in Dubai

UK sets overnight social media curfew for teens

Dubai weighs turning organic waste into aviation fuel

Fynd brings AI fashion platform to Gulf

Dubai-Botswana pact opens new commodity trade corridor

Dealing.com claims record for tokenised stock access

Gemcorp closes first Saudi Shariah financing deal

More from Hyphen Digital Network

Contact

Advertising Enquiries

Syndication & PR Distribution

Google Rolls Out Gemini 2.5 for Web and Mobile Control

🕚 Wed 8 Oct 2025 | AP News

Follow Arabian Post

Notice an issue?

Share

CleanCore’s DOGE Hoard Reaches 710 Million Tokens

Trade247 Introduces Primetime Trading Technology in Dubai

Related news

More from Hyphen Digital Network

Contact

Advertising Enquiries

Syndication & PR Distribution