Google Expands AI Live Speech Translation with Gemini 3.5 Live Translate


Google is positioning Live Translate as a specialized translation model rather than a general-purpose conversational AI system. Documentation contrasts the model with Gemini’s broader Live Agent capabilities, stating that Live Translate functions as a real-time translation pipeline, supports audio-only input, and does not include capabilities such as function calling, search grounding, tools, or system instructions.

Google’s positioning of Live Translate mirrors a distinction recently made by OpenAI with GPT-Realtime-Translate, which the company similarly described as an interpreter rather than a voice assistant.

Deployment Across Products and Platforms

Google is deploying the technology across multiple products. In Google Meet, speech translation currently supports five languages and translates only to and from English; it will expand to more than 70 languages and over 2,000 language-pair combinations. The company is also redesigning how speech translation is accessed. Earlier versions required users to open a dedicated setup dialog and configure translation before it could be used. In the updated interface, translation controls are surfaced directly within the meeting experience. The feature is entering private preview for selected Workspace customers, with broader availability planned later in 2026.
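The relationship between "more than 70 languages" and "over 2,000 language-pair combinations" can be checked with simple arithmetic. The sketch below is illustrative only: it assumes exactly 70 languages and that every language can be paired with every other, neither of which Google has confirmed, and the function names are hypothetical.

```python
from math import comb


def unordered_pairs(n_languages: int) -> int:
    """Count language pairs where A<->B is a single pair."""
    return comb(n_languages, 2)


def ordered_pairs(n_languages: int) -> int:
    """Count translation directions where A->B and B->A are distinct."""
    return n_languages * (n_languages - 1)


if __name__ == "__main__":
    n = 70  # assumption: the lower bound of "more than 70 languages"
    print(f"{n} languages, unordered pairs: {unordered_pairs(n)}")
    print(f"{n} languages, ordered directions: {ordered_pairs(n)}")
```

Under these assumptions, 70 languages yield 2,415 unordered pairs and 4,830 translation directions, so the "over 2,000" figure is consistent with counting each pair once.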

On the consumer side, the model is also being integrated into Google Translate’s existing Live Translate feature. Users on Android and iOS can access real-time speech-to-speech translation through the app using headphones or device speakers. Google is also introducing a new “listening mode” on Android that streams translated audio through the device earpiece, allowing users to privately follow translations without headphones.

Google is also making the model available through the Gemini Live API, extending the technology beyond Google’s own products and allowing developers to build real-time multilingual voice applications. Google highlighted use cases including meetings, customer support, education, travel, and live broadcasting, and named ecosystem partners including Agora, LiveKit, Pipecat, Fishjam, and Vision Agents. Ride-hailing platform Grab is already testing the model for multilingual communication between drivers and passengers.

According to Google’s model card, Gemini 3.5 Live Translate is based on Gemini 3 Pro and was evaluated across three primary dimensions: translation quality, latency, and speech naturalness. The company states that translation quality is assessed using AutoMQM, an error-based machine translation evaluation metric, while latency and synthesized speech quality are measured through separate internal evaluation methods. 

However, the company has not released benchmark scores, language-pair results, latency figures, or comparative performance data, limiting independent assessment of the model’s performance.

Google has also disclosed several limitations. The company says that voice replication may become inconsistent during long conversations or multi-speaker sessions; that language detection can struggle with strong accents, similar languages, or rapid language switching; and that background audio may occasionally affect translated output.

The launch comes amid increasing competition in AI live speech translation, with providers including OpenAI, Zoom, and DeepL expanding real-time translation capabilities across communication and collaboration platforms.


