DeepL, the company that built its reputation on delivering some of the most natural-sounding text translations in the business, has now moved decisively into spoken language.
On Tuesday, it rolled out a full voice-to-voice translation suite aimed at everything from Zoom meetings and mobile chats to group workshops for frontline workers. At the same time, the company opened up an API so developers and businesses can build their own applications on top of the technology, including custom setups for call centers.
“After spending so many years in text translation, voice was a natural step for us,” DeepL CEO Jarek Kutylowski told TechCrunch in an interview. “We have come a long way when it comes to text translation and document translation. But we thought there wasn’t a great product for real-time voice translation.”
The timing makes sense. Companies have wrestled for years with clunky real-time tools that either lag badly or butcher nuance. Kutylowski pointed straight to the central engineering headache: balancing low latency against accuracy. For now, DeepL’s system still follows the classic route (speech-to-text, then translation, then text-to-speech), but the company owns the entire stack, which it believes gives it a clear quality advantage built on years of refining its text engine.
Down the road, DeepL plans to develop a true end-to-end voice model that skips the text middleman altogether, which should make conversations feel even more immediate.
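For readers who want to picture the three-stage route described above, here is a minimal Python sketch. It is not DeepL’s code or API: the function names, the `CascadeResult` type, and the toy word list are all hypothetical stand-ins, and each stage is a trivial placeholder for a real model. The sketch only illustrates why latency accumulates in such a design, since every stage has to wait for the one before it.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CascadeResult:
    """Outputs of each stage plus how long each stage took (hypothetical type)."""
    transcript: str
    translation: str
    audio: bytes
    stage_seconds: Dict[str, float] = field(default_factory=dict)


def run_cascade(
    audio_in: bytes,
    speech_to_text: Callable[[bytes], str],
    translate: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> CascadeResult:
    """Run the three stages in sequence and record per-stage latency."""
    timings: Dict[str, float] = {}

    def timed(name, stage, payload):
        start = time.perf_counter()
        output = stage(payload)
        timings[name] = time.perf_counter() - start
        return output

    transcript = timed("speech_to_text", speech_to_text, audio_in)
    translation = timed("translate", translate, transcript)
    audio_out = timed("text_to_speech", text_to_speech, translation)
    return CascadeResult(transcript, translation, audio_out, timings)


# Toy stand-ins for real models (assumptions for illustration only).
TOY_DICTIONARY = {"hello": "hola", "world": "mundo"}


def fake_speech_to_text(audio: bytes) -> str:
    # Pretend the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.lower().split())


def fake_text_to_speech(text: str) -> bytes:
    # Pretend synthesized audio is just the encoded text.
    return text.encode("utf-8")


result = run_cascade(b"Hello world", fake_speech_to_text, fake_translate, fake_text_to_speech)
total_latency = sum(result.stage_seconds.values())
```

In a design like this, the end-to-end delay is the sum of the three stage timings, which is the cost a direct speech-to-speech model would try to avoid by dropping the intermediate text steps.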
For everyday users, the new tools are straightforward and practical. Add-ons for Zoom and Microsoft Teams let listeners hear translated audio in their own language while others speak normally, or simply read real-time subtitles on screen. The features are in early access for now, and organizations can join a waitlist.
There’s also a mobile and web version for one-on-one or small-group talks, whether everyone is in the same room or halfway across the globe. In larger settings like training sessions or workshops, participants just scan a QR code to join the multilingual conversation.
The system can learn custom vocabulary on the fly, including industry jargon, company names, and even personal names, so it doesn’t stumble over the specialized language that matters in real workplaces. That adaptability is especially useful for customer service teams. Kutylowski noted that AI is reshaping what customer support will look like in the coming years, with a translation layer letting companies serve customers in languages where hiring fluent staff is both difficult and expensive.
By controlling the full pipeline, DeepL is betting it can deliver the kind of reliability that has made its written translations stand out. The voice push builds directly on a Voice API, which it quietly released back in February, giving developers another building block for multilingual apps.
Of course, DeepL is not alone in this space. Several well-funded rivals are already carving out niches. Sanas, which raised $65 million last year, focuses on smoothing out accents in real time for call-center agents. Dubai-based Camb.AI specializes in dubbing and localizing video content for media and entertainment companies, often working with Amazon Web Services. Palabra, backed by Reddit co-founder Alexis Ohanian’s Seven Seven Six fund, is working on a real-time engine that tries to keep both the original meaning and the speaker’s actual voice intact—putting it in the most direct head-to-head with what DeepL is building.
What sets DeepL apart is its deliberate, step-by-step evolution. It didn’t rush into voice; it spent years perfecting text first. That patience has given it a foundation of linguistic nuance that many pure-play voice startups lack.
For global businesses in particular, the implications are smoother cross-border meetings, more inclusive training programs, and customer support that scales without massive hiring budgets. Developers get an API that lets them embed high-quality translation wherever it’s needed, from internal collaboration tools to public-facing services.
The bigger picture is this: as companies operate across more languages and time zones, the cost of misunderstanding, or of simply waiting for a human interpreter, has become a real drag on productivity. DeepL’s new suite aims to remove that friction without sacrificing the quality users have come to expect from its text products.



