
Deepgram’s Nova-3 aims to make voice AI finally fit for the real world

13 October 2025

Voice assistants have been promising natural conversation for more than a decade — but in practice, talking to machines still feels a little like talking to walls. Speech recognition has improved, yes, but in noisy cafés, multilingual workplaces, or call centres buzzing with chatter, even the most advanced systems stumble.

Deepgram, a San Francisco-based startup founded by a team of physicists turned AI engineers, says it’s ready to change that. Its latest model, Nova-3, launched earlier this year, represents what the company calls its most intelligent and adaptable system yet — a model built not for the lab, but for the chaos of real-world speech.

In an exclusive interview with MoveTheNeedle.news, Natalie Rutgers, Deepgram’s Vice President of Product, describes Nova-3 as “the result of years of iteration and learning from real-world feedback.”

“Nova-2 had already set a high bar with its unmatched speed, accuracy, and affordability,” she says. “And now, with Nova-3, we are delivering a significant leap forward in intelligence, adaptability, and language understanding.”


A voice that listens between languages

What sets Nova-3 apart, Rutgers explains, is its ability to transcribe conversations that flow between multiple languages in real time — a capability the company says no other commercial model currently offers.

“The biggest shift? Nova-3 became the first voice AI model to offer real-time multilingual transcription — a true breakthrough for global use cases,” she says. “It’s not just that it understands different languages — it understands how people naturally switch between them mid-sentence, which is huge for industries like emergency response, global customer support, and international healthcare.”

That flexibility has implications well beyond transcription. In multilingual call centres, or emergency services coordinating across borders, the ability to handle code-switching — moving between English and Spanish, or Hindi and English, for instance — could remove friction that still dogs global communications.

Nova-3 also introduces self-serve customization, a deceptively simple feature that could save businesses days or weeks of model retraining. “You can now ‘prompt’ it with up to 100 critical keywords — think prescription names, menu items, or legal phrases — and it’ll transcribe those correctly, right out of the gate,” Rutgers explains.
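
Both capabilities surface as options on Deepgram’s transcription API. The sketch below, based on the company’s published REST interface, shows how a developer might request multilingual, code-switching transcription of a pre-recorded call while prompting the model with a few domain-specific key terms. It is a minimal illustration, not a definitive integration: the file name, drug names, and placeholder API key are hypothetical, and the exact parameter names should be verified against Deepgram’s current API reference.

```python
# Illustrative sketch only: transcribe a pre-recorded audio file with
# Nova-3's multilingual mode and keyword prompting via Deepgram's REST API.
# Verify parameter names against Deepgram's current docs before relying on them.
import requests

DEEPGRAM_API_KEY = "your-api-key"   # placeholder credential
AUDIO_FILE = "pharmacy_call.wav"    # hypothetical recording

with open(AUDIO_FILE, "rb") as audio:
    response = requests.post(
        "https://api.deepgram.com/v1/listen",
        headers={
            "Authorization": f"Token {DEEPGRAM_API_KEY}",
            "Content-Type": "audio/wav",
        },
        # A list of tuples lets the same query key repeat, one per key term.
        params=[
            ("model", "nova-3"),
            ("language", "multi"),       # code-switching transcription
            ("keyterm", "amoxicillin"),  # domain terms the model should get
            ("keyterm", "metoprolol"),   # right "out of the gate"
        ],
        data=audio,
    )
    response.raise_for_status()

# Response layout assumed from Deepgram's documented JSON schema.
result = response.json()
print(result["results"]["channels"][0]["alternatives"][0]["transcript"])
```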


Built for noise, not perfection

Deepgram has spent the better part of a decade positioning itself as an antidote to the limitations of traditional voice technology. Founded in 2015, it built its own deep learning architecture from scratch — a system that could process raw audio directly rather than relying on pre-packaged models designed for ideal acoustic conditions.

That decision is paying off now, as Nova-3 takes aim squarely at the problem of noisy, unpredictable environments.

“Nova-3 wasn’t built in a vacuum — it was shaped by the real, messy, high-stakes challenges enterprises deal with every day,” Rutgers says. “Think noisy call centers, drive-thrus with overlapping voices, hospitals where accuracy isn’t a nice-to-have — it’s life-or-death.”

Under the hood, Nova-3 uses a compressed audio representation that helps it adapt to underrepresented sound conditions — background chatter, echoey rooms, even overlapping voices in control towers. Unlike earlier models, which filtered out “bad” data, Deepgram trained Nova-3 on it.

“Nova-3 was built with real-world chaos in mind — not pristine studio recordings,” Rutgers says. “It thrives in the places where other models struggle: echoey boardrooms, busy call centers, drive-thrus with honking cars and background chatter, even overlapping conversations.”

And accents? “Nova-3 doesn’t flinch,” Rutgers says. “It was trained on an incredibly diverse dataset designed to reflect how people actually talk — across regions, dialects, and speech patterns. So whether someone’s ordering food in the South, explaining symptoms with an Irish lilt, or switching between Hindi and English in a customer support call, Nova-3 holds its own. It doesn’t just transcribe — it listens like a human who’s actually paying attention.”


The race for understanding

The market for voice AI is booming — and competitive. Analysts estimate the global speech recognition industry could reach more than $50 billion by 2030, driven by automation, customer service, and accessibility applications.

Tech giants like Google, Microsoft, and OpenAI have poured billions into their own speech platforms, and OpenAI’s Whisper has emerged as a popular open-source alternative for developers. Yet Rutgers says Deepgram is now outpacing all of them.

“In real-world head-to-head comparisons, Nova-3 blows its competitors out of the water,” she says.

According to Deepgram’s internal benchmarks, Nova-3 cuts word error rate (WER) by 54% on streaming audio and by nearly 47% on batch audio, compared with the next-best models. In blind tests, listeners preferred it over Whisper in all seven languages tested, in some cases by an eight-to-one margin.
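
WER is the industry’s standard yardstick: the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the reference length. As a rough illustration of what a 54% relative reduction means, a rival transcribing at 10% WER would correspond to Nova-3 transcribing at roughly 4.6%. The toy calculator below shows how the metric works; it is illustrative only, not Deepgram’s benchmark harness.

```python
# Toy word error rate (WER) calculator: edit distance over words,
# normalized by reference length. Illustrative only; real benchmarks
# also handle text normalization, casing, punctuation, and so on.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first j hypothesis words
    # into the first i reference words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four: WER = 0.25
print(word_error_rate("take two tablets daily", "take to tablets daily"))
```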

It’s also fast — up to 40 times quicker than competing diarization-enabled speech-to-text models — and, Rutgers points out, cheaper: streaming starts at $0.0077 per minute, which Deepgram says is less than half the price of comparable offerings from major cloud providers.

Those gains could make a difference as voice AI becomes less about curiosity and more about infrastructure — powering customer service, transcription, and even diagnostics.


From experiment to everyday tool

Rutgers is convinced that the next phase of AI won’t be about typing prompts but about talking naturally. “Over the next few years, I think voice AI is going to shift from being a tool people experiment with to something that’s fully embedded in how we communicate, work, and interact with technology,” she says.

She imagines systems that understand tone, context, even intent — and respond accordingly. “Our goal at Deepgram is to make it possible for people to simply talk — and have systems understand, respond, and even act on those conversations in real time.”

In that future, she says, voice becomes the “invisible layer that just works,” embedded in everything from customer support desks to healthcare record systems and voice-enabled pharmacy kiosks.

“It’s about understanding context, tone, intent, even unique vocabulary,” Rutgers says. “That level of intelligence unlocks more natural, responsive systems — and, frankly, better human-machine relationships.”

In three years, she adds, “we hope it’ll feel totally normal to talk to your tech like you talk to a colleague — and trust that it gets you.”


Deepgram still faces the same challenge as every AI company: translating technical brilliance into trust and adoption at scale. But if Nova-3 performs as Rutgers claims, the company may finally have cracked a long-standing problem in voice technology — making machines that can actually hear the world as it sounds, not just as it’s supposed to.