When Brooke Hopkins first joined Waymo in 2020, it was far from certain that self-driving cars would ever work. Many people believed autonomous vehicles would never be on the road, saying they could never be trusted to navigate the real world, would be too difficult to scale, and would never be truly autonomous.
Brooke led the evaluation infrastructure team responsible for running millions of simulations to prove that such a future could exist. From coding the first evaluation system to managing 20 petabytes of data, Brooke built simulation scheduling infrastructure across distributed compute, and owned everything from low-level database schema to the developer tools engineers relied on every day.
Today, that work speaks for itself as Waymo operates autonomous vehicles across the U.S. with plans to expand internationally.
When Brooke turned her attention to the voice AI landscape, she saw the structural similarity almost immediately. Like autonomous vehicles, voice agents navigate non-deterministic paths from point A to point B, and traditional unit testing fails the moment the environment changes.
For example, how do you ensure that customer support calls are being handled correctly each time?
The uncomfortable truth for every enterprise looking to deploy voice AI agents is that it has no reliable way to know whether they work: not before they go live, not at scale, and not across the thousands of edge cases that are likely to reveal themselves only when a customer is on the other end of the line.
The infrastructure category needed to solve this problem, including simulation, observability, and evaluation systems for voice agents, is still being built. Meanwhile, deployment is moving faster than the tooling needed to test and monitor these systems reliably.
That gap is what drew Brooke to start Coval and build the infrastructure for voice agents the same way that Waymo built it for self-driving cars. Today, we’re excited to announce Norwest’s investment in Coval’s $28M Series A funding round.
Voice Agents Are Scaling Faster Than the Infrastructure Around Them
We are still early in the shift from text-based AI to autonomous, multimodal agents that can operate on behalf of users. Voice is emerging as one of the most important interfaces in that transition because it mirrors how humans naturally communicate.
Contact centers have become the first major battleground for enterprise AI. Agents are being deployed by the hundreds inside large enterprises that are routing more volume through automated voice workflows.
The global voice recognition market is estimated to have surpassed $22 billion this year, enterprise adoption has tripled, and Gartner says contact centers will save $80 billion in 2026 from conversational AI alone.
As enterprises move from experimentation to production deployments, reliability and operational visibility become non-negotiable. The experts we spoke with estimate enterprise voice agent deployments will grow 3 to 5x in the next few years, with monitoring and evaluation infrastructure representing a meaningful share of total spend at scale.
Deploying voice agents at that scale is genuinely hard. They are non-deterministic, they do not follow scripts or give the same output twice, and an update to the underlying LLM can break a call flow that worked fine the week before.
Traditional software testing methods such as unit tests, regression suites, and manual QA break down almost immediately in this environment, and enterprises building on voice AI have no CI/CD equivalent, no release gate, and no systematic way to catch failures before they reach customers.
As voice AI becomes critical infrastructure, enterprises will need a platform that lets them trust it and scale it safely. Coval is building this operations layer, providing a single pane of glass that shows where agents are going wrong, how users are responding, and everything in between.
What Coval Builds
Coval’s platform has three interconnected pillars, each feeding into the next.
Simulation. Before any voice agent goes live, Coval generates synthetic scenarios and runs the agent through them at scale, covering regression testing, edge case discovery, and CI/CD integration. Enterprises can gate releases on Coval test results the same way they would with any other software quality check.
Observability. Once an agent is in production, Coval monitors live calls, surfaces anomalies, and tracks performance across full call volume. This is the layer that catches the failures simulation did not anticipate.
Human-in-the-loop labeling. Automated evaluations improve when human reviewers validate edge cases, and those validated cases become new simulation scenarios. The loop compounds: better simulation surfaces more unknowns, better observability catches more edge cases, and better labeling makes the automated evals more accurate over time.
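To make the release-gate idea concrete, here is a minimal Python sketch of what gating a deploy on simulation results could look like. It is an illustration under stated assumptions, not Coval's API: `Scenario`, `make_toy_agent`, `pass_rate`, and `release_gate` are hypothetical names, the toy agent stands in for a real voice agent, and the substring check and 95% threshold are arbitrary placeholders for real evaluation metrics.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; this is not Coval's API.
Agent = Callable[[str], str]  # takes a caller utterance, returns the agent's reply


@dataclass
class Scenario:
    name: str
    caller_utterance: str
    expected_phrase: str  # lowercase text the reply must contain to count as a pass


def make_toy_agent(failure_rate: float, seed: int = 0) -> Agent:
    """Stand-in for a non-deterministic voice agent that sometimes goes off-script."""
    rng = random.Random(seed)

    def agent(utterance: str) -> str:
        if rng.random() < failure_rate:
            return "Sorry, I didn't catch that."
        return f"I can help with that. Let me look into: {utterance}"

    return agent


def pass_rate(agent: Agent, scenario: Scenario, runs: int) -> float:
    """Run one scenario repeatedly; a single run says little about a non-deterministic agent."""
    passed = sum(
        scenario.expected_phrase in agent(scenario.caller_utterance).lower()
        for _ in range(runs)
    )
    return passed / runs


def release_gate(
    agent: Agent,
    scenarios: List[Scenario],
    runs: int = 20,
    threshold: float = 0.95,
) -> Tuple[bool, Dict[str, float]]:
    """Return (ok, per-scenario pass rates); ok is False if any scenario falls below threshold."""
    rates = {s.name: pass_rate(agent, s, runs) for s in scenarios}
    ok = all(rate >= threshold for rate in rates.values())
    return ok, rates
```

In a CI pipeline, a script could call `release_gate` and exit with a nonzero status when it returns `False`, blocking the deploy in the same way a failing unit test would. The gate checks a pass rate over repeated runs rather than a single pass or fail because the agent's output can differ from one call to the next.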
The long-term vision extends further into what Coval calls the “Voice OS,” a unified control plane across every voice agent an enterprise runs, covering monitoring, compliance, fine-tuning, and inference in one place. When we pressure-tested that framing with enterprise buyers during diligence, the response was consistently strong. Enterprises building voice AI at scale are not looking for another point tool; they need a platform that ties it all together.
Going After the Enterprise
Coval is purpose-built for enterprise buyers, and that focus is visible in every product decision. The simulation infrastructure, the compliance-grade observability, and the human-in-the-loop labeling pipeline are not features that matter to a startup running a handful of voice flows. But they are the requirements that come up when a large enterprise is running thousands of agents across multiple lines of business and needs to know, with confidence, that they work.
That need is already showing up across industries. Coval has worked with companies ranging from AI-native leaders like Perplexity to enterprises operating high-volume customer support and operational workflows across financial services, healthcare, insurance, and travel.
The enterprises we spoke with during diligence were not asking whether this category should exist. They were asking who would own it. Coval’s combination of technical depth, founder pedigree, and clarity of product vision puts it in a strong position to become the de facto platform in this space.
Why Now, Why Coval
The voice AI evaluation market is forming now, and the window to build the foundational infrastructure layer is open. Brooke’s background is the most direct answer to why Coval is positioned to win it. She did not study voice agent evaluation from the outside; she built the infrastructure for a harder version of the same problem at one of the most demanding engineering organizations in the world. That domain depth shows up in how she runs the company, in how customers and prospects describe her, and in the product decisions Coval has made from the beginning.
We believe voice AI infrastructure follows the same arc as cloud infrastructure a decade ago: the category looks like a tool today and becomes a platform over time. The companies that own the evaluation and observability layer in the early innings tend to own the relationship with the engineering teams who build on it. That is the position Coval is building toward, and we are proud to be part of the journey.

To learn more about Coval, visit coval.ai. If you’re building in voice AI infrastructure, we’d love to connect. Reach out to Scott at [email protected] or Nikhil at [email protected].

