Skip to main content

Voice Sims: test agents in real world conditions before they talk to your customers

Every day, Sierra’s customers simulate tens of thousands of conversations with mock user personas to make sure their agents are ready for prime time. Voice Sims — a pioneering feature that tests voice agents in real world conditions before they talk to a single customer — is key to this process.

Voice is harder than text

Voice poses very different challenges from text: the cadence of conversations matters enormously — when to stop and start talking, or pause; words can be easily misunderstood given accents, background noise, industry acronyms, and bad connections; and a flat tone or poor word choice can make even a correct answer land badly.

It’s these messy, human details that make voice so challenging, and yet so magical when it works. Dedicated voice testing ensures you end up with a fluent, natural sounding agent — not a stochastic parrot or talking chatbot.

Introducing Voice Sims

Voice Sims enable you to create multiple “users,” who speak different languages, have different needs, call from different locations (at home with the TV on, from the street, on a train), in different emotional states, and in different situations. Once generated, you can run them multiple times before, during, or after launch — and the results are then evaluated by another agent.

See Voice Sims in action.

Voice Sims run in parallel with other modalities : sharing the same evaluation infrastructure; plugging into Sierra’s Agent Studio and your CI/CD pipelines; and gating releases just like unit tests. Under the hood it’s a real, production grade voice conversation that tests every dimension of a call — not just the words on a page, but the way they are spoken, understood, and heard.

  • Hear how your agent will perform in the real world. This includes: speech to text accuracy — the ability to transcribe what a consumer says accurately regardless of their language, accent, speaker set-up or background noise, as well as industry specific or technical terms; speech accuracy — the ability to read back license plates, account numbers or dates of birth correctly; and its overall behavior — pausing when interrupted, asking for missing context, phrasing responses appropriately, and generally keeping the conversation moving along.
  • Measure your agent’s emotional intelligence. Not every caller is calm and patient. Some are confused, frustrated, or even angry. Voice Sims can create these emotional scenarios and reveal whether your agent responds appropriately — apologizing where needed, adopting a reassuring tone, going faster or slower, avoiding robotic phrasing that inflames an already heated situation. It’s not enough to get the facts right, calls have to feel empathetic and human too.
  • Evaluate the entire voice stack end-to-end. Voice Sims pinpoint whether errors come from recognition (inaccurate transcription due to background noise or accents), reasoning (policy gaps or logic errors), or synthesis (an unnatural pitch, poor intonation or mispronunciations). Just as importantly, they measure latency and turn-taking — making sure your agent doesn’t pause too long, speak over the customer, or otherwise break the rhythm of a natural conversation.
  • Assess whether your agent is sticking to your design rules and guardrails. For example, agents must authenticate users with spoken dates of birth or addresses instead of “magic links,” avoid reading out long URLs, and provide keypad fallbacks when needed. Voice Sims encode these rules as automatic checks, so every release respects them without relying on manual quality assurance testing.
  • Measure performance across releases. Voice Sims enable you to view and aggregate key performance metrics like latency or error rates over time — making it easier to identify and avoid regressions as you upgrade your agent. On a per journey basis, you can diagnose and fix bugs before they reach production, and understand precise improvements release over release.

Easy, intuitive to use

Voice Sims sit alongside your chat simulations in Agent Studio, Sierra’s no code tool for building and managing agents. As you create new journeys or information your agent needs to handle, you can automatically simulate different modes of conversation (voice, chat, etc). This makes it easy to ensure your test suite is robust, inspect transcripts alongside audio recordings, scrub through playback, and jump directly to a point of failure — for example, if a caller spelled out an email address and the agent misheard the domain.

Why it matters

The impact of Voice Sims is simple but profound: agents become more reliable and more empathetic, and conversations more natural. By continuously creating simulated conversations, you catch brittle “only in calls” bugs early. By testing noisy, emotional, interrupted conversations on repeat, you refine prompts, and delivery until they sound right. And because it’s automated, teams can run fast — re-doing the same tough scenarios on every change, just like with unit tests, to ensure the agent is launch ready and improving over time.

For years, it was prohibitively expensive for most companies to talk to their customers. AI is changing all that, and adoption of voice is now skyrocketing. Sierra's platform will handle hundreds of millions of our customer's calls this year. Voice Sims help businesses put their agents where their customers are, on the phone, in the wild, confused or frustrated, with dogs barking and kids yelling — ensuring they can handle all that human complexity and emotion naturally, and with grace. It’s how we can create better, more human customer experiences with AI, while also driving better business outcomes for companies.

Share

See what Sierra can do for you.

Find out how Sierra can help your business build better, more human customer experiences with AI.