𝜏-bench: benchmarking AI agents for the real world

Sierra’s AI research team is on a mission to advance the frontier of conversational AI agents. In this research paper, we present a new benchmark for evaluating AI agents' performance and reliability in real-world settings, with dynamic user and tool interaction.

View more resources

𝜏²-bench: evaluating conversational agents in a dual-control environment

𝜏²-bench challenges AI agents not just to reason and act, but to coordinate, guide, and assist a user in achieving a shared objective. This leap from solo operation to co-ownership of a task pushes agents into a much more demanding space.

June 11, 2025

Product, Thought Leadership

𝜏³-bench: Voice

𝜏³-bench is here and we've expanded agent evaluation to voice.

March 18, 2026

Product, Thought Leadership
