𝜏³-bench: Voice
𝜏³-bench is here and we've expanded agent evaluation to voice.
March 18, 2026
Product, Thought leadership

𝜏³-bench: Knowledge
𝜏³-bench is here and we've expanded agent evaluation to knowledge.
March 18, 2026
Product, Thought leadership

𝜏²-bench: evaluating conversational agents in a dual-control environment
𝜏²-bench challenges AI agents not just to reason and act, but to coordinate, guide, and assist a user in achieving a shared objective. This leap from solo operation to co-ownership of a task pushes agents into a much more demanding space.
June 11, 2025
Product, Thought leadership

𝜏-bench: benchmarking AI agents for the real-world
Sierra’s AI research team is on a mission to advance the frontier of conversational AI agents. In this research paper, we present a new benchmark for evaluating AI agents' performance and reliability in real-world settings, with dynamic user and tool interaction.
June 17, 2024
Product, Thought leadership