𝜏²-bench: evaluating conversational agents in a dual-control environment

𝜏²-bench challenges AI agents not just to reason and act, but to coordinate, guide, and assist a user in achieving a shared objective. This leap from solo operation to co-ownership of a task pushes agents into a much more demanding space. And, critically, it reflects the kinds of tasks AI agents are increasingly being asked to perform in the real world.

View more resources

𝜏-bench: benchmarking AI agents for the real-world

Sierra’s AI research team is on a mission to advance the frontier of conversational AI agents. In this research paper, we present a new benchmark for evaluating AI agents' performance and reliability in real-world settings, with dynamic user and tool interaction.

June 17, 2024

Product, Thought leadership

𝜏³-bench: Voice

𝜏³-bench is here and we've expanded agent evaluation to voice.

March 18, 2026

Product, Thought leadership
