𝜏-bench: benchmarking AI agents for the real world

Sierra’s AI research team is on a mission to advance the frontier of conversational AI agents. In this research paper, we present a new benchmark that evaluates AI agents' performance and reliability in real-world settings involving dynamic interaction with users and tools.


View more resources

𝜏²-bench: evaluating conversational agents in a dual-control environment

𝜏²-bench challenges AI agents not just to reason and act, but to coordinate, guide, and assist a user in achieving a shared objective. This leap from solo operation to co-ownership of a task pushes agents into a much more demanding space.

June 11, 2025

Product, Thought leadership

𝜏³-bench: Voice

𝜏³-bench is here, and we've expanded agent evaluation to voice.

March 18, 2026

Product, Thought leadership

See what Sierra can do for you

Learn how Sierra uses AI to help you deliver better, more human customer experiences.
