𝜏³-bench: Voice
𝜏³-bench is here and we've expanded agent evaluation to voice.

𝜏-bench: benchmarking AI agents for the real-world
Sierra’s AI research team is on a mission to advance the frontier of conversational AI agents. In this research paper, we present a new benchmark for evaluating AI agents' performance and reliability in real-world settings, with dynamic user and tool interaction.
June 17, 2024 · Product · Thought leadership

𝜏²-bench: evaluating conversational agents in a dual-control environment
𝜏²-bench challenges AI agents not just to reason and act, but to coordinate, guide, and assist a user in achieving a shared objective. This leap from solo operation to co-ownership of a task pushes agents into a much more demanding space.
June 11, 2025 · Product · Thought leadership

𝜏³-bench: Knowledge
𝜏³-bench is here and we've expanded agent evaluation to knowledge.
March 18, 2026 · Product · Thought leadership