𝜏-knowledge: benchmarking agents on real-world knowledge
𝜏-Knowledge measures how well agents can work through messy, evolving knowledge bases to complete complex, multi-step tasks. Although models are improving, they still struggle to use this information reliably in practice, leaving a large gap between benchmark progress and real-world performance.
May 13, 2026

𝜏³-Bench: Advancing agent benchmarking to knowledge and voice
𝜏³-Bench is here. We've expanded agent evaluation to two new frontiers: knowledge retrieval and voice.
March 18, 2026

Improving voice performance with post-training
Post-training helps our customers' voice agents hold shorter, clearer, and more human-like conversations.
November 12, 2025

𝜏-Bench leaderboard: compare, explore, and understand agent performance
Introducing the 𝜏-Bench leaderboard, a community-driven platform where researchers can submit, verify, and compare results while exploring model behavior through interactive tools.
October 13, 2025