𝜏-Bench: Benchmarking AI Agents for the Real-world
Sierra’s AI research team is on a mission to advance the frontier of conversational AI agents. In this research paper, we present a new benchmark for evaluating AI agents' performance and reliability in real-world settings, with dynamic user and tool interaction.