𝜏²-bench: evaluating conversational agents in a dual-control environment
𝜏²-bench challenges AI agents not just to reason and act, but to coordinate, guide, and assist a user in achieving a shared objective. This leap from solo operation to co-ownership of a task pushes agents into a much more demanding space. And, critically, it reflects the kinds of tasks AI agents are increasingly being asked to perform in the real world.