𝜏²-bench: evaluating conversational agents in a dual-control environment

𝜏²-bench challenges AI agents not just to reason and act, but to coordinate, guide, and assist a user in achieving a shared objective. This leap from solo operation to co-ownership of a task pushes agents into a much more demanding space. And, critically, it reflects the kinds of tasks AI agents are increasingly being asked to perform in the real world.
