
The AI-native interview

Coding agents like Codex and Claude Code are upending software engineering as we know it. The role is shifting from building the machine to designing and honing it. Much like engineers stopped worrying about how a compiler translates code into machine instructions, we now need to focus less on the precise lines of code that are written and more on whether the system produces the right outcomes over time.

This shifts what we should evaluate in interviews. When a single engineer can build across the stack, leverage comes from combining technical ability with product thinking and business context. They don’t just write code. They define scope, make tradeoffs, and iterate with customers to deliver impact. We’ve redesigned our engineering interview process from the ground up to reflect this new reality.

Framing the problem

Sierra’s engineering interview process had been fairly standard: two coding interviews plus interviews for algorithms, system design, and culture fit, followed by reference checks. It’s a well-understood, scalable approach, and for a long time it worked.

But recently, something started to feel off. Much of the signal we got from these interviews was about mechanics: typing syntax into an editor, remembering algorithm details, stitching frameworks together. This felt increasingly dissonant with the new reality of our work. The gap showed up most clearly in debriefs: in the absence of clear interview signals, hiring managers leaned more heavily on referrals and prior experience.

We started building an AI-native interview process with three key attributes:

  • Representative: Reflects the work engineers actually do day to day, capturing initiative, ownership, judgment, system understanding, and product thinking.
  • High signal: Gives us clarity about where a candidate could excel, and where they may need support.
  • Positive experience: Feels engaging and authentic for candidates, so that when we make an offer, they’re excited to say yes.

Introducing the AI-native onsite

We removed our coding and algorithms interviews and replaced them with an AI-native onsite:

  • Plan: A working session with the candidate to define a product to build. The candidate drives ideation, while interviewers ask questions to strengthen it. We focus on an idea in the candidate’s domain so we see their product thinking in action.
  • Build: The interviewer steps out and the candidate brings the idea to life over 2 hours, using the AI tooling and frameworks of their choice. They have complete freedom to pivot or adjust scope as they go.
  • Review: The candidate demos what they’ve built. We debate the key product flows and choices they made; review the code to understand their technical judgment (data model, abstractions, extensibility, etc.); and discuss the path to production. We also dig into how they used AI along the way.

We’ve found this new format to be much more effective. Because candidates can actually build during the onsite, rather than just talk about what they might build, it’s both more representative of the work and produces higher signal. It’s much easier to gauge their agency (do they pivot when they get stuck?) and judgment (how do they scope what to build within the time constraints?).

It’s also more engaging, even if candidates are nervous at the start. To set expectations, we share evaluation criteria and advice ahead of time. For example, it’s OK to cut your scope as you build, and to skip boilerplate (CRUD, auth) to focus on what’s unique. As Paul Buchheit, the creator of Gmail, put it: if it’s great, it doesn’t have to be good.

Rounding out the rest of the process

As we’ve honed the AI-native onsite, we’ve also rethought the rest of the interview process. Our coding phone screen still required candidates to write code, without AI, in an online editor. But vibe-coding an app is now easy. The harder, more relevant problem is getting it into production in a scalable way, so we replaced the phone screen with a system design interview to better reflect that.

While the AI-native onsite tests for product sense and building 0->1, it doesn’t capture taking a feature from 1->N in an existing, messy codebase. To address this, we’re piloting a debugging interview. Candidates are given a medium-sized codebase and a draft PR from a colleague that introduces a cross-cutting feature. Their job is to review and improve it — pulling down the code, inspecting the output, and iterating with coding agents to make it better. The level of AI used in this interview is still TBD, as new models can zero-shot many fixes.

What did we learn?

We’re hiring for strengths, not just an absence of weakness. This approach gives us much richer signal about a candidate’s spikes and gaps. For example, some people excel at product strategy and initiative but have holes in their system understanding. Our debriefs have shifted from “should we hire this person?” to “where would this person thrive, and how do we support them?”

We ask every candidate for feedback, and many have said this was the most fun they’ve had in an interview. One trivia enthusiast built an AI-powered game intended to keep the user in a state of flow — the demo just involved the interviewer playing it. In another case, a backend engineer built a headless simulation tool and used an agent with a markdown file to walk through the demo.

This format isn’t without challenges. It’s open-ended, which makes it harder to standardize. To mitigate this, we’ve developed a set of evaluation criteria that are agnostic to what the candidate builds, and we run interviews in pairs to improve calibration. We also debated whether this approach applies to infrastructure engineering, and concluded that it does — many infrastructure engineers now build full-stack tools or agents and work closely with product to vertically integrate with what customers need. That said, we’ve amended the interview slightly to better capture the signal we need for infrastructure.

The emergence of highly proficient coding agents is forcing us to reimagine Sierra from the ground up — from how we build (using agents to build and optimize agents with Ghostwriter), to how we hire. Given the pace of change, this is just the beginning. And yes, we’re still hiring. So if you’re interested in helping build this with us, learn more here: sierra.ai/careers.
