TANAY.SHAH
// PUBLISHED 2026-05-10 · 11 MIN READ

How to Evaluate a Founding Engineer in 2026: A Playbook for Seed-Stage Founders

Most founders pattern-match founding-engineer hires off senior-IC interview rubrics and end up with the wrong person. Here is the four-dimensional grid I use, the specific signals that predict success, and the anti-patterns that look strong in interviews but stall the team six weeks in.

I have spent the last fourteen weeks as the founding engineer of a YC F25 AI company. Before that, I had three early-stage founding-engineer-shaped reps at smaller AI startups in NYC. So I have been on both sides of this hiring decision: I have been hired by founders who got it right, hired by founders who got it wrong, and watched friends across the YC F25 batch sit through the same five interview rubrics that produce the same five wrong hires. This post is the framework I wish every seed-stage founder used before they wrote the offer letter, written from the seat of someone who has actually been the hire.

The single most common failure mode I see is founders importing the senior-IC interview rubric — leetcode, system design at FAANG scale, hour-long architecture chat — and being surprised six weeks in when their new hire ships beautifully written code that took three weeks to land instead of two days. The senior-IC rubric measures depth on a stable codebase. Founding engineering is a fundamentally different job: you are shipping zero-to-one in a codebase that did not exist yesterday, you are on call for the production aftermath, you are making architecture decisions with one user and zero data, and you are doing all of it inside the same week that the founders pivoted the product wedge. The rubric has to match the job.

The two axes that actually matter

I evaluate founding-engineer candidates on a two-by-two: zero-to-one velocity on one axis, ownership depth on the other. Most candidates are strong on exactly one of the two. The interview's job is to figure out which one, and, in the rare cases, whether the candidate is strong on both or on neither.

[Figure: Two-by-two evaluation grid for founding engineers. X-axis: zero-to-one velocity (scope-to-ship time, autonomous decision rate, greenfield throughput). Y-axis: ownership depth (production on-call, debug-from-symptoms, post-mortem authorship). Top-right, high velocity AND high ownership, is the founding engineer and the only quadrant that ships founding-engineer outcomes. Top-left is a senior IC who owns production and debugs deeply but will stall a seed-stage team. Bottom-right is a hacker who ships fast in greenfield but breaks production and leaves debt that compounds. Bottom-left is a wrong hire.]

Zero-to-one velocity is the rate at which the candidate goes from a vague problem statement to a working artifact in production. Not the rate at which they refactor an existing system; not the rate at which they migrate a service. Greenfield throughput. Ownership depth is whether they treat the production aftermath as their job — taking the page at 2am, reading flight recorders to debug a customer-specific bug they did not write, writing the post-mortem the way the team will reference it three quarters later.

Top-right is the founding engineer. Top-left — high ownership but low greenfield velocity — is a senior IC: a great hire for Series B engineering teams, the wrong hire for seed. They will obsess over architectural correctness on a problem that needed a rough draft a week ago. Bottom-right — high velocity but low ownership — is the hacker pattern: ships a beautiful greenfield demo on day three, can't tell you why the production system breaks under realistic load on day thirty, drops the on-call pager in someone else's lap. Hackers ship debt that compounds. Bottom-left is a wrong hire for any seat at this stage.

Six dimensions I actually grade against

The two-axis matrix is the headline frame, but the actual grading happens against six dimensions that sit underneath it. I score each one zero to three in a real interview: zero is a red flag, one is a worry, two is competent, three is the signal you came to find. A founding engineer needs at least a two on every dimension and a three on at least three of them. Below two on any single dimension is a no-hire, even if the other five are all threes — the gap will surface in week six.

  • Scope-to-ship time. Hand them a vague, real problem from your roadmap. Watch how fast they get to a working prototype. Strong candidates carve scope ruthlessly, ship a minimal end-to-end vertical slice, and propose the next slice on top. Weak candidates ask for the spec, ask for the Figma, ask for the data model, and burn the first day asking instead of shipping. The signal is not whether they ask questions — they should — but whether they ship in parallel with asking.
  • Production debugging from symptoms. Show them a production incident where the user-visible symptom does not point at the underlying cause. Strong candidates start from the symptom and work backwards through layers — request, app, database, infrastructure — building hypotheses they can falsify. Weak candidates pattern-match to the last incident they saw and propose its fix. This is the single best predictor of ownership depth I know of.
  • Architecture under uncertainty. Ask them to design a system where you do not yet know the workload shape, the throughput, or the customer profile. Strong candidates reach for primitives that are cheap to change (Postgres, plain HTTP, append-only logs); weak candidates reach for infrastructure that is correct only at scale you will not hit for two years (Kafka, sharded MongoDB, microservices). The senior-IC rubric punishes the simple answer. The founding-engineer rubric rewards it.
  • Communication density. How much signal do they pack into a Slack message? A founding engineer writes one paragraph that contains the thing and the why; a junior writes ten messages and ends without a decision. Density is a proxy for clear thinking, and clear thinking is a proxy for everything else. The fastest live test: ask them to summarize a 300-line PR they shipped last week into three sentences. Strong candidates do it in one breath; weak ones recite the diff.
  • Decision-making under information starvation. Founding engineers ship with one user, no data, and a roadmap that pivots quarterly. The job is to make irreversible decisions on bad information and live with the consequences. Ask: 'tell me about a decision you made with bad data that you would still make again, knowing how it turned out.' Strong candidates can name the call, the alternative they ruled out, and the lesson they took. Weak candidates frame every past decision as a team consensus.
  • Calibration on what is hard. Strong founding engineers know which 20% of the codebase is genuinely load-bearing and treat the rest as disposable. Weak candidates treat all code as equally precious and refactor the disposable layer instead of shipping the load-bearing one. Ask them what they would throw away first if they had to cut the codebase in half. The answer reveals their model of the system, not just their tenure on it.
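To make the threshold rule concrete, here is a minimal sketch of it in code. The dimension names and the example scores are illustrative, not a real scoring tool — the point is the two-part gate: no dimension below two, and at least three dimensions at three.

```python
# Sketch of the hire/no-hire rule: score each of the six dimensions
# 0-3, then apply the two-part threshold described above.

DIMENSIONS = [
    "scope_to_ship",
    "production_debugging",
    "architecture_under_uncertainty",
    "communication_density",
    "decisions_under_starvation",
    "calibration_on_hard",
]

def hire_decision(scores: dict[str, int]) -> bool:
    """No dimension below 2, and three or more dimensions at 3."""
    values = [scores[d] for d in DIMENSIONS]
    no_gaps = min(values) >= 2
    enough_spikes = sum(1 for v in values if v == 3) >= 3
    return no_gaps and enough_spikes

# Illustrative candidate: a three on four dimensions, competent on two.
candidate = {
    "scope_to_ship": 3,
    "production_debugging": 3,
    "architecture_under_uncertainty": 2,
    "communication_density": 3,
    "decisions_under_starvation": 3,
    "calibration_on_hard": 2,
}
print(hire_decision(candidate))  # True
```

Note that a candidate who scores a flat two everywhere fails the gate: competent on every dimension, spiky on none, which is exactly the profile the rubric is built to screen out.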

The interview format that actually surfaces these

A leetcode round measures none of the six. A whiteboard system-design round measures one (architecture under uncertainty), badly, in a setting where the candidate is performing rather than thinking. A take-home that mimics a real spec measures two or three of them, but only if you grade the actual artifact, not the README polish. The interview format I would use, in order:

  • A working session, not a whiteboard session. Two hours, shared codebase (a sanitized fork of yours, not a leetcode harness), real problem from your backlog. Pair-program with you, your CTO, or your strongest existing engineer. You're watching the candidate compose actual code under uncertainty, ask the right questions at the right moments, and decide what not to build. The artifact at the end is interesting; the process to get there is the signal.
  • A production-incident chat, not a system-design chat. Pull a real post-mortem from the last three months, hand them the symptom and the eventually-discovered cause, and ask them to retrace the debugging path that should have caught it sooner. You are measuring instinct on what to look at next, not memorized failure modes.
  • A roadmap-tradeoffs chat with the founder. Half an hour, just the founder, asking the candidate to advocate for a controversial technical position against a counterargument. You are measuring two things: how strongly they hold positions on incomplete information, and whether they update when the counterargument has merit. Both are necessary; either alone is a red flag.
  • A reference call with their last manager — and one peer. The peer matters more than the manager. Managers describe what people delivered; peers describe what they were like to work alongside at 11pm on a Friday. Founding engineering is mostly that 11pm-on-a-Friday energy.
  • A live post-offer trial — paid — for one to two weeks if you can swing it. This is the single highest-fidelity signal anyone has ever invented for engineering hires. Pay full rate, treat it as a bilateral evaluation, and if it does not work for either side at the end of week two, walk away on good terms. I would take a two-week trial over any number of interview loops.

Anti-patterns that look strong in interviews

The anti-patterns are the ones I see seed-stage founders fall for repeatedly. They look like signal because they pattern-match to the senior-IC rubric the founder learned from their last big-tech job. They are not.

  • The infrastructure peacock. Spends thirty minutes of system design lovingly architecting the Kafka topology for a product that has eight users. The signal is not depth on Kafka; the signal is that they reach for production-scale infrastructure when the right answer is a Postgres table. Pass on this candidate, no matter how impressive the explanation.
  • The architecture astronaut. Will not ship without a 'proper' service boundary. Wants to discuss the abstraction before the implementation. Six weeks in, you have a three-service system with one feature and zero customers. The patch: a working session that forces them to commit code in the first thirty minutes and surfaces the resistance.
  • The big-tech polyglot. Worked on five FAANG-scale systems, three of them load-bearing. Did not own any of them end-to-end. Excellent on the architecture-under-scale dimension; ungraded on every other one. The reference call is the only thing that surfaces this — ask 'what would they have done differently if it had been theirs to ship from scratch?' The good ones have answers; the polyglots stall.
  • The portfolio-driven hacker. Has shipped twelve side projects, three of which got Hacker News coverage. Beautiful demos. Ask them about production: 'I open-sourced it' or 'a friend ran it for a while.' Zero ownership depth. They will be your fastest week-one shipper and your slowest week-twelve shipper, because the load-bearing 20% will start to fall over and they will be uninterested in catching it.

The compensation conversation

If you are a seed-stage founder hiring a founding engineer in NYC in 2026, here is the math the strongest candidates run on the offer you give them. Cash typically ranges from $160K to $220K. Equity typically ranges from 0.5% to 2.0% on a four-year vest with a one-year cliff, and the strongest candidates know that the spread within that range is what predicts whether you actually want them or you are filling a seat. Lowballing equity for a founding engineer is the single most common founder mistake I see. The economic model is that the founding engineer is taking 60-80% of a market-rate IC offer in cash, in exchange for the equity tail; if the equity offer does not look like a real piece of the company, the candidate is not a founding engineer, they are a discounted IC. Run the math out.
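Here is a back-of-the-envelope version of the math a strong candidate runs, as a sketch. All inputs are assumptions for illustration — the $250K market-rate figure and the $25M post-money valuation are hypothetical, not data from this post.

```python
# Hedged sketch of the founding-engineer offer math. Inputs are
# illustrative assumptions, not market data.

def foregone_cash(market_cash: float, offer_cash: float, years: int = 4) -> float:
    """Cash given up over the vesting period vs. a market-rate IC offer."""
    return (market_cash - offer_cash) * years

def equity_paper_value(equity_pct: float, post_money: float) -> float:
    """Paper value of the grant at the valuation the grant assumes."""
    return equity_pct / 100 * post_money

# Assumed example: $250K market-rate cash, $180K offer (72% of market),
# and a 1.5% grant at a $25M post-money valuation.
gap = foregone_cash(250_000, 180_000)        # cash given up over four years
paper = equity_paper_value(1.5, 25_000_000)  # paper value of the grant
print(gap, paper, paper > gap)
```

In this hypothetical, the candidate gives up $280K of cash over the vest in exchange for $375K of paper value at the current valuation — a trade that only makes sense if they believe the valuation grows, which is why a grant that does not clear the foregone cash reads as a discounted-IC offer, not a founding-engineer one.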

Before you write the job description, write down honest answers to these five questions. The answers determine whether you are about to make a great hire or a wrong one — and the candidates you really want will ask you variants of all five within the first thirty minutes.

  • What is the load-bearing technical problem the founding engineer is solving in their first ninety days? If you cannot name it concretely — not 'build the platform,' but 'ship the agent runtime to the first three design partners' — the role is not ready to hire for.
  • Who do they report to, and what does the founder cadence look like? A founding engineer working under a director-of-engineering at seed stage is being misallocated. Either they report to the founders directly, or you should not hire one yet.
  • What is the equity grant, and what is the post-money valuation that grant assumes? The strongest candidates will run the math on whether your equity offer matches the role title. If your numbers don't work, fix the numbers, don't try to hide them.
  • What is the on-call expectation, and who shares it? Founding engineers do on-call. Know whether you are asking for primary, secondary, or rotation, and be honest about it in the offer conversation.
  • What does success look like at month three, month six, month twelve? Write it down before the interview, and tell the candidate. The strongest candidates will use it as the rubric they grade themselves against in the trial period.

A note from the other side of the table

I am writing this from the seat of someone who has been the founding engineer at three companies and is currently in conversations with seed-stage AI founders in NYC about the next one. The frame I have given you is the frame I wish every founder I have talked to in the last six months had used before the first call. The good ones already use it. The other ones spend two months on the search, hire someone who does well on a leetcode round, and call me eight weeks later because the build slipped two months and they don't know why. The why is that they hired against the wrong rubric.

If you are a seed-stage NYC AI founder running this search right now and want to test the framework against a candidate, or if you want to talk through whether the role you are about to post is actually a founding-engineer role or a senior-IC role in disguise, I am at tanayshah2024@gmail.com. The answer to that question alone will save you a quarter of runway.