If you landed here, you’re probably staring at a calendar invite thinking: “Which interview questions will actually predict job performance, not just sound smart?”
Good news: there’s 100+ years of research on what works (and what doesn’t). The short version:
Use structured questions tied to job requirements.
Score answers with anchored rubrics.
Add work-sample style prompts when possible.
Avoid illegal or biased questions—there’s no upside.
Below you’ll find a practical, copy-paste playbook with ready-to-use question banks, scoring anchors, and a 45-minute script you can run today—plus links to gold-standard research.
Meta-analyses in industrial-organizational psychology show that structured interviews (same questions, same scoring) have far better validity than unstructured chats. They also improve reliability between interviewers.
Classic and updated research finds that combining job-related interviews with work samples and other evidence (e.g., job knowledge or cognitive measures) predicts job performance substantially better than gut feel.
Interviews can be faked (impression management, ingratiation). Structure, good probes, and behavioral specificity reduce the opportunity to game the process.
The “what not to ask” list isn’t just etiquette—it’s law. The EEOC outlines pre-employment inquiries to avoid (age, family status, disability, etc.). Build your question set to be job-related and consistently applied.
Bottom line: great interview questions are job-derived, structured, and scored. Everything below follows that model.
Start with requirements. List 4–6 outcomes or must-have capabilities for the role (e.g., build analytics stack; LTV/CAC modeling; experimentation).
Assign weights. If “LTV modeling” matters 25% and “nice-to-have design skills” 5%, your question mix should mirror that.
Choose question types.
Behavioral (“Tell me about a time…”) for past evidence.
Situational (“What would you do if…?”) for future problem-solving.
Work-sample (case, brief critique, quick whiteboard) for hands-on signal. Research consistently supports work-sample style tasks.
Rate 1–5 using behavioral anchors:
5 — Excellent: Clear situation → action → measurable outcome; independent ownership; sound reasoning; constraints considered; transferable playbook.
4 — Strong: Solid example with outcomes; minor gaps in rigor or metrics.
3 — Mixed: Some relevant actions, but vague metrics or shared credit only.
2 — Weak: Generalities (“we did”), no depth, can’t explain trade-offs.
1 — Concern: Dodges question; blames others; no evidence of competence.
Probing prompts you can reuse:
“What alternatives did you consider?”
“How did you measure success?”
“What would you do differently next time?”
Below are ready-to-use questions. Each block includes behavioral, situational, and an optional work-sample prompt—plus what good looks like and red flags keyed to the rubric.
Adapt the categories to your role. The examples reference common SaaS roles (product, growth, data, ops) but the structure fits any job.
Behavioral: “Tell me about a time you made a decision with incomplete data. How did you de-risk it?”
Situational: “You have two conflicting stakeholder requests and one sprint. How do you decide?”
Work-sample (10 min): Give a simple metric anomaly (e.g., conversion drop) and ask for a first-hour triage plan.
5-level answer: Frames the problem, lists hypotheses, defines decision criteria, runs scrappy tests, quantifies impact, communicates risks.
Red flags: Anecdotes without constraints; “I just knew”; no post-mortem.
Behavioral: “Walk me through a project where you owned the outcome end-to-end.”
Situational: “Your partner team stalls on a dependency. What’s your plan?”
Work-sample: Present a vague goal; ask them to clarify scope, define milestones, and write owners.
5-level answer: Clear plan, proactive comms, measurable milestones, risk tracking, lessons learned.
Red flags: Passive voice; over-indexing on “waiting for others.”
Behavioral: “Describe a tough stakeholder and how you gained alignment.”
Situational: “A leader wants a feature that will hurt metrics. How do you push back?”
Work-sample: Ask for a one-slide update summarizing a project for execs.
5-level answer: Tailors message to audience, uses data + narrative, proposes alternatives.
Red flags: “I escalated immediately”; no empathy; no trade-off framing.
Behavioral: “Tell me about your most useful analysis last year—what changed because of it?”
Situational: “Sign-ups are flat, trials up, paid down. What do you analyze first?”
Work-sample: Give a small CSV excerpt (or dummy table) and ask for the next three queries they’d run.
5-level answer: Hypothesis → data needed → method → decision tied to metrics.
Red flags: Tool-name dropping; perfect-world data; no decision implications.
Behavioral: “Describe an experiment that failed. What did you learn?”
Situational: “You can run one test this month. Which and why?”
Work-sample: Provide a sample A/B test result; ask for interpretation and next step.
5-level answer: Designs testable hypothesis, checks power/guardrails, interprets lift with caveats, ships follow-ups.
Red flags: “We test everything”; misreads statistical significance; p-hacking vibes.
Behavioral (Data): “Walk through an event taxonomy you designed. What changed?”
Behavioral (Product): “How did you translate customer pain into a shipped feature?”
Behavioral (Sales/CS): “Tell me about saving an at-risk account.”
Work-sample: 5-minute critique of a real screen, sales email, or schema.
5-level answer: Shows specific artifacts (queries, PRD, call notes), quantitative results, lessons.
Behavioral: “What’s a belief you changed in the last year? What triggered it?”
Behavioral: “Tell me about teaching someone a skill.”
Behavioral: “When did you say no to your manager—and how?”
5-level answer: Growth mindset, blameless retros, ethical judgment.
Do not ask about age, marital/family status, religion, disability/health, pregnancy, or national origin. The EEOC provides clear guidance and alternatives (e.g., “Are you legally authorized to work in the U.S.?” rather than citizenship details).
Ask this instead:
“Can you perform the essential functions of the job with or without accommodation?”
“Are you able to work our required schedule?”
“Do you have work authorization for this country?”
(Bookmark the EEOC pages and your local/state guidance; train every interviewer.)
00:00–02:00 | Warm-up
Set expectations: “I’ll ask structured questions, take notes, and leave time for your questions.”
02:00–05:00 | Role context
One-minute pitch of the team + the outcomes that matter. Show the rubric.
05:00–30:00 | Five core questions
Pick one from each of the 5 categories most relevant to the role (above). Keep 2–3 probes ready. Score 1–5 after each answer (anchor to the rubric).
30:00–40:00 | Mini work-sample
Give a small prompt (e.g., triage plan; critique; SQL sketch). Score it with the same 1–5 anchors.
40:00–45:00 | Candidate Q&A
Assess questions for signal (e.g., goals, constraints, metrics).
Right after the call:
Record scores + evidence in your ATS/notes.
Capture verbatim examples that justify the score (improves reliability).
Why this works: it’s a structured interview with job-linked questions, a short work sample, and anchored scoring—each step has strong evidence behind it.
Use these verbatim when you need quick, ready-to-use interview questions to ask candidates.
“Tell me about a decision your team resisted. How did you win buy-in?”
“Describe a time you changed a strategy after new data emerged.”
“What trade-off are you most proud of making?”
Probe: “What would have happened if you chose differently?”
Red flag: Appeals to authority only (“my VP said so”).
“Tell me about discovering a customer insight that changed a roadmap.”
“What’s a feature you killed? Why?”
“How do you balance speed vs. quality on deadlines?”
What good looks like: Evidence (calls, logs, tests), prioritization framework, outcome metrics.
“Walk me through your best growth loop or campaign—inputs, outputs, ROAS/LTV.”
“How did you measure incrementality?”
“Which channel would you pause today and why?”
What good looks like: Causal thinking (holdouts, geo tests), funnel math, CAC payback.
“Show me a time you improved data reliability.”
“Which metric fooled you once? How did you fix it?”
“If we had to delete 50% of events, which stay and why?”
What good looks like: Schema literacy, source of truth, trade-offs between cost and fidelity.
“Describe saving an account with low NPS.”
“Teach me your escalation framework.”
“When do you say ‘no’ to a customer?”
What good looks like: Root cause, cost to serve, expansion risk.
“Tell me about the toughest production incident you owned. Post-mortem?”
“How do you decide to refactor vs. ship a workaround?”
“What’s a design decision you reversed after new constraints?”
What good looks like: Reliability mindset, instrumentation, clear IC/lead boundaries.
Ask the same core questions of every candidate; it increases reliability and perceived fairness.
Score in the moment using anchors; memory is noisy.
Use two interviewers when possible; reconcile scores with evidence.
Beware first-impression and similarity bias; structured rubrics help. (See I/O literature on interview validity and faking.)
If you can only add one thing, add one of these:
10-minute artifact critique: Show a real screen, PRD snippet, SQL query, or email. Ask: “What’s good, what’s risky, what would you change first?”
15-minute triage drill: Present a simple metric change and ask for the first three moves.
Work-sample style tasks consistently show strong predictive validity—especially for experienced hires. (psychologie.uni-mannheim.de)
Role outcomes (weights):
Outcome A — 25%
Outcome B — 20%
Outcome C — 20%
Outcome D — 20%
Outcome E — 15%
Questions: [select 5–6 above, mapping to weights]
Scoring (1–5): [paste anchors]
Decision rule:
Move forward if weighted average ≥ 3.6/5 and no 1s on must-haves.
Break ties with the work-sample score; discuss only with evidence (see the scoring sketch below).
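If your scorecard lives in a spreadsheet or ATS export, the decision rule above is easy to compute automatically. Here is a minimal Python sketch, assuming the placeholder outcomes and weights from this template and treating Outcome A and Outcome B as hypothetical must-haves; swap in your own outcome names, weights, and threshold.

```python
# Minimal scoring sketch (placeholder outcomes and weights, not a prescribed tool).
# Implements the decision rule above: move forward if the weighted average
# is >= 3.6/5 and no must-have outcome scored a 1.

WEIGHTS = {           # outcome -> weight (should sum to 1.0)
    "Outcome A": 0.25,
    "Outcome B": 0.20,
    "Outcome C": 0.20,
    "Outcome D": 0.20,
    "Outcome E": 0.15,
}
MUST_HAVES = {"Outcome A", "Outcome B"}  # hypothetical must-have outcomes

def decide(scores: dict[str, int]) -> tuple[float, bool]:
    """scores maps each outcome to its 1-5 rubric score."""
    weighted_avg = sum(WEIGHTS[o] * scores[o] for o in WEIGHTS)
    no_failing_must_haves = all(scores[o] > 1 for o in MUST_HAVES)
    return round(weighted_avg, 2), weighted_avg >= 3.6 and no_failing_must_haves

# Example: one candidate's scores recorded right after the interview.
candidate = {"Outcome A": 4, "Outcome B": 3, "Outcome C": 4, "Outcome D": 3, "Outcome E": 5}
avg, advance = decide(candidate)
print(f"Weighted average: {avg} | advance: {advance}")  # Weighted average: 3.75 | advance: True
```

The same arithmetic works in a spreadsheet: multiply each 1–5 score by its weight, sum the products, and flag any score of 1 on a must-have.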
Are situational or behavioral questions better?
Both have utility. Behavioral shows past evidence; situational reveals reasoning and job knowledge. Use both, but keep them tied to requirements and evaluate with the same rubric for consistency.
Can’t I just have an open conversation?
You can—but research shows unstructured interviews are less predictive and less reliable between interviewers. They’re also easier to fake. Keep your rapport, but follow a structure.
Should I give a big take-home?
Short, bounded exercises beat long unpaid projects. If you use take-homes, time-box them and score with anchors.
What about “culture fit”?
Translate that to values-based behaviors and ask for evidence (e.g., “Tell me about a time you admitted a mistake early.”). Avoid any question that proxies for protected classes.
Schmidt & Hunter. The Validity and Utility of Selection Methods in Personnel Psychology. Psychological Bulletin. Meta-analysis across 85 years of research. (onthewards.org)
Schmidt, Oh, & Shaffer. 100 Years of Research on Selection Methods (working paper/updates). 2016. Expands on validity estimates. (home.ubalt.edu)
OPM. Structured Interviews — practitioner guide with references. (U.S. Office of Personnel Management)
Roth, Bobko, McFarland. A Meta-Analysis of Work Sample Test Validity (2025). Personnel Psychology. (psychologie.uni-mannheim.de)
Levashina & Campion. Interview Faking Behavior Scale (2025). Journal of Applied Psychology. (ResearchGate)
EEOC. What shouldn’t I ask when hiring? (legal guidance + examples). (eeoc.gov)
Great interviews are less art, more system. Decide what matters, ask every candidate the same high-signal questions, and score with evidence. If you want a shortcut, EquaTalent lets you find gaps in candidates' CVs and ask the right questions!