Getting to Senior Data Engineer: The Skills Interviewers Actually Test

Ryan Kirsch · December 10, 2025 · 9 min read

Every data engineer above junior level lists the same tools on their resume: dbt, Spark, Airflow or Dagster, a cloud warehouse, Python. Tools are not what interviewers test for senior roles. They test how you think, which is harder to fake and harder to learn from documentation.

The Tool Gap Is Not Real

Mid-level and senior data engineers often have surprisingly similar tool portfolios. Both have used dbt and Spark. Both have built pipelines in Airflow or Dagster. Both can write the SQL to build a fact table. The difference is not in what tools they know -- it is in how they reason about problems where those tools are being applied.

Senior engineers have made expensive mistakes and learned from them. They have shipped a pipeline that seemed correct and produced subtly wrong numbers for three weeks before anyone noticed. They have built a “simple” ETL that turned into a production incident at scale. They have inherited a codebase where every table has a different naming convention and no tests.

These experiences produce a specific set of instincts -- patterns of thinking that show up in interviews as architectural judgment, production mindset, and the ability to identify risks that mid-level engineers miss. The good news: you can develop these instincts deliberately, without waiting to make every mistake yourself.

System Design Thinking

Senior data engineering interviews almost always include a system design component. A typical prompt: “Design a pipeline that ingests order events and produces daily revenue reporting.” The mid-level answer describes a series of steps: fetch the data, transform it, load it to the warehouse, run the reports.

The senior answer treats the design as a problem with multiple valid solutions and explicit trade-offs:

  • What are the SLA requirements? If revenue needs to be queryable by 7 AM, the pipeline needs to be scheduled, monitored, and have a recovery path for failures. If it needs to be real-time, the architecture changes entirely.
  • What is the volume and growth trajectory? A pipeline that handles 10K orders per day works very differently from one handling 10M. Design for the scale you will reach, not just the scale you have today.
  • Who are the consumers and what do they trust? Analysts who run ad hoc SQL need different guarantees than a finance system that pulls revenue for board reports.
  • What are the failure modes? Senior engineers design systems that fail detectably, not silently. They plan for late-arriving data, schema changes, partial failures, and the pipeline that runs successfully but produces wrong output.

Practice this by taking a data system you have built and writing down every assumption it makes that could be wrong. Late-arriving events. Duplicate source records. Upstream API downtime. Schema evolution. Each one is a failure mode that a senior engineer would have explicitly handled.
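As a minimal sketch of making two of those assumptions explicit, the Python below sorts a batch of order events into accepted, duplicate, late, and rejected buckets instead of loading everything silently. The OrderEvent type, the triage_events function, and the three-day lateness threshold are hypothetical names and values chosen for illustration; a real pipeline would pick its own identifiers and thresholds.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OrderEvent:
    order_id: str
    event_date: date
    amount_cents: int


def triage_events(events, processing_date, max_lateness_days=3):
    """Sort events into buckets so duplicates and late data are visible, not silent."""
    buckets = {"accepted": [], "duplicate": [], "late": [], "rejected": []}
    seen_order_ids = set()
    for event in events:
        # Assumption: order_id uniquely identifies an order event.
        if event.order_id in seen_order_ids:
            buckets["duplicate"].append(event)
            continue
        seen_order_ids.add(event.order_id)

        age_days = (processing_date - event.event_date).days
        if age_days == 0:
            buckets["accepted"].append(event)
        elif 0 < age_days <= max_lateness_days:
            # Late but recoverable: an earlier day's totals need restating.
            buckets["late"].append(event)
        else:
            # Future-dated or too old: quarantine for a human to review.
            buckets["rejected"].append(event)
    return buckets


batch = [
    OrderEvent("o-1", date(2025, 12, 10), 1250),
    OrderEvent("o-1", date(2025, 12, 10), 1250),  # duplicate source record
    OrderEvent("o-2", date(2025, 12, 8), 4000),   # arrived two days late
    OrderEvent("o-3", date(2025, 11, 1), 900),    # far outside the window
]
buckets = triage_events(batch, processing_date=date(2025, 12, 10))
print({name: len(items) for name, items in buckets.items()})
```

The point of the sketch is that every bucket is a decision someone made on purpose. Whether late events trigger a restatement or a quarantine is a design choice that depends on who consumes the numbers.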

Production Mindset

Mid-level engineers build things that work. Senior engineers build things that work reliably over time, in the hands of other people, on data they did not generate. This distinction shows up in the small decisions that accumulate into either a maintainable platform or a fragile one.

The production mindset shows up as:

Idempotency by default. Every pipeline job should be safe to re-run. If you re-run a load job on the same data, you should get the same result, not duplicated rows. Senior engineers think about this automatically; mid-level engineers think about it after the first production incident.
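One common way to get this property is to replace a whole partition inside a single transaction rather than appending to it. The sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse; the orders table, its columns, and the load_daily_orders function are assumptions made up for the example.

```python
import sqlite3


def load_daily_orders(conn, load_date, rows):
    """Replace the partition for load_date so re-runs do not duplicate rows."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM orders WHERE order_date = ?", (load_date,))
        conn.executemany(
            "INSERT INTO orders (order_id, order_date, amount_cents) VALUES (?, ?, ?)",
            [(order_id, load_date, amount_cents) for order_id, amount_cents in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, order_date TEXT, amount_cents INTEGER)"
)

batch = [("o-1", 1250), ("o-2", 4000)]
load_daily_orders(conn, "2025-12-01", batch)
load_daily_orders(conn, "2025-12-01", batch)  # re-run on the same data

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total_cents = conn.execute("SELECT SUM(amount_cents) FROM orders").fetchone()[0]
print(row_count, total_cents)
```

A plain INSERT in the same position would leave four rows after the second run. Delete-then-insert is only one option; a merge or upsert keyed on order_id achieves the same guarantee when partitions are too large to rewrite.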

Explicit over implicit. A function named process_data(df) that silently drops rows with null customer IDs is implicit. A function named filter_orders_with_valid_customer(df, raise_on_high_drop_rate=True) is explicit. Senior engineers prefer the second pattern even when it is more verbose, because implicit behavior in data pipelines becomes invisible bugs.
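A rough sketch of what the explicit version might look like follows. To stay dependency-free it takes a list of dicts rather than a DataFrame, and the max_drop_rate parameter with its 5% default is an assumption added for illustration, not something the function name implies.

```python
import logging

logger = logging.getLogger(__name__)


def filter_orders_with_valid_customer(
    orders, raise_on_high_drop_rate=True, max_drop_rate=0.05
):
    """Drop orders with no customer_id, and report how many were dropped."""
    kept = [order for order in orders if order.get("customer_id") is not None]
    dropped = len(orders) - len(kept)
    drop_rate = dropped / len(orders) if orders else 0.0
    logger.info("Dropped %d of %d orders with null customer_id", dropped, len(orders))

    # Hypothetical threshold: a few nulls are normal, a flood means upstream broke.
    if raise_on_high_drop_rate and drop_rate > max_drop_rate:
        raise ValueError(
            f"{drop_rate:.0%} of orders have no customer_id "
            f"(limit {max_drop_rate:.0%}); refusing to continue"
        )
    return kept


orders = [
    {"order_id": "o-1", "customer_id": "c-9"},
    {"order_id": "o-2", "customer_id": None},
]
lenient = filter_orders_with_valid_customer(orders, raise_on_high_drop_rate=False)
print(lenient)
```

With the default arguments the same call raises, because half the batch is missing a customer. The caller has to opt out of that check on purpose, which is the behavior the name promises.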

Alerting on the absence of data, not just the presence of errors. A pipeline that runs successfully but processes zero rows because the source API returned an empty response will not fire an error alert. Senior engineers add volume checks: “This pipeline should load at least 1,000 rows. Alert if it loads fewer.”
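A volume check can be as small as the sketch below. LowVolumeError, check_min_row_count, and the daily_orders pipeline name are hypothetical; in practice the raised error would feed whatever alerting the orchestrator already provides.

```python
class LowVolumeError(RuntimeError):
    """Raised when a load finishes without errors but with too few rows."""


def check_min_row_count(rows_loaded, min_expected=1000, pipeline_name="daily_orders"):
    """Fail loudly when a 'successful' run loaded suspiciously little data."""
    if rows_loaded < min_expected:
        raise LowVolumeError(
            f"{pipeline_name} loaded {rows_loaded} rows, "
            f"expected at least {min_expected}"
        )
    return rows_loaded


# An empty API response raises nothing on its own; the check turns it into a failure.
try:
    check_min_row_count(rows_loaded=0)
    alert_message = None
except LowVolumeError as exc:
    alert_message = str(exc)

print(alert_message)
```

A fixed floor is the simplest form. Teams with seasonal volume often compare against a trailing average instead, at the cost of more moving parts.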

Documentation as a production artifact. Every production model should have a description, a documented grain, an owner, and a freshness SLA. Not because documentation is a bureaucratic requirement, but because the absence of it means the next engineer to touch this model has to reverse-engineer the business logic from the SQL.
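One way to keep that standard from eroding is to check it mechanically, for example in CI. The sketch below assumes model metadata is available as a plain dict; the field names in REQUIRED_FIELDS and the fct_orders example are illustrative, not a dbt or Dagster schema.

```python
REQUIRED_FIELDS = ("description", "grain", "owner", "freshness_sla_hours")


def missing_metadata(model_metadata):
    """Return the required documentation fields that are absent or empty."""
    return [
        field
        for field in REQUIRED_FIELDS
        if model_metadata.get(field) in (None, "")
    ]


fct_orders = {
    "description": "One row per completed order, net of refunds.",
    "grain": "order_id",
    "owner": "",  # nobody has claimed this model yet
}
gaps = missing_metadata(fct_orders)
print(gaps)
```

A build step that fails when the returned list is non-empty makes the missing owner and freshness SLA someone's problem before the model ships rather than after.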

The Ability to Say No Intelligently

This one surprises people. Senior engineers are distinguished not just by what they build, but by what they decline to build and how they explain it.

A stakeholder asks for a real-time dashboard that updates every 30 seconds. A mid-level engineer either builds it (expensive, complex, probably overkill) or says “that's not possible” (wrong, and unhelpful). A senior engineer asks: “What decision are you making that requires 30-second updates? If you're monitoring for fraud, here is what that architecture looks like and costs. If you're checking daily revenue before a meeting, a 15-minute refresh is sufficient and costs 1% of the real-time solution. Which problem are we actually solving?”

The same principle applies to technical choices. A team wants to adopt a new streaming technology because a competitor uses it. A mid-level engineer evaluates the technology. A senior engineer evaluates whether the problem the technology solves is a problem the team actually has, before evaluating the technology itself.

Saying no intelligently requires understanding the business need well enough to propose a simpler alternative. That requires asking clarifying questions rather than immediately scoping the request as given. In interviews, this shows up as candidates who ask clarifying questions before jumping to architecture. Interviewers notice.

Cross-Functional Communication

Data engineers sit at an unusual intersection: they need to understand source systems (engineering), business logic (analytics), and infrastructure (platform). Senior engineers communicate fluently in all three directions.

With engineers: clear technical specifications, willingness to read source code when documentation is incomplete, ability to discuss schema design and API contracts without sounding like a data person complaining about software engineers.

With analysts: translating between “the pipeline failed” and “here is what data is missing and here is when it will be available.” Understanding what business questions analysts are trying to answer before building models, not after.

With leadership: translating technical decisions into business impact. “The incremental model strategy will reduce our daily compute cost by 40% and cut pipeline latency from 4 hours to 45 minutes” lands differently than “I refactored the dbt models.”

How to Demonstrate Seniority in Interviews

Given that tools do not differentiate senior from mid-level, the demonstration has to come from how you talk about your work:

Lead with impact, not activity. “I built a dbt project with 50 models” is activity. “I built a dbt project that reduced analyst query time from 45 minutes to 30 seconds for daily reporting, enabling the finance team to complete their monthly close 2 days faster” is impact. Every technical story should have a business consequence.

Discuss the mistakes. Mid-level candidates describe what they built. Senior candidates describe what they built, what went wrong, and how they fixed it. “The incremental model I designed worked well in testing but had a bug where late-arriving data was silently dropped. I caught it when revenue numbers were off by 3%. Here is what I changed and what I added to prevent it from happening again.” This is seniority in three sentences.
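For readers who have not hit this bug, one common remedy is a lookback window: each incremental run re-reads the last few days instead of only rows newer than the previous run. The sketch below is an illustration of that general technique, not the specific fix from the story above; the function names, the three-day window, and the row shape are all assumptions.

```python
from datetime import date, timedelta


def incremental_window_start(last_loaded_date, lookback_days=3):
    """Start the incremental reload a few days early to catch late arrivals."""
    return last_loaded_date - timedelta(days=lookback_days)


def rows_to_reprocess(source_rows, last_loaded_date, lookback_days=3):
    """Select rows whose event_date falls on or after the window start."""
    start = incremental_window_start(last_loaded_date, lookback_days)
    return [row for row in source_rows if row["event_date"] >= start]


source_rows = [
    {"order_id": "o-1", "event_date": date(2025, 12, 1)},
    {"order_id": "o-2", "event_date": date(2025, 12, 8)},  # landed after the Dec 9 run
    {"order_id": "o-3", "event_date": date(2025, 12, 10)},
]
# A strict "event_date > last_loaded_date" filter would skip o-2 forever.
selected = rows_to_reprocess(source_rows, last_loaded_date=date(2025, 12, 9))
print([row["order_id"] for row in selected])
```

Re-reading overlapping days only works if the downstream write is idempotent, for example a merge keyed on order_id. Otherwise the lookback trades dropped rows for duplicated ones.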

Ask questions that demonstrate system thinking. When an interviewer describes a system design problem, ask about scale, latency requirements, consumer types, and failure modes before proposing an architecture. This demonstrates that you do not default to a single solution -- you gather information to choose the right one.

Have opinions about trade-offs. “I prefer Dagster over Airflow for new projects because the asset model gives better observability, though I recognize the team's existing Airflow expertise has real value and I would not push for a migration until we have a clear forcing function.” This is the kind of opinion that distinguishes a senior engineer from someone who lists both tools on a resume.

Ryan Kirsch

Senior Data Engineer with experience building production pipelines at scale. Works with dbt, Snowflake, and Dagster, and writes about data engineering patterns from production experience. See his full portfolio.