Data Engineering · October 3, 2025 · 8 min read

dbt Macros: The Power Feature Most Engineers Underuse

Most dbt projects use macros for one or two things and leave the rest of the surface area untouched. Here is what you are missing and how to use it.

The first time most engineers encounter dbt macros, it is because they need to handle a date_trunc difference between Snowflake and BigQuery, or because they saw a ref macro and needed to understand how it worked. They solve the immediate problem, file macros under "advanced feature," and move on.

That is understandable. Macros are not the first thing you reach for in dbt, and the documentation, while thorough, does not always make clear when macros are genuinely the right tool versus a premature abstraction. After using dbt in production across multiple organizations and warehouse platforms, my view is this: most projects underuse macros significantly, and the gap between a project that uses them well and one that doesn't is visible in code quality, maintenance burden, and how easily new engineers can get productive.

This post is the guide I wish had existed when I was learning where macros actually fit.

What Macros Actually Are

dbt macros are Jinja2 functions defined in .sql files inside the macros/ directory of your dbt project. When dbt compiles your models, it executes the Jinja2 template engine, which resolves macros into plain SQL before that SQL runs against your warehouse.

The simplest possible macro looks like this:

-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
  ({{ column_name }} / 100.0)::numeric(10,2)
{% endmacro %}

Used in a model:

-- models/marts/fct_orders.sql
select
  order_id,
  {{ cents_to_dollars('amount_cents') }} as amount_dollars,
  {{ cents_to_dollars('tax_cents') }}    as tax_dollars
from {{ ref('stg_orders') }}

That compiles to standard SQL before execution. The macro is invisible to the warehouse. It only exists at compile time inside dbt.

This is the key mental model for macros: they are compile-time code generation tools, not runtime functions. They run in dbt's compilation step, not inside the database. That distinction matters when you are reasoning about what macros can and cannot do.
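To make that concrete, the fct_orders example above compiles to plain SQL along these lines (the fully qualified table name depends on your profile; analytics.staging is a stand-in):

-- target/compiled output (illustrative)
select
  order_id,
  (amount_cents / 100.0)::numeric(10,2) as amount_dollars,
  (tax_cents / 100.0)::numeric(10,2)    as tax_dollars
from analytics.staging.stg_orders

Nothing about the macro survives into this output; the warehouse only ever sees the expanded expression.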

Macros vs. Models: When to Use Each

The confusion between macros and models is common and worth addressing directly. Both live in your dbt project and both produce SQL, but they serve completely different purposes.

A model is a transformation that produces a table or view in your warehouse. It represents a specific data artifact: a staging table, a fact table, an entity. You ref() it from other models. It has lineage. It gets tested.

A macro is reusable code logic that generates SQL fragments. It has no lineage in the DAG. It does not produce a table. It is a function that gets called inside models, tests, and other macros.

Use a model when: you are transforming data into a named, reusable artifact that other models depend on.

Use a macro when: you are writing the same SQL expression or pattern in multiple places and want to centralize it. Good signal cases for reaching for a macro:

  • The same date truncation or type cast appears in a dozen models
  • You need different SQL syntax for Snowflake vs. BigQuery vs. Redshift
  • You want to generate a standard test that dbt's built-in tests do not cover
  • You are building utility logic that gets used as a building block across multiple models
  • You need to generate SQL programmatically based on configuration or metadata

The anti-pattern to watch for: using macros to abstract business logic that should live in a model. If the logic is significant enough to test, document, and expose to analysts, it belongs in a model, not a macro.

Cross-Database Compatibility: The Macro Pattern That Earns Its Keep

If your project runs against multiple warehouse targets, macros are the correct abstraction layer for SQL dialect differences. This comes up more often than people expect: development against DuckDB or SQLite, staging on BigQuery, and production on Snowflake is a real pattern. So is migrating between warehouses, or running the same dbt project for multiple clients on different platforms.

dbt's adapter.dispatch method is built for exactly this. It lets you define a default macro implementation and then override it per adapter:

-- macros/date_spine.sql

-- Default implementation
{% macro date_spine(start_date, end_date) %}
  {{ return(adapter.dispatch('date_spine', 'my_project')(start_date, end_date)) }}
{% endmacro %}

-- BigQuery override
{% macro bigquery__date_spine(start_date, end_date) %}
  generate_date_array({{ start_date }}, {{ end_date }})
{% endmacro %}

-- Snowflake override
{% macro snowflake__date_spine(start_date, end_date) %}
  (
    select dateadd(day, seq4(), {{ start_date }}) as date_day
    from table(generator(rowcount => datediff(day, {{ start_date }}, {{ end_date }}) + 1))
  )
{% endmacro %}

-- DuckDB override (local dev)
{% macro duckdb__date_spine(start_date, end_date) %}
  (
    select ({{ start_date }} + interval (n) day)::date as date_day
    from generate_series(0, datediff('day', {{ start_date }}, {{ end_date }})) as t(n)
  )
{% endmacro %}

This pattern centralizes dialect differences in one place. Models that call date_spine do not know or care which warehouse they are running on. The macro handles it. The same model works in development (DuckDB), CI (BigQuery sandbox), and production (Snowflake) without any changes to the model file.

Common candidates for adapter dispatch: date arithmetic, string aggregation (listagg vs. array_agg vs. string_agg), pivot/unpivot patterns, and any function that has notably different syntax across warehouse platforms.
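A sketch of the string-aggregation case, using the default__ prefix dbt reserves for fallback implementations (the macro name and project name here are illustrative):

-- macros/group_concat.sql
{% macro group_concat(column_name, delimiter=',') %}
  {{ return(adapter.dispatch('group_concat', 'my_project')(column_name, delimiter)) }}
{% endmacro %}

-- Fallback for adapters without a specific override (string_agg is widely supported)
{% macro default__group_concat(column_name, delimiter=',') %}
  string_agg({{ column_name }}, '{{ delimiter }}')
{% endmacro %}

-- Snowflake override
{% macro snowflake__group_concat(column_name, delimiter=',') %}
  listagg({{ column_name }}, '{{ delimiter }}')
{% endmacro %}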

Generic Tests with Macros

dbt's built-in schema tests (not_null, unique, accepted_values, relationships) cover a lot of ground, but production data is full of edge cases that the built-in tests do not reach. Custom generic tests, implemented as macros, fill that gap.

A generic test is a macro defined with a {% test %} block (conventionally placed under macros/tests/ or tests/generic/) that returns a query. If the query returns any rows, the test fails. Here is an example for a test that checks whether a numeric column is within an expected range:

-- macros/tests/test_between.sql
{% test between(model, column_name, min_value, max_value) %}
  select *
  from {{ model }}
  where {{ column_name }} < {{ min_value }}
     or {{ column_name }} > {{ max_value }}
{% endtest %}

Applied in schema.yml:

models:
  - name: fct_orders
    columns:
      - name: discount_pct
        tests:
          - between:
              min_value: 0
              max_value: 100

Generic tests are one of the highest-leverage macro patterns because they convert one-off data quality checks into reusable, configurable assertions. Once you have a test_between, test_not_negative, or test_row_count_between macro in your project, every model gets access to those tests through schema.yml configuration with no additional SQL.

A few generic tests worth building early in any project: row count threshold tests (flag if a model's row count drops by more than X percent), referential integrity checks against external systems, null proportion tests (fail if more than X percent of a column is null), and expression tests that validate complex business rules inline.
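As a sketch, the null-proportion test from that list might look like this (the macro name and the five percent default are arbitrary choices):

-- macros/tests/test_null_proportion_below.sql
{% test null_proportion_below(model, column_name, max_proportion=0.05) %}
  select *
  from (
    select
      count(*)                            as total_rows,
      count(*) - count({{ column_name }}) as null_rows
    from {{ model }}
  ) as stats
  -- fail if the null share exceeds the configured threshold
  where null_rows > total_rows * {{ max_proportion }}
{% endtest %}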

Utility Macros: Reducing Repetition at Scale

Some of the most practical macros are the unglamorous utility ones: they do not enable new capabilities, they just eliminate copy-paste and standardize patterns across a codebase.

A clean example is a macro that generates the standard set of audit columns for every model:

-- macros/generate_audit_columns.sql
{% macro generate_audit_columns() %}
  current_timestamp()                 as dbt_created_at,
  '{{ invocation_id }}'               as dbt_invocation_id,
  '{{ this.identifier }}'             as dbt_model_name
{% endmacro %}

Used at the end of any model:

select
  order_id,
  customer_id,
  amount_cents,
  {{ generate_audit_columns() }}
from {{ ref('stg_orders') }}

The benefit compounds as the codebase grows. If the audit column pattern changes (adding a new column, renaming something), one macro update propagates to every model. Without the macro, you are making the same change in dozens of files and hoping nothing is missed.

Other utility macros worth building: a standard SCD Type 2 hash key generator, a macro that generates surrogate keys consistently (dbt_utils.generate_surrogate_key is the usual starting point), column-level comment formatters, and schema-aware SELECT * with explicit column exclusions (dbt_utils.star covers this case).
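As an illustration of the surrogate key case, dbt_utils.generate_surrogate_key hashes a list of columns into a single key (the model and column names here are hypothetical; requires the dbt-utils package):

-- models/marts/fct_order_lines.sql
select
  {{ dbt_utils.generate_surrogate_key(['order_id', 'line_number']) }} as order_line_key,
  order_id,
  line_number,
  amount_cents
from {{ ref('stg_order_lines') }}

Because every model calls the same macro, the hashing algorithm and column separator stay consistent across the project.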

Dynamic SQL Generation with Macros

One of the less obvious macro capabilities is generating SQL programmatically based on metadata. This is useful when you have a pattern that repeats across multiple similar models with different parameters.

A practical example: you have twenty event tables with the same structure but different event types. Instead of twenty near-identical staging models, a macro can generate them:

-- macros/stage_event.sql
{% macro stage_event(source_name, table_name, event_type) %}
  select
    id                                    as event_id,
    user_id,
    occurred_at,
    properties,
    '{{ event_type }}'                    as event_type,
    _fivetran_synced                      as _loaded_at
  from {{ source(source_name, table_name) }}
  where _fivetran_deleted is not true  -- keeps rows where the flag is false or null
{% endmacro %}

Each staging model for an event type becomes a one-liner:

-- models/staging/stg_page_viewed.sql
{{ stage_event('segment', 'pages', 'page_viewed') }}

This pattern is powerful but carries a tradeoff: the logic is less visible. A new engineer reading stg_page_viewed.sql needs to track down the macro to understand what the model actually does. Use dynamic generation when the repetition is high and the pattern is genuinely stable. Do not use it to be clever when a straightforward model would be clearer.

Common Mistakes and How to Avoid Them

Over-abstracting with macros. Every abstraction has a cost in readability. A macro that wraps a simple case statement or a two-line coalesce expression is adding indirection without meaningful benefit. Write the SQL directly in the model unless the repetition is real and the abstraction genuinely simplifies the codebase.

Putting business logic in macros instead of models. Business logic belongs in tested, documented models. Macros are infrastructure. If a calculation defines a KPI or encodes a business rule, it should live in a model where it can be documented in schema.yml, tested explicitly, and tracked in lineage.

Missing the macros/ directory structure. As your macro library grows, a flat macros/ directory becomes hard to navigate. Organize macros into subdirectories by category: macros/tests/, macros/utils/, macros/adapters/, macros/staging/. dbt picks them all up regardless of subdirectory structure.
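A layout along these lines works well (the file names are the examples from this post):

macros/
  adapters/
    date_spine.sql
  tests/
    test_between.sql
  utils/
    cents_to_dollars.sql
    generate_audit_columns.sql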

Not documenting macro arguments. dbt supports documenting macros in schema.yml with argument descriptions. Do this for any macro used widely across the project. The friction of reading an undocumented macro is non-trivial for engineers joining the team later.
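For example, the cents_to_dollars macro from earlier can be documented in any .yml file under macros/ (the file name here is arbitrary):

# macros/macros.yml
macros:
  - name: cents_to_dollars
    description: Converts an integer cents column to a numeric(10,2) dollar amount.
    arguments:
      - name: column_name
        type: string
        description: Name of the column holding an amount in cents.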

Reinventing what dbt-utils already provides. The dbt-utils package includes a large and well-maintained macro library covering surrogate keys, date spines, pivot/unpivot, null handling, and more. Check whether a macro you need already exists there before building your own.

A Practical Starting Point for Your Project

If you are building out a new dbt project or improving an existing one, here is a practical order for introducing macros:

First, install dbt-utils and dbt-expectations. These two packages cover a substantial portion of the macro functionality most projects need and are better maintained than custom implementations.

Second, identify the SQL patterns that appear in three or more models and centralize them. Three is a reasonable threshold: one occurrence is a coincidence, two is a pattern, three is a library candidate.

Third, build generic tests for data quality assertions that your team makes repeatedly in ad hoc queries. If you are regularly checking that revenue columns are non-negative or that event timestamps are not in the future, those checks belong in schema.yml, not in your BI tool.

Fourth, if you are running against multiple warehouse targets, audit your SQL for dialect-specific functions and build adapter dispatch macros for the ones that vary.

The macro library you build is infrastructure for your project. Like any infrastructure, it pays back proportionally to how much the project grows. A project with ten models and two engineers does not need an extensive macro library. A project with a hundred models and a team of analysts writing SQL directly against the warehouse most certainly does.
