Data Platform Cost Optimization: Reducing Cloud Spend Without Sacrificing Reliability
Ryan Kirsch · December 21, 2025 · 9 min read
Data platform costs scale faster than most engineering teams expect. A data warehouse that costs $5K/month at 50GB becomes $50K/month at 500GB if you have not built optimization habits into the platform from the start. This guide covers the specific levers that move the needle, in order of impact.
Build Cost Visibility Before Optimization
You cannot optimize what you cannot measure. The first step is not to reduce costs -- it is to understand where costs come from. Most data platforms have a small number of queries and pipelines that drive the majority of spend.
In Snowflake, the cost attribution queries that matter. Note that the INFORMATION_SCHEMA.QUERY_HISTORY table function only retains 7 days of history, so a 30-day window needs the ACCOUNT_USAGE view (which lags by roughly 45 minutes):
-- Top queries by cloud services credit consumption (last 30 days)
SELECT
    query_id,
    LEFT(query_text, 100) AS query_preview,
    user_name,
    warehouse_name,
    credits_used_cloud_services, -- cloud services only; warehouse compute is metered per warehouse
    execution_time / 1000 AS execution_seconds,
    bytes_scanned / 1e9 AS gb_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -30, CURRENT_DATE)
  AND execution_status = 'SUCCESS'
  AND credits_used_cloud_services > 0
ORDER BY credits_used_cloud_services DESC
LIMIT 50;
-- Credit consumption by warehouse (last 30 days)
SELECT
    warehouse_name,
    SUM(credits_used) AS total_credits,
    SUM(credits_used) * 3.0 AS estimated_cost_usd -- Adjust for your rate
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_METERING_HISTORY(
    DATE_RANGE_START => DATEADD('day', -30, CURRENT_DATE)
))
GROUP BY warehouse_name
ORDER BY total_credits DESC;

Run this monthly. The top 10 queries by credit consumption will tell you where to focus optimization effort. A single poorly written query that runs every hour on a large warehouse can represent 20-30% of your total bill.
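Turning credit totals into dollar figures and spend shares is simple arithmetic. A sketch in Python (the $3/credit rate and the warehouse names and numbers are made-up placeholders, not from this post -- substitute your contract rate and your real query output):

```python
# Rough monthly cost attribution from warehouse credit totals.
# All figures below are illustrative assumptions.
CREDIT_RATE_USD = 3.00  # check your Snowflake contract for the real rate

warehouse_credits = {
    "etl_large": 1200.0,
    "analytics_medium": 450.0,
    "dev_xs": 80.0,
}

total = sum(warehouse_credits.values())
for name, credits in sorted(warehouse_credits.items(), key=lambda kv: -kv[1]):
    cost = credits * CREDIT_RATE_USD
    share = 100 * credits / total
    print(f"{name:20s} {credits:8.1f} credits  ${cost:10,.2f}  {share:5.1f}% of spend")
```

The share column is usually the most actionable output: it tells you which single warehouse to look at first.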
Warehouse Configuration: The Fastest Wins
Warehouse configuration is usually the highest-impact lever because it affects every query. The two settings that matter most:
Auto-suspend. Snowflake warehouses continue to run (and cost money) until suspended. The default auto-suspend is 10 minutes. For ad hoc query warehouses, set this to 1-2 minutes: at the default, a warehouse that runs one two-minute query per hour spends over 80% of its billed time idle.
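The idle-cost arithmetic is worth making explicit. A rough sketch (warehouse size, query duration, and credit rate are illustrative assumptions; it also ignores Snowflake's 60-second minimum billing each time a warehouse resumes):

```python
# Monthly cost of an XS warehouse (1 credit/hour) running one 2-minute
# query per hour, under different auto-suspend settings.
# All figures are illustrative assumptions.
CREDITS_PER_HOUR = 1.0   # XS warehouse
CREDIT_RATE_USD = 3.00   # assumed contract rate
QUERY_MINUTES = 2.0
HOURS_PER_MONTH = 730

def monthly_cost(auto_suspend_minutes: float) -> float:
    # Each hourly query bills its runtime plus the idle window
    # before auto-suspend kicks in.
    billed_minutes_per_hour = QUERY_MINUTES + auto_suspend_minutes
    credits = CREDITS_PER_HOUR * (billed_minutes_per_hour / 60) * HOURS_PER_MONTH
    return credits * CREDIT_RATE_USD

for suspend in (10, 5, 1):
    print(f"auto-suspend {suspend:2d} min -> ${monthly_cost(suspend):7.2f}/month")
```

Dropping auto-suspend from 10 minutes to 1 minute cuts this workload's bill by roughly 75% without touching a single query.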
-- Set aggressive auto-suspend for dev/ad hoc warehouses
ALTER WAREHOUSE analytics_dev
    SET AUTO_SUSPEND = 60; -- 1 minute

-- ETL warehouses: slightly longer to avoid cold-start overhead
ALTER WAREHOUSE etl_medium
    SET AUTO_SUSPEND = 300; -- 5 minutes

-- Check current settings
SHOW WAREHOUSES;

Resource monitors. Without resource monitors, a runaway query or a misconfigured pipeline can consume unlimited credits. Set credit quotas at the warehouse level:
-- Create a resource monitor for the analytics warehouse
CREATE OR REPLACE RESOURCE MONITOR analytics_monthly
    WITH CREDIT_QUOTA = 500 -- Quota of 500 credits/month
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS
        ON 75 PERCENT DO NOTIFY -- Email at 75%
        ON 90 PERCENT DO NOTIFY -- Email at 90%
        ON 100 PERCENT DO SUSPEND; -- Suspend at 100%

ALTER WAREHOUSE analytics_medium
    SET RESOURCE_MONITOR = analytics_monthly;

Storage Cost Patterns
Storage is cheaper than compute for most cloud warehouses, but it accumulates quietly. The two main storage cost drivers:
Time Travel storage. Snowflake's Time Travel feature retains changed data for a configurable window: the default is 1 day, and Enterprise Edition supports up to 90 days. For large tables with high change rates, long retention windows can significantly multiply effective storage costs. Set appropriate Time Travel windows per table type:
-- High-churn staging tables: short Time Travel
ALTER TABLE staging.raw_events
    SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- Gold tables with business-critical data: full retention
ALTER TABLE analytics.fct_monthly_revenue
    SET DATA_RETENTION_TIME_IN_DAYS = 90;
-- Check storage breakdown by table. INFORMATION_SCHEMA.TABLES does not
-- expose Time Travel or Fail-safe bytes; use the ACCOUNT_USAGE view
-- (~2 hour latency) instead.
SELECT
    table_schema,
    table_name,
    ROUND(active_bytes / 1e9, 2) AS active_gb,
    ROUND(time_travel_bytes / 1e9, 2) AS time_travel_gb,
    ROUND(failsafe_bytes / 1e9, 2) AS fail_safe_gb
FROM snowflake.account_usage.table_storage_metrics
WHERE active_bytes > 1e9 -- Tables over 1 GB
ORDER BY active_bytes DESC
LIMIT 20;

S3/GCS object storage patterns. For platforms using object storage, the most common cost mistake is storing data in uncompressed formats. Parquet with Snappy compression typically reduces storage costs by 60-80% vs. uncompressed CSV, and it reduces scan costs too, since engines read and transfer far less data.
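The compression claim is easy to sanity-check with rough numbers. A sketch (the compression ratio and the $/GB-month price are illustrative assumptions, not measurements):

```python
# Rough storage-cost comparison: uncompressed CSV vs. Snappy Parquet.
# Compression ratio and price per GB-month are illustrative assumptions.
RAW_GB = 5000.0
PARQUET_RATIO = 0.25              # Snappy Parquet often lands at 20-40% of raw CSV size
STORAGE_USD_PER_GB_MONTH = 0.023  # in the ballpark of S3 standard-tier pricing

csv_cost = RAW_GB * STORAGE_USD_PER_GB_MONTH
parquet_cost = RAW_GB * PARQUET_RATIO * STORAGE_USD_PER_GB_MONTH
savings_pct = 100 * (1 - parquet_cost / csv_cost)

print(f"CSV:     ${csv_cost:8.2f}/month")
print(f"Parquet: ${parquet_cost:8.2f}/month  ({savings_pct:.0f}% saved)")
```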
Query-Level Optimizations That Compound
The query patterns that generate disproportionate cost:
Full table scans on large tables. A query that scans a 500GB table with no partition pruning costs roughly 10x more than one that prunes 90% of partitions. One guardrail is to cap how far back most roles can read:
-- A row access policy that limits non-admin queries on large fact
-- tables to recent data, capping how much of the table they can scan
-- (Snowflake Enterprise+)
CREATE OR REPLACE ROW ACCESS POLICY require_date_filter
    AS (order_date DATE) RETURNS BOOLEAN ->
        CURRENT_ROLE() IN ('ADMIN_ROLE') -- Admins bypass
        OR order_date >= DATEADD('year', -2, CURRENT_DATE); -- Others: 2-year limit

ALTER TABLE fct_orders ADD ROW ACCESS POLICY require_date_filter ON (order_date);

Repeated expensive subqueries. A CTE or subquery that references a large table and is used multiple times in a query can re-scan the large table each time. Materialize it to a temp table first.
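To see why materializing a repeated subquery pays off, compare the scan bills. A sketch (the per-TB price is a BigQuery-style proxy -- Snowflake bills warehouse time instead -- and all figures are illustrative):

```python
# Scanning a large table once beats scanning it in each of N references.
# Figures are illustrative assumptions; $/TB-scanned varies by platform.
TABLE_TB = 0.5            # 500 GB fact table
SCAN_USD_PER_TB = 5.00    # assumed per-TB scan price
REFERENCES = 4            # times the expensive CTE is used in the query

inline_cost = REFERENCES * TABLE_TB * SCAN_USD_PER_TB
materialized_cost = 1 * TABLE_TB * SCAN_USD_PER_TB  # scan once into a temp table
print(f"inline CTE: ${inline_cost:.2f} per run")
print(f"temp table: ${materialized_cost:.2f} per run")
```

The ratio is just the reference count: a CTE used four times can cost four large-table scans, while the temp-table version pays for one scan plus cheap reads.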
Over-scheduled pipelines. A dbt pipeline that runs every 15 minutes when analysts only check dashboards twice per day wastes compute. Audit your pipeline schedules against actual consumer behavior. Most analytical pipelines do not need to run more than hourly.
Dev environments using production warehouse sizes. A common pattern: development workloads run on the same Large warehouse as production pipelines. Create separate XS or S warehouses for development and enforce them via role-based warehouse access.
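The over-scheduling point above is easy to quantify. A sketch with an assumed per-run cost (the dollar figure is a placeholder; the run counts follow from the schedules):

```python
# Daily run counts (and monthly compute) for common pipeline schedules.
# Assumes each run costs roughly the same -- true for full-refresh
# pipelines, optimistic for incremental ones.
cost_per_run_usd = 2.50  # illustrative assumption

schedules = {"every 15 min": 96, "hourly": 24, "daily": 1}
for name, runs in schedules.items():
    monthly = runs * cost_per_run_usd * 30
    print(f"{name:13s} {runs:3d} runs/day  ${monthly:8.2f}/month")
```

Under these assumptions, dropping a pipeline from 15-minute to hourly runs cuts its compute by 75% -- and nobody notices if the dashboards are only checked twice a day.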
dbt-Specific Cost Optimizations
dbt project structure decisions have direct cost implications:
Incremental vs. full refresh. A dbt model materialized as a table is fully rebuilt on every execution. For large fact tables (hundreds of millions of rows), converting to incremental materialization can reduce per-run cost by 80-95%. Incremental logic carries its own risks (missed late-arriving data, duplicates), but for large production models those are usually cheaper to manage than paying for a full rebuild on every run.
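A rough model of the savings (row counts and per-row processing cost are assumptions, chosen to land inside the 80-95% range cited above):

```python
# Per-run cost of full refresh vs. incremental for a large fact table.
# All figures are illustrative assumptions.
TOTAL_ROWS = 500_000_000
NEW_ROWS_PER_RUN = 25_000_000   # fresh + late-arriving rows per run
COST_PER_100M_ROWS_USD = 4.00   # assumed processing cost

full_refresh = TOTAL_ROWS / 100e6 * COST_PER_100M_ROWS_USD
incremental = NEW_ROWS_PER_RUN / 100e6 * COST_PER_100M_ROWS_USD
savings_pct = 100 * (1 - incremental / full_refresh)
print(f"full refresh: ${full_refresh:.2f}/run")
print(f"incremental:  ${incremental:.2f}/run  ({savings_pct:.1f}% cheaper)")
```

The savings scale with the ratio of table size to per-run change volume, which is why incremental materialization matters most for the largest, slowest-changing fact tables.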
dbt model tags for selective runs. Running the full dbt project on every schedule is expensive. Use tags to run only the models that need to be refreshed at each interval:
# models/marts/finance/fct_monthly_revenue.yml
models:
  - name: fct_monthly_revenue
    config:
      tags: ["hourly", "finance"] # This model needs hourly refresh
  - name: dim_customers
    config:
      tags: ["daily", "dimensions"] # Only needs daily refresh

# Run only hourly-tagged models every hour:
#   dbt run --select tag:hourly
# Run only daily-tagged models at midnight:
#   dbt run --select tag:daily

View vs. table materialization. Not every dbt model needs to be materialized as a table. Staging models and infrequently-queried intermediate models can be views. They have zero storage cost and are recomputed only when queried. Use tables only when materialization genuinely improves query performance for the consumers.
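The view-vs-table decision can be framed as a break-even on query frequency. A sketch (every cost figure here is a hypothetical placeholder):

```python
# Monthly compute for a model as a view (recomputed per query) vs. a
# table (rebuilt on schedule, cheap to read). Figures are assumptions.
VIEW_QUERY_COST = 0.50    # $/query to recompute the view's logic
TABLE_BUILD_COST = 0.60   # $/run to rebuild the table
TABLE_READ_COST = 0.02    # $/query against the materialized table
BUILDS_PER_MONTH = 30     # daily rebuild

def monthly(queries_per_month: int) -> tuple:
    view = queries_per_month * VIEW_QUERY_COST
    table = BUILDS_PER_MONTH * TABLE_BUILD_COST + queries_per_month * TABLE_READ_COST
    return view, table

for q in (10, 100, 1000):
    view, table = monthly(q)
    winner = "view" if view < table else "table"
    print(f"{q:5d} queries/month: view ${view:7.2f} vs table ${table:7.2f} -> {winner}")
```

Under these assumptions, rarely-queried models are cheaper as views and heavily-queried ones as tables, which matches the rule of thumb above.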
Building a Cost Culture
Technical optimizations only go so far. The larger lever is building cost awareness into the team's culture:
- Tag all warehouse usage by team/project. Snowflake's query tags let you attribute cost to the pipeline or team that generated it. Without attribution, cost reduction conversations have no specific owner.
- Include cost in pipeline reviews. When a new pipeline is proposed, estimate the monthly credit cost alongside the business value. A pipeline that costs $5K/month and enables $50K in business value is clearly justified. One that costs $5K/month for a report that five people view once a week requires harder justification.
- Regular cost retrospectives. Monthly review of top cost drivers, with ownership assigned and improvement targets set. Costs that no one reviews tend to grow.
The goal is not to minimize cost at the expense of reliability -- it is to eliminate cost that produces no business value. A platform that is 30% cheaper and equally reliable is a better platform, not a worse one.
Ryan Kirsch
Senior Data Engineer with experience building production pipelines at scale. Works with dbt, Snowflake, and Dagster, and writes about data engineering patterns from production experience. See his full portfolio.