Getting started with pre-aggregates

Defining pre-aggregates

Pre-aggregates are defined under the pre_aggregates key in your model configuration. If you’re using Lightdash YAML instead of dbt model YAML, see the Lightdash YAML syntax guide for the surrounding model structure.

dbt v1.9 and earlier
dbt v1.10+ and Fusion
Lightdash YAML

models:
  - name: orders
    meta:
      pre_aggregates:
        - name: orders_daily_by_status
          dimensions:
            - status
          metrics:
            - total_order_amount
            - average_order_size
          filters:
            - order_date: inThePast 52 weeks
          time_dimension: order_date
          granularity: day

models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
              - average_order_size
            filters:
              - order_date: inThePast 52 weeks
            time_dimension: order_date
            granularity: day

type: model
name: orders

pre_aggregates:
  - name: orders_daily_by_status
    dimensions:
      - status
    metrics:
      - total_order_amount
      - average_order_size
    filters:
      - order_date: inThePast 52 weeks
    time_dimension: order_date
    granularity: day

Configuration reference

Property	Required	Description
`name`	Yes	Unique identifier for the pre-aggregate. Must contain only letters, numbers, and underscores.
`dimensions`	Yes	List of dimension names to include. Must contain at least one dimension.
`metrics`	Yes	List of metric names to include. Must contain at least one metric.
`filters`	No	Static filters applied when materializing the pre-aggregate. Matching queries must include an equivalent or narrower filter to use this pre-aggregate.
`time_dimension`	No	A time-based dimension for date grouping. Must be paired with `granularity`.
`granularity`	No	Time granularity for the `time_dimension`. Valid values: `hour`, `day`, `week`, `month`, `quarter`, `year`. Must be paired with `time_dimension`.
`max_rows`	No	Maximum number of rows to store in the materialization. If the aggregation exceeds this limit, the result is truncated. Must be a positive integer.
`refresh`	No	Schedule configuration for automatic re-materialization. See Scheduling refreshes.
`materialization_role`	No	Fixed access context to use when materializing the pre-aggregate. This is useful when your model or joined tables use `required_attributes` or `any_attributes`. See Materialization role.

If you specify time_dimension, you must also specify granularity, and vice versa.

Filtered pre-aggregates

Use filters when you want a pre-aggregate to materialize only a subset of the source data. For example, this pre-aggregate only stores data for the last 52 weeks:

dbt v1.9 and earlier
dbt v1.10+ and Fusion
Lightdash YAML

models:
  - name: orders
    meta:
      pre_aggregates:
        - name: recent_orders_daily
          dimensions:
            - status
          metrics:
            - total_order_amount
            - order_count
          filters:
            - order_date: inThePast 52 weeks
          time_dimension: order_date
          granularity: day

models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          - name: recent_orders_daily
            dimensions:
              - status
            metrics:
              - total_order_amount
              - order_count
            filters:
              - order_date: inThePast 52 weeks
            time_dimension: order_date
            granularity: day

type: model
name: orders

pre_aggregates:
  - name: recent_orders_daily
    dimensions:
      - status
    metrics:
      - total_order_amount
      - order_count
    filters:
      - order_date: inThePast 52 weeks
    time_dimension: order_date
    granularity: day

This is useful when a rolling time window is queried frequently and deserves its own smaller materialization.

How query matching works with filters

Filtered pre-aggregates are only used when the query filters are compatible with the pre-aggregate definition:

A query with the same or narrower filter can use the pre-aggregate
A query without the filter, or with a broader or incompatible filter, falls back to another pre-aggregate or the warehouse

For the example above:

order_date inThePast 12 weeks can use the pre-aggregate
order_date inThePast 52 weeks can use the pre-aggregate
order_date inThePast 104 weeks cannot use the pre-aggregate
no order_date filter: cannot use the pre-aggregate

If a field is only used for filtering, you should still include it in the pre-aggregate’s dimensions list so Lightdash can match and re-aggregate queries correctly.

Multiple pre-aggregates per model

You can define multiple pre-aggregates on the same model, each targeting different query patterns. It is better to have multiple small, focused pre-aggregates rather than a single one containing all metrics and dimensions. Including too many dimensions increases the number of unique combinations, which generates large materialization files — this defeats the purpose of pre-aggregates, since they are meant to be smaller and faster than querying the warehouse directly. For example, you might want a fine-grained daily pre-aggregate for detailed dashboards and a coarser monthly one for summary views:

models:
  - name: orders
    config:
      meta:
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
            metrics:
              - total_order_amount
              - order_count
            time_dimension: order_date
            granularity: day
          - name: orders_monthly_summary
            dimensions:
              - status
            metrics:
              - total_order_amount
            time_dimension: order_date
            granularity: month
            max_rows: 1000000

When a query matches multiple pre-aggregates, Lightdash picks the smallest one.

Scheduling refreshes

By default, pre-aggregates are materialized when your dbt project compiles. You can also schedule automatic refreshes using cron expressions, using your project’s configured timezone (defaults to UTC):

pre_aggregates:
  - name: orders_daily_by_status
    dimensions:
      - status
    metrics:
      - total_order_amount
    time_dimension: order_date
    granularity: day
    refresh:
      cron: "0 6 * * *"  # Every day at 6:00 AM UTC

Materialization triggers

Pre-aggregates can be materialized through four different triggers:

Trigger	When it happens
Compile	Automatically when your dbt project is compiled
Cron	On the schedule you define in `refresh.cron`
Manual	When you trigger a refresh from the Lightdash UI

Row limits

You can set max_rows to cap the size of a materialization. If the aggregation produces more rows than the limit, the result is truncated.

When max_rows is applied, some data is excluded from the materialization. Queries that match the pre-aggregate may return incomplete results. Use this setting carefully and monitor for the “max rows applied” warning in the monitoring UI.

Materialization role

materialization_role is useful when access to the model depends on required_attributes or any_attributes. For example, if a joined table is only available to users with region_access: emea, then materializing a pre-aggregate without a fixed access context could produce different results depending on who triggered the build. Use materialization_role to make materialization run with a stable set of user attributes. This is intended for access control fields such as:

dbt v1.9 and earlier
dbt v1.10+ and Fusion

models:
  - name: orders
    meta:
      joins:
        - join: customers
          sql_on: ${customers.customer_id} = ${orders.customer_id}
      pre_aggregates:
        - name: orders_daily_by_region
          dimensions:
            - customers.region
          metrics:
            - total_order_amount
          time_dimension: order_date
          granularity: day
          materialization_role:
            email: materialize@acme.com
            attributes:
              region_access: emea

models:
  - name: orders
    config:
      meta:
        joins:
          - join: customers
            sql_on: ${customers.customer_id} = ${orders.customer_id}
        pre_aggregates:
          - name: orders_daily_by_region
            dimensions:
              - customers.region
            metrics:
              - total_order_amount
            time_dimension: order_date
            granularity: day
            materialization_role:
              email: materialize@acme.com
              attributes:
                region_access: emea

Complete example

Here’s a full model definition with a pre-aggregate, including joins, scheduling, and row limits:

dbt v1.9 and earlier
dbt v1.10+ and Fusion
Lightdash YAML

models:
  - name: orders
    meta:
      joins:
        - join: customers
          sql_on: ${customers.customer_id} = ${orders.customer_id}
      pre_aggregates:
        - name: orders_daily_by_status
          dimensions:
            - status
            - customers.country
          metrics:
            - total_order_amount
            - average_order_size
          filters:
            - status: completed
          time_dimension: order_date
          granularity: day
          max_rows: 5000000
          refresh:
            cron: "0 6 * * *"
    columns:
      - name: order_date
        meta:
          dimension:
            type: date
      - name: status
        meta:
          dimension:
            type: string
      - name: amount
        meta:
          metrics:
            total_order_amount:
              type: sum
            average_order_size:
              type: average

models:
  - name: orders
    config:
      meta:
        joins:
          - join: customers
            sql_on: ${customers.customer_id} = ${orders.customer_id}
        pre_aggregates:
          - name: orders_daily_by_status
            dimensions:
              - status
              - customers.country
            metrics:
              - total_order_amount
              - average_order_size
            filters:
              - status: completed
            time_dimension: order_date
            granularity: day
            max_rows: 5000000
            refresh:
              cron: "0 6 * * *"
    columns:
      - name: order_date
        config:
          meta:
            dimension:
              type: date
      - name: status
        config:
          meta:
            dimension:
              type: string
      - name: amount
        config:
          meta:
            metrics:
              total_order_amount:
                type: sum
              average_order_size:
                type: average

type: model
name: orders

joins:
  - join: customers
    sql_on: ${customers.customer_id} = ${orders.customer_id}

pre_aggregates:
  - name: orders_daily_by_status
    dimensions:
      - status
      - customers.country
    metrics:
      - total_order_amount
      - average_order_size
    filters:
      - status: completed
    time_dimension: order_date
    granularity: day
    max_rows: 5000000
    refresh:
      cron: "0 6 * * *"

dimensions:
  - name: order_date
    type: date
  - name: status
    type: string

metrics:
  total_order_amount:
    type: sum
    sql: ${TABLE}.amount
  average_order_size:
    type: average
    sql: ${TABLE}.amount

With this pre-aggregate, the following queries would be served from materialized data:

Total order amount by status, grouped by day, week, month, or year
Average order size by status, grouped by month
Total order amount filtered to completed orders
Order amount by customer country, grouped by quarter

These queries would not match and would query the warehouse directly:

Queries including count_distinct metrics
Queries grouped by a dimension not in the pre-aggregate (for example, customer_id)
Queries with hourly granularity (finer than the pre-aggregate’s day)
Queries without status = completed or with a broader status filter
Queries with Parameters, user attributes inside SQL, or sql_filter
Queries with raw SQL table calculations

​Defining pre-aggregates

​Configuration reference

​Filtered pre-aggregates

​How query matching works with filters

​Multiple pre-aggregates per model

​Scheduling refreshes

​Materialization triggers

​Row limits

​Materialization role

​Complete example

Defining pre-aggregates

Configuration reference

Filtered pre-aggregates

How query matching works with filters

Multiple pre-aggregates per model

Scheduling refreshes

Materialization triggers

Row limits

Materialization role

Complete example