Materialized View Architecture & Syntax

Standard PostgreSQL materialized views recompute their entire result set on every REFRESH MATERIALIZED VIEW, which is untenable when the source is a billion-row hypertable growing by tens of thousands of rows per second. The single engineering problem this guide solves is how to declare a materialized rollup that TimescaleDB maintains incrementally — processing only the buckets touched by new or late-arriving telemetry — while still serving queries that blend the materialized history with the un-materialized tail. Getting the CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous) syntax, the watermark it establishes, and the storage layout underneath it correct is the foundation every refresh policy and retention job in this section builds on. This page is written for the time-series data engineers, IoT platform developers, and Python automation builders who own that definition and have to keep it correct as schemas evolve.

The rest of this guide follows the query path: a query hits a real-time view, which unions rows already persisted in the materialized hypertable with a live aggregation of the raw hypertable tail (everything newer than the materialization watermark) and merges them into one result. Understanding where that boundary sits is the difference between a correct rollup and one that silently double-counts or drops the most recent bucket.

Prerequisites

This guide assumes you are on a TimescaleDB instance where the source table is already a hypertable and the aggregate you want to materialize is a simple GROUP BY time_bucket(...) query. Before running any DDL below, confirm the following:

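TimescaleDB 2.10 or later on PostgreSQL 14+ (SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';). Defaults shift between releases: the finalized aggregate format has been the default since 2.7, while real-time aggregation is on by default only before 2.13, so confirm the exact version before relying on either.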
The source table is a hypertable partitioned by time — see time-based chunk partitioning strategies if it is still a plain table.
The background worker scheduler is running: SHOW timescaledb.max_background_workers; returns at least 8 (each continuous aggregate policy consumes one worker slot during a run).
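Your aggregation uses only parallelizable, GROUP BY-compatible functions (avg, sum, count, min, max, stats_agg). Window functions are not permitted in the view definition, and support for DISTINCT and ordered-set aggregates depends on the TimescaleDB version and aggregate format, so check the release notes before using them.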
A connection role with CREATE on the target schema and ownership of the source hypertable.
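The version prerequisite above is easy to automate once the extversion string has been fetched. The sketch below is plain Python with no database access; parse_extversion and meets_minimum are hypothetical helper names, and the (2, 10) minimum simply mirrors the requirement stated in this guide.

```python
import re

MINIMUM = (2, 10)  # minimum TimescaleDB version this guide assumes


def parse_extversion(extversion: str) -> tuple[int, int, int]:
    """Parse a string such as '2.14.2' or '2.15.0-dev' into (major, minor, patch)."""
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", extversion.strip())
    if match is None:
        raise ValueError(f"unrecognised extversion: {extversion!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_minimum(extversion: str, minimum: tuple[int, int] = MINIMUM) -> bool:
    """True when the installed version is at or above the guide's minimum."""
    return parse_extversion(extversion)[:2] >= minimum


# The value would normally come from the extversion query in the prerequisites.
print(meets_minimum("2.14.2"))
```

Comparing tuples rather than strings avoids the classic mistake of treating "2.9" as newer than "2.10".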

The materialization layer relies on TimescaleDB's internal schemas: _timescaledb_internal holds the materialized hypertable and its chunks, and _timescaledb_catalog tracks watermarks and invalidation ranges. You never write to those schemas directly, but their objects appear in disk-usage accounting, so budget storage for them. Duplicate timestamps are normal in time-series data: hypertables do not enforce a unique constraint on the time column, and you should not rely on one; plan deduplication at the query or ingestion layer instead.

Step-by-step implementation

The following steps follow the data path described above: define the materialized hypertable, seed it, wire the incremental refresh that advances the watermark, then confirm the real-time view merges the tail.

Step 1 — Declare the continuous aggregate

The WITH (timescaledb.continuous) storage parameter is what turns an ordinary materialized view into a continuously maintained one. It instructs TimescaleDB to build a hidden materialized hypertable and to track a watermark rather than snapshotting the source.

sql

CREATE MATERIALIZED VIEW IF NOT EXISTS sensor_hourly_metrics
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    device_id,
    avg(temperature)  AS avg_temp,
    max(humidity)     AS max_humidity,
    count(*)          AS reading_count
FROM sensor_readings
GROUP BY bucket, device_id
WITH NO DATA;

WITH NO DATA is strongly recommended in production: it registers the view and its watermark without running a blocking initial scan over the whole hypertable, so ingestion is never stalled by view creation. You backfill the history explicitly in Step 2, on your own schedule.

The time_bucket call is mandatory and must appear in the GROUP BY clause: it defines the fixed, non-overlapping intervals the watermark advances across. Do not place time_bucket_gapfill inside the aggregate definition; the incremental engine cannot track synthetic gap-filled rows, so gap-filling belongs at query time only.
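The rules above (a time_bucket call is present, the query groups its rows, and time_bucket_gapfill stays out of the definition) can be checked before any DDL is sent. The sketch below is a plain-Python lint over the SELECT text; lint_cagg_definition is a hypothetical helper, and its string matching is deliberately naive, so treat it as a pre-flight check rather than a SQL parser.

```python
import re


def lint_cagg_definition(select_sql: str) -> list[str]:
    """Return a list of problems found in a continuous aggregate SELECT body."""
    problems = []
    lowered = select_sql.lower()
    if "time_bucket_gapfill" in lowered:
        problems.append("time_bucket_gapfill belongs at query time, not in the definition")
    # Match a plain time_bucket( call; time_bucket_gapfill( does not match
    # because the opening parenthesis must follow the name directly.
    if not re.search(r"\btime_bucket\s*\(", lowered):
        problems.append("no time_bucket(...) call found")
    if "group by" not in lowered:
        problems.append("no GROUP BY clause found")
    return problems


good_sql = """
SELECT time_bucket('1 hour', time) AS bucket, device_id, avg(temperature)
FROM sensor_readings
GROUP BY bucket, device_id
"""
print(lint_cagg_definition(good_sql))
```

A check like this fits naturally in the same CI step that applies the view definition, so a gap-filled definition fails review instead of failing at CREATE time.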

Step 2 — Seed the history with a bounded refresh

With the view created empty, materialize historical buckets by calling refresh_continuous_aggregate over an explicit, bounded window. Bounding the window keeps the seed operation predictable in memory and lock footprint instead of scanning to the epoch.

sql

-- Backfill one quarter of history; run in chunks for very large ranges.
CALL refresh_continuous_aggregate(
    'sensor_hourly_metrics',
    now() - INTERVAL '90 days',   -- window_start
    now() - INTERVAL '1 hour'     -- window_end (leave the in-flight bucket open)
);

refresh_continuous_aggregate is a CALL-only procedure (it manages its own transactions) and cannot run inside a BEGIN ... COMMIT block or a function body. Passing NULL for window_start refreshes from the earliest data — useful for a full backfill, costly on a large hypertable, so prefer explicit bounds and iterate.
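Iterating over explicit bounds, as suggested above, comes down to slicing the backfill range into contiguous windows and issuing one CALL per window. The sketch below only generates the windows; backfill_windows is a hypothetical helper, the 30-day step is an arbitrary illustration, and the step is assumed to be a whole multiple of the bucket width so no bucket straddles two windows.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def backfill_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[tuple[datetime, datetime]]:
    """Yield contiguous (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


end = datetime(2024, 4, 1, tzinfo=timezone.utc)
windows = list(backfill_windows(end - timedelta(days=90), end, timedelta(days=30)))
for window_start, window_end in windows:
    # Each pair would be passed to one CALL refresh_continuous_aggregate(...)
    # on an autocommit connection; here the pairs are only printed.
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Running the windows oldest-first keeps each refresh small and lets a failed run resume from the last completed window.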

Step 3 — Advance the watermark automatically

Manual seeding covers the past; ongoing freshness comes from a refresh policy that advances the materialization watermark on a schedule. The policy defines the window relative to now(): each run considers everything between now() - start_offset and now() - end_offset, and recomputes only the buckets in that range that hold new or invalidated data.

sql

SELECT add_continuous_aggregate_policy(
    'sensor_hourly_metrics',
    start_offset      => INTERVAL '3 hours',   -- reach back to absorb late data
    end_offset        => INTERVAL '1 hour',    -- leave the newest bucket filling
    schedule_interval => INTERVAL '30 minutes',
    if_not_exists     => true
);

The policy registers a background job; it does not itself refresh. A finite start_offset bounds the per-run window so late data within that reach is absorbed without rescanning all history, and a positive end_offset leaves the most recent, still-filling bucket un-materialized so the real-time view serves it live. Choosing these offsets against your ingestion latency is the subject of refresh policy design and scheduling, and whether to lean on incremental runs or periodic full recomputes is covered in incremental vs full refresh strategies.
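To see what the offsets mean for a concrete run, the sketch below computes the window a run considers and the hourly buckets that fall inside it. It is plain Python, not TimescaleDB code: policy_window and complete_buckets are hypothetical helpers, buckets are assumed to be aligned to the Unix epoch in UTC, and the rule that only buckets lying entirely inside the window are materialized is an assumption to confirm against the documentation for your version.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def policy_window(now, start_offset, end_offset):
    """Range a policy run considers at `now`: now - start_offset to now - end_offset."""
    return now - start_offset, now - end_offset


def complete_buckets(window_start, window_end, bucket):
    """Start times of the buckets that lie entirely inside the window."""
    first_index = -((EPOCH - window_start) // bucket)  # ceiling division
    last_index = (window_end - EPOCH) // bucket        # floor division
    return [EPOCH + i * bucket for i in range(first_index, last_index)]


# A run at 12:30 UTC with the offsets used in this guide (3 hours / 1 hour).
now = datetime(2024, 4, 1, 12, 30, tzinfo=timezone.utc)
window_start, window_end = policy_window(now, timedelta(hours=3), timedelta(hours=1))
buckets = complete_buckets(window_start, window_end, timedelta(hours=1))
print(window_start.time(), window_end.time(), [b.time() for b in buckets])
```

Under these assumptions a run at half past the hour covers fewer whole buckets than a run on the hour, which is one reason to leave some slack in start_offset.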

Step 4 — Confirm real-time aggregation merges the tail

Whether real-time aggregation is on by default depends on the version: before TimescaleDB 2.13, continuous aggregates are created with materialized_only = false, meaning the view is a real-time view that unions the materialized hypertable with a live aggregation of raw rows newer than the watermark; from 2.13 onward the default is materialized_only = true. Verify, and control, that behaviour explicitly:

sql

-- Keep real-time merging on (serves the freshest bucket instantly):
ALTER MATERIALIZED VIEW sensor_hourly_metrics
    SET (timescaledb.materialized_only = false);

-- Or turn it off to serve ONLY materialized rows (lower, flatter query cost):
ALTER MATERIALIZED VIEW sensor_hourly_metrics
    SET (timescaledb.materialized_only = true);

Set materialized_only = true when downstream consumers must see a stable, fully-materialized surface (for example a nightly report) and can tolerate lag of up to end_offset plus one schedule_interval. Leave it false for live dashboards that need the current bucket without waiting for the next scheduled run.
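The difference between the two settings can be modelled in a few lines. The sketch below is a toy simulation, not TimescaleDB's implementation: realtime_counts is a hypothetical function that returns per-hour reading counts, taking materialized buckets below the watermark and, when materialized_only is false, aggregating raw rows at or after the watermark.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def realtime_counts(materialized, raw_rows, watermark, materialized_only=False):
    """Merge materialized counts with a live count of the raw tail."""
    result = {b: c for b, c in materialized.items() if b < watermark}
    if materialized_only:
        return result
    tail = defaultdict(int)
    for ts in raw_rows:
        if ts >= watermark:  # rows older than the watermark are already materialized
            tail[hour_bucket(ts)] += 1
    result.update(tail)
    return result


def utc(hour, minute=0):
    return datetime(2024, 4, 1, hour, minute, tzinfo=timezone.utc)


materialized = {utc(9): 4, utc(10): 5}
raw_rows = [utc(10, 15), utc(11, 5), utc(11, 40), utc(12, 10)]
live = realtime_counts(materialized, raw_rows, watermark=utc(11))
stable = realtime_counts(materialized, raw_rows, watermark=utc(11), materialized_only=True)
print(len(live), len(stable))
```

Because the watermark sits on a bucket boundary, the two halves never contribute to the same bucket, which is what prevents double-counting in this model.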

Step 5 — Orchestrate the lifecycle from Python

DevOps and automation teams typically manage aggregate definitions through infrastructure-as-code. The idempotent helper below uses psycopg v3 to assert that a continuous aggregate exists, carries a refresh policy, and has retention aligned — safe to run on every CI/CD deploy.

python

import psycopg
from psycopg.rows import dict_row


def ensure_cagg_health(
    conn: psycopg.Connection,
    view_name: str,
    schedule_interval: str = "30 minutes",
    retention_window: str = "90 days",
) -> dict:
    """Assert a continuous aggregate exists, has a refresh policy, and has
    retention aligned. Idempotent — designed for repeated CI/CD execution."""
    with conn.cursor(row_factory=dict_row) as cur:
        # 1. Verify the aggregate is registered as continuous.
        cur.execute(
            """
            SELECT view_name, materialized_only, finalized
            FROM timescaledb_information.continuous_aggregates
            WHERE view_name = %s;
            """,
            (view_name,),
        )
        meta = cur.fetchone()
        if not meta:
            raise RuntimeError(f"Continuous aggregate '{view_name}' not found.")

        # 2. Register a refresh policy (no-op if one already exists).
        cur.execute(
            """
            SELECT add_continuous_aggregate_policy(
                %s,
                start_offset      => INTERVAL '3 hours',
                end_offset        => INTERVAL '1 hour',
                schedule_interval => %s::interval,
                if_not_exists     => true
            );
            """,
            (view_name, schedule_interval),
        )

        # 3. Align retention on the aggregate with the materialization horizon.
        cur.execute(
            """
            SELECT add_retention_policy(
                %s,
                drop_after    => %s::interval,
                if_not_exists => true
            );
            """,
            (view_name, retention_window),
        )

    conn.commit()
    return {"status": "healthy", "view": view_name, "materialized_only": meta["materialized_only"]}

This pattern drops cleanly into Kubernetes CronJobs or Airflow DAGs. For connection pooling and async execution under high concurrency, consult the official psycopg documentation and PostgreSQL’s background worker architecture guidelines.

Configuration parameters reference

These are the parameters that govern the definition and refresh of a continuous aggregate. Offsets and intervals are the levers you tune per workload.

Parameter	Type	Recommended value	Effect
`timescaledb.continuous`	storage flag	required	Marks the view for incremental materialization and watermark tracking.
`timescaledb.materialized_only`	boolean	`false` for live, `true` for batch	Whether queries merge the raw tail (real-time) or read materialized rows only.
`timescaledb.finalized`	boolean	`true` (default since 2.7)	Stores finalized aggregates, not partials; enables `JOIN`s and simpler storage; cannot be changed after creation.
`start_offset`	interval	2-4× ingestion+network latency	How far back each refresh run reaches to absorb late data.
`end_offset`	interval	≥ 1 bucket width	Excludes the newest, still-filling bucket from materialization.
`schedule_interval`	interval	0.5-1× bucket width	How often the background worker runs the policy.
`WITH NO DATA`	clause	always in production	Creates the view without a blocking initial scan; backfill separately.

A safe invariant: start_offset > end_offset ≥ one bucket, with the gap between the two offsets spanning at least two buckets, and schedule_interval no larger than the freshness SLA you promise dashboards.
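The invariant above is simple enough to assert in a deploy script before a policy is registered. The sketch below is one way to express it in plain Python; check_policy and the freshness_sla argument are hypothetical names, not part of any TimescaleDB API.

```python
from datetime import timedelta


def check_policy(bucket, start_offset, end_offset, schedule_interval, freshness_sla):
    """Return human-readable violations of the offset invariant (empty if none)."""
    problems = []
    if end_offset < bucket:
        problems.append("end_offset is smaller than one bucket")
    if start_offset <= end_offset:
        problems.append("start_offset must be greater than end_offset")
    if schedule_interval > freshness_sla:
        problems.append("schedule_interval exceeds the freshness SLA")
    return problems


# The values used throughout this guide, checked against a one-hour SLA.
guide_values = check_policy(
    bucket=timedelta(hours=1),
    start_offset=timedelta(hours=3),
    end_offset=timedelta(hours=1),
    schedule_interval=timedelta(minutes=30),
    freshness_sla=timedelta(hours=1),
)
print(guide_values)
```

Returning a list rather than raising on the first failure lets a CI job report every misconfigured offset in one pass.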

Integration with adjacent features

A continuous aggregate never stands alone in a production pipeline. Its materialized hypertable is itself chunked by time, with a default chunk interval derived from the source table's rather than copied from it, so the same chunk_interval sizing discipline applies: oversized chunks on the aggregate waste memory during refresh, undersized ones inflate catalog overhead. Once buckets age past active querying, layer columnar compression models on the materialized hypertable to shrink it, and attach a TTL retention policy so old rollups drop on the same cadence as the raw data they summarize.

When refresh runs contend with heavy ingestion, coordination moves up a level: asynchronous execution and queue management governs how overlapping jobs are serialized across the worker pool, and error handling and retry mechanisms determine what happens when a refresh fails mid-window. The definition you write here sets the boundaries those systems operate within.

Performance validation

After creating the view and its policy, verify the machinery is actually advancing. Query the TimescaleDB information views rather than trusting the DDL succeeded.

Confirm the aggregate is registered and inspect its real-time setting:

sql

SELECT view_name, materialized_only, finalized, compression_enabled
FROM timescaledb_information.continuous_aggregates
WHERE view_name = 'sensor_hourly_metrics';

Check that the refresh job is scheduled, when it last ran, and whether it succeeded:

sql

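-- Refresh jobs are registered against the materialized hypertable, not the
-- view name, so resolve the view through continuous_aggregates.
SELECT j.job_id,
       j.schedule_interval,
       s.last_run_started_at,
       s.last_successful_finish,
       s.total_runs,
       s.total_failures
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats s USING (job_id)
JOIN timescaledb_information.continuous_aggregates c
  ON c.materialization_hypertable_schema = j.hypertable_schema
 AND c.materialization_hypertable_name   = j.hypertable_name
WHERE j.proc_name = 'policy_refresh_continuous_aggregate'
  AND c.view_name = 'sensor_hourly_metrics';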

Read the materialization watermark directly to measure refresh lag — the gap between now() and the watermark should stay near end_offset:

sql

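-- Newer releases expose cagg_watermark under _timescaledb_functions instead of
-- _timescaledb_internal; adjust the schema if this call is not found.
SELECT to_timestamp(
         _timescaledb_internal.cagg_watermark(mat_hypertable_id) / 1e6
       ) AS watermark,
       now() - to_timestamp(
         _timescaledb_internal.cagg_watermark(mat_hypertable_id) / 1e6
       ) AS refresh_lag
FROM _timescaledb_catalog.continuous_agg
WHERE user_view_name = 'sensor_hourly_metrics';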

If refresh_lag grows well beyond end_offset + schedule_interval, the policy is falling behind and you should investigate worker contention before dashboards start showing stale numbers.
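The lag rule above translates directly into a monitoring check. The sketch below assumes the watermark arrives as microseconds since the Unix epoch, matching the division by 1e6 in the query; watermark_to_datetime, is_falling_behind, and the slack parameter are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def watermark_to_datetime(watermark_us: int) -> datetime:
    """Convert a microsecond watermark to an aware UTC datetime without float loss."""
    return EPOCH + timedelta(microseconds=watermark_us)


def is_falling_behind(watermark_us, now, end_offset, schedule_interval,
                      slack=timedelta(0)) -> bool:
    """True when refresh lag exceeds end_offset + schedule_interval (+ slack)."""
    lag = now - watermark_to_datetime(watermark_us)
    return lag > end_offset + schedule_interval + slack


now = datetime(2024, 4, 1, 12, 30, tzinfo=timezone.utc)
healthy_us = (datetime(2024, 4, 1, 11, 0, tzinfo=timezone.utc) - EPOCH) // timedelta(microseconds=1)
stale_us = (datetime(2024, 4, 1, 8, 0, tzinfo=timezone.utc) - EPOCH) // timedelta(microseconds=1)

print(is_falling_behind(healthy_us, now, timedelta(hours=1), timedelta(minutes=30)))
print(is_falling_behind(stale_us, now, timedelta(hours=1), timedelta(minutes=30)))
```

The slack argument leaves room to define "well beyond" for your own alerting, for example one extra schedule_interval before paging anyone.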

Troubleshooting

ERROR: cannot create a continuous aggregate with a data-modifying common table expression (or on window functions / DISTINCT)

The view definition uses a construct the incremental engine cannot maintain. Reduce the query to a plain time_bucket + GROUP BY with parallelizable aggregates; move window functions and de-duplication to a view layered on top of the aggregate.

ERROR: refresh_continuous_aggregate() cannot run inside a transaction block

refresh_continuous_aggregate is a procedure that manages its own transactions. Invoke it with CALL, outside any BEGIN ... COMMIT, and never from inside a PL/pgSQL function body. In psycopg, run it on a connection with autocommit enabled.

The newest bucket is missing from query results

Expected when materialized_only = true and the bucket is newer than the watermark. Either set materialized_only = false to serve it live from the raw tail, or shrink end_offset so the policy materializes it sooner.

Aggregate values look doubled or inflated

This is caused by duplicate source rows, not an aggregate bug: hypertables permit duplicate timestamps. De-duplicate at ingestion with ON CONFLICT DO NOTHING, or read through a SELECT DISTINCT ON (device_id, time) layer before aggregating.
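The effect of duplicate rows on a count, and of keying on (device_id, time) to remove them, can be reproduced in plain Python. The sketch below is an illustration only: dedupe_readings is a hypothetical helper that keeps the first row seen per key, which is a simplification of what DISTINCT ON does with an explicit ORDER BY.

```python
def dedupe_readings(rows):
    """Keep the first row seen for each (device_id, time) key."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["device_id"], row["time"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


readings = [
    {"device_id": "a", "time": "2024-04-01T10:00:00Z", "temperature": 21.0},
    {"device_id": "a", "time": "2024-04-01T10:00:00Z", "temperature": 21.0},  # retry duplicate
    {"device_id": "a", "time": "2024-04-01T10:05:00Z", "temperature": 21.4},
]
unique_readings = dedupe_readings(readings)
print(len(readings), len(unique_readings))
```

The inflated figure is the raw length and the correct one is the de-duplicated length; the same gap appears in reading_count when duplicates reach the hypertable.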

ERROR: continuous aggregate policy already exists

A policy is already registered for this view. Add if_not_exists => true to make the call idempotent, or drop the existing policy first with remove_continuous_aggregate_policy('sensor_hourly_metrics').

Frequently Asked Questions

Can I add or drop columns in an existing continuous aggregate?

Not with ALTER. The aggregate’s SELECT list is fixed at creation. To change which columns are materialized, create a new view with the desired definition, backfill it with refresh_continuous_aggregate, repoint consumers, then drop the old one. Plan this as a migration, not an in-place edit.

What is the difference between finalized and non-finalized aggregates?

finalized = true (the default since 2.7) stores the completed aggregate values, so the view supports JOINs and has a simpler on-disk layout. Older non-finalized aggregates store partial-aggregate state that must be finalized at query time. The setting is chosen at creation and cannot be toggled in place; if you are on a legacy definition, migrate by recreating the view.

How much storage does the materialized hypertable consume?

Roughly proportional to the number of buckets times the number of GROUP BY groups (for example distinct device_ids) times the row width of your aggregates. A one-hour bucket over 10,000 devices is 240,000 rows/day before compression — far smaller than the raw stream, but budget the _timescaledb_internal partials and invalidation log on top.
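The row-count arithmetic above generalizes to any bucket width and group count. The sketch below reproduces the 240,000 rows/day figure and extends it to bytes; the 64-byte row width is a made-up placeholder, so measure your own aggregate's row size before using the result for capacity planning.

```python
from datetime import timedelta


def rows_per_day(bucket: timedelta, groups: int) -> int:
    """Materialized rows written per day: buckets per day times GROUP BY groups."""
    buckets_per_day = timedelta(days=1) // bucket
    return buckets_per_day * groups


def bytes_per_day(bucket: timedelta, groups: int, row_bytes: int) -> int:
    """Uncompressed bytes per day for a given (assumed) row width."""
    return rows_per_day(bucket, groups) * row_bytes


hourly_rows = rows_per_day(timedelta(hours=1), groups=10_000)
hourly_bytes = bytes_per_day(timedelta(hours=1), groups=10_000, row_bytes=64)  # 64 is hypothetical
print(hourly_rows, hourly_bytes)
```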

Does the refresh policy handle late-arriving data automatically?

Yes, within reach of start_offset. When a late row lands in an already-materialized bucket, TimescaleDB writes an invalidation record; the next policy run whose window covers that bucket reprocesses it. Data older than start_offset on a given run is not revisited — size the offset to your worst-case ingestion delay, or trigger a manual refresh_continuous_aggregate for deep backfills.
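The reach of start_offset can be made concrete by sorting invalidated buckets into those a given run picks up and those it leaves stale. The sketch below is a model of the behaviour described above, not TimescaleDB's invalidation log: split_invalidations is a hypothetical function, and it assumes a bucket is reprocessed only when it lies entirely inside the run's window.

```python
from datetime import datetime, timedelta, timezone


def split_invalidations(invalidated, run_time, start_offset, end_offset, bucket):
    """Partition invalidated bucket starts into (reprocessed, outside_window)."""
    window_start = run_time - start_offset
    window_end = run_time - end_offset
    reprocessed, outside_window = [], []
    for bucket_start in invalidated:
        if window_start <= bucket_start and bucket_start + bucket <= window_end:
            reprocessed.append(bucket_start)
        else:
            outside_window.append(bucket_start)
    return reprocessed, outside_window


def utc(hour):
    return datetime(2024, 4, 1, hour, tzinfo=timezone.utc)


# Late rows landed in the 10:00 and 06:00 buckets; the policy runs at 12:00.
reprocessed, outside_window = split_invalidations(
    invalidated=[utc(10), utc(6)],
    run_time=utc(12),
    start_offset=timedelta(hours=3),
    end_offset=timedelta(hours=1),
    bucket=timedelta(hours=1),
)
print(reprocessed, outside_window)
```

In this model the 06:00 bucket stays stale until a manual refresh_continuous_aggregate covers it, which is the deep-backfill case mentioned above.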

Can I build an aggregate on top of another aggregate?

Yes — hierarchical rollups (hourly → daily → weekly) are supported and are the recommended way to serve multiple granularities cheaply. Define the coarser aggregate’s FROM clause against the finer aggregate rather than the raw hypertable, and give each its own refresh policy so the coarse layer refreshes only after the fine layer has advanced.


Refresh Policy Design & Scheduling — choosing offsets and cadence for the watermark
Incremental vs Full Refresh Strategies — when to reprocess history versus advance forward
Asynchronous Execution & Queue Management — serializing refresh jobs across background workers
Error Handling & Retry Mechanisms — recovering from failed refresh runs
Creating Continuous Aggregates with time_bucket_gapfill — the query-time gap-filling pattern
