Data Retention & Compression Lifecycle Automation

Time-series workloads in IoT telemetry, observability, and industrial automation generate data at velocities that quickly outpace traditional relational storage models. Managing the lifecycle of this data — retaining raw precision for recent windows, downsampling for historical analysis, compressing warm chunks, and reclaiming storage for older epochs — requires deterministic automation rather than ad-hoc cleanup jobs. This guide is written for IoT platform developers, DevOps engineers, and Python automation builders who run TimescaleDB in production and need every chunk to move through hot, warm, and cold tiers on a predictable schedule. TimescaleDB supplies the primitives — hypertable partitioning, continuous aggregates, and background policy jobs — but production deployments must orchestrate them explicitly to prevent storage bloat, query degradation, and compliance violations.

The lifecycle is sequenced, not parallel: a chunk must have its rollups materialized and be compressed before it becomes eligible to drop. Getting that ordering wrong is a common cause of silent data loss in TimescaleDB retention deployments, and every section below reinforces the correct sequence.

Architecture Baseline

Before implementing lifecycle automation, validate your instance against the operational baseline below. Retention and compression jobs run as background workers on the same instance that services ingestion, so undersized worker pools or misaligned chunk boundaries surface as policy timeouts and creeping storage growth rather than hard errors.

PostgreSQL 14+ with the timescaledb extension at 2.11 or later (improved continuous aggregate materialization and job scheduler APIs)
Chunk intervals aligned to ingestion cadence — see optimal chunk_interval sizing for IoT sensor data before enabling any policy
timescaledb.max_background_workers sized for concurrent maintenance jobs (retention + compression + aggregate refresh) plus headroom
max_worker_processes raised in tandem so background jobs never starve application connections
Sufficient maintenance_work_mem for compression passes without stalling ingestion
A dedicated service account owning the hypertables (policy jobs execute as their owner)
An observability stack tracking chunk count, compression ratio, and job error rates

sql

-- Idempotent hypertable creation with an explicit chunk interval
SELECT create_hypertable(
    'sensor_readings',
    'time',
    chunk_time_interval => INTERVAL '1 day',
    if_not_exists => TRUE
);

-- Allocate background workers for concurrent maintenance jobs
-- (requires a PostgreSQL restart to take effect)
ALTER SYSTEM SET timescaledb.max_background_workers = 16;

For high-throughput IoT streams, a 1-day or 1-week chunk interval balances query parallelism against catalog overhead. Multi-tenant deployments should layer a device or tenant dimension on top of time using space partitioning for multi-tenant IoT, which keeps per-tenant chunks small enough that compression and retention sweeps stay fast.

The Retention & Compression Lifecycle

The central concept of this topic is a three-stage lifecycle: downsample, compress, drop. Each stage is driven by an interval threshold measured against a chunk’s time range, and each stage is idempotent so that re-running a policy never corrupts state.

Stage 1 — Downsample before you delete

Raw telemetry rarely warrants indefinite retention, but the aggregate signal usually does. Materialize a continuous aggregate before dropping raw data so analytical queries survive the deletion. A continuous aggregate is a specialized materialized view that TimescaleDB refreshes incrementally as new chunks arrive; understanding its materialized view architecture and syntax is a prerequisite for choosing bucket widths that match your retention granularity.

sql

CREATE MATERIALIZED VIEW IF NOT EXISTS sensor_readings_1h
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    device_id,
    metric_name,
    avg(value)   AS avg_value,
    min(value)   AS min_value,
    max(value)   AS max_value,
    count(*)     AS sample_count
FROM sensor_readings
GROUP BY bucket, device_id, metric_name;

-- Real-time union of materialized and fresh data for the most recent buckets
ALTER MATERIALIZED VIEW sensor_readings_1h SET (timescaledb.materialized_only = FALSE);

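Note that support for percentile_cont and other ordered-set aggregates inside a continuous aggregate definition depends on the TimescaleDB version (older releases reject them), and an exact percentile stored per bucket cannot be re-aggregated into a coarser bucket. Percentiles also cannot be derived from avg, min, and max after the fact, so if you need them once raw data drops, store a mergeable sketch in the rollup, for example percentile_agg from the TimescaleDB Toolkit read back with approx_percentile. The bucket width you pick here is a durability decision: once raw chunks drop, the 1-hour rollup is the finest resolution you retain.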
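Because the hourly rollup becomes the finest surviving resolution, coarser views (daily, weekly) have to be derived from it. The following is a minimal sketch of that recombination, assuming rows shaped like the sensor_readings_1h columns above; the `HourlyRollup` type and `combine_rollups` helper are hypothetical names used only for illustration. The point it shows is that averages must be weighted by sample_count, while min, max, and count combine directly.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HourlyRollup:
    """One row of the hourly rollup (hypothetical in-memory shape)."""
    avg_value: float
    min_value: float
    max_value: float
    sample_count: int


def combine_rollups(rows: Iterable[HourlyRollup]) -> HourlyRollup:
    """Merge several hourly buckets into one coarser bucket."""
    rows = [r for r in rows if r.sample_count > 0]
    if not rows:
        raise ValueError("no samples to combine")
    total = sum(r.sample_count for r in rows)
    # Weight each hourly average by how many raw samples produced it.
    weighted_sum = sum(r.avg_value * r.sample_count for r in rows)
    return HourlyRollup(
        avg_value=weighted_sum / total,
        min_value=min(r.min_value for r in rows),
        max_value=max(r.max_value for r in rows),
        sample_count=total,
    )
```

Averaging the hourly averages without weights would over-represent sparse hours, which is why sample_count is worth materializing even when no dashboard displays it.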

Stage 2 — Compress warm chunks

Once a chunk ages out of the write-hot window, columnar compression converts its row-store layout into compressed columnar batches, typically shrinking storage by 90–95% for regular IoT metrics and accelerating analytical scans. A compression policy (see chunk compression scheduling and automation) lets background workers compress chunks as they cross an age threshold, without interrupting ingestion into newer chunks.

sql

-- Configure compression before enabling the policy
ALTER TABLE sensor_readings SET (
    timescaledb.compress = true,
    timescaledb.compress_segmentby = 'device_id, metric_name',
    timescaledb.compress_orderby   = 'time DESC'
);

-- Compress chunks older than 7 days
SELECT add_compression_policy(
    'sensor_readings',
    compress_after => INTERVAL '7 days',
    if_not_exists => TRUE
);

The compress_segmentby columns should match the equality predicates your queries filter on, and compress_orderby should match your scan order — this is what turns compression into a query accelerator rather than pure storage savings.

Stage 3 — Drop or archive cold chunks

Finally, map business and compliance SLAs to a drop horizon. A retention policy (see TTL policy mapping and enforcement) registers a background job that removes chunks whose entire time range has aged past the window. Each removal drops the whole chunk table in one metadata-level operation rather than issuing a row-by-row DELETE.

sql

-- Drop raw chunks older than 90 days
SELECT add_retention_policy(
    'sensor_readings',
    drop_after => INTERVAL '90 days',
    if_not_exists => TRUE
);

The cardinal rule ties all three stages together: always set drop_after strictly longer than compress_after, and both longer than any continuous aggregate refresh lag. A retention horizon shorter than the compression horizon simply drops chunks before compression ever benefits them; a horizon shorter than the aggregate refresh window drops raw data the rollups have not yet consumed.
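The ordering rule is easy to assert at deploy time. Here is a minimal sketch, assuming the three intervals are available to the deploy script as `timedelta` values; the `validate_lifecycle` function and the `refresh_lag` parameter (the caller's own estimate of how far the rollup trails real time) are hypothetical names, not part of TimescaleDB.

```python
from datetime import timedelta


def validate_lifecycle(compress_after: timedelta,
                       drop_after: timedelta,
                       refresh_lag: timedelta) -> list[str]:
    """Return ordering violations; an empty list means the config is safe."""
    problems = []
    if drop_after <= compress_after:
        problems.append("drop_after must be strictly longer than compress_after")
    if compress_after <= refresh_lag:
        problems.append("compress_after must be longer than the aggregate refresh lag")
    if drop_after <= refresh_lag:
        problems.append("drop_after must be longer than the aggregate refresh lag")
    return problems
```

A deploy script could call this before registering any policy and abort when the returned list is non-empty.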

Automation Patterns

Declarative SQL policies cover the steady state. Regulatory archival, where cold chunks must be exported to object storage before deletion, needs orchestration that runs extraction and drop in a strict order: nothing is dropped until every upload has succeeded. The pattern below uses psycopg v3 with workflow-step comments. It discovers eligible chunks through the supported information view, serializes each to Parquet, uploads it, then drops everything past the cutoff through the supported API. It is safe to re-run: after a partial failure, a retry re-exports the chunks that are still present, overwriting any objects already uploaded, and drops only what remains.

python

import psycopg
from psycopg import sql
import pandas as pd
import boto3
from datetime import datetime, timedelta, timezone


def archive_and_drop_chunks(conn_str: str, bucket_name: str, retention_days: int = 90):
    """Idempotent archival workflow: extract, upload to S3, then drop cold chunks."""
    # Step 0: aware UTC cutoff. datetime.utcnow() is naive and casts
    # ambiguously to timestamptz, so build the cutoff with an explicit tz.
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)

    with psycopg.connect(conn_str) as conn:
        # Step 1: find chunks eligible for archival via the supported view.
        with conn.cursor() as cur:
            cur.execute("""
                SELECT chunk_schema, chunk_name
                FROM timescaledb_information.chunks
                WHERE hypertable_name = 'sensor_readings'
                  AND range_end <= %s
                ORDER BY range_end ASC;
            """, (cutoff,))
            chunks = cur.fetchall()

        s3 = boto3.client('s3')
        for schema, chunk in chunks:
            # Step 2: read the chunk on a dedicated cursor (schema-qualified, quoted).
            with conn.cursor() as read_cur:
                read_cur.execute(
                    sql.SQL("SELECT * FROM {}").format(sql.Identifier(schema, chunk))
                )
                rows = read_cur.fetchall()
                cols = [desc[0] for desc in read_cur.description]

            df = pd.DataFrame(rows, columns=cols)
            if df.empty:
                continue

            # Step 3: serialize + upload. Re-uploading to an existing key simply
            # overwrites the object, which keeps the workflow safe to retry.
            parquet_key = f"archive/sensor_readings/{schema}_{chunk}.parquet"
            df.to_parquet(f"/tmp/{chunk}.parquet", engine='pyarrow')
            s3.upload_file(f"/tmp/{chunk}.parquet", bucket_name, parquet_key)

        # Step 4: drop every chunk past the cutoff via the supported API. A raw
        # DROP TABLE on a chunk would orphan TimescaleDB catalog rows.
        with conn.cursor() as cur:
            cur.execute("SELECT drop_chunks('sensor_readings', older_than => %s);", (cutoff,))
        conn.commit()

Wrap policy registration in the same idempotent style: pass if_not_exists => TRUE, and query timescaledb_information.jobs before calling add_retention_policy or add_compression_policy so a redeploy becomes a no-op rather than an error, and a changed interval is detected rather than left at its old value. The ingestion side of this pipeline (batching, upserts, and gateway-reconnect backfills) needs the same connection-pooling and transactional discipline.
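One way to structure that pre-check is to separate the decision from the database call. The sketch below assumes job rows have already been fetched as (proc_name, hypertable_name, config) tuples from timescaledb_information.jobs, with config decoded to a dict. The `plan_policy_changes` helper is hypothetical, and the proc_name values and config keys in the usage example (`policy_retention`, `policy_compression`, `drop_after`, `compress_after`) are assumptions to verify against your own instance.

```python
def plan_policy_changes(existing_jobs, desired):
    """Compare registered policy jobs with the desired intervals.

    existing_jobs: iterable of (proc_name, hypertable_name, config) rows.
    desired: dict mapping (proc_name, hypertable_name) -> (config_key, interval).
    Returns a dict mapping each desired key to "create", "keep", or "update".
    """
    current = {(proc, table): (config or {}) for proc, table, config in existing_jobs}
    plan = {}
    for key, (config_key, interval) in desired.items():
        if key not in current:
            plan[key] = "create"      # no job yet: register the policy
        elif current[key].get(config_key) == interval:
            plan[key] = "keep"        # already registered with this interval
        else:
            plan[key] = "update"      # registered, but the interval differs
    return plan


existing = [("policy_retention", "sensor_readings", {"drop_after": "90 days"})]
desired = {
    ("policy_retention", "sensor_readings"): ("drop_after", "90 days"),
    ("policy_compression", "sensor_readings"): ("compress_after", "7 days"),
}
plan = plan_policy_changes(existing, desired)
```

The comparison is a plain string match, so equivalent spellings of an interval (for example "3 mons" versus "90 days") would be reported as "update"; normalizing intervals before comparing is left out of this sketch.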

Performance & Scale

Lifecycle automation is ultimately a resource-budgeting exercise across chunk count, worker concurrency, and IOPS. A few relationships govern how the topic scales for IoT telemetry:

Chunk count vs catalog overhead. Every chunk is a PostgreSQL relation with catalog rows, planner statistics, and per-chunk indexes. Planning latency and autovacuum load grow roughly linearly with live chunk count. Keeping intervals aligned to time-based chunk partitioning strategies keeps that count bounded — a hypertable retaining 90 days at a 1-day interval holds ~90 active chunks, not thousands.
Background worker concurrency. Retention, compression, and aggregate refresh jobs all draw from the timescaledb.max_background_workers pool. If the pool is smaller than the number of concurrently scheduled jobs, jobs queue and effective throughput drops while job_stats shows growing runtime.
IOPS distribution. Compression is scan-and-rewrite heavy. compress_after only sets the age threshold; when the job actually runs is governed by its schedule. Run compression and archival passes in off-peak windows, and stagger policies across hypertables with initial_start so multiple large chunks do not compress simultaneously and saturate disk.
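The chunk-count relationship above can be estimated before changing an interval. This is a rough sketch under simple assumptions (uniform chunk intervals, retention applied on schedule); `estimate_live_chunks` and its `space_partitions` parameter are hypothetical names introduced for illustration.

```python
import math
from datetime import timedelta


def estimate_live_chunks(retention: timedelta,
                         chunk_interval: timedelta,
                         space_partitions: int = 1) -> int:
    """Upper-bound estimate of chunks a hypertable holds at steady state."""
    if chunk_interval <= timedelta(0):
        raise ValueError("chunk_interval must be positive")
    # A retention window can overlap one more chunk than retention / interval,
    # because the oldest and newest chunks are usually only partly inside it.
    time_slices = math.ceil(retention / chunk_interval) + 1
    return time_slices * space_partitions


daily = estimate_live_chunks(timedelta(days=90), timedelta(days=1))
hourly = estimate_live_chunks(timedelta(days=90), timedelta(hours=1))
```

Comparing the two results for the same 90-day retention shows how quickly sub-daily intervals inflate the number of relations the planner and autovacuum have to track.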

Estimate the storage a policy will hold using a compression ratio $r$ (compressed size ÷ raw size). Total steady-state bytes are:

$$S_{\text{total}} = \underbrace{H_{\text{hot}} \cdot B_{\text{day}}}_{\text{raw chunks}} \;+\; \underbrace{(H_{\text{drop}} - H_{\text{hot}}) \cdot B_{\text{day}} \cdot r}_{\text{compressed chunks}} \;+\; S_{\text{rollup}}$$

where $H_{\text{hot}}$ is the number of days held uncompressed (= compress_after), $H_{\text{drop}}$ is the retention horizon in days (= drop_after), $B_{\text{day}}$ is raw bytes ingested per day, and $S_{\text{rollup}}$ is the continuous aggregate footprint. For a fleet writing 40 GB/day with compress_after => 7 days, drop_after => 90 days, and a measured $r = 0.08$, raw chunks hold $7 \times 40 = 280$ GB and compressed chunks hold $(90-7)\times 40 \times 0.08 \approx 266$ GB. That is about 546 GB before rollups, roughly 15% of the $90 \times 40 = 3600$ GB (3.6 TB) the same 90-day window would cost uncompressed.

Compression flattens 83 of the 90 days to a fraction of raw size, so the compressed tail costs less than the 7-day hot window despite holding twelve times as many days.
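The storage estimate is simple enough to keep next to the policy definitions. A minimal sketch of the formula, working in gigabytes and days; `steady_state_storage_gb` is a hypothetical helper, and the ratio passed in should be one you measured rather than an assumed default.

```python
def steady_state_storage_gb(gb_per_day: float,
                            compress_after_days: int,
                            drop_after_days: int,
                            ratio: float,
                            rollup_gb: float = 0.0) -> dict:
    """Evaluate S_total = H_hot*B_day + (H_drop - H_hot)*B_day*r + S_rollup."""
    if drop_after_days <= compress_after_days:
        raise ValueError("drop_after must be longer than compress_after")
    raw = compress_after_days * gb_per_day
    compressed = (drop_after_days - compress_after_days) * gb_per_day * ratio
    return {
        "raw_gb": raw,
        "compressed_gb": compressed,
        "total_gb": raw + compressed + rollup_gb,
    }


estimate = steady_state_storage_gb(40, 7, 90, 0.08)
```

Re-running it with a different compress_after or drop_after makes the cost of widening the hot window or the retention horizon visible before the policy changes.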

Failure Modes & Operational Gotchas

Retention outruns compression. If drop_after ≤ compress_after, chunks drop before they are ever compressed and the compression policy silently does nothing. Mitigation: enforce drop_after > compress_after in code review and assert it at deploy time.
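Dropping raw data an aggregate has not consumed. A retention window shorter than the continuous aggregate’s refresh lag deletes rows the rollup never materialized, leaving permanent gaps. The reverse overlap also hurts: if the refresh policy’s start_offset reaches back past drop_after, the next refresh recomputes those buckets from the now-missing raw data and removes them from the rollup. Mitigation: keep drop_after larger than end_offset plus the refresh policy’s schedule_interval, keep start_offset shorter than drop_after, and validate with troubleshooting stale continuous aggregates.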
Modifying compression settings after the policy exists. Changing compress_segmentby/compress_orderby does not recompress existing chunks. Mitigation: decompress, alter, and recompress affected chunks, or accept mixed layouts until they age out.
Retention sweep lock contention. drop_chunks takes brief locks; overlapping long analytical transactions can block the sweep and stall the scheduler. Mitigation: schedule sweeps off-peak and cap max_runtime via alter_job.
Over-fragmented chunks. Sub-daily intervals on a modest write rate explode catalog overhead and slow every policy evaluation. Mitigation: size the interval so each chunk holds meaningful data before it compresses.
DROP TABLE on a raw chunk. Bypassing drop_chunks orphans TimescaleDB catalog rows and corrupts hypertable metadata. Mitigation: only ever remove chunks through drop_chunks or a retention policy.

Monitoring Checklist

Automation without validation drifts. Track policy health continuously against the TimescaleDB system views.

sql

-- Job execution health: last status, failures, and runtime per policy
SELECT j.proc_name, j.hypertable_name,
       s.last_run_status, s.last_successful_finish,
       s.total_runs, s.total_failures, s.total_duration
FROM timescaledb_information.jobs      AS j
JOIN timescaledb_information.job_stats AS s USING (job_id)
ORDER BY s.total_failures DESC;

-- Compression effectiveness per hypertable (before vs after bytes)
SELECT hypertable_name,
       pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after,
       round(100 * (1 - after_compression_total_bytes::numeric
             / NULLIF(before_compression_total_bytes, 0)), 1) AS pct_saved
FROM hypertable_compression_stats('sensor_readings');

-- Live chunk count and compression state (catalog-overhead signal)
SELECT count(*) FILTER (WHERE is_compressed)     AS compressed_chunks,
       count(*) FILTER (WHERE NOT is_compressed) AS raw_chunks
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings';

Key metrics to alert on: aggregate refresh lag (freshness of the newest materialized bucket), raw-vs-compressed chunk ratio, per-hypertable compression percentage, and total_failures climbing across successive runs. Export these via a PostgreSQL Prometheus exporter and alert on background-worker saturation and hypertable bloat. Retention sweeps drop whole chunks and leave no dead tuples behind in the hypertable, but after row-level deletes or backfill updates, run VACUUM (VERBOSE, ANALYZE) sensor_readings so dead tuples are reclaimed and planner statistics stay accurate. See the official PostgreSQL VACUUM documentation for autovacuum tuning in high-write environments.
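The job-health query above can feed a small alerting check. The sketch below assumes each row has been fetched as a dict keyed by the column names in that query; `unhealthy_jobs` and the 5% default threshold are hypothetical, and the `"Success"` status string is an assumption to confirm against the values your instance reports.

```python
def unhealthy_jobs(job_rows, max_failure_ratio=0.05):
    """Return (proc_name, hypertable_name, reason) for jobs needing attention."""
    flagged = []
    for row in job_rows:
        runs = row["total_runs"]
        failures = row["total_failures"]
        name = (row["proc_name"], row["hypertable_name"])
        if row["last_run_status"] != "Success":
            flagged.append((*name, "last run did not succeed"))
        elif runs and failures / runs > max_failure_ratio:
            flagged.append((*name, "failure ratio above threshold"))
    return flagged


sample = [
    {"proc_name": "policy_compression", "hypertable_name": "sensor_readings",
     "last_run_status": "Success", "total_runs": 100, "total_failures": 1},
    {"proc_name": "policy_retention", "hypertable_name": "sensor_readings",
     "last_run_status": "Failed", "total_runs": 40, "total_failures": 3},
    {"proc_name": "policy_refresh_continuous_aggregate", "hypertable_name": "sensor_readings",
     "last_run_status": "Success", "total_runs": 50, "total_failures": 10},
]
alerts = unhealthy_jobs(sample)
```

Keeping the check as a pure function over fetched rows makes it easy to unit-test and to reuse from a cron job or an exporter.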

By coupling deterministic SQL policies with idempotent Python fallbacks and continuous validation against job_stats, engineering teams can keep storage costs predictable, enforce compliance boundaries, and keep query latency stable across multi-year telemetry datasets.

In this topic:

TTL Policy Mapping & Enforcement — map SLAs to drop_after windows and enforce them with background jobs.
Chunk Compression Scheduling & Automation — sequence columnar compression as chunks age out of the hot tier.

Related across the site:

Hypertable Architecture & Partitioning — the partitioning foundation every lifecycle policy runs on.
Continuous Aggregates in TimescaleDB — build the rollups that must materialize before raw chunks drop.
Incremental vs Full Refresh Strategies — keep aggregate refresh ahead of the retention horizon.
Compression Models for High-Frequency Telemetry — tune segmentby/orderby for both storage and scan speed.
Security Boundaries & Access Control — scope the privileged workers that run retention and compression.
