Compression Models for High-Frequency Telemetry

High-frequency telemetry ingestion introduces compounding storage costs, index bloat, and query degradation as row counts climb into the billions. Once a sensor fleet writes millions of narrow rows per hour, the uncompressed row-store layout that makes writes fast becomes the single largest line item on the storage bill and the primary cause of slow historical scans. This guide solves one focused problem for IoT platform developers, DevOps engineers, and Python automation builders: how to convert aging chunks of a hypertable into TimescaleDB’s columnar format so that storage shrinks by an order of magnitude while analytical queries stay fast. It builds directly on the core hypertable architecture and partitioning strategy, because compression is a per-chunk operation whose behaviour is governed entirely by how those chunks were partitioned in the first place.

The mechanism is the whole story in miniature: a freshly ingested chunk lives in row form for fast writes; once it crosses a configurable age threshold, a background job rewrites it into per-column arrays grouped by a segmentby key and sorted by an orderby key; those ordered arrays are then delta-, run-length-, and dictionary-encoded. Every implementation decision on this page is really a decision about how to feed that encoder the most compressible input.
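
To make the effect of ordering concrete, here is a toy sketch of delta and run-length encoding applied to a handful of timestamps. It is an illustration only, not TimescaleDB's actual encoder, and the helper names `delta_encode` and `run_length_encode` as well as the sample data are assumptions made for this example.

```python
from itertools import groupby


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Six 1 Hz readings from one device, as Unix timestamps (hypothetical data).
ordered = [1700000000 + i for i in range(6)]
# The same timestamps in a scrambled arrival order.
shuffled = [ordered[i] for i in (3, 0, 5, 1, 4, 2)]

ordered_runs = run_length_encode(delta_encode(ordered))
shuffled_runs = run_length_encode(delta_encode(shuffled))

print(ordered_runs)        # [(1700000000, 1), (1, 5)]
print(len(shuffled_runs))  # 6
```

Sorted input collapses to two runs, while the scrambled input needs one entry per row. That gap is why the orderby key matters as much as it does.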

Prerequisites

Compression is a native feature of the TimescaleDB extension, but it only behaves predictably when the surrounding environment is tuned for the memory-intensive rewrite that columnar conversion performs. Validate each item before enabling it in production.

TimescaleDB 2.10 or newer installed and loaded via shared_preload_libraries = 'timescaledb' (2.18+ additionally exposes the same engine under the columnstore naming, e.g. add_columnstore_policy; the compress APIs below remain fully supported)
The target table is already a hypertable with a deliberate chunk_time_interval — see time-based chunk partitioning strategies for why interval width dictates compression batch size
timescaledb.max_background_workers set to at least 16 so the compression policy is never starved of a scheduler slot
maintenance_work_mem set to at least 256MB per worker, since each columnar rewrite buffers a full chunk’s segments in memory
Ingestion uses batched INSERT or COPY to keep WAL churn low and to land dense, contiguous rows that compress well
A service account that owns the hypertable, required later to attach policies and to align security boundaries and access control with compressed chunks
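
The numeric thresholds in this list can be checked mechanically. The sketch below assumes the setting values have already been read as strings (for example via `SHOW`) into a plain dictionary; `parse_pg_size` and `check_prerequisites` are hypothetical helper names, not part of any library.

```python
import re

# PostgreSQL memory units, expressed in bytes.
_UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_pg_size(text):
    """Convert a PostgreSQL memory setting such as '256MB' into bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*(B|kB|MB|GB|TB)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def check_prerequisites(settings):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    libs = [lib.strip() for lib in settings.get("shared_preload_libraries", "").split(",")]
    if "timescaledb" not in libs:
        problems.append("timescaledb missing from shared_preload_libraries")
    if int(settings.get("timescaledb.max_background_workers", "0")) < 16:
        problems.append("timescaledb.max_background_workers below 16")
    if parse_pg_size(settings.get("maintenance_work_mem", "0B")) < parse_pg_size("256MB"):
        problems.append("maintenance_work_mem below 256MB")
    return problems


# Hypothetical values, as they might come back from SHOW.
example = {
    "shared_preload_libraries": "pg_stat_statements, timescaledb",
    "timescaledb.max_background_workers": "16",
    "maintenance_work_mem": "64MB",
}
print(check_prerequisites(example))  # ['maintenance_work_mem below 256MB']
```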

The worked examples assume a sensor_readings hypertable partitioned on a time column, with columns device_id, sensor_type, and a numeric value — the canonical shape for 1Hz industrial or environmental telemetry.

Step-by-Step Implementation

The five steps below follow the chunk lifecycle: configure the columnar model, schedule the conversion, materialise a downsampled rollup, bound raw storage with retention, and confirm the encoder is doing its job. Each step is idempotent and safe to re-run from a migration.

Step 1 — Choose the columnar model (`segmentby` and `orderby`)

Compression quality is decided almost entirely by two parameters. segmentby groups rows that share a key into the same compressed batch, so it should be the identifier you most often filter on. orderby sorts rows inside each batch, and because delta and run-length encoding reward monotonic runs, ordering by time DESC is the default that maximises the compression ratio for append-only telemetry.

sql

-- Enable compression on a high-frequency telemetry hypertable.
ALTER TABLE sensor_readings SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id, sensor_type',
  timescaledb.compress_orderby   = 'time DESC'
);

Favour moderate-cardinality segmentby columns. Segmenting by an extremely high-cardinality key produces many tiny single-row segments and collapses the ratio — a failure mode covered in depth in best practices for chunk indexing on high-cardinality tags.
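
A back-of-the-envelope estimate shows how quickly cardinality erodes batch density. The sketch assumes rows are spread evenly across segment keys and that a compressed batch holds at most 1,000 rows; `batch_fill` and the fleet sizes are hypothetical.

```python
import math

MAX_ROWS_PER_BATCH = 1000  # assumed upper bound on rows per compressed batch


def batch_fill(rows_per_chunk, distinct_segments):
    """Estimate (total batches, average rows per batch) for an evenly spread chunk."""
    rows_per_segment = rows_per_chunk / distinct_segments
    batches_per_segment = math.ceil(rows_per_segment / MAX_ROWS_PER_BATCH)
    total_batches = batches_per_segment * distinct_segments
    return total_batches, rows_per_chunk / total_batches


# 500 devices x 2 sensor types at 1 Hz over a 1-day chunk (hypothetical fleet).
rows = 500 * 2 * 86_400

# Segmenting by (device_id, sensor_type): 1,000 distinct keys.
print(batch_fill(rows, 1_000))  # 87,000 batches, about 993 rows each

# Segmenting by a near-unique key such as an event UUID: one key per row.
print(batch_fill(rows, rows))   # 86,400,000 batches of a single row
```

Nearly full batches give the encoder long runs to work with; single-row batches give it nothing.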

Step 2 — Attach the compression policy

Enabling compression only makes chunks eligible. A policy is what actually schedules the background rewrite, firing on any chunk whose data has aged past the given interval. Keep the interval comfortably larger than your busiest query window so you are never decompressing hot data on read.

sql

-- Compress chunks whose newest row is older than 3 days.
SELECT add_compression_policy('sensor_readings', INTERVAL '3 days', if_not_exists => true);
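
The eligibility rule can be sketched in a few lines. This assumes the policy compares the end of each chunk's time range against the current time minus the policy interval; `is_eligible` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_eligible(chunk_range_end, compress_after, now):
    """A chunk qualifies once its whole time range is older than now - compress_after."""
    return chunk_range_end <= now - compress_after


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
compress_after = timedelta(days=3)

# End boundaries of six consecutive 1-day chunks.
chunk_ends = [datetime(2024, 1, day, tzinfo=timezone.utc) for day in (6, 7, 8, 9, 10, 11)]

eligible = [
    end.date().isoformat()
    for end in chunk_ends
    if is_eligible(end, compress_after, now)
]
print(eligible)  # ['2024-01-06', '2024-01-07']
```

With 1-day chunks and a 3-day interval, the three or four most recent chunks always stay in row form, which is the buffer that keeps hot queries off compressed data.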

The scheduling internals — worker contention, batching, and how this policy coexists with retention sweeps — are the subject of the sibling guide on chunk compression scheduling automation.

Step 3 — Pair compression with a downsampled rollup

Raw telemetry is usually retained for only days, while the questions asked of it (“what was the hourly average for this device last quarter?”) persist for months. A continuous aggregate refresh policy materialises those rollups so the raw chunks can be compressed aggressively and then dropped without losing analytical history.

sql

-- Hourly rollups over the raw hypertable.
CREATE MATERIALIZED VIEW IF NOT EXISTS sensor_readings_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', time) AS bucket,
  device_id,
  sensor_type,
  avg(value)  AS avg_value,
  min(value)  AS min_value,
  max(value)  AS max_value,
  count(*)    AS sample_count
FROM sensor_readings
GROUP BY bucket, device_id, sensor_type
WITH NO DATA;

-- Refresh the settled window (3h ago -> 1h ago); the positive end_offset leaves
-- the current, still-filling bucket untouched.
SELECT add_continuous_aggregate_policy(
  'sensor_readings_hourly',
  start_offset      => INTERVAL '3 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour',
  if_not_exists     => true
);

Ordering matters: aggregates must materialise before the raw chunk they read is dropped. That invariant, and how to reason about the settled-window offsets, is detailed in refresh policy design and scheduling.
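
The settled-window arithmetic is easy to get wrong by one bucket, so here is a small sketch of it. It assumes that only buckets lying entirely inside the window between now minus start_offset and now minus end_offset are materialised; `floor_to_bucket` and `refresh_window` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def floor_to_bucket(ts, bucket):
    """Round a timestamp down to the start of its bucket."""
    return ts - (ts - EPOCH) % bucket


def refresh_window(now, start_offset, end_offset, bucket):
    """Return the [start, end) span of whole buckets inside the policy window."""
    raw_start, raw_end = now - start_offset, now - end_offset
    start = floor_to_bucket(raw_start, bucket)
    if start < raw_start:
        start += bucket  # a bucket cut by the window start is left out
    end = floor_to_bucket(raw_end, bucket)  # a bucket cut by the window end is left out
    return start, end


now = datetime(2024, 1, 10, 12, 20, tzinfo=timezone.utc)
start, end = refresh_window(
    now,
    start_offset=timedelta(hours=3),
    end_offset=timedelta(hours=1),
    bucket=timedelta(hours=1),
)
print(start.isoformat(), end.isoformat())
# 2024-01-10T10:00:00+00:00 2024-01-10T11:00:00+00:00
```

Under that assumption a run at 12:20 settles the 10:00 bucket only; the 11:00 bucket is picked up by the next hourly run.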

Step 4 — Bound raw storage with retention

With rollups safely materialised, a retention policy (see TTL policy mapping and enforcement) drops raw chunks once they age out, capping table growth. Set the retention horizon comfortably larger than the compression interval so every chunk is compressed (and its rollup materialised) well before it becomes a drop candidate.

sql

-- Drop raw chunks older than 14 days; hourly rollups persist far longer.
SELECT add_retention_policy('sensor_readings', INTERVAL '14 days', if_not_exists => true);
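
The ordering rules from Steps 2 to 4 can be enforced before a migration runs. The following is a minimal sketch; `validate_lifecycle` is a hypothetical helper and the three checks simply restate the guidance on this page.

```python
from datetime import timedelta


def validate_lifecycle(refresh_start_offset, compress_after, drop_after):
    """Return a list of ordering problems; an empty list means the intervals are consistent."""
    problems = []
    if compress_after <= refresh_start_offset:
        problems.append("compress_after should exceed the refresh start_offset")
    if drop_after <= compress_after:
        problems.append("drop_after should exceed compress_after")
    if drop_after <= refresh_start_offset:
        problems.append("drop_after should exceed the refresh start_offset")
    return problems


# The values used in the worked examples on this page.
print(validate_lifecycle(
    refresh_start_offset=timedelta(hours=3),
    compress_after=timedelta(days=3),
    drop_after=timedelta(days=14),
))  # []

# A retention horizon shorter than the compression interval is flagged.
print(validate_lifecycle(
    refresh_start_offset=timedelta(hours=3),
    compress_after=timedelta(days=3),
    drop_after=timedelta(days=2),
))  # ['drop_after should exceed compress_after']
```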

Step 5 — Automate verification from Python

Platform teams that onboard tenants dynamically or deploy across many environments need programmatic confirmation that compression is enabled and its jobs are scheduled — not a one-off manual check. The following psycopg (v3) routine is idempotent and queries only the timescaledb_information views, so it stays resilient across extension upgrades.

python

import logging

import psycopg
from psycopg import sql

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")


def audit_compression(conn_string: str, hypertable: str, aggregate: str) -> None:
    """Confirm compression is enabled and its policies are scheduled."""
    with psycopg.connect(conn_string) as conn, conn.cursor() as cur:
        cur.execute(
            sql.SQL(
                "SELECT compression_enabled "
                "FROM timescaledb_information.hypertables "
                "WHERE hypertable_name = %s"
            ),
            [hypertable],
        )
        row = cur.fetchone()
        if not row or not row[0]:
            logging.warning("Compression not enabled on %s; skipping.", hypertable)
            return

        cur.execute(
            sql.SQL(
                "SELECT job_id, next_start "
                "FROM timescaledb_information.jobs "
                "WHERE hypertable_name = %s AND proc_name = 'policy_compression'"
            ),
            [hypertable],
        )
        jobs = cur.fetchall()
        logging.info("Active compression jobs on %s: %d", hypertable, len(jobs))

        # The refresh job is keyed to the materialization hypertable, so join
        # through continuous_aggregates to match by the user-facing view name.
        cur.execute(
            sql.SQL(
                "SELECT js.last_successful_finish "
                "FROM timescaledb_information.job_stats js "
                "JOIN timescaledb_information.jobs j ON j.job_id = js.job_id "
                "JOIN timescaledb_information.continuous_aggregates ca "
                "  ON ca.materialization_hypertable_name = j.hypertable_name "
                "WHERE ca.view_name = %s "
                "  AND j.proc_name = 'policy_refresh_continuous_aggregate'"
            ),
            [aggregate],
        )
        last = cur.fetchone()
        if last:
            logging.info("Last aggregate refresh: %s", last[0])


if __name__ == "__main__":
    # Use environment variables for credentials in production.
    audit_compression(
        conn_string="postgresql://user:pass@localhost:5432/telemetry_db",
        hypertable="sensor_readings",
        aggregate="sensor_readings_hourly",
    )

This pattern drops cleanly into a CI/CD pipeline, a Kubernetes CronJob, or an Airflow DAG that runs after every migration.

Configuration Parameters Reference

The handful of knobs below account for nearly all of the variance in real-world compression ratios and job behaviour. Tune segmentby first; it has the largest single effect.

| Parameter | Type | Recommended value | Effect |
| --- | --- | --- | --- |
| `timescaledb.compress_segmentby` | column list | 1–2 moderate-cardinality ID columns (`device_id`) | Groups like rows into one batch; the strongest lever on ratio and on filtered-scan speed |
| `timescaledb.compress_orderby` | column list | `time DESC` | Sorts rows within a batch so delta/run-length encoding sees long monotonic runs |
| `add_compression_policy` interval | interval | `3 days` (larger than the hot query window) | Age after which a chunk is rewritten to columnar form |
| `chunk_time_interval` | interval | sized so a chunk fits comfortably in `maintenance_work_mem` | Sets the batch size of each compression job; oversized chunks stall workers |
| `maintenance_work_mem` | memory | `≥ 256MB` per worker | Working memory for the columnar rewrite; too low spills to disk and slows the job |
| `add_retention_policy` interval | interval | `14 days` (well past `compress_after`) | Age after which raw chunks are dropped once rollups exist |

Integration With Adjacent Features

Compression never operates in isolation — it is one stage of a lifecycle whose ordering must be respected. It reads the chunks produced by time-based partitioning, and its output is consumed by three neighbouring subsystems.

Continuous aggregates. A continuous aggregate built over a compressed hypertable (see materialized view architecture) reads compressed chunks transparently at refresh time, decompressing them as needed. Because refresh only touches the settled window, keeping the compression interval larger than the refresh window means aggregate refreshes almost always read row-store data, avoiding the cost of decompressing recent chunks on every cycle.

Multi-tenant space partitioning. When a hypertable also carries a space dimension for tenant isolation, aligning segmentby with the space key keeps each tenant’s rows in their own compressed segments and simplifies row-level security. See configuring space partitions for multi-tenant time-series for the routing rules.

Retention. Compression and retention share a scheduler and can contend for locks on the same chunk. The full-lifecycle orchestration — compress, then drop, in that order — is governed by data retention, compression and lifecycle automation.

Performance Validation

The realized compression ratio is the uncompressed size divided by the compressed size:

$$R = \frac{\text{before\_compression\_total\_bytes}}{\text{after\_compression\_total\_bytes}}$$

A healthy sensor_readings-shaped workload lands at $R \geq 3$; well-chosen segmentby keys on regular 1Hz data routinely reach $R \geq 10$. Anything below 3 signals a cardinality or density problem, not a TimescaleDB limitation. Confirm the realized ratio directly from the system views:

sql

-- Realized compression ratio for the hypertable.
SELECT
  pg_size_pretty(before_compression_total_bytes) AS before,
  pg_size_pretty(after_compression_total_bytes)  AS after,
  round(
    before_compression_total_bytes::numeric
      / nullif(after_compression_total_bytes, 0),
    1
  ) AS ratio
FROM hypertable_compression_stats('sensor_readings');

-- How many chunks are compressed vs still in row store.
SELECT
  count(*)                                      AS total_chunks,
  count(*) FILTER (WHERE is_compressed)         AS compressed,
  count(*) FILTER (WHERE NOT is_compressed)     AS uncompressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings';

-- Confirm the compression job itself is healthy.
SELECT j.job_id, s.last_run_status, s.last_successful_finish, s.total_failures
FROM timescaledb_information.jobs      AS j
JOIN timescaledb_information.job_stats AS s USING (job_id)
WHERE j.hypertable_name = 'sensor_readings'
  AND j.proc_name = 'policy_compression';

If uncompressed stays high while last_successful_finish keeps advancing, the policy is running but falling behind — usually a sign that max_background_workers is too low for the chunk backlog.
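
For automated checks, the thresholds in this section can be turned into a small classifier. This is a sketch only; `compression_ratio` and `classify_ratio` are hypothetical helpers, and the byte counts are assumed to come from the first query in this section.

```python
def compression_ratio(before_bytes, after_bytes):
    """Return before/after, or None when nothing has been compressed yet."""
    if not before_bytes or not after_bytes:
        return None
    return before_bytes / after_bytes


def classify_ratio(ratio):
    """Map a ratio onto the thresholds used in this guide (3 and 10)."""
    if ratio is None:
        return "no compressed data yet"
    if ratio < 3:
        return "investigate segmentby cardinality or data density"
    if ratio < 10:
        return "healthy"
    return "excellent"


GIB = 1024**3
print(classify_ratio(compression_ratio(120 * GIB, 10 * GIB)))  # excellent
print(classify_ratio(compression_ratio(50 * GIB, 20 * GIB)))   # investigate segmentby cardinality or data density
print(classify_ratio(compression_ratio(None, None)))           # no compressed data yet
```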

Troubleshooting

ERROR: cannot update/delete rows from chunk … it is compressed. On older versions (before TimescaleDB 2.11), UPDATE and DELETE against a compressed chunk are rejected. Correct historical rows before the compression horizon, or explicitly decompress the target chunk with decompress_chunk(), apply the change, and let the policy recompress it.

Compression ratio stuck below 3:1. Almost always an over-specific segmentby. Segmenting by a near-unique column (a raw event UUID, a high-cardinality tag) yields one row per segment and defeats the encoder. Move that column out of segmentby, keep only stable IDs, and recompress.

ERROR: tuple decompression limit exceeded during backfill. Bulk-writing into a range that is already compressed forces mass decompression. Route late and reconnect traffic through a staging path instead — the pattern is handling out-of-order data insertion in TimescaleDB.

The compression job never fires. If timescaledb_information.jobs shows no policy_compression row, compression was enabled with ALTER TABLE but no policy was attached; eligibility is not scheduling. Re-run add_compression_policy. If the row exists but next_start is in the past and never advances, the scheduler is starved: raise timescaledb.max_background_workers and restart PostgreSQL, since the setting is only read at server start.
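
The two cases described for a job that never fires can be told apart programmatically. The sketch below assumes a job row has been fetched into a dictionary with a `next_start` key; `diagnose_job` and the ten-minute grace period are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def diagnose_job(job, now, grace=timedelta(minutes=10)):
    """Classify a compression job row (or None when no row exists)."""
    if job is None:
        return "no policy attached: run add_compression_policy"
    if job["next_start"] < now - grace:
        return "scheduler starved: next_start is overdue"
    return "scheduled"


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)

print(diagnose_job(None, now))
# no policy attached: run add_compression_policy
print(diagnose_job({"next_start": now - timedelta(hours=6)}, now))
# scheduler starved: next_start is overdue
print(diagnose_job({"next_start": now + timedelta(minutes=30)}, now))
# scheduled
```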

Retention drops blocked by long compression transactions. A retention sweep and a compression job targeting adjacent chunks can serialize on catalog locks. Stagger their schedule_interval values so they do not fire in the same window.

Frequently Asked Questions

Can I query compressed chunks directly?

Yes. Compression is transparent to SQL — SELECT statements read compressed and uncompressed chunks in the same query, and the planner decompresses only the segments a filter touches. Filtering on a segmentby column is fastest because whole non-matching segments are skipped without decompression.
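
Segment skipping can be pictured with a toy model. This is not how the planner is implemented; it only counts how many segments would have to be opened for a given filter, and `build_segments`, `scan`, and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_segments(rows, segmentby):
    """Group rows by their segmentby key, one list per segment."""
    segments = defaultdict(list)
    for row in rows:
        segments[tuple(row[col] for col in segmentby)].append(row)
    return dict(segments)


def scan(segments, segmentby, filters):
    """Return (matching rows, segments opened) for equality filters."""
    opened, matches = 0, []
    for key, rows in segments.items():
        key_values = dict(zip(segmentby, key))
        # A filter on a segmentby column rules out the whole segment unopened.
        if any(col in key_values and key_values[col] != val for col, val in filters.items()):
            continue
        opened += 1
        matches.extend(r for r in rows if all(r[c] == v for c, v in filters.items()))
    return matches, opened


rows = [
    {"device_id": f"d{d}", "sensor_type": s, "value": v}
    for d in range(4)
    for s in ("temp", "humidity")
    for v in (1.0, 2.0, 3.0)
]
segmentby = ("device_id", "sensor_type")
segments = build_segments(rows, segmentby)

print(scan(segments, segmentby, {"device_id": "d1"})[1])  # 2 of 8 segments opened
print(scan(segments, segmentby, {"value": 2.0})[1])       # all 8 segments opened
```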

Should the compression interval be shorter or longer than my chunk interval?

Longer. The policy is evaluated per chunk, so set the compression interval longer than both the chunk interval and your hot query window. A common, safe arrangement is a 1-day chunk interval with a 3-day compression policy, so a chunk is fully closed and settled before it is ever rewritten.

Does compression slow down ingestion?

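Not for in-order data. New rows land in row form in the newest, uncompressed chunk, and compression only touches older chunks that have aged past the policy interval, so the normal write path is unaffected. The exception is late data aimed at an already compressed range, which is covered under Troubleshooting.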

How do I change segmentby after data is already compressed?

You must decompress the affected chunks first (decompress_chunk()), run ALTER TABLE … SET (timescaledb.compress_segmentby = …), then recompress. The columnar layout is fixed at compression time, so an in-place change is not possible.

Why did my compression ratio drop after adding a new sensor type?

Each new distinct value in a segmentby column adds another set of segments, and sparse per-segment data compresses poorly. If the new dimension is not a common filter key, remove it from segmentby and rely on orderby locality instead.

Sibling guides in this section

Time-Based Chunk Partitioning Strategies — interval sizing that sets each compression batch
Space Partitioning for Multi-Tenant IoT — aligning segmentby with tenant isolation keys
Security Boundaries & Access Control — RLS on hypertables before compression is enabled
Fallback Routing for Legacy Data — staging paths that avoid mass decompression on backfill

Across the platform

Chunk Compression Scheduling Automation — worker contention and batching for the compression policy
TTL Policy Mapping & Enforcement — dropping raw chunks once rollups exist
Continuous Aggregate Creation & Refresh Management — the rollup layer that lets you compress raw data aggressively
