TTL Policy Mapping & Enforcement

Time-to-live (TTL) policy mapping and enforcement forms the operational backbone of scalable time-series infrastructure. In high-throughput IoT telemetry and industrial automation workloads, unmanaged data growth directly inflates query latency, storage cost, and backup windows. This guide addresses one engineering problem: how to translate a business or compliance retention window into a background job that deterministically drops exactly the right chunks, at the right time, without racing your compression or ingestion path. TimescaleDB handles this with declarative retention policies that map service-level agreements (SLAs) to physical chunk boundaries rather than to individual rows. This page sits inside the broader data retention and compression lifecycle and assumes you already have a hypertable ingesting telemetry. Effective enforcement demands more than deleting old rows: it requires mapping TTL windows to hypertable partitions, automating enforcement with background workers, and sequencing it correctly against compression and vacuum cycles.

Prerequisites

Retention enforcement runs as a background job on the same instance that services ingestion, so an undersized worker pool or misaligned chunk boundary surfaces as a policy timeout and creeping storage growth rather than a hard error. Validate the environment below before registering any policy.

TimescaleDB 2.10 or newer on PostgreSQL 14+ (the add_retention_policy / drop_after semantics and the job_errors view assume this baseline).
Chunk intervals aligned to your retention granularity — a policy can only drop whole chunks, so read optimal chunk_interval sizing for IoT sensor data before enabling enforcement.
timescaledb.max_background_workers sized to cover the retention job alongside every concurrent compression and aggregate-refresh job.
max_worker_processes raised in tandem so background jobs never starve application connections.
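The hypertable owned by a dedicated service account. Retention jobs execute as the table owner, and a missing privilege fails the drop; the failure surfaces only as a failed run in timescaledb_information.job_stats, not as an error in your application.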
A schema whose partitioning (time) column matches the interval you intend to retain against; drop_after is measured from that column, not from insertion time.

The scheduler evaluates policies against TimescaleDB's chunk metadata (the catalog exposed through the timescaledb_information.chunks view), so enforcement is chunk-aware, not row-aware. That design makes reclamation a near-instant DROP TABLE rather than a row-by-row DELETE, but it requires strict alignment between your partitioning column and the retention window.

Step-by-step implementation

The steps below follow the enforcement flow in order: profile the SLA, map it to a drop_after interval, register the policy, then let the scheduler evaluate chunk boundaries.

1. Map the SLA to a drop_after interval

Start from the business or compliance requirement and express it as a single interval. Round it to a whole multiple of your chunk interval so the window maps onto a whole number of chunks. Because the policy drops only whole chunks, rows in the chunk that straddles the cutoff stay resident until that entire chunk ages out. The number of chunks the policy keeps resident is bounded by:

$$
N_{chunks} \approx \left\lceil \frac{W_{retention}}{I_{chunk}} \right\rceil + 1
$$

where $W_{retention}$ is the retention window and $I_{chunk}$ is the chunk interval. The extra chunk accounts for the current, still-filling partition, which the policy will never drop.
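Both calculations in this step can be checked before you touch the database. The following is a minimal Python sketch, assuming epoch-aligned chunk boundaries and continuous ingestion; the helper names chunks_to_cover, round_up_to_chunk, and resident_chunk_bound are hypothetical, not TimescaleDB APIs.

```python
from datetime import timedelta


def chunks_to_cover(retention: timedelta, chunk_interval: timedelta) -> int:
    """ceil(W_retention / I_chunk) using exact integer timedelta arithmetic."""
    # timedelta // timedelta returns an int; negating twice turns floor into ceil.
    return -(-retention // chunk_interval)


def round_up_to_chunk(retention: timedelta, chunk_interval: timedelta) -> timedelta:
    """Round a retention window up to the next whole multiple of the chunk interval."""
    return chunks_to_cover(retention, chunk_interval) * chunk_interval


def resident_chunk_bound(retention: timedelta, chunk_interval: timedelta) -> int:
    """N_chunks ~= ceil(W_retention / I_chunk) + 1; the +1 is the still-filling chunk."""
    return chunks_to_cover(retention, chunk_interval) + 1


if __name__ == "__main__":
    w, i = timedelta(days=90), timedelta(days=7)
    print(round_up_to_chunk(w, i))     # 91 days, 0:00:00
    print(resident_chunk_bound(w, i))  # 14
```

Rounding up rather than down is a deliberate choice here: it never shortens the retention the SLA promised.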

2. Register the retention policy

add_retention_policy registers a background job that compares each chunk’s range end against now() - drop_after and drops the chunks that fall entirely outside the window. Use if_not_exists => true so re-running your deployment is idempotent.

sql

-- Map a 90-day TTL to the 'telemetry' hypertable
SELECT add_retention_policy(
    'telemetry',
    drop_after      => INTERVAL '90 days',
    schedule_interval => INTERVAL '1 day',
    initial_start   => now(),
    if_not_exists   => true
);
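To reason about what a given drop_after will remove, the comparison described above can be modeled offline. This Python sketch treats a chunk as eligible once its exclusive range end is at or before now() - drop_after; it is an illustration of the rule, not TimescaleDB's implementation, and the Chunk class, chunks_to_drop, and weekly_chunks are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Chunk:
    name: str
    range_start: datetime
    range_end: datetime  # exclusive upper bound, as in timescaledb_information.chunks


def chunks_to_drop(chunks, now: datetime, drop_after: timedelta) -> list:
    """Return chunks whose entire range lies before the cutoff now - drop_after."""
    cutoff = now - drop_after
    # A chunk straddling the cutoff keeps all of its rows until it fully ages out.
    return [c for c in chunks if c.range_end <= cutoff]


def weekly_chunks(start: datetime, count: int) -> list:
    """Build `count` consecutive 7-day chunks beginning at `start` (test data only)."""
    week = timedelta(days=7)
    return [Chunk(f"chunk_{n}", start + n * week, start + (n + 1) * week)
            for n in range(count)]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    chunks = weekly_chunks(t0, 20)
    now = t0 + timedelta(days=139)  # falls inside the 20th chunk
    dropped = chunks_to_drop(chunks, now, timedelta(days=90))
    print(len(dropped), "dropped,", len(chunks) - len(dropped), "resident")
    # 7 dropped, 13 resident
```

The 13 resident chunks stay within the bound of 14 computed in step 1 for a 90-day window over 7-day chunks.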

3. Confirm the job registered and inspect its schedule

Every policy becomes a row in timescaledb_information.jobs with proc_name = 'policy_retention'. Verifying registration before you rely on it prevents the classic “policy exists but never runs” failure.

sql

SELECT job_id, application_name, schedule_interval, config->>'drop_after' AS drop_after
FROM   timescaledb_information.jobs
WHERE  proc_name = 'policy_retention'
  AND  hypertable_name = 'telemetry';

4. Tune runtime behaviour with alter_job

For strict SLAs, adjust max_runtime, retry_period, or max_retries so a transient lock does not stall the sweep or, conversely, so a long-running drop does not block the next window.

sql

SELECT alter_job(
    (SELECT job_id FROM timescaledb_information.jobs
     WHERE proc_name = 'policy_retention' AND hypertable_name = 'telemetry'),
    max_runtime  => INTERVAL '10 minutes',
    retry_period => INTERVAL '5 minutes',
    next_start   => now()
);

5. (Advanced) Conditional retention for irregular series

Standard interval policies assume uniform data distribution. Deployments with sporadic device connectivity, event-driven logging, or per-tenant SLAs often need retention logic that deviates from a fixed calendar interval. Register a custom PL/pgSQL procedure with add_job; it can read metadata, apply business-specific filters, and call drop_chunks only when a threshold is met.

sql

CREATE OR REPLACE PROCEDURE custom_tenant_retention(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
DECLARE
    retention_days INT := (config->>'retention_days')::INT;
BEGIN
    -- drop_chunks operates on the entire hypertable, not a single tenant.
    -- True per-tenant retention requires space partitioning by tenant, since
    -- drop_chunks cannot filter by a tag column. The relation argument expects
    -- a regclass; pass the table name as a literal for the implicit cast.
    PERFORM drop_chunks(
        relation   => 'telemetry',
        older_than => (now() - (retention_days || ' days')::INTERVAL)
    );
END;
$$;

SELECT add_job('custom_tenant_retention', '1 day', config => '{"retention_days": 30}');

Because drop_chunks cannot filter by a tag column, genuine per-tenant TTLs depend on space partitioning for multi-tenant IoT; model the partitioning first, then attach the conditional job.
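A subtlety appears once the table is space-partitioned: hash partitioning can place several tenants in the same chunk, so a chunk is only safe to drop after it has aged past the longest retention of any tenant stored in it. The Python sketch below models that decision offline. The TenantChunk type, the per-tenant retention map, and the assumption that you know which tenants live in each chunk are all hypothetical inputs; drop_chunks itself provides none of them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TenantChunk:
    name: str
    range_end: datetime
    tenants: frozenset  # tenant ids hashed into this chunk's space partition


def droppable(chunks, retention_days: dict, now: datetime, default_days: int = 90):
    """Return names of chunks older than the longest retention among their tenants."""
    result = []
    for chunk in chunks:
        # The chunk must outlive every tenant it holds, so use the maximum TTL.
        days = max((retention_days.get(t, default_days) for t in chunk.tenants),
                   default=default_days)
        if chunk.range_end <= now - timedelta(days=days):
            result.append(chunk.name)
    return result


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    chunks = [
        TenantChunk("a", now - timedelta(days=40), frozenset({"acme"})),
        TenantChunk("b", now - timedelta(days=40), frozenset({"acme", "globex"})),
    ]
    print(droppable(chunks, {"acme": 30, "globex": 365}, now))  # ['a']
```

Chunk "b" survives because it also holds a tenant with a 365-day TTL, even though its other tenant's 30-day window has long expired.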

6. Idempotent deployment from Python

For automation builders, orchestrate policies programmatically with psycopg (v3), verifying existing jobs before applying changes so bulk deployments stay idempotent.

python

import psycopg
from psycopg.rows import dict_row


def ensure_retention_policy(
    dsn: str, hypertable: str, retention_interval: str, schema: str = "public"
) -> bool:
    """Idempotently register a retention policy. Returns True if newly created."""
    with psycopg.connect(dsn, row_factory=dict_row) as conn, conn.cursor() as cur:
        # Step 1: has a retention job already been registered for this table?
        cur.execute(
            """
            SELECT count(*) AS n
            FROM   timescaledb_information.jobs
            WHERE  proc_name = 'policy_retention'
              AND  hypertable_schema = %s
              AND  hypertable_name = %s
            """,
            (schema, hypertable),
        )
        if cur.fetchone()["n"] > 0:
            return False

        # Step 2: register it; if_not_exists guards against a concurrent deploy.
        # Qualify the name with the same schema checked above and cast it
        # explicitly to regclass (assumes unquoted, lower-case identifiers).
        cur.execute(
            """
            SELECT add_retention_policy(
                %s::regclass,
                drop_after    => %s::interval,
                if_not_exists => true
            )
            """,
            (f"{schema}.{hypertable}", retention_interval),
        )
        conn.commit()
        return True

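Use connection pooling (for example psycopg_pool) for bulk deployments so concurrent registration calls do not exhaust max_connections. Registration itself runs in an ordinary client session; background-worker slots are consumed only when the registered jobs execute, which is why timescaledb.max_background_workers must still be sized for the jobs that will run concurrently.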
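For large fleets it can help to separate deciding what to change from executing it. A minimal sketch, assuming you have already read each existing policy's drop_after (config->>'drop_after' per hypertable_name in timescaledb_information.jobs) and converted it to a timedelta: the hypothetical plan_retention function compares desired and existing windows and reports which tables need a policy and which have drifted. It only plans and never touches the database.

```python
from datetime import timedelta


def plan_retention(desired: dict, existing: dict) -> dict:
    """Split hypertables into create / unchanged / drifted buckets.

    Both arguments map hypertable name -> drop_after as a timedelta.
    """
    plan = {"create": [], "unchanged": [], "drifted": []}
    for table, window in sorted(desired.items()):
        if table not in existing:
            plan["create"].append(table)
        elif existing[table] == window:
            plan["unchanged"].append(table)
        else:
            # A drifted window needs an explicit change, for example
            # remove_retention_policy followed by add_retention_policy.
            plan["drifted"].append(table)
    return plan


if __name__ == "__main__":
    desired = {"telemetry": timedelta(days=90), "events": timedelta(days=30)}
    existing = {"telemetry": timedelta(days=60)}
    print(plan_retention(desired, existing))
    # {'create': ['events'], 'unchanged': [], 'drifted': ['telemetry']}
```

The ensure_retention_policy function above handles only the "create" bucket; surfacing drift separately keeps an idempotent deploy from silently leaving an outdated window in place.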

Configuration parameters reference

Parameter	Type	Recommended value	Effect
`drop_after`	`interval`	Whole multiple of chunk interval	Chunks whose range ends before `now() - drop_after` are dropped
`schedule_interval`	`interval`	`1 day`	How often the sweep runs; too frequent wastes worker slots
`initial_start`	`timestamptz`	Off-peak timestamp	Anchors the first run away from ingestion peaks
`if_not_exists`	`boolean`	`true`	Makes re-registration a no-op instead of an error
`max_runtime` (via `alter_job`)	`interval`	`10 min`	Caps a single sweep so a long drop can’t block the queue
`retry_period` (via `alter_job`)	`interval`	`5 min`	Backoff before retrying a failed sweep
`max_retries` (via `alter_job`)	`integer`	`-1` (unlimited) or `3`	Bounds retries on persistent failures
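The recommendations in this table can be turned into a pre-deployment lint. The checks below are a sketch drawn from this page's guidance, not limits that TimescaleDB enforces. Two of them are assumptions: that max_runtime should fit inside schedule_interval so one sweep finishes before the next is due, and that retry_period should be shorter than schedule_interval. The function name lint_retention_config is hypothetical.

```python
from datetime import timedelta


def lint_retention_config(drop_after: timedelta, chunk_interval: timedelta,
                          schedule_interval: timedelta = timedelta(days=1),
                          max_runtime: timedelta = timedelta(minutes=10),
                          retry_period: timedelta = timedelta(minutes=5)) -> list:
    """Return human-readable warnings; an empty list means no issues were found."""
    warnings = []
    # timedelta % timedelta yields a timedelta, which is falsy only when zero.
    if drop_after % chunk_interval:
        warnings.append("drop_after is not a whole multiple of the chunk interval")
    if max_runtime >= schedule_interval:
        warnings.append("max_runtime does not fit inside schedule_interval")
    if retry_period >= schedule_interval:
        warnings.append("retry_period is not shorter than schedule_interval")
    return warnings


if __name__ == "__main__":
    print(lint_retention_config(timedelta(days=90), timedelta(days=7)))
    # ['drop_after is not a whole multiple of the chunk interval']
```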

Integration with adjacent features

TTL enforcement never operates in isolation, and its ordering against compression is the single most consequential decision. Coordinate retention with chunk compression scheduling and automation so that data moves from the row store to the columnar store before it is deleted: compression must always precede the retention drop. Chunks then spend the interval between compress_after and drop_after in compressed form, which is where the I/O savings and smaller backup payloads come from. The two policies are complementary: compress_after moves warm chunks to columnar form, and drop_after, set to a larger interval, removes them once they age past the retention horizon.
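The ordering rule reduces to a single comparison. This minimal Python sketch uses a hypothetical helper, compressed_window, that rejects a configuration in which compression would not precede the drop and otherwise reports how long each chunk spends compressed before retention removes it (ignoring chunk-boundary and scheduling lag).

```python
from datetime import timedelta


def compressed_window(compress_after: timedelta, drop_after: timedelta) -> timedelta:
    """Time a chunk spends in columnar form before the retention policy drops it."""
    if compress_after >= drop_after:
        # The chunk would become eligible for deletion before (or as) it is compressed.
        raise ValueError("compress_after must be smaller than drop_after")
    return drop_after - compress_after


if __name__ == "__main__":
    print(compressed_window(timedelta(days=7), timedelta(days=90)))  # 83 days, 0:00:00
```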

Retention also interacts with your continuous aggregate refresh lifecycle. If a chunk is dropped before its rollup is materialized, the aggregate silently loses that window's data. Always confirm that the aggregate covering a time range has been refreshed before the raw chunks behind it become eligible to drop. Dropping a chunk removes its table files outright, so it leaves no dead tuples in the data itself; the churn lands instead in the system catalogs (pg_class, pg_attribute, and TimescaleDB's own catalog tables), which autovacuum must clean up. Tuning autovacuum_vacuum_scale_factor and autovacuum_vacuum_cost_delay alongside your retention windows keeps that background cleanup pacing with ingestion velocity.
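A related guard can be checked mechanically. Assuming the aggregate is kept current by a refresh policy whose start_offset sets how far back each refresh reaches, that reach should end before raw data becomes eligible to drop; otherwise a refresh can recompute buckets whose raw chunks are already gone. The hypothetical check below encodes that single comparison plus a safety margin you choose (one day is an arbitrary default).

```python
from datetime import timedelta


def retention_clears_refresh(start_offset: timedelta, drop_after: timedelta,
                             margin: timedelta = timedelta(days=1)) -> bool:
    """True when the refresh window ends at least `margin` before the drop horizon."""
    # start_offset: how far back each refresh reaches (refresh policy setting).
    # drop_after: age at which raw chunks become eligible for deletion.
    return start_offset + margin <= drop_after


if __name__ == "__main__":
    print(retention_clears_refresh(timedelta(days=30), timedelta(days=90)))   # True
    print(retention_clears_refresh(timedelta(days=120), timedelta(days=90)))  # False
```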

Performance validation

Query the TimescaleDB system views to confirm the policy is actually reclaiming space rather than merely existing.

Check that the sweep runs and succeeds:

sql

SELECT job_id, last_run_started_at, last_successful_finish,
       last_run_status, total_runs, total_successes, total_failures
FROM   timescaledb_information.job_stats
WHERE  job_id = (
    SELECT job_id FROM timescaledb_information.jobs
    WHERE proc_name = 'policy_retention' AND hypertable_name = 'telemetry'
);

Confirm the resident chunk count stays at or below the $N_{chunks}$ bound you computed, and that no chunk lying entirely outside the retention window has lingered:

sql

SELECT count(*) AS chunk_count,
       min(range_start) AS oldest_chunk_start,
       min(range_end)   AS oldest_chunk_end
FROM   timescaledb_information.chunks
WHERE  hypertable_name = 'telemetry';

If chunk_count drifts above the expected bound, or oldest_chunk_end falls before now() - drop_after by more than one schedule_interval, the sweep is falling behind: investigate worker starvation or a max_runtime cap that is truncating each run. An oldest_chunk_start that predates now() - drop_after is normal, because the oldest resident chunk usually straddles the cutoff.
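The lag check can also be scripted. A sketch, assuming you have fetched each chunk's range_end from timescaledb_information.chunks into Python: a chunk counts as overdue only once it has been eligible for longer than one schedule_interval, which absorbs the normal gap between sweeps. The overdue_chunks helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def overdue_chunks(range_ends, now: datetime, drop_after: timedelta,
                   schedule_interval: timedelta = timedelta(days=1)) -> list:
    """Return range_end values for chunks the sweep should already have dropped."""
    # A chunk becomes eligible at range_end + drop_after; allow one sweep
    # interval of slack before treating it as a sign the job is lagging.
    deadline = now - drop_after - schedule_interval
    return sorted(end for end in range_ends if end < deadline)


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    ends = [now - timedelta(days=d) for d in (95, 91, 60)]
    print(len(overdue_chunks(ends, now, timedelta(days=90))))  # 1
```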

Troubleshooting

ERROR: retention policy already exists for hypertable "telemetry": a policy was already registered, and the new call omitted if_not_exists => true. Add the guard, or drop the existing job first with SELECT remove_retention_policy('telemetry');.

Chunks never drop even though they are older than drop_after — the job is registered but not executing. Confirm last_run_started_at is advancing in job_stats; a stale timestamp usually means timescaledb.max_background_workers is exhausted by concurrent compression and refresh jobs. Raise the worker pool and max_worker_processes together.
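That staleness test is easy to automate. A sketch, assuming you read last_run_started_at from timescaledb_information.job_stats (None when no stats row exists yet) and schedule_interval from timescaledb_information.jobs: flag the job when no run has started within a chosen multiple of its schedule. The tolerance of 2 is an arbitrary assumption, and job_looks_stalled is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def job_looks_stalled(last_run_started_at: Optional[datetime], now: datetime,
                      schedule_interval: timedelta, tolerance: float = 2.0) -> bool:
    """True if no run has started within `tolerance` schedule intervals."""
    if last_run_started_at is None:
        # Never ran: flag it for review (a freshly registered job whose
        # initial_start is still in the future will also match).
        return True
    return now - last_run_started_at > schedule_interval * tolerance


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    print(job_looks_stalled(now - timedelta(days=3), now, timedelta(days=1)))  # True
```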

ERROR: cannot drop chunk ... it contains compressed data (older versions) — on TimescaleDB releases that block dropping compressed chunks directly, ensure the retention interval is larger than the compression interval so a chunk is never dropped mid-transition, and upgrade to a release where drop_chunks handles compressed chunks natively.

Retention job status is Failed with a lock-timeout message — the sweep collided with a long-running query or a compression pass holding a lock on the same chunk. Stagger initial_start away from the compression window and set a bounded retry_period via alter_job so the sweep backs off and retries.

ERROR: invalid input syntax for type interval — the drop_after value was passed as a bare number or malformed string. Always cast explicitly, e.g. INTERVAL '90 days' or %s::interval from Python.
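On the Python side, one way to avoid malformed strings is to keep retention windows as datetime.timedelta values and render them into an unambiguous literal just before binding them to %s::interval (psycopg can also adapt a timedelta to interval directly). The to_interval_literal helper below is a hypothetical sketch; it emits days and seconds, a form PostgreSQL's interval input accepts.

```python
from datetime import timedelta


def to_interval_literal(window: timedelta) -> str:
    """Render a positive timedelta as a PostgreSQL interval literal."""
    if window <= timedelta(0):
        raise ValueError("retention windows must be positive")
    # timedelta normalizes to (days, seconds, microseconds); sub-second
    # precision is meaningless for retention, so reject it explicitly.
    if window.microseconds:
        raise ValueError("sub-second retention windows are not supported here")
    return f"{window.days} days {window.seconds} seconds"


if __name__ == "__main__":
    print(to_interval_literal(timedelta(days=90)))  # 90 days 0 seconds
```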

Frequently Asked Questions

Does drop_chunks delete partial chunks or individual rows?

No. Retention operates at chunk granularity: a chunk is dropped only when its entire time range falls outside the window, via a metadata DROP TABLE. It never deletes individual rows, which is why the operation is near-instantaneous and why aligning the chunk interval to the retention window matters.
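A short worked bound shows the consequence. Write $S_{schedule}$ for the policy's schedule_interval (a symbol introduced here). A row at time $t$ lives in a chunk $[s, s + I_{chunk})$ with $s \le t$; that chunk becomes eligible once now() reaches $s + I_{chunk} + W_{retention}$, and the next sweep runs at most $S_{schedule}$ later. Assuming every sweep succeeds, the oldest age a row can reach before it is dropped is:

$$
A_{max} \le \left(s + I_{chunk} + W_{retention} + S_{schedule}\right) - t \le W_{retention} + I_{chunk} + S_{schedule}
$$

With a 90-day window, 7-day chunks, and a daily sweep, a row can therefore survive up to 98 days.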

Can I set a different TTL per tenant in one hypertable?

Not with a single add_retention_policy call — drop_chunks cannot filter by a tag column. Model the table with space partitioning by tenant first, then register a conditional add_job procedure that applies per-tenant thresholds against the appropriate partitions.

Should compression run before or after the retention policy?

Always compress first. Compression must precede the retention drop so warm chunks reach the columnar store before deletion, minimizing both disk footprint and backup payload. Set compress_after to a smaller interval than drop_after.

What happens to a continuous aggregate if its source chunks are dropped?

The aggregate keeps any buckets already materialized, but any window whose raw chunks were dropped before a refresh is lost permanently. Confirm the covering refresh has completed before the underlying chunks age past drop_after.

How do I safely test a retention policy without losing data?

Register the policy with a drop_after far larger than your current data span, verify it appears in timescaledb_information.jobs, then use SELECT show_chunks('telemetry', older_than => INTERVAL '90 days'); to preview exactly which chunks a real interval would target before you tighten it.

Chunk Compression Scheduling & Automation — the compression stage that must run before any retention drop
Columnar compression models for high-frequency telemetry — how segmentby and orderby shape what retention later reclaims
Configuring space partitions for multi-tenant time-series — the prerequisite for genuine per-tenant TTLs
Refresh policy design & scheduling — sequence rollup materialization ahead of chunk drops
Time-based chunk partitioning strategies — the partitioning decisions that determine retention granularity
