Handling Out-of-Order Data Insertion in TimescaleDB

Out-of-order inserts are safe in TimescaleDB only when your continuous aggregate refresh window and retention horizon are sized to cover the worst-case late-arrival lag — this page gives the deterministic formula for those windows and the SQL to verify it.

Late-arriving data is a baseline operating condition for distributed telemetry, edge IoT gateways, and asynchronous Python pipelines: network partitions, clock skew, and batched retries all deliver timestamps to the ingestion layer non-sequentially. TimescaleDB’s hypertable will happily route a row backwards in time to the chunk that owns its timestamp, so the raw INSERT is rarely the problem. The failure mode is silent: a row lands in a bucket that the continuous aggregate refresh policy has already materialized and will not revisit, or in a time range the retention sweep has dropped or is about to drop, and the data ends up invisible or lost. This is the reordering edge case beneath the broader fallback routing for legacy data pattern, and it reduces to sizing three time windows correctly.

Input Profiling

Before you can size any window, measure the late-arrival distribution of your fleet rather than guessing at it. Gather the following from a representative production window:

Maximum late-arrival lag ( $L_{max}$ ) — the largest gap you observe between a reading’s event timestamp and its ingestion timestamp. Capture the p99.9, not the mean; reconnecting gateways flush hours of buffered data at once.
Clock skew tolerance ( $\sigma_{clock}$ ) — the worst-case disagreement between device clocks and the database clock. Unsynchronized edge devices routinely drift seconds to minutes.
Aggregate refresh window — the current start_offset and end_offset of every continuous aggregate refresh policy defined over the target hypertable.
Retention horizon — the drop_after interval enforced by your TTL policy mapping and enforcement job.
Compression boundary — how old a chunk is before columnar compression seals it, since inserts that reach past this line pay a decompress-rewrite cost.

Record $L_{max}$ and $\sigma_{clock}$ as intervals in the same units. Everything downstream is derived from them.
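As a rough illustration of the profiling step, the sketch below derives a high-quantile lag from sampled (event timestamp, ingestion timestamp) pairs. The function name `late_arrival_lag`, the nearest-rank quantile method, and the choice to clamp negative lags to zero are assumptions made for this example, not something the pipeline above prescribes.

```python
import math
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def late_arrival_lag(
    samples: Iterable[Tuple[datetime, datetime]],
    quantile: float = 0.999,
) -> timedelta:
    """Return the lag at `quantile` over (event_ts, ingest_ts) pairs.

    Nearest-rank method: the result is always one of the observed lags.
    Negative lags (device clock ahead of the database) are clamped to zero
    here; they belong in the clock-skew measurement instead.
    """
    lags = sorted(
        max(ingest_ts - event_ts, timedelta(0)) for event_ts, ingest_ts in samples
    )
    if not lags:
        raise ValueError("no samples to profile")
    rank = max(1, math.ceil(quantile * len(lags)))  # 1-based nearest rank
    return lags[rank - 1]
```

Feeding it a representative production sample and rounding the result up gives a candidate $L_{max}$; passing `quantile=1.0` returns the single worst lag observed.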

The Window-Safety Calculation

Two inequalities keep out-of-order rows both materialized and retained. First, the refresh window must reach back far enough to re-materialize any bucket a late row can land in:

$\text{start\_offset} \;\ge\; L_{max} + \sigma_{clock}$

Second, the retention horizon must outlive the entire refresh window plus the late-arrival budget, so a chunk is never dropped while a late row could still target it:

$\text{drop\_after} \;\ge\; \text{start\_offset} + L_{max} + \sigma_{clock}$

Violating the first inequality produces stale aggregates; violating the second produces silent data loss. The chunk_time_interval is the third lever: keep it on the same order as $L_{max}$ so that a single late batch touches two or three chunks rather than fragmenting across dozens. Size it with the method in optimal chunk_interval for IoT sensor data.

The two inequalities reduce to a single nesting rule on the time axis: the late-arrival budget must sit inside the refresh window, which must sit inside the retention horizon. When a late row lands in that innermost band it is re-materialized and never swept.
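The nesting rule can be checked mechanically before any policy is applied. The following is a minimal sketch, assuming all four quantities are available as `timedelta` values; `window_violations` is a hypothetical helper name, not part of TimescaleDB.

```python
from datetime import timedelta
from typing import List


def window_violations(
    l_max: timedelta,
    sigma_clock: timedelta,
    start_offset: timedelta,
    drop_after: timedelta,
) -> List[str]:
    """Return one message per violated inequality; an empty list means the nesting holds."""
    budget = l_max + sigma_clock  # late-arrival budget
    problems: List[str] = []
    if start_offset < budget:
        problems.append(
            f"start_offset {start_offset} < L_max + sigma_clock {budget}: stale aggregates"
        )
    if drop_after < start_offset + budget:
        problems.append(
            f"drop_after {drop_after} < start_offset + L_max + sigma_clock "
            f"{start_offset + budget}: silent data loss"
        )
    return problems
```

Running a check like this in CI or in a migration script catches a retention or refresh change that would quietly break the nesting.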

Encode the two inequalities directly in the hypertable and policy DDL. Create the hypertable with an interval that comfortably spans the late window, then attach a backward-looking refresh policy and an aligned retention policy:

sql

-- Hypertable sized for late arrivals; default indexes are skipped and created explicitly below
SELECT create_hypertable(
  'telemetry_readings',
  'recorded_at',
  chunk_time_interval => INTERVAL '24 hours',
  if_not_exists       => TRUE,
  create_default_indexes => FALSE
);

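CREATE INDEX IF NOT EXISTS idx_telemetry_time_desc
  ON telemetry_readings (recorded_at DESC);

-- Unique key that backs ON CONFLICT (device_id, recorded_at) on the client.
-- It includes the partitioning column, as unique indexes on a hypertable must.
-- The index name is illustrative.
CREATE UNIQUE INDEX IF NOT EXISTS uq_telemetry_device_time
  ON telemetry_readings (device_id, recorded_at);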

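-- Real-time aggregate: raw rows newer than the materialization watermark are
-- merged with materialized buckets at query time. Set explicitly because
-- recent TimescaleDB releases default materialized_only to true.
CREATE MATERIALIZED VIEW IF NOT EXISTS hourly_device_metrics
  WITH (timescaledb.continuous, timescaledb.materialized_only = false) AS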
SELECT time_bucket('1 hour', recorded_at) AS bucket,
       device_id,
       avg(temperature) AS avg_temp,
       count(*)         AS reading_count
FROM telemetry_readings
GROUP BY 1, 2;

-- start_offset >= L_max + clock skew  (48h covers a 2-day reconnect buffer)
SELECT add_continuous_aggregate_policy(
  'hourly_device_metrics',
  start_offset      => INTERVAL '48 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '15 minutes',
  if_not_exists     => TRUE
);

-- drop_after >= start_offset + L_max + skew  (90d clears 48h + slack)
SELECT add_retention_policy(
  'telemetry_readings',
  drop_after    => INTERVAL '90 days',
  if_not_exists => TRUE
);

On the client side, make every late batch idempotent so retry loops never double-count a metric. A composite unique index on (device_id, recorded_at) plus ON CONFLICT turns a replayed batch into a harmless overwrite of identical values, and normalizing timestamps to UTC before insert prevents timezone-offset mistakes from misrouting rows across chunk boundaries:

python

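import asyncpg
from datetime import datetime, timezone
from typing import Any, Dict, Sequence


def to_utc(raw: str) -> datetime:
    """Parse an ISO-8601 timestamp string and normalize it to UTC."""
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None:
        # Assumption: devices that omit an offset report UTC. Calling
        # astimezone() on a naive value would apply the host's local zone.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


# Idempotent async ingestion: safe to retry after a network failure
async def ingest_telemetry_batch(
    pool: asyncpg.Pool,
    records: Sequence[Dict[str, Any]],
) -> None:
    # Requires a unique index on (device_id, recorded_at)
    query = """
        INSERT INTO telemetry_readings
            (device_id, recorded_at, temperature, humidity)
        VALUES ($1, $2, $3, $4)
        ON CONFLICT (device_id, recorded_at) DO UPDATE
            SET temperature = EXCLUDED.temperature,
                humidity    = EXCLUDED.humidity;
    """
    rows = [
        (
            r["device_id"],
            to_utc(r["recorded_at"]),
            r["temperature"],
            r["humidity"],
        )
        for r in records
    ]
    # One transaction per batch: a failed retry leaves no partial batch behind
    async with pool.acquire() as conn:
        async with conn.transaction():
            await conn.executemany(query, rows)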

Worked Example

Take a fleet of 5,000 industrial sensors emitting one reading every 10 seconds (~500 rows/sec). Cellular gateways buffer locally and, on a bad link, replay up to 36 hours of readings on reconnect, so $L_{max} = 36\text{h}$ . NTP is best-effort at the edge, giving $\sigma_{clock} = 30\text{min}$ .

Apply the inequalities:

Refresh window: $\text{start\_offset} \ge 36\text{h} + 0.5\text{h} = 36.5\text{h}$ . Round up to an operational boundary → start_offset => INTERVAL '48 hours', exactly the policy above. Any bucket a 36-hour-late row lands in is re-materialized on the next 15-minute run.
Retention: the horizon must outlast the whole offset stack:
$\text{drop\_after} \ge 48\text{h} + 36\text{h} + 0.5\text{h} = 84.5\text{h} \approx 3.5\text{ days}$
The business keeps 90 days of raw data anyway, which clears the floor with wide margin — but if you were tempted to trim retention to 3 days to save storage, the formula shows that would start dropping chunks a reconnecting gateway can still target.
Chunk interval: with $L_{max} = 36\text{h}$ , a 24-hour chunk_time_interval means one late batch touches at most two historical chunks — acceptable. A 1-hour interval would spray the same batch across 36 chunks and bloat the catalog.

The result: a 36-hour-late replay from any gateway inserts cleanly, refreshes only its own two-day window, and is never swept by retention.
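The chunk-count claim can be reproduced with a few lines of arithmetic. This sketch assumes chunk boundaries fall on whole multiples of the chunk interval counted from the Unix epoch; `chunks_touched` and the sample reconnect time are hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunks_touched(start: datetime, end: datetime, chunk_interval: timedelta) -> int:
    """Count epoch-aligned chunks that the closed range [start, end] overlaps."""
    first = (start - EPOCH) // chunk_interval  # index of the chunk holding `start`
    last = (end - EPOCH) // chunk_interval     # index of the chunk holding `end`
    return last - first + 1


# A gateway reconnects at 11:30 UTC and replays its last 36 hours.
reconnect = datetime(2024, 1, 3, 11, 30, tzinfo=timezone.utc)
replay_start = reconnect - timedelta(hours=36)

daily = chunks_touched(replay_start, reconnect, timedelta(hours=24))
hourly = chunks_touched(replay_start, reconnect, timedelta(hours=1))
```

With these inputs `daily` counts the current chunk plus two historical ones, while `hourly` lands at 36 or 37 depending on how the replay straddles hour boundaries.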

Edge Cases and When to Deviate

Backfill older than $L_{max}$ . A one-off historical import (protocol migration, archived export) exceeds the routine late window. Do not stretch start_offset to cover it — run a manual refresh_continuous_aggregate over the exact imported range instead, following incremental vs full refresh strategies.
Late data into a compressed chunk. If a late row’s timestamp is older than the compression boundary, the insert pays a decompress-and-rewrite cost on that chunk. Keep routine late arrivals newer than that boundary and schedule deep backfills as maintenance; see chunk compression scheduling automation.
Multi-tenant fleets. One tenant’s mass reconnect can starve another’s live path. Isolate the backfill load with space partitioning for multi-tenant IoT so a reconnect storm stays a noisy-neighbour non-event.
Refresh lag exceeds schedule_interval. When a reconnect burst queues more refresh work than a single schedule_interval can drain, the aggregate falls behind now(). Throttle admission through asynchronous execution and queue management rather than widening start_offset.
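materialized_only = true. With real-time merging disabled, queries see only materialized buckets. Real-time aggregation merges only raw rows newer than the materialization watermark, so a late row that lands in an already-materialized bucket stays invisible until that bucket is re-materialized in either mode. Shorten schedule_interval so the visibility gap stays within SLA.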
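For the one-off backfill case, the manual refresh needs a window that covers every bucket the import touched. The sketch below computes such a window; `aligned_refresh_window` is a hypothetical helper, and it assumes buckets aligned to whole multiples from the Unix epoch, which matches an hourly bucket in UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def aligned_refresh_window(
    min_ts: datetime,
    max_ts: datetime,
    bucket: timedelta,
) -> Tuple[datetime, datetime]:
    """Return (window_start, window_end) covering every bucket in [min_ts, max_ts].

    window_start is the start of the bucket containing min_ts; window_end is
    the exclusive end of the bucket containing max_ts.
    """
    if max_ts < min_ts:
        raise ValueError("max_ts precedes min_ts")
    start = EPOCH + ((min_ts - EPOCH) // bucket) * bucket
    end = EPOCH + ((max_ts - EPOCH) // bucket + 1) * bucket
    return start, end
```

The returned pair would be passed as the window start and end of a manual refresh_continuous_aggregate call on hourly_device_metrics, run outside the ingestion transaction.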

Verification

Confirm the applied windows behave as calculated. First, check that the aggregate has actually materialized up to a recent watermark and is not lagging behind the late-data horizon:

sql

-- Refresh job health for the aggregate over the late-data window
SELECT j.hypertable_name,
       j.config ->> 'start_offset' AS start_offset,
       js.last_run_status,
       js.last_successful_finish,
       js.total_failures,
       js.next_start
FROM timescaledb_information.jobs j
LEFT JOIN timescaledb_information.job_stats js USING (job_id)
WHERE j.proc_name = 'policy_refresh_continuous_aggregate';

A last_successful_finish that trails now() by more than schedule_interval, or a climbing total_failures, means late batches are outpacing the refresh — throttle ingestion before widening the window. Second, verify the retention floor still clears the refresh window so no droppable chunk overlaps the late-data range:

sql

-- Oldest live chunk vs the retention horizon
SELECT hypertable_name,
       min(range_start) AS oldest_chunk_start,
       now() - min(range_start) AS oldest_data_age
FROM timescaledb_information.chunks
WHERE hypertable_name = 'telemetry_readings'
GROUP BY hypertable_name;

If oldest_data_age jumps toward or past your drop_after interval, a burst of very old late data has created historical chunks that the next retention sweep will reclaim. Validate each batch’s minimum timestamp against the TTL before merging.
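A pre-merge check along those lines might look like the following sketch. The three labels and the function name `classify_batch` are assumptions for illustration; the thresholds are the late-arrival budget ($L_{max} + \sigma_{clock}$) and the drop_after interval defined earlier.

```python
from datetime import datetime, timedelta


def classify_batch(
    min_event_ts: datetime,
    now: datetime,
    late_budget: timedelta,
    drop_after: timedelta,
) -> str:
    """Classify a batch by the age of its oldest reading.

    "expired":  oldest row is already past the retention horizon, so
                inserting it only creates a chunk the next sweep drops.
    "backfill": older than the routine late-arrival budget; needs a
                manual refresh over the imported range.
    "routine":  inside the budget; the scheduled refresh policy covers it.
    """
    age = now - min_event_ts
    if age >= drop_after:
        return "expired"
    if age > late_budget:
        return "backfill"
    return "routine"
```

Batches labelled "expired" can be rejected or diverted to an archive, and "backfill" batches can be queued for a maintenance window instead of the live ingestion path.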

Optimal chunk_interval for IoT Sensor Data — sizing the chunks a late batch has to touch.
Refresh Policy Design & Scheduling — tuning start_offset and schedule_interval.
TTL Policy Mapping & Enforcement — aligning drop_after with the late-data budget.
Incremental vs Full Refresh Strategies — choosing the refresh scope after a backfill.
Compression Models for High-Frequency Telemetry — why writing into sealed chunks is expensive.
