Choosing a mode¶

Every registered table picks one of four modes, and the right one follows from the shape of the data. This page is the decision. The definitions live in Concepts, and the full operation matrix in The contract.

Start from the shape of the data¶

Entity data (vehicles, users, accounts, catalogs): register it fully mirrored. These tables have no aging axis, rows update in place forever, and any table with a primary key qualifies. Postgres keeps the whole copy and takes plain DML with no routing rules, while CDC trails every change into the lake for analytics. Reads stay plain heap scans and cost exactly what they did before Modak.

Time-series and event data (telemetry, logs, clickstreams, transactions): usually Postgres should hold only a recent window, so the choice is between tiered and mirrored with heap retention. Both need PARTITION BY RANGE on a bigint tier key, both drop old heap partitions, and both read as one seam-split view. What differs is how rows travel to the lake. And when Postgres should keep everything anyway, tiered keep-heap moves batches to the lake without dropping the heap copy.

Tiered or mirrored with heap retention¶

Tiered moves data in batches. Whole partitions behind the tiering lag are written to Iceberg in bulk and dropped from the heap. There is no replication slot, no WAL decoding, and no per-row overhead, which makes it the cheaper choice as volume grows. Corrections to rows that already moved cold land in modak.delta. Lake freshness equals the tiering lag, and it is the only mode with bounded total history (--lake-retention).

Mirrored with heap retention moves data by CDC. A replication pump trails every change into the lake row by row, so the lake is behind by seconds rather than by the tiering lag, and every intermediate version of a row is captured. That freshness has a price: a logical replication slot, REPLICA IDENTITY FULL, WAL headroom while the pump is down, and one pump decoding every change the table makes. At high ingest volume that per-row decode is the bottleneck tiered does not have.

So the tiebreaker is freshness versus volume. If lake consumers can wait for the tiering lag, tier it. If the lake must trail in near real time, or recent rows are updated so often that the delta would work overtime, mirror with heap retention and pay the CDC cost.

Tiered keep-heap: batches to the lake, nothing deleted¶

Sometimes the heap copy should stay whole. Postgres comfortably holds the volume, plain DML on any row is worth keeping, but analytics still belongs in the lake and the seam view should push old scans there. --keep-heap is tiered with the drop turned off: partitions are copied to Iceberg in batches (no replication slot, no per-row WAL decode) and the cut-line advances, but heap partitions are never dropped. A row trigger on tiered partitions mirrors any later change into modak.delta, so plain INSERT/UPDATE/DELETE works everywhere, exactly like a mirrored table, while reads split at the seam.

Compared to fully mirrored, keep-heap trades lake freshness (the tiering lag, not seconds) for batch economics on high-volume streams. It requires range partitioning like any tiered table, and it excludes --lake-retention, because keep-heap means nothing is deleted anywhere.

flowchart TD
    startNode["New table"] --> aging{"Does the data age along a tier key?"}
    aging -->|"No, entity data"| mirrored["Fully mirrored"]
    aging -->|"Yes, time series"| keep{"Must Postgres keep the full copy?"}
    keep -->|"Yes"| keepheap["Tiered + keep-heap"]
    keep -->|"No"| window{"Lake must trail in seconds?"}
    window -->|"No, the tiering lag is fine"| tiered["Tiered"]
    window -->|"Yes"| retention["Mirrored + heap retention"]

Side by side¶

	Tiered	Tiered + keep-heap	Fully mirrored	Mirrored + heap retention
Fits	high-volume time series	time series Postgres keeps whole	entities and dimensions	time series needing a fresh lake
Postgres holds	recent partitions	everything	everything	a bounded window
Lake freshness	the tiering lag	the tiering lag	seconds (CDC)	seconds (CDC)
Write cost at scale	bulk partition moves	bulk partition moves	per-row WAL decode	per-row WAL decode
Historical writes	routed to the delta	plain DML, trigger-mirrored	plain DML	routed to the delta
Prerequisites	range partitioning	range partitioning	a primary key	range partitioning, a slot
Bounded history	yes (`--lake-retention`)	no	no	no
Reads	seam-split	seam-split	plain heap, opt-in hybrid	seam-split

Changing your mind¶

Modes are set at registration. To switch, modak-worker unregister (the lake table survives by default) and register again in the new mode. See Day-2 operations.