Choosing a mode

Every registered table picks one of four modes, and the right one follows from the shape of the data. This page walks through that decision. The definitions live in Concepts, and the full operation matrix is in The contract.

Start from the shape of the data

Entity data (vehicles, users, accounts, catalogs): register it fully mirrored. These tables have no aging axis, rows update in place forever, and any table with a primary key qualifies. Postgres keeps the whole copy and takes plain DML with no routing rules, while CDC trails every change into the lake for analytics. Reads stay plain heap scans and cost exactly what they did before Modak.

Time-series and event data (telemetry, logs, clickstreams, transactions): usually Postgres should hold only a recent window, so the choice is between tiered and mirrored with heap retention. Both need PARTITION BY RANGE on a bigint tier key, both drop old heap partitions, and both read as one seam-split view. What differs is how rows travel to the lake. When Postgres should keep everything anyway, tiered keep-heap moves batches to the lake without dropping the heap copy.
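
As a rough illustration of what "seam-split" means for a range scan, the sketch below splits a half-open range on the tier key at a single cut-line. It assumes that keys below the cut-line are served from the lake and keys at or above it from the heap. The function name and the exact boundary rule are hypothetical, not Modak's implementation.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # half-open [lo, hi) on the bigint tier key


def split_at_seam(lo: int, hi: int, cut_line: int) -> Tuple[Optional[Range], Optional[Range]]:
    """Split a scan range into (lake_part, heap_part) at the cut-line.

    Assumption (hypothetical): keys below cut_line come from the lake,
    keys at or above it come from the heap. A part is None when empty.
    """
    if lo >= hi:
        return None, None  # empty scan range
    lake = (lo, min(hi, cut_line)) if lo < cut_line else None
    heap = (max(lo, cut_line), hi) if hi > cut_line else None
    return lake, heap
```

A scan that straddles the cut-line touches both sides, while a scan entirely on one side touches only that side, which is why recent-window queries can stay heap-only.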

Tiered or mirrored with heap retention

Tiered moves data in batches. Whole partitions behind the tiering lag are written to Iceberg in bulk and dropped from the heap. There is no replication slot, no WAL decoding, and no per-row overhead, which makes it the cheaper choice as volume grows. Corrections to rows that already moved cold land in modak.delta. Lake freshness equals the tiering lag. Tiered is also the only mode with bounded total history (--lake-retention).
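
To make "whole partitions behind the tiering lag" concrete, here is a minimal sketch that selects the partitions lying entirely behind a horizon of now minus lag. The Partition type, the function name, and the rule that a partition qualifies only once its upper bound is at or behind the horizon are assumptions for illustration. The source does not spell out how eligibility is computed.

```python
from typing import List, NamedTuple


class Partition(NamedTuple):
    name: str
    lower: int  # inclusive lower bound on the tier key
    upper: int  # exclusive upper bound on the tier key


def partitions_behind_lag(parts: List[Partition], now: int, lag: int) -> List[Partition]:
    """Return partitions that lie entirely behind (now - lag), oldest first.

    Assumption (hypothetical): a partition is eligible only when its
    exclusive upper bound is at or below the horizon.
    """
    horizon = now - lag
    eligible = (p for p in parts if p.upper <= horizon)
    return sorted(eligible, key=lambda p: p.lower)
```

Under this rule a partition that is only partly behind the horizon stays in the heap until the horizon passes its upper bound, so the lake can trail by up to the lag plus one partition width.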

Mirrored with heap retention moves data by CDC. A replication pump trails every change into the lake row by row, so the lake is behind by seconds rather than by the tiering lag, and every intermediate version of a row is captured. That freshness has a price: a logical replication slot, REPLICA IDENTITY FULL, WAL headroom while the pump is down, and one pump decoding every change made to the table. At high ingest volume that per-row decode is a bottleneck that tiered does not have.
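
The WAL headroom can be estimated with back-of-the-envelope arithmetic, since a logical replication slot keeps Postgres from recycling WAL that its consumer has not yet confirmed. The sketch below assumes a steady WAL generation rate and a worst-case pump outage. The function name and the safety factor are illustrative assumptions, not Modak settings.

```python
import math


def wal_headroom_bytes(wal_bytes_per_sec: float,
                       max_pump_downtime_sec: float,
                       safety_factor: float = 1.5) -> int:
    """Rough disk headroom needed for WAL pinned by a stalled slot.

    Assumption: all WAL generated during the outage is retained, at a
    steady rate, padded by an arbitrary safety factor.
    """
    if wal_bytes_per_sec < 0 or max_pump_downtime_sec < 0 or safety_factor < 1:
        raise ValueError("rates and durations must be >= 0, safety_factor >= 1")
    return math.ceil(wal_bytes_per_sec * max_pump_downtime_sec * safety_factor)
```

For example, 5 MB/s of WAL and a one-hour outage with the default factor gives `wal_headroom_bytes(5_000_000, 3600)`, which is 27,000,000,000 bytes, or about 27 GB.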

So the tiebreaker is freshness versus volume. If lake consumers can wait for the tiering lag, tier it. If the lake must trail in near real time, or recent rows are updated so often that modak.delta would be absorbing corrections constantly, mirror with heap retention and pay the CDC cost.

Tiered keep-heap: batches to the lake, nothing deleted

Sometimes the heap copy should stay whole: Postgres comfortably holds the volume and plain DML on any row is worth keeping, but analytics still belongs in the lake and the seam view should push old scans there. --keep-heap is tiered with the drop turned off: partitions are copied to Iceberg in batches (no replication slot, no per-row WAL decode) and the cut-line advances, but heap partitions are never dropped. A row trigger on tiered partitions mirrors any later change into modak.delta, so plain INSERT/UPDATE/DELETE works everywhere, exactly like a mirrored table, while reads split at the seam.

Compared to fully mirrored, keep-heap trades lake freshness (the tiering lag, not seconds) for batch economics on high-volume streams. It requires range partitioning like any tiered table, and it excludes --lake-retention, because keep-heap means nothing is deleted anywhere.

flowchart TD
    startNode["New table"] --> aging{"Does the data age along a tier key?"}
    aging -->|"No, entity data"| mirrored["Fully mirrored"]
    aging -->|"Yes, time series"| keep{"Must Postgres keep the full copy?"}
    keep -->|"Yes"| keepheap["Tiered + keep-heap"]
    keep -->|"No"| window{"Lake must trail in seconds?"}
    window -->|"No, the tiering lag is fine"| tiered["Tiered"]
    window -->|"Yes"| retention["Mirrored + heap retention"]
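
The same decision can be written as a small function, which may be handy in provisioning scripts or reviews. This is only a sketch of the flowchart. The function name, parameter names, and returned labels are hypothetical and are not part of any Modak API.

```python
def choose_mode(ages_along_tier_key: bool,
                postgres_keeps_full_copy: bool = False,
                lake_must_trail_in_seconds: bool = False) -> str:
    """Walk the flowchart's three questions and return a mode label."""
    if not ages_along_tier_key:
        # Entity data: no aging axis, rows update in place.
        return "fully mirrored"
    if postgres_keeps_full_copy:
        # Time series, but the heap copy must stay whole.
        return "tiered + keep-heap"
    if lake_must_trail_in_seconds:
        # Freshness wins over batch economics.
        return "mirrored + heap retention"
    # The tiering lag is acceptable.
    return "tiered"
```

For instance, `choose_mode(True, lake_must_trail_in_seconds=True)` returns `"mirrored + heap retention"`, while `choose_mode(False)` returns `"fully mirrored"` regardless of the other answers.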

Side by side

| | Tiered | Tiered + keep-heap | Fully mirrored | Mirrored + heap retention |
| --- | --- | --- | --- | --- |
| Fits | high-volume time series | time series Postgres keeps whole | entities and dimensions | time series needing a fresh lake |
| Postgres holds | recent partitions | everything | everything | a bounded window |
| Lake freshness | the tiering lag | the tiering lag | seconds (CDC) | seconds (CDC) |
| Write cost at scale | bulk partition moves | bulk partition moves | per-row WAL decode | per-row WAL decode |
| Historical writes | routed to the delta | plain DML, trigger-mirrored | plain DML | routed to the delta |
| Prerequisites | range partitioning | range partitioning | a primary key | range partitioning, a slot |
| Bounded history | yes (--lake-retention) | no | no | no |
| Reads | seam-split | seam-split | plain heap, opt-in hybrid | seam-split |

Changing your mind

Modes are set at registration. To switch, run modak-worker unregister (the lake table survives by default) and register the table again in the new mode. See Day-2 operations.