Skip to content

Registering tables

Registration is the onboarding step. It records the table in the modak.* catalog, creates its Iceberg counterpart, and (for mirrored tables) sets up CDC. It runs through the worker binary's CLI, the same jar as the daemon:

modak-worker register --table <schema.table> --pk <col>[,<col>...] --tier-key <col> \
                      [--mode tiered|mirrored] [--heap-retention <n>] [--lake-retention <n>] \
                      [--keep-heap] [--chunk-rows <n>] [--partition-width <n>] [--profile <name>]

--profile places the table's lake on a named storage profile (a different bucket, account, or provider). Omitted, the table lands on the default profile: the worker's own warehouse configuration.

Picking the mode

Undecided? Choosing a mode walks the decision from the shape of the data, and The contract states the full operation and API matrix per mode. In short: tiered (the default) for append-mostly time series, tiered --keep-heap when the heap should keep its full copy anyway, mirrored for entity tables, and mirrored --heap-retention N for time series whose lake copy must trail by CDC.

Tiered and heap-retention tables require PARTITION BY RANGE on a bigint tier key. Mirrored tables require a primary key, and composite keys work in both modes: --pk tenant_id,device_id.

The full lifecycle

For a tiered table, data moves through three stages. Recent rows live in the heap. Once a partition falls MODAK_TIERING_LAG behind the high-water mark it is tiered into Iceberg and the heap partition is dropped. And if the table was registered with --lake-retention N, lake rows that fall N tier-key units behind the cut-line are expired entirely: heap, then lake, then gone.

Retention is pin-aware like every other pass: it never runs while a reader holds a pin, the boundary is aligned to the partition width so the Iceberg delete removes whole files without rewriting, and it never passes a partition whose heap rows still exist. The current boundary is retention_line in modak.status. Corrections (modak_upsert/modak_delete) targeting rows below the line are rejected, since there is nothing left to correct.

Retention is tiered-only. A mirrored table's heap drop relies on the lake holding full history, so the two retention flags exclude each other.

Keep-heap

--keep-heap (tiered-only) turns off the drop stage. Partitions still tier into Iceberg and the cut-line still advances, but every heap partition rests at TIERED and keeps its rows. When a partition tiers, the worker attaches the extension's cold-mirror trigger to it, so plain DML below the cut-line keeps flowing into modak.delta and from there into the lake. Because keep-heap means nothing is deleted anywhere, it excludes --lake-retention, and modak-worker verify gains a heap-vs-lake comparison below the cut-line, since the heap stays complete and comparable.

Future partitions

The daemon premakes heap partitions ahead of the write frontier, so inserts never fail for lack of a partition. Each cycle it keeps at least MODAK_PREMAKE_PARTITIONS (default 2) empty partition widths between the table's max(tier_key) and the top partition bound, inferring the width from the topmost existing partition. Operators create the first partitions at CREATE TABLE time, and the worker takes it from there.

What tiered registration does

  1. Creates the Iceberg table (create-if-absent, so re-running is safe) and seeds its metadata_location in the catalog, so pinned reads work before the first commit.
  2. Initializes the cut-line at the oldest partition floor.
  3. Mirrors the table's pg_inherits children into modak.partitions. From then on you just CREATE TABLE ... PARTITION OF and the worker picks new partitions up on its next cycle.

What mirrored registration does

  1. Sets REPLICA IDENTITY FULL on the table (deletes and TOAST updates need the old row image).
  2. Creates a publication and a logical replication slot. The slot's consistent point marks where streaming will start.
  3. Copies existing rows to Iceberg in PK-ordered chunks (--chunk-rows, default 50000), each chunk journaled in modak.copy_progress.

The copy is fully resumable. If it dies, re-run the same register command and it continues from the last journaled chunk. Streaming then starts at the slot's consistent point, so rows changed during the copy are healed by the idempotent fold, with no gap and no duplicates. Until the copy lands the table has no mirror frontier and the daemon skips it.

Iceberg partitioning

New Iceberg tables are laid out as truncate(tier_key, width), one lake partition per width-sized tier-key band, so lake engines and maintenance can prune by tier key. Tiered tables infer the width from the first Postgres range partition. Mirrored and unpartitioned tables stay unpartitioned unless you pass --partition-width explicitly (0 forces unpartitioned). The spec applies at table creation only. Re-registering does not rewrite an existing layout.

Permissions

Registration alters the table, creates a publication, and creates a replication slot, so run it as a role with table ownership and REPLICATION. See Production deployment for the exact grants.