The contract¶

A consumer does not care how Modak places data. They care about one promise: ingest data anytime, anywhere on the time axis, and update or delete it the same way, whether it landed yesterday or two years ago. This page states that promise per mode and names the path each operation takes. Everything here follows from two facts: the heap takes plain DML, and anything only the lake holds goes through modak.delta or bulk ingest.

What each mode stores¶

	Postgres holds	Iceberg holds	Overlap
Tiered	recent partitions (`tier_key >= T`)	everything below `T`, minus lake retention	none, the cut-line splits exactly
Tiered + keep-heap	everything	everything below `T`	everything below `T`, the delta reconciles
Fully mirrored	everything	trailing full copy (frontier `F`)	total, the lake trails by the CDC lag
Mirrored + heap retention	a bounded window	full history	the window, once mirrored

Operations¶

Every cell is a supported path today.

Operation	Tiered	Tiered + keep-heap	Fully mirrored	Mirrored + heap retention
Insert, recent	plain `INSERT`	plain `INSERT`	plain `INSERT`	plain `INSERT`
Insert, historical	plain `INSERT` (extension) or `modak_upsert()` or a connector, to the delta	plain `INSERT`, trigger-mirrored to the delta	plain `INSERT`	same as tiered
Update, recent	plain `UPDATE`	plain `UPDATE`	plain `UPDATE`	plain `UPDATE`
Update, historical	plain `UPDATE` (extension) or `modak_upsert()` or a connector, to the delta	plain `UPDATE`, trigger-mirrored	plain `UPDATE`	same as tiered
Delete, recent	plain `DELETE`	plain `DELETE`	plain `DELETE`	plain `DELETE`
Delete, historical	plain `DELETE` (extension) or `modak_delete()` or a connector, tombstone to the delta	plain `DELETE`, trigger-mirrored	plain `DELETE`	same as tiered
Bulk historical load	`modak-worker ingest` (Parquet or records), upsert semantics	plain `COPY` to the heap (the trigger mirrors row by row)	plain `COPY`	`modak-worker ingest`
Continuous labeled batches	Stream Load, routed per row, exactly once per label	Stream Load for recent rows (historical rows land in the delta only, not the heap)	Stream Load, all to the heap	Stream Load
Read	one seam-split view	one seam-split view	heap, or opt-in hybrid	one seam-split view

"Historical" means rows the heap no longer holds: below T on a tiered table, below the drop boundary with heap retention. On a fully mirrored table the heap holds everything, so the distinction disappears and every write is plain DML. A keep-heap table's heap also holds everything, so its historical writes are plain heap DML too: the cold-mirror trigger on tiered partitions carries each change into modak.delta, and lake-side paths (ingest, historical Stream Load rows) should be avoided because they bypass the heap copy.

The one hard boundary is explicit expiry. A tiered table with --lake-retention has deleted rows below the retention line R from the lake on purpose, and every write path rejects rows below it rather than silently resurrecting data the policy removed.

API surfaces¶

Capability	SQL + extension	SQL, no extension	Spark	HTTP / library	CLI
Consistent two-tier read	transparent, any query	recent data only	`ModakSpark.read`	-	-
Routed insert	transparent, plain `INSERT` or `COPY`	recent data only	`ModakSpark.write`	Stream Load, per row	-
Routed update / delete	transparent, plain `UPDATE` / `DELETE`	recent data only	`ModakSpark.write` / `ModakSpark.delete`	Stream Load (upsert)	-
Bulk historical load	-	-	-	-	`modak-worker ingest`
Labeled exactly-once batches	-	-	-	Stream Load	-

Without the extension, plain SQL still covers everything on a fully mirrored table and all recent data elsewhere. Historical writes need one of the routed surfaces. The seam protocol is public, so any engine can implement what Spark implements.

Transparent DML has a short list of shapes it rejects loudly instead of routing, such as INSERT ... RETURNING for cold rows and UPDATE ... FROM that may reach them. The SQL reference lists them all.

To see which path a specific statement takes without running it, pass it to modak_explain. The console's SQL playground has an Explain button that shows the same report, and SET modak.explain = on makes a session narrate every routing decision as a NOTICE.

One write semantic for cold data¶

Every historical write path, row-at-a-time delta or bulk ingest, has the same meaning: upsert by primary key, newest wins, deletes as tombstones. The two paths differ only in shape and visibility. The delta is for row-scale corrections and is readable the moment it commits. Bulk ingest is for volume, readable when the pinned snapshot advances. Nobody chooses between "importing" and "correcting". Re-ingesting a corrected batch is just ingesting it.

Who folds the delta¶

Delta rows are visible to every reader the moment they commit, and the fold into Iceberg is a background concern. On tiered tables the worker folds each cycle. On mirrored tables with heap retention the mirror pump folds between replication batches, keeping one writer per lake table. Fully mirrored tables have nothing to fold.