Reading
Every read of a registered table resolves to the same shape: the hot heap above the cut-line, unioned with the pinned Iceberg snapshot merged against the delta. There are three ways to ask for it.
Transparent reads (default)
With the extension preloaded (shared_preload_libraries = 'pg_duckdb, modak'),
a planner hook rewrites any plain SELECT that touches a registered table
(predicates, joins, aggregates, CTEs, subqueries) into the two-tier scan, and
pins (T, S) for the transaction:
SELECT * FROM public.events WHERE event_time < 100; -- just works, both tiers
The substitution follows Postgres' own view-expansion recipe, so locking and
permission checks still target the original relation. DML is never rewritten,
and SELECT ... FOR UPDATE keeps plain heap semantics.
SET modak.transparent_reads = off; -- restore raw heap semantics per session
The worker daemon always connects with transparent reads off, since it reasons about the physical heap.
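To see what the hook changes, the same query can be run with the setting on and off. This is a minimal sketch, assuming public.events is a registered table that already has rows below the cut-line; the two counts differ only in that case.

```sql
-- Default: the planner hook rewrites the scan to span both tiers.
SET modak.transparent_reads = on;
SELECT count(*) FROM public.events;

-- Raw heap only: rows that exist only in the lake are not counted.
SET modak.transparent_reads = off;
SELECT count(*) FROM public.events;

-- Return to the configured default for the rest of the session.
RESET modak.transparent_reads;
```

Each statement runs in its own transaction here, so each SELECT takes its own pin.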
Mirrored tables: heap or hybrid
A mirrored table's heap is complete, so plain scans are untouched by default. Reads cost exactly what they did before Modak. Sessions can opt into serving the bulk of a scan from the lake instead:
SET modak.mirrored_reads = 'hybrid';
A hybrid read first waits (bounded by modak.mirror_wait_ms, default 5000 ms)
for the mirror frontier to pass the session's current WAL position, which
proves everything the query's snapshot can see is in the lake. It then splits
the union at max(tier_key) - modak.hybrid_lag. On timeout it falls back to
the heap with a NOTICE. Mirrored tables registered with retention always
read two-tier, because the heap below the retention line is gone.
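A session that opts in might look like the sketch below. The values are illustrative only, and it assumes public.events is a mirrored table whose tier key is the integer column event_time. With a lag of 1000 and max(tier_key) = 50000, the split lands at 50000 - 1000 = 49000: keys above that point are read from the heap and the rest from the lake. Whether the boundary key itself goes to the heap or the lake is not specified here.

```sql
-- Opt this session into hybrid reads on mirrored tables.
SET modak.mirrored_reads = 'hybrid';

-- Wait at most 2 seconds for the mirror frontier (illustrative value).
SET modak.mirror_wait_ms = 2000;

-- Keep the newest 1000 tier-key units on the heap side (illustrative value).
SET modak.hybrid_lag = 1000;

-- If the frontier does not catch up in time, this falls back to the heap
-- and raises a NOTICE.
SELECT count(*) FROM public.events WHERE event_time >= 40000;
```

Because a mirrored heap is complete, the result should be the same in either mode. Only where the rows are read from changes.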
The explicit protocol
What the hook does implicitly, you can drive by hand. This is useful for tooling, or for debugging exactly what a query sees:
BEGIN;
SELECT pin_id FROM modak_read_begin('public.events'::regclass) \gset
SELECT modak_rewrite_scan('public.events'::regclass) AS scan_sql \gset
-- run :scan_sql, e.g. SELECT count(*) FROM ( :scan_sql ) q;
SELECT modak_read_end(:pin_id);
COMMIT;
modak_read_begin pins (T, S) and returns the pin. modak_rewrite_scan
renders the exact union SQL for the pinned view. modak_read_end releases the
pin, and abort releases it automatically, since pins are rows in
modak.read_pins and roll back with the transaction.
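The abort path can be checked from psql with a sketch like the following. It uses only the function and table named above. The column layout of modak.read_pins is not described here, so the queries use count(*) rather than naming columns.

```sql
BEGIN;
SELECT pin_id FROM modak_read_begin('public.events'::regclass) \gset

-- Inside the transaction, the count includes the pin just taken.
SELECT count(*) FROM modak.read_pins;

-- Abort instead of calling modak_read_end.
ROLLBACK;

-- The pin row rolled back with the transaction, so it is no longer counted.
SELECT count(*) FROM modak.read_pins;
```

\gset is a psql meta-command. Other clients need to capture pin_id from the result set themselves.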
The same contract works from outside Postgres. Any engine that can read the catalog and scan Iceberg at a pinned snapshot can produce the identical consistent view. The seam protocol page specifies it.
Execution
The cold branch runs in DuckDB via pg_duckdb (iceberg_scan on the pinned
metadata_location). The hot branch is a plain heap scan. DuckDB never
decides what is current. Modak resolves consistency before execution and hands
DuckDB a fully specified plan.
Session GUCs
| GUC | Default | Meaning |
|---|---|---|
| modak.transparent_reads | on | Rewrite SELECTs on registered tables to span both tiers |
| modak.mirrored_reads | 'heap' | 'hybrid' opts into two-tier reads on mirrored (no-retention) tables |
| modak.mirror_wait_ms | 5000 | Bounded wait for the mirror frontier before a hybrid read |
| modak.hybrid_lag | 0 | Hybrid seam margin, in tier-key units, kept on the heap side |
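The current values can be inspected through the standard pg_settings view, and a setting can be scoped to one transaction with SET LOCAL. This is a sketch using stock Postgres features. It assumes the extension is loaded so that its GUCs are registered, and that public.events is a mirrored table.

```sql
-- List every Modak setting with its current and built-in default value.
SELECT name, setting, boot_val
FROM pg_settings
WHERE name LIKE 'modak.%'
ORDER BY name;

-- Change a setting for one transaction only; it reverts at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL modak.mirrored_reads = 'hybrid';
SELECT count(*) FROM public.events;
COMMIT;
```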