Configuration

The worker and console are configured entirely through environment variables, so the same binary can be pointed at any deployment shape.

Connection

Env var | Default | Meaning
MODAK_PG_URL | jdbc:postgresql://localhost:5432/postgres | Catalog + hot tier (JDBC URL, passed verbatim; TLS parameters go here)
MODAK_PG_USER / MODAK_PG_PASSWORD | postgres / empty | Credentials
MODAK_WAREHOUSE | /tmp/modak-warehouse | Warehouse root (s3://... or a filesystem path)
MODAK_S3_ENDPOINT | unset | S3 endpoint. Unset = local filesystem or real AWS
MODAK_S3_ACCESS_KEY / MODAK_S3_SECRET_KEY | unset | S3 credentials (unset on AWS = default chain)
MODAK_S3_REGION | empty | S3 region
MODAK_S3_SSL | false | TLS to the S3 endpoint
MODAK_CREDENTIALS_<REF> | unset | Named credential set for storage profiles: key=value pairs, ;-separated, merged into the lake config of any profile created with --credentials <ref>. See Storage profiles

The MODAK_WAREHOUSE/MODAK_S3_* settings define the default warehouse. Deployments with more than one warehouse layer storage profiles on top.
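
The key=value, ;-separated shape of a MODAK_CREDENTIALS_<REF> value can be illustrated with a small parser. This is only a sketch of the format, not the worker's own parsing code: the function name, the example keys, and the assumption that the <REF> suffix is looked up verbatim are all hypothetical.

```python
import os


def parse_credential_set(ref, environ=None):
    """Parse MODAK_CREDENTIALS_<ref> into a dict of key/value pairs.

    Assumption: the suffix is used exactly as given; the real mapping from
    --credentials <ref> to the variable name may differ.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(f"MODAK_CREDENTIALS_{ref}", "")
    pairs = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled separators
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value.strip()
    return pairs


if __name__ == "__main__":
    # Hypothetical keys, chosen only to show the shape of the value.
    env = {"MODAK_CREDENTIALS_ARCHIVE": "access_key=AKIA;secret_key=s3cr3t="}
    print(parse_credential_set("ARCHIVE", env))
```

An unset variable yields an empty dict here, which matches the documented default of "unset".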

Worker behaviour

Env var | Default | Meaning
MODAK_CYCLE_INTERVAL_SECONDS | 10 | Scheduler interval
MODAK_TIERING_LAG | 0 | Keep partitions hot until max(tier_key) - lag passes them
MODAK_RECLAIM_LAG | same as tiering lag | Extra (ceiling-based) lag before a tiered partition is dropped
MODAK_COMPACTION_BATCH | 1000 | Max delta rows folded per cycle
MODAK_MIRROR_BATCH | 500 | Mirror pump: rows per Iceberg commit
MODAK_MIRROR_FLUSH_MILLIS | 2000 | Mirror pump: max time before a partial batch commits
MODAK_MIRROR_MAX_BUFFERED_ROWS | 100000 | Memory bound: larger transactions fold intermediately (invisible to readers)
MODAK_DELTA_BACKLOG_WARN_ROWS | 100000 | Per-table delta backlog WARN threshold (ERROR at 4x)
MODAK_CAMPAIGN_INTERVAL_SECONDS | 5 | Standby retry interval for the leader lease
MODAK_SLOT_WARN_BYTES | 1073741824 (1 GiB) | Retained-WAL WARN threshold for the slot guard
MODAK_PREMAKE_PARTITIONS | 2 | Empty partition widths kept ahead of each table's write frontier. 0 disables premake
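
Two of these settings describe simple threshold rules, which a short sketch may make concrete. The code below is an illustration under stated assumptions rather than the worker's logic: it assumes the lag is expressed in the same units as tier_key, that a partition counts as "passed" once its exclusive upper bound is at or below max(tier_key) - lag, and that the WARN and ERROR thresholds are inclusive. All function names are hypothetical.

```python
def tiering_cutoff(max_tier_key, tiering_lag=0):
    """Cutoff described for MODAK_TIERING_LAG: max(tier_key) - lag."""
    return max_tier_key - tiering_lag


def is_tierable(partition_upper_bound, max_tier_key, tiering_lag=0):
    # Assumption: "passes" means the partition's exclusive upper bound
    # is at or below the cutoff.
    return partition_upper_bound <= tiering_cutoff(max_tier_key, tiering_lag)


def backlog_severity(delta_rows, warn_rows=100_000):
    """Classify a per-table delta backlog: WARN at the threshold, ERROR at 4x."""
    # Assumption: both comparisons are inclusive.
    if delta_rows >= 4 * warn_rows:
        return "ERROR"
    if delta_rows >= warn_rows:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    print(is_tierable(100, max_tier_key=150, tiering_lag=50))
    print(backlog_severity(250_000))
```

With the default lag of 0, a partition becomes tierable as soon as the maximum tier key reaches its upper bound.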

Lake maintenance

These are the worker-wide defaults. Any of the equivalent settings can be overridden per table with modak-worker policy; see Lake maintenance.

Env var | Default | Meaning
MODAK_MAINTENANCE_ENABLED | true | Fleet-wide default for the maintenance pass. Individual tables can enable or disable it via policy
MODAK_MAINTENANCE_INTERVAL_SECONDS | 3600 | How often each table gets a maintenance pass
MODAK_MAINTENANCE_ENGINE | embedded | What executes maintenance plans. embedded runs in the worker; external engines are the extension point
MODAK_LAKE_STATS_INTERVAL_SECONDS | 60 | How often each table's lake health snapshot is refreshed
MODAK_REWRITE_TARGET_BYTES | 134217728 (128 MiB) | Data files smaller than this are bin-pack candidates
MODAK_REWRITE_MIN_INPUT_FILES | 8 | Small files that must accumulate before a rewrite runs
MODAK_SNAPSHOT_RETENTION_HOURS | 24 | Snapshots older than this are expirable
MODAK_SNAPSHOT_MIN_RETAINED | 5 | Snapshots always kept, regardless of age
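
The rewrite and snapshot settings interact in pairs, and a sketch can show how the defaults might combine. This is a simplified model, not the maintenance engine itself: it assumes the "always kept" snapshots are the newest ones, that a rewrite triggers once the count of small files reaches the minimum, and the function names are hypothetical.

```python
def should_rewrite(file_sizes, target_bytes=134_217_728, min_input_files=8):
    """True once enough files smaller than the target have accumulated."""
    small = [size for size in file_sizes if size < target_bytes]
    return len(small) >= min_input_files


def expirable_snapshots(ages_hours, retention_hours=24, min_retained=5):
    """Return indices of expirable snapshots.

    ages_hours must be ordered newest first. Assumption: the newest
    min_retained snapshots are the ones kept regardless of age.
    """
    expirable = []
    for index, age in enumerate(ages_hours):
        if index < min_retained:
            continue  # protected by the minimum-retained rule
        if age > retention_hours:
            expirable.append(index)
    return expirable


if __name__ == "__main__":
    # Seven snapshots, newest first; only those beyond the first five
    # and older than 24 hours are candidates.
    print(expirable_snapshots([1, 5, 10, 20, 30, 40, 50]))
    print(should_rewrite([1024] * 8))
```

Note that the 30-hour-old snapshot at index 4 survives in this model even though it exceeds the retention window, because it is among the five newest.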

Endpoints

Env var | Default | Meaning
MODAK_METRICS_PORT | unset | Headless worker: Prometheus /metrics port. Unset = no endpoint
MODAK_CONSOLE_PORT | 9090 | Console binary: the web console port (includes /metrics)
MODAK_CONSOLE_SQL | true | SQL playground. false disables the query endpoint
MODAK_LOAD_TOKEN | unset | Enables POST /api/load (see Stream load). Unset = no endpoint
MODAK_LOAD_SPOOL_THRESHOLD | 1000 | Cold rows per batch above which a load spools Parquet instead of the delta

Iceberg catalog

By default the lake is path-based: each table lives directly under MODAK_WAREHOUSE with no catalog service. Set MODAK_CATALOG_URI and tables are instead created and loaded through an Iceberg REST catalog as <namespace>.<schema>_<table>. The read path is identical either way: every commit publishes the table's metadata_location into modak.tables, and DuckDB scans that file directly.

Env var | Default | Meaning
MODAK_CATALOG_URI | unset | Iceberg REST catalog endpoint. Unset = path-based
MODAK_CATALOG_WAREHOUSE | unset | Warehouse location the catalog assigns to new tables
MODAK_CATALOG_TOKEN | unset | Bearer token, if the catalog requires one
MODAK_CATALOG_NAMESPACE | modak | Namespace for tables created through the catalog
MODAK_LAKE_FORMAT | iceberg | Lake format plugin id
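
As a rough illustration of the naming rule above, the following sketch derives the lake mode and the catalog identifier from the environment. It is a guess at the shape of the logic, not the actual implementation: the function names are hypothetical, and treating an empty MODAK_CATALOG_URI the same as unset is an assumption.

```python
import os


def lake_mode(environ=None):
    """Return 'rest-catalog' when MODAK_CATALOG_URI is set, else 'path-based'."""
    environ = os.environ if environ is None else environ
    # Assumption: an empty string counts as unset.
    return "rest-catalog" if environ.get("MODAK_CATALOG_URI") else "path-based"


def catalog_identifier(schema, table, environ=None):
    """Build <namespace>.<schema>_<table> using MODAK_CATALOG_NAMESPACE."""
    environ = os.environ if environ is None else environ
    namespace = environ.get("MODAK_CATALOG_NAMESPACE", "modak")
    return f"{namespace}.{schema}_{table}"


if __name__ == "__main__":
    # Hypothetical endpoint and table names, for illustration only.
    env = {"MODAK_CATALOG_URI": "https://catalog.example/api"}
    print(lake_mode(env), catalog_identifier("public", "events", env))
```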

Lake properties passthrough

Everything else a format supports flows through MODAK_LAKE_PROPS: semicolon-separated key=value pairs handed verbatim to the format plugin. The Iceberg plugin interprets three prefixes:

  • iceberg.catalog.<key>: an Iceberg catalog property (OAuth2, custom headers, io-impl, ...).
  • iceberg.table.<key>: a table property stamped at creation (write.*, commit.*, history.* families).
  • hadoop.<key>: passed to the Hadoop configuration backing file IO.

Example:

MODAK_LAKE_PROPS='iceberg.table.write.parquet.compression-codec=zstd;iceberg.catalog.oauth2-server-uri=https://idp/token'
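
To make the prefix routing concrete, here is a minimal sketch that splits a MODAK_LAKE_PROPS value into the three documented groups. It is not the Iceberg plugin's code: the function name is hypothetical, and collecting keys without a known prefix under "other" is an assumption, since the handling of such keys is not described here.

```python
PREFIXES = ("iceberg.catalog.", "iceberg.table.", "hadoop.")


def split_lake_props(raw):
    """Group semicolon-separated key=value pairs by their documented prefix."""
    groups = {"iceberg.catalog": {}, "iceberg.table": {}, "hadoop": {}, "other": {}}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled separators
        # Split on the first "=" only, so values such as URLs stay intact.
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        for prefix in PREFIXES:
            if key.startswith(prefix):
                # Strip the prefix; the remainder is the property name.
                groups[prefix.rstrip(".")][key[len(prefix):]] = value
                break
        else:
            # Assumption: unprefixed keys are kept aside rather than rejected.
            groups["other"][key] = value
    return groups


if __name__ == "__main__":
    raw = (
        "iceberg.table.write.parquet.compression-codec=zstd;"
        "iceberg.catalog.oauth2-server-uri=https://idp/token"
    )
    print(split_lake_props(raw))
```

Applied to the example value, the codec setting lands in the table group as write.parquet.compression-codec and the OAuth2 URI in the catalog group as oauth2-server-uri.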