Scheduling

ETLPlus supports portable schedule definitions in pipeline config and keeps recurring invocation separate from execution.

Use etlplus schedule to inspect configured schedules, emit crontab or systemd helper snippets, or dispatch currently due schedules once with --run-pending.

Example Configuration

See examples/configs/scheduling.yml for a complete example. The relevant schedule shape is:

history:
  enabled: true
  backend: sqlite

schedules:
  - name: hourly_sync
    interval:
      minutes: 60
    target:
      job: sync_customers
    backfill:
      enabled: true
      max_catchup_runs: 2
      start_at: "2026-05-01T00:00:00Z"

  - name: nightly_all
    cron: "0 2 * * *"
    timezone: UTC
    target:
      run_all: true

  - name: paused_rebuild
    cron: "30 6 * * 1"
    paused: true
    target:
      job: sync_customers

Schedule fields:

  • cron: five-field cron expression for calendar-based schedules.

  • interval.minutes: fixed interval, in minutes, for interval-based schedules.

  • target.job or target.run_all: select a single job or a full DAG run.

  • paused: keep the schedule defined without dispatching it.

  • backfill.enabled: allow bounded catch-up when triggers were missed.

  • backfill.max_catchup_runs: limit how many missed runs can dispatch in one pass.

  • backfill.start_at: earliest timestamp eligible for bounded replay.
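The bounded catch-up described by the backfill fields can be sketched in a few lines of Python. This is an illustration only, not ETLPlus' implementation: the function name plan_catchup is hypothetical, and the sketch assumes that interval triggers are replayed oldest-first and that the first eligible trigger is backfill.start_at itself.

```python
from datetime import datetime, timedelta, timezone


def plan_catchup(start_at, interval_minutes, last_completed, now, max_catchup_runs):
    """Split missed interval triggers into (dispatch_now, still_pending)."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    step = timedelta(minutes=interval_minutes)
    # Assumption: resume one interval after the last completed trigger,
    # but never earlier than backfill.start_at.
    first = start_at if last_completed is None else max(start_at, last_completed + step)
    due = []
    current = first
    while current <= now:
        due.append(current)
        current += step
    # Assumption: oldest triggers are dispatched first; the rest wait for
    # the next pass, mirroring backfill.max_catchup_runs.
    return due[:max_catchup_runs], due[max_catchup_runs:]


# Hypothetical values modeled on the hourly_sync example.
start = datetime(2026, 5, 1, 0, 0, tzinfo=timezone.utc)
dispatch, pending = plan_catchup(
    start_at=start,
    interval_minutes=60,
    last_completed=start + timedelta(hours=2),
    now=start + timedelta(hours=6, minutes=30),
    max_catchup_runs=2,
)
```

With these inputs, four triggers are due (03:00 through 06:00 UTC); two are selected for this pass and two remain pending.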

Inspect And Emit Helpers

Inspect all configured schedules:

etlplus schedule --config examples/configs/scheduling.yml

Inspect schedules plus persisted local scheduler state:

etlplus schedule --config examples/configs/scheduling.yml --show-state

For a healthy schedule, --show-state output typically includes the last attempted trigger, the last completed trigger, and the last recorded run ID and status. A schedule that recently failed before completion also shows last_error_type and last_error_message until its next successful completion clears those diagnostics.

Emit a crontab snippet for one schedule:

etlplus schedule \
  --config examples/configs/scheduling.yml \
  --schedule nightly_all \
  --emit crontab

Emit a systemd timer/service pair for one schedule:

etlplus schedule \
  --config examples/configs/scheduling.yml \
  --schedule hourly_sync \
  --emit systemd

These helper outputs keep recurring invocation delegated to OS tooling or CI while ETLPlus owns the portable schedule model.

Run Due Schedules Once

Dispatch all currently due schedules one time:

etlplus schedule --config examples/configs/scheduling.yml --run-pending

Forward structured lifecycle events from the underlying etlplus run invocations:

etlplus schedule \
  --config examples/configs/scheduling.yml \
  --run-pending \
  --event-format jsonl

When a bounded catch-up batch stops early because one due run raises before completion, the command exits nonzero and emits a partial JSON summary. That summary includes the additive fields due_count, attempted_count, completed_count, and pending_count, plus pending_runs, which lists the due triggers left eligible for replay.

Overlap-skipped runs are also reported in pending_runs with reason: overlap. They are not consumed; the next invocation can dispatch them after the existing per-schedule lock is released.
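A wrapper script might inspect the summary along these lines. Only the field names due_count, attempted_count, completed_count, pending_count, pending_runs, and the overlap reason come from the description above. The exact JSON layout, the schedule key inside each pending entry, and the sample values are assumptions made for illustration.

```python
import json

# Hypothetical summary payload; the real layout may differ.
SAMPLE_SUMMARY = """
{
  "due_count": 3,
  "attempted_count": 2,
  "completed_count": 1,
  "pending_count": 2,
  "pending_runs": [
    {"schedule": "hourly_sync", "reason": "overlap"},
    {"schedule": "hourly_sync"}
  ]
}
"""


def summarize(raw):
    """Classify pending runs from a --run-pending summary string."""
    summary = json.loads(raw)
    pending = summary.get("pending_runs", [])
    overlap = [run for run in pending if run.get("reason") == "overlap"]
    return {
        # No pending runs means nothing is left to replay.
        "complete": summary.get("pending_count", 0) == 0,
        # Overlap-skipped runs only need the lock to be released.
        "overlap_skipped": len(overlap),
        # Everything else is worth investigating before the next pass.
        "other_pending": len(pending) - len(overlap),
    }


report = summarize(SAMPLE_SUMMARY)
```

Separating overlap-skipped runs from other pending runs lets an operator alert only on the cases that will not resolve on their own.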

This mode is intentionally one-shot. The expected operating model is to invoke it from cron, systemd, CI, or another external trigger rather than to keep a resident ETLPlus scheduler process running continuously.
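As a rough sketch of that operating model, an external trigger could call a small Python wrapper such as the one below. The helper names build_command and run_once are hypothetical. The flags are the ones shown earlier in this section, and the sketch assumes the etlplus executable is on PATH and that its output arrives on standard output.

```python
import subprocess


def build_command(config_path, event_format=None):
    """Assemble the one-shot dispatch command as an argument list."""
    cmd = ["etlplus", "schedule", "--config", config_path, "--run-pending"]
    if event_format is not None:
        # Optional: forward structured lifecycle events, e.g. "jsonl".
        cmd += ["--event-format", event_format]
    return cmd


def run_once(config_path, event_format=None):
    """Run one dispatch pass and return (exit_code, captured_stdout)."""
    result = subprocess.run(
        build_command(config_path, event_format),
        capture_output=True,
        text=True,
        check=False,  # a nonzero exit is expected when a batch stops early
    )
    return result.returncode, result.stdout
```

The caller decides what to do with a nonzero exit code, for example logging the captured output and letting the next cron or timer tick retry the pending triggers.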

Observability And State

--run-pending reuses the existing etlplus run execution path.

That means scheduled runs keep the same stable contracts:

  • Lifecycle events still use etlplus.event.v1

  • Local history still records the run through the same backend (SQLite by default, JSONL as the fallback)

  • Additive scheduler metadata is attached to events and persisted under result_summary.scheduler

The local scheduler also keeps minimal trigger state under ${ETLPLUS_STATE_DIR:-~/.etlplus}:

  • scheduler-state.json stores the last attempted and last completed trigger per schedule, plus the last recorded outcome metadata

  • scheduler-locks/ prevents overlapping dispatch for the same schedule

  • Stale lock files left behind by dead scheduler processes are reclaimed automatically on the next matching dispatch attempt
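For scripts that need to read this state, the directory lookup can be mirrored in Python as sketched below. The helper names state_dir and load_scheduler_state are hypothetical, and the sketch makes no assumption about the contents of scheduler-state.json beyond it being JSON.

```python
import json
import os
from pathlib import Path


def state_dir(environ=None):
    """Resolve ${ETLPLUS_STATE_DIR:-~/.etlplus} the way a POSIX shell would."""
    env = os.environ if environ is None else environ
    # ":-" falls back when the variable is unset OR empty, hence "or".
    raw = env.get("ETLPLUS_STATE_DIR") or "~/.etlplus"
    return Path(raw).expanduser()


def load_scheduler_state(environ=None):
    """Return the parsed scheduler state, or None if no state file exists yet."""
    path = state_dir(environ) / "scheduler-state.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

Treat the file as read-only from outside ETLPlus; the scheduler owns its contents and its locking.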

Trigger consumption rules:

  • A dispatch skipped because the same schedule still holds its lock does not consume the due trigger; the run is listed as pending with reason: overlap

  • Paused schedules do not consume or create due triggers

  • Callback exceptions record an attempted trigger but leave the due time eligible for replay on the next invocation

  • Callback exceptions also persist the latest exception type and message summary in scheduler-state.json

  • The next successful completion for that schedule clears the stale exception diagnostics and keeps only the latest healthy completion metadata

  • Handled run outcomes that return normally consume the trigger and update the completed timestamp, even when the underlying run exits nonzero
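The rules above can be condensed into a small decision table. The sketch below is a reading aid, not ETLPlus code: the outcome labels and the TriggerEffect type are hypothetical. Whether an overlap-skipped dispatch records an attempted trigger, and which reason (if any) is reported for a replayable exception, are not stated above, so the values chosen here are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TriggerEffect:
    attempted_recorded: bool  # is the last attempted trigger updated?
    consumed: bool            # is the due trigger used up?
    pending_reason: Optional[str] = None


def effect_of(outcome):
    """Map a dispatch outcome label to its effect on the due trigger."""
    if outcome == "paused":
        # Paused schedules neither consume nor create due triggers.
        return TriggerEffect(attempted_recorded=False, consumed=False)
    if outcome == "overlap":
        # Assumption: an overlap skip does not count as an attempt.
        return TriggerEffect(False, False, pending_reason="overlap")
    if outcome == "exception":
        # Attempt is recorded; the due time stays eligible for replay.
        return TriggerEffect(attempted_recorded=True, consumed=False)
    if outcome == "handled":
        # Returned normally, even if the underlying run exited nonzero.
        return TriggerEffect(attempted_recorded=True, consumed=True)
    raise ValueError(f"unknown outcome: {outcome!r}")
```

Only the handled outcome consumes a trigger, which is why a run that fails with a nonzero exit is not retried automatically while a run that raises is.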

Backing-Service Posture

Scheduling does not change ETLPlus’ backing-service model.

The same schedule surface can target local paths, managed databases, or remote object-storage URIs. The example config uses s3://... and azure-blob://... endpoints deliberately to show that remote backing services remain first-class.

Local filesystem paths, Docker Compose, localhost Postgres, and Adminer are still useful for development, but they should be treated as convenience tooling rather than the canonical operating model.