Examples
A few self-contained examples to get you started quickly.
The quickstarts use local files so they run without external services. The pipeline and connector snippets show the intended deployment setup: the same connector shapes can point at remote object storage and managed databases, with credentials injected by the shell, CI runner, container runtime, or scheduler.
Quickstarts
Python
Run a tiny ETL in Python using the sample data shipped in this repo.
python examples/quickstart.py
What it does:
Extracts examples/data/sample.json
Filters to age > 25 and selects name + email
Validates fields exist
Writes temp/sample_output.json
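The steps above can be sketched in plain Python using only the standard library. This illustrates the data flow and is not the code in examples/quickstart.py; the inline records, the function name run_quickstart, and the temporary working directory are assumptions made so the sketch runs anywhere.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for examples/data/sample.json (assumed record shape).
SAMPLE = [
    {"name": "Ada", "email": "ada@example.com", "age": 36},
    {"name": "Bo", "email": "bo@example.com", "age": 22},
    {"name": "Cy", "email": "cy@example.com", "age": 41},
]
FIELDS = ("name", "email")


def run_quickstart(source: Path, target: Path) -> list[dict]:
    # Extract: load the JSON array of records.
    records = json.loads(source.read_text(encoding="utf-8"))
    # Filter to age > 25, then select name + email.
    selected = [
        {key: record.get(key) for key in FIELDS}
        for record in records
        if record.get("age", 0) > 25
    ]
    # Validate: every selected record must carry both fields.
    for row in selected:
        missing = [key for key in FIELDS if row[key] is None]
        if missing:
            raise ValueError(f"record is missing fields: {missing}")
    # Write the result as JSON, creating the output directory if needed.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(selected, indent=2), encoding="utf-8")
    return selected


workdir = Path(tempfile.mkdtemp())
src = workdir / "sample.json"
src.write_text(json.dumps(SAMPLE), encoding="utf-8")
out = workdir / "temp" / "sample_output.json"
result = run_quickstart(src, out)
print(result)
```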
CLI
Try similar steps with the CLI:
# Show version and help
etlplus --version
etlplus --help
# Transform the sample data and write JSON
etlplus transform \
  --operations '{"filter": {"field": "age", "op": "gt", "value": 25}, "select": ["name", "email"]}' \
  examples/data/sample.json \
  temp/sample_output.json
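Hand-writing JSON inside shell quotes is easy to get wrong. As a minimal sketch, the same command line can be assembled in Python so that the --operations value is serialized by json.dumps and quoted by shlex. The flags and paths are copied from the CLI example above; the snippet only builds the command and does not run it.

```python
import json
import shlex

# The same operations as the CLI example, written as a Python dict.
operations = {
    "filter": {"field": "age", "op": "gt", "value": 25},
    "select": ["name", "email"],
}

argv = [
    "etlplus",
    "transform",
    "--operations",
    json.dumps(operations),
    "examples/data/sample.json",
    "temp/sample_output.json",
]

# shlex.join quotes each argument for a POSIX shell.
command = shlex.join(argv)
print(command)
```

If you prefer to skip the shell entirely, the argv list can be passed as-is to subprocess.run, which avoids quoting concerns altogether.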
Pipelines
For larger workflows, author a pipeline YAML and run it with the built-in etlplus run command, or
integrate the same config into your own orchestration or helper script.
Authoring: see the Pipeline Authoring Guide at docs/pipeline-guide.md and the example examples/configs/pipeline.yml.
Runner internals and Python entrypoint: see etlplus.ops.run docstrings and docs/pipeline-guide.md.
Python
from etlplus.ops.run import run as run_job

result = run_job(
    job="file_to_file_customers",
    config_path="examples/configs/pipeline.yml",
)
print(result["status"], result.get("records"))
CLI
# List jobs defined in a pipeline file
etlplus check --config examples/configs/pipeline.yml --jobs
# Show a pipeline summary (name, version, sources, targets, jobs)
etlplus check --config examples/configs/pipeline.yml --summary
# Run a specific job end-to-end
etlplus run --config examples/configs/pipeline.yml --job file_to_file_customers
Cloud Database Connector Snippets
Use the same additive type: database connector shape for both cloud database providers. Local
Postgres, SQLite, and localhost DSNs are useful development fixtures, but managed databases with
runtime-injected credentials are the expected production path.
BigQuery
pip install -e ".[database-bigquery]"
sources:
  - name: warehouse_events_bigquery
    type: database
    provider: bigquery
    project: analytics-project
    dataset: warehouse
    table: events
Typical runtime environment:
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/etlplus.json"
export GOOGLE_CLOUD_PROJECT="analytics-project"
Snowflake
pip install -e ".[database-snowflake]"
sources:
  - name: warehouse_events_snowflake
    type: database
    provider: snowflake
    account: acme.us-east-1
    database: ANALYTICS
    schema: PUBLIC
    warehouse: TRANSFORMING
    table: EVENTS
Typical runtime environment:
export SNOWFLAKE_USER="etlplus"
export SNOWFLAKE_PASSWORD="${SNOWFLAKE_PASSWORD}"
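Because credentials arrive through the environment, a scheduled job may want to fail fast when a variable is missing. The following is a minimal preflight sketch, not part of etlplus: the helper missing_env and the REQUIRED_ENV table are hypothetical, and the variable names are simply the ones shown in the snippets above. Your deployment may need a different set (for example, BigQuery can authenticate without GOOGLE_APPLICATION_CREDENTIALS in some environments).

```python
import os
from collections.abc import Mapping

# Variable names taken from the snippets above; which ones a given
# deployment actually requires is an assumption to adjust.
REQUIRED_ENV = {
    "bigquery": ("GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_CLOUD_PROJECT"),
    "snowflake": ("SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD"),
}


def missing_env(provider: str, environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_ENV[provider] if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
fake_env = {"SNOWFLAKE_USER": "etlplus"}
missing = missing_env("snowflake", fake_env)
print(missing)
```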
Remote Object Storage Snippets
Use the same type: file connector shape for both local paths and remote object storage.
Local paths are convenient for quickstarts; remote object storage is the expected target for
shared or scheduled pipelines.
sources:
  - name: landing_customers
    type: file
    format: csv
    path: "s3://acme-landing/customers/customers.csv"
targets:
  - name: curated_customers
    type: file
    format: json
    path: "azure-blob://analytics/customers/curated/customers.json"
Typical runtime environment:
export AWS_PROFILE="etlplus-dev"
export AZURE_STORAGE_ACCOUNT_URL="https://analytics.blob.core.windows.net"
export AZURE_STORAGE_CREDENTIAL="${AZURE_STORAGE_CREDENTIAL}"
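One way to see why a single path field can cover both local and remote targets is that the URI scheme carries the distinction. The sketch below splits a path with the standard library; it is an illustration only and makes no claim about how etlplus resolves paths internally. The helper describe_path and its result keys are hypothetical.

```python
from urllib.parse import urlsplit


def describe_path(path: str) -> dict:
    """Classify a connector path as local or remote by its URI scheme."""
    parts = urlsplit(path)
    if not parts.scheme:
        # No scheme: treat the value as a local filesystem path.
        return {"kind": "local", "path": path}
    return {
        "kind": "remote",
        "scheme": parts.scheme,
        "container": parts.netloc,  # bucket or container name
        "key": parts.path.lstrip("/"),  # object key within it
    }


local = describe_path("examples/data/sample.json")
s3 = describe_path("s3://acme-landing/customers/customers.csv")
azure = describe_path("azure-blob://analytics/customers/curated/customers.json")
print(local["kind"], s3["scheme"], azure["scheme"])
```

Note that a Windows drive-letter path such as C:\data\file.csv would be read as having scheme "c", so a real implementation would need an extra check for that case.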
Both database connector shapes (BigQuery and Snowflake) can also use connection_string directly
when you already have a provider-specific SQLAlchemy-style DSN.
Design notes on config typing and merges:
Mapping inputs, dict outputs, and merge semantics are documented in docs/pipeline-guide.md#design-notes-mapping-inputs-dict-outputs.
Typing philosophy (TypedDicts as editor hints, permissive runtime): CONTRIBUTING.md#typing-philosophy.
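As a rough illustration of what "TypedDicts as editor hints, permissive runtime" and "mapping inputs, dict outputs" can look like in practice, here is a small sketch. The names FileConnectorHint and to_plain_dict are hypothetical and do not come from etlplus; the linked documents remain the authority on the real types and merge behavior.

```python
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any, TypedDict


class FileConnectorHint(TypedDict, total=False):
    """Hypothetical editor hint; keys mirror the YAML snippets in this document."""

    name: str
    type: str
    format: str
    path: str


def to_plain_dict(raw: Mapping[str, Any]) -> dict[str, Any]:
    """Accept any mapping (dict, read-only proxy, ...) and return a new dict."""
    return dict(raw)


hinted: FileConnectorHint = {
    "name": "landing_customers",
    "type": "file",
    "format": "csv",
    "path": "s3://acme-landing/customers/customers.csv",
}

# A read-only mapping is accepted as input; the output is an ordinary dict.
plain = to_plain_dict(MappingProxyType(hinted))

# Permissive runtime: a key the hint does not declare is not rejected here.
loose = to_plain_dict({"name": "x", "type": "file", "unknown_option": True})
print(type(plain).__name__, sorted(loose))
```

The TypedDict gives editors and type checkers something to complete and check against, while nothing at runtime enforces it, which is the trade-off the typing philosophy describes.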