Advanced configuration#
Extra YAML files, OmegaConf features, and how to debug merged config.
Multiple files#
configs/
├── config.yaml
├── config_dev.yaml
├── config_prod.yaml
└── config_large.yaml
Pick one at launch:
python pipeline.py
python pipeline.py --config configs/config_prod.yaml
Example split:
configs/config_dev.yaml:
stages:
enabled_stages:
- create_test_data
- process_data
pipeline:
mode: "dev"
build_folder: "//tmp/my_pipeline/dev/build"
configs/config_prod.yaml:
stages:
enabled_stages:
- create_input
- process_data
- validate_output
- upload_results
pipeline:
mode: "prod"
build_folder: "//home/production/my_pipeline/build"
Sample repo layout: 08_multiple_configs.
OmegaConf features#
The framework uses OmegaConf. You can lean on:
Interpolation inside YAML:
base_path: //tmp/my_pipeline
client:
input_table: ${base_path}/input
output_table: ${base_path}/output
Environment substitution (OmegaConf 2.x
oc.envresolver; enable resolvers where you construct config if needed):
pipeline:
build_folder: ${oc.env:BUILD_FOLDER,//tmp/default/build}
If you do not use OmegaConf resolvers in your merge path, set build_folder in the YAML file you pass to --config or export values and generate YAML in CI.
Manual merge in tooling or tests:
from omegaconf import OmegaConf
base = OmegaConf.load("configs/config.yaml")
override = OmegaConf.load("configs/config_prod.yaml")
merged = OmegaConf.merge(base, override)
Validation the framework performs#
Expect hard failures when:
enabled_stagesmissing or emptybuild_foldermissing while a stage needs uploadStage name in YAML does not match a
stages/<name>/directoryconfig.yamlmissing for an enabled stageMalformed YT paths where the client validates them
Examples#
Missing stages list:
# bad
stages: {}
# good
stages:
enabled_stages:
- my_stage
Bad build path (must be absolute YT style with // where required):
# bad
pipeline:
build_folder: "tmp/build"
# good
pipeline:
build_folder: "//tmp/build"
“Inheritance” between pipeline and stage YAML#
There is no automatic deep merge of arbitrary keys: the pipeline file supplies pipeline.* and the list of stages; each stage file supplies that stage’s subtree. Some operation helpers read pipeline.build_folder implicitly—see operation docs for the precise merge rules per operation type.
Debugging config#
Dump what the stage sees:
from omegaconf import OmegaConf
print(OmegaConf.to_yaml(self.config))
print(OmegaConf.to_yaml(self.config.client.operations.map))
Guard optional keys:
if "build_folder" in self.deps.pipeline_config.pipeline:
bf = self.deps.pipeline_config.pipeline.build_folder