Advanced Configuration#
Advanced configuration features including multiple config files, merging, and validation.
Multiple Configuration Files#
You can use multiple configuration files for different environments or scenarios:
configs/
├── config.yaml # Default config
├── config_dev.yaml # Development config
├── config_prod.yaml # Production config
└── config_large.yaml # Large dataset config
Using Different Configs#
Specify the config file via --config flag:
# Use default config
python pipeline.py
# Use specific config
python pipeline.py --config configs/config_prod.yaml
Example: Environment-Specific Configs#
Development config (configs/config_dev.yaml):
stages:
enabled_stages:
- create_test_data
- process_data
pipeline:
mode: "dev"
build_folder: "//tmp/my_pipeline/dev/build"
Production config (configs/config_prod.yaml):
stages:
enabled_stages:
- create_input
- process_data
- validate_output
- upload_results
pipeline:
mode: "prod"
build_folder: "//home/production/my_pipeline/build"
See Example: 08_multiple_configs for a complete example.
Configuration Merging#
The framework uses OmegaConf for configuration management, which supports:
Variable interpolation: Reference other config values
Environment variable substitution: Use
${ENV_VAR}syntaxConfig composition: Merge multiple config files
Variable Interpolation#
Reference other config values within the same file:
base_path: //tmp/my_pipeline
client:
input_table: ${base_path}/input
output_table: ${base_path}/output
Environment Variables#
Use environment variables in configuration:
pipeline:
build_folder: ${BUILD_FOLDER:://tmp/default/build}
Uses BUILD_FOLDER environment variable if set, otherwise defaults to //tmp/default/build.
Config Composition#
Merge multiple config files:
from omegaconf import OmegaConf
base_config = OmegaConf.load("configs/config.yaml")
override_config = OmegaConf.load("configs/config_prod.yaml")
merged = OmegaConf.merge(base_config, override_config)
Configuration Validation#
The framework validates configuration at runtime:
Required fields: Missing required fields raise errors
Type checking: Incorrect types raise errors
Path validation: Invalid YT paths raise errors
Common Configuration Errors#
Missing
enabled_stages: Pipeline won’t know which stages to runMissing
build_folder: Required for stages withsrc/directoryInvalid stage names: Stage names must match directory names
Missing stage config: Each stage must have
config.yaml
Validation Examples#
Missing required field:
# ❌ Missing enabled_stages
stages: {}
# ✅ Correct
stages:
enabled_stages:
- my_stage
Invalid path:
# ❌ Invalid YT path (missing //)
pipeline:
build_folder: "tmp/build"
# ✅ Correct
pipeline:
build_folder: "//tmp/build"
Configuration Inheritance#
Stage configs inherit from pipeline config where applicable:
# configs/config.yaml
pipeline:
mode: "prod"
build_folder: "//tmp/my_pipeline/build"
# stages/my_stage/config.yaml
client:
operations:
map:
# Inherits build_folder from pipeline config
input_table: //tmp/my_pipeline/input
output_table: //tmp/my_pipeline/output
Configuration Debugging#
Print Configuration#
from omegaconf import OmegaConf
# Print full config
print(OmegaConf.to_yaml(self.config))
# Print specific section
print(OmegaConf.to_yaml(self.config.client.operations.map))
Validate Configuration#
# Check if key exists
if "build_folder" in self.config.pipeline:
build_folder = self.config.pipeline.build_folder
# Get with default
build_folder = self.config.pipeline.get("build_folder", "//tmp/default")
See Also#
Configuration Guide - Basic configuration
Secrets Management - Managing credentials
Troubleshooting - Configuration issues