Advanced Configuration#

This page covers advanced configuration features: multiple config files, merging, validation, inheritance, and debugging.

Multiple Configuration Files#

You can use multiple configuration files for different environments or scenarios:

configs/
├── config.yaml          # Default config
├── config_dev.yaml      # Development config
├── config_prod.yaml     # Production config
└── config_large.yaml    # Large dataset config

Using Different Configs#

Specify the config file with the --config flag:

# Use default config
python pipeline.py

# Use specific config
python pipeline.py --config configs/config_prod.yaml
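
The flag handling above can be pictured with a small argparse sketch. This is not the framework's actual entry point: the function name parse_args and the default path configs/config.yaml are assumptions taken from the directory layout shown earlier.

```python
import argparse

# Assumed default, matching the "Default config" entry in the layout above.
DEFAULT_CONFIG = "configs/config.yaml"


def parse_args(argv=None):
    """Parse arguments for a hypothetical pipeline entry point."""
    parser = argparse.ArgumentParser(description="Run the pipeline")
    parser.add_argument(
        "--config",
        default=DEFAULT_CONFIG,
        help="Path to the YAML config file",
    )
    return parser.parse_args(argv)


# No flag: the default config path is used.
default_args = parse_args([])
# Explicit flag: the given path replaces the default.
prod_args = parse_args(["--config", "configs/config_prod.yaml"])

print(default_args.config)
print(prod_args.config)
```

Keeping the default in one constant means the no-flag and explicit-flag invocations go through the same loading code.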

Example: Environment-Specific Configs#

Development config (configs/config_dev.yaml):

stages:
  enabled_stages:
    - create_test_data
    - process_data

pipeline:
  mode: "dev"
  build_folder: "//tmp/my_pipeline/dev/build"

Production config (configs/config_prod.yaml):

stages:
  enabled_stages:
    - create_input
    - process_data
    - validate_output
    - upload_results

pipeline:
  mode: "prod"
  build_folder: "//home/production/my_pipeline/build"

See Example: 08_multiple_configs for a complete example.

Configuration Merging#

The framework uses OmegaConf for configuration management, which supports:

  • Variable interpolation: Reference other config values

  • Environment variable substitution: Use ${ENV_VAR} syntax

  • Config composition: Merge multiple config files

Variable Interpolation#

Reference other config values within the same file:

base_path: //tmp/my_pipeline

client:
  input_table: ${base_path}/input
  output_table: ${base_path}/output

Environment Variables#

Use environment variables in configuration:

pipeline:
  build_folder: ${BUILD_FOLDER:://tmp/default/build}

This uses the BUILD_FOLDER environment variable if it is set and otherwise falls back to //tmp/default/build.

Config Composition#

Merge multiple config files:

from omegaconf import OmegaConf

base_config = OmegaConf.load("configs/config.yaml")
override_config = OmegaConf.load("configs/config_prod.yaml")
merged = OmegaConf.merge(base_config, override_config)

Configuration Validation#

The framework validates configuration at runtime:

  • Required fields: Missing required fields raise errors

  • Type checking: Incorrect types raise errors

  • Path validation: Invalid YT paths raise errors

Common Configuration Errors#

  1. Missing enabled_stages: Pipeline won’t know which stages to run

  2. Missing build_folder: Required for stages that have a src/ directory

  3. Invalid stage names: Stage names must match directory names

  4. Missing stage config: Each stage must have a config.yaml
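
Errors 3 and 4 can be caught before a run with a small pre-flight check. The sketch below is not part of the framework: find_stage_problems is a hypothetical helper, and it assumes each stage lives in its own directory under stages/ as in the stages/my_stage/config.yaml layout used on this page.

```python
import tempfile
from pathlib import Path


def find_stage_problems(stages_dir, enabled_stages):
    """Return a list of problems for the enabled stages (hypothetical helper)."""
    problems = []
    for name in enabled_stages:
        stage_dir = Path(stages_dir) / name
        if not stage_dir.is_dir():
            # Error 3: the stage name does not match any directory.
            problems.append(f"{name}: no matching stage directory")
        elif not (stage_dir / "config.yaml").is_file():
            # Error 4: the directory exists but has no config.yaml.
            problems.append(f"{name}: missing config.yaml")
    return problems


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    stages_dir = Path(tmp) / "stages"
    (stages_dir / "my_stage").mkdir(parents=True)
    (stages_dir / "my_stage" / "config.yaml").write_text("client: {}\n")
    (stages_dir / "no_config").mkdir()

    problems = find_stage_problems(
        stages_dir, ["my_stage", "no_config", "misspelled_stage"]
    )

print(problems)
```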

Validation Examples#

Missing required field:

# ❌ Missing enabled_stages
stages: {}

# ✅ Correct
stages:
  enabled_stages:
    - my_stage

Invalid path:

# ❌ Invalid YT path (missing //)
pipeline:
  build_folder: "tmp/build"

# ✅ Correct
pipeline:
  build_folder: "//tmp/build"
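
The two checks above can be sketched as a plain function over a loaded config dictionary. This illustrates the rules and is not the framework's validator: validate_config is a hypothetical name, and it treats build_folder as always required, which is stricter than the rule that only stages with a src/ directory need it.

```python
def validate_config(config):
    """Return a list of error messages for a config dict (hypothetical helper)."""
    errors = []

    # Required field: stages.enabled_stages must be present and non-empty.
    stages = config.get("stages") or {}
    if not stages.get("enabled_stages"):
        errors.append("stages.enabled_stages is missing or empty")

    # Type and path checks: build_folder must be a string starting with "//".
    pipeline = config.get("pipeline") or {}
    build_folder = pipeline.get("build_folder")
    if build_folder is None:
        errors.append("pipeline.build_folder is missing")
    elif not isinstance(build_folder, str):
        errors.append("pipeline.build_folder must be a string")
    elif not build_folder.startswith("//"):
        errors.append("pipeline.build_folder must start with //")

    return errors


bad = {"stages": {}, "pipeline": {"build_folder": "tmp/build"}}
good = {
    "stages": {"enabled_stages": ["my_stage"]},
    "pipeline": {"build_folder": "//tmp/build"},
}

bad_errors = validate_config(bad)
good_errors = validate_config(good)
print(bad_errors)
print(good_errors)
```

Returning all errors as a list, rather than raising on the first one, lets a single run report every problem in a config.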

Configuration Inheritance#

Stage configs inherit from pipeline config where applicable:

# configs/config.yaml
pipeline:
  mode: "prod"
  build_folder: "//tmp/my_pipeline/build"

# stages/my_stage/config.yaml
client:
  operations:
    map:
      # Inherits build_folder from pipeline config
      input_table: //tmp/my_pipeline/input
      output_table: //tmp/my_pipeline/output

Configuration Debugging#

Validate Configuration#

Inside a stage, where the configuration is available as self.config, check that a key exists before reading it, or read it with a fallback value:

# Check if key exists
if "build_folder" in self.config.pipeline:
    build_folder = self.config.pipeline.build_folder

# Get with default
build_folder = self.config.pipeline.get("build_folder", "//tmp/default")

See Also#