Troubleshooting#
Symptoms grouped by area. Start from the section that matches where the failure shows up (driver config vs job logs).
Sections#
Pipeline — discovery, ordering,
enabled_stagesOperations — map, vanilla, YQL, S3, checkpoints, Docker
Configuration — dev vs prod, secrets, paths
Debugging — logging, reproducing, asking for help
How failures cluster#
YAML / OmegaConf — missing keys, wrong types, bad table paths
Filesystem — missing
stage.py, wrong working directoryOperations — mapper stderr, YQL pragma limits, S3 403
Environment — Python version, missing pip package in cluster image
Mode mismatch — code that only works locally, or vice versa
Fast checks#
Symptom |
First look |
|---|---|
Pipeline exits before stages |
|
Stage raises immediately |
Stage |
Cluster job fails |
Task stderr in YT UI, Docker image packages |
Works in dev, fails in prod |
Credentials, |
If you are stuck#
Reproduce in dev with the smallest table or stdin fixture.
Capture full stderr from the failing task (prod) or
.devlogs (dev).Compare cluster image package list against imports in uploaded code.