YT Cluster Requirements
When running pipelines in production mode, code from the ytjobs package executes on YT cluster nodes. This means the cluster’s Docker image (whether default or custom) must include all dependencies required by your ytjobs code.
Warning
Critical: Cluster Dependencies
Unlike local development, where dependencies are installed on your machine, production mode requires dependencies to be present in the cluster’s Docker image. Missing dependencies will cause job failures.
Why Cluster Dependencies Matter
In production mode:
Code execution location: Your ytjobs code runs on YT cluster nodes, not on your local machine
Docker isolation: Each job runs in a Docker container on the cluster
Dependency availability: Only packages installed in the Docker image are available to your code
Python Version Requirement
Minimum: Python 3.11+
The framework requires Python 3.11 or higher, so your cluster’s Docker image must include Python 3.11 or newer. Older versions are not guaranteed to work.
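An outdated interpreter can fail in confusing ways deep inside a job, so it can help to fail fast with a clear message instead. The following is a minimal sketch. `MIN_PYTHON` and `check_python_version` are hypothetical helpers, not part of ytjobs. The sketch deliberately avoids newer syntax, such as f-strings, so that it can still run and report the problem on an old interpreter.

```python
import sys

# Minimum interpreter version stated in this guide (hypothetical constant).
MIN_PYTHON = (3, 11)


def check_python_version(min_version=MIN_PYTHON):
    """Raise RuntimeError if the running interpreter is older than min_version.

    Uses only syntax available in old Python 3 releases, so the check itself
    can still run (and report the problem) on an outdated cluster image.
    """
    if sys.version_info[:2] < tuple(min_version):
        found = ".".join(str(part) for part in sys.version_info[:3])
        required = ".".join(str(part) for part in min_version)
        raise RuntimeError(
            "Python {}+ is required, but this job runs on Python {}. "
            "Use a Docker image with a newer Python.".format(required, found)
        )
```

Call `check_python_version()` at the very top of a job's entry script, before any ytjobs imports. That way, the version error is the first thing reported.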
Core Dependencies
These packages are required by the ytjobs modules that use them (checkpoints and S3):
ytsaurus-client
Version: >= 0.13.0
Required for:
Checkpoint operations (ytjobs.checkpoint)
YT file system operations
Usage:
```python
from ytjobs.checkpoint import save_checkpoint, load_checkpoint
```
Installation (quote the specifier so the shell does not treat ">" as a redirect):
```bash
pip install "ytsaurus-client>=0.13.0"
```
boto3 and botocore
Versions:
boto3 == 1.35.99
botocore == 1.35.99 (installed automatically with boto3)
Note: boto3 is pinned to the 1.35.x line because this version makes it possible to control how many pool connections boto3 uses.
Required for:
S3 operations (ytjobs.s3)
S3 file listing, downloading, and uploading
Usage:
```python
from ytjobs.s3 import S3Client
```
Installation:
```bash
pip install boto3==1.35.99
```
Optional Dependencies
These dependencies are not strictly required but are recommended for optimal functionality:
omegaconf
Version: >= 2.3.0
Recommended for:
Reading configuration YAML files (config.yaml) passed to jobs
Loading and accessing job configuration in the recommended way
Usage:
```python
from omegaconf import OmegaConf
from ytjobs.config import get_config_path

# Load the config.yaml passed to the job
config = OmegaConf.load(get_config_path())

# Access config values with attribute syntax
value = config.job.some_setting
```
Installation:
```bash
pip install "omegaconf>=2.3.0"
```
Note: While not strictly required, omegaconf is the recommended way to read configuration files in your job code. Without it, you would need to parse the YAML files yourself with another library such as PyYAML, because the Python standard library has no YAML parser.
Dependency Breakdown by Module
Core Modules (Standard Library Only)
These modules require no external dependencies for basic functionality:
ytjobs.config - Configuration utilities (note: omegaconf is recommended for reading config files)
ytjobs.logging - Logging utilities
ytjobs.mapper - Mapper utilities
Feature-Specific Modules
Checkpoint module (ytjobs.checkpoint):
Requires:
ytsaurus-client >= 0.13.0
S3 module (ytjobs.s3):
Requires:
boto3 == 1.35.99, botocore == 1.35.99
Minimum Requirements for Full Functionality
If you use all ytjobs features, your cluster Docker image must include:
Python >= 3.11
ytsaurus-client >= 0.13.0
boto3 == 1.35.99
botocore == 1.35.99
Recommended additions:
omegaconf >= 2.3.0 # Recommended for reading config files
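You can check an image against this list from inside a job, or locally, by comparing installed distribution versions through `importlib.metadata`. The sketch below is illustrative only. `REQUIREMENTS`, `parse_release` and `check_requirements` are hypothetical names, and the version comparison is deliberately simplified: it handles numeric release segments only and supports only `>=` and `==`. For full PEP 440 semantics, the `Version` and `SpecifierSet` classes from the `packaging` library are the usual choice.

```python
from importlib.metadata import PackageNotFoundError, version

# (distribution name, operator, version) taken from the list above.
# omegaconf is only recommended; drop it if your jobs do not read config files.
REQUIREMENTS = [
    ("ytsaurus-client", ">=", "0.13.0"),
    ("boto3", "==", "1.35.99"),
    ("botocore", "==", "1.35.99"),
    ("omegaconf", ">=", "2.3.0"),
]


def parse_release(text):
    """Return the leading numeric release segments, e.g. '1.35.99' -> (1, 35, 99)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # e.g. the "dev1" in "2.3.0.dev1"
        parts.append(int(digits))
        if len(digits) < len(piece):
            break  # stop at suffixes such as "0rc1"
    return tuple(parts)


def _pad(a, b):
    """Zero-pad two release tuples so that (0, 13) compares equal to (0, 13, 0)."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def check_requirements(requirements=REQUIREMENTS, get_version=version):
    """Return human-readable problems; an empty list means every requirement is met."""
    problems = []
    for name, op, wanted in requirements:
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (need {op}{wanted})")
            continue
        have, need = _pad(parse_release(installed), parse_release(wanted))
        if op == ">=":
            ok = have >= need
        elif op == "==":
            ok = have == need
        else:
            raise ValueError(f"unsupported operator: {op!r}")
        if not ok:
            problems.append(f"{name} {installed} does not satisfy {op}{wanted}")
    return problems
```

For example, a test stage could print each entry returned by `check_requirements()` and exit with a non-zero status if the list is not empty.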
Solutions
You have two options to ensure dependencies are available:
Option 1: Default Cluster Image
Ensure your YT cluster’s default Docker image includes all required dependencies.
Advantages:
No configuration needed
Works automatically for all pipelines
Consistent environment across teams
Disadvantages:
Requires cluster administrator access
May not be possible if you don’t control the cluster
All teams must agree on dependencies
How to check: Contact your cluster administrator to verify the default Docker image includes:
Python 3.11+
Required Python packages (ytsaurus-client, boto3, etc.)
Option 2: Custom Docker Images
Build custom Docker images that include the required dependencies, and always use them for your pipelines.
Advantages:
Full control over dependencies
No need to modify cluster defaults
Can include additional dependencies as needed
Version pinning for reproducibility
Disadvantages:
Must specify docker_image in each operation config
Requires Docker image building and registry access
How to use: See Custom Docker Images for a complete guide to creating and using custom Docker images.
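Example Dockerfile:
```dockerfile
FROM python:3.11-slim

# Install required dependencies.
# Quote ">=" specifiers so the shell does not treat ">" as an output redirect.
RUN pip install --no-cache-dir \
    "ytsaurus-client>=0.13.0" \
    boto3==1.35.99 \
    "omegaconf>=2.3.0"

WORKDIR /app
```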
Example config:
```yaml
client:
  operations:
    map:
      resources:
        docker_image: my-registry/my-image:latest
        memory_limit_gb: 4
```
Verifying Cluster Compatibility
Check Python Version
Create a test vanilla operation to check the Python version:
```python
# stages/test_python/src/vanilla.py
import sys

print(f"Python version: {sys.version}")
```
Run it in production mode and check the job logs for the reported Python version.
Check Dependencies
Create a test operation to verify dependencies:
```python
# stages/test_deps/src/vanilla.py
try:
    import yt.wrapper as yt
    print("✓ ytsaurus-client available")
except ImportError:
    print("✗ ytsaurus-client missing")

try:
    import boto3
    print(f"✓ boto3 available: {boto3.__version__}")
except ImportError:
    print("✗ boto3 missing")

try:
    import omegaconf
    print(f"✓ omegaconf available: {omegaconf.__version__}")
except ImportError:
    print("✗ omegaconf missing (recommended for config reading)")
```
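The checks above only print their results, so the test operation succeeds even when a package is missing. A variant that also fails the job makes the problem harder to overlook. This is a sketch under two assumptions. First, `CHECKS`, `run_checks` and `main` are hypothetical names. Second, it relies on the usual YT behaviour that a job exiting with a non-zero status is counted as failed.

```python
import importlib
import sys

# (import name, pip package, required?) mirroring the checks above.
# Trim this list to the ytjobs modules your pipeline actually uses.
CHECKS = [
    ("yt.wrapper", "ytsaurus-client", True),
    ("boto3", "boto3", True),
    ("omegaconf", "omegaconf", False),  # recommended for config reading
]


def run_checks(checks=CHECKS):
    """Try each import, print a status line, and return the missing required packages."""
    missing_required = []
    for module_name, package, required in checks:
        try:
            importlib.import_module(module_name)
        except ImportError:
            suffix = "" if required else " (recommended)"
            print(f"✗ {package} missing{suffix}")
            if required:
                missing_required.append(package)
        else:
            print(f"✓ {package} available")
    return missing_required


def main():
    missing = run_checks()
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # so the job is reported as failed instead of silently succeeding.
        sys.exit(f"Missing required packages: {', '.join(missing)}")
```

Call `main()` at the end of the stage’s `vanilla.py` to run the checks in production mode.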
Common Issues
Issue: ImportError for ytsaurus-client
Solution: Install ytsaurus-client>=0.13.0 in the Docker image
Check: Confirm that your code actually uses checkpoint operations (ytjobs.checkpoint), which need this package
Issue: ImportError for boto3
Solution: Install boto3==1.35.99 in the Docker image
Check: Confirm that your code actually uses S3 operations (ytjobs.s3), which need this package
Issue: Python version too old
Solution: Use a Docker image with Python 3.11+
Check: Verify the Python version in the cluster image
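When an import fails inside a job, the name of the missing module is available on the exception as `ModuleNotFoundError.name`. A job can use that name to report which package to add to the image. The following is a minimal sketch. `INSTALL_HINTS` and `explain_import_error` are hypothetical helpers, and the mapping covers only the packages named in this guide.

```python
# Top-level import name -> requirement to add to the Docker image (per this guide).
INSTALL_HINTS = {
    "yt": "ytsaurus-client>=0.13.0",
    "boto3": "boto3==1.35.99",
    "botocore": "botocore==1.35.99",
    "omegaconf": "omegaconf>=2.3.0",
}


def explain_import_error(err):
    """Turn a ModuleNotFoundError into a message naming the package to install."""
    top_level = (err.name or "").split(".")[0]
    hint = INSTALL_HINTS.get(top_level)
    if hint is None:
        return f"Missing module {err.name!r}"
    return f"Missing module {err.name!r}: add {hint!r} to the Docker image"
```

To use it, wrap the ytjobs imports at the top of a job in `try` / `except ModuleNotFoundError as err:`, then raise `SystemExit(explain_import_error(err))`. The job then fails with a message that names the package to add.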
Best Practices
1. Document Your Dependencies
List all ytjobs modules you use in your pipeline documentation:
```markdown
## Dependencies
This pipeline uses:
- `ytjobs.s3` (requires boto3)
- `ytjobs.checkpoint` (requires ytsaurus-client)
```
2. Use Custom Docker Images
For production pipelines, always use custom Docker images with pinned dependency versions:
```dockerfile
FROM python:3.11-slim
RUN pip install --no-cache-dir \
    ytsaurus-client==0.13.0 \
    boto3==1.35.99 \
    botocore==1.35.99 \
    "omegaconf>=2.3.0"
```
3. Test Dependencies Early
Create a simple test stage that imports all ytjobs modules you use:
```python
# stages/test_dependencies/src/vanilla.py
from ytjobs.s3 import S3Client  # requires boto3
from ytjobs.checkpoint import save_checkpoint  # requires ytsaurus-client

print("All dependencies available!")
```
4. Version Pinning
Pin exact versions in your Docker images for reproducibility:
```dockerfile
RUN pip install --no-cache-dir \
    ytsaurus-client==0.13.0 \
    boto3==1.35.99 \
    botocore==1.35.99 \
    "omegaconf>=2.3.0"
```
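If a pipeline already works in a local environment, one way to get exact pins is to read the installed versions from that environment and generate the RUN line from them. This is a sketch built around a hypothetical helper, `pinned_pip_command`. It pins only the packages you list, not their transitive dependencies, and it assumes your local versions are the ones you want on the cluster. To pin the full dependency tree, a complete requirements file, for example one produced by `pip freeze`, is the more thorough option.

```python
from importlib.metadata import version


def pinned_pip_command(packages, get_version=version):
    """Build a Dockerfile RUN instruction pinning each package to its installed version.

    Raises importlib.metadata.PackageNotFoundError if a package is not installed.
    """
    pins = [f'"{name}=={get_version(name)}"' for name in packages]
    return "RUN pip install --no-cache-dir \\\n    " + " \\\n    ".join(pins)
```

For example, `print(pinned_pip_command(["ytsaurus-client", "boto3", "botocore", "omegaconf"]))` prints a RUN instruction that you can paste into the Dockerfile.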
5. Minimal Images
Only install dependencies you actually use:
If you don’t use S3, don’t install boto3
If you don’t use checkpoints, don’t install ytsaurus-client
Note: omegaconf is recommended even for minimal images if your jobs read config files
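The rule of thumb above can be applied mechanically by mapping each ytjobs module to the packages it needs, following the dependency breakdown in this guide. This is a sketch. `MODULE_REQUIREMENTS` and `minimal_requirements` are hypothetical names, and the mapping has to be kept in sync with this guide by hand.

```python
# ytjobs module -> pip requirements, following "Dependency Breakdown by Module".
MODULE_REQUIREMENTS = {
    "ytjobs.config": [],
    "ytjobs.logging": [],
    "ytjobs.mapper": [],
    "ytjobs.checkpoint": ["ytsaurus-client>=0.13.0"],
    "ytjobs.s3": ["boto3==1.35.99", "botocore==1.35.99"],
}


def minimal_requirements(modules_used, reads_config=True):
    """Return the sorted, de-duplicated pip requirements for the given ytjobs modules."""
    reqs = set()
    for module in modules_used:
        try:
            reqs.update(MODULE_REQUIREMENTS[module])
        except KeyError:
            raise ValueError(f"unknown ytjobs module: {module}") from None
    if reads_config:
        reqs.add("omegaconf>=2.3.0")  # recommended for reading config.yaml
    return sorted(reqs)
```

You can write the returned lines to a requirements file and install them in the Dockerfile with `pip install -r`.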
Summary
Key Points:
Code runs on cluster: ytjobs code executes on YT cluster nodes, not locally
Docker image must have dependencies: All required packages must be pre-installed in the Docker image
Python 3.11+ required: Minimum Python version for the framework
Core dependencies: ytsaurus-client (checkpoints), boto3 (S3 operations)
Recommended: omegaconf for reading config files
Two solutions: Use a default cluster image that includes the dependencies, OR always use custom Docker images
Action Items:
Verify that your cluster’s default Docker image includes the required dependencies
If it does not, create custom Docker images with the required dependencies
Test dependencies early with a simple test operation
Document which ytjobs modules your pipeline uses
Pin dependency versions in Docker images for reproducibility