Docker Support#
YT Framework supports custom Docker images for operations that require special dependencies, GPU support, or custom environments.
Note
When Custom Docker is Required
Custom Docker images are essential if your YT cluster’s default Docker image doesn’t include the dependencies required by ytjobs (Python 3.11+, ytsaurus-client, boto3, omegaconf). See Cluster Requirements for details about cluster dependencies and when to use custom Docker images.
Overview#
Custom Docker images allow you to:
Install custom dependencies
Use GPU-enabled environments
Customize the execution environment
Ensure consistent environments across operations
Ensure required ytjobs dependencies are available (if the default cluster image lacks them)
Key points:
Specify Docker image in operation config
Image must be compatible with YT cluster
GPU support requires GPU-enabled images
Docker authentication supported
Can solve cluster dependency issues - use custom images if default cluster image lacks required packages
When to Use Custom Docker#
Cluster Dependencies#
If your YT cluster’s default Docker image doesn’t include required ytjobs dependencies (Python 3.11+, ytsaurus-client, boto3), you must use custom Docker images. This is the most common reason for using custom Docker images.
See Cluster Requirements for complete details about required dependencies.
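One way to find out whether an image already satisfies these requirements is to run a small check inside it. The script below is a minimal sketch, not part of the framework: the function names are hypothetical, and the import name `yt` for ytsaurus-client is an assumption to verify against your installed package.

```python
import importlib.util
import sys

# Assumption: ytsaurus-client installs the top-level package "yt";
# boto3 and omegaconf are assumed to import under their own names.
REQUIRED_MODULES = ("yt", "boto3", "omegaconf")
MIN_PYTHON = (3, 11)


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_is_new_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter's (major, minor) meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    problems = missing_modules()
    if not python_is_new_enough():
        problems.append(f"python>={MIN_PYTHON[0]}.{MIN_PYTHON[1]}")
    print("missing:", ", ".join(problems) if problems else "nothing")
```

If the script reports anything missing when run in the cluster's default image, that is a sign a custom image is needed.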
GPU Workloads#
For GPU processing, you need a GPU-enabled Docker image:
client:
  operations:
    map:
      resources:
        docker_image: nvidia/cuda:11.8.0-runtime-ubuntu22.04
        gpu_limit: 1
        memory_limit_gb: 16
Custom Dependencies#
For operations requiring specific libraries or tools:
client:
  operations:
    vanilla:
      resources:
        docker_image: my-registry/my-custom-image:latest
        memory_limit_gb: 4
Consistent Environments#
For reproducible environments across teams:
client:
  operations:
    map:
      resources:
        docker_image: my-registry/standard-python:3.11
        memory_limit_gb: 4
Creating Docker Images#
Basic Dockerfile#
Create a Dockerfile in your pipeline or stage directory:
# Build for linux/amd64 platform (required for YT cluster compatibility)
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
# Install Python dependencies (quoted so the shell does not treat ">" as a redirect)
RUN pip install --no-cache-dir \
    "numpy>=1.20.0" \
    "pandas>=1.3.0"
WORKDIR /app
Platform Requirements#
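Important: the YT cluster requires images built for the linux/amd64 platform. Build with docker buildx and load the result into the local Docker daemon:
# Build for the correct platform and load the image locally
docker buildx build --platform linux/amd64 --tag my-image:latest --load .
Or build and push directly to a registry:
docker buildx build --platform linux/amd64 --tag my-image:latest --push .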
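To confirm that a locally loaded image meets the linux/amd64 requirement, you can ask Docker for the image's OS and architecture. The following is a minimal sketch with hypothetical helper names; it assumes the Docker CLI is installed and the image is present in the local daemon.

```python
import subprocess

REQUIRED_PLATFORM = "linux/amd64"


def inspect_command(image):
    """Build the `docker image inspect` command that prints os/architecture."""
    return [
        "docker", "image", "inspect",
        "--format", "{{.Os}}/{{.Architecture}}",
        image,
    ]


def platform_matches(inspect_output, required=REQUIRED_PLATFORM):
    """Compare the inspect output (e.g. 'linux/amd64\n') with the requirement."""
    return inspect_output.strip() == required


def image_has_required_platform(image):
    """Run the inspect command; needs a local Docker CLI and a local image."""
    result = subprocess.run(
        inspect_command(image), capture_output=True, text=True, check=True
    )
    return platform_matches(result.stdout)
```

For example, `image_has_required_platform("my-image:latest")` can be called after the `--load` build to check the image before pushing it.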
GPU Dockerfile#
For GPU workloads:
# Use NVIDIA CUDA base image
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
# Install Python
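RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Install GPU-enabled libraries (quoted so the shell does not treat ">" as a redirect)
# Note: on Ubuntu 22.04, pip3 targets the default python3 (3.10), not python3.11;
# check which interpreter your jobs use
RUN pip3 install --no-cache-dir \
    "torch>=2.0.0" \
    "torchvision>=0.15.0"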
WORKDIR /app
Note: GPU images are larger and take longer to pull.
Minimal Dockerfile#
For simple operations:
FROM python:3.11-slim
# Install only what you need
RUN pip install --no-cache-dir omegaconf
WORKDIR /app
Configuration#
Basic Configuration#
Specify Docker image in operation config:
# stages/my_stage/config.yaml
client:
  operations:
    map:
      resources:
        docker_image: my-registry/my-image:latest
        pool: default
        memory_limit_gb: 4
        cpu_limit: 2
Docker Image Location#
Docker images can be:
Public registry: python:3.11-slim, nvidia/cuda:11.8.0
Private registry: my-registry/my-image:latest
YT registry: //path/to/image (if using YT’s Docker registry)
GPU Configuration#
For GPU workloads:
client:
  operations:
    map:
      resources:
        docker_image: nvidia/cuda:11.8.0-runtime-ubuntu22.04
        gpu_limit: 1          # Request 1 GPU
        memory_limit_gb: 16   # More memory for GPU workloads
        cpu_limit: 4
GPU requirements:
GPU-enabled Docker image
gpu_limit set to 1 or higher
Sufficient memory (GPU workloads need more)
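The GPU requirements above can be checked before an operation is submitted. This is a minimal sketch with a hypothetical function name, not a framework API; it takes the `resources` mapping as a plain dict and cannot tell whether the named image is actually GPU-enabled.

```python
def gpu_config_problems(resources):
    """Return human-readable problems found in a GPU `resources` mapping."""
    problems = []
    gpu_limit = resources.get("gpu_limit", 0)
    if not isinstance(gpu_limit, int) or gpu_limit < 1:
        problems.append("gpu_limit must be set to 1 or higher")
    if not resources.get("docker_image"):
        problems.append("docker_image must name a GPU-enabled image")
    return problems


# Usage with the values from the GPU configuration example
resources = {
    "docker_image": "nvidia/cuda:11.8.0-runtime-ubuntu22.04",
    "gpu_limit": 1,
    "memory_limit_gb": 16,
    "cpu_limit": 4,
}
print(gpu_config_problems(resources) or "GPU config looks complete")
```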
Docker Authentication#
For private registries, configure Docker authentication through environment variables in secrets.env.
Authentication Configuration#
Add Docker credentials to configs/secrets.env:
# configs/secrets.env
DOCKER_AUTH_USERNAME=myuser
DOCKER_AUTH_PASSWORD=mypassword
The framework automatically uses these credentials when a Docker image is specified in the operation config:
client:
  operations:
    map:
      resources:
        docker_image: my-registry/private-image:latest
        # Docker auth is automatically loaded from secrets.env
Note: Docker authentication is used only if all three values are present: docker_image in the operation config, and DOCKER_AUTH_USERNAME and DOCKER_AUTH_PASSWORD in secrets.env.
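The "all three present" rule can be illustrated with a few lines of Python. This is a sketch of the rule as described here, not the framework's actual implementation; `parse_env` and `should_use_docker_auth` are hypothetical names, and the parser handles only simple KEY=VALUE lines.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def should_use_docker_auth(resources, secrets):
    """True only when the image and both credentials are set and non-empty."""
    return bool(
        resources.get("docker_image")
        and secrets.get("DOCKER_AUTH_USERNAME")
        and secrets.get("DOCKER_AUTH_PASSWORD")
    )


secrets = parse_env(
    "# configs/secrets.env\n"
    "DOCKER_AUTH_USERNAME=myuser\n"
    "DOCKER_AUTH_PASSWORD=mypassword\n"
)
resources = {"docker_image": "my-registry/private-image:latest"}
print(should_use_docker_auth(resources, secrets))
```

If pulls from a private registry fail, checking each of the three values separately is a quick first step.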
Complete Example#
Dockerfile#
# Build for linux/amd64 platform
FROM python:3.11-slim
# Install system tools
RUN apt-get update && apt-get install -y \
    cowsay \
    && rm -rf /var/lib/apt/lists/*
# Install Python dependencies
RUN pip install --no-cache-dir \
    omegaconf \
    botocore \
    boto3
# Make cowsay available
RUN ln -sf /usr/games/cowsay /usr/local/bin/cowsay
WORKDIR /app
Stage Configuration#
# stages/run_in_docker/config.yaml
client:
  operations:
    vanilla:
      resources:
        docker_image: my-registry/my-image:latest
        pool: default
        memory_limit_gb: 2
        cpu_limit: 1
Stage Code#
# stages/run_in_docker/stage.py
from yt_framework.core.pipeline import DebugContext
from yt_framework.core.stage import BaseStage
from yt_framework.operations.vanilla import run_vanilla


class RunInDockerStage(BaseStage):
    def run(self, debug: DebugContext) -> DebugContext:
        success = run_vanilla(
            context=self.context,
            operation_config=self.config.client.operations.vanilla,
        )
        if not success:
            raise RuntimeError("Vanilla operation failed")
        return debug
Vanilla Script#
#!/usr/bin/env python3
# stages/run_in_docker/src/vanilla.py
import logging
import subprocess

from ytjobs.logging.logger import get_logger


def main():
    logger = get_logger("docker-example", level=logging.INFO)
    # Use the custom tool installed in the Docker image
    result = subprocess.run(
        ["cowsay", "Hello from Docker!"],
        capture_output=True,
        text=True,
    )
    logger.info(result.stdout)


if __name__ == "__main__":
    main()
See Example: 07_custom_docker for a complete example.
Best Practices#
Image Size#
Keep images small:
Use slim base images (python:3.11-slim)
Remove unnecessary packages
Use multi-stage builds if needed
Clean up apt cache
Example:
FROM python:3.11-slim
RUN apt-get update && apt-get install -y \
    build-essential \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
Dependency Management#
Install dependencies in image:
Pre-install common dependencies
Use requirements.txt for stage-specific deps
Pin versions for reproducibility
Example:
FROM python:3.11-slim
# Pre-install common dependencies
RUN pip install --no-cache-dir \
    "numpy>=1.20.0" \
    "pandas>=1.3.0"
# Stage-specific deps installed at runtime via requirements.txt
WORKDIR /app
Version Tagging#
Tag images with versions:
# Build with version tag
docker buildx build --platform linux/amd64 \
    --tag my-registry/my-image:v1.2.3 \
    --push .
Use in config:
docker_image: my-registry/my-image:v1.2.3
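A small check can flag configs that still point at an unpinned image. The sketch below uses hypothetical helper names and a simplified reading of image references: it treats a missing tag or the `latest` tag as unpinned and a digest reference as pinned.

```python
def image_tag(image):
    """Return the tag of an image reference, or None if it has no tag."""
    name = image.split("@", 1)[0]           # drop any @sha256:... digest
    last_segment = name.rsplit("/", 1)[-1]  # ignore a registry host:port
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


def is_version_pinned(image):
    """True for digest references and for tags other than 'latest'."""
    if "@" in image:
        return True
    tag = image_tag(image)
    return tag is not None and tag != "latest"


for image in ("my-registry/my-image:v1.2.3", "my-registry/my-image:latest"):
    print(image, "pinned" if is_version_pinned(image) else "not pinned")
```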
Testing Images#
Test images locally:
# Build image
docker buildx build --platform linux/amd64 --tag my-image:test --load .
# Test image
docker run --rm my-image:test python3 -c "import numpy; print(numpy.__version__)"
Common Patterns#
Python with ML Libraries#
FROM python:3.11-slim
RUN pip install --no-cache-dir \
    "numpy>=1.20.0" \
    "pandas>=1.3.0" \
    "scikit-learn>=1.0.0" \
    "transformers>=4.20.0"
GPU with PyTorch#
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir \
    "torch>=2.0.0" \
    "torchvision>=0.15.0"
Custom Tools#
FROM python:3.11-slim
RUN apt-get update && apt-get install -y \
    ffmpeg \
    imagemagick \
    && rm -rf /var/lib/apt/lists/*
# Install any custom Python dependencies your tools need
RUN pip install --no-cache-dir \
    "your-custom-package>=1.0.0"
Troubleshooting#
Issue: Image not found#
Check image name and tag
Verify image exists in registry
Check Docker authentication
Issue: Platform mismatch#
Build for linux/amd64 platform
Use docker buildx for cross-platform builds
Issue: GPU not available#
Verify GPU-enabled image
Check gpu_limit is set
Verify cluster has GPU nodes
Issue: Slow image pull#
Use smaller base images
Cache layers effectively
Use local registry if possible
Issue: Dependencies missing#
Check image includes required packages
Verify requirements.txt is correct
Review installation logs
Next Steps#
Understand Cluster Requirements for required dependencies
Learn about Checkpoints for model files
Explore Code Upload for code packaging
Check out Example: 07_custom_docker for a complete example