Docker Support#
YT Framework supports custom Docker images for operations that require special dependencies, GPU support, or custom environments.
Note
When Custom Docker is Required
Use a custom image when the cluster's default image does not ship the Python stack your uploaded jobs import (ytjobs, boto3, CUDA user-space libraries, etc.). See Cluster Requirements for the package versions this repo tests against.
Overview#
Custom Docker images allow you to:
Install custom dependencies
Use GPU-enabled environments
Customize the execution environment
Ensure consistent environments across operations
Ensure required ytjobs dependencies are available (if the default cluster image lacks them)
Key points:
Specify the Docker image in the operation config
The image must be compatible with the YT cluster (built for linux/amd64)
GPU support requires GPU-enabled images
Docker authentication is supported for private registries
Custom images can resolve cluster dependency issues: use one if the default cluster image lacks required packages
When to Use Custom Docker#
Cluster Dependencies#
If your YT cluster’s default Docker image doesn’t include required ytjobs dependencies (Python 3.11+, ytsaurus-client, boto3), you must use custom Docker images. This is the most common reason for using custom Docker images.
See Cluster Requirements for complete details about required dependencies.
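To decide whether the default image is sufficient, one option is to run a small check inside the image. The sketch below is not part of YT Framework: `check_environment` is a hypothetical helper, and the import name `yt` for the ytsaurus-client package is an assumption that should be verified against the installed package.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 11), modules=("yt", "boto3")):
    """Return a list of problems found; an empty list means the checks passed.

    `modules` must be top-level import names (no dots).
    The default names are assumptions, not taken from the framework.
    """
    problems = []
    # Compare the running interpreter against the required minimum version.
    if sys.version_info < tuple(min_version):
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted}+ required")
    # find_spec returns None when a top-level module cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems
```

Calling `check_environment()` in a job or in `docker run` and printing the result gives a quick list of what the image is missing.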
GPU Workloads#
For GPU processing, you need a GPU-enabled Docker image:
client:
  operations:
    map:
      resources:
        docker_image: docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04
        gpu_limit: 1
        memory_limit_gb: 16
Custom Dependencies#
For operations requiring specific libraries or tools:
client:
  operations:
    vanilla:
      resources:
        docker_image: my-registry/my-custom-image:latest
        memory_limit_gb: 4
Consistent Environments#
For reproducible environments across teams:
client:
  operations:
    map:
      resources:
        docker_image: docker.io/library/python:3.11-slim
        memory_limit_gb: 4
Creating Docker Images#
Basic Dockerfile#
Create a Dockerfile in your pipeline or stage directory:
# Build for linux/amd64 platform (required for YT cluster compatibility)
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*
# Install Python dependencies
# (quote version specifiers so the shell does not treat ">" as a redirect)
RUN pip install --no-cache-dir \
    "numpy>=1.20.0" \
    "pandas>=1.3.0"
WORKDIR /app
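One detail worth checking in `RUN pip install` lines: a version specifier such as `numpy>=1.20.0` contains `>`, which the shell treats as output redirection unless the argument is quoted. If Dockerfile lines are generated from a list of requirements, a minimal sketch using the standard library's `shlex.join` produces safely quoted arguments. `pip_install_line` is a hypothetical helper, not something the framework provides.

```python
import shlex


def pip_install_line(requirements):
    """Build a Dockerfile RUN line with shell-quoted requirement specifiers."""
    argv = ["pip", "install", "--no-cache-dir", *requirements]
    # shlex.join quotes any argument containing shell metacharacters such as ">".
    return "RUN " + shlex.join(argv)


print(pip_install_line(["numpy>=1.20.0", "pandas>=1.3.0"]))
# RUN pip install --no-cache-dir 'numpy>=1.20.0' 'pandas>=1.3.0'
```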
Platform Requirements#
Important: The YT cluster requires images built for the linux/amd64 platform:
# Build for the correct platform and load the image into the local Docker daemon
docker buildx build --platform linux/amd64 --tag my-image:latest --load .
Or build and push directly to the registry:
docker buildx build --platform linux/amd64 --tag my-image:latest --push .
GPU Dockerfile#
For GPU workloads:
# Use NVIDIA CUDA base image
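FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
# Install Python
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Install GPU-enabled libraries
# (pip3 installs into the interpreter that python3-pip targets, which may
# differ from python3.11; verify with "pip3 --version" inside the image)
RUN pip3 install --no-cache-dir \
    "torch>=2.0.0" \
    "torchvision>=0.15.0"
WORKDIR /app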
Note: GPU images are larger and take longer to pull.
Minimal Dockerfile#
For simple operations:
FROM python:3.11-slim
# Install only what you need
RUN pip install --no-cache-dir omegaconf
WORKDIR /app
Configuration#
Basic Configuration#
Specify Docker image in operation config:
# stages/my_stage/config.yaml
client:
  operations:
    map:
      resources:
        docker_image: my-registry/my-image:latest
        pool: default
        memory_limit_gb: 4
        cpu_limit: 2
Docker Image Location#
Use a fully qualified image reference so the cluster resolves the registry unambiguously:
Docker Hub (user or org image): docker.io/<namespace>/<repository>:<tag>, e.g. docker.io/gregorykogan/yt-framework:latest
Docker Hub (official / "library" image): docker.io/library/<name>:<tag>, e.g. docker.io/library/python:3.11-slim
Other registry: <registry-host>/<path>:<tag>, e.g. registry.example.com/acme/my-service:1.0
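A quick way to catch ambiguous references before submitting an operation is to check whether the reference names a registry host. The sketch below follows the Docker convention that the first path component counts as a registry only if it contains a dot or a colon, or is `localhost`. `has_explicit_registry` is a hypothetical helper, and whether the cluster resolves names the same way is an assumption.

```python
def has_explicit_registry(image_ref: str) -> bool:
    """Return True if the reference starts with a registry host component."""
    first, sep, _rest = image_ref.partition("/")
    if not sep:
        # No slash at all, e.g. "python:3.11-slim": no registry is named.
        return False
    # Docker treats the first component as a host only in these cases.
    return "." in first or ":" in first or first == "localhost"


for ref in (
    "docker.io/library/python:3.11-slim",
    "registry.example.com/acme/my-service:1.0",
    "my-registry/my-image:latest",
    "python:3.11-slim",
):
    print(ref, has_explicit_registry(ref))
```

Under this convention a reference like `my-registry/my-image:latest` is read as a Docker Hub namespace rather than a registry host, so placeholder names of that form should be replaced with a real host before use.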
GPU Configuration#
For GPU workloads:
client:
  operations:
    map:
      resources:
        docker_image: docker.io/nvidia/cuda:11.8.0-runtime-ubuntu22.04
        gpu_limit: 1          # Request 1 GPU
        memory_limit_gb: 16   # More memory for GPU workloads
        cpu_limit: 4
GPU requirements:
GPU-enabled Docker image
gpu_limit set to 1 or higher
Sufficient memory (GPU workloads need more)
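The requirements above can be checked against a `resources` mapping before an operation is submitted. This is a minimal sketch, not the framework's own validation: `gpu_resource_problems` is a hypothetical helper. It only checks that the keys are present and plausible, since it cannot tell whether an image is actually GPU-enabled or how much memory a workload needs.

```python
def gpu_resource_problems(resources: dict) -> list:
    """Return a list of problems in a GPU `resources` mapping (empty if none)."""
    problems = []
    # A GPU job needs an explicitly chosen (GPU-enabled) image.
    if not resources.get("docker_image"):
        problems.append("docker_image is not set")
    # gpu_limit must be an integer of at least 1.
    gpu_limit = resources.get("gpu_limit", 0)
    if not isinstance(gpu_limit, int) or gpu_limit < 1:
        problems.append("gpu_limit must be 1 or higher")
    # The right memory size is workload-specific, so only presence is checked.
    if "memory_limit_gb" not in resources:
        problems.append("memory_limit_gb is not set")
    return problems
```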
Docker Authentication#
For private registries, configure Docker authentication through environment variables in secrets.env.
Authentication Configuration#
Add Docker credentials to configs/secrets.env:
# configs/secrets.env
DOCKER_AUTH_USERNAME=myuser
DOCKER_AUTH_PASSWORD=mypassword
The framework automatically uses these credentials when a Docker image is specified in the operation config:
client:
  operations:
    map:
      resources:
        docker_image: registry.example.com/acme/private-image:latest
        # Docker auth is automatically loaded from secrets.env
Note: Docker authentication is used only when all three are present: docker_image in the operation config, and DOCKER_AUTH_USERNAME and DOCKER_AUTH_PASSWORD in secrets.env.
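The all-three-present rule can be illustrated with a short sketch. This is not the framework's implementation: `parse_env` and `docker_auth_applies` are hypothetical helpers, and the parser handles only simple `KEY=value` lines.

```python
def parse_env(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def docker_auth_applies(docker_image, env):
    """True only if an image is configured and both credentials are non-empty."""
    return bool(
        docker_image
        and env.get("DOCKER_AUTH_USERNAME")
        and env.get("DOCKER_AUTH_PASSWORD")
    )


secrets = parse_env("DOCKER_AUTH_USERNAME=myuser\nDOCKER_AUTH_PASSWORD=mypassword\n")
print(docker_auth_applies("registry.example.com/acme/private-image:latest", secrets))
```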
Complete Example#
Dockerfile#
# Build for linux/amd64 platform
FROM python:3.11-slim
# Install system tools
RUN apt-get update && apt-get install -y \
cowsay \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
RUN pip install --no-cache-dir \
omegaconf \
botocore \
boto3
# Make cowsay available
RUN ln -sf /usr/games/cowsay /usr/local/bin/cowsay
WORKDIR /app
Stage Configuration#
# stages/run_in_docker/config.yaml
client:
  operations:
    vanilla:
      resources:
        docker_image: my-registry/my-image:latest
        pool: default
        memory_limit_gb: 2
        cpu_limit: 1
Stage Code#
# stages/run_in_docker/stage.py
from yt_framework.core.pipeline import DebugContext
from yt_framework.core.stage import BaseStage
from yt_framework.operations.command_ops.vanilla import run_vanilla


class RunInDockerStage(BaseStage):
    def run(self, debug: DebugContext) -> DebugContext:
        success = run_vanilla(
            context=self.context,
            operation_config=self.config.client.operations.vanilla,
        )
        if not success:
            raise RuntimeError("Vanilla operation failed")
        return debug
Vanilla Script#
#!/usr/bin/env python3
# stages/run_in_docker/src/vanilla.py
import logging
import subprocess

from ytjobs.logging.logger import get_logger


def main():
    logger = get_logger("docker-example", level=logging.INFO)
    # Use custom tool from Docker image
    result = subprocess.run(
        ["cowsay", "Hello from Docker!"],
        capture_output=True,
        text=True,
        check=True,  # fail the job if cowsay exits with a non-zero status
    )
    logger.info(result.stdout)


if __name__ == "__main__":
    main()
See Example: 07_custom_docker for the complete example.
Best Practices#
Image Size#
Keep images small:
Use slim base images (python:3.11-slim)
Remove unnecessary packages
Use multi-stage builds if needed
Clean up apt cache
Example:
FROM python:3.11-slim
RUN apt-get update && apt-get install -y \
build-essential \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
Dependency Management#
Install dependencies in image:
Pre-install common dependencies
Use requirements.txt for stage-specific deps
Pin versions for reproducibility
Example:
FROM python:3.11-slim
# Pre-install common dependencies (specifiers quoted so ">" is not a shell redirect)
RUN pip install --no-cache-dir \
    "numpy>=1.20.0" \
    "pandas>=1.3.0"
# Stage-specific deps installed at runtime via requirements.txt
WORKDIR /app
Version Tagging#
Tag images with versions:
# Build with version tag
docker buildx build --platform linux/amd64 \
--tag my-registry/my-image:v1.2.3 \
--push .
Use in config:
docker_image: my-registry/my-image:v1.2.3
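To enforce version tagging, configs can be checked for untagged or `latest` references. The sketch below is illustrative only: `image_tag` and `is_pinned` are hypothetical helpers, and treating a digest reference (`@sha256:...`) as pinned is a design choice of this example.

```python
def image_tag(image_ref):
    """Return the tag of an image reference, or None if it has no tag."""
    # Only the last path component can carry a tag; a colon earlier in the
    # reference belongs to a registry port (e.g. registry.example.com:5000).
    last_component = image_ref.rsplit("/", 1)[-1]
    _name, sep, tag = last_component.partition(":")
    return tag if sep else None


def is_pinned(image_ref):
    """True if the reference uses a digest or a tag other than 'latest'."""
    if "@" in image_ref:
        return True
    tag = image_tag(image_ref)
    return tag is not None and tag != "latest"


print(is_pinned("my-registry/my-image:v1.2.3"))
print(is_pinned("my-registry/my-image:latest"))
```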
Testing Images#
Test images locally:
# Build image
docker buildx build --platform linux/amd64 \
--tag docker.io/myuser/my-image:test --load .
# Test image
docker run --rm docker.io/myuser/my-image:test \
python3 -c "import numpy; print(numpy.__version__)"
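When several modules need checking, the same `docker run` test can be assembled from a list. This sketch only builds the command and does not run Docker; `import_check_command` is a hypothetical helper.

```python
import shlex


def import_check_command(image, modules):
    """Build a `docker run` argv that imports each module inside the image."""
    code = "; ".join(f"import {name}" for name in modules)
    return ["docker", "run", "--rm", image, "python3", "-c", code]


cmd = import_check_command("docker.io/myuser/my-image:test", ["numpy", "pandas"])
# Print a copy-pasteable, correctly quoted shell command.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` so that a missing module fails the check with a non-zero exit status.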
Common Patterns#
Python with ML Libraries#
FROM python:3.11-slim
# Quote version specifiers so the shell does not treat ">" as a redirect
RUN pip install --no-cache-dir \
    "numpy>=1.20.0" \
    "pandas>=1.3.0" \
    "scikit-learn>=1.0.0" \
    "transformers>=4.20.0"
GPU with PyTorch#
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*
# Quote version specifiers so the shell does not treat ">" as a redirect
RUN pip3 install --no-cache-dir \
    "torch>=2.0.0" \
    "torchvision>=0.15.0"
Custom Tools#
FROM python:3.11-slim
RUN apt-get update && apt-get install -y \
    ffmpeg \
    imagemagick \
    && rm -rf /var/lib/apt/lists/*
# Install any custom Python dependencies your tools need
# (quote version specifiers so the shell does not treat ">" as a redirect)
RUN pip install --no-cache-dir \
    "your-custom-package>=1.0.0"
Troubleshooting#
Issue: Image not found#
Check image name and tag
Verify image exists in registry
Check Docker authentication
Issue: Platform mismatch#
Build for the linux/amd64 platform
Use docker buildx for cross-platform builds
Issue: GPU not available#
Verify GPU-enabled image
Check gpu_limit is set
Verify cluster has GPU nodes
Issue: Slow image pull#
Use smaller base images
Cache layers effectively
Use local registry if possible
Issue: Dependencies missing#
Check image includes required packages
Verify requirements.txt is correct
Review installation logs
Next Steps#
Understand Cluster Requirements for required dependencies
Learn about Checkpoints for model files
Explore Code Upload for code packaging
Check out Example: 07_custom_docker for complete example