# Docker Support

YT Framework supports custom Docker images for operations that require special dependencies, GPU support, or custom environments.

```{note}
**When Custom Docker is Required**

Custom Docker images are essential if your YT cluster's default Docker image doesn't include the dependencies required by `ytjobs` (Python 3.11+, ytsaurus-client, boto3, omegaconf). See [Cluster Requirements](configuration/cluster-requirements.md) for details about cluster dependencies and when to use custom Docker images.
```

## Overview

Custom Docker images allow you to:

- Install custom dependencies
- Use GPU-enabled environments
- Customize the execution environment
- Ensure consistent environments across operations
- **Ensure required `ytjobs` dependencies are available** (if default cluster image lacks them)

**Key points:**

- Specify Docker image in operation config
- Image must be compatible with YT cluster
- GPU support requires GPU-enabled images
- Docker authentication supported
- **Can solve cluster dependency issues** - use custom images if default cluster image lacks required packages

## When to Use Custom Docker

### Cluster Dependencies

If your YT cluster's default Docker image doesn't include required `ytjobs` dependencies (Python 3.11+, ytsaurus-client, boto3), you must use custom Docker images. This is the most common reason for using custom Docker images.

See [Cluster Requirements](configuration/cluster-requirements.md) for complete details about required dependencies.

### GPU Workloads

For GPU processing, you need a GPU-enabled Docker image:

```yaml
client:
  operations:
    map:
      resources:
        docker_image: nvidia/cuda:11.8.0-runtime-ubuntu22.04
        gpu_limit: 1
        memory_limit_gb: 16
```

### Custom Dependencies

For operations requiring specific libraries or tools:

```yaml
client:
  operations:
    vanilla:
      resources:
        docker_image: my-registry/my-custom-image:latest
        memory_limit_gb: 4
```

### Consistent Environments

For reproducible environments across teams:

```yaml
client:
  operations:
    map:
      resources:
        docker_image: my-registry/standard-python:3.11
        memory_limit_gb: 4
```

## Creating Docker Images

### Basic Dockerfile

Create a `Dockerfile` in your pipeline or stage directory:

```dockerfile
# Build for linux/amd64 platform (required for YT cluster compatibility)
FROM python:3.11-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
RUN pip install --no-cache-dir \
    numpy>=1.20.0 \
    pandas>=1.3.0

WORKDIR /app
```

### Platform Requirements

**Important:** YT cluster requires `linux/amd64` platform:

```bash
# Build for correct platform
docker buildx build --platform linux/amd64 --tag my-image:latest --load .
```

Or use buildx:

```bash
docker buildx build --platform linux/amd64 --tag my-image:latest --push .
```

### GPU Dockerfile

For GPU workloads:

```dockerfile
# Use NVIDIA CUDA base image
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

# Install Python
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

# Install GPU-enabled libraries
RUN pip3 install --no-cache-dir \
    torch>=2.0.0 \
    torchvision>=0.15.0

WORKDIR /app
```

**Note:** GPU images are larger and take longer to pull.

### Minimal Dockerfile

For simple operations:

```dockerfile
FROM python:3.11-slim

# Install only what you need
RUN pip install --no-cache-dir omegaconf

WORKDIR /app
```

## Configuration

### Basic Configuration

Specify Docker image in operation config:

```yaml
# stages/my_stage/config.yaml
client:
  operations:
    map:
      resources:
        docker_image: my-registry/my-image:latest
        pool: default
        memory_limit_gb: 4
        cpu_limit: 2
```

### Docker Image Location

Docker images can be:

- **Public registry**: `python:3.11-slim`, `nvidia/cuda:11.8.0`
- **Private registry**: `my-registry/my-image:latest`
- **YT registry**: `//path/to/image` (if using YT's Docker registry)

### GPU Configuration

For GPU workloads:

```yaml
client:
  operations:
    map:
      resources:
        docker_image: nvidia/cuda:11.8.0-runtime-ubuntu22.04
        gpu_limit: 1              # Request 1 GPU
        memory_limit_gb: 16       # More memory for GPU workloads
        cpu_limit: 4
```

**GPU requirements:**

- GPU-enabled Docker image
- `gpu_limit` set to 1 or higher
- Sufficient memory (GPU workloads need more)

## Docker Authentication

For private registries, configure Docker authentication via environment variables in `secrets.env`:

### Authentication Configuration

Add Docker credentials to `configs/secrets.env`:

```bash
# configs/secrets.env
DOCKER_AUTH_USERNAME=myuser
DOCKER_AUTH_PASSWORD=mypassword
```

The framework automatically uses these credentials when a Docker image is specified in the operation config:

```yaml
client:
  operations:
    map:
      resources:
        docker_image: my-registry/private-image:latest
        # Docker auth is automatically loaded from secrets.env
```

**Note:** Docker authentication is only used if all three are present: `docker_image`, `DOCKER_AUTH_USERNAME`, and `DOCKER_AUTH_PASSWORD` in secrets.env.

## Complete Example

### Dockerfile

```dockerfile
# Build for linux/amd64 platform
FROM python:3.11-slim

# Install system tools
RUN apt-get update && apt-get install -y \
    cowsay \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
RUN pip install --no-cache-dir \
    omegaconf \
    botocore \
    boto3

# Make cowsay available
RUN ln -sf /usr/games/cowsay /usr/local/bin/cowsay

WORKDIR /app
```

### Stage Configuration

```yaml
# stages/run_in_docker/config.yaml
client:
  operations:
    vanilla:
      resources:
        docker_image: my-registry/my-image:latest
        pool: default
        memory_limit_gb: 2
        cpu_limit: 1
```

### Stage Code

```python
# stages/run_in_docker/stage.py
from yt_framework.core.pipeline import DebugContext
from yt_framework.core.stage import BaseStage
from yt_framework.operations.vanilla import run_vanilla

class RunInDockerStage(BaseStage):
    def run(self, debug: DebugContext) -> DebugContext:
        success = run_vanilla(
            context=self.context,
            operation_config=self.config.client.operations.vanilla,
        )
        
        if not success:
            raise RuntimeError("Vanilla operation failed")
        
        return debug
```

### Vanilla Script

```python
# stages/run_in_docker/src/vanilla.py
#!/usr/bin/env python3
import subprocess
import logging
from ytjobs.logging.logger import get_logger

def main():
    logger = get_logger("docker-example", level=logging.INFO)
    
    # Use custom tool from Docker image
    result = subprocess.run(
        ["cowsay", "Hello from Docker!"],
        capture_output=True,
        text=True,
    )
    
    logger.info(result.stdout)

if __name__ == "__main__":
    main()
```

See [Example: 07_custom_docker](https://github.com/GregoryKogan/yt-framework/tree/main/examples/07_custom_docker/) for complete example.

## Best Practices

### Image Size

**Keep images small:**

- Use slim base images (`python:3.11-slim`)
- Remove unnecessary packages
- Use multi-stage builds if needed
- Clean up apt cache

**Example:**

```dockerfile
FROM python:3.11-slim

RUN apt-get update && apt-get install -y \
    build-essential \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*
```

### Dependency Management

**Install dependencies in image:**

- Pre-install common dependencies
- Use `requirements.txt` for stage-specific deps
- Pin versions for reproducibility

**Example:**

```dockerfile
FROM python:3.11-slim

# Pre-install common dependencies
RUN pip install --no-cache-dir \
    numpy>=1.20.0 \
    pandas>=1.3.0

# Stage-specific deps installed at runtime via requirements.txt
WORKDIR /app
```

### Version Tagging

**Tag images with versions:**

```dockerfile
# Build with version tag
docker buildx build --platform linux/amd64 \
    --tag my-registry/my-image:v1.2.3 \
    --push .
```

**Use in config:**

```yaml
docker_image: my-registry/my-image:v1.2.3
```

### Testing Images

**Test images locally:**

```bash
# Build image
docker buildx build --platform linux/amd64 --tag my-image:test --load .

# Test image
docker run --rm my-image:test python3 -c "import numpy; print(numpy.__version__)"
```

## Common Patterns

### Python with ML Libraries

```dockerfile
FROM python:3.11-slim

RUN pip install --no-cache-dir \
    numpy>=1.20.0 \
    pandas>=1.3.0 \
    scikit-learn>=1.0.0 \
    transformers>=4.20.0
```

### GPU with PyTorch

```dockerfile
FROM nvidia/cuda:11.8.0-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

RUN pip3 install --no-cache-dir \
    torch>=2.0.0 \
    torchvision>=0.15.0
```

### Custom Tools

```dockerfile
FROM python:3.11-slim

RUN apt-get update && apt-get install -y \
    ffmpeg \
    imagemagick \
    && rm -rf /var/lib/apt/lists/*

# Install any custom Python dependencies your tools need
RUN pip install --no-cache-dir \
    your-custom-package>=1.0.0
```

## Troubleshooting

### Issue: Image not found

- Check image name and tag
- Verify image exists in registry
- Check Docker authentication

### Issue: Platform mismatch

- Build for `linux/amd64` platform
- Use `docker buildx` for cross-platform builds

### Issue: GPU not available

- Verify GPU-enabled image
- Check `gpu_limit` is set
- Verify cluster has GPU nodes

### Issue: Slow image pull

- Use smaller base images
- Cache layers effectively
- Use local registry if possible

### Issue: Dependencies missing

- Check image includes required packages
- Verify `requirements.txt` is correct
- Review installation logs

## Next Steps

- Understand [Cluster Requirements](configuration/cluster-requirements.md) for required dependencies
- Learn about [Checkpoints](checkpoints.md) for model files
- Explore [Code Upload](code-upload.md) for code packaging
- Check out [Example: 07_custom_docker](https://github.com/GregoryKogan/yt-framework/tree/main/examples/07_custom_docker/) for complete example