Operations Overview#

YT Framework supports several operation types for processing data on YTsaurus clusters. This page covers four of them (Map, Vanilla, YQL, and S3) and explains when to choose each.

Operation Types#

When to Use Each Operation#

Operation  Best For                                Input/Output      Parallelization
---------  --------------------------------------  ----------------  -----------------------
Map        Row-by-row processing, transformations  Table → Table     Automatic (per row)
Vanilla    Setup, cleanup, standalone tasks        None              Single job
YQL        SQL-like queries, joins, aggregations   Table(s) → Table  Automatic (query-level)
S3         External data integration               S3 → Table        File-level
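
The guidance above can be read as a small decision procedure. The following is a minimal sketch of that reading, not part of YT Framework: the function name `choose_operation` and its three boolean parameters are hypothetical and exist only to illustrate the order in which the questions might be asked.

```python
def choose_operation(reads_table: bool, needs_sql: bool, external_s3: bool) -> str:
    """Pick an operation type following the guidance above (illustrative only)."""
    if external_s3:
        # Data lives outside the cluster, so it has to be brought in first.
        return "S3"
    if not reads_table:
        # No table input or output: setup, cleanup, or a standalone task.
        return "Vanilla"
    if needs_sql:
        # Joins and aggregations are expressed more naturally as a query.
        return "YQL"
    # Custom logic applied to each row of a table.
    return "Map"
```

For example, a step that reads a table and applies custom per-row logic without joins maps to `choose_operation(True, False, False)`, which returns `"Map"`.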

Quick Comparison#

Map vs YQL#

  • Use Map when you need custom Python logic per row

  • Use YQL when you need SQL-like operations (joins, aggregations)
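
To make "custom Python logic per row" concrete, here is a minimal sketch of a row-wise transform run over in-memory rows. It does not use the YT Framework mapper interface, which this page does not show; `normalize_row` and `run_map` are hypothetical names. The point is that the function sees one row at a time and needs no other rows, which is what makes it a fit for Map rather than YQL.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]

def normalize_row(row: Row) -> Iterator[Row]:
    """Hypothetical per-row transform: depends only on the current row."""
    name = str(row.get("name", "")).strip().lower()
    if not name:
        return  # drop rows that have no usable name
    yield {**row, "name": name}

def run_map(rows: Iterable[Row]) -> List[Row]:
    """Apply the transform to every row; each row may yield zero or more rows."""
    out: List[Row] = []
    for row in rows:
        out.extend(normalize_row(row))
    return out
```

A step that needs to compare or combine rows (for example, a total per customer) cannot be written this way and is better expressed as a YQL query.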

Vanilla vs Map#

  • Use Vanilla when you don’t need table input/output

  • Use Map when processing table rows

S3 Integration#

  • Use S3 when working with external data sources

  • Often combined with Map or YQL operations

Common Patterns#

Pattern 1: Extract → Transform → Load#

stages:
  enabled_stages:
    - extract_from_s3      # S3 operation
    - transform_data       # Map operation
    - load_to_table        # YQL operation

Pattern 2: Setup → Process → Validate#

stages:
  enabled_stages:
    - setup_environment    # Vanilla operation
    - process_data         # Map operation
    - validate_results     # Vanilla operation
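
The sketch below illustrates how the two Vanilla stages differ from the Map stage in this pattern: the setup and validation steps take no table rows as their primary input, while the processing step works row by row. All function names mirror the stage names above, but their bodies are hypothetical stand-ins, not YT Framework code.

```python
import tempfile
from pathlib import Path
from typing import List

def setup_environment() -> Path:
    # Vanilla-style task: no table input, it only prepares a working directory.
    return Path(tempfile.mkdtemp(prefix="yt_sketch_"))

def process_data(rows: List[dict]) -> List[dict]:
    # Map-style step: each output row is computed from exactly one input row.
    return [{"id": r["id"], "score": r["raw"] * 2} for r in rows]

def validate_results(input_rows: List[dict], output_rows: List[dict]) -> bool:
    # Vanilla-style check: the row count is preserved and every row has a score.
    same_count = len(input_rows) == len(output_rows)
    all_scored = all(r.get("score") is not None for r in output_rows)
    return same_count and all_scored
```

Keeping validation as its own stage means a failed check stops the pipeline after processing without mixing the check into the per-row logic.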

Pattern 3: Join → Filter → Aggregate#

stages:
  enabled_stages:
    - join_tables         # YQL operation
    - filter_data         # YQL operation
    - aggregate_results   # YQL operation
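
To show what the three stages compute, here is a plain-Python sketch over two small in-memory tables. In the framework these steps would be YQL queries; the sample data, column names, and the `min_amount` threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

users = [{"user_id": 1, "country": "DE"}, {"user_id": 2, "country": "FR"}]
orders = [
    {"user_id": 1, "amount": 30},
    {"user_id": 1, "amount": 5},
    {"user_id": 2, "amount": 50},
]

def join_tables(orders: List[dict], users: List[dict]) -> List[dict]:
    # Inner join on user_id: orders without a matching user are dropped.
    by_id = {u["user_id"]: u for u in users}
    return [{**o, **by_id[o["user_id"]]} for o in orders if o["user_id"] in by_id]

def filter_data(rows: List[dict], min_amount: int) -> List[dict]:
    # Keep only orders at or above the threshold.
    return [r for r in rows if r["amount"] >= min_amount]

def aggregate_results(rows: List[dict]) -> Dict[str, int]:
    # Sum the order amounts per country.
    totals: Dict[str, int] = defaultdict(int)
    for r in rows:
        totals[r["country"]] += r["amount"]
    return dict(totals)

result = aggregate_results(filter_data(join_tables(orders, users), min_amount=10))
```

Every step here needs to see more than one row at a time or more than one table, which is why the pattern uses YQL for all three stages rather than Map.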
