How Trino Executes A Query: Coordinator, Workers, Stages, Tasks, Drivers, And Pages

Trino can feel hard to learn because the vocabulary is unfamiliar long before the code gets hard: coordinator, worker, catalog, connector, split, stage, task, driver, operator, page, block.

The useful way to learn Trino is to follow one query through the system and keep asking the same question:

Which part plans the work, which part runs the work, and where does the data move next?

Before getting into optimizer rules or connector internals, the first thing to keep straight is the runtime map: which objects exist, where planning stops, where execution starts, and how data moves between them.

Imagine this query:

SELECT orderstatus, count(*)
FROM iceberg.tpch.orders
WHERE orderdate >= DATE '2024-01-01'
GROUP BY orderstatus;

The table name has three parts:

iceberg.tpch.orders

Read it as:

catalog.schema.table

The catalog tells Trino which connector to use. In this example, the iceberg catalog routes table access to the Iceberg connector. The connector is the boundary between Trino’s query engine and the external data source.
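To make the name resolution concrete, here is a minimal Python sketch of reading a three-part name and looking up which connector a catalog points to. It is a toy model, not Trino's code: `QualifiedTableName`, `parse_table_name`, `connector_for`, and the `CATALOG_TO_CONNECTOR` mapping are all hypothetical names invented for illustration. It ignores quoted identifiers and session defaults for catalog and schema.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class QualifiedTableName:
    catalog: str
    schema: str
    table: str


# Hypothetical catalog -> connector mapping, for illustration only.
CATALOG_TO_CONNECTOR: Dict[str, str] = {
    "iceberg": "iceberg",
    "tpch": "tpch",
}


def parse_table_name(name: str) -> QualifiedTableName:
    """Split 'catalog.schema.table' into its three parts (no quoting support)."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return QualifiedTableName(*parts)


def connector_for(name: QualifiedTableName,
                  catalogs: Dict[str, str] = CATALOG_TO_CONNECTOR) -> str:
    """Return the connector configured for the table's catalog."""
    try:
        return catalogs[name.catalog]
    except KeyError:
        raise LookupError(f"unknown catalog: {name.catalog}") from None


orders = parse_table_name("iceberg.tpch.orders")
print(orders, "->", connector_for(orders))
```

The point of the sketch is only that the first part of the name picks the connector; everything after that lookup is the connector's business.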

At a high level, the query moves like this:

SQL client
  -> coordinator
  -> parser
  -> analyzer
  -> planner
  -> optimizer
  -> distributed plan
  -> scheduler
  -> workers
  -> connector
  -> data source
  -> workers
  -> coordinator
  -> SQL client

The most important split is:

coordinator:
  decides what work should happen

workers:
  execute the work and move data through operators

flowchart LR
    Client[SQL client] -->|SQL statement| Coordinator[Coordinator]

    Coordinator -->|parse, analyze, plan, optimize| Plan[Distributed plan]
    Coordinator -->|schedule tasks and splits| WorkerA[Worker]
    Coordinator -->|schedule tasks and splits| WorkerB[Worker]
    Coordinator -->|schedule tasks and splits| WorkerN[Worker]

    WorkerA <-->|exchange pages| WorkerB
    WorkerB <-->|exchange pages| WorkerN

    WorkerA -->|status and results| Coordinator
    WorkerB -->|status and results| Coordinator
    WorkerN -->|status and results| Coordinator
    Coordinator -->|result set| Client

    subgraph DataAccess["Data access"]
        Catalog[Catalog] --> Connector[Connector]
        Connector --> Source[(External data source)]
    end

    Coordinator -->|metadata and split requests| Connector
    WorkerA -->|read/write through connector| Connector
    WorkerB -->|read/write through connector| Connector
    WorkerN -->|read/write through connector| Connector

The coordinator is the control plane for a query. It accepts SQL, analyzes it, creates plans, asks connectors for metadata, schedules tasks on workers, and tracks query state.

Workers are the execution plane. They receive tasks, read data through connectors, run operators such as scan/filter/join/aggregation, exchange intermediate data with other workers, and report progress back to the coordinator.

Connectors let Trino query systems outside the engine. Iceberg, Hive, Kafka, TPCH, Memory, and the JDBC-based connectors (PostgreSQL, MySQL, and others) are all connector-backed access paths. For lakehouse tables, connectors are where many important behaviors live: metadata lookup, predicate pushdown, partition pruning, split generation, statistics, and file reading.

For a normal read query, the lifecycle is:

1. Client submits SQL.
2. Coordinator parses SQL into an abstract syntax tree.
3. Analyzer resolves catalogs, schemas, tables, columns, types, and functions.
4. Planner creates a logical plan.
5. Optimizer rewrites the plan.
6. Distributed planner splits the plan into fragments; each fragment runs as a stage.
7. Scheduler assigns tasks and splits to workers.
8. Workers execute operators.
9. Exchanges move intermediate data between stages.
10. Final results stream back to the client.

That list is dense, but the control/data split keeps it understandable:

before execution:
  mostly coordinator work

during execution:
  workers run pipelines and exchange data

at connector boundaries:
  Trino asks the connector about tables, columns, splits, page sources, and writes

The runtime hierarchy to memorize is:

Query
  -> stages
     -> tasks
        -> pipelines
           -> drivers
              -> operators
                 -> pages

Here is the same idea as a diagram:

flowchart TB
    Query[Query] --> Stages[Stages]
    Stages --> StageA[Stage]
    StageA --> Tasks[Tasks on workers]
    Tasks --> Pipelines[Pipelines]
    Pipelines --> Drivers[Drivers]
    Drivers --> Operators[Operators]
    Operators --> Pages[Pages]

    Tasks --> Splits[Splits]
    Stages <-->|Exchange| OtherStages[Other stages]

Each level has a specific job.

| Concept | Short meaning |
| --- | --- |
| Query | The full SQL statement being executed. |
| Stage | A distributed piece of the query plan. Stages are connected by exchanges. |
| Task | A stage fragment running on one worker. |
| Pipeline | A chain of operators that can move data page by page. |
| Driver | One running copy of a pipeline. |
| Operator | A step that scans, filters, joins, aggregates, sorts, exchanges, or outputs data. |
| Page | A columnar batch of rows passed between operators. |
| Split | A unit of table-scan work, often a file or file range for lakehouse tables. |

This hierarchy is more useful than trying to remember class names first.
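One way to hold the hierarchy in your head is as nested containers. The sketch below is a toy model in Python, not Trino's class layout: the `Query`, `Stage`, `Task`, and `Pipeline` dataclasses, the `driver_count` field, and the worker names are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pipeline:
    operators: List[str]   # operator chain, in data-flow order
    driver_count: int = 1  # how many running copies of this chain


@dataclass
class Task:
    worker: str            # a task lives on exactly one worker
    pipelines: List[Pipeline] = field(default_factory=list)


@dataclass
class Stage:
    name: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Query:
    sql: str
    stages: List[Stage] = field(default_factory=list)


def total_drivers(query: Query) -> int:
    """Count drivers across every pipeline of every task of every stage."""
    return sum(
        pipeline.driver_count
        for stage in query.stages
        for task in stage.tasks
        for pipeline in task.pipelines
    )


scan = ["TableScan", "FilterAndProject", "TaskOutput"]
source_stage = Stage(
    name="source",
    tasks=[
        Task(worker="worker-1", pipelines=[Pipeline(scan, driver_count=2)]),
        Task(worker="worker-2", pipelines=[Pipeline(scan, driver_count=2)]),
    ],
)
query = Query(sql="SELECT ...", stages=[source_stage])
print(total_drivers(query))  # 2 tasks x 2 drivers each = 4
```

Note that the same stage appears on two workers as two tasks, which is exactly the stage-versus-task distinction the table describes.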

A stage is a distributed piece of the query plan.

For example, a query might have:

source stage:
  scan table data and apply local filters

aggregation stage:
  combine rows by group key

root stage:
  produce the final result

A task is one stage fragment running on one worker.

If a source stage needs to scan many files, Trino can create tasks on multiple workers. Those tasks process splits in parallel. The coordinator decides where tasks and splits should go; workers execute the local work they receive.

The common mistake to avoid is treating a stage as a worker. A stage is part of the plan. A task is that stage’s work running on a worker.
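The "coordinator places splits, workers run them" idea can be sketched with a simple round-robin assignment. This is an assumption-heavy toy: `assign_splits` is a hypothetical helper, and the real scheduler weighs more than a fixed rotation. It only shows that placement is decided centrally and execution happens wherever the split lands.

```python
from itertools import cycle
from typing import Dict, List, Sequence


def assign_splits(splits: Sequence[str],
                  workers: Sequence[str]) -> Dict[str, List[str]]:
    """Hand out splits to workers in a fixed rotation (toy placement policy)."""
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    for split, worker in zip(splits, cycle(workers)):
        assignment[worker].append(split)
    return assignment


plan = assign_splits(
    ["file_001", "file_002", "file_003"],
    ["worker-1", "worker-2"],
)
for worker, work in plan.items():
    print(worker, work)
```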

A split is a unit of work for reading data.

For file-based lakehouse tables, a split often maps to a file or byte range. For example:

orders/file_001.parquet, bytes 0-134217728
orders/file_002.parquet, bytes 0-67108864
orders/file_003.parquet, bytes 67108864-134217728

The exact meaning depends on the connector. Trino’s engine understands that it has splits to schedule; the connector understands what those splits mean for the underlying system.

That boundary matters. The Trino scheduler can assign work to workers, but the Iceberg connector decides which data files belong to a snapshot and how to turn them into Iceberg splits.
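As a rough illustration of the connector side of that boundary, the following sketch cuts one file into byte-range splits. `FileSplit`, `plan_splits`, and the target split size are hypothetical; a real connector decides split boundaries using its own metadata and configuration.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class FileSplit:
    path: str
    start: int   # first byte of the range
    length: int  # number of bytes in the range


def plan_splits(path: str, file_size: int,
                target_split_size: int) -> Iterator[FileSplit]:
    """Yield consecutive byte ranges that together cover the whole file."""
    if target_split_size <= 0:
        raise ValueError("target_split_size must be positive")
    offset = 0
    while offset < file_size:
        length = min(target_split_size, file_size - offset)
        yield FileSplit(path, offset, length)
        offset += length


# A 128 MiB file with a 64 MiB target yields two ranges.
for split in plan_splits("orders/file_003.parquet", 134217728, 67108864):
    print(split)
```

The engine would only see opaque split objects to schedule; what `start` and `length` mean is the connector's concern.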

Inside a task, Trino builds pipelines. A pipeline is an operator chain.

For a simple scan and filter, the pipeline might look like:

TableScan -> FilterAndProject -> TaskOutput

For an aggregation stage, it might look like:

ExchangeSource -> HashAggregation -> PartitionedOutput
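The hand-off into an aggregation stage can be sketched as hash partitioning on the group key: every row with the same key goes to the same downstream task, so each task can count its keys independently. This is a toy in Python under stated assumptions: `partition_for`, `partitioned_output`, `exchange`, and `hash_aggregation` are hypothetical names, and CRC32 is only a stand-in for whatever hash the engine actually uses.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List


def partition_for(key: str, partition_count: int) -> int:
    """Pick a partition from the group key (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def partitioned_output(keys: Iterable[str],
                       partition_count: int) -> List[List[str]]:
    """Upstream side: route each row into the buffer for its partition."""
    buffers: List[List[str]] = [[] for _ in range(partition_count)]
    for key in keys:
        buffers[partition_for(key, partition_count)].append(key)
    return buffers


def exchange(upstream_buffers: Iterable[List[List[str]]],
             partition_count: int) -> List[List[str]]:
    """Gather buffer i from every upstream task into downstream input i."""
    inputs: List[List[str]] = [[] for _ in range(partition_count)]
    for buffers in upstream_buffers:
        for partition, keys in enumerate(buffers):
            inputs[partition].extend(keys)
    return inputs


def hash_aggregation(keys: Iterable[str]) -> Dict[str, int]:
    """Downstream side: count(*) per group key."""
    return dict(Counter(keys))


upstream = [
    partitioned_output(["O", "F", "O"], 2),  # rows seen by source task 1
    partitioned_output(["F", "P", "O"], 2),  # rows seen by source task 2
]
for partition, keys in enumerate(exchange(upstream, 2)):
    print(partition, hash_aggregation(keys))
```

Because a key never appears in two partitions, the per-task results can simply be concatenated to form the final answer.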

A driver is one running copy of a pipeline. If the pipeline is the recipe, a driver is one active execution of that recipe.

An operator is one step inside the driver:

TableScanOperator:
  reads pages from a connector page source

FilterAndProjectOperator:
  filters rows and computes expressions

HashAggregationOperator:
  groups rows and maintains aggregation state

TaskOutputOperator / PartitionedOutputOperator:
  sends pages onward

The driver coordinates the operators. During processing, it repeatedly pulls pages from one operator and pushes them into the next:

current.getOutput() -> Page
next.addInput(Page)

For the example query, a source pipeline might do this:

read a Page from the Iceberg/Parquet page source
  -> keep rows where orderdate >= DATE '2024-01-01'
  -> keep or compute the columns needed by the query
  -> send pages into an aggregation or exchange path

The data moves in batches, not as individual Java objects for each SQL row.
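The pull-then-push loop described above can be sketched in a few lines of Python. Everything here is a simplified stand-in: the three operator classes and `run_driver` are hypothetical, the method names only mirror the `getOutput` / `addInput` shape, and a "page" is just a list of row tuples rather than columnar blocks. Blocking, yielding, and memory accounting are left out.

```python
from datetime import date
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, date]
Page = List[Row]  # simplification: a batch of rows, not columnar blocks


class SourceOperator:
    """Stands in for a table scan: hands out pre-made pages."""
    def __init__(self, pages: List[Page]) -> None:
        self._pages = list(pages)

    def get_output(self) -> Optional[Page]:
        return self._pages.pop(0) if self._pages else None

    def is_finished(self) -> bool:
        return not self._pages


class FilterOperator:
    """Keeps only rows matching the predicate; drops pages left empty."""
    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self._predicate = predicate
        self._pending: Optional[Page] = None

    def add_input(self, page: Page) -> None:
        self._pending = [row for row in page if self._predicate(row)]

    def get_output(self) -> Optional[Page]:
        page, self._pending = self._pending, None
        return page if page else None


class OutputOperator:
    """Stands in for task output: collects the pages it receives."""
    def __init__(self) -> None:
        self.pages: List[Page] = []

    def add_input(self, page: Page) -> None:
        self.pages.append(page)


def run_driver(operators: list) -> None:
    """Pull a page from each operator and push it into the next one."""
    source, middle, sink = operators[0], operators[1:-1], operators[-1]
    while not source.is_finished():
        page = source.get_output()
        for operator in middle:
            if page is None:
                break
            operator.add_input(page)
            page = operator.get_output()
        if page is not None:
            sink.add_input(page)


source = SourceOperator([
    [("O", date(2023, 12, 31)), ("F", date(2024, 1, 1))],
    [("O", date(2024, 2, 1))],
])
sink = OutputOperator()
run_driver([source, FilterOperator(lambda r: r[1] >= date(2024, 1, 1)), sink])
print(sink.pages)
```

The unit of hand-off is always a whole page, which is the property the paragraph above is pointing at.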

A Trino Page is an in-memory batch of rows in columnar form.

Conceptually, it looks like this:

Page
  positionCount = 3

  Block 0: orderstatus
    ["O", "F", "O"]

  Block 1: count
    [10, 7, 4]

Each logical row is a position across blocks:

position 0 -> "O", 10
position 1 -> "F", 7
position 2 -> "O", 4

Vocabulary:

| Term | Meaning |
| --- | --- |
| Page | A batch of rows. |
| Block | One column’s values inside a page. |
| Position | Row index inside the page. |
| Channel | Column index inside the page. |

This is why Trino is often described as processing columnar batches internally. Operators pass Page objects to each other, and each page contains one Block per output column.
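The page, block, position, and channel vocabulary maps onto a very small data structure. The sketch below is a conceptual model only: this `Page` class and its `row` helper are hypothetical, and plain Python lists stand in for typed blocks.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Tuple


@dataclass(frozen=True)
class Page:
    blocks: Tuple[Sequence[Any], ...]  # one block (column) per channel

    def __post_init__(self) -> None:
        # Every block must hold the same number of positions.
        if len({len(block) for block in self.blocks}) > 1:
            raise ValueError("all blocks must have the same position count")

    @property
    def position_count(self) -> int:
        return len(self.blocks[0]) if self.blocks else 0

    @property
    def channel_count(self) -> int:
        return len(self.blocks)

    def row(self, position: int) -> Tuple[Any, ...]:
        """Read one logical row by taking the same position in every block."""
        return tuple(block[position] for block in self.blocks)


page = Page((["O", "F", "O"], [10, 7, 4]))
print(page.position_count)  # 3
print(page.row(1))          # ('F', 7)
```

Rows are never stored as objects here; a row only exists as "the same index in every column", which is what makes the layout columnar.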

Do not confuse a Trino page with a Parquet page. They are different things:

Trino Page:
  in-memory execution batch

Parquet page:
  encoded and usually compressed storage unit inside a Parquet column chunk

The names overlap, but the layers are different.

The connector boundary is where Trino stops being only a query engine and starts talking to an external system.

For the example table:

FROM iceberg.tpch.orders

Trino uses the iceberg catalog. The Iceberg connector is responsible for Iceberg-specific work such as:

finding the table
reading table metadata
choosing a snapshot
applying predicate and projection pushdown
planning splits
creating page sources
reading Parquet, ORC, or Avro data files

The engine still owns the general query pipeline:

parse SQL
analyze names and types
build and optimize plan
schedule work
run operators
exchange pages
produce final results

Keeping that boundary clear prevents a lot of confusion. If a filter is pushed into Iceberg, that is connector behavior. If a stage exchanges pages between workers, that is engine execution behavior.

There are two scheduling layers:

coordinator scheduler:
  decides which workers receive tasks and splits

worker-local task executor:
  decides which local driver or split runner gets a thread next

The coordinator does not step every operator one by one. It places work on workers and tracks progress. After that, each worker has local execution queues and runs its assigned work.

Inside a worker execution turn, a driver streams pages through its operator chain:

TableScan.getOutput() -> Page
FilterAndProject.addInput(Page)
FilterAndProject.getOutput() -> filtered Page
TaskOutput.addInput(filtered Page)

If the driver yields or blocks, the same live driver/operator objects keep their state. Already-produced pages may have moved into an output buffer, local exchange, or another stage.
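The "yield and keep state" behavior can be imitated with Python generators: each driver does a bit of work, gives up its turn, and later resumes exactly where it stopped. This is a loose analogy under stated assumptions. `driver` and `run_task_executor` are hypothetical, one page per turn is an arbitrary choice, and the real worker-local executor uses a more involved policy than a plain round-robin queue.

```python
from collections import deque
from typing import Generator, Iterable, List, Tuple

Log = List[Tuple[str, int]]


def driver(name: str, pages: Iterable[int], log: Log) -> Generator[None, None, None]:
    """Process one page per turn, then yield the thread back to the executor."""
    for page in pages:
        log.append((name, page))  # stand-in for pushing the page through operators
        yield                     # end of this execution turn; local state is kept


def run_task_executor(drivers: Iterable[Generator[None, None, None]]) -> None:
    """Give each live driver one turn at a time until all are finished."""
    queue = deque(drivers)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run one turn
        except StopIteration:
            continue               # driver finished; do not requeue it
        queue.append(current)      # driver yielded; it gets another turn later


log: Log = []
run_task_executor([driver("A", [1, 2], log), driver("B", [1], log)])
print(log)  # [('A', 1), ('B', 1), ('A', 2)]
```

Driver A resumes at its second page after B has had a turn, without being rebuilt, which is the point the paragraph above makes about live driver and operator objects.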

For EXPLAIN, the main thing to keep from this section is the small set of plan terms that connect back to the runtime model:

TableScan or ScanFilterProject:
  where data enters from a connector

Exchange / RemoteSource:
  where pages move between stages

Aggregation / Join / Sort:
  where operators transform pages

Fragment / Stage:
  how Trino splits the query into distributed pieces

Estimates:
  planner guesses, not runtime proof

When I later run:

EXPLAIN
SELECT orderstatus, count(*)
FROM iceberg.tpch.orders
WHERE orderdate >= DATE '2024-01-01'
GROUP BY orderstatus;

I want to map each plan section back to the hierarchy:

Which connector is used?
Where is the scan?
Where are filters applied?
Where does aggregation happen?
Where do exchanges happen?
Which fragment produces the final result?

That exercise is the fastest way to turn the vocabulary into intuition.

Trino is not a storage engine for my data. It usually queries data where it already lives.

Trino is also not usually the system that produces lakehouse data. In many lakehouse architectures, Spark or Flink writes data and Trino serves interactive SQL over that data.

For my own learning path, that means:

study query planning and execution first
study connector boundaries second
study storage formats and table formats third

For Iceberg and Hive, the connector layer is especially important because table metadata, partitions, splits, file formats, and pushdown decisions all live there.

The first mental model is:

Coordinator plans and schedules.
Workers execute.
Connectors access external data.
Operators move Page batches.
Stages and exchanges describe distributed data flow.

The runtime hierarchy is:

Query
  -> stages
     -> tasks
        -> pipelines
           -> drivers
              -> operators
                 -> pages

The key boundary is:

engine:
  general SQL planning and distributed execution

connector:
  data-source-specific metadata, splits, page sources, and writes

If I can keep those three ideas straight, the rest of Trino becomes much easier to learn.

Questions to check later:

  1. What does the coordinator do before workers read data?
  2. What is the difference between a stage and a task?
  3. What is a split?
  4. What is the relationship between a pipeline and a driver?
  5. What does an operator consume and produce?
  6. What is a Trino Page?
  7. Why is a Trino page different from a Parquet page?
  8. What work belongs to the connector instead of the core engine?
  9. Why is catalog.schema.table important?
  10. Why is EXPLAIN ANALYZE different from EXPLAIN?