How Trino Executes A Query: Coordinator, Workers, Stages, Tasks, Drivers, And Pages

Ziva Li included in series Learning Trino by Tracing One Iceberg Table From SQL to Files

2026-06-07 1942 words 9 minutes

Series - Learning Trino by Tracing One Iceberg Table From SQL to Files

Contents

Trino can feel hard to learn because the execution vocabulary is unfamiliar: coordinator, worker, catalog, connector, split, stage, task, driver, operator, page, block.

The useful way to learn Trino is to follow one query through the system and keep asking the same question:

Which part plans the work, which part runs the work, and where does the data
move next?

Before tracing into a real query, the first thing to keep straight is the runtime map: which objects exist, where planning stops, where execution starts, and how data moves between them.

1. The Small Example

Imagine this query:

SELECT o_orderstatus, count(*)
FROM iceberg.tpch.orders
WHERE o_orderdate >= DATE '1995-01-01'
GROUP BY o_orderstatus;

The table name has three parts:

iceberg.tpch.orders

Read it as:

catalog.schema.table

The catalog tells Trino which connector to use. In this example, the iceberg catalog routes table access to the Iceberg connector. The connector is the boundary between Trino’s query engine and the external data source.

At a high level, the query moves like this:

SQL client
  -> coordinator
  -> parser
  -> analyzer
  -> planner
  -> optimizer
  -> distributed plan
  -> scheduler
  -> workers
  -> connector page source
  -> data source
  -> Trino pages
  -> operators
  -> coordinator
  -> SQL client

The most important split is:

coordinator:
  decides what work should happen

workers:
  execute the work and move data through operators

2. Why MPP Matters Here

Trino is an MPP query engine. MPP means massively parallel processing: one SQL query can be split into many pieces of work and executed across multiple workers at the same time.

For this note, the important part is:

coordinator:
  plans and schedules the query

workers:
  run pieces of the query in parallel

exchanges:
  move intermediate pages between stages

final result:
  returns to the client through the coordinator

This is why Trino has words like stage, task, split, driver, operator, and exchange. They are the pieces that make one SQL query run as distributed work instead of one local process reading everything by itself.

3. Architecture

flowchart TB
    Client[SQL client]
    Coordinator[Coordinator]
    Plan[Distributed plan / fragments]
    Scheduler[Scheduler]

    Client -->|SQL statement| Coordinator
    Coordinator -->|parse, analyze, plan, optimize| Plan
    Plan --> Scheduler

    Scheduler -->|tasks and splits| WorkerA[Worker A]
    Scheduler -->|tasks and splits| WorkerB[Worker B]
    Scheduler -->|tasks and splits| WorkerC[Worker C]

    WorkerA <-->|exchange pages| WorkerB
    WorkerB <-->|exchange pages| WorkerC

    WorkerA -->|status and results| Coordinator
    WorkerB -->|status and results| Coordinator
    WorkerC -->|status and results| Coordinator
    Coordinator -->|final result set| Client

The same query also has a connector boundary:

flowchart TB
    Coordinator[Coordinator]
    Worker[Worker task]
    Catalog[Catalog]
    Connector[Connector]
    Metadata[(Table metadata)]
    Source[(External data source)]

    Catalog --> Connector

    Coordinator -->|table metadata| Catalog
    Coordinator -->|split planning| Connector
    Connector --> Metadata

    Worker -->|page source or page sink| Connector
    Connector --> Source
    Source -->|data pages| Worker

The coordinator is the control plane for a query. It accepts SQL, analyzes it, creates plans, asks connectors for metadata, schedules tasks on workers, and tracks query state.

Workers are the execution plane. They receive tasks, read data through connectors, run operators such as scan/filter/join/aggregation, exchange intermediate data with other workers, and report progress back to the coordinator.

Connectors let Trino query systems outside the engine. Iceberg, Hive, JDBC, Kafka, TPCH, and Memory are all connector-backed access paths. For lakehouse tables, connectors are where many important behaviors live: metadata lookup, predicate pushdown, partition pruning, split generation, statistics, and file reading.

4. Query Lifecycle

For a normal read query, the lifecycle is:

1. Client submits SQL.
2. Coordinator parses SQL into an abstract syntax tree.
3. Analyzer resolves catalogs, schemas, tables, columns, types, and functions.
4. Planner creates a logical plan.
5. Optimizer rewrites the plan.
6. Distributed planner splits the plan into stages and fragments.
7. Scheduler assigns tasks and splits to workers.
8. Workers execute operators.
9. Exchanges move intermediate data between stages.
10. Final results stream back to the client.

In EXPLAIN output, Trino prints fragments. For this mental model, I treat a fragment as the planned distributed unit that becomes stage work during execution.

That list is dense, but the control/data split keeps it understandable:

before execution:
  mostly coordinator work

during execution:
  workers run pipelines and exchange data

at connector boundaries:
  Trino asks the connector about tables, columns, splits, page sources, and
  writes

5. The Runtime Hierarchy

The runtime hierarchy to memorize is:

Query
  -> stages
     -> tasks
        -> pipelines
           -> drivers
              -> operators
                 -> pages

Here is the same idea as a diagram:

flowchart TB
    Query[Query] --> Stages[Stages]
    Stages --> StageA[Stage]
    StageA --> Tasks[Tasks on workers]
    Tasks --> Pipelines[Pipelines]
    Pipelines --> Drivers[Drivers]
    Drivers --> Operators[Operators]
    Operators --> Pages[Pages]

    Tasks --> Splits[Splits]
    Stages <-->|Exchange| OtherStages[Other stages]

Each level has a specific job.

Concept	Short meaning
Query	The full SQL statement being executed.
Stage	A distributed piece of the query plan. Stages are connected by exchanges.
Task	A stage fragment running on one worker.
Pipeline	A chain of operators that can move data page by page.
Driver	One running copy of a pipeline.
Operator	A step that scans, filters, joins, aggregates, sorts, exchanges, or outputs data.
Page	A columnar batch of rows passed between operators.
Split	A unit of table-scan work, often a file or file range for lakehouse tables.

Splits feed source operators. They are not a level below every task in the same way drivers and operators are; they are the scan work that a source task processes.

This hierarchy is more useful than trying to remember class names first.

6. Stage And Task

A stage is a distributed piece of the query plan.

For example, a query might have:

source stage:
  scan table data and apply local filters

aggregation stage:
  combine rows by group key

root stage:
  produce the final result

A task is one stage fragment running on one worker.

If a source stage needs to scan many files, Trino can create tasks on multiple workers. Those tasks process splits in parallel. The coordinator decides where tasks and splits should go; workers execute the local work they receive.

The important correction is that a stage is not a worker. A stage is part of the plan. A task is that stage’s work running on a worker.

7. Split

A split is a unit of work for reading data.

For file-based lakehouse tables, a split often maps to a file or byte range. For example:

orders/file_001.parquet, bytes 0-134217728
orders/file_002.parquet, bytes 0-67108864
orders/file_003.parquet, bytes 67108864-134217728

The exact meaning depends on the connector. Trino’s engine understands that it has splits to schedule; the connector understands what those splits mean for the underlying system.

That boundary matters. The Trino scheduler can assign work to workers, but the Iceberg connector decides which data files belong to a snapshot and how to turn them into Iceberg splits.

8. Pipeline, Driver, And Operator

Inside a task, Trino builds pipelines. A pipeline is an operator chain.

For a simple scan and filter, the pipeline might look like:

TableScan -> FilterAndProject -> TaskOutput

For an aggregation stage, it might look like:

ExchangeSource -> HashAggregation -> PartitionedOutput

A driver is one running copy of a pipeline. If the pipeline is the recipe, a driver is one active execution of that recipe.

An operator is one step inside the driver:

TableScanOperator:
  reads pages from a connector page source

FilterAndProjectOperator:
  filters rows and computes expressions

HashAggregationOperator:
  groups rows and maintains aggregation state

TaskOutputOperator / PartitionedOutputOperator:
  sends pages onward

The driver coordinates the operators. During processing, it repeatedly pulls pages from one operator and pushes them into the next:

current.getOutput() -> Page
next.addInput(Page)

For the example query, a source pipeline might do this:

read a Page from the Iceberg/Parquet page source
  -> keep rows where o_orderdate >= DATE '1995-01-01'
  -> keep or compute the columns needed by the query
  -> send pages into an aggregation or exchange path

The data moves in batches, not as individual Java objects for each SQL row.

9. Page And Block

A Trino Page is an in-memory batch of rows in columnar form.

Conceptually, it looks like this:

Page
  positionCount = 3

  Block 0: o_orderkey
    [101, 102, 103]

  Block 1: o_orderstatus
    ["O", "F", "O"]

Each logical row is a position across blocks:

position 0 -> 101, "O"
position 1 -> 102, "F"
position 2 -> 103, "O"

Vocabulary:

Term	Meaning
Page	A batch of rows.
Block	One column’s values inside a page.
Position	Row index inside the page.
Channel	Column index inside the page.

This is why Trino is often described as processing columnar batches internally. Operators pass Page objects to each other, and each page contains one Block per output column.

Do not confuse a Trino page with a Parquet page. They are different things:

Trino Page:
  in-memory execution batch

Parquet page:
  encoded and usually compressed storage unit inside a Parquet column chunk

The names overlap, but the layers are different.

10. Connector Boundary

The connector boundary is where Trino stops being only a query engine and starts talking to an external system.

For the example table:

FROM iceberg.tpch.orders

Trino uses the iceberg catalog. The Iceberg connector is responsible for Iceberg-specific work such as:

finding the table
reading table metadata
choosing a snapshot
applying predicate and projection pushdown
planning splits
creating page sources for the table's data files, such as Parquet, ORC, or Avro

The engine still owns the general query pipeline:

parse SQL
analyze names and types
build and optimize plan
schedule work
run operators
exchange pages
produce final results

Keeping that boundary clear prevents a lot of confusion. If a filter is pushed into Iceberg, that is connector behavior. If a stage exchanges pages between workers, that is engine execution behavior.

11. Coordinator Scheduling vs Worker Execution

There are two scheduling layers:

coordinator scheduler:
  decides which workers receive tasks and splits

worker-local task executor:
  decides which local driver or split runner gets a thread next

The coordinator is not centrally stepping every operator one by one. It places work on workers and tracks progress. After that, each worker has local execution queues and runs its assigned work.

Inside a worker execution turn, a driver streams pages through its operator chain:

TableScan.getOutput() -> Page
FilterAndProject.addInput(Page)
FilterAndProject.getOutput() -> filtered Page
TaskOutput.addInput(filtered Page)

If the driver yields or blocks, the same live driver/operator objects keep their state. Already-produced pages may have moved into an output buffer, local exchange, or another stage.

12. What This Means For EXPLAIN

For EXPLAIN, the main thing to keep from this section is the small set of plan terms that connect back to the runtime model:

TableScan or ScanFilterProject:
  where data enters from a connector

Exchange / RemoteSource:
  where pages move between stages

Aggregation / Join / Sort:
  where operators transform pages

Fragment / Stage:
  how Trino splits the query into distributed pieces

Estimates:
  planner guesses, not runtime proof

EXPLAIN shows the planned shape. EXPLAIN ANALYZE runs the query and adds runtime measurements. Detailed plan reading belongs in the next note.

When I later run:

EXPLAIN
SELECT o_orderstatus, count(*)
FROM iceberg.tpch.orders
WHERE o_orderdate >= DATE '1995-01-01'
GROUP BY o_orderstatus;

I want to map each plan section back to the hierarchy:

Which connector is used?
Where is the scan?
Where are filters applied?
Where does aggregation happen?
Where do exchanges happen?
Which fragment produces the final result?

That exercise is the fastest way to turn the vocabulary into intuition.

13. Self-Check

Questions to check later:

What does the coordinator do before workers read data?
What is the difference between a stage and a task?
What is a split?
What is the relationship between a pipeline and a driver?
What does an operator consume and produce?
What is a Trino Page?
Why is a Trino page different from a Parquet page?
What work belongs to the connector instead of the core engine?
Why is catalog.schema.table important?
Why is EXPLAIN ANALYZE different from EXPLAIN?

14. References

Trino concepts: https://trino.io/docs/current/overview/concepts.html
Trino EXPLAIN: https://trino.io/docs/current/sql/explain.html
Trino SPI overview: https://trino.io/docs/current/develop/spi-overview.html