How Trino Pushes Predicates Into Iceberg

Ziva Li included in series Learning Trino by Tracing One Iceberg Table From SQL to Files

2026-06-21 2352 words 11 minutes

What Predicate Pushdown Means In Trino

Predicate pushdown means Trino tries to move part of a WHERE filter closer to the data source. Instead of reading all rows first and filtering only inside the engine, Trino asks the connector whether it can use the predicate while planning or reading the table.

That can reduce work, but it is not a single yes/no switch.

The useful mental model is:

pushed down:
  the connector can use the predicate during scan planning or reading

enforced:
  the connector guarantees rows returned by the scan satisfy that predicate

remaining:
  Trino still evaluates the predicate above the scan for correctness

So “pushed down” does not mean “the engine no longer checks it.”

1. The Setup

The table is the same Iceberg table from the read trace:

CREATE TABLE iceberg.tpch.orders
WITH (
    format = 'PARQUET',
    partitioning = ARRAY['o_orderstatus']
) AS
SELECT *
FROM tpch.tiny.orders;

The important part is the partitioning:

partitioning = ARRAY['o_orderstatus']

That makes o_orderstatus an identity partition column. A predicate on that column can be enforced from Iceberg partition metadata.

The first query is:

SELECT *
FROM iceberg.tpch.orders
WHERE o_orderstatus = 'F';

The second query adds a regular data-column predicate:

SELECT o_orderkey, o_totalprice
FROM iceberg.tpch.orders
WHERE o_orderstatus = 'F'
  AND o_totalprice > 1000;

This second query is the useful one for understanding pushdown. It has two predicate shapes:

Predicate	Kind	Expected behavior
`o_orderstatus = 'F'`	identity partition predicate	Iceberg can enforce it from partition metadata.
`o_totalprice > 1000`	regular data-column predicate	Iceberg can use it for pruning, but Trino still needs a remaining filter.

2. Plan Evidence For The Partition Predicate

For the simple partition query, EXPLAIN shows the predicate attached to the scan:

TableScan[table = iceberg:tpch.orders$data@... constraint on [o_orderstatus]]
    o_orderstatus := 3:o_orderstatus:varchar
        :: [[F]]

This proves:

The plan contains an Iceberg table scan with a constraint on o_orderstatus.

It does not prove how many bytes were read. It does not prove row-group pruning. It does not prove page-level filtering. It only proves the planned scan shape.

EXPLAIN ANALYZE adds runtime evidence:

TableScan[table = iceberg:tpch.orders$data@... constraint on [o_orderstatus]]
    Output: 7304 rows
    Input: 7304 rows (1012.76kB)
    Physical input: 169.66kB
    Splits: 1

This proves that, in this run, the scan returned 7304 rows and executed one split with 169.66kB physical input.

The metadata tables match that shape:

orders$partitions:
  {o_orderstatus=F}
  record_count = 7304
  file_count = 1

orders$files where partition.o_orderstatus = 'F':
  one PARQUET file
  record_count = 7304

So for this table, o_orderstatus = 'F' selects one Iceberg partition and one data file.

3. What Each Evidence Type Proves

Before going deeper into the code path, it helps to separate what each piece of evidence can and cannot prove:

Evidence	What it proves	What it does not prove
`EXPLAIN`	Planned scan shape and visible scan constraints.	Runtime rows, bytes read, or exact pruning done during execution.
`EXPLAIN ANALYZE`	Runtime rows, physical input, split count, and operator stats for that run.	Which Java branch classified each predicate.
`$partitions`	Partition metadata, record counts, file counts, and partition-level stats.	Parquet row-group reads inside a selected file.
`$files`	Which Iceberg data files match metadata filters.	Which Parquet pages were read from that file.

4. Where Predicate Pushdown Happens

Predicate pushdown is not one method call. It happens in layers:

Layer	Trino component	Where it happens	What happens
Planner optimization	Logical planner / iterative optimizer	Coordinator: `PushPredicateIntoTableScan`	Trino turns the `WHERE` expression into a connector `Constraint` and asks the connector to apply it.
Connector metadata	`ConnectorMetadata` implementation in the Iceberg connector	Coordinator: `IcebergMetadata.applyFilter(...)`	Iceberg classifies predicate domains into enforced, unenforced, unsupported, and remaining pieces.
Split planning	`ConnectorSplitManager` / `ConnectorSplitSource` in the Iceberg connector	Coordinator: `IcebergSplitManager` and `IcebergSplitSource`	Iceberg uses the enforced and unenforced scan state to plan matching files and create `IcebergSplit` objects.
File-format reader setup	`ConnectorPageSourceProvider` in the Iceberg connector	Worker: `IcebergPageSourceProvider`	The worker refines the unenforced predicate with split statistics and dynamic filters before opening the file reader.
Engine filtering	Worker execution operators	Worker: filter above the scan, such as `ScanFilterAndProjectOperator`	Trino evaluates the remaining predicate for correctness.

The full flow is:

SQL WHERE predicate
  -> coordinator planner extracts a TupleDomain
  -> coordinator planner asks ConnectorMetadata.applyFilter(...)
  -> coordinator IcebergMetadata.applyFilter(...) classifies the predicate
  -> new IcebergTableHandle carries enforced and unenforced predicates
  -> remaining predicate stays above the TableScan
  -> coordinator IcebergSplitSource uses the scan predicates to plan files
  -> IcebergSplit carries file path, partition values, and file stats
  -> worker IcebergPageSourceProvider refines the predicate for one split
  -> Parquet or ORC reader may prune row groups or pages
  -> worker Trino operator evaluates any remaining filter

So the first pushdown decision is coordinator-side planning. The later pruning work also uses pushed predicate information, but it happens when Trino plans Iceberg splits and when a worker builds the page source for a concrete split.

5. Where Pushdown Enters The Planner

The planner rule is:

PushPredicateIntoTableScan

The shape is:

SQL WHERE predicate
  -> DomainTranslator extracts a TupleDomain
  -> PushPredicateIntoTableScan builds a Constraint
  -> MetadataManager.applyFilter(...)
  -> ConnectorMetadata.applyFilter(...)
  -> IcebergMetadata.applyFilter(...)

In Trino source, the key planner step is:

core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/PushPredicateIntoTableScan.java

The rule extracts a tuple domain from the deterministic predicate, maps symbols to connector column handles, builds a Constraint, and calls:

plannerContext.getMetadata().applyFilter(session, node.getTable(), constraint)

The connector then returns a ConstraintApplicationResult. The planner reads:

new table handle
remaining filter
remaining connector expression

That is the first sign that pushdown is negotiated. The engine asks the connector what it can use. The connector answers with both new scan state and the part Trino must keep.
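The shape of that negotiation can be sketched in a few lines of Python. This is a toy model, not Trino's API: `ApplyFilterResult`, `toy_apply_filter`, and `plan_scan` are hypothetical names, and a predicate is reduced to a dict from column name to predicate text. It only illustrates the exchange: the connector answers with new scan state plus whatever the engine must keep as a filter above the scan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApplyFilterResult:
    """Hypothetical stand-in for the connector's answer."""
    new_table_handle: dict   # scan state the connector keeps
    remaining_filter: dict   # predicate the engine must still evaluate


def toy_apply_filter(handle: dict, constraint: dict) -> Optional[ApplyFilterResult]:
    """Toy connector: guarantees partition columns, only takes hints on the rest."""
    if not constraint:
        return None  # nothing to push down
    partition_columns = handle["partition_columns"]
    enforced = {c: p for c, p in constraint.items() if c in partition_columns}
    hinted = {c: p for c, p in constraint.items() if c not in partition_columns}
    new_handle = {**handle, "enforced": enforced, "unenforced": hinted}
    return ApplyFilterResult(new_handle, remaining_filter=hinted)


def plan_scan(handle: dict, where: dict) -> list:
    """Return plan nodes as (name, payload) tuples, top node first."""
    result = toy_apply_filter(handle, where)
    if result is None:
        return [("TableScan", handle)]
    scan = ("TableScan", result.new_table_handle)
    if result.remaining_filter:
        # The engine keeps a filter above the scan for the non-guaranteed part.
        return [("Filter", result.remaining_filter), scan]
    return [scan]


handle = {"table": "orders", "partition_columns": {"o_orderstatus"}}
mixed = plan_scan(handle, {"o_orderstatus": "= 'F'", "o_totalprice": "> 1000"})
partition_only = plan_scan(handle, {"o_orderstatus": "= 'F'"})
```

In this model the mixed query keeps a `Filter` on `o_totalprice` above the scan, while the partition-only query plans as a bare scan whose handle carries the enforced predicate.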

6. How Iceberg Classifies The Predicate

Inside Iceberg, the important method is:

IcebergMetadata.applyFilter(...)

Iceberg sorts each predicate domain into the following buckets. A domain can land in more than one of them:

newEnforcedConstraint:
  Iceberg can guarantee this predicate from metadata or partition knowledge

newUnenforcedConstraint:
  Iceberg can use this predicate in scan planning or reader setup, but it is not
  a correctness guarantee

remainingConstraint:
  Trino must still evaluate this predicate above the scan

The mixed query:

WHERE o_orderstatus = 'F'
  AND o_totalprice > 1000

should split like this:

Predicate piece	Iceberg bucket	Why
`o_orderstatus = 'F'`	`newEnforcedConstraint`	`o_orderstatus` is an identity partition column.
`o_totalprice > 1000`	`newUnenforcedConstraint`	Iceberg can push it into scan planning, but this regular data-column predicate is not classified as connector-enforced filtering.
`o_totalprice > 1000`	`remainingConstraint`	Trino still applies it above the scan for correctness.

The code path matches that mental model:

if the domain is not convertible:
  unsupported

else if Iceberg can enforce it with the partition spec:
  newEnforced

else if it is an enforceable metadata column:
  newEnforced

else:
  newUnenforced

Here unsupported does not mean the SQL query is unsupported. It means Trino has a predicate domain, but Iceberg cannot safely translate that domain into an Iceberg filter expression for scan planning. Examples include structural types such as arrays, maps, and rows, geospatial types, and most UUID or variant comparisons. Those predicates stay in the remaining filter so Trino can evaluate them after the scan.

Then Iceberg returns a new IcebergTableHandle with:

newUnenforcedConstraint
newEnforcedConstraint
newConstraintColumns

and returns remainingConstraint separately to the engine.
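The branch order above can be modelled as a small Python function. This is a sketch under simplifying assumptions, not the Iceberg connector's code: `classify_domains` and its parameters are hypothetical, "can enforce with the partition spec" is reduced to membership in a set of identity partition columns, and convertibility is reduced to a set of unconvertible type names. The `o_tags` array column is invented to exercise the unsupported branch.

```python
from dataclasses import dataclass, field


@dataclass
class Classification:
    enforced: dict = field(default_factory=dict)
    unenforced: dict = field(default_factory=dict)
    unsupported: dict = field(default_factory=dict)

    @property
    def remaining(self) -> dict:
        # Anything the connector does not guarantee is handed back to the engine.
        return {**self.unenforced, **self.unsupported}


def classify_domains(domains, column_types, identity_partition_columns,
                     enforceable_metadata_columns=frozenset(),
                     unconvertible_types=frozenset({"array", "map", "row"})):
    """Toy version of the four-way branch; inputs are plain dicts and sets."""
    result = Classification()
    for column, domain in domains.items():
        if column_types[column] in unconvertible_types:
            result.unsupported[column] = domain
        elif column in identity_partition_columns:
            result.enforced[column] = domain
        elif column in enforceable_metadata_columns:
            result.enforced[column] = domain
        else:
            result.unenforced[column] = domain
    return result


domains = {
    "o_orderstatus": "= 'F'",
    "o_totalprice": "> 1000",
    "o_tags": "= ARRAY['x']",  # invented column, not part of tpch orders
}
column_types = {"o_orderstatus": "varchar", "o_totalprice": "double", "o_tags": "array"}
classified = classify_domains(domains, column_types, {"o_orderstatus"})
```

Note that `remaining` is derived from the unenforced and unsupported buckets rather than filled separately, which is why the price predicate shows up in two rows of the table above.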

7. Why The Partition Predicate Is Enforced

The helper to remember is:

canEnforceColumnConstraintInSpecs(...)

It checks whether a column predicate can be enforced by the Iceberg partition specs for the selected snapshot.

The simplest case is identity partitioning:

if the partition field transform is identity:
  a predicate on that column can always be enforced

That is why o_orderstatus = 'F' can be enforced here. Because the table is identity partitioned by o_orderstatus, every row in a file that belongs to the F partition has that value. Iceberg can therefore remove files from other partitions before Trino workers read them, and no row from the surviving files can violate the predicate.

This is metadata-level pruning:

Iceberg metadata:
  partition values
  manifest entries
  data file records

Result:
  non-F partitions do not become scan work
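As a rough illustration, metadata-level pruning on an identity partition is just a filter over file records, with no data file opened. The sketch below is a toy model: `DataFile` and `prune_by_identity_partition` are hypothetical names, the paths are made up, and only the F record count of 7304 comes from the metadata shown earlier; the other counts are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str            # made-up path, for illustration only
    partition: dict      # partition values recorded in table metadata
    record_count: int


def prune_by_identity_partition(files, column, allowed_values):
    """Keep only files whose recorded partition value is allowed."""
    return [f for f in files if f.partition[column] in allowed_values]


files = [
    DataFile("data/o_orderstatus=F/a.parquet", {"o_orderstatus": "F"}, 7304),
    DataFile("data/o_orderstatus=O/b.parquet", {"o_orderstatus": "O"}, 100),  # placeholder count
    DataFile("data/o_orderstatus=P/c.parquet", {"o_orderstatus": "P"}, 10),   # placeholder count
]
scan_work = prune_by_identity_partition(files, "o_orderstatus", {"F"})
```

Because the decision uses only the partition value stored next to each file record, every row in `scan_work` is already known to satisfy the predicate.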

8. Why The Price Predicate Is Not Fully Enforced

o_totalprice is a table column inside the data file. It is not the identity partition column in this setup.

That means this predicate:

o_totalprice > 1000

can still be useful, but it is a weaker kind of pushdown.

Iceberg can use file statistics and scan planning to skip work when metadata proves a file cannot match. This happens in a few steps.

First, the split source builds an effective predicate from:

data column predicate
tableHandle.getUnenforcedPredicate()
pushed-down dynamic filter predicate

A dynamic filter is a runtime predicate that Trino learns while the query is already running. The common case is a join: if the build side produces only status = F, Trino can use that runtime fact as a dynamic filter on the probe side scan, such as o_orderstatus IN ('F'). In this single-table example, the dynamic filter does not add much; in a join, it can give IcebergSplitSource and IcebergPageSourceProvider extra values for file and reader pruning.

For this example, the useful part is still:

o_totalprice > 1000

That predicate is not enforced by Iceberg, but it can still be used as a file selection hint. The split source converts the effective predicate to an Iceberg expression and gives it to Iceberg’s table scan:

toIcebergExpression(effectivePredicate)
scan.planFiles()

At this point, Iceberg is still working with metadata. It can look at manifest entries and data-file statistics before any worker opens a Parquet file. If a file says:

o_totalprice max = 900

then that file cannot contain rows where:

o_totalprice > 1000

so it does not need to become scan work for this predicate.
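The file-skipping rule for a greater-than predicate is small enough to write down. This is a minimal sketch, assuming each file record exposes an optional upper bound for the column; `may_contain_greater_than` and `plan_files` are hypothetical helpers and the file names are invented.

```python
from typing import Optional


def may_contain_greater_than(upper_bound: Optional[float], threshold: float) -> bool:
    """Return False only when the stats prove no row satisfies col > threshold."""
    if upper_bound is None:
        return True  # no stats recorded: nothing can be proven, keep the file
    return upper_bound > threshold


def plan_files(files: dict, threshold: float) -> list:
    """files maps a file name to its recorded upper bound, or None if absent."""
    return [name for name, upper in files.items()
            if may_contain_greater_than(upper, threshold)]


files = {"a.parquet": 900.0, "b.parquet": 1500.0, "c.parquet": None}
planned = plan_files(files, 1000.0)
```

The asymmetry matters: a missing or inconclusive bound keeps the file, so this kind of pruning can only ever remove work, never rows that should have been returned.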

Second, if predicated columns exist, the scan includes column stats so each planned split can carry file-level statistics:

fileStatisticsDomain

For a selected file, that domain is built from metadata such as:

lower_bounds
upper_bounds
null_value_counts

So a split might carry a rough fact like:

this file has o_totalprice values between 800 and 1500

That does not prove every row matches o_totalprice > 1000. It only says this file may contain matching rows.

Third, when a worker opens the split, the page source can intersect:

unenforced predicate
fileStatisticsDomain
dynamic filter

This is a second chance to refine the read for one concrete split. The dynamic filter may be more selective by then, and the file statistics are already attached to the split. If the intersection is empty, the page source can return no rows for that split. If the intersection is not empty, the refined predicate can still help the Parquet reader with row-group or page pruning.
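The intersection step can be sketched with plain numeric ranges. This is a simplified model, not Trino's `TupleDomain`: `Range` and `refine_for_split` are hypothetical, each input is a single range on one column, and a dynamic filter is represented as a range even though a real one may be a set of values.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    low: float = -math.inf
    high: float = math.inf
    low_open: bool = False    # True means the low bound itself is excluded
    high_open: bool = False   # True means the high bound itself is excluded

    def intersect(self, other: "Range") -> Optional["Range"]:
        # The tighter lower bound wins; on a tie, an open bound is tighter.
        if self.low != other.low:
            tight = self if self.low > other.low else other
            low, low_open = tight.low, tight.low_open
        else:
            low, low_open = self.low, self.low_open or other.low_open
        # Same rule, mirrored, for the upper bound.
        if self.high != other.high:
            tight = self if self.high < other.high else other
            high, high_open = tight.high, tight.high_open
        else:
            high, high_open = self.high, self.high_open or other.high_open
        if low > high or (low == high and (low_open or high_open)):
            return None  # empty intersection
        return Range(low, high, low_open, high_open)


def refine_for_split(unenforced: Range, file_stats: Range,
                     dynamic_filter: Range) -> Optional[Range]:
    """Return the refined range for one split, or None if it can be skipped."""
    result: Optional[Range] = unenforced
    for domain in (file_stats, dynamic_filter):
        if result is None:
            return None
        result = result.intersect(domain)
    return result


price_gt_1000 = Range(low=1000.0, low_open=True)
no_dynamic_filter = Range()  # unbounded: adds no restriction
refined = refine_for_split(price_gt_1000, Range(800.0, 1500.0), no_dynamic_filter)
skipped = refine_for_split(price_gt_1000, Range(100.0, 900.0), no_dynamic_filter)
```

For the file with values between 800 and 1500, the refined range is above 1000 and up to 1500, which the reader can still use for row-group pruning. For the file topping out at 900, the intersection is empty and the split produces no rows.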

This is useful, but it is still not the same as the connector returning the predicate as enforced. File statistics can often prove:

no rows in this file can match

They cannot always prove:

every row returned by this scan satisfies the predicate

So Trino keeps the remaining filter. A later reader or split-level check may discover that a specific file or row group is fully inside the predicate, but that is separate from Iceberg classifying the original predicate as enforced during applyFilter(...).

9. The Row-Group Min/Max Version

Parquet adds another pruning layer after Iceberg has already selected data files.

A simple row-group example:

predicate:
  o_totalprice > 1000

row group 1:
  min = 100
  max = 900
  skip, because max <= 1000

row group 2:
  min = 800
  max = 1500
  read, because some rows may match

row group 3:
  min = 1200
  max = 2000
  read, because every value in the min/max range satisfies the predicate

The important case for the remaining-filter mental model is row group 2. Its stats cannot prove the final answer. It may contain rows on both sides of the threshold, for example one at the minimum of 800 and one at the maximum of 1500.

The row group is worth reading, but the filter still has to remove the rows at or below 1000.
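The three row-group outcomes and the follow-up filter can be sketched as below. This is a toy model of min/max pruning for a greater-than predicate, not the Parquet reader's code; `row_group_decision` and `remaining_filter` are hypothetical names, and the individual values for row group 2 are invented to fit its min of 800 and max of 1500.

```python
def row_group_decision(min_value, max_value, threshold):
    """Decide what to do with one row group for col > threshold."""
    if max_value <= threshold:
        return "skip"                      # stats prove no row can match
    if min_value > threshold:
        return "read, every row matches"   # stats prove all rows match
    return "read, then filter"             # stats are inconclusive


def remaining_filter(values, threshold):
    """The engine-side check applied to decoded rows."""
    return [v for v in values if v > threshold]


row_groups = {1: (100, 900), 2: (800, 1500), 3: (1200, 2000)}
decisions = {n: row_group_decision(lo, hi, 1000) for n, (lo, hi) in row_groups.items()}

row_group_2_values = [800, 1000, 1200, 1500]  # invented values within the stats
kept = remaining_filter(row_group_2_values, 1000)
```

Only row group 2 depends on the remaining filter for a correct answer, but the engine does not know that per row group at planning time, which is why the filter stays in the plan.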

That is why I need to keep these layers separate:

Iceberg partition/file pruning:
  happens before opening selected Parquet files

Parquet row-group/page pruning:
  happens inside selected Parquet files

Trino remaining filter:
  happens on decoded Trino Page objects for correctness

This is also where the word “page” can mislead:

Parquet page:
  encoded and compressed storage unit inside a Parquet column chunk

Trino Page:
  in-memory batch of rows in columnar Block form

A pushed predicate may help the Parquet reader avoid some storage pages or row groups. The remaining filter is applied to Trino pages after the connector has decoded data into engine batches.

10. What To Remember

Predicate pushdown is a negotiation between the engine and the connector.
PushPredicateIntoTableScan asks the connector to apply a filter.
IcebergMetadata.applyFilter(...) classifies predicate domains into enforced, unenforced, and remaining buckets.
Identity partition predicates, such as o_orderstatus = 'F' in this table, can be enforced by Iceberg partition metadata.
Regular data-column predicates, such as o_totalprice > 1000, can still help prune files, row groups, or pages, but they may remain as Trino filters.
Dynamic filters are runtime predicates, usually learned from joins, and can refine split planning or page-source setup.
Iceberg metadata pruning and Parquet row-group/page pruning are different layers.
A Parquet page is a storage-format unit. A Trino Page is an in-memory execution batch.
“Pushed down” does not automatically mean “fully enforced.”

11. Self-Check

Questions to answer without looking back:

What is the difference between pushed down and enforced?
Why can Iceberg enforce o_orderstatus = 'F' for this table?
Why does o_totalprice > 1000 remain a Trino filter?
What does EXPLAIN prove for a scan constraint?
What does EXPLAIN ANALYZE add?
When does a dynamic filter become useful?
What is the difference between Iceberg metadata pruning and Parquet row-group pruning?
Why is a Parquet page different from a Trino Page?

12. Source Anchors For Debugging

These are the Trino source points I would use to verify the trace with breakpoints. The links are pinned to the Trino commit used for this note: f865b4a444eacf871de4d1fefceedf292c7f6cc6.

Boundary	Trino source anchor	What to inspect
Planner extracts a tuple domain	PushPredicateIntoTableScan.java#L161	`decomposedPredicate`, `newDomain`, and symbol-to-column-handle mapping.
Planner asks connector to apply the filter	PushPredicateIntoTableScan.java#L226	`Constraint` passed into metadata and returned `ConstraintApplicationResult`.
Planner keeps remaining filter	PushPredicateIntoTableScan.java#L240	`remainingFilter` from the connector response.
Iceberg classifies domains	IcebergMetadata.java#L3590	`predicate`, `newEnforcedConstraint`, `newUnenforcedConstraint`, and `remainingConstraint`.
Iceberg checks partition enforcement	IcebergUtil.java#L621	Whether the column constraint can be enforced by the active Iceberg partition specs.
Identity partition is enforceable	IcebergUtil.java#L655	Identity partition transform returns enforceable.
Split planning uses the unenforced predicate	IcebergSplitSource.java#L274	`effectivePredicate`, `toIcebergExpression(...)`, and `scan.planFiles()`.
Split carries file statistics	IcebergSplitSource.java#L613	`fileStatisticsDomain` built from lower bounds, upper bounds, and null counts.
Page source refines reader predicate	IcebergPageSourceProvider.java#L467	Intersection of unenforced predicate, file statistics, and dynamic filter.

13. References

Trino source code: https://github.com/trinodb/trino
Trino EXPLAIN: https://trino.io/docs/current/sql/explain.html
Trino EXPLAIN ANALYZE: https://trino.io/docs/current/sql/explain-analyze.html