Fault-tolerant execution in Trino
Fault-tolerant execution is a mechanism in Trino that a cluster can use to mitigate query failures. To do this, it retries queries or their component tasks when they fail. When fault-tolerant execution is activated, intermediate exchange data is spooled, and another worker can reuse it in the event of a worker outage or other fault during query execution.
For more information on fault-tolerant execution in Trino, see Project Tardigrade
delivers ETL at Trino speeds to early users
Configuration
Fault-tolerant execution is deactivated by default. To activate the feature, set
the retry-policy
configuration property in the
trino-config
classification to either QUERY
or
TASK
based on the desired retry policy, as follows.
{"classification": "trino-config", "properties": { "retry-policy": "
QUERY
" } }
A QUERY
retry policy instructs Trino to retry a
query automatically when an error occurs on a worker node. We recommend that you use
a QUERY
retry policy when the majority of the workload for the Trino
cluster comprises many small queries.
A TASK
retry policy instructs Trino to retry
individual query tasks in the event of failure. We recommend this policy when Trino
executes large batch queries. The cluster can more efficiently retry smaller tasks
within the query rather than retry the whole query.
Exchange manager
An exchange manager stores and manages spooled data for fault-tolerant execution. It uses external storage to store spilled data beyond the in-memory buffer size. You can configure a file system-based exchange manager that stores spooled data in a specified location, such as Amazon S3, Amazon S3 compatible systems, or HDFS.
Amazon EMR releases 6.9.0 and later include the trino-exchange-manager
classification to configure the exchange manager. These releases also support HDFS
for spooling.
Setting up exchange manager
Use the trino-exchange-manager
configuration classification to
configure an exchange manager. This classification internally creates a
etc/exchange-manager.properties
configuration file on the
coordinator and all worker nodes. The classification also sets
exchange-manager.name
configuration property to
filesystem
.
By default, Amazon EMR releases 6.9.0 and later use HDFS as an exchange manager. HDFS
is available in the Amazon EMR EC2 clusters, and spooling occurs in the
trino-exchange/
directory by default. To use the default settings,
set the following configuration:
{"Classification": "trino-exchange-manager" }
If you want to provide a custom location, set the following properties in the
trino-exchange-manager
classification:
-
Set
exchange.use-local-hdfs
totrue
. -
Set
exchange.base-directories
to the custom directory location in HDFS, for example,exchange.base-directories=/exchange
. If the custom directory isn't already in HDFS, Amazon EMR will create it.
HDFS exchange manager configurations
Based on internal testing results, we recommend that you spool to local HDFS for better query performance in comparison to other cloud-based file systems. You can set the following configurations for the exchange manager with HDFS.
Configuration | Description | Default setting |
---|---|---|
|
Block size for HDFS storage |
4 MB |
|
List of file paths to configure HDFS |
If |
For additional fault-tolerant execution configuration properties, and for
information on how to set up Amazon S3 or other Amazon S3 compatible systems for spooling, see
the Fault-tolerant execution
Considerations and limitations
-
If you enable fault-tolerant execution, it disables
write
operations for connectors that do not supportwrite
whenretry-policy
is set. As of Amazon EMR release 6.9.0, Delta Lake, Hive and Iceberg connectors supportwrite
operations withretry-policy
. -
If you use exchange manager and perform expensive I/O operations, your queries may experience degraded performance while exchange manager spools the intermediate data to external storage.