Improving performance for Amazon Glue for Apache Spark jobs

To improve Amazon Glue for Spark performance, consider updating certain performance-related Amazon Glue and Spark parameters.

For more information about specific strategies for identifying bottlenecks through metrics and reducing their impact, see Best practices for performance tuning Amazon Glue for Apache Spark jobs on Amazon Prescriptive Guidance. This guide introduces key topics applicable to Apache Spark in all runtime environments, such as Spark architecture and Resilient Distributed Datasets. Building on those topics, it walks you through specific performance tuning strategies, such as optimizing shuffles and parallelizing tasks.

You can identify bottlenecks by configuring Amazon Glue to show the Spark UI. For more information, see Monitoring jobs using the Apache Spark web UI.
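As a minimal sketch of what that configuration can look like, the Spark UI is enabled through the job parameters `--enable-spark-ui` and `--spark-event-logs-path`. The helper name `spark_ui_arguments` and the bucket path below are hypothetical and exist only for illustration. The linked topic remains the reference for the full set of options.

```python
def spark_ui_arguments(event_log_path: str) -> dict:
    """Build job arguments that enable Spark event logging for the Spark UI.

    `event_log_path` is a placeholder for an S3 prefix that you own.
    """
    if not event_log_path.startswith("s3://"):
        raise ValueError("event_log_path must be an s3:// URI")
    # Job parameter values are passed as strings.
    return {
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": event_log_path,
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix; replace with your own location.
    print(spark_ui_arguments("s3://amzn-s3-demo-bucket/spark-event-logs/"))
```

A dictionary like this could be supplied as the job's default arguments, for example through the `DefaultArguments` field when creating or updating a job with an SDK.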

Additionally, Amazon Glue provides performance features that may apply to the specific type of data store your job connects to. For reference information about performance parameters for data stores, see Connection types and options for ETL in Amazon Glue for Spark.

