
Improving performance for Amazon Glue for Apache Spark jobs

To improve Amazon Glue for Spark performance, consider updating performance-related Amazon Glue and Spark parameters.

For specific strategies for identifying bottlenecks through metrics and reducing their impact, see Best practices for performance tuning Amazon Glue for Apache Spark jobs on Amazon Prescriptive Guidance. That guide introduces key topics that apply to Apache Spark in all runtime environments, such as Spark architecture and Resilient Distributed Datasets. Building on those topics, it walks you through specific performance tuning strategies, such as optimizing shuffles and parallelizing tasks.

You can identify bottlenecks by configuring Amazon Glue to show the Spark UI. For more information, see Monitoring jobs using the Apache Spark web UI.
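
As a rough illustration of what that configuration can look like, the following Python sketch builds the job arguments that turn on Spark event logging for the Spark UI. The `--enable-spark-ui` and `--spark-event-logs-path` job parameters are the ones Amazon Glue uses for this purpose; the helper name `spark_ui_job_arguments` and the bucket path are hypothetical, chosen only for this example.

```python
def spark_ui_job_arguments(event_log_path: str) -> dict:
    """Build job arguments that enable Spark event logs for the Spark UI.

    Hypothetical helper for illustration; it only assembles a dictionary
    and does not call any Amazon Glue API.
    """
    if not event_log_path.startswith("s3://"):
        raise ValueError("event_log_path must be an Amazon S3 URI")
    return {
        # Turns on Spark event logging for the job.
        "--enable-spark-ui": "true",
        # Amazon S3 prefix where the event logs are written.
        "--spark-event-logs-path": event_log_path,
    }


# Placeholder bucket name; replace it with a bucket your job role can write to.
job_arguments = spark_ui_job_arguments("s3://amzn-s3-demo-bucket/spark-event-logs/")
print(job_arguments)
```

A dictionary like this could be supplied as the job's default arguments, for example through the `DefaultArguments` field when creating the job with an SDK, or entered as job parameters in the console.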

Additionally, Amazon Glue provides performance features that may apply to the specific type of data store your job connects to. For reference information about performance parameters for data stores, see Connection types and options for ETL in Amazon Glue for Spark.
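
To make the idea of data store specific performance parameters more concrete, here is a minimal Python sketch of two commonly cited examples: grouping small files when reading from Amazon S3 (`groupFiles`, `groupSize`) and parallelizing a JDBC read (`hashfield`, `hashpartitions`). The helper names, bucket path, column name, and numeric values are assumptions made for this example, not recommendations; check the connection reference for the options that apply to your data store and Amazon Glue version.

```python
def s3_grouped_read_options(paths, group_size_bytes=128 * 1024 * 1024):
    """Connection options that ask Amazon Glue to group small S3 files.

    Hypothetical helper; the default group size is an arbitrary example value.
    """
    return {
        "paths": list(paths),
        "recurse": True,
        # Group files within each S3 partition into larger read tasks.
        "groupFiles": "inPartition",
        # Target group size in bytes, passed as a string.
        "groupSize": str(group_size_bytes),
    }


def jdbc_parallel_read_options(hash_field, partitions):
    """Additional options that split a JDBC read into parallel queries.

    Hypothetical helper; hash_field should name a column in the source table.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return {
        "hashfield": hash_field,
        "hashpartitions": str(partitions),
    }


s3_options = s3_grouped_read_options(["s3://amzn-s3-demo-bucket/input/"])
jdbc_options = jdbc_parallel_read_options("customer_id", 8)
print(s3_options)
print(jdbc_options)
```

In a job script, dictionaries like these would typically be passed as `connection_options` to `glueContext.create_dynamic_frame.from_options` or as `additional_options` to `glueContext.create_dynamic_frame.from_catalog`. Whether they help depends on the shape of your data, so measure before and after changing them.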