Enabling the Apache Spark web UI for Amazon Glue jobs

You can use the Apache Spark web UI to monitor and debug Amazon Glue ETL jobs running on the Amazon Glue job system. You can configure the Spark UI using the Amazon Glue console or the Amazon Command Line Interface (Amazon CLI).

Every 30 seconds, Amazon Glue backs up the Spark event logs to the Amazon S3 path that you specify.

Configuring the Spark UI (console)

Follow these steps to configure the Spark UI by using the Amazon Web Services Management Console. When you create an Amazon Glue job, the Spark UI is enabled by default.

To turn on the Spark UI when you create or edit a job
  1. Sign in to the Amazon Web Services Management Console and open the Amazon Glue console at https://console.amazonaws.cn/glue/.

  2. In the navigation pane, choose Jobs.

  3. Choose Add job, or select an existing job.

  4. In Job details, open the Advanced properties.

  5. Under the Spark UI tab, choose Write Spark UI logs to Amazon S3.

  6. Specify an Amazon S3 path for storing the Spark event logs for the job. Note that if you use a security configuration in the job, the encryption also applies to the Spark UI log file. For more information, see Encrypting data written by Amazon Glue.

  7. Under Spark UI logging and monitoring configuration:

    • Select Standard if you are generating logs to view in the Amazon Glue console.

    • Select Legacy if you are generating logs to view on a Spark history server.

    • You can also choose to generate both.

Configuring the Spark UI (Amazon CLI)

To generate logs that you can view with the Spark UI in the Amazon Glue console, use the Amazon CLI to pass the following job parameters to Amazon Glue jobs. For more information, see Amazon Glue job parameters.

'--enable-spark-ui': 'true', '--spark-event-logs-path': 's3://s3-event-log-path'

To distribute logs to their legacy locations, set the --enable-spark-ui-legacy-path parameter to "true". If you do not want to generate logs in both formats, remove the --enable-spark-ui parameter.
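The parameter combinations described above can be sketched as a small Python helper. This is an illustrative sketch only, not part of Amazon Glue: the function names `spark_ui_arguments` and `start_job_run_command`, the job name `my-job`, and the choice of `aws glue start-job-run --arguments` as the way to pass the parameters are assumptions made for the example. The parameter names and the placeholder path come from this page.

```python
import json
import shlex


def spark_ui_arguments(event_log_path, standard=True, legacy=False):
    """Build the job parameters for Spark UI logging (hypothetical helper).

    standard -> logs for viewing in the Amazon Glue console
    legacy   -> logs in their legacy locations, for a Spark history server
    """
    if not event_log_path.startswith("s3://"):
        raise ValueError("event_log_path must be an Amazon S3 URI (s3://...)")
    if not (standard or legacy):
        raise ValueError("choose at least one of standard or legacy")

    args = {"--spark-event-logs-path": event_log_path}
    if standard:
        args["--enable-spark-ui"] = "true"
    if legacy:
        # Legacy only: --enable-spark-ui is left out, as described above.
        args["--enable-spark-ui-legacy-path"] = "true"
    return args


def start_job_run_command(job_name, args):
    """Render a CLI command string; assumes parameters are passed per run."""
    return "aws glue start-job-run --job-name {} --arguments {}".format(
        shlex.quote(job_name), shlex.quote(json.dumps(args))
    )


if __name__ == "__main__":
    both = spark_ui_arguments("s3://s3-event-log-path", legacy=True)
    print(start_job_run_command("my-job", both))
```

Building the map in code and serializing it with `json.dumps` avoids hand-quoting mistakes on the command line. The same map could instead be stored as the job's default arguments.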

Configuring the Spark UI for sessions using Notebooks

Warning

Amazon Glue interactive sessions do not currently support Spark UI in the console. Configure a Spark history server instead.

If you use Amazon Glue notebooks, set up the Spark UI configuration before you start the session. To do this, use the %%configure cell magic:

%%configure
{ "--enable-spark-ui": "true", "--spark-event-logs-path": "s3://path" }
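The body of the %%configure cell is JSON, so it needs straight double quotes. Text copied from formatted documents can carry typographic quotes that a JSON parser rejects. The following is a minimal sketch, assuming you want to generate and check the cell text outside the notebook; the helper names `configure_cell` and `is_valid_configure_body` are hypothetical and are not part of Amazon Glue.

```python
import json


def configure_cell(event_log_path):
    """Return the text of a %%configure cell (hypothetical helper)."""
    settings = {
        "--enable-spark-ui": "true",
        "--spark-event-logs-path": event_log_path,
    }
    # json.dumps always emits straight double quotes.
    return "%%configure\n" + json.dumps(settings, indent=2)


def is_valid_configure_body(body):
    """Check that a cell body parses as JSON before you paste it."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(configure_cell("s3://path"))
```

Run the check on any body you copied from elsewhere. A body written with curly quotes fails the check, while the generated text passes.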