View persistent application user interfaces - Amazon EMR

View persistent application user interfaces

Starting with Amazon EMR version 5.25.0, you can connect to the persistent Spark History Server application details hosted off-cluster using the cluster Summary page or the Application user interfaces tab in the console. Tez UI and YARN timeline server persistent application interfaces are available starting with Amazon EMR version 5.30.1. One-click link access to persistent application history provides the following benefits:

  • You can quickly analyze and troubleshoot active jobs and job history without setting up a web proxy through an SSH connection.

  • You can access application history and relevant log files for active and terminated clusters. The logs are available for 30 days after the application ends.

Navigate to your cluster details in the console, and select the Applications tab. After your cluster has launched, select the application UI that you want. The application UI opens in a new browser tab. For more information, see Monitoring and instrumentation.

You can view YARN container logs through the links on the Spark history server, YARN timeline server, and Tez UI.

Note

To access YARN container logs from the Spark history server, YARN timeline server, and Tez UI, you must enable logging to Amazon S3 for your cluster. If you don't enable logging, the links to YARN container logs won't work.
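
As a minimal sketch of the logging requirement in the note above, the following script prints a create-cluster command that sets an Amazon S3 log location with the --log-uri option. The helper name build_create_cluster_cmd, the cluster name, and the bucket are placeholders chosen for illustration. The script only prints the command, so you can review it before you run it.

```shell
#!/bin/sh
# Dry-run sketch: print a create-cluster command that enables logging to Amazon S3.
# build_create_cluster_cmd is a hypothetical helper, not part of the AWS CLI.
build_create_cluster_cmd() {
  log_uri="$1"
  # Backslash-newline inside double quotes joins the lines into one command string.
  printf '%s\n' "aws emr create-cluster \
--name LoggingEnabledCluster \
--release-label emr-6.6.0 \
--applications Name=Spark \
--instance-type m4.large \
--instance-count 2 \
--use-default-roles \
--log-uri ${log_uri}"
}

# The bucket name is a placeholder; replace it with a bucket that you own.
build_create_cluster_cmd "s3://amzn-s3-demo-bucket/emr-logs/"
```

To launch the cluster, run the printed command in a shell that has AWS credentials configured.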

Logs collection

To enable one-click access to persistent application user interfaces, Amazon EMR collects two types of logs:

  • Application event logs are collected into an EMR system bucket. The event logs are encrypted at rest using Server-Side Encryption with Amazon S3 Managed Keys (SSE-S3). If you use a private subnet for your cluster, make sure to include “arn:aws:s3:::prod.MyRegion.appinfo.src/*” in the resource list of the Amazon S3 policy for the private subnet. For more information, see Minimum Amazon S3 policy for private subnet.

  • YARN container logs are collected into an Amazon S3 bucket that you own. You must enable logging for your cluster to access YARN container logs. For more information, see Configure cluster logging and debugging.
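
For the private subnet case above, the following sketch writes a policy statement that lists the EMR system bucket as a resource, with MyRegion replaced by an example Region. The file name, the Sid, and the list of actions are assumptions made for illustration. Confirm the exact statement against the Minimum Amazon S3 policy for private subnet topic before you use it.

```shell
#!/bin/sh
# Sketch: write a policy statement that covers the EMR system bucket for app history.
# REGION replaces the MyRegion placeholder; us-east-1 is only an example value.
REGION="us-east-1"
APPINFO_ARN="arn:aws:s3:::prod.${REGION}.appinfo.src/*"

# Assumption: the Sid and the action list are illustrative, not confirmed by this page.
cat > appinfo-statement.json <<EOF
{
  "Sid": "EnableApplicationHistory",
  "Effect": "Allow",
  "Principal": "*",
  "Action": ["s3:Put*", "s3:Get*", "s3:Create*", "s3:Abort*", "s3:List*"],
  "Resource": ["${APPINFO_ARN}"]
}
EOF

echo "Wrote appinfo-statement.json for ${APPINFO_ARN}"
```

You would then merge this statement into the Statement array of the Amazon S3 policy that you attach for the private subnet.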

If you need to disable this feature for privacy reasons, you can stop the daemon by using a bootstrap script when you create a cluster, as the following example demonstrates.

    aws emr create-cluster --name "Stop Application UI Support" \
    --release-label emr-7.0.0 \
    --applications Name=Hadoop Name=Spark \
    --ec2-attributes KeyName=<myEMRKeyPairName> \
    --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge \
    InstanceGroupType=CORE,InstanceCount=1,InstanceType=m3.xlarge \
    InstanceGroupType=TASK,InstanceCount=1,InstanceType=m3.xlarge \
    --use-default-roles \
    --bootstrap-actions Path=s3://region.elasticmapreduce/bootstrap-actions/run-if,Args=["instance.isMaster=true","echo Stop Application UI | sudo tee /etc/apppusher/run-apppusher; sudo systemctl stop apppusher || exit 0"]

After you run this bootstrap script, Amazon EMR will not collect any Spark History Server or YARN timeline server event logs into the EMR system bucket. No application history information will be available on the Application user interfaces tab, and you will lose access to all application user interfaces from the console.

Large Spark event log files

In some cases, long-running Spark jobs, such as Spark streaming, and large jobs, such as Spark SQL queries, can generate large event logs. Large event logs can quickly use up disk space on compute instances and cause OutOfMemory errors when you load persistent UIs. To avoid these issues, we recommend that you turn on the Spark event log rolling and compaction feature. This feature is available on Amazon EMR versions emr-6.1.0 and later. For more details about rolling and compaction, see Applying compaction on rolling event log files in the Spark documentation.

To activate the Spark event log rolling and compaction feature, turn on the following Spark configuration settings.

  • spark.eventLog.rolling.enabled – Turns on event log rolling based on size. This setting is deactivated by default.

  • spark.eventLog.rolling.maxFileSize – When rolling is activated, specifies the maximum size of the event log file before it rolls over. The default is 128 MB.

  • spark.history.fs.eventLog.rolling.maxFilesToRetain – Specifies the maximum number of non-compacted event log files to retain. By default, all event log files are retained. Set to a lower number to compact older event logs. The lowest value is 1.

Note that compaction is lossy. It attempts to discard outdated events from older event log files, such as the following. Discarded events no longer appear on the Spark History Server UI.

  • Events for finished jobs and related stage or task events.

  • Events for terminated executors.

  • Events for completed SQL queries, and related job, stage, and task events.

To launch a cluster with rolling and compaction enabled
  1. Create a spark-configuration.json file with the following configuration.

    [
      {
        "Classification": "spark-defaults",
        "Properties": {
          "spark.eventLog.rolling.enabled": "true",
          "spark.history.fs.eventLog.rolling.maxFilesToRetain": "1"
        }
      }
    ]
  2. Create your cluster with the Spark rolling compaction configuration as follows.

    aws emr create-cluster \
    --release-label emr-6.6.0 \
    --instance-type m4.large \
    --instance-count 2 \
    --use-default-roles \
    --configurations file://spark-configuration.json
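
If you also want to control the rollover size, the configuration file from step 1 can be generated by a short script. The following is a sketch under stated assumptions: the variable names are hypothetical, 128m and 1 are example values, and the property values are written as JSON strings.

```shell
#!/bin/sh
# Sketch: generate spark-configuration.json with all three rolling settings.
# The variable names are hypothetical; the values are examples, not recommendations.
MAX_FILE_SIZE="128m"      # rollover size for each event log file
MAX_FILES_TO_RETAIN="1"   # non-compacted files to keep (lowest allowed value is 1)

cat > spark-configuration.json <<EOF
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.eventLog.rolling.enabled": "true",
      "spark.eventLog.rolling.maxFileSize": "${MAX_FILE_SIZE}",
      "spark.history.fs.eventLog.rolling.maxFilesToRetain": "${MAX_FILES_TO_RETAIN}"
    }
  }
]
EOF

echo "Wrote spark-configuration.json"
```

Pass the generated file to create-cluster with --configurations file://spark-configuration.json, as in step 2.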

Considerations and limitations

One-click access to persistent application user interfaces currently has the following limitations.

  • Application details take at least two minutes to appear on the Spark History Server UI.

  • This feature works only when the event log directory for the application is in HDFS. By default, Amazon EMR stores event logs in a directory of HDFS. If you change the default directory to a different file system, such as Amazon S3, this feature will not work.

  • This feature is currently not available for EMR clusters with multiple master nodes or for EMR clusters integrated with Amazon Lake Formation.

  • To enable one-click access to persistent application user interfaces, you must have permission to the DescribeCluster action for Amazon EMR. If you deny an IAM principal's permission to this action, it takes approximately five minutes for the permission change to propagate.

  • If you reconfigure applications in a running cluster, the application history will not be available through the application UI.

  • For each Amazon Web Services account, the default limit for active application UIs is 200.

  • You can access application UIs from the console in the US East (N. Virginia) Region, US West (N. California) Region, Canada (Central) Region, EU (Frankfurt, Ireland, London, Paris, Stockholm), Asia Pacific (Mumbai, Seoul, Singapore, Sydney, and Tokyo), South America (São Paulo), China (Beijing) operated by Sinnet, and China (Ningxia) operated by NWCD.
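
The DescribeCluster requirement in the list above could be met with an identity-based policy like the one in the following sketch. The file name and the Sid are hypothetical, and the "*" resource is an assumption made for brevity. Consider scoping the resource to specific cluster ARNs.

```shell
#!/bin/sh
# Sketch: write an IAM policy document that allows the DescribeCluster action.
# The quoted 'EOF' delimiter keeps the shell from expanding anything in the JSON.
cat > describe-cluster-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDescribeCluster",
      "Effect": "Allow",
      "Action": "elasticmapreduce:DescribeCluster",
      "Resource": "*"
    }
  ]
}
EOF

echo "Wrote describe-cluster-policy.json"
```

Attach the policy to the IAM principal that opens the application UIs from the console.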