
Monitoring Spark jobs

To monitor and troubleshoot failures, configure your interactive endpoints so that jobs initiated with the endpoint can send log information to Amazon S3, Amazon CloudWatch Logs, or both. The following sections describe how to send Spark application logs to Amazon S3 for the Spark jobs that you launch with Amazon EMR on EKS interactive endpoints.

Configure IAM policy for Amazon S3 logs

Before your kernels can send log data to Amazon S3, the permissions policy for the job execution role must include the following permissions. Replace DOC-EXAMPLE-BUCKET-LOGGING with the name of your logging bucket.

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::DOC-EXAMPLE-BUCKET-LOGGING", "arn:aws:s3:::DOC-EXAMPLE-BUCKET-LOGGING/*", ] } ] }
Note

Amazon EMR on EKS can also create an S3 bucket. If an S3 bucket is not available, include the s3:CreateBucket permission in the IAM policy.
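
After you save the policy document locally, you can attach it to the job execution role as an inline policy. The following AWS CLI call is a minimal sketch; the role name, policy name, and file name are placeholder values that you would replace with your own.

# Attach the S3 logging permissions to the job execution role as an inline policy.
# "EMRContainers-JobExecutionRole" and "s3-logging-policy.json" are example names.
aws iam put-role-policy \
    --role-name EMRContainers-JobExecutionRole \
    --policy-name S3LoggingPolicy \
    --policy-document file://s3-logging-policy.json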

After you've given your execution role the permissions it needs to send logs to the S3 bucket, your log data is sent to the following Amazon S3 locations when s3MonitoringConfiguration is passed in the monitoringConfiguration section of a create-managed-endpoint request (a sample request follows the list).

  • Driver logs: logUri/virtual-cluster-id/endpoints/endpoint-id/containers/spark-application-id/spark-application-id-driver/(stderr.gz/stdout.gz)

  • Executor logs: logUri/virtual-cluster-id/endpoints/endpoint-id/containers/spark-application-id/executor-pod-name-exec-<Number>/(stderr.gz/stdout.gz)
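
The following AWS CLI call is a minimal sketch of a create-managed-endpoint request that passes s3MonitoringConfiguration in the monitoringConfiguration section. The endpoint name, virtual cluster ID, release label, execution role ARN, and bucket name are placeholder values for illustration; substitute your own.

# Minimal sketch of a create-managed-endpoint request that enables S3 logging.
# The IDs, ARN, release label, and bucket name below are placeholders.
aws emr-containers create-managed-endpoint \
    --name my-endpoint \
    --virtual-cluster-id 1234567890abcdef0 \
    --type JUPYTER_ENTERPRISE_GATEWAY \
    --release-label emr-6.9.0-latest \
    --execution-role-arn arn:aws:iam::111122223333:role/EMRContainers-JobExecutionRole \
    --configuration-overrides '{
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {
                "logUri": "s3://DOC-EXAMPLE-BUCKET-LOGGING/"
            }
        }
    }'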

Note

Amazon EMR on EKS doesn't upload the endpoint logs to your S3 bucket.
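
To confirm that driver and executor logs are arriving, you can list the log prefix for your endpoint. The following AWS CLI call is illustrative only; the bucket name, virtual cluster ID, and endpoint ID are the placeholders from the locations listed above.

# List the Spark application logs uploaded for a given endpoint.
# Replace the bucket name, virtual-cluster-id, and endpoint-id with your own values.
aws s3 ls s3://DOC-EXAMPLE-BUCKET-LOGGING/virtual-cluster-id/endpoints/endpoint-id/containers/ --recursive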