Using Spark event log rotation - Amazon EMR

Using Spark event log rotation

With Amazon EMR 6.3.0 and later, you can turn on the Spark event log rotation feature for Amazon EMR on EKS. Instead of generating a single event log file, this feature rotates the file based on your configured time interval and removes the oldest event log files.

Rotating Spark event logs helps you avoid problems caused by a large Spark event log file generated by long-running or streaming jobs. For example, suppose you start a long-running Spark job with event logging enabled through the persistentAppUI parameter. The Spark driver generates an event log file. If the job runs for hours or days and the Kubernetes node has limited disk space, the event log file can consume all available disk space. Turning on the Spark event log rotation feature solves the problem by splitting the log file into multiple files and removing the oldest files.

Note

This feature only works with Amazon EMR on EKS. Amazon EMR running on Amazon EC2 doesn't support Spark event log rotation.

To turn on the Spark event log rotation feature, configure the following Spark parameters:

  • spark.eventLog.rotation.enabled ‐ turns on log rotation. It is disabled by default in the Spark configuration file. Set it to true to turn on this feature.

  • spark.eventLog.rotation.interval ‐ specifies the time interval for log rotation. The minimum value is 60 seconds. The default value is 300 seconds.

  • spark.eventLog.rotation.minFileSize ‐ specifies the minimum file size required to rotate the log file. The minimum and default value is 1 MB.

  • spark.eventLog.rotation.maxFilesToRetain ‐ specifies how many rotated log files to keep during cleanup. The valid range is 1 to 10. The default value is 2.
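
The limits listed above can be checked on the client side before you submit a job. The following is a minimal sketch only: the function name validate_rotation_settings and its argument names are hypothetical and not part of any AWS SDK or Spark API, and the units (seconds and MB as plain integers) are an assumption made for illustration.

```python
def validate_rotation_settings(interval_seconds=300,
                               min_file_size_mb=1,
                               max_files_to_retain=2):
    """Check rotation settings against the documented limits.

    Hypothetical helper: returns a list of problem descriptions.
    An empty list means every value is within the documented range.
    Defaults mirror the documented default values.
    """
    problems = []
    # Documented minimum rotation interval is 60 seconds.
    if interval_seconds < 60:
        problems.append(
            "spark.eventLog.rotation.interval must be at least 60 seconds")
    # Documented minimum (and default) file size is 1 MB.
    if min_file_size_mb < 1:
        problems.append(
            "spark.eventLog.rotation.minFileSize must be at least 1 MB")
    # Documented valid range for retained files is 1 to 10.
    if not 1 <= max_files_to_retain <= 10:
        problems.append(
            "spark.eventLog.rotation.maxFilesToRetain must be between 1 and 10")
    return problems


# The defaults pass; an interval of 30 seconds is reported as a problem.
print(validate_rotation_settings())
print(validate_rotation_settings(interval_seconds=30))
```

Returning a list of problems instead of raising on the first one lets a caller report every out-of-range value at once.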

You can specify these parameters in the sparkSubmitParameters section of the StartJobRun API, as the following example shows.

"sparkSubmitParameters": "--class org.apache.spark.examples.SparkPi --conf spark.eventLog.rotation.enabled=true --conf spark.eventLog.rotation.interval=300 --conf spark.eventLog.rotation.minFileSize=1m --conf spark.eventLog.rotation.maxFilesToRetain=2"
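
To show where this string sits in a request, the sketch below assembles the same sparkSubmitParameters value from a dictionary of settings and places it in a sparkSubmitJobDriver structure. The helper name build_spark_submit_parameters is hypothetical, and the entryPoint value is a placeholder assumed for illustration; substitute your own application path and supply the remaining StartJobRun fields for your environment.

```python
import json


def build_spark_submit_parameters(main_class, spark_conf):
    """Join a main class and Spark settings into one parameter string.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    parts = [f"--class {main_class}"]
    # Dictionaries keep insertion order, so settings appear as listed.
    parts += [f"--conf {key}={value}" for key, value in spark_conf.items()]
    return " ".join(parts)


# Rotation settings taken from the example above.
rotation_conf = {
    "spark.eventLog.rotation.enabled": "true",
    "spark.eventLog.rotation.interval": "300",
    "spark.eventLog.rotation.minFileSize": "1m",
    "spark.eventLog.rotation.maxFilesToRetain": "2",
}

params = build_spark_submit_parameters(
    "org.apache.spark.examples.SparkPi", rotation_conf)

job_driver = {
    "sparkSubmitJobDriver": {
        # Placeholder path (assumption); point this at your own application.
        "entryPoint": "local:///usr/lib/spark/examples/jars/spark-examples.jar",
        "sparkSubmitParameters": params,
    }
}

# Print the fragment as JSON so it can be reviewed or reused in a request.
print(json.dumps(job_driver, indent=2))
```

Keeping the settings in a dictionary makes it easier to change one value, such as the interval, without editing a long string by hand.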