
Enabling continuous logging for Amazon Glue jobs

You can enable continuous logging using the Amazon Glue console or through the Amazon Command Line Interface (Amazon CLI).

You can enable continuous logging with either a standard filter or no filter when you create a new job, edit an existing job, or pass job parameters through the Amazon CLI. The Standard filter prunes Apache Spark driver/executor and Apache Hadoop YARN heartbeat log messages that are rarely useful. No filter gives you all the log messages.

You can also specify custom configuration options: the Amazon CloudWatch log group name, the CloudWatch log stream prefix that precedes the Amazon Glue job run ID and driver/executor ID, and the log conversion pattern for log messages. These configurations help you aggregate logs in custom CloudWatch log groups with different expiration policies, and analyze them further with custom log stream prefixes and conversion patterns.

Using the Amazon Web Services Management Console

Follow these steps to use the console to enable continuous logging when creating or editing an Amazon Glue job.

To create a new Amazon Glue job with continuous logging

  1. Sign in to the Amazon Web Services Management Console and open the Amazon Glue console at https://console.amazonaws.cn/glue/.

  2. In the navigation pane, choose Jobs.

  3. Choose Add job.

  4. In Configure the job properties, expand the Monitoring options section.

  5. Select Continuous logging to use it for this job.

  6. Under Log filtering, choose Standard filter or No filter.

To enable continuous logging for an existing Amazon Glue job

  1. Open the Amazon Glue console at https://console.amazonaws.cn/glue/.

  2. In the navigation pane, choose Jobs.

  3. Choose an existing job from the Jobs list.

  4. Choose Action, Edit job.

  5. Expand the Monitoring options section.

  6. Select Continuous logging to use it for this job.

  7. Under Log filtering, choose Standard filter or No filter.

To enable continuous logging for all newly created Amazon Glue jobs

  1. Open the Amazon Glue console at https://console.amazonaws.cn/glue/.

  2. In the navigation pane, choose Jobs.

  3. In the upper-right corner, choose User preferences.

  4. Under the heading Monitoring options, choose Continuous logging.

  5. Under Log filtering, choose Standard filter or No filter.

These user preferences are applied to all new jobs unless you override them explicitly when creating an Amazon Glue job or editing an existing job, as described previously.

Using the Amazon CLI

To enable continuous logging, you pass job parameters to an Amazon Glue job. To use the standard filter, pass the following special job parameter in the same way as other Amazon Glue job parameters. For more information, see Job parameters used by Amazon Glue.

'--enable-continuous-cloudwatch-log': 'true'

When you want no filter, use the following.

'--enable-continuous-cloudwatch-log': 'true', '--enable-continuous-log-filter': 'false'

You can specify a custom Amazon CloudWatch log group name. If not specified, the default log group name is /aws-glue/jobs/logs-v2/.

'--continuous-log-logGroup': 'custom_log_group_name'

You can specify a custom Amazon CloudWatch log stream prefix. If not specified, the default log stream prefix is the job run ID.

'--continuous-log-logStreamPrefix': 'custom_log_stream_prefix'

You can specify a custom continuous logging conversion pattern. If not specified, the default conversion pattern is %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n. Note that the conversion pattern only applies to driver logs and executor logs. It does not affect the Amazon Glue progress bar.

'--continuous-log-conversionPattern': 'custom_log_conversion_pattern'
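The parameters above can also be assembled programmatically. The following is a minimal Python sketch, not an official helper: the function name continuous_logging_arguments and its keyword names are hypothetical, and only the parameter keys and values are taken from this page.

```python
def continuous_logging_arguments(standard_filter=True, log_group=None,
                                 log_stream_prefix=None, conversion_pattern=None):
    """Build a job-parameter dict for continuous logging (hypothetical helper)."""
    # This parameter switches continuous logging on; the standard filter
    # applies unless it is explicitly turned off below.
    args = {"--enable-continuous-cloudwatch-log": "true"}
    if not standard_filter:
        # "No filter": keep every log message.
        args["--enable-continuous-log-filter"] = "false"
    # Optional settings are added only when the caller supplies them,
    # so the documented defaults stay in effect otherwise.
    if log_group is not None:
        args["--continuous-log-logGroup"] = log_group
    if log_stream_prefix is not None:
        args["--continuous-log-logStreamPrefix"] = log_stream_prefix
    if conversion_pattern is not None:
        args["--continuous-log-conversionPattern"] = conversion_pattern
    return args


if __name__ == "__main__":
    print(continuous_logging_arguments(standard_filter=False,
                                       log_group="custom_log_group_name"))
```

A dictionary like this could then be supplied wherever job parameters are accepted, for example as the Arguments value of a start_job_run call in the SDK for Python, assuming that is how you start the job.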

Logging application-specific messages using the custom script logger

You can use the Amazon Glue logger to log any application-specific messages in the script. These messages are sent in real time to the driver log stream.

The following example shows a Python script.

from awsglue.context import GlueContext
from pyspark.context import SparkContext
sc = SparkContext()
glueContext = GlueContext(sc)
logger = glueContext.get_logger()
logger.info("info message")
logger.warn("warn message")
logger.error("error message")

The following example shows a Scala script.

import com.amazonaws.services.glue.log.GlueLogger
object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val logger = new GlueLogger
    logger.info("info message")
    logger.warn("warn message")
    logger.error("error message")
  }
}

Enabling the progress bar to show job progress

Amazon Glue provides a real-time progress bar under the JOB_RUN_ID-progress-bar log stream to check Amazon Glue job run status. Currently it supports only jobs that initialize glueContext. If you run a pure Spark job without initializing glueContext, the Amazon Glue progress bar does not appear.

The progress bar shows the following progress update every 5 seconds.

Stage Number (Stage Name): > (numCompletedTasks + numActiveTasks) / totalNumOfTasksInThisStage]
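One way to read these updates outside the console is to fetch the progress-bar log stream from CloudWatch Logs. The following Python sketch rests on several assumptions: the helper names are hypothetical, logs_client is assumed to behave like a CloudWatch Logs client from the SDK for Python (it only needs a get_log_events method), and the log group is assumed to be the job's continuous log group, which the caller passes in.

```python
def progress_bar_stream_name(job_run_id):
    """Log stream name for a job run, following the JOB_RUN_ID-progress-bar pattern."""
    return f"{job_run_id}-progress-bar"


def fetch_progress_lines(logs_client, log_group, job_run_id):
    """Return progress-bar messages, oldest first (hypothetical helper).

    Only a single page of events is read; a longer-running job may need
    pagination, which this sketch leaves out.
    """
    response = logs_client.get_log_events(
        logGroupName=log_group,
        logStreamName=progress_bar_stream_name(job_run_id),
        startFromHead=True,  # read from the beginning of the stream
    )
    return [event["message"] for event in response["events"]]


# Example wiring (not executed here; requires credentials and a real job run):
# import boto3
# lines = fetch_progress_lines(boto3.client("logs"), "<log-group-name>", "<job-run-id>")
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object before pointing it at a real account.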

Security configuration with continuous logging

If a security configuration is enabled for CloudWatch logs, Amazon Glue will create a log group named as follows for continuous logs:

<Log-Group-Name>-<Security-Configuration-Name>

The default and custom log groups will be as follows:

  • The default continuous log group will be /aws-glue/jobs/logs-v2-<Security-Configuration-Name>

  • The custom continuous log group will be <custom-log-group-name>-<Security-Configuration-Name>
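The naming rule above can be expressed as a small Python sketch. The function name secured_log_group is hypothetical, and the default base name is copied from the default-group bullet above; the sketch covers only the case where a security configuration is enabled.

```python
# Base name of the default continuous log group, as shown in the list above.
DEFAULT_LOG_GROUP = "/aws-glue/jobs/logs-v2"


def secured_log_group(security_configuration_name, custom_log_group=None):
    """Log group name used when a security configuration is enabled (sketch)."""
    # Use the custom group when one was given, otherwise the default base name.
    base = custom_log_group if custom_log_group is not None else DEFAULT_LOG_GROUP
    # Pattern: <Log-Group-Name>-<Security-Configuration-Name>
    return f"{base}-{security_configuration_name}"


if __name__ == "__main__":
    print(secured_log_group("my-security-config"))
    print(secured_log_group("my-security-config", "custom_log_group_name"))
```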

If you enable a security configuration with CloudWatch Logs, you must add the logs:AssociateKmsKey permission to your IAM role. If that permission is not included, continuous logging is disabled. To configure encryption for CloudWatch Logs, follow the instructions at Encrypt Log Data in CloudWatch Logs Using Amazon Key Management Service in the Amazon CloudWatch Logs User Guide.
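As an illustration of the permission requirement, the following Python sketch builds an IAM policy document that allows logs:AssociateKmsKey. The function name is hypothetical, and how the resource is scoped is an assumption: the caller supplies the log group ARN, and this page does not prescribe one.

```python
import json


def associate_kms_key_policy(log_group_arn):
    """Build a policy document granting logs:AssociateKmsKey (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["logs:AssociateKmsKey"],
                # Assumption: the permission is scoped to one caller-supplied ARN.
                "Resource": [log_group_arn],
            }
        ],
    }


if __name__ == "__main__":
    # "<log-group-arn>" is a placeholder, not a real ARN.
    print(json.dumps(associate_kms_key_policy("<log-group-arn>"), indent=2))
```

The resulting JSON would still need to be attached to the IAM role that the job uses.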

For more information on creating security configurations, see Working with security configurations on the Amazon Glue console.