Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Logging Amazon EMR API calls using Amazon CloudTrail

Amazon EMR is integrated with Amazon CloudTrail, a service that provides a record of actions taken by a user, role, or an Amazon Web Services service. CloudTrail captures all API calls for Amazon EMR as events. The calls captured include calls from the Amazon EMR console and code calls to the Amazon EMR API operations. Using the information collected by CloudTrail, you can determine the request that was made to Amazon EMR, the IP address from which the request was made, when it was made, and additional details.

Every event or log entry contains information about who generated the request. The identity information helps you determine the following:

  • Whether the request was made with root user or user credentials.

  • Whether the request was made on behalf of an IAM Identity Center user.

  • Whether the request was made with temporary security credentials for a role or federated user.

  • Whether the request was made by another Amazon Web Services service.
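As a minimal sketch of how these distinctions could be read from an event, the following Python function maps the userIdentity.type field of a record to the cases listed above. The function name describe_identity and the returned phrases are illustrative and not part of any SDK, and the mapping covers only commonly documented type values, so treat it as a starting point rather than a complete classifier.

```python
def describe_identity(record: dict) -> str:
    """Describe who made the request (illustrative helper, not an SDK call)."""
    identity = record.get("userIdentity", {})
    identity_type = identity.get("type", "Unknown")

    if identity_type == "Root":
        return "root user credentials"
    if identity_type == "IAMUser":
        return "IAM user credentials"
    if identity_type == "IdentityCenterUser":
        return "IAM Identity Center user"
    if identity_type in ("AssumedRole", "FederatedUser"):
        # Both types indicate temporary security credentials.
        return "temporary security credentials"
    if identity_type == "AWSService":
        return "another Amazon Web Services service"
    # Anything else is reported as-is rather than guessed at.
    return f"other identity type: {identity_type}"


sample = {"userIdentity": {"type": "IAMUser", "userName": "temporary-user-xx-7M"}}
print(describe_identity(sample))
```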

CloudTrail is active in your Amazon Web Services account when you create the account, and you automatically have access to the CloudTrail Event history. The CloudTrail Event history provides a viewable, searchable, downloadable, and immutable record of the past 90 days of recorded management events in an Amazon Web Services Region. For more information, see Working with CloudTrail Event history in the Amazon CloudTrail User Guide. There are no CloudTrail charges for viewing the Event history.
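The Event history can also be read programmatically. The following sketch assumes a client object that behaves like a boto3 CloudTrail client (for example, one created with boto3.client("cloudtrail")) and uses its lookup_events operation to page through Amazon EMR management events. The function name iter_emr_events and the days parameter are illustrative.

```python
import json
from datetime import datetime, timedelta, timezone


def iter_emr_events(client, days: int = 7):
    """Yield parsed Amazon EMR management events from the Event history.

    `client` is assumed to behave like a boto3 CloudTrail client.
    """
    end = datetime.now(timezone.utc)
    # The Event history covers at most the past 90 days.
    start = end - timedelta(days=min(days, 90))
    kwargs = {
        "LookupAttributes": [
            {
                "AttributeKey": "EventSource",
                "AttributeValue": "elasticmapreduce.amazonaws.com",
            }
        ],
        "StartTime": start,
        "EndTime": end,
    }
    while True:
        page = client.lookup_events(**kwargs)
        for event in page.get("Events", []):
            # CloudTrailEvent holds the full record as a JSON string.
            yield json.loads(event["CloudTrailEvent"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
```

Passing the client in as a parameter keeps the sketch independent of how credentials and the Region are configured in your environment.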

To keep an ongoing record of events in your Amazon Web Services account beyond the past 90 days, create a trail or a CloudTrail Lake event data store.

CloudTrail trails

A trail enables CloudTrail to deliver log files to an Amazon S3 bucket. All trails created using the Amazon Web Services Management Console are multi-Region. You can create a single-Region or a multi-Region trail by using the Amazon CLI. Creating a multi-Region trail is recommended because you capture activity in all Amazon Web Services Regions in your account. If you create a single-Region trail, you can view only the events logged in the trail's Amazon Web Services Region. For more information about trails, see Creating a trail for your Amazon Web Services account and Creating a trail for an organization in the Amazon CloudTrail User Guide.
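If you prefer to create the trail from code rather than the console or the Amazon CLI, the following sketch shows one way to do it. It assumes a client that behaves like a boto3 CloudTrail client and an existing Amazon S3 bucket whose policy already allows CloudTrail to write to it. The function name and the trail and bucket names you pass in are illustrative.

```python
def create_multi_region_trail(client, trail_name: str, bucket_name: str) -> dict:
    """Create a multi-Region trail and start logging (sketch).

    `client` is assumed to behave like a boto3 CloudTrail client.
    """
    response = client.create_trail(
        Name=trail_name,
        S3BucketName=bucket_name,  # assumed to exist with a suitable bucket policy
        IsMultiRegionTrail=True,   # capture activity in all Regions
    )
    # A trail created through the API does not deliver events until
    # logging is started explicitly.
    client.start_logging(Name=trail_name)
    return response
```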

By creating a trail, you can deliver one copy of your ongoing management events to your Amazon S3 bucket at no charge from CloudTrail; however, Amazon S3 storage charges apply. For more information about CloudTrail pricing, see Amazon CloudTrail Pricing. For information about Amazon S3 pricing, see Amazon S3 Pricing.

CloudTrail Lake event data stores

CloudTrail Lake lets you run SQL-based queries on your events. CloudTrail Lake converts existing events in row-based JSON format to Apache ORC format. ORC is a columnar storage format that is optimized for fast retrieval of data. Events are aggregated into event data stores, which are immutable collections of events based on criteria that you select by applying advanced event selectors. The selectors that you apply to an event data store control which events persist and are available for you to query. For more information about CloudTrail Lake, see Working with Amazon CloudTrail Lake in the Amazon CloudTrail User Guide.
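To illustrate what a query against an event data store might look like for Amazon EMR, the following sketch builds a SQL statement that selects recent calls to one API operation. The helper name build_emr_query is hypothetical, and the selected columns are only an example. CloudTrail Lake uses the event data store ID as the table name in the FROM clause. The resulting string could then be submitted with the StartQuery operation.

```python
import re

EMR_EVENT_SOURCE = "elasticmapreduce.amazonaws.com"


def build_emr_query(event_data_store_id: str, event_name: str = "RunJobFlow") -> str:
    """Build a CloudTrail Lake SQL statement for one Amazon EMR API operation."""
    # Reject unexpected characters, because the values are placed into SQL text.
    if not re.fullmatch(r"[A-Za-z0-9-]+", event_data_store_id):
        raise ValueError("unexpected event data store ID")
    if not re.fullmatch(r"[A-Za-z0-9]+", event_name):
        raise ValueError("unexpected event name")
    return (
        "SELECT eventTime, eventName, userIdentity.arn, sourceIPAddress "
        f"FROM {event_data_store_id} "
        f"WHERE eventSource = '{EMR_EVENT_SOURCE}' "
        f"AND eventName = '{event_name}' "
        "ORDER BY eventTime DESC"
    )
```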

CloudTrail Lake event data stores and queries incur costs. When you create an event data store, you choose the pricing option you want to use for the event data store. The pricing option determines the cost for ingesting and storing events, and the default and maximum retention period for the event data store. For more information about CloudTrail pricing, see Amazon CloudTrail Pricing.

Amazon EMR data events in CloudTrail

Data events provide information about the resource operations performed on or in a resource (for example, reading or writing to an Amazon S3 object). These are also known as data plane operations. Data events are often high-volume activities. By default, CloudTrail doesn't log data events. The CloudTrail Event history doesn't record data events.

Additional charges apply for data events. For more information about CloudTrail pricing, see Amazon CloudTrail Pricing.

You can log data events for the Amazon EMR resource types by using the CloudTrail console, Amazon CLI, or CloudTrail API operations. For more information about how to log data events, see Logging data events with the Amazon Web Services Management Console and Logging data events with the Amazon Command Line Interface in the Amazon CloudTrail User Guide.

The following table lists the Amazon EMR resource types for which you can log data events. The Data event type (console) column shows the value to choose from the Data event type list on the CloudTrail console. The resources.type value column shows the resources.type value, which you would specify when configuring advanced event selectors using the Amazon CLI or CloudTrail APIs. The Data APIs logged to CloudTrail column shows the API calls logged to CloudTrail for the resource type.

For more information about these API operations, see Amazon EMR WAL (EMRWAL) CLI reference. Some of the Data API operations that Amazon EMR logs to CloudTrail are HBase system operations that you never call directly. These operations aren't in the EMRWAL CLI reference.

Data event type (console) | resources.type value | Data APIs logged to CloudTrail
Amazon EMR write-ahead log workspace | AWS::EMRWAL::Workspace | GetCurrentWALTime, ListTagsForResource, ListWALs, ListWorkspaces, TrimWAL, CompleteWALFlush

You can configure advanced event selectors to filter on the eventName, readOnly, and resources.ARN fields to log only those events that are important to you. For more information about these fields, see AdvancedFieldSelector in the Amazon CloudTrail API Reference.
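As a sketch of how such a filter could be assembled, the following Python function builds one advanced event selector for the AWS::EMRWAL::Workspace resource type and optionally narrows it by eventName, readOnly, and resources.ARN. The function name, its parameters, and the selector's Name value are illustrative. The returned dictionary follows the AdvancedEventSelector shape and could be passed in the AdvancedEventSelectors list of the PutEventSelectors operation for a trail.

```python
def emrwal_selector(event_names=None, read_only=None, arn_prefix=None) -> dict:
    """Build an advanced event selector for EMR WAL workspace data events."""
    field_selectors = [
        {"Field": "eventCategory", "Equals": ["Data"]},
        {"Field": "resources.type", "Equals": ["AWS::EMRWAL::Workspace"]},
    ]
    if event_names:
        # Log only the named API operations, for example TrimWAL.
        field_selectors.append({"Field": "eventName", "Equals": list(event_names)})
    if read_only is not None:
        # readOnly is matched against the strings "true" or "false".
        value = "true" if read_only else "false"
        field_selectors.append({"Field": "readOnly", "Equals": [value]})
    if arn_prefix:
        field_selectors.append({"Field": "resources.ARN", "StartsWith": [arn_prefix]})
    return {
        "Name": "Log EMR WAL workspace data events",
        "FieldSelectors": field_selectors,
    }


selector = emrwal_selector(event_names=["TrimWAL", "CompleteWALFlush"])
```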

Amazon EMR management events in CloudTrail

Management events provide information about management operations that are performed on resources in your Amazon Web Services account. These are also known as control plane operations. By default, CloudTrail logs management events.

Amazon EMR logs all Amazon EMR control plane operations as management events. For a list of the Amazon EMR control plane operations that Amazon EMR logs to CloudTrail, see the Amazon EMR API Reference.

Amazon EMR event examples

An event represents a single request from any source and includes information about the requested API operation, the date and time of the operation, request parameters, and so on. CloudTrail log files aren't an ordered stack trace of the public API calls, so events don't appear in any specific order.

The following example shows a CloudTrail log entry that demonstrates the RunJobFlow action.

{
  "Records": [
    {
      "eventVersion": "1.01",
      "userIdentity": {
        "type": "IAMUser",
        "principalId": "EX_PRINCIPAL_ID",
        "arn": "arn:aws:iam::123456789012:user/temporary-user-xx-7M",
        "accountId": "123456789012",
        "userName": "temporary-user-xx-7M"
      },
      "eventTime": "2018-03-31T17:59:21Z",
      "eventSource": "elasticmapreduce.amazonaws.com",
      "eventName": "RunJobFlow",
      "awsRegion": "us-west-2",
      "sourceIPAddress": "192.0.2.1",
      "userAgent": "aws-sdk-java/unknown-version Linux/xx Java_HotSpot(TM)_64-Bit_Server_VM/xx",
      "requestParameters": {
        "tags": [
          { "value": "prod", "key": "domain" },
          { "value": "us-west-2", "key": "realm" },
          { "value": "VERIFICATION", "key": "executionType" }
        ],
        "instances": {
          "slaveInstanceType": "m5.xlarge",
          "ec2KeyName": "emr-integtest",
          "instanceCount": 1,
          "masterInstanceType": "m5.xlarge",
          "keepJobFlowAliveWhenNoSteps": true,
          "terminationProtected": false
        },
        "visibleToAllUsers": false,
        "name": "MyCluster",
        "ReleaseLabel": "emr-5.16.0"
      },
      "responseElements": {
        "jobFlowId": "j-2WDJCGEG4E6AJ"
      },
      "requestID": "2f482daf-b8fe-11e3-89e7-75a3d0e071c5",
      "eventID": "b348a38d-f744-4097-8b2a-e68c9b424698"
    },
    ...additional entries
  ]
}
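To show how an entry like the one above might be processed, the following sketch extracts a few fields from every RunJobFlow record in a log file's text. The function name summarize_run_job_flow and the chosen summary keys are illustrative. The field names are taken from the example entry, and the sketch assumes the input is valid JSON (the "...additional entries" placeholder in the example would have to be removed first).

```python
import json


def summarize_run_job_flow(log_text: str) -> list:
    """Return one summary dict per RunJobFlow record in a CloudTrail log file."""
    records = json.loads(log_text).get("Records", [])
    summaries = []
    for record in records:
        if record.get("eventSource") != "elasticmapreduce.amazonaws.com":
            continue
        if record.get("eventName") != "RunJobFlow":
            continue
        # Either element can be null in a record, for example on a failed call.
        params = record.get("requestParameters") or {}
        response = record.get("responseElements") or {}
        summaries.append(
            {
                "time": record.get("eventTime"),
                "caller": record.get("userIdentity", {}).get("arn"),
                "region": record.get("awsRegion"),
                "cluster_name": params.get("name"),
                "cluster_id": response.get("jobFlowId"),
            }
        )
    return summaries
```

Records don't appear in any specific order, so sort the summaries by the time key if you need a chronological view.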

For information about CloudTrail record contents, see CloudTrail record contents in the Amazon CloudTrail User Guide.