Example CloudTrail log queries - Amazon Athena
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Example CloudTrail log queries

The following example shows a portion of a query that returns all anonymous (unsigned) requests from the table created for CloudTrail event logs. This query selects those requests where useridentity.accountid is anonymous, and useridentity.arn is not specified:

SELECT * FROM cloudtrail_logs WHERE eventsource = 's3.amazonaws.com' AND eventname in ('GetObject') AND useridentity.accountid = 'anonymous' AND useridentity.arn IS NULL AND requestparameters LIKE '%[your bucket name ]%';

For more information, see the Amazon Big Data blog post Analyze security, compliance, and operational activity using Amazon CloudTrail and Amazon Athena.

Query nested fields in CloudTrail logs

Because the userIdentity and resources fields are nested data types, querying them requires special treatment.

The userIdentity object consists of nested STRUCT types. These can be queried using a dot to separate the fields, as in the following example:

SELECT eventsource, eventname, useridentity.sessioncontext.attributes.creationdate, useridentity.sessioncontext.sessionissuer.arn FROM cloudtrail_logs WHERE useridentity.sessioncontext.sessionissuer.arn IS NOT NULL ORDER BY eventsource, eventname LIMIT 10

The resources field is an array of STRUCT objects. For these arrays, use CROSS JOIN UNNEST to unnest the array so that you can query its objects.

The following example returns all rows where the resource ARN ends in example/datafile.txt. For readability, the replace function removes the initial arn:aws:s3::: substring from the ARN.

SELECT awsregion, replace(unnested.resources_entry.ARN,'arn:aws:s3:::') as s3_resource, eventname, eventtime, useragent FROM cloudtrail_logs t CROSS JOIN UNNEST(t.resources) unnested (resources_entry) WHERE unnested.resources_entry.ARN LIKE '%example/datafile.txt' ORDER BY eventtime

The following example queries for DeleteBucket events. The query extracts the name of the bucket and the account ID to which the bucket belongs from the resources object.

SELECT awsregion, replace(unnested.resources_entry.ARN,'arn:aws:s3:::') as deleted_bucket, eventtime AS time_deleted, useridentity.username, unnested.resources_entry.accountid as bucket_acct_id FROM cloudtrail_logs t CROSS JOIN UNNEST(t.resources) unnested (resources_entry) WHERE eventname = 'DeleteBucket' ORDER BY eventtime

For more information about unnesting, see Filter arrays.

Tips for querying CloudTrail logs

Consider the following when exploring CloudTrail log data:

  • Before querying the logs, verify that your logs table looks the same as the one in Create a table for CloudTrail logs in Athena using manual partitioning. If it is not the first table, delete the existing table using the following command: DROP TABLE cloudtrail_logs.

  • After you drop the existing table, re-create it. For more information, see Create a table for CloudTrail logs in Athena using manual partitioning.

    Verify that fields in your Athena query are listed correctly. For information about the full list of fields in a CloudTrail record, see CloudTrail record contents.

    If your query includes fields in JSON formats, such as STRUCT, extract data from JSON. For more information, see Extract JSON data from strings.

    Some suggestions for issuing queries against your CloudTrail table:

  • Start by looking at which users called which API operations and from which source IP addresses.

  • Use the following basic SQL query as your template. Paste the query to the Athena console and run it.

    SELECT useridentity.arn, eventname, sourceipaddress, eventtime FROM cloudtrail_logs LIMIT 100;
  • Modify the query to further explore your data.

  • To improve performance, include the LIMIT clause to return a specified subset of rows.