Create a table for CloudTrail logs in Athena using manual partitioning
You can manually create tables for CloudTrail log files in the Athena console, and then run queries in Athena.
To create an Athena table for a CloudTrail trail using the Athena console
-
Copy and paste the following DDL statement into the Athena console query editor.
CREATE EXTERNAL TABLE cloudtrail_logs ( eventversion STRING, useridentity STRUCT< type:STRING, principalid:STRING, arn:STRING, accountid:STRING, invokedby:STRING, accesskeyid:STRING, userName:STRING, onBehalfOf: STRUCT< userId: STRING, identityStoreArn: STRING>, sessioncontext:STRUCT< attributes:STRUCT< mfaauthenticated:STRING, creationdate:STRING>, sessionissuer:STRUCT< type:STRING, principalId:STRING, arn:STRING, accountId:STRING, userName:STRING>, ec2RoleDelivery:string, webIdFederationData: STRUCT< federatedProvider: STRING, attributes: map<string,string> > > >, eventtime STRING, eventsource STRING, eventname STRING, awsregion STRING, sourceipaddress STRING, useragent STRING, errorcode STRING, errormessage STRING, requestparameters STRING, responseelements STRING, additionaleventdata STRING, requestid STRING, eventid STRING, resources ARRAY<STRUCT< arn:STRING, accountid:STRING, type:STRING>>, eventtype STRING, apiversion STRING, readonly STRING, recipientaccountid STRING, serviceeventdetails STRING, sharedeventid STRING, vpcendpointid STRING, eventCategory STRING, tlsDetails struct< tlsVersion:string, cipherSuite:string, clientProvidedHostHeader:string> ) PARTITIONED BY (region string, year string, month string, day string) ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe' STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/
Account_ID
/CloudTrail/';Note
We suggest using the
org.apache.hive.hcatalog.data.JsonSerDe
shown in the example. Although acom.amazon.emr.hive.serde.CloudTrailSerde
exists, it does not currently handle some of the newer CloudTrail fields. -
(Optional) Remove any fields not required for your table. If you need to read only a certain set of columns, your table definition can exclude the other columns.
-
Modify
s3://amzn-s3-demo-bucket/AWSLogs/
to point to the Amazon S3 bucket that contains your log data.Account_ID/
CloudTrail/ -
Verify that fields are listed correctly. For more information about the full list of fields in a CloudTrail record, see CloudTrail record contents.
The example
CREATE TABLE
statement in Step 1 uses the Hive JSON SerDe. In the example, the fieldsrequestparameters
,responseelements
, andadditionaleventdata
are listed as typeSTRING
in the query, but areSTRUCT
data type used in JSON. Therefore, to get data out of these fields, useJSON_EXTRACT
functions. For more information, see Extract JSON data from strings. For performance improvements, the example partitions the data by Amazon Web Services Region, year, month, and day. -
Run the
CREATE TABLE
statement in the Athena console. -
Use the ALTER TABLE ADD PARTITION command to load the partitions so that you can query them, as in the following example.
ALTER TABLE
table_name
ADD PARTITION (region='us-east-1', year='2019', month='02', day='01') LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/Account_ID
/CloudTrail/us-east-1/2019/02/01/
'