Working with Amazon CloudTrail Lake - Amazon CloudTrail
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Working with Amazon CloudTrail Lake

Amazon CloudTrail Lake lets you run SQL-based queries on your events. CloudTrail Lake converts existing events in row-based JSON format to Apache ORC format. ORC is a columnar storage format that is optimized for fast retrieval of data. Events are aggregated into event data stores, which are immutable collections of events based on criteria that you select by applying advanced event selectors. You can keep the event data in an event data store for up to 3,653 days (about 10 years) if you choose the One-year extendable retention pricing option, or up to 2,557 days (about 7 years) if you choose the Seven-year retention pricing option. The selectors that you apply to an event data store control which events persist and are available for you to query. CloudTrail Lake is an auditing solution that can complement your compliance stack, and assist you with near real-time troubleshooting.

CloudTrail Lake event data stores

When you create an event data store, you choose the type of events to include in your event data store. You can create an event data store to include CloudTrail events, CloudTrail Insights events, Amazon Config configuration items, Amazon Audit Manager evidence, or events from outside of Amazon. Each event data store can only contain a specific event category (for example, Amazon Config configuration items), because the event schema is unique to the event category. You can store events from an organization in Amazon Organizations in an organization event data store, including events from multiple Regions and accounts. You can also run SQL queries across multiple event data stores using the supported SQL JOIN keywords. For information about running queries across multiple event data stores, see Advanced, multi-table query support.

You can copy trail events to a new or existing event data store to create a point-in-time snapshot of events logged to the trail. For more information, see Copy trail events to an event data store.

You can federate an event data store to see the metadata associated with the event data store in the Amazon Glue Data Catalog and run SQL queries on the event data using Amazon Athena. The table metadata stored in the Amazon Glue Data Catalog lets the Athena query engine know how to find, read, and process the data that you want to query. For more information, see Federate an event data store.

By default, all events in an event data store are encrypted by CloudTrail. When you configure an event data store, you can choose to use your own Amazon Key Management Service key. Using your own KMS key incurs Amazon KMS costs for encryption and decryption. After you associate an event data store with a KMS key, the KMS key cannot be removed or changed.

You can control access to actions on event data stores by using authorization based on tags. For more information and examples, see Examples: Denying access to create or delete event data stores based on tags in this guide.

You can use CloudTrail Lake dashboards to visualize the data in your event data stores. Each dashboard consists of multiple widgets and each widget represents a SQL query. For more information about Lake dashboards, see View CloudTrail Lake dashboards.

CloudTrail Lake event data stores incur charges. When you create an event data store, you choose the pricing option you want to use for the event data store. The pricing option determines the cost for ingesting and storing events, and the default and maximum retention period for the event data store. For information about CloudTrail pricing and managing Lake costs, see Amazon CloudTrail Pricing and Managing CloudTrail Lake costs.

CloudTrail Lake supports Amazon CloudWatch metrics, which provide information about data ingested and storage bytes. For more information about supported CloudWatch metrics, see Supported CloudWatch metrics.

Note

CloudTrail typically delivers events within an average of about 5 minutes of an API call. This time is not guaranteed.

CloudTrail Lake integrations

You can use CloudTrail Lake integrations to log and store user activity data from outside of Amazon; from any source in your hybrid environments, such as in-house or SaaS applications hosted on-premises or in the cloud, virtual machines, or containers. After you create event data stores in CloudTrail Lake and create a channel to log activity events, you call the PutAuditEvents API to ingest your application activity into CloudTrail. You can then use CloudTrail Lake to search, query, and analyze the data that is logged from your applications.

Integrations can also log events to your event data stores from over a dozen CloudTrail partners. In a partner integration, you create destination event data stores, a channel, and a resource policy. After you create the integration, you provide the channel ARN to the partner. There are two types of integrations: direct and solution. With direct integrations, the partner calls the PutAuditEvents API to deliver events to the event data store for your Amazon account. With solution integrations, the application runs in your Amazon account and the application calls the PutAuditEvents API to deliver events to the event data store for your Amazon account.

For more information about integrations, see Create an integration with an event source outside of Amazon.

CloudTrail Lake queries

CloudTrail Lake queries offer a deeper and more customizable view of events than simple key and value lookups in Event history, or running LookupEvents. An Event history search is limited to a single Amazon Web Services account, only returns events from a single Amazon Web Services Region, and cannot query multiple attributes. In contrast, CloudTrail Lake users can run complex SQL queries across multiple event fields. CloudTrail Lake supports all valid Presto SELECT statements and functions. For more information about the supported SQL functions and operators, see Functions and Operators on the Presto documentation website.

You can save CloudTrail Lake queries for future use, and view results of queries for up to seven days. When you run queries, you can save the query results to an Amazon S3 bucket.

The CloudTrail console provides a number of sample queries that can help you get started writing your own queries. For more information, see View sample queries in the CloudTrail console.

CloudTrail Lake queries incur charges. When you run queries in Lake, you pay based upon the amount of data scanned. For information about CloudTrail pricing and managing Lake costs, see Amazon CloudTrail Pricing and Managing CloudTrail Lake costs.

Additional resources

The following resources can help you get a better understanding of what CloudTrail Lake is and how you can use it.