Querying Amazon CloudFront logs - Amazon Athena
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Querying Amazon CloudFront logs

You can configure Amazon CloudFront CDN to export Web distribution access logs to Amazon Simple Storage Service. Use these logs to explore users' surfing patterns across your web properties served by CloudFront.

Before you begin querying the logs, enable Web distributions access log on your preferred CloudFront distribution. For information, see Access logs in the Amazon CloudFront Developer Guide. Make a note of the Amazon S3 bucket in which you save these logs.

Creating a table for CloudFront standard logs

Note

This procedure works for the Web distribution access logs in CloudFront. It does not apply to streaming logs from RTMP distributions.

To create a table for CloudFront standard log file fields

  1. Copy and paste the following example DDL statement into the Query Editor in the Athena console. The example statement uses the log file fields documented in the Standard log file fields section of the Amazon CloudFront Developer Guide. Modify the LOCATION for the Amazon S3 bucket that stores your logs. For information about using the Query Editor, see Getting started.

    This query specifies ROW FORMAT DELIMITED and FIELDS TERMINATED BY '\t' to indicate that the fields are delimited by tab characters. For ROW FORMAT DELIMITED, Athena uses the LazySimpleSerDe by default. The column date is escaped using backticks (`) because it is a reserved word in Athena. For information, see Reserved keywords.

    CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_standard_logs ( `date` DATE, time STRING, x_edge_location STRING, sc_bytes BIGINT, c_ip STRING, cs_method STRING, cs_host STRING, cs_uri_stem STRING, sc_status INT, cs_referrer STRING, cs_user_agent STRING, cs_uri_query STRING, cs_cookie STRING, x_edge_result_type STRING, x_edge_request_id STRING, x_host_header STRING, cs_protocol STRING, cs_bytes BIGINT, time_taken FLOAT, x_forwarded_for STRING, ssl_protocol STRING, ssl_cipher STRING, x_edge_response_result_type STRING, cs_protocol_version STRING, fle_status STRING, fle_encrypted_fields INT, c_port INT, time_to_first_byte FLOAT, x_edge_detailed_result_type STRING, sc_content_type STRING, sc_content_len BIGINT, sc_range_start BIGINT, sc_range_end BIGINT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3://DOC-EXAMPLE-BUCKET/' TBLPROPERTIES ( 'skip.header.line.count'='2' )
  2. Run the query in Athena console. After the query completes, Athena registers the cloudfront_standard_logs table, making the data in it ready for you to issue queries.

Creating a table for CloudFront real-time logs

To create a table for CloudFront real-time log file fields

  1. Copy and paste the following example DDL statement into the Query Editor in the Athena console. The example statement uses the log file fields documented in the Real-time logs section of the Amazon CloudFront Developer Guide. Modify the LOCATION for the Amazon S3 bucket that stores your logs. For information about using the Query Editor, see Getting started.

    This query specifies ROW FORMAT DELIMITED and FIELDS TERMINATED BY '\t' to indicate that the fields are delimited by tab characters. For ROW FORMAT DELIMITED, Athena uses the LazySimpleSerDe by default. The column timestamp is escaped using backticks (`) because it is a reserved word in Athena. For information, see Reserved keywords.

    The follow example contains all of the available fields. You can comment out or remove fields that you do not require.

    CREATE EXTERNAL TABLE IF NOT EXISTS cloudfront_real_time_logs ( `timestamp` STRING, c_ip STRING, time_to_first_byte BIGINT, sc_status BIGINT, sc_bytes BIGINT, cs_method STRING, cs_protocol STRING, cs_host STRING, cs_uri_stem STRING, cs_bytes BIGINT, x_edge_location STRING, x_edge_request_id STRING, x_host_header STRING, time_taken BIGINT, cs_protocol_version STRING, c_ip_version STRING, cs_user_agent STRING, cs_referer STRING, cs_cookie STRING, cs_uri_query STRING, x_edge_response_result_type STRING, x_forwarded_for STRING, ssl_protocol STRING, ssl_cipher STRING, x_edge_result_type STRING, fle_encrypted_fields STRING, fle_status STRING, sc_content_type STRING, sc_content_len BIGINT, sc_range_start STRING, sc_range_end STRING, c_port BIGINT, x_edge_detailed_result_type STRING, c_country STRING, cs_accept_encoding STRING, cs_accept STRING, cache_behavior_path_pattern STRING, cs_headers STRING, cs_header_names STRING, cs_headers_count BIGINT, primary_distribution_id STRING, primary_distribution_dns_name STRING, origin_fbl STRING, origin_lbl STRING, asn STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3://DOC-EXAMPLE-BUCKET/' TBLPROPERTIES ( 'skip.header.line.count'='2' )
  2. Run the query in Athena console. After the query completes, Athena registers the cloudfront_real_time_logs table, making the data in it ready for you to issue queries.

Example queries for standard CloudFront logs

The following query adds up the number of bytes served by CloudFront between June 9 and June 11, 2018. Surround the date column name with double quotes because it is a reserved word.

SELECT SUM(bytes) AS total_bytes FROM cloudfront_standard_logs WHERE "date" BETWEEN DATE '2018-06-09' AND DATE '2018-06-11' LIMIT 100;

To eliminate duplicate rows (for example, duplicate empty rows) from the query results, you can use the SELECT DISTINCT statement, as in the following example.

SELECT DISTINCT * FROM cloudfront_standard_logs LIMIT 10;

Additional resources

For more information about using Athena to query CloudFront logs, see the following posts from the Amazon big data blog.

Easily query Amazon Web Service logs using Amazon Athena (May 29, 2019).

Analyze your Amazon CloudFront access logs at scale (December 21, 2018).

Build a serverless architecture to analyze Amazon CloudFront access logs using Amazon Lambda, Amazon Athena, and Amazon Managed Service for Apache Flink (May 26, 2017).