Migrating from CUR to Data Exports CUR 2.0
Amazon Data Exports allows you to create exports of Cost and Usage Report 2.0 (CUR 2.0). The CUR 2.0 table provides the same information as Cost and Usage Reports (CUR) along with some improvements. Data Exports enables you to create a CUR 2.0 export that is backwards compatible with the data pipelines you’ve been using to process CUR.
CUR 2.0 provides the following improvements over CUR:
- Consistent schema: CUR 2.0 contains a fixed set of columns, whereas the columns included in CUR can vary monthly depending on your usage of Amazon services, cost categories, and resource tags.

- Nested data: CUR 2.0 reduces data sparsity by collapsing certain CUR columns into individual columns with key-value pairs of the collapsed columns. Optionally, you can query the nested keys in Data Exports as separate columns to match the original CUR schema and data.

- Additional columns: CUR 2.0 contains two additional columns: `bill_payer_account_name` and `line_item_usage_account_name`.
The following table outlines the differences between CUR 2.0 and legacy CUR in more detail:
| | CUR 2.0 | Legacy CUR |
|---|---|---|
| Data schema | Fixed schema. For the complete column list, see Cost and Usage Report (CUR) 2.0. | Dynamic schema based on Amazon usage and activity. For the partial column list, see Data dictionary. |
| Exclusive columns | `bill_payer_account_name` and `line_item_usage_account_name` | None |
| Export customization | Enables basic SQL for column selection, row filtering, and column aliasing (renaming). For details about the supported SQL syntax, see Data query. | Not supported. You must manually set up Athena/QuickSight to create the view you require. |
| Nested columns with key-value pairs | Four nested columns: `product`, `resource_tags`, `cost_category`, and `discounts` | No nested columns. The four nested columns in CUR 2.0 are split into separate columns in legacy CUR (for example, `product_from_location`). |
| File delivery destination | S3 bucket | S3 bucket |
| File output formats | GZIP, Parquet | ZIP, GZIP, Parquet |
| Integration with other Amazon services | Amazon QuickSight | Amazon Athena, Amazon Redshift, Amazon QuickSight |
| Amazon CloudFormation support | Yes. For details, see Amazon Data Exports resource type reference in the Amazon CloudFormation User Guide. | Yes. For details, see Amazon Cost and Usage Report resource type reference in the Amazon CloudFormation User Guide. |
| Tag and cost category data | Tag and cost category names are normalized to remove special characters and spaces. If tags or cost categories conflict after normalization, only one value is kept. For more information, see Column names. | Behavior differs between the legacy CUR Parquet and CSV file formats. Parquet: tag and cost category names are normalized to remove special characters and spaces; if tags or cost categories conflict after normalization, only one value is kept (see Column names). CSV: tag and cost category names are not changed. |
For more detailed information about the schema of CUR 2.0, see the Data Exports table dictionary.
You can migrate to CUR 2.0 in Data Exports in two ways:

- Method one: Create an export with an SQL query using the CUR schema

- Method two: Create an export of CUR 2.0 with its new schema
Method one: Create an export with an SQL query using the CUR schema
You can create an export with an SQL query so that the export schema matches what you receive today in CUR. You do this using the Amazon API or SDK.
1. Determine (a) the list of columns and (b) the CUR content settings (Include resource IDs, Split cost allocation data, and Time granularity) needed in order to match your CUR today.

   - You can determine the list of columns either by viewing the schema of one of your CUR files or by extracting the column list from the manifest file.

   - You can determine the CUR content settings by going to Data Exports in the console and choosing your CUR export to view its details.
2. Write an SQL query that selects the columns you identified from the CUR 2.0 table named `COST_AND_USAGE_REPORT`.

   - All column names in the CUR 2.0 table are in snake case (for example, `line_item_usage_amount`). For your SQL statement, you might need to convert the previous column names to snake case.

   - For your SQL statement, you need to convert all `resource_tags` and `cost_category` columns, and certain `product` and `discount` columns, to use the dot operator in order to select the nested columns in CUR 2.0. For example, to select the `product_from_location` column in CUR 2.0, write an SQL statement selecting `product.from_location`.

     Example: `SELECT product.from_location FROM COST_AND_USAGE_REPORT`

     This selects the `from_location` key of the `product` map column.

   - By default, a column selected with the dot operator is named after its attribute (for example, `from_location`). To match your existing CUR, declare an alias for the column so that it keeps the same name as before.

     Example: `SELECT product.from_location AS product_from_location FROM COST_AND_USAGE_REPORT`

   For more details on nested columns, see the Data Exports table dictionary.
3. Write the CUR content settings, identified in step 1, into the table configuration format for the `CreateExport` API. You need to provide these table configurations with your data query in the next step.
4. In the Amazon SDK/CLI for Data Exports, use the `CreateExport` API to input your SQL query and table configurations into the data-query field. (A sketch of steps 2 through 4 in code follows this procedure.)

   - Specify delivery preferences, such as the target Amazon S3 bucket and the overwrite preference. We recommend choosing the same delivery preferences you had before. For more information on the required fields, see Amazon Data Exports in the Amazon Billing and Cost Management API Reference.

   - Update the permissions of the target Amazon S3 bucket to allow Data Exports to write to the bucket. For more information, see Setting up an Amazon S3 bucket for data exports.

5. Direct your data ingestion pipeline to read data from the directory in the Amazon S3 bucket where your CUR 2.0 is being delivered.
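The following is a minimal sketch of steps 2 through 4 using boto3, the Amazon SDK for Python. The export name, column list, bucket, prefix, Region, and content-setting values are placeholders to replace with what you identified in step 1; see the `CreateExport` reference for the authoritative request fields.

```python
# A minimal sketch of steps 2-4 with boto3. All names, Regions, and setting
# values below are placeholders/assumptions; replace them with the columns
# and CUR content settings you identified in step 1.
import boto3

client = boto3.client("bcm-data-exports", region_name="us-east-1")

# Step 2: an SQL query that reproduces legacy CUR column names, using the
# dot operator plus an alias for each nested column you need.
query = (
    "SELECT line_item_usage_account_id, "
    "line_item_usage_amount, "
    "product.from_location AS product_from_location "
    "FROM COST_AND_USAGE_REPORT"
)

# Step 3: CUR content settings expressed in table-configuration form.
table_configurations = {
    "COST_AND_USAGE_REPORT": {
        "TIME_GRANULARITY": "HOURLY",
        "INCLUDE_RESOURCES": "TRUE",
        "INCLUDE_SPLIT_COST_ALLOCATION_DATA": "FALSE",
    }
}

# Step 4: create the export, including delivery preferences.
response = client.create_export(
    Export={
        "Name": "cur2-backwards-compatible",
        "DataQuery": {
            "QueryStatement": query,
            "TableConfigurations": table_configurations,
        },
        "DestinationConfigurations": {
            "S3Destination": {
                "S3Bucket": "amzn-s3-demo-bucket",
                "S3Prefix": "cur2",
                "S3Region": "us-east-1",
                "S3OutputConfigurations": {
                    "OutputType": "CUSTOM",
                    "Format": "PARQUET",
                    "Compression": "PARQUET",
                    "Overwrite": "OVERWRITE_REPORT",
                },
            }
        },
        "RefreshCadence": {"Frequency": "SYNCHRONOUS"},
    }
)
print(response["ExportArn"])
```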
Method two: Create an export of CUR 2.0 with its new schema
You can create an export of CUR 2.0 with its new schema of nested columns and additional columns. However, you'll need to adjust your current data pipeline to process these new columns. You do this using the console or the Amazon API/SDK.
1. Determine the CUR content settings (Include resource IDs, Split cost allocation data, and Time granularity) needed in order to match your CUR today.

   - You can determine the CUR content settings by going to Data Exports in the console and choosing your CUR export to view its details.

2. Using either the Data Exports console page (Option A) or the Amazon SDK/CLI (Option B), create an export of CUR 2.0 that selects all columns from the "Cost and usage report" table.
   - (Option A) To create the export in the console:

     1. In the navigation pane, choose Data Exports.
     2. On the Data Exports page, choose Create.
     3. Choose Standard data export. For the Cost and Usage Report (CUR 2.0) table, all columns are selected by default.
     4. Specify the CUR content settings that you identified in step 1.
     5. Under Data table delivery options, choose your options.
     6. Choose Create.
   - (Option B) To create the export using the Amazon API/SDK, first write a query that selects all columns in the `COST_AND_USAGE_REPORT` table. (A sketch of these calls follows this procedure.)

     1. Use the `GetTable` API to determine the complete list of columns and retrieve the full schema.
     2. Write the CUR content settings, identified in step 1, into the table configuration format for the `CreateExport` API.
     3. Use the `CreateExport` API to input your SQL query and table configurations into the `data-query` field.
     4. Specify delivery preferences, such as the target Amazon S3 bucket and the overwrite preference. We recommend choosing the same delivery preferences you had before. For more information on the required fields, see Amazon Data Exports in the Amazon Billing and Cost Management API Reference.
     5. Update the permissions of the target Amazon S3 bucket to allow Data Exports to write to the bucket. For more information, see Setting up an Amazon S3 bucket for data exports.
3. Direct your data ingestion pipeline to read data from the directory in the Amazon S3 bucket where your CUR 2.0 is being delivered.
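As a companion to Option B, here is a minimal sketch using boto3. The bucket, prefix, Region, and content-setting values are placeholders, and the `GetTable` response is assumed to expose the schema as a list of entries with `Name` and `Type` keys; check the API reference for the exact shapes.

```python
# A minimal sketch of Option B with boto3. Bucket, prefix, Region, and the
# table-configuration values are placeholders; the GetTable response is
# assumed to expose the schema as entries with "Name" and "Type" keys.
import boto3

client = boto3.client("bcm-data-exports", region_name="us-east-1")

# The CUR content settings identified in step 1, in table-configuration form.
table_properties = {
    "TIME_GRANULARITY": "HOURLY",
    "INCLUDE_RESOURCES": "TRUE",
    "INCLUDE_SPLIT_COST_ALLOCATION_DATA": "FALSE",
}

# GetTable returns the full CUR 2.0 schema; select every column by name.
table = client.get_table(
    TableName="COST_AND_USAGE_REPORT",
    TableProperties=table_properties,
)
columns = [column["Name"] for column in table["Schema"]]
query = f"SELECT {', '.join(columns)} FROM COST_AND_USAGE_REPORT"

# CreateExport with the query, table configurations, and delivery preferences.
response = client.create_export(
    Export={
        "Name": "cur2-full-schema",
        "DataQuery": {
            "QueryStatement": query,
            "TableConfigurations": {"COST_AND_USAGE_REPORT": table_properties},
        },
        "DestinationConfigurations": {
            "S3Destination": {
                "S3Bucket": "amzn-s3-demo-bucket",
                "S3Prefix": "cur2",
                "S3Region": "us-east-1",
                "S3OutputConfigurations": {
                    "OutputType": "CUSTOM",
                    "Format": "PARQUET",
                    "Compression": "PARQUET",
                    "Overwrite": "OVERWRITE_REPORT",
                },
            }
        },
        "RefreshCadence": {"Frequency": "SYNCHRONOUS"},
    }
)
print(response["ExportArn"])
```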
You also need to update your data ingestion pipeline and your business intelligence tools to process the following new columns with nested key-value pairs: `product`, `resource_tags`, `cost_category`, and `discounts`.
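For example, if you deliver the export as Parquet, each nested column arrives as a map type. The following is a minimal sketch, assuming pandas with the pyarrow engine (which surfaces map cells as lists of key-value tuples); the file path is a placeholder. If you deliver text/CSV instead, the nested columns arrive as serialized strings and need to be parsed before expanding.

```python
# A minimal sketch of flattening one nested CUR 2.0 column. Assumes the
# export is delivered as Parquet and read via pandas' pyarrow engine, which
# surfaces map cells as lists of (key, value) tuples. The path is a placeholder.
import pandas as pd

df = pd.read_parquet("cur2/part-00000.snappy.parquet")

# Convert each resource_tags cell to a dict, then expand it into one flat
# column per tag key, mirroring the legacy CUR layout.
tag_dicts = [dict(cell) if cell is not None else {} for cell in df["resource_tags"]]
tags = pd.json_normalize(tag_dicts).add_prefix("resource_tags_")
flat = pd.concat(
    [df.reset_index(drop=True).drop(columns=["resource_tags"]), tags], axis=1
)
```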