CREATE TABLE AS
Creates a new table populated with the results of a SELECT query. To create an empty table, use CREATE TABLE. CREATE TABLE AS
combines a CREATE TABLE DDL statement with a SELECT DML statement
and therefore technically contains both DDL and DML. Note that although CREATE TABLE
AS is grouped here with other DDL statements, CTAS queries in Athena are treated
as DML for Service Quotas purposes. For information about Service Quotas in Athena, see Service Quotas.
Note
For CTAS statements, the expected bucket owner setting does not apply to the destination table location in Amazon S3. The expected bucket owner setting applies only to the Amazon S3 output location that you specify for Athena query results. For more information, see Specify a query result location using the Athena console.
For additional information about CREATE TABLE AS that is beyond the scope of
this reference topic, see Create a table from query results (CTAS).
Synopsis
CREATE TABLE table_name
[ WITH ( property_name = expression [, ...] ) ]
AS query
[ WITH [ NO ] DATA ]
Where:
- WITH ( property_name = expression [, ...] )
  A list of optional CTAS table properties, some of which are specific to the data storage format. See CTAS table properties.
- query
  A SELECT query that is used to create a new table.
  Important
  If you plan to create a query with partitions, specify the names of partitioned columns last in the list of columns in the SELECT statement.
- [ WITH [ NO ] DATA ]
  If WITH NO DATA is used, a new empty table with the same schema as the original table is created.
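As an illustration of the synopsis, the following is a minimal sketch of a CTAS statement that uses WITH NO DATA to create an empty table with the schema of the query result (the table, column, and source names are hypothetical):

```sql
-- Hypothetical example: create an empty Parquet table whose schema
-- matches the SELECT query, without copying any rows.
CREATE TABLE orders_copy
WITH (format = 'PARQUET')
AS SELECT order_id, order_date, amount
FROM orders
WITH NO DATA
```

Omitting the final WITH NO DATA clause would instead populate orders_copy with the rows returned by the SELECT query.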
Note
To include column headers in your query result output, you can use a simple
SELECT query instead of a CTAS query. You can retrieve the results
from your query results location or download the results directly using the Athena
console. For more information, see Work with query results and recent queries.
CTAS table properties
Each CTAS table in Athena has a list of optional CTAS table properties that you specify
using WITH (property_name = expression [, ...] ). For information about
using these parameters, see Examples of CTAS queries.
- WITH (property_name = expression [, ...])
- table_type = ['HIVE', 'ICEBERG']
  Optional. The default is HIVE. Specifies the table type of the resulting table.
  Example:
  WITH (table_type = 'ICEBERG')
- external_location = [location]
  Note
  Because Iceberg tables are not external, this property does not apply to Iceberg tables. To define the root location of an Iceberg table in a CTAS statement, use the location property described later in this section.
  Optional. The location where Athena saves your CTAS query in Amazon S3.
  Example:
  WITH (external_location = 's3://amzn-s3-demo-bucket/tables/parquet_table/')
  Athena does not use the same path for query results twice. If you specify the location manually, make sure that the Amazon S3 location that you specify has no data. Athena never attempts to delete your data. If you want to use the same location again, manually delete the data, or your CTAS query will fail.
  If you run a CTAS query that specifies an external_location in a workgroup that enforces a query results location, the query fails with an error message. To see the query results location specified for the workgroup, see the workgroup's details.
  If your workgroup overrides the client-side setting for query results location, Athena creates your table in the following location:
  s3://amzn-s3-demo-bucket/tables/query-id/
  If you do not use the external_location property to specify a location and your workgroup does not override client-side settings, Athena uses your client-side setting for the query results location to create your table in the following location:
  s3://amzn-s3-demo-bucket/Unsaved-or-query-name/year/month/date/tables/query-id/
- is_external = [boolean]
  Optional. Indicates if the table is an external table. The default is true. For Iceberg tables, this must be set to false.
  Example:
  WITH (is_external = false)
- location = [location]
  Required for Iceberg tables. Specifies the root location for the Iceberg table to be created from the query results.
  Example:
  WITH (location = 's3://amzn-s3-demo-bucket/tables/iceberg_table/')
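Combining the Iceberg-related properties described above, the following is a sketch of an Iceberg CTAS statement (the table, column, and source names are hypothetical):

```sql
-- Hypothetical example: Iceberg tables require table_type = 'ICEBERG',
-- is_external = false, and a root location; external_location does not apply.
CREATE TABLE iceberg_events
WITH (
  table_type = 'ICEBERG',
  is_external = false,
  location = 's3://amzn-s3-demo-bucket/tables/iceberg_events/',
  format = 'PARQUET'
)
AS SELECT event_id, event_time
FROM events
```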
- field_delimiter = [delimiter]
  Optional and specific to text-based data storage formats. The single-character field delimiter for files in CSV, TSV, and text files. For example, WITH (field_delimiter = ','). Currently, multicharacter field delimiters are not supported for CTAS queries. If you don't specify a field delimiter, \001 is used by default.
- format = [storage_format]
  The storage format for the CTAS query results, such as ORC, PARQUET, AVRO, JSON, ION, or TEXTFILE. For Iceberg tables, the allowed formats are ORC, PARQUET, and AVRO. If omitted, PARQUET is used by default. The name of this parameter, format, must be listed in lowercase, or your CTAS query will fail.
  Example:
  WITH (format = 'PARQUET')
- bucketed_by = ARRAY[ column_name[,…] ]
  Note
  This property does not apply to Iceberg tables. For Iceberg tables, use partitioning with bucket transform.
  An array list of columns by which to bucket the data. If omitted, Athena does not bucket your data in this query.
- bucket_count = [int]
  Note
  This property does not apply to Iceberg tables. For Iceberg tables, use partitioning with bucket transform.
  The number of buckets for bucketing your data. If omitted, Athena does not bucket your data.
  Example:
  CREATE TABLE bucketed_table
  WITH (
    bucketed_by = ARRAY[column_name],
    bucket_count = 30,
    format = 'PARQUET',
    external_location = 's3://amzn-s3-demo-bucket/tables/parquet_table/'
  ) AS SELECT * FROM table_name
- partitioned_by = ARRAY[ col_name[,…] ]
  Note
  This property does not apply to Iceberg tables. To use partition transforms for Iceberg tables, use the partitioning property described later in this section.
  Optional. An array list of columns by which the CTAS table will be partitioned. Verify that the names of partitioned columns are listed last in the list of columns in the SELECT statement.
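The requirement that partition columns come last in the SELECT list can be sketched as follows (the table, column, and source names are hypothetical):

```sql
-- Hypothetical example: dt is the partition column, so it appears
-- last in both partitioned_by and the SELECT column list.
CREATE TABLE sales_partitioned
WITH (
  format = 'PARQUET',
  partitioned_by = ARRAY['dt']
)
AS SELECT sale_id, amount, dt
FROM sales
```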
- partitioning = ARRAY[partition_transform, ...]
  Optional. Specifies the partitioning of the Iceberg table to be created. Iceberg supports a wide variety of partition transforms and partition evolution. Partition transforms are summarized in the following table.

  Transform            Description
  year(ts)             Creates a partition for each year. The partition value is the integer difference in years between ts and January 1, 1970.
  month(ts)            Creates a partition for each month of each year. The partition value is the integer difference in months between ts and January 1, 1970.
  day(ts)              Creates a partition for each day of each year. The partition value is the integer difference in days between ts and January 1, 1970.
  hour(ts)             Creates a partition for each hour of each day. The partition value is a timestamp with the minutes and seconds set to zero.
  bucket(x, nbuckets)  Hashes the data into the specified number of buckets. The partition value is an integer hash of x, with a value between 0 and nbuckets - 1, inclusive.
  truncate(s, nchars)  Makes the partition value the first nchars characters of s.

  Example:
  WITH (partitioning = ARRAY['month(order_date)', 'bucket(account_number, 10)', 'country'])
- optimize_rewrite_min_data_file_size_bytes = [long]
  Optional. Data optimization specific configuration. Files smaller than the specified value are included for optimization. The default is 0.75 times the value of write_target_data_file_size_bytes. This property applies only to Iceberg tables. For more information, see Optimize Iceberg tables.
  Example:
  WITH (optimize_rewrite_min_data_file_size_bytes = 402653184)
- optimize_rewrite_max_data_file_size_bytes = [long]
  Optional. Data optimization specific configuration. Files larger than the specified value are included for optimization. The default is 1.8 times the value of write_target_data_file_size_bytes. This property applies only to Iceberg tables. For more information, see Optimize Iceberg tables.
  Example:
  WITH (optimize_rewrite_max_data_file_size_bytes = 966367641)
- optimize_rewrite_data_file_threshold = [int]
  Optional. Data optimization specific configuration. If there are fewer data files that require optimization than the given threshold, the files are not rewritten. This allows the accumulation of more data files to produce files closer to the target size and skip unnecessary computation for cost savings. The default is 5. This property applies only to Iceberg tables. For more information, see Optimize Iceberg tables.
  Example:
  WITH (optimize_rewrite_data_file_threshold = 5)
- optimize_rewrite_delete_file_threshold = [int]
  Optional. Data optimization specific configuration. If there are fewer delete files associated with a data file than the threshold, the data file is not rewritten. This allows the accumulation of more delete files for each data file for cost savings. The default is 2. This property applies only to Iceberg tables. For more information, see VACUUM.
  Example:
  WITH (optimize_rewrite_delete_file_threshold = 2)
- vacuum_min_snapshots_to_keep = [int]
  Optional. Vacuum specific configuration. The minimum number of most recent snapshots to retain. The default is 1. This property applies only to Iceberg tables. For more information, see VACUUM.
  Note
  The vacuum_min_snapshots_to_keep property requires Athena engine version 3.
  Example:
  WITH (vacuum_min_snapshots_to_keep = 1)
- vacuum_max_snapshot_age_seconds = [long]
  Optional. Vacuum specific configuration. A period in seconds that represents the age of the snapshots to retain. The default is 432000 (5 days). This property applies only to Iceberg tables. For more information, see VACUUM.
  Note
  The vacuum_max_snapshot_age_seconds property requires Athena engine version 3.
  Example:
  WITH (vacuum_max_snapshot_age_seconds = 432000)
- write_compression = [compression_format]
  The compression type to use for any storage format that allows compression to be specified. The compression_format value specifies the compression to be used when the data is written to the table. You can specify compression for the TEXTFILE, JSON, PARQUET, and ORC file formats.
  For example, if the format property specifies PARQUET as the storage format, the value for write_compression specifies the compression format for Parquet. In this case, specifying a value for write_compression is equivalent to specifying a value for parquet_compression.
  Similarly, if the format property specifies ORC as the storage format, the value for write_compression specifies the compression format for ORC. In this case, specifying a value for write_compression is equivalent to specifying a value for orc_compression.
  Multiple compression format table properties cannot be specified in the same CTAS query. For example, you cannot specify both write_compression and parquet_compression in the same query. The same applies for write_compression and orc_compression. For information about the compression types that are supported for each file format, see Use compression in Athena.
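A sketch of the recommended pattern, pairing format with write_compression rather than a format-specific compression property (the table and source names are hypothetical):

```sql
-- Hypothetical example: write_compression sets the Parquet compression;
-- do not also specify parquet_compression in the same query.
CREATE TABLE compressed_table
WITH (
  format = 'PARQUET',
  write_compression = 'SNAPPY'
)
AS SELECT * FROM source_table
```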
- orc_compression = [compression_format]
  The compression type to use for the ORC file format when ORC data is written to the table. For example, WITH (orc_compression = 'ZLIB'). Chunks within the ORC file (except the ORC Postscript) are compressed using the compression that you specify. If omitted, ZLIB compression is used by default for ORC.
  Note
  For consistency, we recommend that you use the write_compression property instead of orc_compression. Use the format property to specify the storage format as ORC, and then use the write_compression property to specify the compression format that ORC will use.
- parquet_compression = [compression_format]
  The compression type to use for the Parquet file format when Parquet data is written to the table. For example, WITH (parquet_compression = 'SNAPPY'). This compression is applied to column chunks within the Parquet files. If omitted, GZIP compression is used by default for Parquet.
  Note
  For consistency, we recommend that you use the write_compression property instead of parquet_compression. Use the format property to specify the storage format as PARQUET, and then use the write_compression property to specify the compression format that PARQUET will use.
- compression_level = [compression_level]
  The compression level to use. This property applies only to ZSTD compression. Possible values are from 1 to 22. The default value is 3. For more information, see Use ZSTD compression levels.
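Because compression_level applies only to ZSTD, it is typically paired with a ZSTD write_compression setting. A sketch (the table and source names are hypothetical):

```sql
-- Hypothetical example: ZSTD compression at level 5 instead of the default 3.
CREATE TABLE zstd_table
WITH (
  format = 'PARQUET',
  write_compression = 'ZSTD',
  compression_level = 5
)
AS SELECT * FROM source_table
```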
Examples
For examples of CTAS queries, consult the following resources.