Unloading semistructured data
With Amazon Redshift, you can export semistructured data from your Amazon Redshift cluster to Amazon S3 in a variety of formats,
including text, Apache Parquet, Apache ORC, and Avro. The following sections describe how to configure and run
unload operations for your semistructured data in Amazon Redshift.
- CSV or text formats
You can unload tables with SUPER data columns to Amazon S3 in a comma-separated value (CSV)
or text format. Using a combination of navigation and unnest clauses, Amazon Redshift unloads
hierarchical data in SUPER data format to Amazon S3 in CSV or text formats. Subsequently, you
can create external tables against unloaded data and query them using Redshift Spectrum. For
information on using UNLOAD and the required IAM permissions, see UNLOAD.
Before running the following example, populate the region_nations table using the processes in Loading semistructured data into Amazon Redshift.
For information on the tables used in the following example, see SUPER sample dataset.
The following example unloads data into Amazon S3.
UNLOAD ('SELECT * FROM region_nations')
TO 's3://xxxxxx/'
IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3-Write'
DELIMITER AS '|'
GZIP
ALLOWOVERWRITE;
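The navigation and unnest clauses mentioned above can be combined with UNLOAD to flatten hierarchical SUPER data before writing it to Amazon S3. The following is a sketch, assuming the region_nations table from the SUPER sample dataset has an r_nations SUPER array whose elements carry n_nationkey and n_name attributes; adjust the column names to match your schema.

```sql
-- Unnest the r_nations SUPER array so each nation becomes its own row.
-- Column names follow the SUPER sample dataset and are illustrative.
UNLOAD ('SELECT rn.r_regionkey, n.n_nationkey, n.n_name
         FROM region_nations rn, rn.r_nations n')
TO 's3://xxxxxx/'
IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3-Write'
DELIMITER AS '|'
GZIP
ALLOWOVERWRITE;
```

Because the SUPER array is unnested in the query, the files written to Amazon S3 contain flat, delimited rows rather than nested JSON values.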
Unlike other data types, where a user-defined string can represent a null value, Amazon Redshift
exports SUPER data columns in JSON format and represents null values as JSON
null. As a result, SUPER data
columns ignore the NULL [AS] option of the UNLOAD command.
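As noted earlier, you can query the unloaded files with Redshift Spectrum by creating an external table over the Amazon S3 prefix. The following is a sketch, assuming an external schema named spectrum_schema already exists and that the unload produced pipe-delimited, gzip-compressed text files; the table name and column list are illustrative.

```sql
-- Hypothetical external table over the unloaded files. SUPER values
-- are unloaded as JSON text, so r_nations is declared as VARCHAR here.
CREATE EXTERNAL TABLE spectrum_schema.region_nations_ext (
    r_regionkey INT,
    r_name      VARCHAR,
    r_comment   VARCHAR,
    r_nations   VARCHAR
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION 's3://xxxxxx/';
```

Redshift Spectrum recognizes the .gz file extension and decompresses the files automatically when you query the external table.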
- Parquet format
You can unload tables with SUPER data columns to Amazon S3 in the Parquet format. Amazon Redshift represents SUPER columns in Parquet as the JSON data type. This enables semistructured data to be represented in Parquet. You can query these columns using Redshift Spectrum or ingest them back to Amazon Redshift using the COPY command. For
information on using UNLOAD and the required IAM permissions, see UNLOAD.
The following example unloads data into Amazon S3 in the Parquet format.
UNLOAD ('SELECT * FROM region_nations')
TO 's3://xxxxxx/'
IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3-Write'
FORMAT PARQUET;
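To ingest the Parquet files back into Amazon Redshift as mentioned above, you can use the COPY command with the SERIALIZETOJSON option, which loads complex Parquet types into SUPER columns. The following is a sketch, assuming a target table (here named region_nations_copy, an illustrative name) with the same column layout as region_nations.

```sql
-- Load the unloaded Parquet files back into a table with a SUPER column.
-- SERIALIZETOJSON maps complex Parquet types into SUPER values.
COPY region_nations_copy
FROM 's3://xxxxxx/'
IAM_ROLE 'arn:aws:iam::xxxxxxxxxxxx:role/Redshift-S3-Write'
FORMAT PARQUET SERIALIZETOJSON;
```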