Using the COPY command to load from Amazon S3
Use the COPY command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as follows:
COPY <table_name> FROM 's3://<bucket_name>/<object_prefix>' authorization;
The manifest file is a JSON-formatted file that lists the data files to be loaded. The syntax to specify the files to be loaded by using a manifest file is as follows:
COPY <table_name> FROM 's3://<bucket_name>/<manifest_file>' authorization MANIFEST;
The table to be loaded must already exist in the database. For information about creating a table, see CREATE TABLE in the SQL Reference.
The values for authorization provide the Amazon authorization Amazon Redshift needs to access the Amazon S3 objects. For information about required permissions, see IAM permissions for COPY, UNLOAD, and CREATE LIBRARY. The preferred method for authentication is to specify the IAM_ROLE parameter and provide the Amazon Resource Name (ARN) for an IAM role with the necessary permissions. For more information, see Role-based access control .
To authenticate using the IAM_ROLE parameter, replace
<aws-account-id>
and
<role-name>
as shown in the following syntax.
IAM_ROLE 'arn:aws:iam::
<aws-account-id>
:role/<role-name>
'
The following example shows authentication using an IAM role.
COPY customer FROM 's3://amzn-s3-demo-bucket/mydata' IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
For more information about other authorization options, see Authorization parameters
If you want to validate your data without actually loading the table, use the NOLOAD option with the COPY command.
The following example shows the first few rows of a pipe-delimited data in a file
named venue.txt
.
1|Toyota Park|Bridgeview|IL|0 2|Columbus Crew Stadium|Columbus|OH|0 3|RFK Stadium|Washington|DC|0
Before uploading the file to Amazon S3, split the file into multiple files so that the COPY command can load it using parallel processing. The number of files should be a multiple of the number of slices in your cluster. Split your load data files so that the files are about equal size, between 1 MB and 1 GB after compression. For more information, see Loading data from compressed and uncompressed files.
For example, the venue.txt
file might be split into four files, as
follows:
venue.txt.1 venue.txt.2 venue.txt.3 venue.txt.4
The following COPY command loads the VENUE table using the pipe-delimited data in
the data files with the prefix 'venue' in the Amazon S3 bucket amzn-s3-demo-bucket
.
Note
The Amazon S3 bucket amzn-s3-demo-bucket
in the following examples does not exist.
For sample COPY commands that use real data in an existing Amazon S3 bucket, see Load sample data.
COPY venue FROM 's3://amzn-s3-demo-bucket/venue' IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole' DELIMITER '|';
If no Amazon S3 objects with the key prefix 'venue' exist, the load fails.