Loading tables with the COPY command
The COPY command leverages the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from files on Amazon S3, from an Amazon EMR cluster, from a DynamoDB table, or from text output from one or more remote hosts.
Before learning all the options of the COPY command, we recommend learning the basic options to load Amazon S3 data. The Amazon Redshift Getting Started Guide demonstrates a simple use of the COPY command to load Amazon S3 data using a default IAM role. See Step 4: Load data from Amazon S3 to Amazon Redshift for details.
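As a starting point, a minimal COPY from Amazon S3 using the default IAM role might look like the following sketch. The table name, bucket, object key, and Region are placeholders for illustration only:

```sql
-- Load pipe-delimited text from Amazon S3 using the default IAM role.
-- Table, bucket, file, and Region shown here are hypothetical.
COPY catdemo
FROM 's3://amzn-s3-demo-bucket/category_pipe.txt'
IAM_ROLE default
REGION 'us-east-1';
```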
Note
We strongly recommend using the COPY command to load large amounts of data. Using individual INSERT statements to populate a table might be prohibitively slow. Alternatively, if your data already exists in other Amazon Redshift database tables, use INSERT INTO ... SELECT or CREATE TABLE AS to improve performance. For information, see INSERT or CREATE TABLE AS.
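For data that already resides in another table, either statement below avoids the per-row cost of individual INSERTs. This is a sketch with hypothetical table names (`staging_sales`, `sales`, `sales_copy`):

```sql
-- Append rows from an existing table in bulk (hypothetical table names).
INSERT INTO sales
SELECT * FROM staging_sales;

-- Or create and populate a new table in a single step.
CREATE TABLE sales_copy AS
SELECT * FROM staging_sales;
```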
To load data from another Amazon resource, Amazon Redshift must have permission to access the resource and perform the necessary actions.
To grant or revoke the privilege to load data into a table using the COPY command, grant or revoke the INSERT privilege on that table.
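For example, assuming a hypothetical table named `sales` and a hypothetical user named `etl_user`, the statements look like this:

```sql
-- Allow etl_user to load data into the sales table with COPY.
GRANT INSERT ON TABLE sales TO etl_user;

-- Remove that ability again.
REVOKE INSERT ON TABLE sales FROM etl_user;
```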
Your data needs to be in the proper format for loading into your Amazon Redshift table. This section presents guidelines for preparing and verifying your data before the load and for validating a COPY statement before you run it.
To protect the information in your files, you can encrypt the data files before you upload them to your Amazon S3 bucket; COPY will decrypt the data as it performs the load. You can also limit access to your load data by providing temporary security credentials to users. Temporary security credentials provide enhanced security because they have short life spans and cannot be reused after they expire.
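A COPY command that authenticates with temporary security credentials supplies the session token along with the temporary key pair. In this sketch, the table, bucket, and credential values are placeholders:

```sql
-- Load using temporary credentials; all values shown are placeholders.
COPY sales
FROM 's3://amzn-s3-demo-bucket/data/sales.txt'
ACCESS_KEY_ID '<temporary-access-key-id>'
SECRET_ACCESS_KEY '<temporary-secret-access-key>'
SESSION_TOKEN '<temporary-session-token>';
```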
Amazon Redshift has features built into COPY to load uncompressed, delimited data quickly. However, you can compress your files using gzip, lzop, or bzip2 to save time uploading them.
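For example, to load files that were compressed with gzip, add the GZIP parameter so that COPY decompresses the data as it loads. The table, bucket, and file names here are hypothetical:

```sql
-- Load a gzip-compressed, pipe-delimited file (hypothetical names).
COPY event
FROM 's3://amzn-s3-demo-bucket/data/event.txt.gz'
IAM_ROLE default
GZIP
DELIMITER '|';
```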
Automatic splitting of uncompressed data is not supported if the COPY command includes the ESCAPE, REMOVEQUOTES, or FIXEDWIDTH keyword. It is supported with the CSV keyword.
To help keep your data secure in transit within the Amazon Cloud, Amazon Redshift uses hardware accelerated SSL to communicate with Amazon S3 or Amazon DynamoDB for COPY, UNLOAD, backup, and restore operations.
When you load your table directly from an Amazon DynamoDB table, you have the option to control the amount of Amazon DynamoDB provisioned throughput you consume.
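One way to control consumption is the READRATIO parameter, which caps the load at a percentage of the DynamoDB table's provisioned throughput. This sketch uses hypothetical table names:

```sql
-- Load from a DynamoDB table, using at most 50 percent of its
-- provisioned read throughput (hypothetical table names).
COPY favoritemovies
FROM 'dynamodb://ProductCatalog'
IAM_ROLE default
READRATIO 50;
```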
You can optionally let COPY analyze your input data and automatically apply optimal compression encodings to your table as part of the load process.
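The COMPUPDATE parameter controls this behavior. The following sketch, with hypothetical table and bucket names, asks COPY to analyze the input and apply compression encodings during the load:

```sql
-- Analyze the input data and apply optimal compression encodings
-- as part of the load (hypothetical table and bucket names).
COPY listing
FROM 's3://amzn-s3-demo-bucket/data/listing.txt'
IAM_ROLE default
COMPUPDATE ON;
```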
Topics
- Credentials and access permissions
- Preparing your input data
- Loading data from Amazon S3
- Loading data from Amazon EMR
- Loading data from remote hosts
- Loading data from an Amazon DynamoDB table
- Verifying that the data loaded correctly
- Validating input data
- Loading tables with automatic compression
- Optimizing storage for narrow tables
- Loading default column values
- Troubleshooting data loads