

Amazon Redshift will no longer support the creation of new Python UDFs starting with Patch 198. Existing Python UDFs will continue to function until June 30, 2026. For more information, see the [blog post](https://amazonaws-china.com/blogs/big-data/amazon-redshift-python-user-defined-functions-will-reach-end-of-support-after-june-30-2026/).

# Loading tables with the COPY command

The COPY command leverages the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from files on Amazon S3, from a DynamoDB table, or from text output from one or more remote hosts.

Before learning all the options of the COPY command, we recommend learning the basic options to load Amazon S3 data. The *Amazon Redshift Getting Started Guide* demonstrates a simple use of the COPY command to load Amazon S3 data using a default IAM role. See [Step 4: Load data from Amazon S3 to Amazon Redshift](https://docs.amazonaws.cn/redshift/latest/gsg/rs-gsg-create-sample-db.html) for details.

**Note**  
We strongly recommend using the COPY command to load large amounts of data. Using individual INSERT statements to populate a table might be prohibitively slow. Alternatively, if your data already exists in other Amazon Redshift database tables, use INSERT INTO ... SELECT or CREATE TABLE AS to improve performance. For information, see [INSERT](r_INSERT_30.md) or [CREATE TABLE AS](r_CREATE_TABLE_AS.md).

To load data from another Amazon resource, Amazon Redshift must have permission to access the resource and perform the necessary actions. 

To grant or revoke privilege to load data into a table using a COPY command, grant or revoke the INSERT privilege.

Your data needs to be in the proper format for loading into your Amazon Redshift table. This section presents guidelines for preparing and verifying your data before the load and for validating a COPY statement before you run it.

To protect the information in your files, you can encrypt the data files before you upload them to your Amazon S3 bucket; COPY will decrypt the data as it performs the load. You can also limit access to your load data by providing temporary security credentials to users. Temporary security credentials provide enhanced security because they have short life spans and cannot be reused after they expire.

Amazon Redshift has features built in to COPY to load uncompressed, delimited data quickly. But you can compress your files using gzip, lzop, or bzip2 to save time uploading the files.

If the following keywords are in the COPY query, automatic splitting of uncompressed data is not supported: ESCAPE, REMOVEQUOTES, and FIXEDWIDTH. But the CSV keyword is supported.

To help keep your data secure in transit within the Amazon Cloud, Amazon Redshift uses hardware accelerated SSL to communicate with Amazon S3 or Amazon DynamoDB for COPY, UNLOAD, backup, and restore operations.

When you load your table directly from an Amazon DynamoDB table, you have the option to control the amount of Amazon DynamoDB provisioned throughput you consume.

You can optionally let COPY analyze your input data and automatically apply optimal compression encodings to your table as part of the load process.

**Topics**
+ [Credentials and access permissions](loading-data-access-permissions.md)
+ [Preparing your input data](t_preparing-input-data.md)
+ [Loading data from Amazon S3](t_Loading-data-from-S3.md)
+ [Loading data from Amazon EMR](loading-data-from-emr.md)
+ [Loading data from remote hosts](loading-data-from-remote-hosts.md)
+ [Loading data from an Amazon DynamoDB table](t_Loading-data-from-dynamodb.md)
+ [Verifying that the data loaded correctly](verifying-that-data-loaded-correctly.md)
+ [Validating input data](t_Validating_input_files.md)
+ [Loading tables with automatic compression](c_Loading_tables_auto_compress.md)
+ [Optimizing storage for narrow tables](c_load_compression_hidden_cols.md)
+ [Loading default column values](c_loading_default_values.md)
+ [Troubleshooting data loads](t_Troubleshooting_load_errors.md)

# Credentials and access permissions

 To load or unload data using another Amazon resource, such as Amazon S3, Amazon DynamoDB, Amazon EMR, or Amazon EC2, Amazon Redshift must have permission to access the resource and perform the necessary actions to access the data. For example, to load data from Amazon S3, COPY must have LIST access to the bucket and GET access for the bucket objects. 

To obtain authorization to access a resource, Amazon Redshift must be authenticated. You can choose either role-based access control or key-based access control. This section presents an overview of the two methods. For complete details and examples, see [Permissions to access other Amazon Resources](copy-usage_notes-access-permissions.md).

## Role-based access control


With role-based access control, Amazon Redshift temporarily assumes an Amazon Identity and Access Management (IAM) role on your behalf. Then, based on the authorizations granted to the role, Amazon Redshift can access the required Amazon resources.

We recommend using role-based access control because it provides more secure, fine-grained control of access to Amazon resources and sensitive user data, in addition to safeguarding your Amazon credentials.

To use role-based access control, you must first create an IAM role using the Amazon Redshift service role type, and then attach the role to your data warehouse. The role must have, at a minimum, the permissions listed in [IAM permissions for COPY, UNLOAD, and CREATE LIBRARY](copy-usage_notes-access-permissions.md#copy-usage_notes-iam-permissions). For steps to create an IAM role and attach it to your cluster, see [Creating an IAM Role to Allow Your Amazon Redshift Cluster to Access Amazon Services](https://docs.amazonaws.cn/redshift/latest/mgmt/authorizing-redshift-service.html#authorizing-redshift-service-creating-an-iam-role) in the *Amazon Redshift Management Guide*.

You can add a role to a cluster or view the roles associated with a cluster by using the Amazon Redshift Management Console, CLI, or API. For more information, see [Authorizing COPY and UNLOAD Operations Using IAM Roles](https://docs.amazonaws.cn/redshift/latest/mgmt/copy-unload-iam-role.html) in the *Amazon Redshift Management Guide*.

When you create an IAM role, IAM returns an Amazon Resource Name (ARN) for the role. To run a COPY command using an IAM role, provide the role ARN using the IAM_ROLE parameter or the CREDENTIALS parameter.

The following COPY command example uses the IAM_ROLE parameter with the role `MyRedshiftRole` for authentication.

```
COPY customer FROM 's3://amzn-s3-demo-bucket/mydata' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
```

The Amazon user must have, at a minimum, the permissions listed in [IAM permissions for COPY, UNLOAD, and CREATE LIBRARY](copy-usage_notes-access-permissions.md#copy-usage_notes-iam-permissions).

## Key-based access control


With key-based access control, you provide the access key ID and secret access key for a user that is authorized to access the Amazon resources that contain the data. 

**Note**  
We strongly recommend using an IAM role for authentication instead of supplying a plain-text access key ID and secret access key. If you choose key-based access control, never use your Amazon account (root) credentials. Always create an IAM user and provide that user's access key ID and secret access key. For steps to create an IAM user, see [Creating an IAM User in Your Amazon Account](https://docs.amazonaws.cn/IAM/latest/UserGuide/id_users_create.html).

# Preparing your input data

If your input data is not compatible with the table columns that will receive it, the COPY command will fail.

Use the following guidelines to help ensure that your input data is valid: 
+ Your data can only contain UTF-8 characters up to four bytes long.
+ Verify that CHAR and VARCHAR strings are no longer than the lengths of the corresponding columns. VARCHAR strings are measured in bytes, not characters, so, for example, a four-character string of Chinese characters that occupy four bytes each requires a VARCHAR(16) column.
+ Multibyte characters can only be used with VARCHAR columns. Verify that multibyte characters are no more than four bytes long.
+ Verify that data for CHAR columns only contains single-byte characters.
+ Do not include any special characters or syntax to indicate the last field in a record. This field can be a delimiter.
+ If your data includes null terminators, also referred to as NUL (UTF-8 0000) or binary zero (0x000), you can load these characters as NULLS into CHAR or VARCHAR columns by using the NULL AS option in the COPY command: `null as '\0'` or `null as '\000'` . If you do not use NULL AS, null terminators will cause your COPY to fail.
+ If your strings contain special characters, such as delimiters and embedded newlines, use the ESCAPE option with the [COPY](r_COPY.md) command.
+ Verify that all single and double quotation marks are appropriately matched.
+ Verify that floating-point strings are in either standard floating-point format, such as 12.123, or an exponential format, such as 1.0E4.
+ Verify that all timestamp and date strings follow the specifications for [DATEFORMAT and TIMEFORMAT strings](r_DATEFORMAT_and_TIMEFORMAT_strings.md). The default timestamp format is YYYY-MM-DD hh:mm:ss, and the default date format is YYYY-MM-DD.
+ For more information about boundaries and limitations on individual data types, see [Data types](c_Supported_data_types.md). For information about multibyte character errors, see [Multibyte character load errors](multi-byte-character-load-errors.md).
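Because VARCHAR lengths are measured in bytes rather than characters, it can help to check byte lengths before a load. The following is a minimal sketch in Python; `fits_varchar` is a hypothetical helper, not part of any Redshift tooling.

```python
# Hypothetical helper (not part of Redshift): VARCHAR columns are sized in
# bytes, not characters, so size them against the UTF-8 byte length.
def fits_varchar(value: str, max_bytes: int) -> bool:
    return len(value.encode("utf-8")) <= max_bytes

# Common CJK characters take three bytes in UTF-8 (characters outside the
# Basic Multilingual Plane take four), so this four-character string needs
# at least VARCHAR(12).
s = "中文字符"
print(len(s), len(s.encode("utf-8")))  # 4 characters, 12 bytes
```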

# Loading data from Amazon S3

The COPY command leverages the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from a file or multiple files in an Amazon S3 bucket. You can take maximum advantage of parallel processing by splitting your data into multiple files, in cases where the files are compressed. (There are exceptions to this rule. These are detailed in [Loading data files](https://docs.amazonaws.cn/redshift/latest/dg/c_best-practices-use-multiple-files.html).) You can also take maximum advantage of parallel processing by setting distribution keys on your tables. For more information about distribution keys, see [Data distribution for query optimization](t_Distributing_data.md). 

Data is loaded into the target table, one line per row. The fields in the data file are matched to table columns in order, left to right. Fields in the data files can be fixed-width or character delimited; the default delimiter is a pipe (|). By default, all the table columns are loaded, but you can optionally define a comma-separated list of columns. If a table column is not included in the column list specified in the COPY command, it is loaded with a default value. For more information, see [Loading default column values](c_loading_default_values.md).

**Topics**
+ [Loading data from compressed and uncompressed files](t_splitting-data-files.md)
+ [Uploading files to Amazon S3 to use with COPY](t_uploading-data-to-S3.md)
+ [Using the COPY command to load from Amazon S3](t_loading-tables-from-s3.md)

# Loading data from compressed and uncompressed files


When you load compressed data, we recommend that you split the data for each table into multiple files. When you load uncompressed, delimited data, the COPY command uses massively parallel processing (MPP) and scan ranges to load data from large files in an Amazon S3 bucket.

## Loading data from multiple compressed files


In cases where you have compressed data, we recommend that you split the data for each table into multiple files. The COPY command can load data from multiple files in parallel. You can load multiple files by specifying a common prefix, or *prefix key*, for the set, or by explicitly listing the files in a manifest file.

Split your data into files so that the number of files is a multiple of the number of slices in your cluster. That way, Amazon Redshift can divide the data evenly among the slices. The number of slices per node depends on the node size of the cluster. For example, each dc2.large compute node has two slices, and each dc2.8xlarge compute node has 16 slices. For more information about the number of slices that each node size has, see [About clusters and nodes](https://docs.amazonaws.cn/redshift/latest/mgmt/working-with-clusters.html#rs-about-clusters-and-nodes) in the *Amazon Redshift Management Guide*. 

The nodes all participate in running parallel queries, working on data that is distributed as evenly as possible across the slices. If you have a cluster with two dc2.large nodes, you might split your data into four files or some multiple of four. Amazon Redshift doesn't take file size into account when dividing the workload. Thus, you need to ensure that the files are roughly the same size, from 1 MB to 1 GB after compression. 
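The sizing rule above can be sketched in a few lines of Python. This uses the slice counts stated for dc2 node types; `recommended_file_count` is a hypothetical helper for illustration only.

```python
# Slice counts per node, as documented for these dc2 node types.
SLICES_PER_NODE = {"dc2.large": 2, "dc2.8xlarge": 16}

def recommended_file_count(node_type: str, node_count: int, multiple: int = 1) -> int:
    # A multiple of the cluster's total slice count lets Redshift divide
    # the load evenly among the slices.
    return SLICES_PER_NODE[node_type] * node_count * multiple

print(recommended_file_count("dc2.large", 2))  # 4
```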

To use object prefixes to identify the load files, name each file with a common prefix. For example, the `venue.txt` file might be split into four files, as follows.

```
venue.txt.1
venue.txt.2
venue.txt.3
venue.txt.4
```

If you put multiple files in a folder in your bucket and specify the folder name as the prefix, COPY loads all of the files in the folder. If you explicitly list the files to be loaded by using a manifest file, the files can reside in different buckets or folders.

For more information about manifest files, see [Using a manifest to specify data files](r_COPY_command_examples.md#copy-command-examples-manifest).

## Loading data from uncompressed, delimited files


When you load uncompressed, delimited data, the COPY command uses the massively parallel processing (MPP) architecture in Amazon Redshift. Amazon Redshift automatically uses slices working in parallel to load ranges of data from a large file in an Amazon S3 bucket. For parallel loading to occur, the file must be delimited, for example, pipe-delimited. Automatic, parallel data loading with the COPY command is also available for CSV files. You can also take advantage of parallel processing by setting distribution keys on your tables. For more information about distribution keys, see [Data distribution for query optimization](t_Distributing_data.md).

Automatic, parallel data loading isn't supported when the COPY query includes any of the following keywords: ESCAPE, REMOVEQUOTES, and FIXEDWIDTH.

Data from the file or files is loaded into the target table, one line per row. The fields in the data file are matched to table columns in order, left to right. Fields in the data files can be fixed-width or character delimited; the default delimiter is a pipe (|). By default, all the table columns are loaded, but you can optionally define a comma-separated list of columns. If a table column isn't included in the column list specified in the COPY command, it's loaded with a default value. For more information, see [Loading default column values](c_loading_default_values.md).

Follow this general process to load data from Amazon S3, when your data is uncompressed and delimited:

1. Upload your files to Amazon S3.

1. Run a COPY command to load the table. 

1. Verify that the data was loaded correctly.

For examples of COPY commands, see [COPY examples](r_COPY_command_examples.md). For information about data loaded into Amazon Redshift, check the [STL_LOAD_COMMITS](r_STL_LOAD_COMMITS.md) and [STL_LOAD_ERRORS](r_STL_LOAD_ERRORS.md) system tables.

For more information about nodes and the slices contained in each, see [About clusters and nodes](https://docs.amazonaws.cn/redshift/latest/mgmt/working-with-clusters.html#rs-about-clusters-and-nodes) in the *Amazon Redshift Management Guide*.

# Uploading files to Amazon S3 to use with COPY

There are a couple of approaches to take when uploading text files to Amazon S3:
+ If you have compressed files, we recommend that you split large files to take advantage of parallel processing in Amazon Redshift.
+ On the other hand, COPY automatically splits large, uncompressed, text-delimited file data to facilitate parallelism and effectively distribute the data from large files.

Create an Amazon S3 bucket to hold your data files, and then upload the data files to the bucket. For information about creating buckets and uploading files, see [Working with Amazon S3 Buckets](https://docs.amazonaws.cn/AmazonS3/latest/userguide/UsingBucket.html) in the *Amazon Simple Storage Service User Guide.* 

**Important**  
The Amazon S3 bucket that holds the data files must be created in the same Amazon Region as your cluster unless you use the [REGION](copy-parameters-data-source-s3.md#copy-region) option to specify the Region in which the Amazon S3 bucket is located.

Ensure that the S3 IP ranges are added to your allowlist. To learn more about the required S3 IP ranges, see [ Network isolation](https://docs.amazonaws.cn//redshift/latest/mgmt/security-network-isolation.html#network-isolation).

You can create an Amazon S3 bucket in a specific Region either by selecting the Region when you create the bucket by using the Amazon S3 console, or by specifying an endpoint when you create the bucket using the Amazon S3 API or CLI.

Following the data load, verify that the correct files are present on Amazon S3.

**Topics**
+ [Managing data consistency](managing-data-consistency.md)
+ [Uploading encrypted data to Amazon S3](t_uploading-encrypted-data.md)
+ [Verifying that the correct files are present in your bucket](verifying-that-correct-files-are-present.md)

# Managing data consistency


Amazon S3 provides strong read-after-write consistency for COPY, UNLOAD, INSERT (external table), CREATE EXTERNAL TABLE AS, and Amazon Redshift Spectrum operations on Amazon S3 buckets in all Amazon Regions. In addition, read operations on Amazon S3 Select, Amazon S3 Access Control Lists, Amazon S3 Object Tags, and object metadata (for example, HEAD object) are strongly consistent. For more information about data consistency, see [Amazon S3 Data Consistency Model](https://docs.amazonaws.cn/AmazonS3/latest/userguide/Introduction.html#ConsistencyModel) in the *Amazon Simple Storage Service User Guide*.

# Uploading encrypted data to Amazon S3

Amazon S3 supports both server-side encryption and client-side encryption. This topic discusses the differences between the server-side and client-side encryption and describes the steps to use client-side encryption with Amazon Redshift. Server-side encryption is transparent to Amazon Redshift. 

## Server-side encryption


Server-side encryption is data encryption at rest—that is, Amazon S3 encrypts your data as it uploads it and decrypts it for you when you access it. When you load tables using a COPY command, there is no difference in the way you load from server-side encrypted or unencrypted objects on Amazon S3. For more information about server-side encryption, see [Using Server-Side Encryption](https://docs.amazonaws.cn/AmazonS3/latest/userguide/UsingServerSideEncryption.html) in the *Amazon Simple Storage Service User Guide*.

## Client-side encryption


In client-side encryption, your client application manages encryption of your data, the encryption keys, and related tools. You can upload data to an Amazon S3 bucket using client-side encryption, and then load the data using the COPY command with the ENCRYPTED option and a private encryption key to provide greater security.

You encrypt your data using envelope encryption. With *envelope encryption,* your application handles all encryption exclusively. Your private encryption keys and your unencrypted data are never sent to Amazon, so it's very important that you safely manage your encryption keys. If you lose your encryption keys, you won't be able to decrypt your data, and you can't recover your encryption keys from Amazon. Envelope encryption combines the performance of fast symmetric encryption with the greater security that key management with asymmetric keys provides. A one-time-use symmetric key (the envelope symmetric key) is generated by your Amazon S3 encryption client to encrypt your data, then that key is encrypted by your root key and stored alongside your data in Amazon S3. When Amazon Redshift accesses your data during a load, the encrypted symmetric key is retrieved and decrypted with your root key, then the data is decrypted.

To work with Amazon S3 client-side encrypted data in Amazon Redshift, follow the steps outlined in [Protecting Data Using Client-Side Encryption](https://docs.amazonaws.cn/AmazonS3/latest/userguide/UsingClientSideEncryption.html) in the *Amazon Simple Storage Service User Guide*, with the additional requirements that you use:
+ **Symmetric encryption –** The Amazon SDK for Java `AmazonS3EncryptionClient` class uses envelope encryption, described preceding, which is based on symmetric key encryption. Use this class to create an Amazon S3 client to upload client-side encrypted data.
+ **A 256-bit AES root symmetric key –** A root key encrypts the envelope key. You pass the root key to your instance of the `AmazonS3EncryptionClient` class. Save this key, because you will need it to copy data into Amazon Redshift.
+ **Object metadata to store encrypted envelope key –** By default, Amazon S3 stores the envelope key as object metadata for the `AmazonS3EncryptionClient` class. The encrypted envelope key that is stored as object metadata is used during the decryption process. 

**Note**  
If you get a cipher encryption error message when you use the encryption API for the first time, your version of the JDK may have a Java Cryptography Extension (JCE) jurisdiction policy file that limits the maximum key length for encryption and decryption transformations to 128 bits. For information about addressing this issue, go to [Specifying Client-Side Encryption Using the Amazon SDK for Java](https://docs.amazonaws.cn/AmazonS3/latest/userguide/UsingClientSideEncryptionUpload.html) in the *Amazon Simple Storage Service User Guide*. 

For information about loading client-side encrypted files into your Amazon Redshift tables using the COPY command, see [Loading encrypted data files from Amazon S3](c_loading-encrypted-files.md).

## Example: Uploading client-side encrypted data


For an example of how to use the Amazon SDK for Java to upload client-side encrypted data, go to [Protecting data using client-side encryption](https://docs.amazonaws.cn/AmazonS3/latest/userguide/encrypt-client-side-symmetric-master-key.html) in the *Amazon Simple Storage Service User Guide*. 

The second option shows the choices you must make during client-side encryption so that the data can be loaded in Amazon Redshift. Specifically, the example shows using object metadata to store the encrypted envelope key and the use of a 256-bit AES root symmetric key. 

This example provides sample code that uses the Amazon SDK for Java to create a 256-bit AES symmetric root key and save it to a file. Then the example uploads an object to Amazon S3 using an S3 encryption client that first encrypts the sample data on the client side. The example also downloads the object and verifies that the data is the same.

# Verifying that the correct files are present in your bucket


After you upload your files to your Amazon S3 bucket, we recommend listing the contents of the bucket to verify that all of the correct files are present and that no unwanted files are present. For example, if the bucket `amzn-s3-demo-bucket` holds a file named `venue.txt.back`, that file will be loaded, perhaps unintentionally, by the following command:

```
COPY venue FROM 's3://amzn-s3-demo-bucket/venue' … ;
```

If you want to control specifically which files are loaded, you can use a manifest file to explicitly list the data files. For more information about using a manifest file, see the [copy_from_s3_manifest_file](copy-parameters-data-source-s3.md#copy-manifest-file) option for the COPY command and [Using a manifest to specify data files](r_COPY_command_examples.md#copy-command-examples-manifest) in the COPY examples. 

For more information about listing the contents of the bucket, see [Listing Object Keys](https://docs.amazonaws.cn/AmazonS3/latest/userguide/ListingKeysUsingAPIs.html) in the *Amazon Simple Storage Service User Guide*.

# Using the COPY command to load from Amazon S3

Use the [COPY](r_COPY.md) command to load a table in parallel from data files on Amazon S3. You can specify the files to be loaded by using an Amazon S3 object prefix or by using a manifest file.

The syntax to specify the files to be loaded by using a prefix is as follows:

```
COPY <table_name> FROM 's3://<bucket_name>/<object_prefix>'
authorization;
```

 The manifest file is a JSON-formatted file that lists the data files to be loaded. The syntax to specify the files to be loaded by using a manifest file is as follows:

```
COPY <table_name> FROM 's3://<bucket_name>/<manifest_file>'
authorization
MANIFEST;
```

The table to be loaded must already exist in the database. For information about creating a table, see [CREATE TABLE](r_CREATE_TABLE_NEW.md) in the SQL Reference. 

The values for *authorization* provide the Amazon authorization Amazon Redshift needs to access the Amazon S3 objects. For information about required permissions, see [IAM permissions for COPY, UNLOAD, and CREATE LIBRARY](copy-usage_notes-access-permissions.md#copy-usage_notes-iam-permissions). The preferred method for authentication is to specify the IAM_ROLE parameter and provide the Amazon Resource Name (ARN) for an IAM role with the necessary permissions. For more information, see [Role-based access control](copy-usage_notes-access-permissions.md#copy-usage_notes-access-role-based).

To authenticate using the IAM_ROLE parameter, replace *<aws-account-id>* and *<role-name>* as shown in the following syntax.

```
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
```

The following example shows authentication using an IAM role.

```
COPY customer 
FROM 's3://amzn-s3-demo-bucket/mydata' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
```

For more information about other authorization options, see [Authorization parameters](copy-parameters-authorization.md).

If you want to validate your data without actually loading the table, use the NOLOAD option with the [COPY](r_COPY.md) command.

The following example shows the first few rows of pipe-delimited data in a file named `venue.txt`.

```
1|Toyota Park|Bridgeview|IL|0
2|Columbus Crew Stadium|Columbus|OH|0
3|RFK Stadium|Washington|DC|0
```

Before uploading the file to Amazon S3, split the file into multiple files so that the COPY command can load it using parallel processing. The number of files should be a multiple of the number of slices in your cluster. Split your load data files so that the files are about equal size, between 1 MB and 1 GB after compression. For more information, see [Loading data from compressed and uncompressed files](t_splitting-data-files.md).
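The split step can be done with any tool; the following is a minimal sketch in Python, assuming a hypothetical `split_file` helper that distributes rows round-robin so the parts stay roughly equal in size.

```python
# Hypothetical splitter: distribute the rows of a delimited file round-robin
# into n parts so the parts stay roughly equal in size. (COPY doesn't depend
# on row order across the load files.)
def split_file(lines, n):
    parts = [[] for _ in range(n)]
    for i, line in enumerate(lines):
        parts[i % n].append(line)
    return parts

rows = [f"{i}|Venue {i}|City|ST|0" for i in range(1, 9)]
for i, part in enumerate(split_file(rows, 4), start=1):
    # Follows the venue.txt.1 ... venue.txt.4 naming convention.
    print(f"venue.txt.{i}: {len(part)} rows")
```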

For example, the `venue.txt` file might be split into four files, as follows:

```
venue.txt.1
venue.txt.2
venue.txt.3
venue.txt.4
```

The following COPY command loads the VENUE table using the pipe-delimited data in the data files with the prefix 'venue' in the Amazon S3 bucket `amzn-s3-demo-bucket`. 

**Note**  
The Amazon S3 bucket `amzn-s3-demo-bucket` in the following examples does not exist. For sample COPY commands that use real data in an existing Amazon S3 bucket, see [Load sample data](https://docs.amazonaws.cn/redshift/latest/gsg/cm-dev-t-load-sample-data.html).

```
COPY venue FROM 's3://amzn-s3-demo-bucket/venue'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
DELIMITER '|';
```

If no Amazon S3 objects with the key prefix 'venue' exist, the load fails.

**Topics**
+ [Using a manifest to specify data files](loading-data-files-using-manifest.md)
+ [Loading compressed data files from Amazon S3](t_loading-gzip-compressed-data-files-from-S3.md)
+ [Loading fixed-width data from Amazon S3](t_loading_fixed_width_data.md)
+ [Loading multibyte data from Amazon S3](t_loading_unicode_data.md)
+ [Loading encrypted data files from Amazon S3](c_loading-encrypted-files.md)

# Using a manifest to specify data files


You can use a manifest to make sure that the COPY command loads all of the required files, and only the required files, for a data load. You can use a manifest to load files from different buckets or files that do not share the same prefix. Instead of supplying an object path for the COPY command, you supply the name of a JSON-formatted text file that explicitly lists the files to be loaded. The URL in the manifest must specify the bucket name and full object path for the file, not just a prefix.

For more information about manifest files, see the COPY example [Using a manifest to specify data files](r_COPY_command_examples.md#copy-command-examples-manifest).

The following example shows the JSON to load files from different buckets and with file names that begin with date stamps.

```
{
  "entries": [
    {"url":"s3://amzn-s3-demo-bucket1/2013-10-04-custdata", "mandatory":true},
    {"url":"s3://amzn-s3-demo-bucket1/2013-10-05-custdata", "mandatory":true},
    {"url":"s3://amzn-s3-demo-bucket2/2013-10-04-custdata", "mandatory":true},
    {"url":"s3://amzn-s3-demo-bucket2/2013-10-05-custdata", "mandatory":true}
  ]
}
```

The optional `mandatory` flag specifies whether COPY should return an error if the file is not found. The default of `mandatory` is `false`. Regardless of any mandatory settings, COPY will terminate if no files are found. 
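If you generate manifests programmatically, the format is simple enough to build with a few lines of Python. The `build_manifest` helper below is hypothetical, for illustration only.

```python
import json

# Hypothetical helper to generate a COPY manifest. "mandatory": true makes
# COPY return an error if that file is not found (the default is false).
def build_manifest(urls, mandatory=True):
    return {"entries": [{"url": u, "mandatory": mandatory} for u in urls]}

manifest = build_manifest([
    "s3://amzn-s3-demo-bucket1/2013-10-04-custdata",
    "s3://amzn-s3-demo-bucket2/2013-10-04-custdata",
])
print(json.dumps(manifest, indent=2))
```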

The following example runs the COPY command with the manifest in the previous example, which is named `cust.manifest`. 

```
COPY customer
FROM 's3://amzn-s3-demo-bucket/cust.manifest' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
MANIFEST;
```

## Using a manifest created by UNLOAD


A manifest created by an [UNLOAD](r_UNLOAD.md) operation using the MANIFEST parameter might have keys that are not required for the COPY operation. For example, the following `UNLOAD` manifest includes a `meta` key that is required for an Amazon Redshift Spectrum external table and for loading data files in an `ORC` or `Parquet` file format. The `meta` key contains a `content_length` key with a value that is the actual size of the file in bytes. The COPY operation requires only the `url` key and an optional `mandatory` key.

```
{
  "entries": [
    {"url":"s3://amzn-s3-demo-bucket/unload/manifest_0000_part_00", "meta": { "content_length": 5956875 }},
    {"url":"s3://amzn-s3-demo-bucket/unload/manifest_0001_part_00", "meta": { "content_length": 5997091 }}
 ]
}
```
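If you prefer to trim an UNLOAD manifest down to just the keys described above before reusing it with COPY, a few lines of Python suffice; `to_copy_manifest` is a hypothetical helper, not an AWS API.

```python
# Hypothetical converter: keep only the keys COPY uses ("url", plus
# "mandatory" if present) and drop "meta" and anything else.
def to_copy_manifest(unload_manifest):
    entries = []
    for e in unload_manifest["entries"]:
        kept = {"url": e["url"]}
        if "mandatory" in e:
            kept["mandatory"] = e["mandatory"]
        entries.append(kept)
    return {"entries": entries}

src = {"entries": [{"url": "s3://amzn-s3-demo-bucket/unload/manifest_0000_part_00",
                    "meta": {"content_length": 5956875}}]}
print(to_copy_manifest(src))
```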

For more information about manifest files, see [Using a manifest to specify data files](r_COPY_command_examples.md#copy-command-examples-manifest).

# Loading compressed data files from Amazon S3

To load data files that are compressed using gzip, lzop, or bzip2, include the corresponding option: GZIP, LZOP, or BZIP2.

For example, the following command loads from files that were compressed using lzop.

```
COPY customer FROM 's3://amzn-s3-demo-bucket/customer.lzo' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
DELIMITER '|' LZOP;
```

**Note**  
COPY doesn't support files that were compressed using the lzop *--filter* option.

# Loading fixed-width data from Amazon S3

Fixed-width data files have uniform lengths for each column of data. Each field in a fixed-width data file has exactly the same length and position. For character data (CHAR and VARCHAR) in a fixed-width data file, you must include leading or trailing spaces as placeholders in order to keep the width uniform. For integers, you must use leading zeros as placeholders. A fixed-width data file has no delimiter to separate columns.

To load a fixed-width data file into an existing table, use the FIXEDWIDTH parameter in the COPY command. Your table specifications must match the value of *fixedwidth_spec* in order for the data to load correctly.

To load fixed-width data from a file to a table, issue the following command:

```
COPY table_name FROM 's3://amzn-s3-demo-bucket/prefix' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole' 
FIXEDWIDTH 'fixedwidth_spec';
```

The *fixedwidth_spec* parameter is a string that contains an identifier for each column and the width of each column, separated by a colon. The **column:width** pairs are delimited by commas. The identifier can be anything that you choose: numbers, letters, or a combination of the two. The identifier has no relation to the table itself, so the specification must contain the columns in the same order as the table definition.

The following two examples show the same specification, with the first using numeric identifiers and the second using string identifiers:

```
'0:3,1:25,2:12,3:2,4:6'
```

```
'venueid:3,venuename:25,venuecity:12,venuestate:2,venueseats:6'
```

The following example shows fixed-width sample data that could be loaded into the VENUE table using the preceding specifications:

```
1  Toyota Park               Bridgeview  IL0
2  Columbus Crew Stadium     Columbus    OH0
3  RFK Stadium               Washington  DC0
4  CommunityAmerica Ballpark Kansas City KS0
5  Gillette Stadium          Foxborough  MA68756
```

The following COPY command loads this data set into the VENUE table:

```
COPY venue
FROM 's3://amzn-s3-demo-bucket/data/venue_fw.txt' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole' 
FIXEDWIDTH 'venueid:3,venuename:25,venuecity:12,venuestate:2,venueseats:6';
```
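To make the **column:width** arithmetic concrete, here is a small Python sketch (not part of COPY itself) that slices one padded line into named fields according to the specification above. The helper names are hypothetical:

```python
def parse_fixedwidth_spec(spec):
    """Turn 'identifier:width,...' into ordered (identifier, width) pairs."""
    return [(name, int(width))
            for name, width in (field.split(":") for field in spec.split(","))]

def slice_line(line, spec):
    """Cut one fixed-width line into named fields using the spec widths."""
    fields, pos = {}, 0
    for name, width in parse_fixedwidth_spec(spec):
        fields[name] = line[pos:pos + width].strip()
        pos += width
    return fields

spec = "venueid:3,venuename:25,venuecity:12,venuestate:2,venueseats:6"

# Build a correctly padded line from raw values (left-justified, as for CHAR data).
values = ["1", "Toyota Park", "Bridgeview", "IL", "0"]
widths = [3, 25, 12, 2, 6]
line = "".join(v.ljust(w) for v, w in zip(values, widths))

print(slice_line(line, spec))
```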

# Loading multibyte data from Amazon S3

If your data includes non-ASCII multibyte characters (such as Chinese or Cyrillic characters), you must load the data to VARCHAR columns. The VARCHAR data type supports four-byte UTF-8 characters, but the CHAR data type only accepts single-byte ASCII characters. You cannot load five-byte or longer characters into Amazon Redshift tables. For more information about CHAR and VARCHAR, see [Data types](c_Supported_data_types.md).

To check which encoding an input file uses, use the Linux `file` command: 

```
$ file ordersdata.txt
ordersdata.txt: ASCII English text
$ file uni_ordersdata.dat
uni_ordersdata.dat: UTF-8 Unicode text
```
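A quick way to confirm, character by character, that your data stays within the four-byte UTF-8 limit is to check the encoded size of each character. This Python sketch (independent of the `file` command, sample text is arbitrary) does exactly that:

```python
def max_utf8_bytes(text):
    """Return the largest UTF-8 encoded size, in bytes, of any single character."""
    return max(len(ch.encode("utf-8")) for ch in text)

# Cyrillic characters encode to 2 bytes each; these CJK characters to 3 bytes.
sample = "Москва 北京 abc"
print(max_utf8_bytes(sample))  # prints 3
```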

# Loading encrypted data files from Amazon S3

You can use the COPY command to load data files that were uploaded to Amazon S3 using server-side encryption, client-side encryption, or both. 

The COPY command supports the following types of Amazon S3 encryption:
+ Server-side encryption with Amazon S3-managed keys (SSE-S3)
+ Server-side encryption with Amazon KMS keys (SSE-KMS)
+ Client-side encryption using a client-side symmetric root key

The COPY command doesn't support the following types of Amazon S3 encryption:
+ Server-side encryption with customer-provided keys (SSE-C)
+ Client-side encryption using an Amazon KMS key
+ Client-side encryption using a customer-provided asymmetric root key

For more information about Amazon S3 encryption, see [ Protecting Data Using Server-Side Encryption](https://docs.amazonaws.cn/AmazonS3/latest/userguide/serv-side-encryption.html) and [Protecting Data Using Client-Side Encryption](https://docs.amazonaws.cn/AmazonS3/latest/userguide/UsingClientSideEncryption.html) in the Amazon Simple Storage Service User Guide.

The [UNLOAD](r_UNLOAD.md) command automatically encrypts files using SSE-S3. You can also unload using SSE-KMS or client-side encryption with a customer managed symmetric key. For more information, see [Unloading encrypted data files](t_unloading_encrypted_files.md).

The COPY command automatically recognizes and loads files encrypted using SSE-S3 and SSE-KMS. You can load files encrypted using a client-side symmetric root key by specifying the ENCRYPTED option and providing the key value. For more information, see [Uploading encrypted data to Amazon S3](t_uploading-encrypted-data.md).

To load client-side encrypted data files, provide the root key value using the MASTER_SYMMETRIC_KEY parameter and include the ENCRYPTED option.

```
COPY customer FROM 's3://amzn-s3-demo-bucket/encrypted/customer' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
MASTER_SYMMETRIC_KEY '<root_key>' 
ENCRYPTED
DELIMITER '|';
```

To load encrypted data files that are gzip, lzop, or bzip2 compressed, include the GZIP, LZOP, or BZIP2 option along with the root key value and the ENCRYPTED option.

```
COPY customer FROM 's3://amzn-s3-demo-bucket/encrypted/customer' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
MASTER_SYMMETRIC_KEY '<root_key>'
ENCRYPTED 
DELIMITER '|' 
GZIP;
```
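The root key value is a base64-encoded 256-bit symmetric key. One way to generate such a key locally is sketched below; this is an assumption-laden illustration, and real keys should be handled through your own key-management practices:

```python
import base64
import secrets

# Generate a random 256-bit (32-byte) key and base64-encode it, producing a
# value in the form expected by the MASTER_SYMMETRIC_KEY parameter.
root_key = base64.b64encode(secrets.token_bytes(32)).decode("ascii")

# A 32-byte key always encodes to 44 base64 characters.
print(len(root_key))  # prints 44
```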

# Loading data from Amazon EMR


You can use the COPY command to load data in parallel from an Amazon EMR cluster configured to write text files to the cluster's Hadoop Distributed File System (HDFS) as fixed-width files, character-delimited files, CSV files, or JSON-formatted files.

## Process for loading data from Amazon EMR


This section walks you through the process of loading data from an Amazon EMR cluster. The following sections provide the details you need to accomplish each step.
+ **[Step 1: Configure IAM permissions](#load-from-emr-steps-configure-iam)**

  The users that create the Amazon EMR cluster and run the Amazon Redshift COPY command must have the necessary permissions.
+ **[Step 2: Create an Amazon EMR cluster](#load-from-emr-steps-create-cluster)**

  Configure the cluster to output text files to the Hadoop Distributed File System (HDFS). You will need the Amazon EMR cluster ID and the cluster's main public DNS (the endpoint for the Amazon EC2 instance that hosts the cluster). 
+ **[Step 3: Retrieve the Amazon Redshift cluster public key and cluster node IP addresses](#load-from-emr-steps-retrieve-key-and-ips)**

  The public key enables the Amazon Redshift cluster nodes to establish SSH connections to the hosts. You will use the IP address for each cluster node to configure the host security groups to permit access from your Amazon Redshift cluster using these IP addresses. 
+ **[Step 4: Add the Amazon Redshift cluster public key to each Amazon EC2 host's authorized keys file](#load-from-emr-steps-add-key-to-host)** 

  You add the Amazon Redshift cluster public key to the host's authorized keys file so that the host will recognize the Amazon Redshift cluster and accept the SSH connection. 
+ **[Step 5: Configure the hosts to accept all of the Amazon Redshift cluster's IP addresses](#load-from-emr-steps-configure-security-groups)** 

  Modify the Amazon EMR instance's security groups to add input rules to accept the Amazon Redshift IP addresses.
+ **[Step 6: Run the COPY command to load the data](#load-from-emr-steps-run-copy)**

  From an Amazon Redshift database, run the COPY command to load the data into an Amazon Redshift table. 

## Step 1: Configure IAM permissions


The users that create the Amazon EMR cluster and run the Amazon Redshift COPY command must have the necessary permissions.

**To configure IAM permissions**

1. Add the following permissions for the user that will create the Amazon EMR cluster.

   ```
   ec2:DescribeSecurityGroups
   ec2:RevokeSecurityGroupIngress
   ec2:AuthorizeSecurityGroupIngress
   redshift:DescribeClusters
   ```

1. Add the following permission for the IAM role or user that will run the COPY command.

   ```
   elasticmapreduce:ListInstances
   ```

1. Add the following permission to the Amazon EMR cluster's IAM role.

   ```
   redshift:DescribeClusters
   ```
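The permissions above can be expressed as IAM policy statements. As a sketch, the policy for the user that creates the Amazon EMR cluster might look like the following (the resource is shown as a wildcard for brevity — scope it down for production):

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeSecurityGroups",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupIngress",
        "redshift:DescribeClusters"
      ],
      "Resource": "*"
    }
  ]
}
```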

## Step 2: Create an Amazon EMR cluster


The COPY command loads data from files on the Amazon EMR Hadoop Distributed File System (HDFS). When you create the Amazon EMR cluster, configure the cluster to output data files to the cluster's HDFS.

**To create an Amazon EMR cluster**

1. Create an Amazon EMR cluster in the same Amazon Region as the Amazon Redshift cluster. 

   If the Amazon Redshift cluster is in a VPC, the Amazon EMR cluster must be in the same VPC group. If the Amazon Redshift cluster uses EC2-Classic mode (that is, it is not in a VPC), the Amazon EMR cluster must also use EC2-Classic mode. For more information, see [Managing Clusters in Virtual Private Cloud (VPC)](https://docs.amazonaws.cn/redshift/latest/mgmt/managing-clusters-vpc.html) in the *Amazon Redshift Management Guide*.

1. Configure the cluster to output data files to the cluster's HDFS.
**Important**  
The HDFS file names must not include asterisks (\*) or question marks (?).

1. Specify **No** for the **Auto-terminate** option in the Amazon EMR cluster configuration so that the cluster remains available while the COPY command runs. 
**Important**  
If any of the data files are changed or deleted before the COPY completes, you might have unexpected results, or the COPY operation might fail. 

1. Note the cluster ID and the main public DNS (the endpoint for the Amazon EC2 instance that hosts the cluster). You will use that information in later steps. 

## Step 3: Retrieve the Amazon Redshift cluster public key and cluster node IP addresses


You will use the IP address for each cluster node to configure the host security groups to permit access from your Amazon Redshift cluster using these IP addresses.

**To retrieve the Amazon Redshift cluster public key and cluster node IP addresses for your cluster using the console**

1. Access the Amazon Redshift Management Console. 

1. Choose the **Clusters** link in the navigation pane. 

1. Select your cluster from the list. 

1. Locate the **SSH Ingestion Settings** group. 

   Note the **Cluster Public Key** and **Node IP addresses**. You will use them in later steps.   
![\[Screenshot from the SSH Ingestion Settings group showing the Cluster Public Key and Node IP addresses.\]](http://docs.amazonaws.cn/en_us/redshift/latest/dg/images/copy-from-ssh-console-2.png)

   You will use the private IP addresses in Step 5 to configure the Amazon EC2 host to accept the connection from Amazon Redshift. 

To retrieve the cluster public key and cluster node IP addresses for your cluster using the Amazon Redshift CLI, run the describe-clusters command. For example:

```
aws redshift describe-clusters --cluster-identifier <cluster-identifier> 
```

The response will include a ClusterPublicKey value and the list of private and public IP addresses, similar to the following:

```
{
    "Clusters": [
        {
            "VpcSecurityGroups": [], 
            "ClusterStatus": "available", 
            "ClusterNodes": [
                {
                    "PrivateIPAddress": "10.nnn.nnn.nnn", 
                    "NodeRole": "LEADER", 
                    "PublicIPAddress": "10.nnn.nnn.nnn"
                }, 
                {
                    "PrivateIPAddress": "10.nnn.nnn.nnn", 
                    "NodeRole": "COMPUTE-0", 
                    "PublicIPAddress": "10.nnn.nnn.nnn"
                }, 
                {
                    "PrivateIPAddress": "10.nnn.nnn.nnn", 
                    "NodeRole": "COMPUTE-1", 
                    "PublicIPAddress": "10.nnn.nnn.nnn"
                }
            ], 
            "AutomatedSnapshotRetentionPeriod": 1, 
            "PreferredMaintenanceWindow": "wed:05:30-wed:06:00", 
            "AvailabilityZone": "us-east-1a", 
            "NodeType": "dc2.large", 
            "ClusterPublicKey": "ssh-rsa AAAABexamplepublickey...Y3TAl Amazon-Redshift", 
             ...
             ...
}
```
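If you script this step, the public key and node IP addresses can be pulled out of the `describe-clusters` response programmatically. The following Python sketch parses a response shaped like the example above; all values are placeholders:

```python
import json

# A trimmed-down describe-clusters response with placeholder values.
response = json.loads("""
{
  "Clusters": [
    {
      "ClusterPublicKey": "ssh-rsa AAAABexamplepublickey Amazon-Redshift",
      "ClusterNodes": [
        {"NodeRole": "LEADER",    "PrivateIPAddress": "10.0.0.1", "PublicIPAddress": "54.0.0.1"},
        {"NodeRole": "COMPUTE-0", "PrivateIPAddress": "10.0.0.2", "PublicIPAddress": "54.0.0.2"}
      ]
    }
  ]
}
""")

cluster = response["Clusters"][0]
private_ips = [node["PrivateIPAddress"] for node in cluster["ClusterNodes"]]

print(cluster["ClusterPublicKey"].split()[0])  # prints ssh-rsa
print(private_ips)
```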

To retrieve the cluster public key and cluster node IP addresses for your cluster using the Amazon Redshift API, use the `DescribeClusters` action. For more information, see [describe-clusters](https://docs.amazonaws.cn/cli/latest/reference/redshift/describe-clusters.html) in the *Amazon Redshift CLI Guide* or [DescribeClusters](https://docs.amazonaws.cn/redshift/latest/APIReference/API_DescribeClusters.html) in the Amazon Redshift API Guide. 

## Step 4: Add the Amazon Redshift cluster public key to each Amazon EC2 host's authorized keys file


You add the cluster public key to each host's authorized keys file for all of the Amazon EMR cluster nodes so that the hosts will recognize Amazon Redshift and accept the SSH connection. 

**To add the Amazon Redshift cluster public key to the host's authorized keys file**

1. Access the host using an SSH connection. 

   For information about connecting to an instance using SSH, see [Connect to Your Instance](https://docs.amazonaws.cn/AWSEC2/latest/UserGuide/ec2-connect-to-instance-linux.html) in the *Amazon EC2 User Guide*. 

1. Copy the Amazon Redshift public key from the console or from the CLI response text. 

1. Copy and paste the contents of the public key into the `/home/<ssh_username>/.ssh/authorized_keys` file on the host. Include the complete string, including the prefix "`ssh-rsa` " and suffix "`Amazon-Redshift`". For example: 

   ```
   ssh-rsa AAAACTP3isxgGzVWoIWpbVvRCOzYdVifMrh… uA70BnMHCaMiRdmvsDOedZDOedZ Amazon-Redshift
   ```

## Step 5: Configure the hosts to accept all of the Amazon Redshift cluster's IP addresses


 To allow inbound traffic to the host instances, edit the security group and add one Inbound rule for each Amazon Redshift cluster node. For **Type**, select SSH with TCP protocol on Port 22. For **Source**, enter the Amazon Redshift cluster node private IP addresses you retrieved in [Step 3: Retrieve the Amazon Redshift cluster public key and cluster node IP addresses](#load-from-emr-steps-retrieve-key-and-ips). For information about adding rules to an Amazon EC2 security group, see [Authorizing Inbound Traffic for Your Instances](https://docs.amazonaws.cn/AWSEC2/latest/UserGuide/authorizing-access-to-an-instance.html) in the *Amazon EC2 User Guide*. 

## Step 6: Run the COPY command to load the data


Run a [COPY](r_COPY.md) command to connect to the Amazon EMR cluster and load the data into an Amazon Redshift table. The Amazon EMR cluster must continue running until the COPY command completes. For example, do not configure the cluster to auto-terminate. 

**Important**  
If any of the data files are changed or deleted before the COPY completes, you might have unexpected results, or the COPY operation might fail.

In the COPY command, specify the Amazon EMR cluster ID and the HDFS file path and file name. 

```
COPY sales
FROM 'emr://myemrclusterid/myoutput/part*'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
```

You can use the wildcard characters asterisk ( `*` ) and question mark ( `?` ) as part of the file name argument. For example, `part*` loads the files `part-0000`, `part-0001`, and so on. If you specify only a folder name, COPY attempts to load all files in the folder.

**Important**  
If you use wildcard characters or use only the folder name, verify that no unwanted files will be loaded or the COPY command will fail. For example, some processes might write a log file to the output folder.
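The wildcard matching behaves like ordinary shell globbing. The following Python sketch (with hypothetical HDFS file names) shows which names `part*` would select — note that a stray checksum file also matches, which is exactly the kind of unwanted file the note above warns about:

```python
from fnmatch import fnmatch

files = ["part-0000", "part-0001", "_logs/job.log", "part-0000.crc"]

# 'part*' matches any name beginning with 'part', including the .crc file.
matched = [f for f in files if fnmatch(f, "part*")]
print(matched)  # prints ['part-0000', 'part-0001', 'part-0000.crc']
```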

# Loading data from remote hosts


You can use the COPY command to load data in parallel from one or more remote hosts, such as Amazon EC2 instances or other computers. COPY connects to the remote hosts using SSH and runs commands on the remote hosts to generate text output. 

The remote host can be an Amazon EC2 Linux instance or another Unix or Linux computer configured to accept SSH connections. This guide assumes your remote host is an Amazon EC2 instance. Where the procedure is different for another computer, the guide will point out the difference. 

Amazon Redshift can connect to multiple hosts, and can open multiple SSH connections to each host. Amazon Redshift sends a unique command through each connection to generate text output to the host's standard output, which Amazon Redshift then reads as it would a text file.

## Before you begin


Before you begin, you should have the following in place: 
+ One or more host machines, such as Amazon EC2 instances, that you can connect to using SSH.
+ Data sources on the hosts. 

  You will provide commands that the Amazon Redshift cluster will run on the hosts to generate the text output. After the cluster connects to a host, the COPY command runs the commands, reads the text from the hosts' standard output, and loads the data in parallel into an Amazon Redshift table. The text output must be in a form that the COPY command can ingest. For more information, see [Preparing your input data](t_preparing-input-data.md).
+ Access to the hosts from your computer. 

  For an Amazon EC2 instance, you will use an SSH connection to access the host. You must access the host to add the Amazon Redshift cluster's public key to the host's authorized keys file.
+ A running Amazon Redshift cluster. 

  For information about how to launch a cluster, see [Amazon Redshift Getting Started Guide](https://docs.amazonaws.cn/redshift/latest/gsg/). 

## Loading data process


This section walks you through the process of loading data from remote hosts. The following sections provide the details you need to accomplish each step.
+ **[Step 1: Retrieve the cluster public key and cluster node IP addresses](#load-from-host-steps-retrieve-key-and-ips)**

  The public key enables the Amazon Redshift cluster nodes to establish SSH connections to the remote hosts. You will use the IP address for each cluster node to configure the host security groups or firewall to permit access from your Amazon Redshift cluster using these IP addresses. 
+ **[Step 2: Add the Amazon Redshift cluster public key to the host's authorized keys file](#load-from-host-steps-add-key-to-host)**

  You add the Amazon Redshift cluster public key to the host's authorized keys file so that the host will recognize the Amazon Redshift cluster and accept the SSH connection. 
+ **[Step 3: Configure the host to accept all of the Amazon Redshift cluster's IP addresses](#load-from-host-steps-configure-security-groups)** 

  For Amazon EC2, modify the instance's security groups to add input rules to accept the Amazon Redshift IP addresses. For other hosts, modify the firewall so that your Amazon Redshift nodes are able to establish SSH connections to the remote host. 
+ **[Step 4: Get the public key for the host](#load-from-host-steps-get-the-host-key)**

  You can optionally specify that Amazon Redshift should use the public key to identify the host. You must locate the public key and copy the text into your manifest file. 
+ **[Step 5: Create a manifest file](#load-from-host-steps-create-manifest)** 

  The manifest is a JSON-formatted text file with the details Amazon Redshift needs to connect to the hosts and fetch the data. 
+ **[Step 6: Upload the manifest file to an Amazon S3 bucket](#load-from-host-steps-upload-manifest)** 

  Amazon Redshift reads the manifest and uses that information to connect to the remote host. If the Amazon S3 bucket does not reside in the same Region as your Amazon Redshift cluster, you must use the [REGION](copy-parameters-data-source-s3.md#copy-region) option to specify the Region in which the data is located.
+ **[Step 7: Run the COPY command to load the data](#load-from-host-steps-run-copy)**

  From an Amazon Redshift database, run the COPY command to load the data into an Amazon Redshift table. 

## Step 1: Retrieve the cluster public key and cluster node IP addresses


You will use the IP address for each cluster node to configure the host security groups to permit access from your Amazon Redshift cluster using these IP addresses.

**To retrieve the cluster public key and cluster node IP addresses for your cluster using the console**

1. Access the Amazon Redshift Management Console.

1. Choose the **Clusters** link in the navigation pane.

1. Select your cluster from the list. 

1. Locate the **SSH Ingestion Settings** group.

   Note the **Cluster Public Key** and **Node IP addresses**. You will use them in later steps.  
![\[Screenshot from the SSH Ingestion Settings group showing the Cluster Public Key and Node IP addresses.\]](http://docs.amazonaws.cn/en_us/redshift/latest/dg/images/copy-from-ssh-console-2.png)

   You will use the IP addresses in Step 3 to configure the host to accept the connection from Amazon Redshift. Depending on what type of host you connect to and whether it is in a VPC, you will use either the public IP addresses or the private IP addresses.

To retrieve the cluster public key and cluster node IP addresses for your cluster using the Amazon Redshift CLI, run the describe-clusters command. 

For example: 

```
aws redshift describe-clusters --cluster-identifier <cluster-identifier> 
```

 The response will include the ClusterPublicKey and the list of private and public IP addresses, similar to the following: 

```
{
    "Clusters": [
        {
            "VpcSecurityGroups": [], 
            "ClusterStatus": "available", 
            "ClusterNodes": [
                {
                    "PrivateIPAddress": "10.nnn.nnn.nnn", 
                    "NodeRole": "LEADER", 
                    "PublicIPAddress": "10.nnn.nnn.nnn"
                }, 
                {
                    "PrivateIPAddress": "10.nnn.nnn.nnn", 
                    "NodeRole": "COMPUTE-0", 
                    "PublicIPAddress": "10.nnn.nnn.nnn"
                }, 
                {
                    "PrivateIPAddress": "10.nnn.nnn.nnn", 
                    "NodeRole": "COMPUTE-1", 
                    "PublicIPAddress": "10.nnn.nnn.nnn"
                }
            ], 
            "AutomatedSnapshotRetentionPeriod": 1, 
            "PreferredMaintenanceWindow": "wed:05:30-wed:06:00", 
            "AvailabilityZone": "us-east-1a", 
            "NodeType": "dc2.large", 
            "ClusterPublicKey": "ssh-rsa AAAABexamplepublickey...Y3TAl Amazon-Redshift", 
             ...
             ...
}
```

To retrieve the cluster public key and cluster node IP addresses for your cluster using the Amazon Redshift API, use the DescribeClusters action. For more information, see [describe-clusters](https://docs.amazonaws.cn/cli/latest/reference/redshift/describe-clusters.html) in the *Amazon Redshift CLI Guide* or [DescribeClusters](https://docs.amazonaws.cn/redshift/latest/APIReference/API_DescribeClusters.html) in the Amazon Redshift API Guide. 

## Step 2: Add the Amazon Redshift cluster public key to the host's authorized keys file


You add the cluster public key to each host's authorized keys file so that the host will recognize Amazon Redshift and accept the SSH connection. 

**To add the Amazon Redshift cluster public key to the host's authorized keys file**

1. Access the host using an SSH connection. 

   For information about connecting to an instance using SSH, see [Connect to Your Instance](https://docs.amazonaws.cn/AWSEC2/latest/UserGuide/ec2-connect-to-instance-linux.html) in the *Amazon EC2 User Guide*. 

1. Copy the Amazon Redshift public key from the console or from the CLI response text. 

1. Copy and paste the contents of the public key into the `/home/<ssh_username>/.ssh/authorized_keys` file on the remote host. The `<ssh_username>` must match the value for the "username" field in the manifest file. Include the complete string, including the prefix "`ssh-rsa` " and suffix "`Amazon-Redshift`". For example: 

   ```
   ssh-rsa AAAACTP3isxgGzVWoIWpbVvRCOzYdVifMrh… uA70BnMHCaMiRdmvsDOedZDOedZ Amazon-Redshift
   ```

## Step 3: Configure the host to accept all of the Amazon Redshift cluster's IP addresses


 If you are working with an Amazon EC2 instance or an Amazon EMR cluster, add Inbound rules to the host's security group to allow traffic from each Amazon Redshift cluster node. For **Type**, select SSH with TCP protocol on Port 22. For **Source**, enter the Amazon Redshift cluster node IP addresses you retrieved in [Step 1: Retrieve the cluster public key and cluster node IP addresses](#load-from-host-steps-retrieve-key-and-ips). For information about adding rules to an Amazon EC2 security group, see [Authorizing Inbound Traffic for Your Instances](https://docs.amazonaws.cn/AWSEC2/latest/UserGuide/authorizing-access-to-an-instance.html) in the *Amazon EC2 User Guide*. 

Use the private IP addresses when: 
+ You have an Amazon Redshift cluster that is not in a Virtual Private Cloud (VPC), and an Amazon EC2-Classic instance, both of which are in the same Amazon Region. 
+ You have an Amazon Redshift cluster that is in a VPC, and an Amazon EC2-VPC instance, both of which are in the same Amazon Region and in the same VPC.

 Otherwise, use the public IP addresses.

For more information about using Amazon Redshift in a VPC, see [Managing Clusters in Virtual Private Cloud (VPC)](https://docs.amazonaws.cn/redshift/latest/mgmt/managing-clusters-vpc.html) in the *Amazon Redshift Management Guide*. 

## Step 4: Get the public key for the host


You can optionally provide the host's public key in the manifest file so that Amazon Redshift can identify the host. The COPY command does not require the host public key but, for security reasons, we strongly recommend using a public key to help prevent 'man-in-the-middle' attacks. 

You can find the host's public key in the following location, where `<ssh_host_rsa_key_name>` is the unique name for the host's public key: 

```
/etc/ssh/<ssh_host_rsa_key_name>.pub
```

**Note**  
Amazon Redshift only supports RSA keys. We do not support DSA keys.

When you create your manifest file in Step 5, you will paste the text of the public key into the "Public Key" field in the manifest file entry.

## Step 5: Create a manifest file


The COPY command can connect to multiple hosts using SSH, and can create multiple SSH connections to each host. COPY runs a command through each host connection, and then loads the output from the commands in parallel into the table. The manifest file is a text file in JSON format that Amazon Redshift uses to connect to the host. The manifest file specifies the SSH host endpoints and the commands that are run on the hosts to return data to Amazon Redshift. Optionally, you can include the host public key, the login user name, and a mandatory flag for each entry.

Create the manifest file on your local computer. In a later step, you upload the file to Amazon S3. 

The manifest file is in the following format:

```
{ 
   "entries": [ 
     {"endpoint":"<ssh_endpoint_or_IP>", 
       "command": "<remote_command>",
       "mandatory":true, 
       "publickey": "<public_key>", 
       "username": "<host_user_name>"}, 
     {"endpoint":"<ssh_endpoint_or_IP>", 
       "command": "<remote_command>",
       "mandatory":true, 
       "publickey": "<public_key>", 
       "username": "host_user_name"} 
    ] 
}
```

The manifest file contains one "entries" construct for each SSH connection. Each entry represents a single SSH connection. You can have multiple connections to a single host or multiple connections to multiple hosts. The double quotation marks are required as shown, both for the field names and the values. The only value that does not need double quotation marks is the Boolean value **true** or **false** for the mandatory field. 

The following describes the fields in the manifest file. 

endpoint  
The URL address or IP address of the host. For example, "`ec2-111-222-333.compute-1.amazonaws.com.cn`" or "`22.33.44.56`" 

command   
The command that will be run by the host to generate text or binary (gzip, lzop, or bzip2) output. The command can be any command that the user *host_user_name* has permission to run. The command can be as simple as printing a file, or it could query a database or launch a script. The output (text file, gzip binary file, lzop binary file, or bzip2 binary file) must be in a form the Amazon Redshift COPY command can ingest. For more information, see [Preparing your input data](t_preparing-input-data.md).

publickey  
(Optional) The public key of the host. If provided, Amazon Redshift will use the public key to identify the host. If the public key is not provided, Amazon Redshift will not attempt host identification. For example, if the remote host's public key is: `ssh-rsa AbcCbaxxx…xxxDHKJ root@amazon.com`, enter the following text in the public key field: `AbcCbaxxx…xxxDHKJ`. 

mandatory  
(Optional) Indicates whether the COPY command should fail if the connection fails. The default is `false`. If Amazon Redshift does not successfully make at least one connection, the COPY command fails.

username  
(Optional) The username that will be used to log on to the host system and run the remote command. The user login name must be the same as the login that was used to add the public key to the host's authorized keys file in Step 2. The default username is "redshift".

The following example shows a completed manifest to open four connections to the same host and run a different command through each connection:

```
{ 
  "entries": [ 
       {"endpoint":"ec2-184-72-204-112.compute-1.amazonaws.com.cn", 
          "command": "cat loaddata1.txt", 
          "mandatory":true, 
          "publickey": "ec2publickeyportionoftheec2keypair", 
          "username": "ec2-user"}, 
       {"endpoint":"ec2-184-72-204-112.compute-1.amazonaws.com.cn", 
          "command": "cat loaddata2.txt", 
          "mandatory":true, 
          "publickey": "ec2publickeyportionoftheec2keypair", 
          "username": "ec2-user"},
       {"endpoint":"ec2-184-72-204-112.compute-1.amazonaws.com.cn", 
          "command": "cat loaddata3.txt", 
          "mandatory":true, 
          "publickey": "ec2publickeyportionoftheec2keypair", 
          "username": "ec2-user"},
       {"endpoint":"ec2-184-72-204-112.compute-1.amazonaws.com.cn", 
          "command": "cat loaddata4.txt", 
          "mandatory":true, 
          "publickey": "ec2publickeyportionoftheec2keypair", 
          "username": "ec2-user"}
     ] 
}
```
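Before uploading, it can help to sanity-check the manifest programmatically against the field rules described above. This Python sketch (the validator and its rules are an illustration, not part of Amazon Redshift) checks each entry for required and unknown keys:

```python
import json

REQUIRED = {"endpoint", "command"}
OPTIONAL = {"mandatory", "publickey", "username"}

def validate_manifest(text):
    """Check that each manifest entry has the required keys and no unknown keys.

    Returns the number of entries (SSH connections) on success.
    """
    manifest = json.loads(text)
    for entry in manifest["entries"]:
        keys = set(entry)
        missing = REQUIRED - keys
        unknown = keys - REQUIRED - OPTIONAL
        if missing or unknown:
            raise ValueError(f"bad entry: missing={missing} unknown={unknown}")
    return len(manifest["entries"])

doc = json.dumps({"entries": [
    {"endpoint": "ec2-184-72-204-112.compute-1.amazonaws.com.cn",
     "command": "cat loaddata1.txt",
     "mandatory": True,
     "username": "ec2-user"},
]})
print(validate_manifest(doc))  # prints 1
```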

## Step 6: Upload the manifest file to an Amazon S3 bucket


Upload the manifest file to an Amazon S3 bucket. If the Amazon S3 bucket does not reside in the same Amazon Region as your Amazon Redshift cluster, you must use the [REGION](copy-parameters-data-source-s3.md#copy-region) option to specify the Amazon Region in which the manifest is located. For information about creating an Amazon S3 bucket and uploading a file, see [Amazon Simple Storage Service User Guide](https://docs.amazonaws.cn/AmazonS3/latest/userguide/). 

## Step 7: Run the COPY command to load the data


Run a [COPY](r_COPY.md) command to connect to the host and load the data into an Amazon Redshift table. In the COPY command, specify the explicit Amazon S3 object path for the manifest file and include the SSH option. For example, 

```
COPY sales
FROM 's3://amzn-s3-demo-bucket/ssh_manifest'  
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
DELIMITER '|'
SSH;
```

**Note**  
If you use automatic compression, the COPY command performs two data reads, which means it runs the remote command twice. The first read is to provide a sample for compression analysis, then the second read actually loads the data. If running the remote command twice might cause a problem because of potential side effects, you should turn off automatic compression. To turn off automatic compression, run the COPY command with the COMPUPDATE option set to OFF. For more information, see [Loading tables with automatic compression](c_Loading_tables_auto_compress.md). 
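For example, the SSH load shown above can be run with automatic compression turned off (same hypothetical manifest path):

```
COPY sales
FROM 's3://amzn-s3-demo-bucket/ssh_manifest'  
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
DELIMITER '|'
SSH
COMPUPDATE OFF;
```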

# Loading data from an Amazon DynamoDB table

You can use the COPY command to load a table with data from a single Amazon DynamoDB table.

**Important**  
The Amazon DynamoDB table that provides the data must be created in the same Amazon Region as your cluster unless you use the [REGION](copy-parameters-data-source-s3.md#copy-region) option to specify the Amazon Region in which the Amazon DynamoDB table is located.

The COPY command uses the Amazon Redshift massively parallel processing (MPP) architecture to read and load data in parallel from an Amazon DynamoDB table. You can take maximum advantage of parallel processing by setting distribution styles on your Amazon Redshift tables. For more information, see [Data distribution for query optimization](t_Distributing_data.md).

**Important**  
When the COPY command reads data from the Amazon DynamoDB table, the resulting data transfer is part of that table's provisioned throughput.

To avoid consuming excessive amounts of provisioned read throughput, we recommend that you not load data from Amazon DynamoDB tables that are in production environments. If you do load data from production tables, we recommend that you set the READRATIO option much lower than the average percentage of unused provisioned throughput. A low READRATIO setting will help minimize throttling issues. To use the entire provisioned throughput of an Amazon DynamoDB table, set READRATIO to 100.

The COPY command matches attribute names in the items retrieved from the DynamoDB table to column names in an existing Amazon Redshift table by using the following rules:
+ Amazon Redshift table columns are case-insensitively matched to Amazon DynamoDB item attributes. If an item in the DynamoDB table contains multiple attributes that differ only in case, such as Price and PRICE, the COPY command will fail.
+ Amazon Redshift table columns that do not match an attribute in the Amazon DynamoDB table are loaded as either NULL or empty, depending on the value specified with the EMPTYASNULL option in the [COPY](r_COPY.md) command.
+ Amazon DynamoDB attributes that do not match a column in the Amazon Redshift table are discarded. Attributes are read before they are matched, and so even discarded attributes consume part of that table's provisioned throughput.
+ Only Amazon DynamoDB attributes with scalar STRING and NUMBER data types are supported. The Amazon DynamoDB BINARY and SET data types are not supported. If a COPY command tries to load an attribute with an unsupported data type, the command will fail. If the attribute does not match an Amazon Redshift table column, COPY does not attempt to load it, and it does not raise an error.

The COPY command uses the following syntax to load data from an Amazon DynamoDB table:

```
COPY <redshift_tablename> FROM 'dynamodb://<dynamodb_table_name>'
authorization
readratio '<integer>';
```

The values for *authorization* provide the Amazon authorization your cluster needs to access the Amazon DynamoDB table. If these credentials correspond to a user, that user must have permission to SCAN and DESCRIBE the Amazon DynamoDB table that is being loaded. For more information about required permissions, see [IAM permissions for COPY, UNLOAD, and CREATE LIBRARY](copy-usage_notes-access-permissions.md#copy-usage_notes-iam-permissions). The preferred method for authentication is to specify the IAM_ROLE parameter and provide the Amazon Resource Name (ARN) for an IAM role with the necessary permissions. For more information, see [Role-based access control](copy-usage_notes-access-permissions.md#copy-usage_notes-access-role-based).

To authenticate using the IAM_ROLE parameter, replace *<aws-account-id>* and *<role-name>* with your own values, as shown in the following syntax.

```
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
```

The following example shows authentication using an IAM role.

```
COPY favoritemovies 
FROM 'dynamodb://ProductCatalog'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
```

For more information about other authorization options, see [Authorization parameters](copy-parameters-authorization.md).

If you want to validate your data without actually loading the table, use the NOLOAD option with the [COPY](r_COPY.md) command.

The following example loads the FAVORITEMOVIES table with data from the DynamoDB table my-favorite-movies-table. The read activity can consume up to 50% of the provisioned throughput.

```
COPY favoritemovies FROM 'dynamodb://my-favorite-movies-table' 
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole' 
READRATIO 50;
```

To maximize throughput, the COPY command loads data from an Amazon DynamoDB table in parallel across the compute nodes in the cluster.

## Provisioned throughput with automatic compression
Provisioned throughput

By default, the COPY command applies automatic compression whenever you specify an empty target table with no compression encoding. The automatic compression analysis initially samples a large number of rows from the Amazon DynamoDB table. The sample size is based on the value of the COMPROWS parameter. The default is 100,000 rows per slice.

After sampling, the sample rows are discarded and the entire table is loaded. As a result, many rows are read twice. For more information about how automatic compression works, see [Loading tables with automatic compression](c_Loading_tables_auto_compress.md).

**Important**  
When the COPY command reads data from the Amazon DynamoDB table, including the rows used for sampling, the resulting data transfer is part of that table's provisioned throughput.
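To reduce the throughput consumed by sampling, you can shrink the sample with the COMPROWS option, or skip the sampling read entirely by setting COMPUPDATE to OFF. The following sketches reuse the hypothetical table and role names from the earlier examples:

```
-- Sample fewer rows per slice during compression analysis.
COPY favoritemovies FROM 'dynamodb://my-favorite-movies-table'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
READRATIO 50 COMPROWS 10000;

-- Or skip automatic compression (and its sampling read) entirely.
COPY favoritemovies FROM 'dynamodb://my-favorite-movies-table'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
READRATIO 50 COMPUPDATE OFF;
```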

## Loading multibyte data from Amazon DynamoDB
Loading multibyte data

If your data includes non-ASCII multibyte characters (such as Chinese or Cyrillic characters), you must load the data to VARCHAR columns. The VARCHAR data type supports four-byte UTF-8 characters, but the CHAR data type only accepts single-byte ASCII characters. You cannot load five-byte or longer characters into Amazon Redshift tables. For more information about CHAR and VARCHAR, see [Data types](c_Supported_data_types.md).
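Because VARCHAR lengths are specified in bytes rather than characters, size the target column to hold the encoded multibyte data. A hypothetical sketch:

```
-- VARCHAR lengths count bytes, so a title of up to 100
-- four-byte UTF-8 characters needs up to 400 bytes.
CREATE TABLE movie_titles (
  title VARCHAR(400)
);
```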

# Verifying that the data loaded correctly


After the load operation is complete, query the [STL_LOAD_COMMITS](r_STL_LOAD_COMMITS.md) system table to verify that the expected files were loaded. Run the COPY command and load verification within the same transaction so that if there is a problem with the load you can roll back the entire transaction.

The following query returns entries for loading the tables in the TICKIT database:

```
SELECT query, trim(filename) AS filename, curtime, status
FROM stl_load_commits
WHERE filename like '%tickit%' order by query;


 query |         filename          |          curtime           | status
-------+---------------------------+----------------------------+--------
 22475 | tickit/allusers_pipe.txt  | 2013-02-08 20:58:23.274186 |      1
 22478 | tickit/venue_pipe.txt     | 2013-02-08 20:58:25.070604 |      1
 22480 | tickit/category_pipe.txt  | 2013-02-08 20:58:27.333472 |      1
 22482 | tickit/date2008_pipe.txt  | 2013-02-08 20:58:28.608305 |      1
 22485 | tickit/allevents_pipe.txt | 2013-02-08 20:58:29.99489  |      1
 22487 | tickit/listings_pipe.txt  | 2013-02-08 20:58:37.632939 |      1
 22489 | tickit/sales_tab.txt      | 2013-02-08 20:58:37.632939 |      1
(7 rows)
```

# Validating input data
Validating input data

To validate the data in the Amazon S3 input files or Amazon DynamoDB table before you actually load the data, use the NOLOAD option with the [COPY](r_COPY.md) command. Use NOLOAD with the same COPY commands and options you would use to load the data. NOLOAD checks the integrity of all of the data without loading it into the database. The NOLOAD option displays any errors that occur if you attempt to load the data.
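For example, the following sketch validates a tab-delimited file without loading it (hypothetical table, bucket, and role):

```
COPY sales
FROM 's3://amzn-s3-demo-bucket/tickit/sales_tab.txt'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
DELIMITER '\t'
NOLOAD;
```

If the command completes without errors, the same COPY command without the NOLOAD option should load the file cleanly.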

For example, if you specified the incorrect Amazon S3 path for the input file, Amazon Redshift would display the following error.

```
ERROR:  No such file or directory
DETAIL:
-----------------------------------------------
Amazon Redshift error:  The specified key does not exist
code:      2
context:   S3 key being read :
location:  step_scan.cpp:1883
process:   xenmaster [pid=22199]
-----------------------------------------------
```

To troubleshoot error messages, see the [Load error reference](r_Load_Error_Reference.md). 

For an example using the NOLOAD option, see [COPY command with the NOLOAD option](r_COPY_command_examples.md#r_COPY_command_examples-load-noload-option).

# Loading tables with automatic compression
Loading tables with automatic compression

You can apply compression encodings to columns in tables manually, based on your own evaluation of the data. Or you can use the COPY command with COMPUPDATE set to ON to analyze and apply compression automatically based on sample data. 

You can use automatic compression when you create and load a brand new table. The COPY command performs a compression analysis. You can also perform a compression analysis without loading data or changing the compression on a table by running the [ANALYZE COMPRESSION](r_ANALYZE_COMPRESSION.md) command on an already populated table. For example, you can run ANALYZE COMPRESSION when you want to analyze compression on a table for future use, while preserving the existing data definition language (DDL) statements.

Automatic compression balances overall performance when choosing compression encodings. Range-restricted scans might perform poorly if sort key columns are compressed much more highly than other columns in the same query. As a result, automatic compression skips the data analyzing phase on the sort key columns and keeps the user-defined encoding types. 

Automatic compression chooses RAW encoding if you haven't explicitly defined a type of encoding. ANALYZE COMPRESSION behaves the same. For optimal query performance, consider using RAW for sort keys.

## How automatic compression works
How automatic compression works

When the COMPUPDATE parameter is ON, the COPY command applies automatic compression whenever you run the COPY command with an empty target table and all of the table columns either have RAW encoding or no encoding.

To apply automatic compression to an empty table, regardless of its current compression encodings, run the COPY command with the COMPUPDATE option set to ON. To turn off automatic compression, run the COPY command with the COMPUPDATE option set to OFF.

You cannot apply automatic compression to a table that already contains data.

**Note**  
Automatic compression analysis requires enough rows in the load data (at least 100,000 rows per slice) to generate a meaningful sample.

Automatic compression performs these operations in the background as part of the load transaction:

1. An initial sample of rows is loaded from the input file. Sample size is based on the value of the COMPROWS parameter. The default is 100,000.

1. Compression options are chosen for each column.

1. The sample rows are removed from the table.

1. The table is recreated with the chosen compression encodings.

1. The entire input file is loaded and compressed using the new encodings.

After you run the COPY command, the table is fully loaded, compressed, and ready for use. If you load more data later, appended rows are compressed according to the existing encoding.

If you only want to perform a compression analysis, run ANALYZE COMPRESSION, which is more efficient than running a full COPY. Then you can evaluate the results to decide whether to use automatic compression or recreate the table manually.

Automatic compression is supported only for the COPY command. Alternatively, you can manually apply compression encoding when you create the table. For information about manual compression encoding, see [Column compression to reduce the size of stored data](t_Compressing_data_on_disk.md).

## Automatic compression example
Automatic compression example

In this example, assume that the TICKIT database contains a copy of the LISTING table called BIGLIST, and you want to apply automatic compression to this table when it is loaded with approximately 3 million rows.

**To load and automatically compress the table**

1. Make sure that the table is empty. You can apply automatic compression only to an empty table:

   ```
   TRUNCATE biglist;
   ```

1. Load the table with a single COPY command. Although the table is empty, some earlier encoding might have been specified. To ensure that Amazon Redshift performs a compression analysis, set the COMPUPDATE parameter to ON.

   ```
   COPY biglist FROM 's3://amzn-s3-demo-bucket/biglist.txt' 
   IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
   DELIMITER '|' COMPUPDATE ON;
   ```

   Because no COMPROWS option is specified, the default and recommended sample size of 100,000 rows per slice is used.

1. Look at the new schema for the BIGLIST table to review the automatically chosen encoding schemes.

   ```
   SELECT "column", type, encoding 
   from pg_table_def where tablename = 'biglist';
   
   
        Column     |            Type             | Encoding 
   ----------------+-----------------------------+----------
    listid         | integer                     | az64
    sellerid       | integer                     | az64
    eventid        | integer                     | az64
    dateid         | smallint                    | none
    numtickets     | smallint                    | az64
    priceperticket | numeric(8,2)                | az64
    totalprice     | numeric(8,2)                | az64
    listtime       | timestamp without time zone | az64
   ```

1. Verify that the expected number of rows were loaded: 

   ```
   select count(*) from biglist;
   
   count
   ---------
   3079952
   (1 row)
   ```

When rows are later appended to this table using COPY or INSERT statements, the same compression encodings are applied.

# Optimizing storage for narrow tables
Optimizing for narrow tables

If you have a table with very few columns but a very large number of rows, the three hidden metadata identity columns (INSERT_XID, DELETE_XID, ROW_ID) will consume a disproportionate amount of the disk space for the table.

To optimize compression of the hidden columns, load the table in a single COPY transaction where possible. If you load the table with multiple separate COPY commands, the INSERT_XID column will not compress well. You must perform a vacuum operation if you use multiple COPY commands, but it will not improve compression of INSERT_XID.
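One way to combine several input files into a single COPY transaction is to list them all in a manifest file, as sketched below with hypothetical table and bucket names:

```
-- One COPY transaction covering every file named in the manifest,
-- instead of one COPY per file.
COPY narrow_table
FROM 's3://amzn-s3-demo-bucket/data/load_manifest'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
MANIFEST;
```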

# Loading default column values
Loading default column values

You can optionally define a column list in your COPY command. If a column in the table is omitted from the column list, COPY will load the column with either the value supplied by the DEFAULT option that was specified in the CREATE TABLE command, or with NULL if the DEFAULT option was not specified.

If COPY attempts to assign NULL to a column that is defined as NOT NULL, the COPY command fails. For information about assigning the DEFAULT option, see [CREATE TABLE](r_CREATE_TABLE_NEW.md).
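As a sketch (hypothetical table, bucket, and role), omitting a column with a DEFAULT from the column list loads that default, while omitting a nullable column without a DEFAULT loads NULL:

```
CREATE TABLE orders (
  orderid integer NOT NULL,
  status  char(1) DEFAULT 'N',
  note    varchar(100)
);

-- Only orderid comes from the file; status is loaded with its
-- DEFAULT value 'N', and note is loaded with NULL.
COPY orders (orderid)
FROM 's3://amzn-s3-demo-bucket/orders.txt'
IAM_ROLE 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
```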

When loading from data files on Amazon S3, the columns in the column list must be in the same order as the fields in the data file. If a field in the data file does not have a corresponding column in the column list, the COPY command fails.

When loading from an Amazon DynamoDB table, order does not matter. Any fields in the Amazon DynamoDB attributes that do not match a column in the Amazon Redshift table are discarded.

The following restrictions apply when using the COPY command to load DEFAULT values into a table: 
+ If an [IDENTITY](r_CREATE_TABLE_NEW.md#identity-clause) column is included in the column list, the EXPLICIT_IDS option must also be specified in the [COPY](r_COPY.md) command, or the COPY command will fail. Similarly, if an IDENTITY column is omitted from the column list, and the EXPLICIT_IDS option is specified, the COPY operation will fail.
+ Because the evaluated DEFAULT expression for a given column is the same for all loaded rows, a DEFAULT expression that uses a RANDOM() function assigns the same value to all the rows.
+ DEFAULT expressions that contain CURRENT_DATE or SYSDATE are set to the timestamp of the current transaction.

For an example, see "Load data from a file with default values" in [COPY examples](r_COPY_command_examples.md).

# Troubleshooting data loads
Troubleshooting data loads

When you load data into Amazon Redshift tables, you might encounter errors from Amazon S3, invalid input data, and errors from the COPY command itself. The following sections provide information about identifying and resolving data load errors.

**Topics**
+ [Troubleshooting S3 event integration and COPY JOB errors](s3-integration-troubleshooting.md)
+ [S3ServiceException errors](s3serviceexception-error.md)
+ [System tables for troubleshooting data loads](system-tables-for-troubleshooting-data-loads.md)
+ [Multibyte character load errors](multi-byte-character-load-errors.md)
+ [Load error reference](r_Load_Error_Reference.md)

# Troubleshooting S3 event integration and COPY JOB errors


Use the following information to troubleshoot common issues with Amazon S3 event integrations and COPY JOB with Amazon Redshift.

## Creation of the S3 event integration failed


If the creation of the S3 event integration failed, the status of the integration is `Inactive`. Make sure that the following are correct for your Amazon Redshift data warehouse.
+ You added the correct authorized principal and integration source for your target namespace in Amazon Redshift. See [Prerequisites to creating an S3 event integration](loading-data-copy-job.md#loading-data-copy-job-prerequisites).
+ You added the correct resource-based policy to the source Amazon S3 bucket. See [Prerequisites to creating an S3 event integration](loading-data-copy-job.md#loading-data-copy-job-prerequisites).

## Your Amazon S3 data is not appearing in the target database


If data from a COPY JOB doesn't appear, check the following.
+ Query SYS_COPY_JOB_DETAIL to see whether the Amazon S3 file has been loaded, is pending ingestion, or encountered an error. For more information, see [SYS_COPY_JOB_DETAIL](SYS_COPY_JOB_DETAIL.md).
+ Consult STL_ERROR or SYS_COPY_JOB_INFO if the Amazon S3 file is not there or there is an unexpected wait time. Look for credential errors or anything that suggests the integration is inactive. For more information, see [STL_ERROR](r_STL_ERROR.md) and [SYS_COPY_JOB_INFO](SYS_COPY_JOB_INFO.md).

# S3ServiceException errors


The most common S3ServiceException errors are caused by an improperly formatted or incorrect credentials string, having your cluster and your bucket in different Amazon Regions, and insufficient Amazon S3 permissions.

This section provides troubleshooting information for each type of error.

## Invalid credentials string


If your credentials string is improperly formatted, you will receive the following error message:

```
ERROR: Invalid credentials. Must be of the format: credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>
[;token=<temporary-session-token>]'
```

Verify that the credentials string does not contain any spaces or line breaks, and is enclosed in single quotation marks. 

## Invalid access key ID


If your access key ID does not exist, you will receive the following error message: 

```
[Amazon](500310) Invalid operation: S3ServiceException:The Amazon Access Key Id you provided does not exist in our records.
```

This is often a copy and paste error. Verify that the access key ID was entered correctly. Also, if you are using temporary session keys, check that the value for `token` is set.

## Invalid secret access key


If your secret access key is incorrect, you will receive the following error message: 

```
[Amazon](500310) Invalid operation: S3ServiceException:The request signature we calculated does not match the signature you provided. 
Check your key and signing method.,Status 403,Error SignatureDoesNotMatch
```

This is often a copy and paste error. Verify that the secret access key was entered correctly and that it is the correct key for the access key ID.

## Bucket is in a different Region


The Amazon S3 bucket specified in the COPY command must be in the same Amazon Region as the cluster. If your Amazon S3 bucket and your cluster are in different Regions, you will receive an error similar to the following: 

```
ERROR: S3ServiceException:The bucket you are attempting to access must be addressed using the specified endpoint.
```

You can create an Amazon S3 bucket in a specific Region either by selecting the Region when you create the bucket by using the Amazon S3 Management Console, or by specifying an endpoint when you create the bucket using the Amazon S3 API or CLI. For more information, see [Uploading files to Amazon S3 to use with COPY](t_uploading-data-to-S3.md).

For more information about Amazon S3 regions, see [Accessing a Bucket](https://docs.amazonaws.cn/AmazonS3/latest/userguide/UsingBucket.html#access-bucket-intro) in the *Amazon Simple Storage Service User Guide*.

Alternatively, you can specify the Region using the [REGION](copy-parameters-data-source-s3.md#copy-region) option with the COPY command.

## Access denied


If the user does not have sufficient permissions, you will receive the following error message:

```
ERROR: S3ServiceException:Access Denied,Status 403,Error AccessDenied
```

One possible cause is that the user identified by the credentials does not have LIST and GET access to the Amazon S3 bucket. For other causes, see [Troubleshoot Access Denied (403 Forbidden) errors in Amazon S3](https://docs.amazonaws.cn/AmazonS3/latest/userguide/troubleshoot-403-errors.html) in the *Amazon Simple Storage Service User Guide*.

For information about managing user access to buckets, see [Identity and access management in Amazon S3](https://docs.amazonaws.cn/AmazonS3/latest/userguide/s3-access-control.html) in the *Amazon Simple Storage Service User Guide*.

# System tables for troubleshooting data loads


The following Amazon Redshift system tables can be helpful in troubleshooting data load issues:
+ Query [STL_LOAD_ERRORS](r_STL_LOAD_ERRORS.md) to discover the errors that occurred during specific loads.
+ Query [STL_FILE_SCAN](r_STL_FILE_SCAN.md) to view load times for specific files or to see if a specific file was even read.
+ Query [STL_S3CLIENT_ERROR](r_STL_S3CLIENT_ERROR.md) to find details for errors encountered while transferring data from Amazon S3.

**To find and diagnose load errors**

1. Create a view or define a query that returns details about load errors. The following example joins the STL_LOAD_ERRORS table to the STV_TBL_PERM table to match table IDs with actual table names.

   ```
   create view loadview as
   (select distinct tbl, trim(name) as table_name, query, starttime,
   trim(filename) as input, line_number, colname, err_code,
   trim(err_reason) as reason
   from stl_load_errors sl, stv_tbl_perm sp
   where sl.tbl = sp.id);
   ```

1. Set the MAXERRORS option in your COPY command to a large enough value to enable COPY to return useful information about your data. If the COPY encounters errors, an error message directs you to consult the STL_LOAD_ERRORS table for details.

1. Query the LOADVIEW view to see error details. For example: 

   ```
   select * from loadview where table_name='venue';
   ```

   ```
     tbl   | table_name | query |         starttime          
   --------+------------+-------+----------------------------
    100551 | venue      | 20974 | 2013-01-29 19:05:58.365391 
   
   |     input      | line_number | colname | err_code |       reason
   +----------------+-------------+---------+----------+--------------------
   | venue_pipe.txt |           1 |       0 |     1214 | Delimiter not found
   ```

1. Fix the problem in the input file or the load script, based on the information that the view returns. Some typical load errors to watch for include: 
   + Mismatch between data types in table and values in input data fields.
   + Mismatch between number of columns in table and number of fields in input data.
   + Mismatched quotation marks. Amazon Redshift supports both single and double quotation marks; however, these quotation marks must be balanced appropriately.
   + Incorrect format for date/time data in input files.
   + Out-of-range values in input files (for numeric columns).
   + Number of distinct values for a column exceeds the limitation for its compression encoding.

# Multibyte character load errors


Columns with a CHAR data type only accept single-byte UTF-8 characters, up to byte value 127, or 7F hex, which is also the ASCII character set. VARCHAR columns accept multibyte UTF-8 characters, to a maximum of four bytes. For more information, see [Character types](r_Character_types.md). 

If a line in your load data contains a character that is not valid for the column data type, COPY returns an error and logs a row in the STL_LOAD_ERRORS system log table with error number 1220. The ERR_REASON field includes the byte sequence, in hex, for the invalid character.

An alternative to fixing invalid characters in your load data is to replace them during the load process. To replace invalid UTF-8 characters, specify the ACCEPTINVCHARS option with the COPY command. If the ACCEPTINVCHARS option is set, the character you specify replaces the code point. If the ACCEPTINVCHARS option isn't set, Amazon Redshift accepts the characters as valid UTF-8. For more information, see [ACCEPTINVCHARS](copy-parameters-data-conversion.md#acceptinvchars).

The following code points are valid UTF-8, so COPY operations don't return an error for them even when the ACCEPTINVCHARS option is not set. However, these code points are not valid characters. You can use the [ACCEPTINVCHARS](copy-parameters-data-conversion.md#acceptinvchars) option to replace a code point with a character that you specify. These code points include the range of values from `0xFDD0` to `0xFDEF` and values up to `0x10FFFF` ending with `FFFE` or `FFFF`:
+ `0xFFFE`, `0x1FFFE`, `0x2FFFE`, …, `0xFFFFE`, `0x10FFFE`
+ `0xFFFF`, `0x1FFFF`, `0x2FFFF`, …, `0xFFFFF`, `0x10FFFF`

The following example shows the error reason when COPY attempts to load UTF-8 character `e0 a1 c7a4` into a CHAR column.

```
Multibyte character not supported for CHAR 
(Hint: Try using  VARCHAR). Invalid char: e0 a1 c7a4
```

If the error is related to a VARCHAR data type, the error reason includes an error code as well as the not valid UTF-8 hex sequence. The following example shows the error reason when COPY attempts to load UTF-8 `a4` into a VARCHAR field.

```
String contains invalid or unsupported UTF-8 codepoints. 
Bad UTF-8 hex sequence: a4 (error 3)
```

The following table lists the descriptions and suggested workarounds for VARCHAR load errors. If one of these errors occurs, replace the character with a valid UTF-8 code sequence or remove the character.

[\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/redshift/latest/dg/multi-byte-character-load-errors.html)

# Load error reference
Error reference

If any errors occur while loading data from a file, query the [STL_LOAD_ERRORS](r_STL_LOAD_ERRORS.md) table to identify the error and determine the possible explanation. The following table lists all error codes that might occur during data loads:

## Load error codes
Error codes

[\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/redshift/latest/dg/r_Load_Error_Reference.html)