

# Creating a dataset using Amazon S3 files

To create a dataset using one or more text files (.csv, .tsv, .clf, or .elf) from Amazon S3, create a manifest for Quick Sight. Quick Sight uses this manifest to identify the files that you want to use and the upload settings needed to import them. When you create a dataset using Amazon S3, the file data is automatically imported into [SPICE](spice.md).

You must grant Quick Sight access to any Amazon S3 buckets that you want to read files from. For information about granting Quick Sight access to Amazon resources, see [Configuring Amazon Quick Sight access to Amazon data sources](access-to-aws-resources.md).

**Topics**
+ [Supported formats for Amazon S3 manifest files](supported-manifest-file-format.md)
+ [Creating Amazon S3 datasets](create-a-data-set-s3-procedure.md)
+ [Datasets using S3 files in another Amazon account](using-s3-files-in-another-aws-account.md)

# Supported formats for Amazon S3 manifest files


You use JSON manifest files to specify files in Amazon S3 to import into Quick Sight. These JSON manifest files can use either the Quick Sight format described following or the Amazon Redshift format described in [Using a manifest to specify data files](https://docs.amazonaws.cn/redshift/latest/dg/loading-data-files-using-manifest.html) in the *Amazon Redshift Database Developer Guide*. You don't have to use Amazon Redshift to use the Amazon Redshift manifest file format. 

If you use a Quick Sight manifest file, it must have a .json extension, for example `my_manifest.json`. If you use an Amazon Redshift manifest file, it can have any extension. 

If you use an Amazon Redshift manifest file, Quick Sight processes the optional `mandatory` option as Amazon Redshift does. If a file marked as mandatory isn't found, Quick Sight ends the import process and returns an error. 

Files that you select for import must be delimited text (for example, .csv or .tsv), log (.clf), or extended log (.elf) format, or JSON (.json). All files identified in one manifest file must use the same file format. Plus, they must have the same number and type of columns. Quick Sight supports UTF-8 file encoding, but not UTF-8 with byte-order mark (BOM). If you are importing JSON files, then for `globalUploadSettings` specify `format`, but not `delimiter`, `textqualifier`, or `containsHeader`.
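Because files saved with a UTF-8 byte-order mark aren't supported, it can help to check files before you upload them to Amazon S3. The following is a minimal sketch that uses only the Python standard library. The function names `has_utf8_bom` and `strip_utf8_bom` are illustrative and aren't part of any Quick Sight tooling.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte-order mark."""
    with open(path, "rb") as handle:
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(path):
    """Rewrite the file without its leading BOM. Return True if one was removed."""
    data = Path(path).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    Path(path).write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

`strip_utf8_bom` reads the whole file into memory, so for very large files a streaming rewrite might be a better fit.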

Make sure that any files that you specify are in Amazon S3 buckets that you have granted Quick Sight access to. For information about granting Quick Sight access to Amazon resources, see [Configuring Amazon Quick Sight access to Amazon data sources](access-to-aws-resources.md).

## Manifest file format for Quick Sight


Quick Sight manifest files use the following JSON format.

```
{
    "fileLocations": [
        {
            "URIs": [
                "uri1",
                "uri2",
                "uri3"
            ]
        },
        {
            "URIPrefixes": [
                "prefix1",
                "prefix2",
                "prefix3"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "CSV",
        "delimiter": ",",
        "textqualifier": "'",
        "containsHeader": "true"
    }
}
```

Use the fields in the `fileLocations` element to specify the files to import, and the fields in the `globalUploadSettings` element to specify import settings for those files, such as field delimiters. 

The manifest file elements are described following:
+ **fileLocations** – Use this element to specify the files to import. You can use either or both of the `URIs` and `URIPrefixes` arrays to do this. You must specify at least one value in one or the other of them.
  + **URIs** – Use this array to list URIs for specific files to import.

    Quick Sight can access Amazon S3 files that are in any Amazon Web Services Region. However, you must use a URI format that identifies the Amazon Region of the Amazon S3 bucket if it's different from that used by your Quick account.

    For the list of supported URI formats, see [the AWS documentation website](http://docs.amazonaws.cn/en_us/quick/latest/userguide/supported-manifest-file-format.html).
  + **URIPrefixes** – Use this array to list URI prefixes for S3 buckets and folders. All files in a specified bucket or folder are imported. Quick Sight recursively retrieves files from child folders.

    Quick Sight can access Amazon S3 buckets or folders that are in any Amazon Web Services Region. Make sure to use a URI prefix format that identifies the S3 bucket's Amazon Web Services Region if it's different from that used by your Quick account.

    For the list of supported URI prefix formats, see [the AWS documentation website](http://docs.amazonaws.cn/en_us/quick/latest/userguide/supported-manifest-file-format.html).
+ **globalUploadSettings** – (Optional) Use this element to specify import settings for the Amazon S3 files, such as field delimiters. If this element is not specified, Quick Sight uses the default values for the fields in this section.
**Important**  
For log (.clf) and extended log (.elf) files, only the **format** field in this section is applicable, so you can skip the other fields. If you choose to include them, their values are ignored. 
  + **format** – (Optional) Specify the format of the files to be imported. Valid formats are **CSV**, **TSV**, **CLF**, **ELF**, and **JSON**. The default value is **CSV**.
  + **delimiter** – (Optional) Specify the file field delimiter. This must map to the file type specified in the `format` field. Valid values are a comma (`,`) for .csv files and a tab (`\t`) for .tsv files. The default value is a comma (`,`).
  + **textqualifier** – (Optional) Specify the file text qualifier. Valid values are a single quote (`'`) or a double quote (`\"`). The leading backslash is a required escape character for a double quote in JSON. The default value is a double quote (`\"`). If your text doesn't need a text qualifier, don't include this property.
  + **containsHeader** – (Optional) Specify whether the file has a header row. Valid formats are **true** or **false**. The default value is **true**.
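
The rules described for these elements can be checked before you upload a manifest. The following is a minimal sketch, assuming the manifest has already been parsed into a Python dictionary. The function `check_manifest` and its messages are illustrative. It covers only the rules stated in this section and doesn't reproduce the validation that Quick Sight performs.

```python
import json

VALID_FORMATS = {"CSV", "TSV", "CLF", "ELF", "JSON"}
DELIMITED_ONLY_FIELDS = ("delimiter", "textqualifier", "containsHeader")


def check_manifest(manifest):
    """Return a list of problems found in a Quick Sight format manifest dict."""
    problems = []

    # fileLocations needs at least one value in URIs or URIPrefixes.
    count = sum(
        len(entry.get("URIs", [])) + len(entry.get("URIPrefixes", []))
        for entry in manifest.get("fileLocations", [])
    )
    if count == 0:
        problems.append("fileLocations must list at least one URI or URI prefix")

    settings = manifest.get("globalUploadSettings", {})
    file_format = settings.get("format", "CSV")  # CSV is the documented default
    if file_format not in VALID_FORMATS:
        problems.append(f"unsupported format: {file_format!r}")

    # For JSON files, only the format field should be specified.
    if file_format == "JSON":
        for field in DELIMITED_ONLY_FIELDS:
            if field in settings:
                problems.append(f"{field} should not be set when format is JSON")

    # The delimiter must map to the declared file type.
    expected = {"CSV": ",", "TSV": "\t"}.get(file_format)
    delimiter = settings.get("delimiter")
    if expected and delimiter is not None and delimiter != expected:
        problems.append(f"delimiter {delimiter!r} does not match format {file_format}")

    return problems


manifest = json.loads(
    '{"fileLocations": [{"URIs": ["s3://amzn-s3-demo-bucket/data.json"]}],'
    ' "globalUploadSettings": {"format": "JSON", "delimiter": ","}}'
)
for problem in check_manifest(manifest):
    print(problem)
```

An empty list means that none of these particular checks failed. It doesn't guarantee that the import succeeds.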

### Manifest file examples for Quick Sight


The following are some examples of completed Quick Sight manifest files.

The following example shows a manifest file that identifies two specific .csv files for import. These files use double quotes for text qualifiers. The `format`, `delimiter`, and `containsHeader` fields are skipped because the default values are acceptable.

```
{
    "fileLocations": [
        {
            "URIs": [
                "https://yourBucket.s3.amazonaws.com/data-file.csv",
                "https://yourBucket.s3.amazonaws.com/data-file-2.csv"
            ]
        }
    ],
    "globalUploadSettings": {
        "textqualifier": "\""
    }
}
```

The following example shows a manifest file that identifies one specific .tsv file for import. The manifest also includes a bucket in another Amazon Web Services Region that contains additional .tsv files for import. The `textqualifier` and `containsHeader` fields are skipped because the default values are acceptable.

```
{
    "fileLocations": [
        {
            "URIs": [
                "https://s3.amazonaws.com/amzn-s3-demo-bucket/data.tsv"
            ]
        },
        {
            "URIPrefixes": [
                "https://s3-us-east-1.amazonaws.com/amzn-s3-demo-bucket/"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "TSV",
        "delimiter": "\t"
    }
}
```

The following example identifies two buckets that contain .clf files for import. One is in the same Amazon Web Services Region as the Quick account, and one in a different Amazon Web Services Region. The `delimiter`, `textqualifier`, and `containsHeader` fields are skipped because they are not applicable to log files.

```
{
    "fileLocations": [
        {
            "URIPrefixes": [
                "https://amzn-s3-demo-bucket1.your-s3-url.com",
                "s3://amzn-s3-demo-bucket2/"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "CLF"
    }
}
```

The following example uses the Amazon Redshift format to identify a .csv file for import.

```
{
    "entries": [
        {
            "url": "https://amzn-s3-demo-bucket.your-s3-url.com/myalias-test/file-to-import.csv",
            "mandatory": true
        }
    ]
}
```

The following example uses the Quick Sight format to identify two JSON files for import.

```
{
    "fileLocations": [
        {
            "URIs": [
                "https://yourBucket.s3.amazonaws.com/data-file.json",
                "https://yourBucket.s3.amazonaws.com/data-file-2.json"
            ]
        }
    ],
    "globalUploadSettings": {
        "format": "JSON"
    }
}
```
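
If you generate manifests from a script instead of writing them by hand, a small helper can assemble the Quick Sight format shown in the preceding examples. The following is a minimal sketch. The function `build_manifest` and its parameter names are illustrative, and the bucket name is a placeholder.

```python
import json


def build_manifest(uris=(), uri_prefixes=(), **upload_settings):
    """Build a Quick Sight format manifest as a dict."""
    file_locations = []
    if uris:
        file_locations.append({"URIs": list(uris)})
    if uri_prefixes:
        file_locations.append({"URIPrefixes": list(uri_prefixes)})
    if not file_locations:
        raise ValueError("specify at least one URI or URI prefix")

    manifest = {"fileLocations": file_locations}
    # globalUploadSettings is optional, so omit it when no settings are given.
    if upload_settings:
        manifest["globalUploadSettings"] = upload_settings
    return manifest


manifest = build_manifest(
    uris=["https://s3.amazonaws.com/amzn-s3-demo-bucket/data.tsv"],
    format="TSV",
    delimiter="\t",
)
print(json.dumps(manifest, indent=4))
```

`json.dumps` handles the escaping of tabs and double quotes. Save the output with a .json extension, as required for Quick Sight format manifests.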

# Creating Amazon S3 datasets


**To create an Amazon S3 dataset**

1. Check [Data source quotas](data-source-limits.md) to make sure that your target file set doesn't exceed data source quotas.

1. Create a manifest file to identify the text files that you want to import, using one of the formats specified in [Supported formats for Amazon S3 manifest files](supported-manifest-file-format.md).

1. Save the manifest file to a local directory, or upload it into Amazon S3.

1. On the Quick start page, choose **Data**.

1. On the **Data** page, choose **Create**, and then choose **New dataset**.

1. Choose the Amazon S3 icon and then choose **Next**.

1. For **Data source name**, enter a description of the data source. This name should be something that helps you distinguish this data source from others.

1. For **Upload a manifest file**, do one of the following:
   + To use a local manifest file, choose **Upload**, and then choose **Upload a JSON manifest file**. For **Open**, choose a file, and then choose **Open**.
   + To use a manifest file from Amazon S3, choose **URL**, and enter the URL for the manifest file. To find the URL of a pre-existing manifest file in the Amazon S3 console, navigate to the appropriate file and choose it. A properties panel displays, including the link URL. You can copy the URL and paste it into Quick Sight.

1. Choose **Connect**.

1. To make sure that the connection is complete, choose **Edit/Preview data**. Otherwise, choose **Visualize** to create an analysis using the data as-is. 

   If you choose **Edit/Preview data**, you can specify a dataset name as part of preparing the data. Otherwise, the dataset name matches the name of the manifest file. 

   To learn more about data preparation, see [Preparing data in Amazon Quick Sight](preparing-data.md).

## Creating datasets based on multiple Amazon S3 files


You can use one of several methods to merge or combine files from Amazon S3 buckets inside Quick Sight:
+ **Combine files by using a manifest** – In this case, the files must have the same number of fields (columns). The data types must match between fields in the same position in the file. For example, the first field must have the same data type in each file. The same goes for the second field, and the third field, and so on. Quick Sight takes field names from the first file.

  The files must be listed explicitly in the manifest. However, they don't have to be inside the same Amazon S3 bucket.

  In addition, the files must follow the rules described in [Supported formats for Amazon S3 manifest files](supported-manifest-file-format.md).

  For more details about combining files using a manifest, see [Creating a dataset using Amazon S3 files](create-a-data-set-s3.md).
+ **Merge files without using a manifest** – To merge multiple files into one without having to list them individually in the manifest, you can use Athena. With this method, you can query your text files as if they were in a database table. For more information, see the post [Analyzing data in Amazon S3 using Athena](https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/) in the Big Data blog. 
+ **Use a script to append files before importing** – You can use a script designed to combine your files before uploading. 
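
As an illustration of the script approach, the following minimal sketch appends several .csv files into one file and stops if a file's header row differs from the first file's header. It assumes that every input file is UTF-8 encoded and has a header row. The function name `combine_csv_files` is illustrative.

```python
import csv


def combine_csv_files(input_paths, output_path):
    """Append CSV files with identical headers into one file.

    Returns the number of data rows written (the header isn't counted).
    """
    header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in input_paths:
            with open(path, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    # Field names come from the first file.
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header: {file_header}")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

The header comparison checks only column names and order. It doesn't verify that the data types in each column match.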

# Datasets using S3 files in another Amazon account

Use this section to learn how to set up security so you can use Quick Sight to access Amazon S3 files in another Amazon account. 

For you to access files in another account, the owner of the other account must first set Amazon S3 to grant you permissions to read the file. Then, in Quick Sight, you must set up access to the buckets that were shared with you. After both of these steps are finished, you can use a manifest to create a dataset.

**Note**  
 To access files that are shared with the public, you don't need to set up any special security. However, you still need a manifest file.

**Topics**
+ [Setting up Amazon S3 to allow access from a different Quick account](#setup-S3-to-allow-access-from-a-different-quicksight-account)
+ [Setting up Quick Sight to access Amazon S3 files in another Amazon account](#setup-quicksight-to-access-S3-in-a-different-account)

## Setting up Amazon S3 to allow access from a different Quick account

Use this section to learn how to set permissions on Amazon S3 files so they can be accessed by Quick Sight in another Amazon account. 

For information on accessing another account's Amazon S3 files from your Quick Sight account, see [Setting up Quick Sight to access Amazon S3 files in another Amazon account](#setup-quicksight-to-access-S3-in-a-different-account). For more information about S3 permissions, see [Managing access permissions to your Amazon S3 resources](https://docs.amazonaws.cn/AmazonS3/latest/dev/s3-access-control.html) and [How do I set permissions on an object?](https://docs.amazonaws.cn/AmazonS3/latest/user-guide/set-object-permissions.html)

You can use the following procedure to set this access from the S3 console. Alternatively, you can grant permissions by using the Amazon CLI or by writing a script. If you have a lot of files to share, you can instead create an S3 bucket policy on the `s3:GetObject` action. To use a bucket policy, add it to the bucket permissions, not to the file permissions. For information on bucket policies, see [Bucket policy examples](https://docs.amazonaws.cn/AmazonS3/latest/dev/example-bucket-policies.html) in the *Amazon S3 Developer Guide*.
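
As a rough illustration of the bucket policy option, the following sketch builds a policy document that allows the `s3:GetObject` action on every object in a bucket. Several details are assumptions and not taken from this guide: the helper name `read_only_bucket_policy`, the account ID `111122223333`, the choice of the other account's root ARN as the principal, and the `partition` parameter (for example, `aws-cn` for the China Regions). Confirm the correct principal and any additional actions that you need against the Amazon S3 documentation before you use a policy like this.

```python
import json


def read_only_bucket_policy(bucket_name, principal_arn, partition="aws"):
    """Build a bucket policy that lets one principal read every object in a bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCrossAccountRead",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                # The /* suffix applies the statement to objects, not the bucket itself.
                "Resource": f"arn:{partition}:s3:::{bucket_name}/*",
            }
        ],
    }


policy = read_only_bucket_policy(
    "amzn-s3-demo-bucket",
    "arn:aws:iam::111122223333:root",  # hypothetical account ID
)
print(json.dumps(policy, indent=4))
```

You can paste the printed JSON into the bucket policy editor in the bucket's permissions.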

**To set access from a different Quick account from the S3 console**

1. Get the email address of the Amazon account that you want to share with. Alternatively, you can get and use the canonical user ID. For more information on canonical user IDs, see [Amazon account identifiers](https://docs.amazonaws.cn/general/latest/gr/acct-identifiers.html) in the *Amazon General Reference*.

1. Sign in to the Amazon Web Services Management Console and open the Amazon S3 console at [https://console.amazonaws.cn/s3/](https://console.amazonaws.cn/s3/).

1. Find the Amazon S3 bucket that you want to share with Quick Sight. Choose **Permissions**.

1. Choose **Add Account**, and then enter an email address, or paste in a canonical user ID, for the Amazon account that you want to share with. This email address should be the primary one for the Amazon account. 

1. Choose **Yes** for both **Read bucket permissions** and **List objects**.

   Choose **Save** to confirm.

1. Find the file that you want to share, and open the file's permission settings. 

1. Enter an email address or the canonical user ID for the Amazon account that you want to share with. This email address should be the primary one for the Amazon account. 

1. Enable **Read object** permissions for each file that Quick Sight needs access to. 

1. Notify the Quick user that the files are now available for use.

## Setting up Quick Sight to access Amazon S3 files in another Amazon account

Use this section to learn how to set up Quick Sight so you can access Amazon S3 files in another Amazon account. For information on allowing someone else to access your Amazon S3 files from their Quick account, see [Setting up Amazon S3 to allow access from a different Quick account](#setup-S3-to-allow-access-from-a-different-quicksight-account).

Use the following procedure to access another account's Amazon S3 files from Quick Sight. Before you can use this procedure, the users in the other Amazon account must share the files in their Amazon S3 bucket with you.

**To access another account's Amazon S3 files from Quick Sight**

1. Verify that the user or users in the other Amazon account gave your account read and write permission to the S3 bucket in question. 

1. Choose your profile icon, and then choose **Manage Quick Sight**.

1. Choose **Security & permissions**.

1. Under **Quick Sight access to Amazon services**, choose **Manage**.

1. Choose **Select S3 buckets**.

1. On the **Select Amazon S3 buckets** screen, choose the **S3 buckets you can access across Amazon** tab.

   The default tab is named **S3 buckets linked to Quick Sight account**. It shows all the buckets your Quick account has access to. 

1. Do one of the following:
   + To add all the buckets that you have permission to use, choose **Choose accessible buckets from other Amazon accounts**. 
   + If you have one or more Amazon S3 buckets that you want to add, enter their names. Each must exactly match the unique name of the Amazon S3 bucket.

     If you don't have the appropriate permissions, you see the error message "We can't connect to this S3 bucket. Make sure that any S3 buckets you specify are associated with the Amazon account used to create this Quick account." This error message appears if you don't have either account permissions or Quick Sight permissions.
**Note**  
To use Amazon Athena, Quick Sight needs to access the Amazon S3 buckets that Athena uses.   
You can add them here one by one, or use the **Choose accessible buckets from other Amazon accounts** option.

1. Choose **Select buckets** to confirm your selection. 

1. Create a new dataset based on Amazon S3, and upload your manifest file. For more information about Amazon S3 datasets, see [Creating a dataset using Amazon S3 files](create-a-data-set-s3.md).