

# Prerequisites for setting up a zero-ETL integration

Before you set up an integration between a source and a target, complete some prerequisites, such as configuring the IAM roles that Amazon Glue uses to access data from the source and write to the target, and setting up the KMS keys that encrypt data in the intermediate or target location.

**Topics**
+ [Setting up source resources](#zero-etl-setup-source-resources)
+ [Setting up target resources](#zero-etl-setup-target-resources)
+ [Creating an Amazon Redshift data warehouse](#zero-etl-setup-target-redshift-data-warehouse)
+ [Setting up a VPC for your zero-ETL integration](#zero-etl-setup-vpc)
+ [Setting up a zero-ETL cross-account integration](#zero-etl-setup-cross-account-integration)

## Setting up source resources

Perform the following setup tasks as required for your source.

### Setting up the source role

This section describes how to pass a source role that allows the zero-ETL integration to access your connection. It applies only to SaaS sources.

**Note**  
To restrict access to only a few connections, you can first create the connection to obtain the connection ARN. See [Configuring a source for a zero-ETL integration](zero-etl-sources.md).

Create a role that grants the integration permission to access the connection:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "GlueConnections",
            "Effect": "Allow",
            "Action": [
                "glue:GetConnections",
                "glue:GetConnection"
            ],
            "Resource": [
                "arn:aws-cn:glue:*:111122223333:catalog",
                "arn:aws-cn:glue:us-east-1:111122223333:connection/*"
            ]
        },
        {
            "Sid": "GlueActionBasedPermissions",
            "Effect": "Allow",
            "Action": [
                "glue:ListEntities",
                "glue:RefreshOAuth2Tokens"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```


Attach the following trust policy so that the Amazon Glue service can assume the role:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "glue.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

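As a minimal sketch of the note above about restricting access to a few connections, the following Python builds the source role permissions policy with the `connection/*` wildcard replaced by explicit connection ARNs, together with the trust policy. The helper names and the connection name `my-saas-connection` are hypothetical; the account ID, Region, and partition are the placeholder values used in this section.

```python
import json


def build_source_role_policy(account_id, region, connection_names, partition="aws-cn"):
    """Build the source role permissions policy, scoped to the named connections."""
    connection_arns = [
        f"arn:{partition}:glue:{region}:{account_id}:connection/{name}"
        for name in connection_names
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GlueConnections",
                "Effect": "Allow",
                "Action": ["glue:GetConnections", "glue:GetConnection"],
                # Catalog ARN as in the policy above, followed by one ARN per
                # connection instead of the connection/* wildcard.
                "Resource": [f"arn:{partition}:glue:*:{account_id}:catalog", *connection_arns],
            },
            {
                "Sid": "GlueActionBasedPermissions",
                "Effect": "Allow",
                "Action": ["glue:ListEntities", "glue:RefreshOAuth2Tokens"],
                "Resource": ["*"],
            },
        ],
    }


def build_glue_trust_policy():
    """Build the trust policy that lets the Amazon Glue service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["glue.amazonaws.com"]},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # "my-saas-connection" is a hypothetical connection name.
    policy = build_source_role_policy("111122223333", "us-east-1", ["my-saas-connection"])
    print(json.dumps(policy, indent=2))
    print(json.dumps(build_glue_trust_policy(), indent=2))
```

The printed documents can be pasted into the IAM console or passed to whatever tooling you use to create roles.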

## Setting up target resources

Perform the following setup tasks as required for your Amazon Glue Data Catalog or Amazon Redshift data warehouse integration target.

For integrations with an Amazon Glue database target:
+ [Setting up an Amazon Glue database](#zero-etl-setup-target-resources-glue-database)
+ [Providing a catalog Resource Based Access (RBAC) policy](#zero-etl-setup-target-resources-rbac-policy)
+ [Creating a target IAM role](#zero-etl-setup-target-resources-target-iam-role)

For integrations with an Amazon Redshift data warehouse target:
+ [Creating an Amazon Redshift data warehouse](https://docs.amazonaws.cn/glue/latest/dg/zero-etl-prerequisites.html#zero-etl-setup-target-redshift-data-warehouse)

### Setting up an Amazon Glue database

For integrations that use an Amazon Glue database, set up a target database in the Amazon Glue Data Catalog with an Amazon S3 location:

1. In the Amazon Glue console home page, select **Database** under Data Catalog.

1. Choose **Add database** in the top right corner. If you have already created a database, make sure that its location is set to an Amazon S3 URI.

1. Enter a name and a **Location** (Amazon S3 URI). The location is required for the zero-ETL integration. Choose **Create database** when done.
**Note**  
The Amazon S3 bucket must be in the same Region as the Amazon Glue database.

For information on creating a new database in Amazon Glue, see [Getting started with the Amazon Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/start-data-catalog.html).

You can also use the [create-database](https://docs.aws.amazon.com/cli/latest/reference/glue/create-database.html) CLI command to create the database in Amazon Glue. The `LocationUri` field in `--database-input` is required.
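
The same request can be sketched in Python. This is an illustration under assumptions, not the documented procedure: it assumes the boto3 SDK is installed and credentials are configured, the helper names are hypothetical, and the database name and bucket are placeholders. The builder rejects a location that is not an Amazon S3 URI, because the integration requires one.

```python
def build_database_input(name, location_uri):
    """Return a DatabaseInput payload; LocationUri is required for zero-ETL targets."""
    if not location_uri.startswith("s3://"):
        raise ValueError("LocationUri must be an Amazon S3 URI such as s3://bucket/prefix/")
    return {"Name": name, "LocationUri": location_uri}


def create_target_database(name, location_uri, region_name):
    """Create the database through the Glue CreateDatabase API (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    glue = boto3.client("glue", region_name=region_name)
    return glue.create_database(DatabaseInput=build_database_input(name, location_uri))


if __name__ == "__main__":
    # Placeholder values; nothing is sent to the service here.
    print(build_database_input("exampletarget", "s3://amzn-s3-demo-bucket/prefix/"))
```

Keep the bucket in the same Region as the database, as noted above.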

#### Optimizing Iceberg tables

After Amazon Glue creates a table in the target database, you can enable compaction to speed up queries in Amazon Athena. For information on setting up the resources (IAM role) for compaction, see [Table optimization prerequisites](https://docs.aws.amazon.com/glue/latest/dg/optimization-prerequisites.html).

For more information on setting up compaction on the Amazon Glue table created by the integration, see [Optimizing Iceberg tables](https://docs.aws.amazon.com/glue/latest/dg/table-optimizers.html).

### Providing a catalog Resource Based Access (RBAC) policy

For integrations that use an Amazon Glue database, add the following permissions to the catalog RBAC policy to allow integrations between the source and the target.

**Note**  
For cross-account integrations, both the role policy of the user creating the integration (Alice in the following example) and the catalog resource policy must allow `glue:CreateInboundIntegration` on the resource. For same-account integrations, either a resource policy or a role policy that allows `glue:CreateInboundIntegration` on the resource is sufficient. In both scenarios, the resource policy must still allow `glue.amazonaws.com` to perform `glue:AuthorizeInboundIntegration`.

Open **Catalog settings** under **Data Catalog**, then provide the following permissions, replacing the placeholder values with your own.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Principal": {
        "AWS": [
          "arn:aws-cn:iam::123456789012:user/Alice"
        ]
      },
      "Effect": "Allow",
      "Action": [
        "glue:CreateInboundIntegration"
      ],
      "Resource": [
        "arn:aws-cn:glue:us-east-1:111122223333:catalog",
        "arn:aws-cn:glue:us-east-1:111122223333:database/DatabaseName"
      ],
      "Condition": {
        "StringLike": {
          "aws:SourceArn": "arn:aws-cn:dynamodb:us-east-1:444455556666:table/<table-name>"
        }
      }
    },
    {
      "Principal": {
        "Service": [
          "glue.amazonaws.com"
        ]
      },
      "Effect": "Allow",
      "Action": [
        "glue:AuthorizeInboundIntegration"
      ],
      "Resource": [
        "arn:aws-cn:glue:us-east-1:111122223333:catalog",
        "arn:aws-cn:glue:us-east-1:111122223333:database/DatabaseName"
      ],
      "Condition": {
        "StringEquals": {
          "aws:SourceArn": "arn:aws-cn:dynamodb:us-east-1:444455556666:table/<table-name>"
        }
      }
    }
  ]
}
```

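Filling in the placeholders by hand is error-prone because the same catalog ARN, database ARN, and source ARN each appear twice. The following Python is a minimal sketch that generates the policy above from its inputs; the function name is hypothetical and the example values are the placeholders from this section.

```python
import json


def build_catalog_resource_policy(creator_arn, source_arn, target_account_id,
                                  region, database_name, partition="aws-cn"):
    """Build the catalog resource policy for one integration source and target database."""
    resources = [
        f"arn:{partition}:glue:{region}:{target_account_id}:catalog",
        f"arn:{partition}:glue:{region}:{target_account_id}:database/{database_name}",
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The principal that creates the integration.
                "Principal": {"AWS": [creator_arn]},
                "Effect": "Allow",
                "Action": ["glue:CreateInboundIntegration"],
                "Resource": resources,
                "Condition": {"StringLike": {"aws:SourceArn": source_arn}},
            },
            {
                # The Amazon Glue service authorizes the inbound integration.
                "Principal": {"Service": ["glue.amazonaws.com"]},
                "Effect": "Allow",
                "Action": ["glue:AuthorizeInboundIntegration"],
                "Resource": resources,
                "Condition": {"StringEquals": {"aws:SourceArn": source_arn}},
            },
        ],
    }


if __name__ == "__main__":
    policy = build_catalog_resource_policy(
        creator_arn="arn:aws-cn:iam::123456789012:user/Alice",
        source_arn="arn:aws-cn:dynamodb:us-east-1:444455556666:table/<table-name>",
        target_account_id="111122223333",
        region="us-east-1",
        database_name="DatabaseName",
    )
    print(json.dumps(policy, indent=2))
```

The output is JSON that you can paste into **Catalog settings**.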

### Creating a target IAM role

Create a target IAM role with the following permissions and trust relationships:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "s3:ListBucket",
            "Resource": "arn:aws-cn:s3:::amzn-s3-demo-bucket",
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws-cn:s3:::amzn-s3-demo-bucket/prefix/*",
            "Effect": "Allow"
        },
        {
            "Action": [
                "glue:GetDatabase"
            ],
            "Resource": [
                "arn:aws-cn:glue:us-east-1:111122223333:catalog",
                "arn:aws-cn:glue:us-east-1:111122223333:database/DatabaseName"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "glue:CreateTable",
                "glue:GetTable",
                "glue:GetTables",
                "glue:DeleteTable",
                "glue:UpdateTable",
                "glue:GetTableVersion",
                "glue:GetTableVersions",
                "glue:GetResourcePolicy"
            ],
            "Resource": [
                "arn:aws-cn:glue:us-east-1:111122223333:catalog",
                "arn:aws-cn:glue:us-east-1:111122223333:database/DatabaseName",
                "arn:aws-cn:glue:us-east-1:111122223333:table/DatabaseName/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "cloudwatch:namespace": "AWS/Glue/ZeroETL"
                }
            },
            "Effect": "Allow"
        },
        {
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
```


Add the following trust policy to allow the Amazon Glue service to assume the role:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

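The two Amazon S3 statements in the permissions policy name a bucket and a prefix. Assuming these are meant to match the `LocationUri` of the target database (an interpretation, since this section does not state it explicitly), the following Python sketch derives both statements from that URI. The function name is hypothetical.

```python
from urllib.parse import urlparse


def s3_statements_for_location(location_uri, partition="aws-cn"):
    """Derive the ListBucket and object-level statements from an s3:// location."""
    parsed = urlparse(location_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an Amazon S3 URI such as s3://bucket/prefix/")
    bucket = parsed.netloc
    prefix = parsed.path.strip("/")
    bucket_arn = f"arn:{partition}:s3:::{bucket}"
    # Objects under the prefix, or every object when the location is the bucket root.
    object_arn = f"{bucket_arn}/{prefix}/*" if prefix else f"{bucket_arn}/*"
    return [
        {"Action": "s3:ListBucket", "Resource": bucket_arn, "Effect": "Allow"},
        {
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": object_arn,
            "Effect": "Allow",
        },
    ]


if __name__ == "__main__":
    for statement in s3_statements_for_location("s3://amzn-s3-demo-bucket/prefix/"):
        print(statement)
```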

## Creating an Amazon Redshift data warehouse

When your zero-ETL integration target is an Amazon Redshift data warehouse, create the data warehouse if you don't already have one. To create an Amazon Redshift Serverless workgroup, see [Creating a workgroup with a namespace](https://docs.amazonaws.cn/redshift/latest/mgmt/serverless-console-workgroups-create-workgroup-wizard.html). To create an Amazon Redshift cluster, see [Creating a cluster](https://docs.amazonaws.cn/redshift/latest/mgmt/create-cluster.html).

The target Amazon Redshift workgroup or cluster must have the `enable_case_sensitive_identifier` parameter turned on for the integration to succeed. For more information on enabling case sensitivity, see [Turn on case sensitivity for your data warehouse](https://docs.amazonaws.cn/redshift/latest/mgmt/zero-etl-setting-up.case-sensitivity.html) in the Amazon Redshift Management Guide.

After the Amazon Redshift workgroup or cluster setup is complete, you need to configure your data warehouse. See [Getting started with zero-ETL integrations](https://docs.amazonaws.cn/redshift/latest/mgmt/zero-etl-using.setting-up.html) in the Amazon Redshift Management Guide for more information.
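
As a hedged sketch of turning on `enable_case_sensitive_identifier` programmatically, the following Python builds the request arguments for a Serverless workgroup and for the parameter group of a provisioned cluster, and sends them with boto3. It assumes boto3 is installed and credentials are configured; the helper names are hypothetical, and the linked Amazon Redshift guide remains the authoritative procedure (for example, a cluster needs a custom parameter group, and the change may require a reboot).

```python
PARAMETER = "enable_case_sensitive_identifier"


def serverless_workgroup_update(workgroup_name):
    """Keyword arguments for the redshift-serverless update_workgroup call."""
    return {
        "workgroupName": workgroup_name,
        "configParameters": [{"parameterKey": PARAMETER, "parameterValue": "true"}],
    }


def cluster_parameter_group_update(parameter_group_name):
    """Keyword arguments for the redshift modify_cluster_parameter_group call."""
    return {
        "ParameterGroupName": parameter_group_name,
        "Parameters": [{"ParameterName": PARAMETER, "ParameterValue": "true"}],
    }


def enable_case_sensitivity(region_name, workgroup_name=None, parameter_group_name=None):
    """Apply the setting to a workgroup and/or a cluster parameter group."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    if workgroup_name:
        serverless = boto3.client("redshift-serverless", region_name=region_name)
        serverless.update_workgroup(**serverless_workgroup_update(workgroup_name))
    if parameter_group_name:
        redshift = boto3.client("redshift", region_name=region_name)
        redshift.modify_cluster_parameter_group(
            **cluster_parameter_group_update(parameter_group_name)
        )
```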

## Setting up a VPC for your zero-ETL integration

To set up a VPC for your zero-ETL integration:

1. Go to **VPC** > **Your VPCs** and choose **Create VPC**.

   1. Select **VPC and more**.

   1. Set your VPC name.

   1. Set the IPv4 CIDR block to `10.0.0.0/16`.

   1. Set the number of Availability Zones (AZs) to 1.

   1. Set the number of public and private subnets to 1.

   1. Set **NAT gateways** to None.

   1. Set **VPC endpoints** to S3 Gateway.

   1. Enable DNS hostnames and DNS resolution.

1. Go to **Endpoints** and choose **Create Endpoint**.

1. Create endpoints for these services in the private subnet of your VPC (use the default security group):

   1. com.amazonaws.us-east-1.lambda

   1. com.amazonaws.us-east-1.glue

   1. com.amazonaws.us-east-1.sts

Create the Amazon Glue connection:

1. Go to **Amazon Glue** > **Data connections** and choose **Create connection**.

1. Select **Network**.

1. Select the VPC that you created, its private subnet, and its default security group.
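
The endpoint and connection steps above can be sketched in Python. This is a minimal illustration, assuming boto3 is installed and credentials are configured; the helper names are hypothetical, and the VPC, subnet, and security group IDs are values you take from the VPC you created. The S3 gateway endpoint is not included because the **VPC and more** wizard creates it.

```python
INTERFACE_SERVICES = ("lambda", "glue", "sts")


def endpoint_service_names(region):
    """Service names for the interface endpoints listed above, for a given Region."""
    return [f"com.amazonaws.{region}.{service}" for service in INTERFACE_SERVICES]


def interface_endpoint_requests(vpc_id, subnet_id, security_group_id, region):
    """One create_vpc_endpoint request per service, placed in the private subnet."""
    return [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": service_name,
            "SubnetIds": [subnet_id],
            "SecurityGroupIds": [security_group_id],
        }
        for service_name in endpoint_service_names(region)
    ]


def network_connection_input(name, subnet_id, security_group_id, availability_zone):
    """ConnectionInput for an Amazon Glue connection of type NETWORK."""
    return {
        "Name": name,
        "ConnectionType": "NETWORK",
        "ConnectionProperties": {},
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": [security_group_id],
            "AvailabilityZone": availability_zone,
        },
    }


def apply(region, vpc_id, subnet_id, security_group_id, availability_zone, connection_name):
    """Create the endpoints and the connection (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builders above work without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    for request in interface_endpoint_requests(vpc_id, subnet_id, security_group_id, region):
        ec2.create_vpc_endpoint(**request)
    glue = boto3.client("glue", region_name=region)
    glue.create_connection(
        ConnectionInput=network_connection_input(
            connection_name, subnet_id, security_group_id, availability_zone
        )
    )
```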

### Setting up the target role for the VPC

The target role must have these permissions (in addition to the other permissions required by zero-ETL integrations):

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CustomerVpc",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags",
        "ec2:DeleteTags",
        "ec2:DescribeRouteTables",
        "ec2:DescribeVpcEndpoints",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:CreateNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "glue:GetConnection"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```


### Setting up the target leg resource properties

If you are using the CLI, set the target leg resource properties on the target Amazon Glue database that you created, passing the target role ARN and the Amazon Glue connection name.

```
aws glue create-integration-resource-property \
--resource-arn arn:aws:glue:us-east-1:<account-id>:database/exampletarget \
--target-processing-properties '{"RoleArn" : "arn:aws:iam::<account-id>:role/example-role", "ConnectionName":"example-vpc-3"}' \
--endpoint-url https://example.amazonaws.com --region us-east-1
```
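
For comparison, here is a hedged Python sketch of the same request. It assumes a boto3 release recent enough to include the Glue `create_integration_resource_property` operation; the helper names are hypothetical, and the ARNs and connection name are the placeholders from the CLI example. The `--endpoint-url` override from the CLI example is omitted.

```python
def target_resource_property_request(database_arn, role_arn, connection_name):
    """Keyword arguments that mirror the CLI flags in the example above."""
    return {
        "ResourceArn": database_arn,
        "TargetProcessingProperties": {
            "RoleArn": role_arn,
            "ConnectionName": connection_name,
        },
    }


def create_target_resource_property(region_name, database_arn, role_arn, connection_name):
    """Send the request (needs boto3 with this operation available, and credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    glue = boto3.client("glue", region_name=region_name)
    return glue.create_integration_resource_property(
        **target_resource_property_request(database_arn, role_arn, connection_name)
    )
```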

### Possible client errors

The following are possible client errors for an integration configured with a VPC.


| Error message | Action required | 
| --- | --- | 
| Provided role is not authorized to perform glue:GetConnection on connection. Add this permission to role policy, and then wait for the integration to recover. | Update role policy | 
| Provided role is not authorized to perform ec2:DescribeSubnets. Add this permission to role policy, and then wait for the integration to recover. | Update role policy | 
| Provided role is not authorized to perform ec2:DescribeSecurityGroups. Add this permission to role policy, and then wait for the integration to recover. | Update role policy | 
| Provided role is not authorized to perform ec2:DescribeVpcEndpoints. Add this permission to role policy, and then wait for the integration to recover. | Update role policy | 
| Provided role is not authorized to perform ec2:DescribeRouteTables. Add this permission to role policy, and then wait for the integration to recover. | Update role policy | 
| Provided role is not authorized to perform ec2:CreateTags. Add this permission to role policy, and then wait for the integration to recover. | Update role policy | 
| Provided role is not authorized to perform ec2:CreateNetworkInterface. Add this permission to role policy, and then wait for the integration to recover. | Update role policy | 
| Provided connection subnet does not contain a valid S3 endpoint or NAT gateway. Update subnet, and then wait for the integration to recover. | Update VPC subnet endpoints | 
| Connection subnet not found. Update connection subnet, and then wait for the integration to recover. | Update Amazon Glue connection | 
| Connection security group not found. Update connection security group, and then wait for the integration to recover. | Update Amazon Glue connection | 
| Can't connect to S3 through provided VPC connection. Update subnet configurations, and then wait for the integration to recover. | Update VPC subnet endpoints | 
| Can't connect to Lambda through provided VPC connection. Update subnet configurations, and then wait for the integration to recover. | Update VPC subnet endpoints | 
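
Most of the errors above come from a missing action in the target role policy, so it can help to check a policy document before you create the integration. The following Python is a rough sketch of such a check, not an IAM policy evaluator: it only looks at the `Action` lists of `Allow` statements and ignores `Resource`, `Condition`, and `Deny`. The function names are hypothetical.

```python
import fnmatch

# Actions from the VPC target role policy shown earlier in this section.
REQUIRED_VPC_ACTIONS = (
    "ec2:CreateTags",
    "ec2:DeleteTags",
    "ec2:DescribeRouteTables",
    "ec2:DescribeVpcEndpoints",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSubnets",
    "ec2:CreateNetworkInterface",
    "ec2:DeleteNetworkInterface",
    "glue:GetConnection",
)


def allowed_action_patterns(policy):
    """Collect the action patterns of every Allow statement in a policy document."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        patterns.extend([actions] if isinstance(actions, str) else actions)
    return patterns


def missing_actions(policy, required=REQUIRED_VPC_ACTIONS):
    """Return the required actions that no Allow statement covers, in the order given."""
    patterns = [pattern.lower() for pattern in allowed_action_patterns(policy)]
    return [
        action
        for action in required
        # Action names are compared case-insensitively; * and ? act as wildcards.
        if not any(fnmatch.fnmatchcase(action.lower(), pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    example = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": "ec2:Describe*", "Resource": "*"}],
    }
    print(missing_actions(example))
```

An empty list means every action in the VPC policy is named by some `Allow` statement; any entry returned corresponds to an "Update role policy" row in the table.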

## Setting up a zero-ETL cross-account integration

To set up a zero-ETL cross-account integration:

1. Configure a target resource policy as described in [Providing a catalog Resource Based Access (RBAC) policy](#zero-etl-setup-target-resources-rbac-policy). Ensure that the source account role is explicitly allowed on the target resource.

1. Check that the source account role (the role used to create the integration) has the following permissions:

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "Stmt123456789012",
               "Action": [
                   "glue:CreateInboundIntegration"
               ],
               "Effect": "Allow",
               "Resource": [
                   "arn:aws-cn:glue:us-east-1:111122223333:catalog",
                   "arn:aws-cn:glue:us-east-1:111122223333:database/DatabaseName"
               ]
           }
       ]
   }
   ```

1. Create the integration as described in [Creating an integration](zero-etl-common-integration-tasks.md#zero-etl-creating).