Using Amazon DataSync agents with VPC endpoints - Amazon DataSync
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Using Amazon DataSync agents with VPC endpoints

With a virtual private cloud (VPC) endpoint, you don't have to move your data across the public internet. Amazon DataSync can transfer data to Amazon Web Services through a VPC that you create with the Amazon VPC service.

How DataSync agents work with VPC endpoints

VPC endpoints are provided by Amazon PrivateLink. These endpoints let you privately connect your VPC to supported Amazon Web Services services. When you use a VPC endpoint with DataSync, all communication between your DataSync agent and Amazon Web Services remains in your VPC.

If you're transferring from an on-premises storage system, you must extend your VPC to the local network where your storage is located. You can do this with Amazon Direct Connect or a virtual private network (VPN), such as Amazon Site-to-Site VPN. This involves configuring routes in your local network so that traffic to the VPC endpoint travels over that connection. For more information, see gateway endpoint routing in the Amazon PrivateLink Guide.
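To make "a route from your local network to the VPC endpoint" concrete, the following is a minimal Python sketch, assuming a hypothetical on-premises route table. It uses longest-prefix match to decide which next hop traffic to the endpoint's private IP address would take. The CIDRs, next-hop names, and endpoint IP are illustrative assumptions; real routes are configured on your routers and Direct Connect or VPN gateways, not in code.

```python
import ipaddress

# Hypothetical on-premises route table: (destination CIDR, next hop).
# "dx-or-vpn" stands in for an Amazon Direct Connect virtual interface or a
# Site-to-Site VPN tunnel that leads to your VPC.
ROUTES = [
    ("0.0.0.0/0", "internet"),
    ("10.0.0.0/16", "dx-or-vpn"),
    ("172.16.0.0/12", "corporate-lan"),
]


def next_hop(routes, destination_ip):
    """Return the next hop for destination_ip by longest-prefix match, or None."""
    dest = ipaddress.ip_address(destination_ip)
    best_net, best_hop = None, None
    for cidr, hop in routes:
        net = ipaddress.ip_network(cidr)
        # Prefer the most specific (longest) prefix that contains the address.
        if dest in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop
    return best_hop


# Hypothetical private IP address of the DataSync VPC endpoint.
ENDPOINT_IP = "10.0.1.25"
print(next_hop(ROUTES, ENDPOINT_IP))  # dx-or-vpn
```

If the lookup only matches the default route ("internet" here), traffic to the endpoint would not use your private connection, which is what the routing setup is meant to prevent.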

After your agent is deployed and activated, you can create your transfer task. When you run the task, DataSync creates network interfaces to manage data traffic for your transfer. These network interfaces use private IP addresses that are accessible only from inside your VPC.

DataSync limitations with VPCs

  • VPCs that you use with DataSync must have default tenancy. VPCs with dedicated tenancy are not supported. For more information, see Work with VPCs.

  • DataSync doesn't support shared VPCs.

  • DataSync VPC endpoints support only IPv4. IPv6 and dual-stack options aren't supported.
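The limitations above can be checked before you begin. The following is a minimal Python sketch that validates a VPC record shaped like one entry of the Vpcs list returned by the Amazon EC2 DescribeVpcs API (the InstanceTenancy and OwnerId fields), plus the IP address type you plan to use for the endpoint (assumed here to correspond to the IpAddressType setting when creating a VPC endpoint). Treating a VPC owned by another account as shared is an assumption of this sketch, and the sample record and account ID are hypothetical.

```python
def datasync_vpc_problems(vpc, my_account_id, endpoint_ip_address_type="ipv4"):
    """Return reasons a VPC/endpoint setup conflicts with the DataSync VPC limitations.

    vpc is a dict shaped like one entry of the Vpcs list from EC2 DescribeVpcs.
    An empty list means no conflicts were found by these checks.
    """
    problems = []
    if vpc.get("InstanceTenancy") != "default":
        problems.append("VPC must use default tenancy")
    # Assumption: a VPC owned by a different account has been shared with you.
    if vpc.get("OwnerId") != my_account_id:
        problems.append("shared VPCs aren't supported")
    if endpoint_ip_address_type != "ipv4":
        problems.append("DataSync VPC endpoints support IPv4 only")
    return problems


# Hypothetical DescribeVpcs-style record and account ID.
sample_vpc = {"VpcId": "vpc-0example", "OwnerId": "111122223333", "InstanceTenancy": "dedicated"}
print(datasync_vpc_problems(sample_vpc, "111122223333"))
# ['VPC must use default tenancy']
```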

Configuring your DataSync agent to use a VPC endpoint

The following procedure describes how to configure a DataSync agent to use a VPC endpoint.

The following diagram illustrates the setup process.

To configure a DataSync agent to communicate with Amazon by using a VPC endpoint
  1. Choose the VPC and subnet where you want to set up the DataSync private IP addresses.

    The VPC should extend to your local environment (where your self-managed storage is located) by using routing rules over Amazon Direct Connect or VPN.

  2. Deploy a DataSync agent close to your storage.

    The agent must be able to access your source storage location by using NFS, SMB, or the Amazon S3 API. You can download the .ova file for the DataSync agent from the DataSync console. The agent doesn't need a public IP address. For more information about downloading and deploying an .ova image, see Creating an Amazon DataSync agent with the Amazon CLI.

    Note

    You can use an agent with only one type of endpoint: private, public, or Federal Information Processing Standards (FIPS). If you already have an agent configured for transferring data over the public internet, deploy a new agent to transfer data by using private DataSync endpoints. For detailed instructions, see Deploy your Amazon DataSync agent.

  3. In the VPC that you chose in step 1, create a security group to ensure access to the private IP addresses that DataSync uses.

    These addresses belong to one VPC endpoint, which carries control traffic, and four network interfaces, which carry data transfer traffic. You use this security group to manage access to these private IP addresses and to ensure that your agent can route to them.

    The agent must be able to establish connections to these IP addresses. In the security group attached to the endpoints, configure inbound rules to allow the agent's private IP address to connect to these endpoints.

  4. Create a VPC endpoint for the DataSync service.

    To do this, open the Amazon VPC console at https://console.amazonaws.cn/vpc/, and choose Endpoints from the navigation pane at left. Choose Create endpoint.

    For Service category, choose Amazon Web Services. For Service Name, choose DataSync in your Amazon Web Services Region (for example, com.amazonaws.us-east-1.datasync). Then choose the VPC and security group that you chose in steps 1 and 3. Make sure that you clear the Enable Private DNS Name check box.

    Important

    If you have deployed a DataSync agent on an Amazon EC2 instance, choose the Availability Zone where your agent resides to avoid charges for network traffic between Availability Zones.

    To learn more about data transfer prices for all Amazon Web Services Regions, see Amazon EC2 On-Demand pricing.

    For additional details on creating VPC endpoints, see Creating an interface endpoint in the Amazon VPC User Guide.

  5. When your new VPC endpoint is available, make sure that the network configuration for your storage environment allows agent activation.

    Activation is a one-time operation that securely associates the agent with your Amazon Web Services account. To activate the agent, use a computer that can reach the agent on port 80. After activation, you can revoke this access. The agent must be able to reach the private IP address of the VPC endpoint that you created in step 4.

    To find this IP address, open the Amazon VPC console at https://console.amazonaws.cn/vpc/, and choose Endpoints from the navigation pane at left. Choose the DataSync endpoint, and check the Subnets list for the private IP address for the subnet that you chose. This is the IP address of your VPC endpoint.

    Note

    Make sure to allow outbound traffic from the agent to the VPC endpoint on ports 443, 1024–1064, and 22. Port 22 is optional and is used for the Amazon Web Services Support channel.

  6. Activate the agent. If you have a computer that can route to the agent by using port 80 and that can access the DataSync console, open the console, choose Agents in the left navigation pane, and then choose Create agent. In the Service endpoint section, choose VPC endpoints using Amazon PrivateLink.

    Choose the VPC endpoint from step 4, the subnet from step 1, and the security group from step 3. Enter the agent's IP address.

    If you can't access the agent and the DataSync console by using the same computer, activate the agent by using the command line from a computer that can reach the agent's port 80. For more information, see Creating an Amazon DataSync agent with the Amazon CLI.

  7. Choose Get key, optionally enter an agent name and tags, and choose Create agent.

    Your new agent appears on the Agents tab of the DataSync console. The green VPC endpoint status indicates that all tasks performed with this agent use private endpoints without crossing the public internet.

  8. Create your task by configuring a source and destination location for your transfer.

    For more information, see Where can I transfer my data with Amazon DataSync?.

    To transfer data by using private IP addresses, your task creates four network interfaces in the VPC and subnet that you chose.

  9. Make sure that your agent can reach the four network interfaces and related IP addresses that your task creates.

    To find these IP addresses, open the Amazon EC2 console at https://console.amazonaws.cn/ec2/, and choose Network Interfaces in the navigation pane. Enter the task ID in the search filter to list the task's four network interfaces. These are the network interfaces that carry your task's data transfer traffic. Make sure that you allow outbound traffic from the agent to these interfaces on port 443.
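The security group rules from steps 3, 5, and 9 can be expressed as data. The following is a minimal Python sketch that builds inbound rules admitting only the agent's private IP address on the ports listed in step 5, in the IpPermissions shape used by the Amazon EC2 AuthorizeSecurityGroupIngress API. It assumes the ports are TCP, and the agent IP address and rule descriptions are hypothetical.

```python
# Port ranges from steps 5 and 9: (from_port, to_port, description).
# Assumption: these are TCP ports.
DATASYNC_AGENT_PORTS = [
    (443, 443, "DataSync agent: port 443"),
    (1024, 1064, "DataSync agent: ports 1024-1064 to the VPC endpoint"),
    (22, 22, "Optional: Amazon Web Services Support channel"),
]


def ingress_permissions(agent_ip, include_support_channel=True):
    """Build EC2-style IpPermissions that admit only the agent's private IP."""
    cidr = f"{agent_ip}/32"  # a single host, not a whole subnet
    permissions = []
    for from_port, to_port, description in DATASYNC_AGENT_PORTS:
        if from_port == 22 and not include_support_channel:
            continue  # port 22 is optional
        permissions.append({
            "IpProtocol": "tcp",
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": cidr, "Description": description}],
        })
    return permissions


# Hypothetical agent private IP address.
rules = ingress_permissions("192.168.10.50")
for rule in rules:
    print(rule["FromPort"], rule["ToPort"], rule["IpRanges"][0]["CidrIp"])
```

You could pass such a list as the IpPermissions argument of boto3's EC2 client method authorize_security_group_ingress(GroupId=..., IpPermissions=...) for the security group from step 3. This sketch deliberately makes no API call; using a /32 range keeps the rule scoped to the agent rather than its whole subnet.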
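For the command-line activation mentioned in step 6, the following minimal Python sketch builds the URL you would request from the agent on port 80 to obtain an activation key, and the arguments for the DataSync CreateAgent API. The query parameters (gatewayType, activationRegion, privateLinkEndpoint, endpointType, no_redirect) reflect our understanding of the procedure in Creating an Amazon DataSync agent with the Amazon CLI; verify them there before use. The helper names, IP addresses, IDs, ARNs, and Region are hypothetical placeholders, and no network request is made.

```python
from urllib.parse import urlencode


def activation_url(agent_ip, region, endpoint_ip):
    """URL to request from the agent on port 80 to obtain an activation key."""
    query = urlencode({
        "gatewayType": "SYNC",
        "activationRegion": region,
        "privateLinkEndpoint": endpoint_ip,  # VPC endpoint private IP (step 5)
        "endpointType": "PRIVATE_LINK",
    })
    # no_redirect is a bare flag without a value, so it's appended by hand.
    return f"http://{agent_ip}/?{query}&no_redirect"


def create_agent_params(activation_key, vpc_endpoint_id, subnet_arn,
                        security_group_arn, agent_name=None):
    """Keyword arguments for the DataSync CreateAgent API (steps 1, 3, and 4)."""
    params = {
        "ActivationKey": activation_key,
        "VpcEndpointId": vpc_endpoint_id,
        "SubnetArns": [subnet_arn],
        "SecurityGroupArns": [security_group_arn],
    }
    if agent_name:
        params["AgentName"] = agent_name
    return params


# All values below are hypothetical placeholders.
print(activation_url("192.168.10.50", "cn-north-1", "10.0.1.25"))
# http://192.168.10.50/?gatewayType=SYNC&activationRegion=cn-north-1&privateLinkEndpoint=10.0.1.25&endpointType=PRIVATE_LINK&no_redirect
```

The dictionary from create_agent_params is meant to be unpacked into boto3's DataSync client, as in create_agent(**params), once you have a real activation key.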

You can now start your task. For each additional task that uses this agent, repeat step 9 to allow the task's traffic through port 443.
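Because each task creates its own four network interfaces, the step 9 check applies per task. The following is a minimal Python sketch that selects one task's interfaces from records shaped like the NetworkInterfaces list returned by the Amazon EC2 DescribeNetworkInterfaces API and lists the destinations the agent must reach on port 443. It assumes, as the console search in step 9 suggests, that the task ID appears in each interface's Description; the task IDs, descriptions, and IP addresses are hypothetical.

```python
import ipaddress

HTTPS_PORT = 443  # step 9: agent -> task network interfaces


def task_interface_ips(network_interfaces, task_id):
    """Private IPs of interfaces whose Description mentions task_id (an assumption)."""
    ips = [
        eni["PrivateIpAddress"]
        for eni in network_interfaces
        if task_id in eni.get("Description", "")
    ]
    # Sort numerically rather than as strings ("10.0.1.9" before "10.0.1.40").
    return sorted(ips, key=ipaddress.ip_address)


def agent_egress_targets(network_interfaces, task_id, expected=4):
    """(ip, port) pairs the agent must reach for one task; checks the interface count."""
    ips = task_interface_ips(network_interfaces, task_id)
    if len(ips) != expected:
        raise ValueError(f"expected {expected} interfaces for {task_id}, found {len(ips)}")
    return [(ip, HTTPS_PORT) for ip in ips]


# Hypothetical DescribeNetworkInterfaces-style records for two tasks.
SAMPLE_ENIS = [
    {"PrivateIpAddress": "10.0.1.40", "Description": "DataSync task-0aaa"},
    {"PrivateIpAddress": "10.0.1.9", "Description": "DataSync task-0aaa"},
    {"PrivateIpAddress": "10.0.1.41", "Description": "DataSync task-0aaa"},
    {"PrivateIpAddress": "10.0.1.42", "Description": "DataSync task-0aaa"},
    {"PrivateIpAddress": "10.0.1.50", "Description": "DataSync task-0bbb"},
]

print(agent_egress_targets(SAMPLE_ENIS, "task-0aaa"))
# [('10.0.1.9', 443), ('10.0.1.40', 443), ('10.0.1.41', 443), ('10.0.1.42', 443)]
```

Raising an error when the count isn't four is a design choice for this sketch: it flags a task whose interfaces weren't all found before you update firewall rules.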