Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

How Amazon DataSync works

Learn the key concepts and terminology related to Amazon DataSync transfers.

DataSync transfer architecture

The following diagrams show how and where DataSync commonly transfers storage data. For a full list of DataSync supported storage systems and services, see Where can I transfer my data with Amazon DataSync?.

Transferring between on-premises storage and Amazon

The following diagram shows a high-level overview of DataSync transferring files between self-managed, on-premises storage systems and Amazon Web Services.

Figure: A common DataSync scenario where data transfers from an on-premises storage system to a supported Amazon storage resource (such as an Amazon S3 bucket or Amazon EFS file system).

The diagram illustrates a common DataSync use case:

  • A DataSync agent copying data from an on-premises storage system.

  • Data moving into Amazon via Transport Layer Security (TLS).

  • DataSync copying data to a supported Amazon storage service.

Transferring between Amazon storage services

The following diagram shows a high-level overview of DataSync transferring files between Amazon storage services in the same Amazon Web Services account.

Figure: A common DataSync scenario where data transfers between Amazon storage resources (such as an Amazon S3 bucket or Amazon EFS file system).

The diagram illustrates a common DataSync use case:

  • DataSync copying data from a supported Amazon storage service.

  • Data moving across Amazon Web Services Regions via TLS.

  • DataSync copying data to a supported Amazon storage service.

When transferring between Amazon storage services (whether in the same Amazon Web Services Region or across Amazon Web Services Regions), your data remains in the Amazon network and doesn't traverse the public internet.

Important

You pay for data transferred between Amazon Web Services Regions. This is billed as data transfer OUT from your source Region to your destination Region. For more information, see Data transfer pricing.

Transferring between cloud storage systems and Amazon storage services

With DataSync, you can transfer data between other cloud storage systems and Amazon Web Services. In this context, cloud storage systems can include:

  • Self-managed storage systems, such as an NFS file server in your virtual private cloud (VPC) within Amazon.

Concepts and terminology

Familiarize yourself with DataSync transfer features.

Agent

An agent is a virtual machine (VM) appliance that DataSync uses to read from and write to storage during a transfer.

You can deploy an agent in your storage environment on VMware ESXi, Linux Kernel-based Virtual Machine (KVM), or Microsoft Hyper-V hypervisors. For storage in a virtual private cloud (VPC) in Amazon, you can deploy an agent as an Amazon EC2 instance.

A DataSync transfer agent is no different from an agent that you can use for DataSync Discovery, but we don't recommend using the same agent for both purposes.

To get started, see Do I need an Amazon DataSync agent?

Location

A location describes where you're copying data from or to. Each DataSync transfer (also known as a task) has a source and destination location. For more information, see Where can I transfer my data with Amazon DataSync?

Task

A task describes a DataSync transfer. It identifies a source and destination location along with details about how to copy data between those locations. You can also specify how a task treats metadata, deleted files, and permissions.

Task execution

A task execution is an individual run of a DataSync transfer task. There are several phases involved in a task execution. For more information, see Task execution statuses.
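Because a task execution passes through several phases, a caller usually polls it until it finishes. The following is a minimal sketch of such a loop, not an official client. It assumes the status names reported by the DataSync DescribeTaskExecution API (such as PREPARING, TRANSFERRING, VERIFYING, SUCCESS, and ERROR) and treats SUCCESS and ERROR as terminal. The names `wait_for_execution` and `describe` are hypothetical and exist only for this illustration.

```python
import time
from typing import Callable, List

# Assumption: terminal status names as reported by DescribeTaskExecution.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(
    describe: Callable[[], dict],
    poll_seconds: float = 30.0,
    max_polls: int = 1000,
) -> List[str]:
    """Poll until the execution reaches a terminal status.

    `describe` is any zero-argument callable that returns a dict with a
    "Status" key. Returns the sequence of statuses observed, with
    consecutive repeats collapsed.
    """
    seen: List[str] = []
    for _ in range(max_polls):
        status = describe()["Status"]
        if not seen or seen[-1] != status:
            seen.append(status)  # record each phase change once
        if status in TERMINAL_STATUSES:
            return seen
        time.sleep(poll_seconds)
    raise TimeoutError("task execution did not finish within max_polls")
```

With boto3, `describe` could be a wrapper such as `lambda: client.describe_task_execution(TaskExecutionArn=arn)`, where `client = boto3.client("datasync")` and `arn` is the ARN of your own task execution.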

How DataSync transfers files, objects, and directories

When you start a task, DataSync prepares your transfer by examining your source and destination locations to determine what to transfer. It does this by recursively scanning the contents and metadata of both locations to identify differences between the two. This process can take anywhere from a few minutes to a few hours, depending on the number of files, objects, or directories in both locations and the performance of your storage systems or services.

During preparation, the number of files, objects, or directories that DataSync takes inventory of in your source and destination counts towards your task quotas. The quotas aren't based on the number of items that DataSync transfers during each task execution.
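The preparation step and the way it counts toward quotas can be pictured with a small model. The sketch below is an illustration only, not DataSync's actual algorithm: it assumes each location has already been scanned into a mapping of relative path to metadata, and it compares size and modification time as a stand-in for "contents and metadata". The names `ItemMeta`, `plan_transfer`, and `items_inventoried` are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class ItemMeta(NamedTuple):
    size: int     # bytes
    mtime: float  # last-modified time, seconds since the epoch


Inventory = Dict[str, ItemMeta]  # relative path -> metadata


def plan_transfer(source: Inventory, destination: Inventory) -> Set[str]:
    """Return the paths that are missing from, or differ at, the destination."""
    return {
        path
        for path, meta in source.items()
        if destination.get(path) != meta
    }


def items_inventoried(source: Inventory, destination: Inventory) -> int:
    """Count every item scanned in both locations, copied or not."""
    return len(source) + len(destination)


source = {
    "a.txt": ItemMeta(10, 100.0),
    "b.txt": ItemMeta(20, 200.0),
    "c.txt": ItemMeta(30, 300.0),
}
destination = {
    "a.txt": ItemMeta(10, 100.0),  # identical, nothing to copy
    "b.txt": ItemMeta(20, 150.0),  # differs from the source copy
}

to_copy = plan_transfer(source, destination)      # b.txt and c.txt
scanned = items_inventoried(source, destination)  # 3 + 2 = 5
```

In this model only two items need to be copied, but five items were inventoried, and it is the inventoried count that matters for task quotas.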

After DataSync finishes preparing your transfer, it moves your data (including metadata) from the source to the destination based on your task settings. For example, you can specify what metadata gets copied, exclude certain files, and limit how much bandwidth DataSync uses, among other options.
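As an illustration of how such task settings might be expressed, the sketch below assembles keyword arguments shaped like the DataSync CreateTask request as exposed by boto3 (`Options`, `Excludes`, `BytesPerSecond`, and so on). The helper name `build_task_request` is hypothetical, the option values are one possible choice rather than recommended defaults, and the location ARNs are placeholders that you would replace with your own.

```python
from typing import Any, Dict, List


def build_task_request(
    source_location_arn: str,
    destination_location_arn: str,
    exclude_patterns: List[str],
    bytes_per_second: int,
) -> Dict[str, Any]:
    """Assemble keyword arguments shaped like the CreateTask request."""
    request: Dict[str, Any] = {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Options": {
            # Which metadata gets copied.
            "Atime": "BEST_EFFORT",
            "Mtime": "PRESERVE",
            "PosixPermissions": "PRESERVE",
            # Keep destination files that no longer exist in the source.
            "PreserveDeletedFiles": "PRESERVE",
            # Bandwidth limit in bytes per second (-1 means no limit).
            "BytesPerSecond": bytes_per_second,
        },
    }
    if exclude_patterns:
        # Exclude patterns are joined into one pipe-delimited filter string.
        request["Excludes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(exclude_patterns)}
        ]
    return request


request = build_task_request(
    "SOURCE_LOCATION_ARN",       # placeholder
    "DESTINATION_LOCATION_ARN",  # placeholder
    exclude_patterns=["*.tmp", "/cache"],
    bytes_per_second=1048576,    # roughly 1 MiB per second
)
```

The resulting dictionary could then be passed to `boto3.client("datasync").create_task(**request)`.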

At the end of the transfer, DataSync can verify the integrity of your data.

For information on the specific steps that take place during a task execution, see DataSync task statuses.

Open and locked files

Keep in mind the following when trying to transfer files that are open (in use) or locked:

  • In general, DataSync can transfer open files without any limitations.

  • If a file is open and being written to during a transfer, DataSync can detect this kind of inconsistency during the transfer task's verification phase. To get the latest version of the file, you must run the task again.

  • If a file is locked and the server prevents DataSync from opening it, DataSync skips the file during the transfer and logs an error.

  • DataSync can't lock or unlock files.
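The skip-and-log behavior for locked files can be modeled with a simple loop. This is a conceptual sketch rather than DataSync's implementation: it assumes a locked file surfaces as a `PermissionError` when opened, and the names `read_all` and `read_file` are hypothetical.

```python
import logging
from typing import Callable, Iterable, List, Tuple

logger = logging.getLogger("transfer_sketch")


def read_all(
    paths: Iterable[str],
    read_file: Callable[[str], bytes],
) -> Tuple[List[str], List[str]]:
    """Try to read every path; skip and log the ones that cannot be opened."""
    copied: List[str] = []
    skipped: List[str] = []
    for path in paths:
        try:
            read_file(path)
        except PermissionError as exc:
            # Assumption: the server reports a locked file as PermissionError.
            logger.error("skipping locked file %s: %s", path, exc)
            skipped.append(path)
        else:
            copied.append(path)
    return copied, skipped
```

The loop never tries to lock or unlock anything. It only reports which files it could not open so that they can be picked up by a later run.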

Data integrity

DataSync always performs data-integrity checks during a transfer. When your transfer is complete, DataSync can also verify just the data copied or the entire dataset in the source and destination locations. Depending on how you configure data verification, this can take a significant amount of time on large datasets.

Tip

In most cases, we recommend verifying only the data that gets transferred.

DataSync checks data integrity by calculating and comparing the checksum and metadata of every file or object in both locations. If DataSync notices differences between locations, task verification fails with an error that specifies what failed. For example, you might see errors such as Checksum failure, Metadata failure, Files were added, Files were removed, and so on.
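The comparison described above can be sketched as follows. This is a simplified model under stated assumptions: it uses SHA-256 because the text does not name the checksum algorithm DataSync uses, it represents metadata with a single permission-mode field, and the labels for missing or extra items are illustrative rather than DataSync's exact error messages. `Item`, `checksum`, and `verify` are hypothetical names.

```python
import hashlib
from typing import Dict, List, NamedTuple


class Item(NamedTuple):
    data: bytes
    mode: int  # stand-in for metadata such as POSIX permissions


def checksum(data: bytes) -> str:
    # Assumption: SHA-256; the actual algorithm is not stated here.
    return hashlib.sha256(data).hexdigest()


def verify(source: Dict[str, Item], destination: Dict[str, Item]) -> List[str]:
    """Compare both locations and return a list of verification errors."""
    errors: List[str] = []
    for path in sorted(source.keys() - destination.keys()):
        errors.append(f"missing at destination: {path}")
    for path in sorted(destination.keys() - source.keys()):
        errors.append(f"extra at destination: {path}")
    for path in sorted(source.keys() & destination.keys()):
        src, dst = source[path], destination[path]
        if checksum(src.data) != checksum(dst.data):
            errors.append(f"Checksum failure: {path}")
        elif src.mode != dst.mode:
            errors.append(f"Metadata failure: {path}")
    return errors
```

An empty list means the two locations match under this model. Verifying only the transferred data corresponds to calling `verify` on just the subset of paths that were copied, which is why it is faster on large datasets.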

Recurring transfers

In addition to one-time transfers, DataSync can move data on a recurring basis. Some of the options for these situations include: