How Amazon DataSync works - Amazon DataSync
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China.

How Amazon DataSync works

Get a visual overview of how Amazon DataSync works and learn key concepts to help you move your data quickly.

DataSync architecture

The following diagrams show how and where DataSync commonly transfers storage data.

For a full list of DataSync supported storage systems and services, see Working with Amazon DataSync locations.

Transferring between on-premises storage and Amazon

The following diagram shows a high-level overview of DataSync transferring files between self-managed, on-premises storage systems and Amazon Web Services.


[Diagram: An overview of a common DataSync scenario where data transfers from an on-premises storage system to a supported Amazon storage resource (such as an Amazon S3 bucket or Amazon EFS file system).]

The diagram illustrates a common DataSync use case:

  • A DataSync agent copying data from an on-premises storage system.

  • Data moving into Amazon via Transport Layer Security (TLS).

  • DataSync copying data to a supported Amazon storage service.

Transferring between Amazon storage services

The following diagram shows a high-level overview of DataSync transferring files between Amazon storage services in the same Amazon Web Services account.


[Diagram: An overview of a common DataSync scenario where data transfers between Amazon storage resources (such as an Amazon S3 bucket or Amazon EFS file system).]

The diagram illustrates a common DataSync use case:

  • DataSync copying data from a supported Amazon storage service.

  • Data moving across Amazon Web Services Regions via TLS.

  • DataSync copying data to a supported Amazon storage service.

When transferring between Amazon storage services (whether in the same Amazon Web Services Region or across Amazon Web Services Regions), your data remains in the Amazon network and doesn't traverse the public internet.

Important

You pay for data transferred between Amazon Web Services Regions. This is billed as data transfer OUT from your source Region to your destination Region. For more information, see Data transfer pricing.
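As a rough illustration of this note, the following Python sketch estimates only the inter-Region data transfer OUT component of a single transfer. The function name and the per-GB rate are hypothetical placeholders: look up the actual rate on the Data transfer pricing page. The sketch ignores the DataSync service fee and any other charges.

```python
from decimal import Decimal


def estimate_inter_region_transfer_cost(
    source_region: str,
    destination_region: str,
    gigabytes: Decimal,
    rate_per_gb: Decimal,
) -> Decimal:
    """Estimate the inter-Region data transfer OUT charge for one transfer.

    rate_per_gb is a placeholder you supply from the pricing page; no real
    price is assumed. Other charges (including DataSync's own fee) are ignored.
    """
    if gigabytes < 0 or rate_per_gb < 0:
        raise ValueError("gigabytes and rate_per_gb must be non-negative")
    # Same Region: there is no inter-Region component to estimate.
    if source_region == destination_region:
        return Decimal("0")
    # Cross-Region: billed as data transfer OUT from the source Region.
    return gigabytes * rate_per_gb
```

For example, with a made-up rate of 0.5 per GB, moving 500 GB from cn-north-1 to cn-northwest-1 gives an estimate of 250. `Decimal` avoids floating-point rounding in the multiplication.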

Transferring between cloud storage systems and Amazon storage services

With DataSync, you can transfer data between cloud storage systems and Amazon Web Services. In this context, cloud storage systems can include:

  • Self-managed storage systems hosted by Amazon (for example, an NFS share in your virtual private cloud within Amazon).

  • Storage systems or services hosted by another cloud provider.

For more information, see:

Concepts and terminology

Familiarize yourself with the following key DataSync concepts.

Agent

An agent is a virtual machine (VM) that you own and that DataSync uses to read data from or write data to your storage systems. You can deploy the agent on VMware ESXi, Linux Kernel-based Virtual Machine (KVM), or Microsoft Hyper-V hypervisors, or launch it as an Amazon EC2 instance. You use the DataSync console, Amazon CLI, or DataSync API to set up and activate your agent. The activation process associates your agent VM with your Amazon Web Services account. For information about agents, see Working with Amazon DataSync agents.
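If you use the DataSync API from Python, you can sketch activation with the boto3 `datasync` client's `create_agent` call. That call takes the activation key you obtained from your deployed agent. This is a minimal sketch: the helper name, the agent name, and the key are placeholders, and the code does not show how to obtain the activation key.

```python
def activate_agent(client, activation_key: str, agent_name: str) -> str:
    """Activate a deployed agent VM and return the new agent's ARN.

    `client` is assumed to be a boto3 DataSync client, for example
    boto3.client("datasync", region_name="cn-north-1").
    """
    # CreateAgent associates the agent VM with your account.
    response = client.create_agent(
        ActivationKey=activation_key,
        AgentName=agent_name,
    )
    return response["AgentArn"]


# Example (requires boto3, credentials, and a real activation key):
# import boto3
# client = boto3.client("datasync", region_name="cn-north-1")
# agent_arn = activate_agent(client, "YOUR-ACTIVATION-KEY", "onprem-agent-1")
```

The client is passed in rather than created inside the function. This keeps the helper independent of credentials and Region configuration.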

Location

A location identifies where you're copying data from or to. Each DataSync transfer (also known as a task) has a source and destination location. For more information, see Working with Amazon DataSync locations.
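For the on-premises-to-Amazon scenario described earlier, the following sketch creates a source NFS location that is read through an activated agent and a destination Amazon S3 location. It uses the boto3 `create_location_nfs` and `create_location_s3` calls. The helper name, host, export path, bucket ARN, and IAM role ARN are all placeholders you would replace.

```python
def create_nfs_to_s3_locations(
    client,
    agent_arn: str,
    nfs_server: str,
    nfs_export: str,
    bucket_arn: str,
    bucket_access_role_arn: str,
    s3_prefix: str = "/",
) -> tuple[str, str]:
    """Create a source NFS location and a destination S3 location.

    Returns (source_location_arn, destination_location_arn).
    """
    # Source: an on-premises NFS export, read through the activated agent.
    source = client.create_location_nfs(
        ServerHostname=nfs_server,
        Subdirectory=nfs_export,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination: an S3 bucket that DataSync accesses with an IAM role.
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        Subdirectory=s3_prefix,
        S3Config={"BucketAccessRoleArn": bucket_access_role_arn},
    )
    return source["LocationArn"], destination["LocationArn"]
```

The two returned ARNs are what a task refers to as its source and destination locations.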

Task

A task describes a DataSync transfer. It identifies a source and destination location along with details about how to copy data between those locations. You can also specify how a task handles metadata, deleted files, and permissions.
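With the boto3 DataSync client, the task settings described above map to the `Options` argument of `create_task`. The following sketch builds one possible request. It preserves timestamps and POSIX ownership and permissions, keeps destination files that were deleted at the source, and verifies only what was transferred. The helper name, the chosen option values, and the exclude pattern are illustrative assumptions, not recommended defaults.

```python
def build_create_task_request(
    source_location_arn: str,
    destination_location_arn: str,
    task_name: str,
) -> dict:
    """Return keyword arguments for a DataSync create_task call."""
    return {
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        "Name": task_name,
        "Options": {
            # Metadata: keep modification and access times, plus POSIX
            # ownership (as numeric IDs) and permissions.
            "Mtime": "PRESERVE",
            "Atime": "BEST_EFFORT",
            "Uid": "INT_VALUE",
            "Gid": "INT_VALUE",
            "PosixPermissions": "PRESERVE",
            # Deleted files: keep files in the destination even if they
            # were deleted from the source.
            "PreserveDeletedFiles": "PRESERVE",
            # Copy only data that differs between the two locations.
            "TransferMode": "CHANGED",
            # After the transfer, verify only what this run transferred.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
        },
        # Example filter: skip the /tmp and /cache folders.
        "Excludes": [{"FilterType": "SIMPLE_PATTERN", "Value": "/tmp|/cache"}],
    }


# Example (requires boto3 and credentials; ARNs are placeholders):
# task_arn = client.create_task(
#     **build_create_task_request(src_arn, dst_arn, "nfs-to-s3"))["TaskArn"]
```

Building the request as a plain dictionary lets you review or test the settings before any API call is made.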

Task execution

A task execution is an individual run of a DataSync task. There are several phases involved in a task execution. For more information, see Task execution statuses.
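One way to follow an execution through its phases is to poll its status until it reaches `SUCCESS` or `ERROR`. With boto3, `start_task_execution` returns a `TaskExecutionArn`, and `describe_task_execution` returns a `Status` field. The sketch below takes the describe call as a function argument so it can run without Amazon credentials. The helper name, the polling interval, and the timeout are assumptions.

```python
import time
from typing import Callable

# Statuses after which an execution no longer changes.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_task_execution(
    describe: Callable[[str], dict],
    execution_arn: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, list[str]]:
    """Poll a task execution until it ends.

    Returns the final status and the distinct phases observed, in order.
    """
    phases: list[str] = []
    waited = 0.0
    while True:
        status = describe(execution_arn)["Status"]
        # Record each phase once, even if several polls see it.
        if not phases or phases[-1] != status:
            phases.append(status)
        if status in TERMINAL_STATUSES:
            return status, phases
        if waited >= timeout_seconds:
            raise TimeoutError(f"{execution_arn} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Example with boto3 (requires credentials; task_arn is a placeholder):
# import boto3
# client = boto3.client("datasync", region_name="cn-north-1")
# arn = client.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
# status, phases = wait_for_task_execution(
#     lambda a: client.describe_task_execution(TaskExecutionArn=a), arn)
```

Because polls can miss a short phase, the returned list shows the phases this loop observed. It is not guaranteed to include every phase the execution passed through.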

How DataSync transfers files and objects

When you start a transfer, DataSync examines your source and destination storage systems to determine what to sync. It does this by recursively scanning the contents and metadata of both systems to identify the differences between them. This can take anywhere from a few minutes to a few hours, depending on the number of files or objects involved and the performance of the storage systems.
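The following simplified Python sketch shows the idea of this scan-and-compare step. It recursively lists each tree, then treats a file as needing a copy if it is new or its size or modification time differs. DataSync's actual comparison considers more metadata and is not specified here, so the `Entry` fields and the comparison rule are assumptions made for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """The metadata this sketch compares for each file."""
    size: int
    mtime_ns: int


def scan(root: str) -> dict[str, Entry]:
    """Recursively list every file under root, keyed by its relative path."""
    listing: dict[str, Entry] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            st = os.stat(full_path)
            listing[os.path.relpath(full_path, root)] = Entry(st.st_size, st.st_mtime_ns)
    return listing


def plan_sync(source: dict[str, Entry], destination: dict[str, Entry]) -> tuple[list[str], list[str]]:
    """Return (paths to copy, paths that exist only in the destination)."""
    # New files, or files whose size or modification time differ.
    to_copy = sorted(p for p, e in source.items() if destination.get(p) != e)
    # What happens to these depends on how the task treats deleted files.
    only_in_destination = sorted(p for p in destination if p not in source)
    return to_copy, only_in_destination
```

Calling `plan_sync(scan(src_dir), scan(dst_dir))` shows why scan time grows with the number of files: every file on both sides is listed and compared before any data moves.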

DataSync then moves your data (including metadata) from the source to the destination based on how you configured the transfer. DataSync always performs data-integrity checks during a transfer. When the transfer is complete, DataSync can also verify either the entire dataset between locations or only the data that was copied. (In most cases, we recommend verifying only what was transferred.) You can also filter what to transfer.

How DataSync verifies data integrity

DataSync locally calculates the checksum of every file or object in the source and destination storage systems and compares them. DataSync also compares the metadata of every file or object in the source and destination. If either comparison finds a difference, verification fails with an error code that specifies exactly what failed. For example, you might see error codes such as Checksum failure, Metadata failure, Files were added, or Files were removed.
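To illustrate the two comparisons described above, this Python sketch checks each transferred file in two steps. It first compares checksums of the source and destination copies, then compares a little metadata (mode and whole-second modification time). SHA-256 is an arbitrary choice here, because the checksum DataSync uses isn't specified. The function names and failure labels are hypothetical and only loosely mirror DataSync's error codes.

```python
import hashlib
import os


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(src_root: str, dst_root: str, relpaths: list[str]) -> dict[str, str]:
    """Return {relative path: failure reason} for files that don't match."""
    failures: dict[str, str] = {}
    for rel in relpaths:
        src = os.path.join(src_root, rel)
        dst = os.path.join(dst_root, rel)
        if not os.path.exists(dst):
            failures[rel] = "missing at destination"
            continue
        # Content check: checksums computed on each side must match.
        if file_checksum(src) != file_checksum(dst):
            failures[rel] = "checksum mismatch"
            continue
        # Metadata check: permissions/mode and modification time.
        s, d = os.stat(src), os.stat(dst)
        if (s.st_mode, int(s.st_mtime)) != (d.st_mode, int(d.st_mtime)):
            failures[rel] = "metadata mismatch"
    return failures
```

Passing only the paths a run copied corresponds to verifying only what was transferred. Passing every path in the dataset corresponds to verifying the entire dataset.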

For more information, see Configuring how Amazon DataSync verifies data integrity.

How DataSync handles open and locked files

Keep in mind the following when trying to transfer files that are in use or locked:

  • In general, DataSync can transfer open files without any limitations.

  • If a file is open and being written to during a transfer, DataSync can detect this kind of inconsistency during the transfer task's verification phase. To get the latest version of the file, you must run the task again.

  • If a file is locked and the server prevents DataSync from opening it, DataSync skips the file during the transfer and logs an error.

  • DataSync can't lock or unlock files.
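To make these rules concrete, here is a small Python sketch of a per-file copy that behaves in the same way. If it can't open a file, it skips and logs the file instead of trying to unlock it. If a file's size or modification time changes during the copy, it flags the file so that a later run can pick up the latest version. This is an illustration under those assumptions, not how DataSync is implemented: DataSync detects such changes during verification rather than inside the copy step. The function name and return labels are hypothetical.

```python
import logging
import os
import shutil

log = logging.getLogger("transfer-sketch")


def copy_one(src: str, dst: str) -> str:
    """Copy one file; return 'copied', 'changed_during_copy', or 'skipped'."""
    try:
        before = os.stat(src)
        shutil.copyfile(src, dst)
        after = os.stat(src)
    except OSError as exc:
        # The file couldn't be opened (for example, the server holds a lock
        # on it): skip and log it rather than attempting to unlock it.
        log.error("skipping %s: %s", src, exc)
        return "skipped"
    if (before.st_size, before.st_mtime_ns) != (after.st_size, after.st_mtime_ns):
        # The file was written to while being copied; rerun to get the
        # latest version.
        return "changed_during_copy"
    return "copied"
```

Any `OSError` is treated as "skip and log" in this sketch. A real tool would distinguish a locked file from, say, a missing destination folder.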