
Configuring Amazon DataSync transfers with Amazon S3

To transfer data to or from your Amazon S3 bucket, you create an Amazon DataSync transfer location. DataSync can use this location as a source or destination for transferring data.

Important

Before you create your location, make sure that you read the following sections:

Accessing S3 buckets

DataSync needs access to the S3 bucket that you're transferring to or from. To provide this access, you must create an Amazon Identity and Access Management (IAM) role that DataSync assumes and that has the permissions required to access the bucket. You then specify this role when creating your Amazon S3 location for DataSync.

Creating an IAM role for DataSync to access your Amazon S3 location

When you create your Amazon S3 location in the console, DataSync can automatically create and assume an IAM role that typically has the permissions required to access your S3 bucket.

In some situations, you might need to create this role manually (for example, to access buckets with extra layers of security or to transfer to or from a bucket in a different Amazon Web Services account).

  1. Open the IAM console at https://console.amazonaws.cn/iam/.

  2. In the left navigation pane, under Access management, choose Roles, and then choose Create role.

  3. On the Select trusted entity page, for Trusted entity type, choose Amazon Web Service.

  4. For Use case, choose DataSync in the dropdown list, then select DataSync as the use case. Choose Next.

  5. On the Add permissions page, choose Next. Give your role a name and choose Create role.

  6. On the Roles page, search for the role that you just created and choose its name.

  7. On the role's details page, choose the Permissions tab. Choose Add permissions then Create inline policy.

  8. Choose the JSON tab and paste one of the following sample policies into the policy editor:

    Amazon S3 in Amazon Web Services Regions
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["s3:GetBucketLocation", "s3:ListBucket", "s3:ListBucketMultipartUploads"],
                "Effect": "Allow",
                "Resource": "arn:aws-cn:s3:::bucket-name"
            },
            {
                "Action": ["s3:AbortMultipartUpload", "s3:DeleteObject", "s3:GetObject", "s3:GetObjectTagging", "s3:GetObjectVersion", "s3:GetObjectVersionTagging", "s3:ListMultipartUploadParts", "s3:PutObject", "s3:PutObjectTagging"],
                "Effect": "Allow",
                "Resource": "arn:aws-cn:s3:::bucket-name/*"
            }
        ]
    }
    Amazon S3 on Outposts
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": ["s3-outposts:ListBucket", "s3-outposts:ListBucketMultipartUploads"],
                "Effect": "Allow",
                "Resource": [
                    "arn:aws-cn:s3-outposts:region:account-id:outpost/outpost-id/bucket/bucket-name",
                    "arn:aws-cn:s3-outposts:region:account-id:outpost/outpost-id/accesspoint/bucket-access-point-name"
                ]
            },
            {
                "Action": ["s3-outposts:AbortMultipartUpload", "s3-outposts:DeleteObject", "s3-outposts:GetObject", "s3-outposts:GetObjectTagging", "s3-outposts:GetObjectVersion", "s3-outposts:GetObjectVersionTagging", "s3-outposts:ListMultipartUploadParts", "s3-outposts:PutObject", "s3-outposts:PutObjectTagging"],
                "Effect": "Allow",
                "Resource": [
                    "arn:aws-cn:s3-outposts:region:account-id:outpost/outpost-id/bucket/bucket-name/*",
                    "arn:aws-cn:s3-outposts:region:account-id:outpost/outpost-id/accesspoint/bucket-access-point-name/*"
                ]
            },
            {
                "Action": "s3-outposts:GetAccessPoint",
                "Effect": "Allow",
                "Resource": "arn:aws-cn:s3-outposts:region:account-id:outpost/outpost-id/accesspoint/bucket-access-point-name"
            }
        ]
    }
  9. Choose Next. Give your policy a name and choose Create policy.

  10. (Recommended) To prevent the cross-service confused deputy problem, do the following:

    1. On the role's details page, choose the Trust relationships tab. Choose Edit trust policy.

    2. Update the trust policy by using the following example, which includes the aws:SourceArn and aws:SourceAccount global condition context keys:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {"Service": "datasync.amazonaws.com"},
                  "Action": "sts:AssumeRole",
                  "Condition": {
                      "StringEquals": {"aws:SourceAccount": "account-id"},
                      "StringLike": {"aws:SourceArn": "arn:aws-cn:datasync:region:account-id:*"}
                  }
              }
          ]
      }
    3. Choose Update policy.

You can specify this role when creating your Amazon S3 location.
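
If you prefer to script these steps, the following is a minimal sketch using the Amazon SDK for Python (Boto3). It is not the only way to set this up; the role name, policy name, bucket name, account ID, and Region are placeholder values, and the trust and permissions policies mirror the examples above.

import json

import boto3

iam = boto3.client("iam")

# Placeholder values -- replace with your own.
role_name = "datasync-s3-access-role"
bucket_arn = "arn:aws-cn:s3:::bucket-name"
account_id = "123456789012"
region = "cn-north-1"

# Trust policy that lets DataSync assume the role, scoped to your account
# and your DataSync resources to help prevent the confused deputy problem.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "datasync.amazonaws.com"},
        "Action": "sts:AssumeRole",
        "Condition": {
            "StringEquals": {"aws:SourceAccount": account_id},
            "StringLike": {"aws:SourceArn": f"arn:aws-cn:datasync:{region}:{account_id}:*"},
        },
    }],
}

# Permissions policy matching the in-Region Amazon S3 sample policy above.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": ["s3:GetBucketLocation", "s3:ListBucket", "s3:ListBucketMultipartUploads"],
            "Effect": "Allow",
            "Resource": bucket_arn,
        },
        {
            "Action": [
                "s3:AbortMultipartUpload", "s3:DeleteObject", "s3:GetObject",
                "s3:GetObjectTagging", "s3:GetObjectVersion", "s3:GetObjectVersionTagging",
                "s3:ListMultipartUploadParts", "s3:PutObject", "s3:PutObjectTagging",
            ],
            "Effect": "Allow",
            "Resource": f"{bucket_arn}/*",
        },
    ],
}

role = iam.create_role(
    RoleName=role_name,
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName=role_name,
    PolicyName="datasync-s3-access",
    PolicyDocument=json.dumps(permissions_policy),
)
print("Role ARN:", role["Role"]["Arn"])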

Accessing S3 buckets using server-side encryption

DataSync can transfer data to or from S3 buckets that use server-side encryption. The type of encryption key a bucket uses can determine if you need a custom policy allowing DataSync to access the bucket.

When using DataSync with S3 buckets that use server-side encryption, remember the following:

  • If your S3 bucket is encrypted with an Amazon managed key – DataSync can access the bucket's objects by default if all your resources are in the same Amazon Web Services account.

  • If your S3 bucket is encrypted with a customer-managed Amazon Key Management Service (Amazon KMS) key (SSE-KMS) – The key's policy must include the IAM role that DataSync uses to access the bucket.

  • If your S3 bucket is encrypted with a customer-managed SSE-KMS key and is in a different Amazon Web Services account – DataSync needs permission to access the bucket in the other Amazon Web Services account. You can set this up by including the IAM role that DataSync uses in both the bucket policy and the key policy in that account.

  • If your S3 bucket is encrypted with a customer-provided encryption key (SSE-C) – DataSync can't access this bucket.

The following example is a key policy for a customer-managed SSE-KMS key. The policy is associated with an S3 bucket that uses server-side encryption.

If you want to use this example, replace the following values with your own:

  • account-id – Your Amazon Web Services account ID.

  • admin-role-name – The name of the IAM role that can administer the key.

  • datasync-role-name – The name of the IAM role that allows DataSync to use the key when accessing the bucket.

{
    "Id": "key-consolepolicy-3",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM Permissions",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws-cn:iam::account-id:root"},
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Sid": "Allow access for Key Administrators",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws-cn:iam::account-id:role/admin-role-name"},
            "Action": ["kms:Create*", "kms:Describe*", "kms:Enable*", "kms:List*", "kms:Put*", "kms:Update*", "kms:Revoke*", "kms:Disable*", "kms:Get*", "kms:Delete*", "kms:TagResource", "kms:UntagResource", "kms:ScheduleKeyDeletion", "kms:CancelKeyDeletion"],
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws-cn:iam::account-id:role/datasync-role-name"},
            "Action": ["kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*"],
            "Resource": "*"
        },
        {
            "Sid": "Allow attachment of persistent resources",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws-cn:iam::account-id:role/datasync-role-name"},
            "Action": ["kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"],
            "Resource": "*",
            "Condition": {"Bool": {"kms:GrantIsForAWSResource": "true"}}
        }
    ]
}
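
If you manage the key policy programmatically, the following is a minimal Boto3 sketch that appends a statement like the "Allow use of the key" statement above to an existing customer-managed key policy. The key ID and role ARN are placeholder values.

import json

import boto3

kms = boto3.client("kms")

key_id = "1234abcd-12ab-34cd-56ef-1234567890ab"  # placeholder key ID
datasync_role_arn = "arn:aws-cn:iam::123456789012:role/datasync-role-name"  # placeholder

# Fetch the current key policy ("default" is the policy name that KMS uses).
policy = json.loads(kms.get_key_policy(KeyId=key_id, PolicyName="default")["Policy"])

# Let the DataSync role use the key, as in the "Allow use of the key" statement above.
policy["Statement"].append({
    "Sid": "Allow use of the key by DataSync",
    "Effect": "Allow",
    "Principal": {"AWS": datasync_role_arn},
    "Action": ["kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*", "kms:GenerateDataKey*"],
    "Resource": "*",
})

kms.put_key_policy(KeyId=key_id, PolicyName="default", Policy=json.dumps(policy))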

Accessing S3 buckets with restricted VPC access

An Amazon S3 bucket that limits access to specific virtual private cloud (VPC) endpoints or VPCs prevents DataSync from transferring to or from that bucket. To enable transfers in these situations, you can update the bucket's policy to include the IAM role that you specify with your DataSync location.

Option 1: Allowing access based on DataSync location role ARN

In the S3 bucket policy, you can specify the Amazon Resource Name (ARN) of your DataSync location IAM role.

The following example is an S3 bucket policy that denies access from all but two VPCs (vpc-1234567890abcdef0 and vpc-abcdef01234567890). However, the policy also includes the ArnNotLikeIfExists condition and aws:PrincipalArn condition key, which allow the ARN of a DataSync location role to access the bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Access-to-specific-VPCs-only",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "arn:aws-cn:s3:::bucket-name/*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "aws:SourceVpc": ["vpc-1234567890abcdef0", "vpc-abcdef01234567890"]
                },
                "ArnNotLikeIfExists": {
                    "aws:PrincipalArn": ["arn:aws-cn:iam::account-id:role/datasync-location-role-name"]
                }
            }
        }
    ]
}
Option 2: Allowing access based on DataSync location role tag

In the S3 bucket policy, you can specify a tag attached to your DataSync location IAM role.

The following example is an S3 bucket policy that denies access from all but two VPCs (vpc-1234567890abcdef0 and vpc-abcdef01234567890). However, the policy also includes the StringNotEqualsIfExists condition and aws:PrincipalTag condition key, which allow a principal with the tag key exclude-from-vpc-restriction and value true. You can try a similar approach in your bucket policy by specifying a tag attached to your DataSync location role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Access-to-specific-VPCs-only",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": "arn:aws-cn:s3:::bucket-name/*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "aws:SourceVpc": ["vpc-1234567890abcdef0", "vpc-abcdef01234567890"],
                    "aws:PrincipalTag/exclude-from-vpc-restriction": "true"
                }
            }
        }
    ]
}
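
Either bucket policy can be applied with a single call. The following is a minimal Boto3 sketch that assumes the policy JSON (such as one of the examples above, with your own values filled in) is saved in a local file; the bucket and file names are placeholders.

import boto3

s3 = boto3.client("s3")

# Load a bucket policy like one of the examples above and apply it to the bucket.
with open("bucket-policy.json") as f:
    policy_json = f.read()

s3.put_bucket_policy(Bucket="bucket-name", Policy=policy_json)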

Storage class considerations with Amazon S3 transfers

When Amazon S3 is your transfer destination, DataSync can transfer data directly into a specific Amazon S3 storage class. Some storage classes have behaviors that can affect your Amazon S3 storage costs. For more information, see Amazon S3 pricing.

Important

New objects copied to an S3 bucket are stored using the storage class that you specify when creating your Amazon S3 transfer location. DataSync won't change the storage class of existing objects in the bucket (even if that object was modified in the source location).

The following describes considerations for each Amazon S3 storage class that DataSync can use:

S3 Standard

Choose S3 Standard to store your frequently accessed files redundantly in multiple Availability Zones that are geographically separated. This is the default if you don't specify a storage class.

S3 Intelligent-Tiering

Choose S3 Intelligent-Tiering to optimize storage costs by automatically moving data to the most cost-effective storage access tier.

You pay a monthly charge per object stored in the S3 Intelligent-Tiering storage class. This Amazon S3 charge includes monitoring data access patterns and moving objects between tiers.

S3 Standard-IA

Choose S3 Standard-IA to store your infrequently accessed objects redundantly in multiple Availability Zones that are geographically separated.

Objects stored in the S3 Standard-IA storage class can incur additional charges for overwriting, deleting, or retrieving. Consider how often these objects change, how long you plan to keep these objects, and how often you need to access them. Changes to object data or metadata are equivalent to deleting an object and creating a new one to replace it. This results in additional charges for objects stored in the S3 Standard-IA storage class.

Objects less than 128 KB are smaller than the minimum capacity charge per object in the S3 Standard-IA storage class. These objects are stored in the S3 Standard storage class.

S3 One Zone-IA

Choose S3 One Zone-IA to store your infrequently accessed objects in a single Availability Zone.

Objects stored in the S3 One Zone-IA storage class can incur additional charges for overwriting, deleting, or retrieving. Consider how often these objects change, how long you plan to keep these objects, and how often you need to access them. Changes to object data or metadata are equivalent to deleting an object and creating a new one to replace it. This results in additional charges for objects stored in the S3 One Zone-IA storage class.

Objects less than 128 KB are smaller than the minimum capacity charge per object in the S3 One Zone-IA storage class. These objects are stored in the S3 Standard storage class.

S3 Glacier Instant Retrieval

Choose S3 Glacier Instant Retrieval to archive objects that are rarely accessed but require retrieval in milliseconds.

The S3 Glacier Instant Retrieval storage class offers cost savings compared to the S3 Standard-IA storage class, with the same latency and throughput performance. However, S3 Glacier Instant Retrieval has higher data access costs than S3 Standard-IA.

Objects stored in S3 Glacier Instant Retrieval can incur additional charges for overwriting, deleting, or retrieving. Consider how often these objects change, how long you plan to keep these objects, and how often you need to access them. Changes to object data or metadata are equivalent to deleting an object and creating a new one to replace it. This results in additional charges for objects stored in the S3 Glacier Instant Retrieval storage class.

Objects less than 128 KB are smaller than the minimum capacity charge per object in the S3 Glacier Instant Retrieval storage class. These objects are stored in the S3 Standard storage class.

S3 Glacier Flexible Retrieval

Choose S3 Glacier Flexible Retrieval for more active archives.

Objects stored in S3 Glacier Flexible Retrieval can incur additional charges for overwriting, deleting, or retrieving. Consider how often these objects change, how long you plan to keep these objects, and how often you need to access them. Changes to object data or metadata are equivalent to deleting an object and creating a new one to replace it. This results in additional charges for objects stored in the S3 Glacier Flexible Retrieval storage class.

Objects less than 40 KB are smaller than the minimum capacity charge per object in the S3 Glacier Flexible Retrieval storage class. These objects are stored in the S3 Standard storage class.

You must restore objects archived in this storage class before DataSync can read them. For information, see Working with archived objects in the Amazon S3 User Guide.

When using S3 Glacier Flexible Retrieval, choose the Verify only the data transferred task option to compare data and metadata checksums at the end of the transfer. You can't use the Verify all data in the destination option for this storage class because it requires retrieving all existing objects from the destination.

S3 Glacier Deep Archive

Choose S3 Glacier Deep Archive to archive your objects for long-term data retention and digital preservation where data is accessed once or twice a year.

Objects stored in S3 Glacier Deep Archive can incur additional charges for overwriting, deleting, or retrieving. Consider how often these objects change, how long you plan to keep these objects, and how often you need to access them. Changes to object data or metadata are equivalent to deleting an object and creating a new one to replace it. This results in additional charges for objects stored in the S3 Glacier Deep Archive storage class.

Objects less than 40 KB are smaller than the minimum capacity charge per object in the S3 Glacier Deep Archive storage class. These objects are stored in the S3 Standard storage class.

You must restore objects archived in this storage class before DataSync can read them. For information, see Working with archived objects in the Amazon S3 User Guide.

When using S3 Glacier Deep Archive, choose the Verify only the data transferred task option to compare data and metadata checksums at the end of the transfer. You can't use the Verify all data in the destination option for this storage class because it requires retrieving all existing objects from the destination.

S3 Outposts

The storage class for Amazon S3 on Outposts.

Evaluating S3 request costs when using DataSync

With Amazon S3 locations, you incur costs related to S3 API requests made by DataSync. This section can help you understand how DataSync uses these requests and how they might affect your Amazon S3 costs.

S3 requests made by DataSync

The following describes the S3 requests that DataSync can make when you're copying data to or from an Amazon S3 location and how DataSync uses each one.

ListObjectsV2

DataSync makes at least one LIST request for every object ending in a forward slash (/) to list the objects that start with that prefix. DataSync makes these requests during a task's preparing phase.

HeadObject

DataSync makes HEAD requests to retrieve object metadata during a task’s preparing and verifying phases. There can be multiple HEAD requests per object depending on how you want DataSync to verify the integrity of the data it transfers.

GetObject

DataSync makes GET requests to read data from an object during a task’s transferring phase. There can be multiple GET requests for large objects.

GetObjectTagging

If you configure your task to copy object tags, DataSync makes these GET requests to check for object tags during the task's preparing and transferring phases.

PutObject

DataSync makes PUT requests to create objects and prefixes in a destination S3 bucket during a task’s transferring phase. Since DataSync uses the Amazon S3 multipart upload feature, there can be multiple PUT requests for large objects.

PutObjectTagging

If your source objects have tags and you configure your task to copy object tags, DataSync makes these PUT requests when transferring those tags.

CopyObject

DataSync makes a COPY request to create a copy of an object only if that object’s metadata changes. This can happen if you originally copied data to the S3 bucket using another service or tool that didn’t carry over its metadata.

Cost considerations

DataSync makes S3 requests on S3 buckets every time you run your task. This can lead to charges adding up in certain situations. For example:

  • You’re frequently transferring objects to or from an S3 bucket.

  • You may not be transferring much data, but your S3 bucket contains a large number of objects. You can still see high charges in this scenario because DataSync makes S3 requests for each of the bucket's objects.

  • You're transferring between S3 buckets, so DataSync is making S3 requests on the source and destination.

To help minimize S3 request costs related to DataSync, consider the following:

What S3 storage classes am I using?

S3 request charges can vary based on the Amazon S3 storage class your objects are using, particularly for classes that archive objects (such as S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive).

Here are some scenarios in which storage classes can affect your S3 request charges when using DataSync:

  • Each time you run a task, DataSync makes HEAD requests to retrieve object metadata. These requests result in charges even if you aren’t moving any objects. How much these requests affect your bill depends on the storage class your objects are using along with the number of objects that DataSync scans.

  • If you moved objects into the S3 Glacier Instant Retrieval storage class (either directly or through a bucket lifecycle configuration), requests on objects in this class are more expensive than requests on objects in other storage classes.

  • If you configure your DataSync task to verify that your source and destination locations are fully synchronized, there will be GET requests for each object in all storage classes (except S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive).

  • In addition to GET requests, you incur data retrieval costs for objects in the S3 Standard-IA, S3 One Zone-IA, or S3 Glacier Instant Retrieval storage class.

For more information, see Amazon S3 pricing.
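
As a rough, hypothetical illustration: if a bucket contains 10 million objects, each task run makes at least 10 million HEAD requests during the preparing phase, and verifying all data in the destination adds further per-object requests during the verifying phase, even if no objects have changed since the last run.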

How often do I need to transfer my data?

If you need to move data on a recurring basis, consider a task schedule that doesn't run more often than you need.

You may also consider limiting the scope of your transfers. For example, you can configure DataSync to focus on objects in certain prefixes or filter what data gets transferred. These options can help reduce the number of S3 requests made each time you run your DataSync task.
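
For illustration only, here is a minimal Boto3 sketch of creating a task that runs on a schedule and limits the transfer to a prefix. The location ARNs, task name, schedule expression, and filter pattern are placeholder values; adjust them to match your own setup.

import boto3

datasync = boto3.client("datasync")

# Placeholder ARNs for existing source and destination locations.
source_arn = "arn:aws-cn:datasync:cn-north-1:123456789012:location/loc-source-example"
destination_arn = "arn:aws-cn:datasync:cn-north-1:123456789012:location/loc-destination-example"

datasync.create_task(
    SourceLocationArn=source_arn,
    DestinationLocationArn=destination_arn,
    Name="weekly-photos-sync",
    # Run once a week instead of more often than needed.
    Schedule={"ScheduleExpression": "cron(0 2 ? * SUN *)"},
    # Only transfer objects under this prefix to reduce per-object S3 requests.
    Includes=[{"FilterType": "SIMPLE_PATTERN", "Value": "/photos/2006*"}],
)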

Other considerations with Amazon S3 transfers

When using Amazon S3 with DataSync, remember the following:

  • Changes to object data or metadata are equivalent to deleting and replacing an object. These changes result in additional charges in the following scenarios:

    • When using object versioning – Changes to object data or metadata create a new version of the object.

    • When using storage classes that can incur additional charges for overwriting, deleting, or retrieving objects – Changes to object data or metadata result in such charges. For more information, see Storage class considerations with Amazon S3 transfers.

  • When using object versioning in Amazon S3, running a DataSync transfer task once might create more than one version of an Amazon S3 object.

  • DataSync might not transfer an object if it has nonstandard characters in its name. For more information, see the object key naming guidelines in the Amazon S3 User Guide.

  • To help minimize your Amazon S3 storage costs, we recommend using a lifecycle configuration to stop incomplete multipart uploads (see the sketch after this list).

  • After initially transferring data from an S3 bucket to a file system (for example, NFS or Amazon FSx), subsequent runs of the same DataSync task won't include objects that have been modified but are the same size they were during the first transfer.

  • If you're transferring from an S3 bucket, use Amazon S3 Storage Lens to figure out how many objects you're moving.

    Tip

    When transferring between S3 buckets, DataSync can't work with more than 25 million objects per task execution. If more than 25 million objects are involved, consider breaking the transfer into multiple tasks (for example, by pointing each task at a different prefix using the Folder option or by filtering what data gets transferred).
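
The lifecycle configuration mentioned in the list above (for stopping incomplete multipart uploads) can be applied with a call like the following minimal Boto3 sketch. The bucket name, rule ID, and seven-day window are placeholder choices.

import boto3

s3 = boto3.client("s3")

# Abort multipart uploads that haven't completed after 7 days (placeholder value)
# so that incomplete parts don't keep accruing storage charges.
s3.put_bucket_lifecycle_configuration(
    Bucket="bucket-name",  # placeholder
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "abort-incomplete-multipart-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    },
)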

Creating your Amazon S3 transfer location

To create the location, you need an existing S3 bucket. If you don't have one, see Getting started with Amazon S3 in the Amazon S3 User Guide.

Tip

If your S3 bucket has objects with different storage classes, learn how DataSync works with these storage classes and how this can affect your Amazon bill. For more information, see Storage class considerations with Amazon S3 transfers.

To create an Amazon S3 location
  1. Open the Amazon DataSync console at https://console.amazonaws.cn/datasync/.

  2. In the left navigation pane, expand Data transfer, then choose Locations and Create location.

  3. For Location type, choose Amazon S3.

  4. For S3 bucket, choose the bucket that you want to use as a location. (When creating your DataSync task later, you specify whether this location is a transfer source or destination.)

    If your S3 bucket is located on an Amazon Outposts resource, you must specify an Amazon S3 access point. For more information, see Managing data access with Amazon S3 access points in the Amazon S3 User Guide.

  5. For S3 storage class, choose a storage class that you want your objects to use when Amazon S3 is a transfer destination.

    For more information, see Storage class considerations with Amazon S3 transfers.

  6. (Amazon S3 on Outposts only) For Agents, specify the Amazon Resource Name (ARN) of the DataSync agent on your Outpost.

    For more information, see Deploy your agent on Amazon Outposts.

  7. For Folder, enter a prefix in the S3 bucket that DataSync reads from or writes to (depending on whether the bucket is a source or destination location).

    Warning

    DataSync can't transfer objects with a prefix that begins with a slash (/) or includes //, /./, or /../ patterns. For example:

    • /photos

    • photos//2006/January

    • photos/./2006/February

    • photos/../2006/March

  8. For IAM role, do one of the following:

    • Choose Autogenerate for DataSync to automatically create an IAM role with the permissions required to access the S3 bucket.

      If DataSync previously created an IAM role for this S3 bucket, that role is chosen by default.

    • Choose a custom IAM role that you created. For more information, see Creating an IAM role for DataSync to access your Amazon S3 location.

  9. (Optional) Choose Add tag to tag your Amazon S3 location.

    A tag is a key-value pair that helps you manage, filter, and search for your locations.

  10. Choose Create location.

Once created, you can use this location as a source or destination for your transfer.
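
You can also create the same location programmatically. The following is a minimal Boto3 sketch; the bucket ARN, prefix, storage class, and role ARN are placeholder values, and AgentArns is only needed for Amazon S3 on Outposts.

import boto3

datasync = boto3.client("datasync")

response = datasync.create_location_s3(
    S3BucketArn="arn:aws-cn:s3:::bucket-name",  # placeholder bucket
    Subdirectory="photos/2006/",  # prefix that DataSync reads from or writes to
    S3StorageClass="STANDARD_IA",  # storage class used when this location is a destination
    S3Config={
        "BucketAccessRoleArn": "arn:aws-cn:iam::123456789012:role/datasync-s3-access-role"
    },
    # AgentArns=["arn:aws-cn:datasync:cn-north-1:123456789012:agent/agent-example"],  # S3 on Outposts only
)
print("Location ARN:", response["LocationArn"])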