Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China.

Metadata copied by Amazon DataSync

How Amazon DataSync handles your file or object metadata during a transfer depends on what storage systems you're working with.

Note

DataSync doesn't copy system-level settings. For example, when copying objects, DataSync doesn't copy your storage system's encryption setting. If you're copying from an SMB share, DataSync doesn't copy the permissions you configured at the file system level.

Metadata copied between systems with similar metadata structures

DataSync preserves metadata between storage systems that have a similar metadata structure.

NFS transfers

The following table describes what metadata DataSync can copy between locations that use Network File System (NFS).

When copying between these locations:
  • NFS

  • Amazon EFS

  • Amazon FSx for Lustre

  • Amazon FSx for OpenZFS

  • Amazon FSx for NetApp ONTAP (using NFS)

DataSync can copy:
  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • User ID (UID) and group ID (GID)

  • POSIX permissions
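To spot-check a transfer between NFS-based locations, you might compare the metadata listed above on a source file and its copy. The following Python sketch is a hypothetical helper and is not part of DataSync. It reads the fields with os.stat() and compares them, but skips access timestamps because DataSync preserves those only on a best-effort basis. It assumes that both files can be reached from one machine, for example through NFS mounts.

```python
import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class PosixMetadata:
    """The NFS metadata fields listed above, as reported by os.stat()."""
    mtime_ns: int  # modification timestamp, in nanoseconds
    atime_ns: int  # access timestamp (preserved by DataSync only best-effort)
    uid: int       # numeric user ID
    gid: int       # numeric group ID
    mode: int      # permission bits (plus setuid, setgid, and sticky bits)


def read_posix_metadata(path: str) -> PosixMetadata:
    """Collect the preserved metadata fields for one file or folder."""
    st = os.stat(path)
    return PosixMetadata(
        mtime_ns=st.st_mtime_ns,
        atime_ns=st.st_atime_ns,
        uid=st.st_uid,
        gid=st.st_gid,
        mode=stat.S_IMODE(st.st_mode),
    )


def metadata_matches(source: PosixMetadata, destination: PosixMetadata) -> bool:
    """Compare everything except atime, which is only best-effort."""
    return (source.mtime_ns, source.uid, source.gid, source.mode) == (
        destination.mtime_ns,
        destination.uid,
        destination.gid,
        destination.mode,
    )
```

For example, you could pass the results of read_posix_metadata for /mnt/src/report.csv and /mnt/dst/report.csv to metadata_matches. Both mount paths are hypothetical.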

SMB transfers

The following table describes what metadata DataSync can copy between locations that use Server Message Block (SMB).

When copying between these locations:
  • SMB

  • Amazon FSx for Windows File Server

  • FSx for ONTAP (using SMB)

DataSync can copy:
  • File timestamps: access time, modification time, and creation time

  • File owner security identifier (SID)

  • Standard file attributes: read-only (R), archive (A), system (S), hidden (H), compressed (C), not content indexed (N), encrypted (E), temporary (T), offline (O), and sparse (P)

    DataSync attempts to copy the archive, compressed, and sparse attributes. If these attributes aren't applied on the destination, they're ignored during task verification.

  • NTFS discretionary access control lists (DACLs), which determine whether to grant access to an object

  • NTFS system access control lists (SACLs), which are used by administrators to log attempts to access a secured object

    Copying DACLs and SACLs requires granting specific permissions to the Windows user that DataSync uses to access your location using SMB. For more information, see the topic on creating a location for SMB, FSx for Windows File Server, or FSx for ONTAP, depending on the type of location in your transfer.
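The note about the archive, compressed, and sparse attributes can be modeled as a comparison that ignores those three bits. This Python sketch maps each letter code above to the matching Windows FILE_ATTRIBUTE_* flag from the standard stat module. The decode and attributes_match helpers are hypothetical illustrations of the verification rule, not DataSync's actual implementation.

```python
import stat

# Letter codes from the list above mapped to the Windows FILE_ATTRIBUTE_* bit
# flags that Python's stat module defines on every platform.
ATTRIBUTE_FLAGS = {
    "R": stat.FILE_ATTRIBUTE_READONLY,
    "A": stat.FILE_ATTRIBUTE_ARCHIVE,
    "S": stat.FILE_ATTRIBUTE_SYSTEM,
    "H": stat.FILE_ATTRIBUTE_HIDDEN,
    "C": stat.FILE_ATTRIBUTE_COMPRESSED,
    "N": stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED,
    "E": stat.FILE_ATTRIBUTE_ENCRYPTED,
    "T": stat.FILE_ATTRIBUTE_TEMPORARY,
    "O": stat.FILE_ATTRIBUTE_OFFLINE,
    "P": stat.FILE_ATTRIBUTE_SPARSE_FILE,
}

# Attributes DataSync only attempts to copy; verification ignores them.
BEST_EFFORT = frozenset({"A", "C", "P"})


def decode(attributes: int) -> set[str]:
    """Return the letter codes that are set in a Windows attribute bitmask."""
    return {letter for letter, flag in ATTRIBUTE_FLAGS.items() if attributes & flag}


def attributes_match(source: int, destination: int) -> bool:
    """Compare two bitmasks while ignoring the best-effort attributes."""
    return decode(source) - BEST_EFFORT == decode(destination) - BEST_EFFORT


# Archive is set only on the source, so this still counts as a match.
print(attributes_match(stat.FILE_ATTRIBUTE_READONLY | stat.FILE_ATTRIBUTE_ARCHIVE,
                       stat.FILE_ATTRIBUTE_READONLY))  # True
```

On Windows, os.stat(path).st_file_attributes returns a bitmask that you could pass to decode.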

HDFS transfers

The following table describes what metadata DataSync can copy when a transfer involves a Hadoop Distributed File System (HDFS) location.

When copying from this location:
  • HDFS

To one of these locations:
  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

DataSync can copy:
  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • POSIX permissions

Because HDFS stores file and folder user and group ownership as strings rather than numeric identifiers (such as UIDs and GIDs), DataSync applies default UID and GID values on the destination file system. For more information about default values, see Default POSIX metadata applied by DataSync.

Amazon S3 transfers

The following tables describe what metadata DataSync can copy when a transfer involves an Amazon S3 location.

To Amazon S3

When copying from one of these locations:
  • NFS

  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

To this location:
  • Amazon S3

DataSync can copy the following as Amazon S3 user metadata:
  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • User ID and group ID

  • POSIX permissions

The file metadata stored in Amazon S3 user metadata is interoperable with NFS shares on file gateways using Amazon Storage Gateway. A file gateway enables low-latency access from on-premises networks to data that was copied to Amazon S3 by DataSync. This metadata is also interoperable with FSx for Lustre.

When DataSync copies objects that contain this metadata back to an NFS server, the file metadata is restored. Restoring metadata requires granting DataSync elevated permissions on the NFS server. For more information, see Creating an NFS location for Amazon DataSync.
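The following Python sketch shows one way to read POSIX values back from an object's user metadata. When a value is missing, it falls back to the defaults that DataSync applies (see Default POSIX metadata applied by DataSync). The key names file-owner, file-group, and file-permissions, and their value formats (decimal IDs and an octal permission string), are assumptions made for illustration. Inspect your own objects to confirm what DataSync actually wrote. Falling back field by field is also a simplifying assumption.

```python
DEFAULT_UID = 65534
DEFAULT_GID = 65534
DEFAULT_FILE_MODE = 0o644
DEFAULT_FOLDER_MODE = 0o755

# ASSUMPTION: illustrative key names, not a documented DataSync schema.
OWNER_KEY = "file-owner"
GROUP_KEY = "file-group"
PERMISSIONS_KEY = "file-permissions"


def posix_from_user_metadata(
    metadata: dict[str, str], is_folder: bool = False
) -> tuple[int, int, int]:
    """Return (uid, gid, mode) for restoring an object to an NFS location.

    Missing keys fall back to the default POSIX values (a per-field
    simplification assumed for this sketch).
    """
    uid = int(metadata.get(OWNER_KEY, DEFAULT_UID))
    gid = int(metadata.get(GROUP_KEY, DEFAULT_GID))
    default_mode = DEFAULT_FOLDER_MODE if is_folder else DEFAULT_FILE_MODE
    raw_mode = metadata.get(PERMISSIONS_KEY)
    # ASSUMPTION: permissions are stored as an octal string such as "0644".
    mode = int(raw_mode, 8) if raw_mode is not None else default_mode
    return uid, gid, mode
```

With boto3, head_object(...)["Metadata"] returns an object's user metadata with the x-amz-meta- prefix removed, which is the shape this function expects.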

Between HDFS and Amazon S3

When copying between these locations:
  • Hadoop Distributed File System (HDFS)

  • Amazon S3

DataSync can copy the following as Amazon S3 user metadata:
  • File and folder modification timestamps

  • File and folder access timestamps (DataSync can only do this on a best-effort basis)

  • User ID and group ID

  • POSIX permissions

HDFS uses strings to store file and folder user and group ownership, rather than numeric identifiers, such as UIDs and GIDs.

Between object storage and Amazon S3

When copying between these locations:
  • Object storage

  • Amazon S3

DataSync can copy:
  • User-defined object metadata

  • Object tags

  • The following system-defined object metadata:

    • Content-Disposition

    • Content-Encoding

    • Content-Language

    • Content-Type

    Note: DataSync copies system metadata for all objects during an initial transfer. If you configure your task to transfer only data that has changed, DataSync won't copy system metadata in subsequent transfers unless an object's content or user metadata has also been modified.

DataSync doesn't copy other object metadata, such as object access control lists (ACLs) or prior object versions.
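The system metadata rules above can be sketched as a small filter. In this hypothetical Python helper, headers holds a source object's metadata and the flags describe the transfer. The helper returns only the four system-defined fields listed above. On a changed-data-only run where neither the object's content nor its user metadata changed, it returns an empty dictionary. The sketch assumes that header names use the capitalization shown above.

```python
SYSTEM_METADATA_FIELDS = (
    "Content-Disposition",
    "Content-Encoding",
    "Content-Language",
    "Content-Type",
)


def system_metadata_to_copy(
    headers: dict[str, str],
    *,
    initial_transfer: bool,
    content_changed: bool = False,
    user_metadata_changed: bool = False,
) -> dict[str, str]:
    """Return the system-defined metadata that this model would copy for one object."""
    # On later changed-data-only runs, a system metadata change alone is not copied.
    if not (initial_transfer or content_changed or user_metadata_changed):
        return {}
    # Any other system-defined headers are skipped in this model.
    return {name: headers[name] for name in SYSTEM_METADATA_FIELDS if name in headers}
```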

Important: If you're transferring objects from a Google Cloud Storage bucket, copying object tags may cause your DataSync task to fail. To prevent this, deselect the Copy object tags option when configuring your task settings. For more information, see Managing how Amazon DataSync transfers files, objects, and metadata.
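If you create tasks through the DataSync API instead of the console, the Copy object tags setting corresponds to the ObjectTags task option, which accepts PRESERVE or NONE. This Python sketch builds an Options dictionary that turns off tag copying for a Google Cloud Storage source. The function name and the source_is_gcs flag are hypothetical.

```python
def build_task_options(source_is_gcs: bool) -> dict[str, str]:
    """Return a DataSync task Options mapping (hypothetical helper).

    ObjectTags and TransferMode are DataSync task options; the values
    chosen here are only an example.
    """
    return {
        # Skip object tags for Google Cloud Storage sources to avoid task failures.
        "ObjectTags": "NONE" if source_is_gcs else "PRESERVE",
        # Copy only data that changed since the last run.
        "TransferMode": "CHANGED",
    }


options = build_task_options(source_is_gcs=True)
print(options["ObjectTags"])  # NONE
```

You could pass the result as the Options argument of the boto3 DataSync client's create_task or update_task call, along with your own location ARNs.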

Metadata copied between systems with different metadata structures

When copying between storage systems that don't have a similar metadata structure, DataSync handles metadata using the following rules.

When copying from this location:
  • SMB

To one of these locations:
  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

  • Amazon S3

Or when copying from one of these locations:
  • FSx for Windows File Server

  • FSx for ONTAP (using SMB)

To this location:
  • NFS

DataSync applies default POSIX metadata to all files and folders on the destination file system or to objects in the destination S3 bucket. This approach includes using the default POSIX user ID and group ID values. Windows-based metadata (such as ACLs) is not preserved.

When copying from one of these locations:
  • FSx for Windows File Server

  • FSx for ONTAP (using SMB)

To this location:
  • HDFS

DataSync copies file and folder timestamps from the source location. The file or folder owner is set based on the HDFS user or Kerberos principal you specified when creating the HDFS location. The Groups Mapping configuration on the Hadoop cluster determines the group.

When copying from one of these locations:
  • Amazon EFS

  • FSx for Lustre

  • FSx for OpenZFS

  • FSx for ONTAP (using NFS)

  • Amazon S3

To this location:
  • SMB

Or when copying from one of these locations:
  • NFS

  • HDFS

To one of these locations:
  • FSx for Windows File Server

  • FSx for ONTAP (using SMB)

DataSync copies file and folder timestamps from the source location. Ownership is set based on the Windows user specified in DataSync to access the SMB share or Amazon FSx file system. Permissions are inherited from the parent directory.
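The rows of the preceding table can also be expressed as data, which may help when you script checks across many location pairs. This hypothetical Python sketch encodes each row as a (sources, destinations, rule) entry. The location labels mirror the table, and a result of None means that the table doesn't list that pair. It illustrates the table only and is not a DataSync API.

```python
from enum import Enum
from typing import Optional


class Rule(Enum):
    DEFAULT_POSIX = "Default POSIX metadata; Windows-based metadata not preserved"
    HDFS_OWNERSHIP = "Source timestamps; owner from the HDFS user or Kerberos principal"
    WINDOWS_OWNERSHIP = "Source timestamps; owner from the Windows user; permissions inherited"


AWS_POSIX = frozenset({
    "Amazon EFS", "FSx for Lustre", "FSx for OpenZFS",
    "FSx for ONTAP (using NFS)", "Amazon S3",
})
FSX_SMB = frozenset({"FSx for Windows File Server", "FSx for ONTAP (using SMB)"})

# One entry per table row: (source locations, destination locations, rule).
RULES = (
    (frozenset({"SMB"}), AWS_POSIX, Rule.DEFAULT_POSIX),
    (FSX_SMB, frozenset({"NFS"}), Rule.DEFAULT_POSIX),
    (FSX_SMB, frozenset({"HDFS"}), Rule.HDFS_OWNERSHIP),
    (AWS_POSIX, frozenset({"SMB"}), Rule.WINDOWS_OWNERSHIP),
    (frozenset({"NFS", "HDFS"}), FSX_SMB, Rule.WINDOWS_OWNERSHIP),
)


def metadata_rule(source: str, destination: str) -> Optional[Rule]:
    """Look up the rule that the table assigns to a source/destination pair."""
    for sources, destinations, rule in RULES:
        if source in sources and destination in destinations:
            return rule
    return None  # pair not listed in the table
```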

Default POSIX metadata applied by DataSync

When your source and destination locations don't have a similar metadata structure, or when source metadata is missing, DataSync applies default POSIX metadata.

Specifically, DataSync applies default POSIX metadata in these situations:

  • When transferring from Amazon S3 or object storage to Amazon EFS, FSx for Lustre, FSx for OpenZFS, FSx for ONTAP (using NFS), NFS, or HDFS, if the objects don't have DataSync POSIX metadata

  • When transferring from SMB to NFS, HDFS, Amazon S3, FSx for Lustre, FSx for OpenZFS, FSx for ONTAP (using NFS), or Amazon EFS

DataSync applies the following default POSIX metadata and permissions:

  • UID: 65534

  • GID: 65534

  • Folder permissions: 0755

  • File permissions: 0644
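The following Python sketch renders these defaults as the mode strings that ls -l displays and applies the default permission bits to a path. The helper names are hypothetical. Setting the default UID and GID with os.chown usually requires root privileges, so the sketch changes only permissions. On many Linux distributions, ID 65534 belongs to the unprivileged nobody user.

```python
import os
import stat

DEFAULT_UID = 65534
DEFAULT_GID = 65534
DEFAULT_FOLDER_MODE = 0o755  # rwxr-xr-x
DEFAULT_FILE_MODE = 0o644    # rw-r--r--


def default_mode(is_folder: bool) -> int:
    """Return the default permission bits for a folder or a file."""
    return DEFAULT_FOLDER_MODE if is_folder else DEFAULT_FILE_MODE


def describe_default(is_folder: bool) -> str:
    """Format the defaults like an ls -l listing plus uid:gid."""
    file_type = stat.S_IFDIR if is_folder else stat.S_IFREG
    return f"{stat.filemode(file_type | default_mode(is_folder))} {DEFAULT_UID}:{DEFAULT_GID}"


def apply_default_permissions(path: str) -> None:
    """Set the default permission bits on an existing file or folder.

    Ownership is left unchanged because os.chown to 65534 usually needs root.
    """
    os.chmod(path, default_mode(os.path.isdir(path)))


print(describe_default(True))   # drwxr-xr-x 65534:65534
print(describe_default(False))  # -rw-r--r-- 65534:65534
```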

HDFS stores file and folder user and group ownership using strings rather than numeric identifiers (such as UIDs and GIDs). When there's no equivalent metadata on the source location, file and folder ownership is set based on the HDFS user or Kerberos principal that you specified when creating the DataSync location. The group is determined by the Groups Mapping configuration on the Hadoop cluster.