Checking object integrity for data at rest in Amazon S3
If you need to verify the content of datasets stored in Amazon S3, the S3 Batch Operations Compute checksum operation calculates either full object or composite checksums for objects at rest. The Compute checksum operation uses Batch Operations to asynchronously calculate the checksum values for a group of objects and automatically generates a consolidated integrity report, without creating new copies of your data and without restoring or downloading any data.
With the Compute checksum operation, you can efficiently verify billions of objects with a single job request. For each Compute checksum job request, S3 calculates the checksum values and includes them in an automatically generated integrity report (also known as a completion report). You can then use the completion report to validate the integrity of your dataset.
The Compute checksum operation works with any object stored in S3, regardless of storage class or object size. Whether you need to verify your objects as a data preservation best practice or to meet compliance requirements, the Compute checksum operation reduces the cost, time, and effort required for data validation by performing checksum calculations at rest. For information about Compute checksum pricing, see Amazon S3 pricing.
You can compare the checksum values in the generated completion report against the checksum values that you've stored in your databases to verify that your datasets remain intact over time. This approach helps you maintain end-to-end data integrity for business and compliance needs. For example, you can use the Compute checksum operation to submit a list of objects stored in S3 Glacier storage classes for annual security audits. Additionally, the range of supported checksum algorithms allows you to maintain continuity with the algorithms that are used in your applications.
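The comparison described above can be sketched in Python. This is a minimal illustration, not an official tool: it assumes that you have already parsed the completion report into a mapping of object key to checksum string, and the function name `find_integrity_issues` and the sample values are hypothetical.

```python
def find_integrity_issues(stored, reported):
    """Compare previously stored checksums with values from a completion report.

    Both arguments map an object key to its checksum string. Returns the keys
    whose checksums differ and the keys that are missing from the report.
    """
    mismatched = sorted(
        key for key, checksum in stored.items()
        if key in reported and reported[key] != checksum
    )
    missing = sorted(key for key in stored if key not in reported)
    return {"mismatched": mismatched, "missing": missing}


# Hypothetical checksum values, for illustration only.
stored = {"a.txt": "AAAA", "b.txt": "BBBB", "c.txt": "CCCC"}
reported = {"a.txt": "AAAA", "b.txt": "XXXX"}
issues = find_integrity_issues(stored, reported)
print(issues)
```

Keys listed under `mismatched` or `missing` are the objects to investigate. An empty result means that every stored checksum matched the report.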
Using supported checksum algorithms
For data at rest, you can calculate both the full object and composite checksum types in Amazon S3, using any of the supported checksum algorithms:
- CRC-64/NVME (CRC64NVME)
- CRC-32 (CRC32)
- CRC-32C (CRC32C)
- SHA-1 (SHA1)
- SHA-256 (SHA256)
- MD5 (MD5)
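If you want reference values to compare against, some of these algorithms can be computed locally with the Python standard library. The following is a sketch under two assumptions: that the checksum values you compare against are base64-encoded (as S3 checksum values generally are), and that you hold the full object bytes. The helper name `local_checksum` is hypothetical. CRC-32C and CRC-64/NVME are not in the standard library, so the sketch rejects them.

```python
import base64
import hashlib
import zlib


def local_checksum(data: bytes, algorithm: str) -> str:
    """Return the base64-encoded checksum of data for a stdlib-supported algorithm."""
    if algorithm == "CRC32":
        # CRC-32 is a 32-bit value; encode it as 4 big-endian bytes.
        raw = zlib.crc32(data).to_bytes(4, "big")
    elif algorithm in ("SHA1", "SHA256", "MD5"):
        raw = hashlib.new(algorithm.lower(), data).digest()
    else:
        # CRC32C and CRC64NVME need a third-party implementation.
        raise ValueError(f"no standard-library implementation for {algorithm}")
    return base64.b64encode(raw).decode("ascii")


for name in ("CRC32", "SHA1", "SHA256", "MD5"):
    print(name, local_checksum(b"hello", name))
```

This produces a full object value for data that you hold in one piece. It does not reproduce composite checksums, which depend on how the object was split into parts.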
Full object and composite checksum types
Amazon S3 supports the following full object and composite checksum algorithm types:
- CRC-64/NVME (CRC64NVME): Supports the full object checksum type only.
- CRC-32 (CRC32): Supports both full object and composite checksum types.
- CRC-32C (CRC32C): Supports both full object and composite checksum types.
- SHA-1 (SHA1): Supports both full object and composite checksum types.
- SHA-256 (SHA256): Supports both full object and composite checksum types.
- MD5 (MD5): Supports both full object and composite checksum types.
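The support list above can be encoded as a small lookup so that a script rejects an unsupported combination before it submits a job. This is a sketch only: the names `SUPPORTED_TYPES` and `is_supported` are hypothetical, `FULL_OBJECT` is the value used in the CLI example on this page, and `COMPOSITE` is assumed to be the matching identifier for the composite type.

```python
# Checksum types that each algorithm supports, transcribed from the list above.
SUPPORTED_TYPES = {
    "CRC64NVME": {"FULL_OBJECT"},
    "CRC32": {"FULL_OBJECT", "COMPOSITE"},
    "CRC32C": {"FULL_OBJECT", "COMPOSITE"},
    "SHA1": {"FULL_OBJECT", "COMPOSITE"},
    "SHA256": {"FULL_OBJECT", "COMPOSITE"},
    "MD5": {"FULL_OBJECT", "COMPOSITE"},
}


def is_supported(algorithm: str, checksum_type: str) -> bool:
    """Return True if the algorithm supports the requested checksum type."""
    return checksum_type in SUPPORTED_TYPES.get(algorithm, set())


print(is_supported("CRC64NVME", "FULL_OBJECT"))
print(is_supported("CRC64NVME", "COMPOSITE"))
```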
Using Compute checksum
For objects stored in Amazon S3, you can use the Compute checksum operation with S3 Batch Operations to check the content of stored data at rest. You can create a Compute checksum Batch Operations job by using the Amazon S3 console, Amazon Command Line Interface (Amazon CLI), REST API, or Amazon SDK. When the Compute checksum job finishes, you receive a completion report. For more information about how to use the completion report, see Tracking job status and completion reports.
Before creating your Compute checksum job, you must create an S3 Batch Operations Amazon Identity and Access Management (IAM) role to grant Amazon S3 permissions to perform actions on your behalf. You’ll need to grant permissions to read the manifest file and to write a completion report to the S3 bucket. For more information, see Compute checksums.
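As a rough illustration of the role described above, the following Python sketch builds a trust policy and a permissions policy as plain dictionaries. It is an assumption-laden starting point, not the definitive policy: it covers only the two permissions named in this section (reading the manifest and writing the completion report), it assumes the S3 Batch Operations service principal `batchoperations.s3.amazonaws.com`, and the function name and ARNs are hypothetical. Check the linked permissions page for the object-level permissions that the Compute checksum operation itself requires.

```python
import json


def build_role_policies(manifest_object_arn, report_bucket_arn, report_prefix):
    """Return (trust_policy, permissions_policy) dictionaries for the job role."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed service principal for S3 Batch Operations.
                "Principal": {"Service": "batchoperations.s3.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the manifest object (and a specific version, if used).
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectVersion"],
                "Resource": manifest_object_arn,
            },
            {
                # Write the completion report under the chosen prefix.
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"{report_bucket_arn}/{report_prefix}/*",
            },
        ],
    }
    return trust_policy, permissions_policy


trust, permissions = build_role_policies(
    "arn:aws:s3:::my-manifest-bucket/manifest.csv",  # hypothetical ARN
    "arn:aws:s3:::my-report-bucket",                 # hypothetical ARN
    "batch-op-reports",
)
print(json.dumps(permissions, indent=2))
```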
To use the Compute checksum operation
- Sign in to the Amazon Web Services Management Console and open the Amazon S3 console at https://console.amazonaws.cn/s3/.
- In the navigation bar at the top of the page, choose the name of the currently displayed Amazon Web Services Region. Next, choose the Region in which you want to create your job.
  Note
  For copy operations, you must create the job in the same Region as the destination bucket. For all other operations, you must create the job in the same Region as the objects in the manifest.
- In the left navigation pane of the Amazon S3 console, choose Batch Operations.
- Choose Create job.
- View the Amazon Web Services Region where you want to create your job.
- Under Manifest format, choose the type of manifest object to use.
  - If you choose S3 inventory report (manifest.json), enter the path to the manifest.json object and, optionally, the Manifest object version ID if you want to use a specific object version. Alternatively, you can choose Browse S3 and choose the manifest JSON file, which automatically populates all the manifest object fields.
  - If you choose CSV, choose the Manifest location type. Then, either enter the path to a CSV-formatted manifest object or choose Browse S3 to select a manifest object. The manifest object must follow the format described in the console. If you want to use a specific version of the manifest object, you can also specify the object version ID.
  - If you choose Create manifest using S3 Replication configuration, a list of objects is generated by using the replication configuration and optionally saved to the destination that you choose. When you use a replication configuration to generate the manifest, the only available operation is Replicate.
- Choose Next.
- Under Operation, choose the Compute checksum operation to calculate the checksums on all objects listed in the manifest. Choose the Checksum type and Checksum function for your job. Then, choose Next.
- On the Configure additional options page, fill out the information for your Compute checksum job.
  Note
  Under Completion report, make sure to confirm the acknowledgement statement. This statement confirms that you understand that the completion report contains checksum values, which are used to verify the integrity of data stored in Amazon S3. Therefore, share the completion report with caution. Also, if you specify a bucket owned by another account as the location for your completion report, make sure to specify the Amazon Web Services account ID of the external bucket owner.
- Choose Next.
- On the Review page, review and confirm your settings.
- (Optional) If you need to make changes, choose Previous to go back to the previous page, or choose Edit to update a specific step.
After you've confirmed your changes, choose Create job.
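For the CSV option in the Manifest format step, a manifest can be generated with a short script. The sketch below assumes a layout of one object per row with bucket, key, and an optional version ID, matching the `Bucket` and `Key` fields in the CLI example on this page, and it assumes that keys are URL-encoded. Confirm both assumptions against the format described in the console. The function name `build_csv_manifest` is hypothetical.

```python
import csv
import io
from urllib.parse import quote


def build_csv_manifest(objects):
    """Build CSV manifest text from (bucket, key) or (bucket, key, version_id) tuples."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for entry in objects:
        bucket, key, *version = entry
        # Assumption: keys are URL-encoded, with "/" left as-is.
        writer.writerow([bucket, quote(key, safe="/"), *version])
    return buffer.getvalue()


manifest = build_csv_manifest([
    ("amzn-s3-demo-source-bucket", "photos/my file.txt"),
    ("amzn-s3-demo-source-bucket", "logs/2023-09-01.gz"),
])
print(manifest, end="")
```

Keep every row the same shape: either all rows include a version ID or none do.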
To list and monitor the progress of all Compute checksum requests
- Sign in to the Amazon Web Services Management Console and open the Amazon S3 console at https://console.amazonaws.cn/s3/.
- In the left navigation pane, choose Batch Operations.
- On the Batch Operations page, you can review job details such as the job priority, job completion rate, and total objects.
- If you want to manage or clone a specific Compute checksum job, choose the Job ID to review additional job information.
- On the specific Compute checksum job page, review the job details.
Each Batch Operations job progresses through different job statuses. You can also enable Amazon CloudTrail events in the S3 console to receive alerts on any job state changes. For active jobs, you can review the running job and completion rate on the Job details page.
You can use the create-job command to create a new Batch Operations job and to provide the list of objects. Then, specify the checksum algorithm and checksum type, and the destination bucket where you want to save the Compute checksum report. The following example creates an S3 Batch Operations Compute checksum job by using an S3 generated manifest for the Amazon Web Services account 111122223333.
To use this command, replace the user input placeholders with your own information:
aws s3control create-job \
    --account-id 111122223333 \
    --manifest-generator '{
        "S3JobManifestGenerator": {
            "ExpectedBucketOwner": "111122223333",
            "SourceBucket": "arn:aws:s3:::amzn-s3-demo-source-bucket",
            "EnableManifestOutput": true,
            "ManifestOutputLocation": {
                "ExpectedManifestBucketOwner": "111122223333",
                "Bucket": "arn:aws:s3:::amzn-s3-demo-manifest-bucket",
                "ManifestPrefix": "prefix",
                "ManifestFormat": "S3InventoryReport_CSV_20211130"
            },
            "Filter": {
                "CreatedAfter": "2023-09-01",
                "CreatedBefore": "2023-10-01",
                "KeyNameConstraint": {
                    "MatchAnyPrefix": ["prefix"],
                    "MatchAnySuffix": ["suffix"]
                },
                "ObjectSizeGreaterThanBytes": 100,
                "ObjectSizeLessThanBytes": 200,
                "MatchAnyStorageClass": ["STANDARD", "STANDARD_IA"]
            }
        }
    }' \
    --operation '{"S3ComputeObjectChecksum":{"ChecksumAlgorithm":"CRC64NVME","ChecksumType":"FULL_OBJECT"}}' \
    --report '{"Bucket":"arn:aws:s3:::my-report-bucket","Format":"Report_CSV_20180820","Enabled":true,"Prefix":"batch-op-reports/","ReportScope":"AllTasks","ExpectedBucketOwner":"111122223333"}' \
    --priority 10 \
    --role-arn arn:aws:iam::123456789012:role/S3BatchJobRole \
    --client-request-token 6e023a7e-4820-4654-8c81-7247361aeb73 \
    --description "Compute object checksums" \
    --region us-west-2
A job takes either a manifest or a manifest generator, not both. To run the job against an existing CSV manifest instead, replace the --manifest-generator option with a --manifest option such as the following:
    --manifest '{"Spec":{"Format":"S3BatchOperations_CSV_20180820","Fields":["Bucket","Key"]},"Location":{"ObjectArn":"arn:aws:s3:::my-manifest-bucket/manifest.csv","ETag":"e0e8bfc50e0f0c5d5a1a5f0e0e8bfc50"}}' \
After you submit the Compute checksum job, you receive the job ID as a response and it appears on the S3 Batch Operations list page. Amazon S3 processes the list of objects and calculates checksums for each object. After the job finishes, S3 provides a consolidated Compute checksum report at the specified destination.
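If you assemble the request in a script instead of typing the JSON by hand, the structure of the create-job options can be built as a dictionary. The following Python sketch mirrors the option values from the CLI example on this page. The function name `build_compute_checksum_job` and its parameters are hypothetical, and whether a given SDK accepts exactly this shape is an assumption to verify against that SDK's reference.

```python
import json
import uuid


def build_compute_checksum_job(account_id, role_arn, source_bucket_arn,
                               report_bucket_arn, algorithm="CRC64NVME",
                               checksum_type="FULL_OBJECT"):
    """Return a dictionary that mirrors the create-job options shown above."""
    return {
        "AccountId": account_id,
        "Operation": {
            "S3ComputeObjectChecksum": {
                "ChecksumAlgorithm": algorithm,
                "ChecksumType": checksum_type,
            }
        },
        # Use a generated manifest; do not also supply a "Manifest" entry.
        "ManifestGenerator": {
            "S3JobManifestGenerator": {
                "ExpectedBucketOwner": account_id,
                "SourceBucket": source_bucket_arn,
                "EnableManifestOutput": False,
            }
        },
        "Report": {
            "Bucket": report_bucket_arn,
            "Format": "Report_CSV_20180820",
            "Enabled": True,
            "Prefix": "batch-op-reports/",
            "ReportScope": "AllTasks",
            "ExpectedBucketOwner": account_id,
        },
        "Priority": 10,
        "RoleArn": role_arn,
        # A fresh token per logical request keeps retries idempotent.
        "ClientRequestToken": str(uuid.uuid4()),
        "Description": "Compute object checksums",
    }


job = build_compute_checksum_job(
    "111122223333",
    "arn:aws:iam::123456789012:role/S3BatchJobRole",
    "arn:aws:s3:::amzn-s3-demo-source-bucket",
    "arn:aws:s3:::my-report-bucket",
)
print(json.dumps(job, indent=2))
```

Reuse the same client request token when you retry a request that might already have been accepted, and generate a new one for each distinct job.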
To monitor the progress of your Compute checksum job, use the describe-job command. To use this command, replace the user input placeholders with your own information.
For example:
aws s3control describe-job --account-id 111122223333 --job-id 1234567890abcdef0
To obtain a list of all Active and Complete Batch Operations jobs, see Listing jobs or list-jobs.
You can send REST requests to the CreateJob API operation to verify object integrity with Compute checksum. You can monitor the progress of Compute checksum requests by sending REST requests to the DescribeJob API operation. Each Batch Operations job progresses through the following statuses:
- NEW
- PREPARING
- READY
- ACTIVE
- PAUSING
- PAUSED
- COMPLETE
- CANCELLING
- FAILED
The API response notifies you of the current job state.
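A monitoring script typically polls the job status until the job stops changing. The following Python sketch shows that loop with the status lookup passed in as a function, so it runs without credentials. The names `wait_for_job` and `TERMINAL_STATUSES` are hypothetical, and treating `CANCELLED` as a final status is an assumption, because the list above shows only `CANCELLING`.

```python
import time

# Statuses after which the job no longer progresses.
# CANCELLED is assumed here; it does not appear in the list above.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "CANCELLED"}


def wait_for_job(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = get_status().upper()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job still running after {max_polls} polls")


# Simulated status sequence standing in for repeated DescribeJob calls.
simulated = iter(["NEW", "PREPARING", "READY", "ACTIVE", "COMPLETE"])
final = wait_for_job(lambda: next(simulated), sleep=lambda seconds: None)
print(final)
```

In real use, `get_status` would wrap a DescribeJob request and return the status field from the response. A job in `PAUSED` status stays there until someone acts on it, so the poll limit guards against waiting indefinitely.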