元数据表查询示例 - Amazon Simple Storage Service
Amazon Web Services 文档中描述的 Amazon Web Services 服务或功能可能因区域而异。要查看适用于中国区域的差异,请参阅 中国的 Amazon Web Services 服务入门 (PDF)

元数据表查询示例

以下示例说明如何使用标准 SQL 查询从 S3 元数据表中获取不同类型的信息。

使用这些示例时请记住:

  • 这些示例是为与 Amazon Athena 结合使用而编写的。您可能需要修改示例才能使用其它查询引擎。

  • 确保您了解如何优化查询

  • amzn-s3-demo-bucket 替换为存储元数据表的 S3 表存储桶的名称。

  • my_metadata_table 替换为您正在查询的元数据表的名称。

  • 有关支持的列的完整列表,请参阅 S3 元数据表架构

按文件扩展名查找对象

以下查询返回具有特定文件扩展名(在本例中为 .jpg)的对象。

SELECT key FROM "s3tablescatalog/amzn-s3-demo-bucket"."aws_s3_metadata"."my_metadata_table" WHERE key LIKE '%.jpg' AND record_type = 'CREATE'

列出对象删除操作

以下查询返回对象删除事件,包括发出请求的 Amazon Web Services 账户 ID 或 Amazon 服务主体。

SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, requester, source_ip_address, version_id FROM "s3tablescatalog/amzn-s3-demo-bucket"."aws_s3_metadata"."my_metadata_table" WHERE record_type = 'DELETE';

列出您的对象使用的 Amazon KMS 加密密钥

以下查询返回加密对象的 Amazon Key Management Service(Amazon KMS)密钥的 ARN。

SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/amzn-s3-demo-bucket"."aws_s3_metadata"."my_metadata_table";

列出不使用 KMS 密钥的对象

以下查询返回未使用 Amazon KMS 密钥加密的对象。

SELECT DISTINCT kms_key_arn FROM "s3tablescatalog/amzn-s3-demo-bucket"."aws_s3_metadata"."my_metadata_table" WHERE encryption_status NOT IN ('SSE-KMS', 'DSSE-KMS') AND record_type = 'CREATE';

查看 Amazon Bedrock 提供的元数据

某些 Amazon 服务(例如 Amazon Bedrock)将对象上传到 Amazon S3。您可以查询这些服务提供的对象元数据。例如,以下查询包含用于确定 Amazon Bedrock 是否有对象上传到通用存储桶的 user_metadata 列。

SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, user_metadata FROM "s3tablescatalog/amzn-s3-demo-bucket"."aws_s3_metadata"."my_metadata_table" WHERE record_type = 'CREATE' AND user_metadata['content-source'] = 'AmazonBedrock';

如果 Amazon Bedrock 将对象上传到存储桶,则 user_metadata 列将在查询结果中显示与该对象关联的以下元数据:

user_metadata {content-additional-params -> requestid="CVK8FWYRW0M9JW65", signedContentSHA384="38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b", content-model-id -> bedrock-model-arn, content-source -> AmazonBedrock}

了解对象的当前状态

以下查询有助于您确定对象的当前状态。该查询可识别每个对象的最新版本,筛选掉已删除的对象,并根据序列号标记每个对象的最新版本。结果按 bucketkeysequence_number 列排序。

WITH records_of_interest as ( -- Start with a query that can narrow down the records of interest. SELECT * from "s3tablescatalog/amzn-s3-demo-bucket"."aws_s3_metadata"."my_metadata_table" ), version_stacks as ( SELECT *, -- Introduce a column called 'next_sequence_number', which is the next larger -- sequence_number for the same key version_id in sorted order. LEAD(sequence_number, 1) over (partition by (bucket, key, coalesce(version_id, '')) order by sequence_number ASC) as next_sequence_number from records_of_interest ), -- Pick the 'tip' of each version stack triple: (bucket, key, version_id). -- The tip of the version stack is the row of that triple with the largest sequencer. -- Selecting only the tip filters out any row duplicates. -- This isn't typical, but some events can be delivered more than once to the table -- and include rows that might no longer exist in the bucket (since the -- table contains rows for both extant and extinct objects). -- In the next subquery, eliminate the rows that contain deleted objects. current_versions as ( SELECT * from version_stacks where next_sequence_number is NULL ), -- Eliminate the rows that are extinct from the bucket by filtering with -- record_type. An object version has been deleted from the bucket if its tip is -- record_type==DELETE. existing_current_versions as ( SELECT * from current_versions where not (record_type = 'DELETE' and is_delete_marker = FALSE) ), -- Optionally, to determine which of several object versions is the 'latest', -- you can compare their sequence numbers. A version_id is the latest if its -- tip's sequencer is the largest among all other tips in the same key. with_is_latest as ( SELECT *, -- Determine if the sequence_number of this row is the same as the largest sequencer for the key that still exists. sequence_number = (MAX(sequence_number) over (partition by (bucket, key))) as is_latest_version FROM existing_current_versions ) SELECT * from with_is_latest ORDER BY bucket, key, sequence_number;