

# 元数据表查询示例
<a name="metadata-tables-example-queries"></a>

以下示例说明如何使用标准 SQL 查询从 S3 元数据表中获取不同类型的信息。

使用这些示例时请记住：
+ 这些示例是为与 Amazon Athena 结合使用而编写的。您可能需要修改示例才能使用其它查询引擎。
+ 确保您了解如何[优化查询](metadata-tables-optimizing-query-performance.md)。
+ 将 `b_general-purpose-bucket-name` 替换为命名空间的名称。
+ 有关支持的列的完整列表，请参阅 [S3 元数据日记表架构](metadata-tables-schema.md)和 [S3 元数据实时清单表架构](metadata-tables-inventory-schema.md)。

**Contents**
+ [日记表示例查询](#metadata-tables-example-queries-journal-tables)
  + [按文件扩展名查找对象](#metadata-tables-example-query-object-pattern)
  + [列出对象删除操作](#metadata-tables-example-query-delete-events)
  + [列出您的对象使用的 Amazon KMS 加密密钥](#metadata-tables-example-query-objects-using-kms-key)
  + [列出不使用 KMS 密钥的对象](#metadata-tables-example-query-objects-not-using-kms-key)
  + [列出过去 7 天内用于 `PUT` 操作的 Amazon KMS 加密密钥](#metadata-tables-example-query-objects-using-kms-key-puts)
  + [列出 S3 生命周期在过去 24 小时内删除的对象](#metadata-tables-example-query-objects-deleted-lifecycle)
  + [查看 Amazon Bedrock 提供的元数据](#metadata-tables-example-query-bedrock)
  + [了解对象的当前状态](#metadata-tables-example-query-current-state)
+ [清单表示例查询](#metadata-tables-example-queries-inventory-tables)
  + [发现使用特定标签的数据集](#metadata-tables-example-query-datasets-specific-tags)
  + [列出未使用 SSE-KMS 加密的对象](#metadata-tables-example-query-objects-not-kms-encrypted)
  + [列出未加密的对象](#metadata-tables-example-query-objects-not-encrypted)
  + [列出 Amazon Bedrock 生成的对象](#metadata-tables-example-query-objects-generated-bedrock)
  + [将清单表与日记表进行协调](#metadata-tables-example-query-generate-latest-inventory)
  + [查找对象的当前版本](#metadata-tables-example-query-latest-version)
+ [将自定义元数据与 S3 元数据表相联接](metadata-tables-join-custom-metadata.md)
+ [使用 Amazon Quick 实现元数据表数据可视化](metadata-tables-quicksight-dashboards.md)

## 日记表示例查询
<a name="metadata-tables-example-queries-journal-tables"></a>

可以使用以下示例查询来查询日记表。

### 按文件扩展名查找对象
<a name="metadata-tables-example-query-object-pattern"></a>

以下查询返回具有特定文件扩展名（在本例中为 `.jpg`）的对象：

```
SELECT key FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."journal"
WHERE key LIKE '%.jpg'
AND record_type = 'CREATE'
```

### 列出对象删除操作
<a name="metadata-tables-example-query-delete-events"></a>

以下查询返回对象删除事件，包括发出请求的 Amazon Web Services 账户 ID 或 Amazon 服务主体：

```
SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, requester, source_ip_address, version_id
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."journal"
WHERE record_type = 'DELETE';
```

### 列出您的对象使用的 Amazon KMS 加密密钥
<a name="metadata-tables-example-query-objects-using-kms-key"></a>

以下查询返回用于加密对象的 Amazon Key Management Service（Amazon KMS）密钥的 ARN：

```
SELECT DISTINCT kms_key_arn
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."journal";
```

### 列出不使用 KMS 密钥的对象
<a name="metadata-tables-example-query-objects-not-using-kms-key"></a>

以下查询返回未使用 Amazon KMS 密钥加密的对象：

```
SELECT DISTINCT kms_key_arn
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."journal"
WHERE encryption_status NOT IN ('SSE-KMS', 'DSSE-KMS')
AND record_type = 'CREATE';
```

### 列出过去 7 天内用于 `PUT` 操作的 Amazon KMS 加密密钥
<a name="metadata-tables-example-query-objects-using-kms-key-puts"></a>

以下查询返回用于加密对象的 Amazon Key Management Service（Amazon KMS）密钥的 ARN：

```
SELECT DISTINCT kms_key_arn 
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."journal"
WHERE record_timestamp > (current_date - interval '7' day)
AND kms_key_arn is NOT NULL;
```

### 列出 S3 生命周期在过去 24 小时内删除的对象
<a name="metadata-tables-example-query-objects-deleted-lifecycle"></a>

以下查询返回 S3 生命周期在最后一天过期的对象：

```
SELECT bucket, key, version_id, last_modified_date, record_timestamp, requester
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."journal"
WHERE requester = 's3.amazonaws.com'
AND record_type = 'DELETE' 
AND record_timestamp > (current_date - interval '1' day)
```

### 查看 Amazon Bedrock 提供的元数据
<a name="metadata-tables-example-query-bedrock"></a>

某些 Amazon 服务（例如 [Amazon Bedrock](https://docs.amazonaws.cn/bedrock/latest/APIReference/welcome.html)）将对象上传到 Amazon S3。您可以查询这些服务提供的对象元数据。例如，以下查询包含用于确定 Amazon Bedrock 是否有对象上传到通用存储桶的 `user_metadata` 列：

```
SELECT DISTINCT bucket, key, sequence_number, record_type, record_timestamp, user_metadata
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."journal"
WHERE record_type = 'CREATE'
AND user_metadata['content-source'] = 'AmazonBedrock';
```

如果 Amazon Bedrock 将对象上传到存储桶，则 `user_metadata` 列将在查询结果中显示与该对象关联的以下元数据：

```
user_metadata
{content-additional-params -> requestid="CVK8FWYRW0M9JW65", signedContentSHA384="38b060a751ac96384cd9327eb1b1e36a21fdb71114be07434c0cc7bf63f6e1da274edebfe76f65fbd51ad2f14898b95b", content-model-id -> bedrock-model-arn, content-source -> AmazonBedrock}
```

### 了解对象的当前状态
<a name="metadata-tables-example-query-current-state"></a>

以下查询有助于您确定对象的当前状态。该查询可识别每个对象的最新版本，筛选掉已删除的对象，并根据序列号标记每个对象的最新版本。结果按 `bucket`、`key` 和 `sequence_number` 列排序。

```
WITH records_of_interest as (
   -- Start with a query that can narrow down the records of interest.
    SELECT * from "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."journal"
),

version_stacks as (
   SELECT *,
          -- Introduce a column called 'next_sequence_number', which is the next larger
          -- sequence_number for the same key version_id in sorted order.
          LEAD(sequence_number, 1) over (partition by (bucket, key, coalesce(version_id, '')) order by sequence_number ASC) as next_sequence_number
   from records_of_interest
),

-- Pick the 'tip' of each version stack triple: (bucket, key, version_id).
-- The tip of the version stack is the row of that triple with the largest sequencer.
-- Selecting only the tip filters out any row duplicates.
-- This isn't typical, but some events can be delivered more than once to the table
-- and include rows that might no longer exist in the bucket (since the
-- table contains rows for both extant and extinct objects).
-- In the next subquery, eliminate the rows that contain deleted objects.
current_versions as (
    SELECT * from version_stacks where next_sequence_number is NULL
),

-- Eliminate the rows that are extinct from the bucket by filtering with
-- record_type. An object version has been deleted from the bucket if its tip is
-- record_type==DELETE.
existing_current_versions as (
    SELECT * from current_versions where not (record_type = 'DELETE' and is_delete_marker = FALSE)
),

-- Optionally, to determine which of several object versions is the 'latest',
-- you can compare their sequence numbers. A version_id is the latest if its
-- tip's sequencer is the largest among all other tips in the same key.
with_is_latest as (
    SELECT *,
           -- Determine if the sequence_number of this row is the same as the largest sequencer for the key that still exists.
           sequence_number = (MAX(sequence_number) over (partition by (bucket, key))) as is_latest_version
    FROM existing_current_versions
)

SELECT * from with_is_latest
ORDER BY bucket, key, sequence_number;
```

## 清单表示例查询
<a name="metadata-tables-example-queries-inventory-tables"></a>

可以使用以下示例查询来查询清单表。

### 发现使用特定标签的数据集
<a name="metadata-tables-example-query-datasets-specific-tags"></a>

以下查询返回使用指定标签的数据集：

```
SELECT * 
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."inventory"
WHERE object_tags['key1'] = 'value1'
AND object_tags['key2'] = 'value2';
```

### 列出未使用 SSE-KMS 加密的对象
<a name="metadata-tables-example-query-objects-not-kms-encrypted"></a>

以下查询返回未使用 SSE-KMS 加密的对象。

```
SELECT key, encryption_status 
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."inventory"
WHERE encryption_status != 'SSE-KMS';
```

### 列出未加密的对象
<a name="metadata-tables-example-query-objects-not-encrypted"></a>

以下查询返回未加密的对象：

```
SELECT bucket, key, version_id  
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."inventory"
WHERE encryption_status IS NULL;
```

### 列出 Amazon Bedrock 生成的对象
<a name="metadata-tables-example-query-objects-generated-bedrock"></a>

以下查询列出了由 Amazon Bedrock 生成的对象：

```
SELECT DISTINCT bucket, key, sequence_number, user_metadata
FROM "s3tablescatalog/aws-s3"."b_general-purpose-bucket-name"."inventory"
WHERE user_metadata['content-source'] = 'AmazonBedrock';
```

### 将清单表与日记表进行协调
<a name="metadata-tables-example-query-generate-latest-inventory"></a>

以下查询生成一个类似清单表的列表，该列表包含存储桶的当前内容。更确切地说，生成的列表将清单表的最新快照与日记表中的最新事件相结合。

要使此查询生成最准确的结果，日记表和清单表都必须处于“活动”状态。

我们建议将此查询用于所含对象数少于十亿（10^9）的通用存储桶。

此示例查询将以下简化应用于列表结果（与清单表相比）：
+ **列忽略**：列 `bucket`、`is_multipart`、`encryption_status`、`is_bucket_key_enabled`、`kms_key_arn` 和 `checksum_algorithm` 不是最终结果的一部分。将可选列集保持在最低限度可以提高性能。
+ **包含所有记录**：查询返回所有对象键和版本，包括空版本（在未进行版本控制或暂停版本控制的存储桶中）和删除标记。有关如何筛选结果以仅显示您感兴趣的键的示例，请参阅查询末尾的 `WHERE` 子句。
+ **加速协调**：在极少数情况下，查询可能会临时报告不再位于存储桶中的对象。一旦清单表的下一个快照变为可用，就会消除这些差异。这种行为是在性能与准确性之间进行权衡。

要在 Amazon Athena 中运行此查询，请确保为包含日记表和清单表的通用存储桶元数据配置选择 `s3tablescatalog/aws-s3` 目录和 `b_general-purpose-bucket-name` 数据库。

```
WITH inventory_time_cte AS (
    SELECT COALESCE(inventory_time_from_property, inventory_time_default) AS inventory_time FROM
    (
      SELECT * FROM
        (VALUES (TIMESTAMP '2024-12-01 00:00')) AS T (inventory_time_default)
      LEFT OUTER JOIN
        (
         SELECT from_unixtime(CAST(value AS BIGINT) / 1000.0) AS inventory_time_from_property FROM "journal$properties"
         WHERE key = 'aws.s3metadata.oldest-uncoalesced-record-timestamp' LIMIT 1
        )
      ON TRUE
    )
),

working_set AS (
    SELECT
        key,
        sequence_number,
        version_id,
        is_delete_marker,
        size,
        COALESCE(last_modified_date, record_timestamp) AS last_modified_date,
        e_tag,
        storage_class,
        object_tags,
        user_metadata,
        (record_type = 'DELETE' AND NOT COALESCE(is_delete_marker, FALSE)) AS _is_perm_delete
    FROM journal j
    CROSS JOIN inventory_time_cte t
    WHERE j.record_timestamp > (t.inventory_time - interval '15' minute)

    UNION ALL

    SELECT
        key,
        sequence_number,
        version_id,
        is_delete_marker,
        size,
        last_modified_date,
        e_tag,
        storage_class,
        object_tags,
        user_metadata,
        FALSE AS _is_perm_delete
    FROM inventory i
),

updated_inventory AS (
    SELECT * FROM (
        SELECT *,
            MAX(sequence_number) OVER (PARTITION BY key, version_id) AS _supremum_sn
        FROM working_set
    )
    WHERE sequence_number = _supremum_sn
)

SELECT
    key,
    sequence_number,
    version_id,
    is_delete_marker,
    size,
    last_modified_date,
    e_tag,
    storage_class,
    object_tags,
    user_metadata
FROM updated_inventory
-- This filter omits only permanent deletes from the results. Delete markers will still be shown.
WHERE NOT _is_perm_delete
-- You can add additional filters here. Examples:
--    AND object_tags['department'] = 'billing'
--    AND starts_with(key, 'reports/')
ORDER BY key ASC, sequence_number DESC;
```

### 查找对象的当前版本
<a name="metadata-tables-example-query-latest-version"></a>

以下查询使用清单表来生成新的输出表，该输出表显示哪些对象版本是当前版本。输出表特意设计成与 S3 清单报告类似。输出表包含一个 `is_latest` 字段，用于指示对象是否为当前版本。`is_latest` 字段等同于 [S3 清单报告](storage-inventory.md#storage-inventory-contents)中的 **IsLatest** 字段。

此查询适用于 [S3 版本控制](Versioning.md)处于已启用版本控制或已暂停版本控制状态的通用存储桶。

**先决条件**  
此查询将结果输出到新的 S3 表中以支持进一步查询，与在屏幕上输出行相比，其性能更高。因此，在运行此查询之前，请确保您满足以下条件。如果您选择不将结果输出到新表，则可以跳过这些步骤。
+ 您必须有一个现有的客户管理的表存储桶，该存储桶具有现有的命名空间作为输出新表的地方。有关更多信息，请参阅[创建表存储桶](s3-tables-buckets-create.md)和[创建命名空间](s3-tables-namespace-create.md)。
+ 要查询新的输出表，必须设置用于查询该表的访问方法。有关更多信息，请参阅 [访问表数据](s3-tables-access.md)。如果您想使用 Amazon Athena 等 Amazon 分析服务来查询输出表，则客户管理的表存储桶必须与 Amazon 分析服务集成。有关更多信息，请参阅 [Amazon S3 表类数据存储服务与 Amazon 分析服务集成概述](s3-tables-integration-overview.md)。

要使用此查询，请将 `amzn-s3-demo-table-bucket` 替换为您要在其中创建新输出表的客户管理的现有表存储桶的名称。将 *`existing_namespace`* 替换为表存储桶中您要在其中创建输出表的命名空间的名称。将 *`new_table`* 替换为要用于输出表的名称。确保输出表的名称符合[表命名规则](s3-tables-buckets-naming.md#naming-rules-table)。

要在 Amazon Athena 中运行此查询，请确保为包含清单表的通用存储桶元数据配置选择 `s3tablescatalog/aws-s3` 目录和 `b_general-purpose-bucket-name` 数据库。

```
-- If you don't want to output the results to a new table, remove the following two lines 
-- (everything before the WITH clause). 
CREATE TABLE "s3tablescatalog/amzn-s3-demo-table-bucket"."existing_namespace"."new_table" 
as (
WITH 
my_inventory AS (
  SELECT 
        bucket,
        key,
        version_id,
        sequence_number,
        is_delete_marker,
        size,
        last_modified_date,
        storage_class
  FROM inventory
-- For prefix filtering, use a WHERE clause with % at the end.
--     WHERE key LIKE 'prefix%'
  ),
 
inventory_with_is_latest as (
SELECT *,
       ROW_NUMBER() OVER (
         PARTITION BY key 
         ORDER BY sequence_number DESC
       ) = 1 AS is_latest
FROM my_inventory
    )

SELECT
        bucket,
        key,
        version_id,
        sequence_number,
        is_delete_marker,
        size,
        last_modified_date,
        storage_class,
        is_latest

FROM inventory_with_is_latest

-- If you want only the current version of each key, uncomment the following WHERE clause.
-- WHERE is_latest = TRUE
-- If you aren't outputting the results to a new table, remove the next line: 
);
```