
Azure Cosmos DB connections

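You can use Amazon Glue for Spark to read from and write to existing containers in Azure Cosmos DB using the NoSQL API in Amazon Glue 4.0 and later versions. You can define what to read from Azure Cosmos DB with a SQL query. You connect to Azure Cosmos DB through an Amazon Glue connection, using an Azure Cosmos DB key stored in Amazon Secrets Manager.

For more information about Azure Cosmos DB for NoSQL, see the Azure documentation.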

Configuring Azure Cosmos DB connections

To connect to Azure Cosmos DB from Amazon Glue, you need to store your Azure Cosmos DB key in an Amazon Secrets Manager secret and then associate that secret with an Azure Cosmos DB Amazon Glue connection.

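Prerequisites:

In Azure, identify or generate an Azure Cosmos DB key, cosmosKey, for Amazon Glue to use. For more information, see Secure access to data in Azure Cosmos DB in the Azure documentation.

To configure a connection to Azure Cosmos DB: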

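1. In Amazon Secrets Manager, create a secret using your Azure Cosmos DB key. To create a secret in Secrets Manager, follow the tutorial in Create an Amazon Secrets Manager secret in the Amazon Secrets Manager documentation. After creating the secret, keep the secret name, secretName, for the next step.
   - When selecting Key/value pairs, create a pair with the key spark.cosmos.accountKey and the value cosmosKey.
2. In the Amazon Glue console, create a connection by following the steps in Adding an Amazon Glue connection. After creating the connection, keep the connection name, connectionName, for future use in Amazon Glue.
   - When selecting a Connection type, select Azure Cosmos DB.
   - When selecting an Amazon Secret, provide secretName.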
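The secret-creation part of these steps can also be scripted. The following is a minimal sketch, assuming boto3 (the AWS SDK for Python) and credentials for the account where your Amazon Glue jobs run; the helper names build_cosmos_secret_string and create_cosmos_secret and the region_name argument are hypothetical. It stores cosmosKey under the key spark.cosmos.accountKey as a JSON key/value pair, which is how Secrets Manager represents key/value secrets. The Amazon Glue connection itself is still created in the console as described above.

```python
import json


def build_cosmos_secret_string(cosmos_key: str) -> str:
    """Return the SecretString payload for the Azure Cosmos DB secret.

    Secrets Manager stores key/value secrets as a JSON object; the key name
    spark.cosmos.accountKey is the one named in the steps above.
    """
    if not cosmos_key:
        raise ValueError("cosmos_key must be a non-empty string")
    return json.dumps({"spark.cosmos.accountKey": cosmos_key})


def create_cosmos_secret(secret_name: str, cosmos_key: str, region_name: str) -> str:
    """Create the secret with boto3 and return its ARN (requires AWS credentials)."""
    # Imported here so build_cosmos_secret_string works without boto3 installed.
    import boto3

    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.create_secret(
        Name=secret_name,
        SecretString=build_cosmos_secret_string(cosmos_key),
    )
    return response["ARN"]
```

Keeping the payload builder separate from the API call lets you check the JSON shape without touching AWS.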

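After creating an Amazon Glue Azure Cosmos DB connection, you need to complete the following steps before running your Amazon Glue job:

- Grant the IAM role associated with your Amazon Glue job permission to read secretName.
- In your Amazon Glue job configuration, provide connectionName as an Additional network connection.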
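A minimal sketch of these two items, assuming you manage the IAM policy and the job definition in code rather than in the console. The helper names are hypothetical. secretsmanager:GetSecretValue is the IAM action that reads a secret's value; whether your setup needs more (for example kms:Decrypt when the secret is encrypted with a customer managed KMS key) is an assumption you should verify. job_connections returns the shape that boto3's glue.create_job accepts for its Connections argument.

```python
def secret_read_policy(secret_arn: str) -> dict:
    """IAM policy document that lets a role read the value of one secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }


def job_connections(connection_name: str) -> dict:
    """Connections argument for boto3 glue.create_job (the additional network connection)."""
    return {"Connections": [connection_name]}
```

To attach the policy, you could pass json.dumps(secret_read_policy(arn)) as the PolicyDocument of boto3's iam.put_role_policy, where arn is the ARN of secretName.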

Reading from Azure Cosmos DB for NoSQL containers

Prerequisites:

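- An Azure Cosmos DB for NoSQL container you would like to read from. You will need identification information for the container.

  An Azure Cosmos DB for NoSQL container is identified by its database and container. You must provide the database name, cosmosDBName, and the container name, cosmosContainerName, when connecting to the Azure Cosmos DB for NoSQL API.
- An Amazon Glue Azure Cosmos DB connection configured to provide authentication and network location information. To set this up, complete the steps in "Configuring Azure Cosmos DB connections" above. You will need the name of the Amazon Glue connection, connectionName.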

For example:


azurecosmos_read = glueContext.create_dynamic_frame.from_options(
    connection_type="azurecosmos",
    connection_options={
        "connectionName": connectionName,
        "spark.cosmos.database": cosmosDBName,
        "spark.cosmos.container": cosmosContainerName,
    }
)

You can also provide a SELECT SQL query to filter the results returned to your DynamicFrame. You will need to configure query.

For example:


azurecosmos_read_query = glueContext.create_dynamic_frame.from_options(
    connection_type="azurecosmos",
    connection_options={
        "connectionName": connectionName,
        "spark.cosmos.database": cosmosDBName,
        "spark.cosmos.container": cosmosContainerName,
        # query holds your SELECT statement, for example "SELECT * FROM c"
        "spark.cosmos.read.customQuery": query,
    }
)
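The value of spark.cosmos.read.customQuery is an Azure Cosmos DB for NoSQL SELECT statement such as SELECT c.id, c.name FROM c. A minimal, hypothetical helper for assembling such a query from property names follows. It accepts only simple identifiers (property names that need bracket notation, such as c["my-prop"], are not handled), and the where argument is inserted verbatim, so pass only trusted text.

```python
import re

# Simple identifiers only: letters, digits and underscores, not starting with a digit.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select_query(fields, alias="c", where=None):
    """Build a SELECT query string for spark.cosmos.read.customQuery.

    fields: property names to project; an empty list selects whole documents.
    where: optional filter expression, appended verbatim (must be trusted text).
    """
    if not _IDENTIFIER.fullmatch(alias):
        raise ValueError(f"invalid alias: {alias!r}")
    for name in fields:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid field name: {name!r}")
    projection = ", ".join(f"{alias}.{name}" for name in fields) if fields else "*"
    query = f"SELECT {projection} FROM {alias}"
    if where:
        query += f" WHERE {where}"
    return query
```

For example, build_select_query(["id", "name"]) produces SELECT c.id, c.name FROM c, which you could assign to query in the example above.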

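Writing to Azure Cosmos DB for NoSQL containers

This example writes information from an existing DynamicFrame, dynamicFrame, to Azure Cosmos DB. If the container already contains information, Amazon Glue appends the data from your DynamicFrame to it. If the information in the container has a different schema from the information you write, you will encounter errors.

Prerequisites: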

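- An Azure Cosmos DB for NoSQL container you would like to write to. You will need identification information for the container. You must create the container before you call the connection method.

  An Azure Cosmos DB for NoSQL container is identified by its database and container. You must provide the database name, cosmosDBName, and the container name, cosmosContainerName, when connecting to the Azure Cosmos DB for NoSQL API.
- An Amazon Glue Azure Cosmos DB connection configured to provide authentication and network location information. To set this up, complete the steps in "Configuring Azure Cosmos DB connections" above. You will need the name of the Amazon Glue connection, connectionName.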
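The prerequisites require the container to exist before the job writes to it. If you create it from code rather than in the Azure portal, one option is the azure-cosmos Python SDK (v4), for example from a separate setup script. The sketch below assumes that package, your account endpoint URL, and the key cosmosKey; the helper names and the default partition key path /id are hypothetical choices for illustration, so pick a partition key that suits your data.

```python
def normalize_partition_key_path(path: str) -> str:
    """Return the path with a leading '/', as Cosmos DB partition key paths use."""
    path = path.strip()
    if not path or path == "/":
        raise ValueError("partition key path must name a property, for example '/id'")
    return path if path.startswith("/") else "/" + path


def ensure_container(endpoint, cosmos_key, database_name, container_name, partition_key_path="/id"):
    """Create the database and container if they do not already exist."""
    # Requires the azure-cosmos package (v4); imported here so the helper
    # above can be used without it.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(endpoint, credential=cosmos_key)
    database = client.create_database_if_not_exists(id=database_name)
    return database.create_container_if_not_exists(
        id=container_name,
        partition_key=PartitionKey(path=normalize_partition_key_path(partition_key_path)),
    )
```

The *_if_not_exists calls make the script safe to rerun: an existing database or container is returned rather than recreated.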

For example:


azurecosmos_write = glueContext.write_dynamic_frame.from_options(
    frame=dynamicFrame,
    connection_type="azurecosmos",
    connection_options={
        "connectionName": connectionName,
        "spark.cosmos.database": cosmosDBName,
        "spark.cosmos.container": cosmosContainerName,
    }
)
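Because writing data whose schema differs from the data already in the container causes errors, a pre-write comparison can surface the mismatch earlier. The helper below is hypothetical and is not something Amazon Glue runs for you; it compares two {field_name: type_name} mappings. In a job you might build the actual mapping from the Spark schema as shown in the comment, while where the expected mapping comes from (for example a DynamicFrame read from the same container) is an assumption.

```python
def schema_differences(expected: dict, actual: dict) -> list:
    """Compare two {field_name: type_name} mappings and describe each mismatch."""
    problems = []
    for name in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing field: {name}")
    for name in sorted(actual.keys() - expected.keys()):
        problems.append(f"unexpected field: {name}")
    for name in sorted(expected.keys() & actual.keys()):
        if expected[name] != actual[name]:
            problems.append(
                f"type mismatch for {name}: expected {expected[name]}, got {actual[name]}"
            )
    return problems


# In an Amazon Glue job, one way to build a mapping from a DynamicFrame:
# actual = {f.name: f.dataType.simpleString() for f in dynamicFrame.toDF().schema.fields}
```

An empty list means the field names and types match; otherwise you could stop the job before calling write_dynamic_frame.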

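Azure Cosmos DB connection option reference

connectionName: Required. Used for Read/Write. The name of the Amazon Glue Azure Cosmos DB connection configured to provide authentication and network location information to your connection method.
spark.cosmos.database: Required. Used for Read/Write. Valid values: database names. The Azure Cosmos DB for NoSQL database name.
spark.cosmos.container: Required. Used for Read/Write. Valid values: container names. The Azure Cosmos DB for NoSQL container name.
spark.cosmos.read.customQuery: Used for Read. Valid values: SELECT SQL queries. A custom query used to select the documents to be read.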
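The option list above can be turned into a quick pre-flight check on connection_options. This is a hypothetical helper, not part of Amazon Glue: it checks the three required options, flags spark.cosmos.read.customQuery when it is passed to a write, and rejects custom queries that do not start with SELECT.

```python
REQUIRED_OPTIONS = ("connectionName", "spark.cosmos.database", "spark.cosmos.container")
READ_ONLY_OPTIONS = ("spark.cosmos.read.customQuery",)


def validate_cosmos_options(options: dict, mode: str) -> list:
    """Return a list of problems with connection_options for mode 'read' or 'write'."""
    if mode not in ("read", "write"):
        raise ValueError("mode must be 'read' or 'write'")
    # Required for both reads and writes, per the reference list.
    problems = [f"missing required option: {key}" for key in REQUIRED_OPTIONS if not options.get(key)]
    if mode == "write":
        # customQuery is documented as a read option only.
        problems += [f"read option passed to a write: {key}" for key in READ_ONLY_OPTIONS if key in options]
    query = options.get("spark.cosmos.read.customQuery")
    if mode == "read" and query is not None and not str(query).lstrip().upper().startswith("SELECT"):
        problems.append("spark.cosmos.read.customQuery must be a SELECT query")
    return problems
```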
