Azure Cosmos DB connections - Amazon Glue

Azure Cosmos DB connections

You can use Amazon Glue for Spark to read from and write to existing containers in Azure Cosmos DB using the NoSQL API in Amazon Glue 4.0 and later versions. You can define what to read from Azure Cosmos DB with a SQL query. You connect to Azure Cosmos DB using an Azure Cosmos DB Key stored in Amazon Secrets Manager through an Amazon Glue connection.

For more information about Azure Cosmos DB for NoSQL, consult the Azure documentation.

Configuring Azure Cosmos DB connections

To connect to Azure Cosmos DB from Amazon Glue, you will need to create and store your Azure Cosmos DB Key in an Amazon Secrets Manager secret, then associate that secret with an Azure Cosmos DB Amazon Glue connection.

Prerequisites:

  • In Azure, you will need to identify or generate an Azure Cosmos DB Key for use by Amazon Glue, cosmosKey. For more information, see Secure access to data in Azure Cosmos DB in the Azure documentation.

To configure a connection to Azure Cosmos DB:
  1. In Amazon Secrets Manager, create a secret using your Azure Cosmos DB Key. To create a secret in Secrets Manager, follow the tutorial available in Create an Amazon Secrets Manager secret in the Amazon Secrets Manager documentation. After creating the secret, keep the secret name, secretName, for the next step.

    • When selecting Key/value pairs, create a pair for the key spark.cosmos.accountKey with the value cosmosKey.

  2. In the Amazon Glue console, create a connection by following the steps in Adding an Amazon Glue connection. After creating the connection, keep the connection name, connectionName, for future use in Amazon Glue.

    • When selecting a Connection type, select Azure Cosmos DB.

    • When selecting an Amazon Secret, provide secretName.
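
Step 1 can also be scripted instead of done in the console. The following is a minimal sketch, assuming the boto3 SDK is installed and Amazon credentials are already configured. The helper names build_cosmos_secret_string and create_cosmos_secret are hypothetical and used only for illustration. The secret stores the key under spark.cosmos.accountKey, as described in step 1.

```python
import json


def build_cosmos_secret_string(cosmos_key: str) -> str:
    """Return the JSON key/value payload to store in the secret."""
    return json.dumps({"spark.cosmos.accountKey": cosmos_key})


def create_cosmos_secret(secret_name: str, cosmos_key: str) -> str:
    """Create the secret and return its ARN.

    Requires boto3 and valid credentials; not called at import time.
    """
    import boto3  # imported lazily so the payload helper works without the SDK

    client = boto3.client("secretsmanager")
    response = client.create_secret(
        Name=secret_name,
        SecretString=build_cosmos_secret_string(cosmos_key),
    )
    return response["ARN"]
```

Calling create_cosmos_secret(secretName, cosmosKey) would then give you the secret to select when creating the Amazon Glue connection.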

After creating an Amazon Glue Azure Cosmos DB connection, you will need to perform the following steps before running your Amazon Glue job:

  • Grant the IAM role associated with your Amazon Glue job permission to read secretName.

  • In your Amazon Glue job configuration, provide connectionName as an Additional network connection.
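
To illustrate the first of these steps, here is a sketch of an IAM policy document that lets a role read one secret. It assumes that secretsmanager:GetSecretValue is the only action required; your setup may need more, for example kms:Decrypt if the secret is encrypted with a customer managed key. The function name build_secret_read_policy is hypothetical.

```python
import json


def build_secret_read_policy(secret_arn: str) -> str:
    """Return an IAM policy (as JSON) allowing reads of a single secret."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumption: reading the secret value is all the job needs.
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [secret_arn],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

You could attach the resulting document to the IAM role used by your Amazon Glue job, passing the ARN of the secret named secretName.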

Reading from Azure Cosmos DB for NoSQL containers

Prerequisites:

  • An Azure Cosmos DB for NoSQL container you would like to read from. You will need identification information for the container.

    An Azure Cosmos DB for NoSQL container is identified by its database and container. You must provide the database name, cosmosDBName, and the container name, cosmosContainerName, when connecting to the Azure Cosmos DB for NoSQL API.

  • An Amazon Glue Azure Cosmos DB connection configured to provide auth and network location information. To acquire this, complete the steps in the previous procedure, To configure a connection to Azure Cosmos DB. You will need the name of the Amazon Glue connection, connectionName.

For example:

azurecosmos_read = glueContext.create_dynamic_frame.from_options(
    connection_type="azurecosmos",
    connection_options={
        "connectionName": connectionName,
        "spark.cosmos.database": cosmosDBName,
        "spark.cosmos.container": cosmosContainerName,
    }
)

You can also provide a SELECT SQL query to filter the results returned to your DynamicFrame. To do so, set the spark.cosmos.read.customQuery option to your query, query.

For example:

azurecosmos_read_query = glueContext.create_dynamic_frame.from_options(
    connection_type="azurecosmos",
    connection_options={
        "connectionName": connectionName,
        "spark.cosmos.database": cosmosDBName,
        "spark.cosmos.container": cosmosContainerName,
        # query holds your SELECT statement as a string
        "spark.cosmos.read.customQuery": query,
    }
)
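
Because the two read examples differ only in the optional query, the options can be assembled by a small helper. This is a sketch rather than part of the Amazon Glue API: cosmos_read_options is a hypothetical function, and the connection, database, and container names and the query text are made-up values for illustration.

```python
from typing import Dict, Optional


def cosmos_read_options(
    connection_name: str,
    database: str,
    container: str,
    query: Optional[str] = None,
) -> Dict[str, str]:
    """Build connection_options for an Azure Cosmos DB read."""
    options = {
        "connectionName": connection_name,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
    }
    # Add the custom query only when the caller supplies one.
    if query is not None:
        options["spark.cosmos.read.customQuery"] = query
    return options


# Hypothetical names and query, for illustration only.
options = cosmos_read_options(
    "my-cosmos-connection",
    "salesdb",
    "orders",
    query="SELECT c.id, c.total FROM c WHERE c.total > 100",
)
```

The resulting dictionary can be passed as connection_options to glueContext.create_dynamic_frame.from_options together with connection_type="azurecosmos".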

Writing to Azure Cosmos DB for NoSQL containers

This example writes information from an existing DynamicFrame, dynamicFrame, to Azure Cosmos DB. If the container already has information, Amazon Glue will append data from your DynamicFrame. If the information in the container has a different schema from the information you write, you will run into errors.

Prerequisites:

  • An Azure Cosmos DB for NoSQL container you would like to write to. You will need identification information for the container. You must create the container before calling the connection method.

    An Azure Cosmos DB for NoSQL container is identified by its database and container. You must provide the database name, cosmosDBName, and the container name, cosmosContainerName, when connecting to the Azure Cosmos DB for NoSQL API.

  • An Amazon Glue Azure Cosmos DB connection configured to provide auth and network location information. To acquire this, complete the steps in the previous procedure, To configure a connection to Azure Cosmos DB. You will need the name of the Amazon Glue connection, connectionName.

For example:

azurecosmos_write = glueContext.write_dynamic_frame.from_options(
    frame=dynamicFrame,
    connection_type="azurecosmos",
    connection_options={
        "connectionName": connectionName,
        "spark.cosmos.database": cosmosDBName,
        "spark.cosmos.container": cosmosContainerName,
    }
)

Azure Cosmos DB connection option reference

  • connectionName — Required. Used for Read/Write. The name of an Amazon Glue Azure Cosmos DB connection configured to provide auth and network location information to your connection method.

  • spark.cosmos.database — Required. Used for Read/Write. Valid Values: database names. Azure Cosmos DB for NoSQL database name.

  • spark.cosmos.container — Required. Used for Read/Write. Valid Values: container names. Azure Cosmos DB for NoSQL container name.

  • spark.cosmos.read.customQuery — Used for Read. Valid Values: SELECT SQL queries. Custom query to select documents to be read.
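
The reference above can be turned into a quick pre-flight check that runs before a job calls the connection method. The sketch below is an illustration only. check_cosmos_options is a hypothetical helper, and treating a read-only option on a write as a problem is an assumption drawn from the "Used for Read" label, not documented Amazon Glue behavior.

```python
REQUIRED_OPTIONS = (
    "connectionName",
    "spark.cosmos.database",
    "spark.cosmos.container",
)
READ_ONLY_OPTIONS = ("spark.cosmos.read.customQuery",)


def check_cosmos_options(options: dict, mode: str) -> list:
    """Return a list of problems; an empty list means nothing was found."""
    if mode not in ("read", "write"):
        raise ValueError("mode must be 'read' or 'write'")

    # Every required option must be present and non-empty.
    problems = [
        f"missing required option: {name}"
        for name in REQUIRED_OPTIONS
        if not options.get(name)
    ]

    # Assumption: options labeled "Used for Read" do not belong on a write.
    if mode == "write":
        problems += [
            f"option only applies to reads: {name}"
            for name in READ_ONLY_OPTIONS
            if name in options
        ]

    # The custom query, when given for a read, should be a SELECT statement.
    query = options.get("spark.cosmos.read.customQuery")
    if mode == "read" and query is not None:
        if not query.lstrip().upper().startswith("SELECT"):
            problems.append("spark.cosmos.read.customQuery must be a SELECT query")

    return problems
```

An empty result only means the option names and the query prefix look right. It does not confirm that the connection, database, or container exists.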