Using a MongoDB connection - Amazon Glue

Using a MongoDB connection

After you create a connection for MongoDB, you can use the connection in your ETL job. To do so, create a table in the Amazon Glue Data Catalog and specify the MongoDB connection as the table's connection attribute.

Amazon Glue stores your connection URL and credentials in the MongoDB connection. Additionally, you can specify the following options in your job script.

  • "database": (Required) The MongoDB database to read from.

  • "collection": (Required) The MongoDB collection to read from.

  • "ssl": (Optional) If true, then Amazon Glue initiates an SSL connection. The default value is false.

  • "ssl.domain_match": (Optional) If true and ssl is true, then Amazon Glue performs a domain match check. The default value is true.

  • "batchSize": (Optional) The number of documents to return per batch. This sets the size of the internal batches used by the cursor.

  • "partitioner": (Optional) The class name of the partitioner for reading input data from MongoDB. The connector provides the following partitioners:

    • MongoDefaultPartitioner (default)

    • MongoSamplePartitioner (Requires MongoDB 3.2 or later)

    • MongoShardedPartitioner

    • MongoSplitVectorPartitioner

    • MongoPaginateByCountPartitioner

    • MongoPaginateBySizePartitioner

  • "partitionerOptions": (Optional) Options for the designated partitioner. The following options are supported for each partitioner:

    • MongoSamplePartitioner: partitionKey, partitionSizeMB, and samplesPerPartition

    • MongoShardedPartitioner: shardkey

    • MongoSplitVectorPartitioner: partitionKey and partitionSizeMB

    • MongoPaginateByCountPartitioner: partitionKey and numberOfPartitions

    • MongoPaginateBySizePartitioner: partitionKey and partitionSizeMB

For more information about these options, see https://docs.mongodb.com/spark-connector/master/configuration/#partitioner-conf.
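
The options above can be assembled in a Python job script along the following lines. This is a minimal sketch, not a definitive implementation. The helper names build_mongodb_options and read_collection are hypothetical, as are the database, collection, and table names in the usage note. Passing values as strings and writing partitioner options as dotted "partitionerOptions.<name>" keys are assumptions about the expected format. The read step assumes the job runs inside Amazon Glue, where a GlueContext is available.

```python
# Partitioners and the option names listed for each one above.
# No options are listed for MongoDefaultPartitioner, so it maps to an empty set.
PARTITIONER_OPTIONS = {
    "MongoDefaultPartitioner": set(),
    "MongoSamplePartitioner": {"partitionKey", "partitionSizeMB", "samplesPerPartition"},
    "MongoShardedPartitioner": {"shardkey"},
    "MongoSplitVectorPartitioner": {"partitionKey", "partitionSizeMB"},
    "MongoPaginateByCountPartitioner": {"partitionKey", "numberOfPartitions"},
    "MongoPaginateBySizePartitioner": {"partitionKey", "partitionSizeMB"},
}


def build_mongodb_options(database, collection, ssl=False, domain_match=True,
                          batch_size=None, partitioner=None,
                          partitioner_options=None):
    """Hypothetical helper: assemble and sanity-check the job script options."""
    if not database or not collection:
        raise ValueError('"database" and "collection" are required')

    # Assumption: option values are passed as strings.
    options = {
        "database": database,
        "collection": collection,
        "ssl": "true" if ssl else "false",
        "ssl.domain_match": "true" if domain_match else "false",
    }
    if batch_size is not None:
        options["batchSize"] = str(batch_size)

    if partitioner is None:
        if partitioner_options:
            raise ValueError("partitioner options given without a partitioner")
        return options

    if partitioner not in PARTITIONER_OPTIONS:
        raise ValueError("unknown partitioner: " + partitioner)
    options["partitioner"] = partitioner

    allowed = PARTITIONER_OPTIONS[partitioner]
    for name, value in (partitioner_options or {}).items():
        if name not in allowed:
            raise ValueError(name + " is not listed for " + partitioner)
        # Assumption: dotted-key form for partitioner options.
        options["partitionerOptions." + name] = str(value)
    return options


def read_collection(glue_context, catalog_database, catalog_table, options):
    """Read the MongoDB-backed catalog table; only works inside a Glue job."""
    return glue_context.create_dynamic_frame.from_catalog(
        database=catalog_database,
        table_name=catalog_table,
        additional_options=options,
    )
```

Inside a job, a call might look like read_collection(glueContext, "my_catalog_db", "my_mongo_table", build_mongodb_options("sales", "orders", ssl=True)). Checking the option names before the read is a design choice made here so that a misspelled partitioner option fails early in the script rather than being passed through to the connector.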