
Managing collection-level document compression

Amazon DocumentDB collection-level document compression lets you lower storage and I/O costs by compressing the documents in your collections. You can enable document compression for individual collections and measure the storage gains through compression metrics such as the storage size of compressed documents and the compression status. Amazon DocumentDB uses the LZ4 compression algorithm to compress documents.

Amazon DocumentDB supports document compression starting with version 5.0. The following are collection-level document compression functions:

  • Default behavior — The default compression setting for new collections on a cluster is determined by the cluster parameter default_collection_compression. This parameter is set to “disabled” by default.

  • Compressing existing collections — The compression setting for existing collections can be changed using the collMod command.

  • Changing the compression threshold — The default compression threshold is 2032 bytes (about 2 KB). This value can be specified for new collections using the createCollection command, and changed for existing collections using the collMod command.

Note

Amazon DocumentDB document compression is not supported on Amazon DocumentDB versions 3.6 and 4.0.

Managing document compression

Enabling document compression in a collection

Enable document compression when you create a collection on Amazon DocumentDB by using the db.createCollection() method:

db.createCollection( "sample_collection", { storageEngine : { documentDB: { compression: { enable: <true | false> } } } })

Enabling document compression in a cluster

Document compression can be enabled by default for all new collections at the cluster level by setting the cluster parameter default_collection_compression to “enabled”. When this parameter is set to “enabled”, newly created collections on the cluster have compression enabled with the default compression threshold of 2032 bytes (about 2 KB).

Compressing existing collections

You can also modify the compression settings for an existing collection by using the collMod operation with the following storageEngine configuration. A change made with this command applies only to documents that are inserted or updated afterward. The compression state of previously inserted documents does not change.

db.runCommand({ collMod: "orders", storageEngine: { documentDB: {compression: {enable: <true | false>} } } })

Setting the compression thresholds

By default, the compression threshold for compressed collections is 2032 bytes. This threshold value can be set in the createCollection command when creating a new collection with compression enabled:

db.createCollection( "sample_collection", { storageEngine : { documentDB: { compression: { enable: true, threshold: <128 - 8000> } } } })

You can also modify the compression threshold for an existing compressed collection using the collMod operation and specifying the following storageEngine configuration:

db.runCommand({ collMod: "orders", storageEngine: { documentDB: { compression: { enable: true, threshold: <128 - 8000> } } } })

The compression threshold can only be set to a value between 128 and 8000 bytes. In addition, the enable option must be set to “true” when you specify the compression threshold.
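The two rules above can be checked before a command is sent. The following is a minimal sketch, not part of Amazon DocumentDB: compressionStorageEngine is a hypothetical helper that builds the storageEngine document used by createCollection and collMod. It assumes that the 128 to 8000 range is inclusive and that the threshold is a whole number of bytes.

```javascript
// Hypothetical helper (not an Amazon DocumentDB API): builds the storageEngine
// document shown in the createCollection and collMod examples.
// Assumptions: the threshold range is inclusive and the value is an integer.
const MIN_THRESHOLD_BYTES = 128;
const MAX_THRESHOLD_BYTES = 8000;

function compressionStorageEngine(enable, threshold) {
  const compression = { enable: Boolean(enable) };
  if (threshold !== undefined) {
    // A threshold is only accepted together with enable: true.
    if (!compression.enable) {
      throw new Error("enable must be true when a threshold is specified");
    }
    if (!Number.isInteger(threshold) ||
        threshold < MIN_THRESHOLD_BYTES ||
        threshold > MAX_THRESHOLD_BYTES) {
      throw new RangeError(
        `threshold must be an integer from ${MIN_THRESHOLD_BYTES} to ${MAX_THRESHOLD_BYTES} bytes`
      );
    }
    compression.threshold = threshold;
  }
  return { documentDB: { compression } };
}

// Usage from the mongo shell (requires a connection to a cluster):
// db.createCollection("sample_collection", { storageEngine: compressionStorageEngine(true, 4000) });
// db.runCommand({ collMod: "orders", storageEngine: compressionStorageEngine(true, 4000) });
```

Validating on the client is only a convenience. The server remains the authority on which values it accepts.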

Monitoring document compression

You can check whether a collection is compressed and calculate its compression ratio as follows.

View compression statistics by running the db.printCollectionStats() or db.collection.stats() command from the mongo shell. The output shows the original size and the compressed size, which you can compare to analyze the storage gains from document compression. The following example shows statistics for a collection named “sample_collection”. A scaling factor of 1024*1024 is used to output the size and storageSize values in MB.

db.sample_collection.stats(1024*1024)

The following is an example of the output for the above command:

{ "ns" : "test.sample_collection", "count" : 1000000, "size" : 3906.3, "avgObjSize" : 4096, "storageSize" : 1953.1, "compression" : { "enabled" : true, "threshold" : 2032 }, ... }
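
In this output:

  • size - The original (uncompressed) size of the collection. The unit of measure is bytes unless a scaling factor is supplied. In this example the value is in MB.

  • avgObjSize - The average document size before compression, rounded to one decimal place. The unit of measure is bytes.

  • storageSize - The storage size of the collection after compression. The unit of measure is bytes unless a scaling factor is supplied. In this example the value is in MB.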

  • enabled - Indicates if compression is enabled or disabled.

To calculate the actual compression ratio, divide the collection size by the storage size (size/storageSize). For the example above, the calculation is 3906.3/1953.1 which translates to a 2:1 compression ratio.
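The same calculation can be scripted. The following is a minimal sketch that assumes a stats document shaped like the example output above. compressionSummary is a hypothetical helper name, not an Amazon DocumentDB command.

```javascript
// Hypothetical helper: summarizes compression from a collection stats document.
// Assumes the fields shown in the example output (size, storageSize, compression).
function compressionSummary(stats) {
  const compression = stats.compression || {};
  // size / storageSize is independent of the scaling factor, because both
  // values are scaled the same way.
  const ratio = stats.storageSize > 0 ? stats.size / stats.storageSize : null;
  return {
    enabled: compression.enabled === true,
    threshold: compression.threshold,
    ratio: ratio
  };
}

// Values copied from the example output above.
const exampleStats = {
  ns: "test.sample_collection",
  size: 3906.3,
  storageSize: 1953.1,
  compression: { enabled: true, threshold: 2032 }
};

const summary = compressionSummary(exampleStats);
console.log(summary.enabled, summary.ratio.toFixed(2)); // true 2.00

// From the mongo shell, pass live statistics instead:
// compressionSummary(db.sample_collection.stats(1024*1024));
```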