$substrCP - Amazon DocumentDB

$substrCP

The $substrCP operator in Amazon DocumentDB extracts a substring from a string, where the substring is specified as a range of UTF-8 code points (CP). It is particularly useful for Unicode strings, because you can extract substrings without accounting for the underlying byte representation of the characters.

Unlike the $substrBytes operator, which operates on byte positions, the $substrCP operator works with code point positions. This makes it easier to work with strings that contain non-ASCII characters: a single code point can occupy more than one byte in UTF-8, so the number of code points in a string may differ from the number of bytes.
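The difference between byte positions and code point positions can be seen without a database. The following is a minimal sketch in plain Python, whose str type is indexed by code point. The sample string is an assumption chosen for illustration, and the slicing only models the distinction; it does not call DocumentDB.

```python
# Illustrative string (an assumption, not from the sample data).
# "\u00e9" (e with acute accent) is one code point but two bytes in UTF-8.
text = "caf\u00e9 au lait"

code_points = len(text)                  # str length counts code points
utf8_bytes = len(text.encode("utf-8"))   # length of the UTF-8 encoding

# Taking the first 4 code points yields the whole first word.
by_code_point = text[:4]

# Taking the first 4 bytes cuts the two-byte character in half,
# leaving a byte sequence that is not valid UTF-8 on its own.
by_byte = text.encode("utf-8")[:4]

print(code_points, utf8_bytes)
print(by_code_point)
print(by_byte)
```

The two lengths differ by one because exactly one character in the string needs two bytes, which is why a position counted in bytes and a position counted in code points can point at different places in the same string.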

Parameters

  • string: The input string from which to extract the substring.

  • start: The starting code point position (zero-based) from which to extract the substring.

  • length: The number of code points to extract.
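As a rough local model of the three parameters, the sketch below mirrors them with Python slicing and builds the corresponding pipeline stage as a plain dictionary. The helper names substr_cp and substr_cp_stage and the output field name result are hypothetical, introduced only for illustration. Rejecting negative arguments is a choice made for this sketch, not a statement about how DocumentDB treats such input.

```python
def substr_cp(string: str, start: int, length: int) -> str:
    """Local model: take `length` code points beginning at zero-based `start`."""
    if start < 0 or length < 0:
        # Assumption for this sketch only; server-side behavior is not modeled.
        raise ValueError("start and length must be non-negative in this sketch")
    return string[start:start + length]


def substr_cp_stage(field_path: str, start: int, length: int) -> dict:
    """Build a $project stage that applies $substrCP to `field_path`.

    The output field name "result" is a hypothetical placeholder.
    """
    return {"$project": {"result": {"$substrCP": [field_path, start, length]}}}


# "\u00fc" (u with diaeresis) counts as one position, like any other code point.
city = "Z\u00fcrich"
print(substr_cp(city, 1, 3))
print(substr_cp_stage("$Desk", 25, 2))
```

The dictionary returned by substr_cp_stage has the same shape as the stage used in the query example and could be passed inside a list to a driver's aggregate call.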

Example (MongoDB Shell)

In this example, we'll use the $substrCP operator to extract the state abbreviation from a string containing the employee's desk location.

Create sample documents

db.people.insert([
  { "_id": 1, "first_name": "Jane", "last_name": "Doe", "Desk": "12 Main St, Minneapolis, MN 55401" },
  { "_id": 2, "first_name": "John", "last_name": "Doe", "Desk": "456 Oak Rd, New Orleans, LA 70032" },
  { "_id": 3, "first_name": "Steve", "last_name": "Smith", "Desk": "789 Elm Ln, Bakersfield, CA 93263" }
]);

Query example

db.people.aggregate([
  { $project: { "state": { $substrCP: ["$Desk", 25, 2] } } }
]);

Output

{ "_id" : 1, "state" : "MN" }
{ "_id" : 2, "state" : "LA" }
{ "_id" : 3, "state" : "CA" }

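In this example, we know that the state abbreviation starts at zero-based code point position 25 in the Desk field and is 2 code points long. This fixed offset works only because the state abbreviation happens to begin at the same position in all three sample strings. By using the $substrCP operator, we can extract the state abbreviation without having to worry about the underlying byte representation of the string.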
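The offset used in the query can be checked locally against the sample data. The sketch below reuses the three Desk strings from the sample documents and applies the same start and length with Python slicing, which also counts code points. The fourth address is a hypothetical value added only to show what happens when the layout differs.

```python
# Desk values copied from the sample documents, keyed by _id.
desks = {
    1: "12 Main St, Minneapolis, MN 55401",
    2: "456 Oak Rd, New Orleans, LA 70032",
    3: "789 Elm Ln, Bakersfield, CA 93263",
}

START, LENGTH = 25, 2  # same arguments as in the $substrCP query

# Slice each string at the same code point range used by the query.
states = {doc_id: desk[START:START + LENGTH] for doc_id, desk in desks.items()}
print(states)

# Hypothetical address (not in the sample data) with a shorter city name:
# the state no longer sits at position 25, so the same slice misses it.
other_desk = "22 Birch Ave, Austin, TX 78701"
misplaced = other_desk[START:START + LENGTH]
print(repr(misplaced))
```

For the three sample strings the slice matches the output shown above. For the hypothetical fourth address it returns part of the postal code instead, which suggests that a fixed offset is only appropriate when every value shares the same layout.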

Code examples

The following code examples show how to use the $substrCP operator from Node.js and Python:

Node.js
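const { MongoClient } = require('mongodb');

async function findStates() {
  // Replace the placeholders with your own credentials and cluster endpoint.
  const client = await MongoClient.connect('mongodb://<username>:<password>@<cluster-endpoint>:27017/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false');
  try {
    const db = client.db('test');
    // Extract 2 code points starting at zero-based position 25 of Desk.
    const result = await db.collection('people').aggregate([
      { $project: { "state": { $substrCP: ["$Desk", 25, 2] } } }
    ]).toArray();
    console.log(result);
  } finally {
    // Close the connection even if the aggregation throws.
    await client.close();
  }
}

findStates().catch(console.error);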
Python
from pymongo import MongoClient

def find_states():
    # Replace the placeholders with your own credentials and cluster endpoint.
    client = MongoClient('mongodb://<username>:<password>@<cluster-endpoint>:27017/?tls=true&tlsCAFile=global-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false')
    db = client.test
    # Extract 2 code points starting at zero-based position 25 of Desk.
    result = list(db.people.aggregate([
        { '$project': { 'state': { '$substrCP': ['$Desk', 25, 2] } } }
    ]))
    print(result)
    client.close()

find_states()

In both the Node.js and Python examples, we use the $substrCP operator to extract the state abbreviation from the Desk field, similar to the MongoDB Shell example.