$substrCP
The $substrCP operator in Amazon DocumentDB extracts a substring from a string, where the substring is specified as a range of UTF-8 code points (CP). It is particularly useful for Unicode strings, because it lets you extract substrings without worrying about how many bytes each character occupies in the underlying UTF-8 encoding.
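Unlike the $substrBytes operator, which operates on byte positions, $substrCP works with code point positions. This makes it easier to handle strings that contain non-ASCII characters. In UTF-8, each ASCII character occupies one byte, but other characters occupy two to four bytes, so code point positions and byte positions can differ. Code points are also not always the same as user-perceived characters; for example, a letter followed by a combining accent is two code points.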
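The gap between code points and bytes is easy to see outside the database. The following plain JavaScript sketch (run with Node.js or a browser, not inside DocumentDB) counts both for a few sample strings. `codePointCount` and `utf8ByteCount` are hypothetical helpers written for illustration; they rely on the standard `Array.from` string iterator, which yields one element per code point, and the built-in `TextEncoder`, which encodes to UTF-8.

```javascript
// Compare Unicode code point counts with UTF-8 byte counts.
// Illustration only: runs in Node.js or a browser, not in DocumentDB.
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// Hypothetical helper: Array.from iterates a string by code point, so a
// character outside the BMP (for example an emoji) counts once, not twice.
function codePointCount(str) {
  return Array.from(str).length;
}

// Hypothetical helper: number of bytes in the string's UTF-8 encoding.
function utf8ByteCount(str) {
  return encoder.encode(str).length;
}

const samples = ["MN", "caf\u00e9", "Z\u00fcrich", "\u6771\u4eac", "\u{1F600}"];
for (const s of samples) {
  console.log(`${s}: ${codePointCount(s)} code points, ${utf8ByteCount(s)} bytes`);
}
```

For example, "café" is 4 code points but 5 bytes, because é (U+00E9) is encoded as 2 bytes, and the emoji U+1F600 is a single code point that takes 4 bytes.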
Parameters
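$substrCP takes an array of three arguments, in this order: { $substrCP: [ <string>, <start>, <length> ] }
- string: The input string from which to extract the substring.
- start: The starting code point position (zero-based) from which to extract the substring.
- length: The number of code points to extract.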
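As a rough model of how the three arguments combine, the following JavaScript sketch slices a string by code point. `substrCP` here is a hypothetical helper, not a DocumentDB or driver API; it assumes `start` and `length` are non-negative integers, and the real operator may validate arguments or handle edge cases differently.

```javascript
// Hypothetical helper that models $substrCP in plain JavaScript.
// Assumes start and length are non-negative integers; the server-side
// operator may validate arguments or handle edge cases differently.
function substrCP(str, start, length) {
  const codePoints = Array.from(str); // one array element per code point
  // slice() is zero-based and end-exclusive, and returns an empty array
  // when start is past the end of the string.
  return codePoints.slice(start, start + length).join("");
}

console.log(substrCP("12 Main St, Minneapolis, MN 55401", 25, 2)); // MN
console.log(substrCP("caf\u00e9 au lait", 3, 1));                 // é
console.log(substrCP("\u{1F600}abc", 0, 2));                       // 😀a
```

Slicing an array of code points, rather than calling String.prototype.slice, matters for characters outside the Basic Multilingual Plane such as emoji, which JavaScript stores as two UTF-16 code units.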
Example (MongoDB Shell)
In this example, we use the $substrCP operator to extract the state abbreviation from the Desk field, which holds each employee's desk location as a street address.
Create sample documents
db.people.insertMany([
  { "_id": 1, "first_name": "Jane", "last_name": "Doe", "Desk": "12 Main St, Minneapolis, MN 55401" },
  { "_id": 2, "first_name": "John", "last_name": "Doe", "Desk": "456 Oak Rd, New Orleans, LA 70032" },
  { "_id": 3, "first_name": "Steve", "last_name": "Smith", "Desk": "789 Elm Ln, Bakersfield, CA 93263" }
]);
Query example
db.people.aggregate([
  // Take 2 code points starting at zero-based code point index 25 of Desk
  { $project: { "state": { $substrCP: ["$Desk", 25, 2] } } }
]);
Output
{ "_id" : 1, "state" : "MN" }
{ "_id" : 2, "state" : "LA" }
{ "_id" : 3, "state" : "CA" }
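In this example, the state abbreviation in every Desk value starts at code point index 25 (zero-based, so it is the 26th code point) and is 2 code points long. Because $substrCP counts code points rather than bytes, that position depends only on how many code points precede the abbreviation, not on how many bytes they occupy in UTF-8. For these all-ASCII sample values, byte and code point positions happen to coincide.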
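To illustrate why the byte representation matters, the following JavaScript sketch compares code point slicing (a model of $substrCP) with UTF-8 byte slicing (a model of $substrBytes) on one of the sample Desk values and on a hypothetical address containing accented letters. Both helpers and the accented address are hypothetical and for illustration only; `sliceByBytes` does not check whether an offset falls inside a multi-byte character.

```javascript
const encoder = new TextEncoder();         // encodes strings to UTF-8 bytes
const decoder = new TextDecoder("utf-8");  // decodes UTF-8 bytes to a string

// Hypothetical helper: extract by code point index (models $substrCP).
function sliceByCodePoints(str, start, length) {
  return Array.from(str).slice(start, start + length).join("");
}

// Hypothetical helper: extract by UTF-8 byte offset (models $substrBytes).
// Does not check whether start or end splits a multi-byte character.
function sliceByBytes(str, start, length) {
  return decoder.decode(encoder.encode(str).slice(start, start + length));
}

const ascii = "12 Main St, Minneapolis, MN 55401";                  // sample Desk value
const accented = "12 Rue Cr\u00e9mazie, Montr\u00e9al, QC H2X 1Y4"; // hypothetical address

console.log(sliceByCodePoints(ascii, 25, 2));                // MN
console.log(sliceByBytes(ascii, 25, 2));                     // MN
console.log(sliceByCodePoints(accented, 27, 2));             // QC
console.log(JSON.stringify(sliceByBytes(accented, 27, 2)));  // ", "
console.log(sliceByBytes(accented, 29, 2));                  // QC
```

For the ASCII Desk value, both approaches return MN at offset 25. In the accented address, each of the two é characters takes 2 bytes, so the state abbreviation sits at code point index 27 but at byte offset 29; slicing bytes from 27 returns ", " instead.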