
Performing text search with Amazon DocumentDB

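With the native full-text search feature of Amazon DocumentDB, you can run text searches over large textual data sets by using special-purpose text indexes. This section describes what text indexes can do and explains how to create and use them in Amazon DocumentDB. It also lists the limitations of text search.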

Topics

Supported functionality
Using Amazon DocumentDB text indexes
Differences with MongoDB
Best practices and guidelines
Limitations

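Supported functionality

Amazon DocumentDB text search supports the following MongoDB API-compatible functionality:

Create a text index on a single field.
Create a compound text index that includes multiple text fields.
Perform single-word or multi-word searches.
Control search results by using weights.
Sort search results by score.
Use text indexes in an aggregation pipeline.
Search for an exact phrase.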

Using Amazon DocumentDB text indexes

To create a text index on a field that contains string data, specify the string "text" as shown below:

Single-field index:


db.test.createIndex({"comments": "text"})

This index supports text search queries on the "comments" string field of the specified collection.

Compound text index on multiple string fields:


db.test.createIndex({"comments": "text", "title":"text"})

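This index supports text search queries on the "comments" and "title" string fields of the specified collection. You can specify up to 30 fields when you create a compound text index. Once the index is created, text search queries run against all of the indexed fields.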

Note

Each collection can have only one text index.
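If the list of fields to index is assembled in code, a small helper can produce the key document and enforce the 30-field cap described above before calling createIndex(). This is a minimal sketch in plain JavaScript (usable in mongosh or Node.js); the name buildTextIndexKeys is hypothetical and is not part of the Amazon DocumentDB or MongoDB API. It only builds a document and does not contact the server.

```javascript
// Hypothetical helper: build the key document for a single-field or compound
// text index. The 30-field cap mirrors the limit stated for Amazon DocumentDB.
const MAX_TEXT_INDEX_FIELDS = 30;

function buildTextIndexKeys(fields) {
  if (!Array.isArray(fields) || fields.length === 0) {
    throw new Error("at least one field is required");
  }
  // Drop duplicate field names while keeping the original order.
  const unique = [...new Set(fields)];
  if (unique.length > MAX_TEXT_INDEX_FIELDS) {
    throw new Error(
      `a compound text index can include at most ${MAX_TEXT_INDEX_FIELDS} fields`
    );
  }
  const keys = {};
  for (const field of unique) {
    keys[field] = "text"; // every indexed field uses the "text" index type
  }
  return keys;
}

// Usage (mongosh):
// db.test.createIndex(buildTextIndexKeys(["comments", "title"]))
```

Because a collection can hold only one text index, all of the fields you want searchable have to go into this single key document.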

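Listing the text index on an Amazon DocumentDB collection

You can call getIndexes() on a collection to identify and describe its indexes, including text indexes, as shown in the following example: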


rs0:PRIMARY> db.test.getIndexes()
[
   {
      "v" : 4,
      "key" : {
         "_id" : 1
      },
      "name" : "_id_",
      "ns" : "test.test"
   },
   {
      "v" : 1,
      "key" : {
         "_fts" : "text",
         "_ftsx" : 1
      },
      "name" : "comments_text",
      "ns" : "test.test",
      "default_language" : "english",
      "weights" : {
         "comments" : 1
      },
      "textIndexVersion" : 1
   }
]
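In the output above, the text index is the entry whose key document contains "_fts" : "text", and its weights document lists the indexed fields. The following minimal sketch picks that entry out of the getIndexes() result; the helper name findTextIndex is hypothetical, and the sketch assumes the output shape shown above.

```javascript
// Hypothetical helper: locate the text index in the array returned by
// db.collection.getIndexes(). A text index is recognized by the
// "_fts": "text" entry in its key document, as in the output above.
function findTextIndex(indexes) {
  const textIndexes = indexes.filter(
    (ix) => ix.key !== undefined && ix.key._fts === "text"
  );
  if (textIndexes.length > 1) {
    // A collection can have only one text index, so this should not happen.
    throw new Error("unexpected: more than one text index found");
  }
  if (textIndexes.length === 0) {
    return null; // the collection has no text index
  }
  const ix = textIndexes[0];
  // The weights document names every field covered by the text index.
  return { name: ix.name, fields: Object.keys(ix.weights || {}) };
}

// Usage (mongosh): findTextIndex(db.test.getIndexes())
```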

After you create the index, start inserting data into your Amazon DocumentDB collection.


db.test.insertMany([{"_id": 1, "star_rating": 4, "comments": "apple is red"},
                    {"_id": 2, "star_rating": 5, "comments": "pie is delicious"},
                    {"_id": 3, "star_rating": 3, "comments": "apples, oranges - healthy fruit"},
                    {"_id": 4, "star_rating": 2, "comments": "bake the apple pie in the oven"},
                    {"_id": 5, "star_rating": 5, "comments": "interesting couch"},
                    {"_id": 6, "star_rating": 5, "comments": "interested in couch for sale, year 2022"}])

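Running text search queries

Running a single-word text search query

Text searches require the $text and $search operators. The following example returns all documents whose text-indexed field contains the string "apple" or another form of it, such as "apples":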


db.test.find({$text: {$search: "apple"}})

Output:

The output from this command looks similar to the following:


{ "_id" : 1, "star_rating" : 4, "comments" : "apple is red" }
{ "_id" : 3, "star_rating" : 3, "comments" : "apples, oranges - healthy fruit" }
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven" }

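Running a multi-word text search

You can also run multi-word text searches on your Amazon DocumentDB data. The following command returns documents whose text-indexed field contains "apple" or "pie":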


db.test.find({$text: {$search: "apple pie"}})

Output:

The output from this command looks similar to the following:


{ "_id" : 1, "star_rating" : 4, "comments" : "apple is red" }
{ "_id" : 2, "star_rating" : 5, "comments" : "pie is delicious" }
{ "_id" : 3, "star_rating" : 3, "comments" : "apples, oranges - healthy fruit" }
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven" }
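As the results above show, space-separated terms in $search are combined with OR semantics. When the terms come from user input, a helper can join them into one $search string. This is a minimal sketch; buildTermSearch is a hypothetical name. It rejects terms that start with "-", because excluding a term is not supported in Amazon DocumentDB (see the list of differences with MongoDB), and it rejects double quotes, which are reserved for phrase searches.

```javascript
// Hypothetical helper: turn a list of terms into a $text query whose
// space-separated terms are matched with OR semantics.
function buildTermSearch(terms) {
  const cleaned = terms
    .map((t) => String(t).trim())
    .filter((t) => t.length > 0); // ignore empty entries
  if (cleaned.length === 0) {
    throw new Error("at least one search term is required");
  }
  for (const t of cleaned) {
    if (t.startsWith("-")) {
      // Term exclusion (negation) is not supported in Amazon DocumentDB.
      throw new Error(`term exclusion is not supported: ${t}`);
    }
    if (t.includes('"')) {
      throw new Error(`quoted text belongs in a phrase search: ${t}`);
    }
  }
  // Remove duplicates and join with single spaces.
  return { $text: { $search: [...new Set(cleaned)].join(" ") } };
}

// Usage (mongosh): db.test.find(buildTermSearch(["apple", "pie"]))
```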

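Running a multi-word phrase text search

For a multi-word phrase search, enclose the phrase in escaped double quotes, as in the following example: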


db.test.find({$text: {$search: "\"apple pie\""}})

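Output:

The command above returns documents whose text-indexed field contains the exact phrase "apple pie". The output from this command looks similar to the following: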


{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven" }
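The phrase query depends on literal double-quote characters inside the $search string, which is why the shell example writes "\"apple pie\"". A minimal sketch of a helper that adds the quotes for you follows; buildPhraseQuery is a hypothetical name, and the sketch simply refuses phrases that already contain double quotes instead of handling them.

```javascript
// Hypothetical helper: wrap a phrase in double quotes so that $search treats
// it as one exact phrase instead of separate OR-ed terms.
function buildPhraseQuery(phrase) {
  const trimmed = String(phrase).trim();
  if (trimmed.length === 0 || trimmed.includes('"')) {
    throw new Error("phrase must be non-empty and must not contain double quotes");
  }
  // The resulting $search value contains the quote characters: "apple pie"
  return { $text: { $search: `"${trimmed}"` } };
}

// Usage (mongosh): db.test.find(buildPhraseQuery("apple pie"))
```

Built this way, the query object is identical to the one written by hand in the shell example.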

Running a text search with filters

You can also combine a text search with other query operators to filter the results on additional criteria:


db.test.find({$and: [{star_rating: 5}, {$text: {$search: "interest"}}]})

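Output:

The command above returns documents whose text-indexed field contains any form of "interest" and whose "star_rating" equals 5. The output from this command looks similar to the following: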


{ "_id" : 5, "star_rating" : 5, "comments" : "interesting couch" }
{ "_id" : 6, "star_rating" : 5, "comments" : "interested in couch for sale, year 2022" }
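The filtered query above has the shape {$and: [filter, {$text: ...}]}. If the extra filter is optional, a helper can assemble that shape and fall back to a plain $text query when no filter is given. This is a minimal sketch; textSearchWithFilter is a hypothetical name.

```javascript
// Hypothetical helper: combine a text search with ordinary field filters
// using $and, in the same shape as the query above.
function textSearchWithFilter(search, filter) {
  const textClause = { $text: { $search: search } };
  if (!filter || Object.keys(filter).length === 0) {
    return textClause; // no extra criteria: plain text search
  }
  return { $and: [filter, textClause] };
}

// Usage (mongosh), optionally capped with limit():
// db.test.find(textSearchWithFilter("couch", { star_rating: 5 })).limit(1)
```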

Limiting the number of documents returned in a text search

You can optionally limit the number of documents returned by using limit:


db.test.find({$and: [{star_rating: 5}, {$text: {$search: "couch"}}]}).limit(1)

Output:

The command above returns one result that satisfies the filter:


{ "_id" : 5, "star_rating" : 5, "comments" : "interesting couch" }

Sorting results by text score

The following example sorts text search results by text score:


db.test.find({$text: {$search: "apple"}}, {score: {$meta: "textScore"}}).sort({score: {$meta: "textScore"}})

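Output:

The command above returns documents whose text-indexed field contains "apple" or another form of it, such as "apples", and sorts the results by how relevant each document is to the search term. The output from this command looks similar to the following: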


{ "_id" : 1, "star_rating" : 4, "comments" : "apple is red", "score" : 0.6079270860936958 }
{ "_id" : 3, "star_rating" : 3, "comments" : "apples, oranges - healthy fruit", "score" : 0.6079270860936958 }
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven", "score" : 0.6079270860936958 }
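Sorting by relevance uses the same {$meta: "textScore"} expression twice: once in the projection, to expose the score under a field name, and once in sort(). A minimal sketch that builds the three documents together, so the field names cannot drift apart, follows; textScoreQuery and its scoreField parameter are hypothetical.

```javascript
// Hypothetical helper: build the filter, projection, and sort documents for
// a text search ordered by text score. The projected field name and the sort
// key use the same name and the same $meta expression.
function textScoreQuery(search, scoreField = "score") {
  return {
    filter: { $text: { $search: search } },
    projection: { [scoreField]: { $meta: "textScore" } },
    sort: { [scoreField]: { $meta: "textScore" } },
  };
}

// Usage (mongosh):
// const q = textScoreQuery("apple");
// db.test.find(q.filter, q.projection).sort(q.sort)
```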

$text and $search are also supported in the aggregate, count, findAndModify, update, and delete commands.

Aggregation operators

Aggregation pipeline using $match


db.test.aggregate(
   [{ $match: { $text: { $search: "apple pie" } } }]
)

Output:

The command above returns the following results:


{ "_id" : 1, "star_rating" : 4, "comments" : "apple is red" }
{ "_id" : 3, "star_rating" : 3, "comments" : "apples, oranges - healthy fruit" }
{ "_id" : 4, "star_rating" : 2, "comments" : "bake the apple pie in the oven" }
{ "_id" : 2, "star_rating" : 5, "comments" : "pie is delicious" }

Combining other aggregation operators


db.test.aggregate(
   [
      { $match: { $text: { $search: "apple pie" } } },
      { $sort: { score: { $meta: "textScore" } } },
      { $project: { score: { $meta: "textScore" } } }
   ]
)

Output:

The command above returns the following results:


{ "_id" : 4, "score" : 0.6079270860936958 }
{ "_id" : 1, "score" : 0.3039635430468479 }
{ "_id" : 2, "score" : 0.3039635430468479 }
{ "_id" : 3, "score" : 0.3039635430468479 }
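The pipeline above can be generated by a function when the search string varies, optionally followed by a $limit stage to return only the top-scoring documents. This is a minimal sketch; textSearchPipeline and its limit parameter are hypothetical. It keeps the $match stage with $text first, as in the example above, and appends $limit after the score sort.

```javascript
// Hypothetical helper: assemble the $match / $sort / $project pipeline shown
// above, with an optional $limit stage appended after sorting by score.
function textSearchPipeline(search, limit) {
  const pipeline = [
    { $match: { $text: { $search: search } } },   // text match stays first
    { $sort: { score: { $meta: "textScore" } } },
    { $project: { score: { $meta: "textScore" } } },
  ];
  if (limit !== undefined) {
    if (!Number.isInteger(limit) || limit <= 0) {
      throw new Error("limit must be a positive integer");
    }
    pipeline.push({ $limit: limit });
  }
  return pipeline;
}

// Usage (mongosh): db.test.aggregate(textSearchPipeline("apple pie", 2))
```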

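Specifying multiple fields when creating a text index

You can assign weights to up to three fields in a compound text index. The default weight of a field in a text index is one (1). Weights are optional and must be between 1 and 100000.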


db.test.createIndex(
   {
     "firstname": "text",
     "lastname": "text",
     ...
   },
   {
     weights: {
       "firstname": 5,
       "lastname":10,
       ...
     },
     name: "name_text_index"
   }
 )
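The weight constraints stated above (at most three weighted fields, each weight from 1 to 100000) can be checked on the client before the options document is passed to createIndex(). This is a minimal sketch; buildWeightedTextIndexOptions is a hypothetical name, and the field names firstname and lastname come from the example above.

```javascript
// Hypothetical helper: build createIndex() options with field weights,
// enforcing the constraints described above: at most three weighted fields,
// each weight between 1 and 100000.
function buildWeightedTextIndexOptions(weights, name) {
  const entries = Object.entries(weights);
  if (entries.length > 3) {
    throw new Error("weights can be assigned to at most three fields");
  }
  for (const [field, w] of entries) {
    if (!Number.isFinite(w) || w < 1 || w > 100000) {
      throw new Error(`weight for "${field}" must be between 1 and 100000`);
    }
  }
  const options = { weights: { ...weights } };
  if (name) {
    options.name = name; // optional custom index name
  }
  return options;
}

// Usage (mongosh):
// db.test.createIndex(
//   { firstname: "text", lastname: "text" },
//   buildWeightedTextIndexOptions({ firstname: 5, lastname: 10 }, "name_text_index")
// )
```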

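Differences with MongoDB

The text index feature in Amazon DocumentDB uses an inverted index with a term-frequency algorithm. Text indexes are sparse by default. Because of differences in parsing logic, tokenization delimiters, and other factors, Amazon DocumentDB might not return the same result set as MongoDB for the same data set or query shape.

Amazon DocumentDB text indexes also differ from MongoDB in the following ways: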

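Compound indexes that combine text and non-text index keys are not supported.
Amazon DocumentDB text indexes are case insensitive.
Text indexes support English only.
Text indexes on array (or multikey) fields are not supported. For example, creating a text index on "a" fails if a document is {"a":["apple", "pie"]}.
Wildcard text indexes are not supported.
Unique text indexes are not supported.
Excluding a term from a search is not supported.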
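Because a text index cannot be created on a field that holds an array, it can help to scan a sample of existing documents for array values before creating the index. This is a minimal sketch; findArrayFields is a hypothetical name, and it checks top-level field names only (dotted paths such as "a.b" are not resolved).

```javascript
// Hypothetical pre-check: return the fields (from the list you plan to
// text-index) that hold arrays in the given document. Text indexes on array
// (multikey) fields are not supported in Amazon DocumentDB.
// Only top-level field names are checked; dotted paths are not resolved.
function findArrayFields(doc, fields) {
  return fields.filter((f) => Array.isArray(doc[f]));
}

// Usage (mongosh), checking a sample of documents before creating the index:
// db.test.find().limit(100).forEach((d) => {
//   const bad = findArrayFields(d, ["comments"]);
//   if (bad.length > 0) print(d._id, bad);
// });
```

A sample scan cannot prove that no document holds an array, so treat an empty result as a good sign rather than a guarantee.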

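Best practices and guidelines

For the best performance on text search queries that sort by text score, we recommend that you create the text index before you load data.
Text indexes require additional storage for an optimized internal copy of the indexed data, which incurs additional cost.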

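Limitations

Text search in Amazon DocumentDB has the following limitations:

Text search is supported only on Amazon DocumentDB 5.0 instance-based clusters.
Text indexes store lexemes and their positional information. The combined size of all lexemes and their positional information in a single document is limited to 1 MB.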
