Step 1: Creating a List of Unwanted Words - Amazon Transcribe



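To create a vocabulary filter, you create a list of the words that you want filtered from your transcription results and save it in a text file. Alternatively, you can use the CreateVocabularyFilter operation and enter the words to filter as an array of strings in the Words parameter. Listing the unwanted words directly in the CreateVocabularyFilter operation is more convenient, but if you use a text file, you can edit your word list later and reuse it in another vocabulary filter.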
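As a rough illustration of the second option, the sketch below builds the arguments for a CreateVocabularyFilter call with the words passed inline through the Words parameter. It assumes the AWS SDK for Python (boto3) as the client; the filter name `my-filter`, the `en-US` language code, and the helper names `build_filter_request` and `create_filter` are hypothetical and not part of the original guide.

```python
def build_filter_request(name, words, language_code="en-US"):
    """Build keyword arguments for a CreateVocabularyFilter call.

    The name and language code are illustrative assumptions.
    """
    # Drop surrounding whitespace and skip empty entries.
    cleaned = [w.strip() for w in words if w.strip()]
    if not cleaned:
        raise ValueError("at least one word is required")
    return {
        "VocabularyFilterName": name,
        "LanguageCode": language_code,
        "Words": cleaned,  # the words to filter, as an array of strings
    }


def create_filter(request):
    """Send the request to Amazon Transcribe (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without the SDK

    client = boto3.client("transcribe")
    return client.create_vocabulary_filter(**request)


request = build_filter_request("my-filter", ["profanity", " curse ", "swear", ""])
print(request["Words"])
```

Only `build_filter_request` runs here; calling `create_filter(request)` would contact the service and requires configured credentials.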

The following guidelines apply to vocabulary filters:

  • Words in a vocabulary filter aren't case sensitive. For example, "curse" and "CURSE" are considered the same word.

  • Amazon Transcribe filters only words that exactly match words in the filter. For example, if your filter includes "swear," Amazon Transcribe filters "swear," but not "swears." You must provide every variation of a word that you want to filter.

  • Amazon Transcribe doesn't filter words that are contained in other words. For example, if a vocabulary filter contained "marine," but not "submarine," "submarine" would appear in your transcription results.
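The three guidelines above can be mimicked locally to check a word list before creating a filter. The following is only a sketch of the described matching rules (case-insensitive, whole-word, exact match), not Amazon Transcribe's actual implementation; the function name `filter_transcript`, the `***` mask, and the simple word pattern are assumptions made for illustration.

```python
import re


def filter_transcript(text, filter_words, mask="***"):
    """Mask whole words that exactly match a filter entry, ignoring case."""
    blocked = {w.lower() for w in filter_words}

    def replace(match):
        word = match.group(0)
        # Exact, case-insensitive comparison of the whole word only.
        return mask if word.lower() in blocked else word

    # Each run of letters or apostrophes is treated as one word (an assumption).
    return re.sub(r"[A-Za-z']+", replace, text)


result = filter_transcript(
    "Do not SWEAR; he swears in the submarine, marine.",
    ["swear", "marine"],
)
print(result)  # Do not ***; he swears in the submarine, ***.
```

"SWEAR" is masked because case is ignored, while "swears" and "submarine" pass through because neither is an exact match for an entry in the list.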

To create a word list using the console, complete the following steps. To use the CreateVocabularyFilter operation, see Step 2: Creating a Vocabulary Filter.

To create a list of unwanted words (console)

  1. In a text editor, create a new file and place each word on its own line, followed by a newline character (\n), as shown in the following example.

    profanity
    curse
    swear
    ...
    obscenity

  2. Save the list as a plain text file, either locally or in Amazon Simple Storage Service (Amazon S3).
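The two steps above can also be scripted. The sketch below writes one word per line, each followed by a newline character (\n), to a local plain text file. The helper name `write_word_list` and the file name `unwanted-words.txt` are hypothetical, and removing case-insensitive duplicates is an added convenience rather than a documented requirement.

```python
import os
import tempfile


def write_word_list(words, path):
    """Write one word per line, each terminated by "\\n", as UTF-8 plain text."""
    seen = set()
    lines = []
    for word in words:
        cleaned = word.strip()
        key = cleaned.lower()  # filters ignore case, so "Curse" repeats "curse"
        if cleaned and key not in seen:
            seen.add(key)
            lines.append(cleaned)
    # newline="\n" keeps line endings as "\n" on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for line in lines:
            handle.write(line + "\n")
    return lines


# Usage: write a small list to a temporary directory and read it back.
target = os.path.join(tempfile.mkdtemp(), "unwanted-words.txt")
written = write_word_list(["profanity", "curse", "Curse", "swear"], target)
with open(target, "rb") as handle:
    content = handle.read()
print(written)
```

The resulting file can be kept locally or uploaded to an Amazon S3 bucket of your choice before you continue with the next step.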

Next step

Step 2: Creating a Vocabulary Filter