Preparing and importing batch input data - Amazon Personalize

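Preparing and importing batch input data

Batch inference jobs and batch segment jobs use a solution version to generate recommendations or user segments from the data you provide in an input JSON file. Before you can get batch recommendations or user segments, you must prepare the JSON file and upload it to an Amazon S3 bucket. We recommend that you create an output folder in the same Amazon S3 bucket or use a separate output bucket. You can then run multiple batch inference jobs against the same input data location.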

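To prepare and import data

  1. Format your batch input data according to the type of batch workflow you are running and the recipe your solution uses. For both workflows, separate input data elements with a new line.

    • For batch recommendations, your input data is a JSON file containing a list of userIds (USER_PERSONALIZATION recipes), a list of itemIds (RELATED_ITEMS recipes), or a list of userIds, each paired with a collection of itemIds (PERSONALIZED_RANKING recipes). For input data examples, see Batch inference job input and output JSON examples.

    • For batch segment jobs, your input data can be a list of itemIds (Item-Affinity recipe) or a list of item attributes (Item-Attribute-Affinity recipe). For item attributes, your input data can include expressions with the AND logical operator to get users for multiple items or attributes per query. For input data examples, see Batch segment job input and output JSON examples.

  2. Upload your input JSON to an input folder in your Amazon S3 bucket. For more information, see Uploading files and folders by using drag and drop in the Amazon Simple Storage Service User Guide.

  3. Create a separate location for your output data, either a folder or a different Amazon S3 bucket. With a separate location for the output JSON, you can run multiple batch inference or batch segment jobs against the same input data location.

  4. Create a batch inference job or a batch segment job. Amazon Personalize writes the recommendations or user segments from your solution version to your output data location.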
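
The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the helper names `to_json_lines` and `submit_batch_inference_job`, the object keys, and the job name are hypothetical, and the upload step assumes the AWS SDK for Python (boto3) is installed with credentials configured.

```python
import json


def to_json_lines(records):
    """Serialize records as one JSON object per line (step 1)."""
    return "".join(json.dumps(record) + "\n" for record in records)


def submit_batch_inference_job(local_path, bucket, solution_version_arn, role_arn):
    """Upload the input file and start a batch inference job (steps 2 to 4).

    Assumes boto3 is installed and AWS credentials are configured.
    The keys and job name below are hypothetical placeholders.
    """
    import boto3  # imported lazily so to_json_lines works without boto3

    input_key = "batch/input/users.json"  # hypothetical input folder
    output_prefix = "batch/output/"       # separate output location (step 3)

    boto3.client("s3").upload_file(local_path, bucket, input_key)

    personalize = boto3.client("personalize")
    return personalize.create_batch_inference_job(
        jobName="example-batch-inference-job",  # hypothetical job name
        solutionVersionArn=solution_version_arn,
        roleArn=role_arn,
        jobInput={"s3DataSource": {"path": f"s3://{bucket}/{input_key}"}},
        jobOutput={"s3DataDestination": {"path": f"s3://{bucket}/{output_prefix}"}},
    )


# Build the input file content for three users.
users = [{"userId": "4638"}, {"userId": "663"}, {"userId": "3384"}]
payload = to_json_lines(users)
print(payload, end="")
```

Keeping the output prefix separate from the input key is what allows several jobs to reuse the same input file without mixing results into the input folder.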

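Input and output JSON examples

How you format your input data depends on the type of batch job you create and the recipe you use. The following sections list correctly formatted JSON input and output examples for batch inference jobs and batch segment jobs.

Batch inference job input and output JSON examples

The following are correctly formatted JSON input and output examples for batch inference jobs, organized by recipe.

User-Personalization and legacy HRNN recipes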

Input

Separate each userId with a new line as follows.

{"userId": "4638"}
{"userId": "663"}
{"userId": "3384"}
...
Output
{"input":{"userId":"4638"},"output":{"recommendedItems":["63992","115149","110102","148626","148888","31685","102445","69526","92535","143355","62374","7451","56171","122882","66097","91542","142488","139385","40583","71530","39292","111360","34048","47099","135137"],"scores":[0.0152238,0.0069081,0.0068222,0.006394,0.0059746,0.0055851,0.0049357,0.0044644,0.0042968,0.004015,0.0038805,0.0037476,0.0036563,0.0036178,0.00341,0.0033467,0.0033258,0.0032454,0.0032076,0.0031996,0.0029558,0.0029021,0.0029007,0.0028837,0.0028316]},"error":null}
{"input":{"userId":"663"},"output":{"recommendedItems":["368","377","25","780","1610","648","1270","6","165","1196","1097","300","1183","608","104","474","736","293","141","2987","1265","2716","223","733","2028"],"scores":[0.0406197,0.0372557,0.0254077,0.0151975,0.014991,0.0127175,0.0124547,0.0116712,0.0091098,0.0085492,0.0079035,0.0078995,0.0075598,0.0074876,0.0072006,0.0071775,0.0068923,0.0066552,0.0066232,0.0062504,0.0062386,0.0061121,0.0060942,0.0060781,0.0059263]},"error":null}
{"input":{"userId":"3384"},"output":{"recommendedItems":["597","21","223","2144","208","2424","594","595","920","104","520","367","2081","39","1035","2054","160","1370","48","1092","158","2671","500","474","1907"],"scores":[0.0241061,0.0119394,0.0118012,0.010662,0.0086972,0.0079428,0.0073218,0.0071438,0.0069602,0.0056961,0.0055999,0.005577,0.0054387,0.0051787,0.0051412,0.0050493,0.0047126,0.0045393,0.0042159,0.0042098,0.004205,0.0042029,0.0040778,0.0038897,0.0038809]},"error":null}
...
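
Reading such an output file back might look like the following. This is a sketch, assuming each output line is a standalone JSON object with the `input`, `output`, and `error` fields shown above; the function name `parse_recommendations` is hypothetical, and the sample line is truncated to two items for brevity.

```python
import json


def parse_recommendations(output_text):
    """Map each userId to a list of (itemId, score) pairs."""
    results = {}
    for line in output_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if record.get("error"):
            continue  # skip queries that reported an error
        output = record["output"]
        items = output["recommendedItems"]
        # Some recipes return no scores; pair items with None in that case.
        scores = output.get("scores", [None] * len(items))
        results[record["input"]["userId"]] = list(zip(items, scores))
    return results


sample = (
    '{"input":{"userId":"4638"},"output":{"recommendedItems":["63992","115149"],'
    '"scores":[0.0152238,0.0069081]},"error":null}\n'
)
top_item = parse_recommendations(sample)["4638"][0]
print(top_item)
```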

Popularity-Count

Input

Separate each userId with a new line as follows.

{"userId": "12"}
{"userId": "105"}
{"userId": "41"}
...
Output
{"input": {"userId": "12"}, "output": {"recommendedItems": ["105", "106", "441"]}}
{"input": {"userId": "105"}, "output": {"recommendedItems": ["105", "106", "441"]}}
{"input": {"userId": "41"}, "output": {"recommendedItems": ["105", "106", "441"]}}
...

PERSONALIZED_RANKING recipes

Input

Separate each userId and its list of itemIds to be ranked with a new line as follows.

{"userId": "891", "itemList": ["27", "886", "101"]}
{"userId": "445", "itemList": ["527", "55", "901"]}
{"userId": "71", "itemList": ["29", "351", "199"]}
...
Output
{"input":{"userId":"891","itemList":["27","886","101"]},"output":{"recommendedItems":["27","101","886"],"scores":[0.48421,0.28133,0.23446]}}
{"input":{"userId":"445","itemList":["527","55","901"]},"output":{"recommendedItems":["901","527","55"],"scores":[0.46972,0.31011,0.22017]}}
{"input":{"userId":"71","itemList":["29","351","199"]},"output":{"recommendedItems":["351","29","199"],"scores":[0.68937,0.24829,0.06232]}}
...
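
A ranking input line and a basic sanity check on the corresponding output line could be sketched as follows. The helper names are hypothetical, and the check assumes, as in the examples above, that every submitted item comes back exactly once in the ranked list.

```python
import json


def ranking_input_line(user_id, item_ids):
    """Build one input line: a userId paired with the items to rank."""
    return json.dumps({"userId": user_id, "itemList": list(item_ids)})


def is_reordering(output_line):
    """Return True if the ranked items are a reordering of the submitted itemList."""
    record = json.loads(output_line)
    submitted = record["input"]["itemList"]
    ranked = record["output"]["recommendedItems"]
    return sorted(submitted) == sorted(ranked)


line = ranking_input_line("891", ["27", "886", "101"])
print(line)

output_line = (
    '{"input":{"userId":"891","itemList":["27","886","101"]},'
    '"output":{"recommendedItems":["27","101","886"],'
    '"scores":[0.48421,0.28133,0.23446]}}'
)
print(is_reordering(output_line))
```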

RELATED_ITEMS recipes

Input

Separate each itemId with a new line as follows.

{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...
Output
{"input": {"itemId": "105"}, "output": {"recommendedItems": ["106", "107", "49"]}}
{"input": {"itemId": "106"}, "output": {"recommendedItems": ["105", "107", "49"]}}
{"input": {"itemId": "441"}, "output": {"recommendedItems": ["2", "442", "435"]}}
...

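Batch segment job input and output JSON examples

When you create a batch segment job, your input data can be a list of itemIds (Item-Affinity recipe) or a list of item attributes (Item-Attribute-Affinity recipe). Each line of input data is a separate inference query. Each user segment is sorted in descending order by the probability that each user will interact with items in your inventory.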

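For item attributes, you can mix different columns of metadata. For example, one line might use a numerical column and the next line a categorical column. Your input item metadata can also include expressions with the AND logical operator to get a user segment for multiple attributes. For example, a line of input data might be {"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\""} or {"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.audience = \"teen\""}. When you combine two attributes with the AND operator, you create a user segment of users who, based on their interaction history, are more likely to interact with items that have both attributes. Unlike filter expressions (which use the IN operator for string equality), batch segment input expressions support only the = symbol for string equality.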
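
The inner quotation marks in these expressions must be escaped in the JSON file. One way to avoid writing the escapes by hand is to let a JSON serializer produce them, as in this sketch. The helper name `attribute_query` is hypothetical, and the sketch only covers string-valued attributes compared with the = symbol.

```python
import json


def attribute_query(*conditions):
    """Build one input line from (column, value) pairs joined with AND.

    Only string values and the = symbol are handled here.
    """
    expression = " AND ".join(
        f'ITEMS.{column} = "{value}"' for column, value in conditions
    )
    # json.dumps escapes the inner double quotes as \" in the output line.
    return json.dumps({"itemAttributes": expression})


query_line = attribute_query(("genres", "Comedy"), ("audience", "teen"))
print(query_line)
```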

The following are correctly formatted JSON input and output examples for batch segment jobs, organized by recipe.

Item-Affinity

Input

Your input data can contain a maximum of 500 items. Separate each itemId with a new line as follows.

{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...
Output
{"input": {"itemId": "105"}, "output": {"recommendedUsers": ["106", "107", "49"]}}
{"input": {"itemId": "106"}, "output": {"recommendedUsers": ["105", "107", "49"]}}
{"input": {"itemId": "441"}, "output": {"recommendedUsers": ["2", "442", "435"]}}
...
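
If you have more than 500 items, one possible approach is to split them across several input files and run one job per file. The sketch below shows only the splitting. The function name `item_affinity_inputs` is hypothetical, and running one job per chunk is an assumption about workflow rather than something this page prescribes.

```python
import json

MAX_ITEMS_PER_INPUT = 500  # limit stated above for Item-Affinity input


def item_affinity_inputs(item_ids, limit=MAX_ITEMS_PER_INPUT):
    """Yield input file contents, each holding at most `limit` itemIds."""
    item_ids = list(item_ids)
    for start in range(0, len(item_ids), limit):
        chunk = item_ids[start:start + limit]
        yield "".join(json.dumps({"itemId": item_id}) + "\n" for item_id in chunk)


# 1200 hypothetical item IDs split into chunks of 500, 500, and 200.
payloads = list(item_affinity_inputs(str(i) for i in range(1200)))
print([payload.count("\n") for payload in payloads])
```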

Item-Attribute-Affinity

Input

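Your input data can contain a maximum of 10 queries, where each query is one or more item attributes. Separate each attribute or attribute expression with a new line as follows.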

{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\""}
{"itemAttributes": "ITEMS.genres = \"Adventure\""}
{"itemAttributes": "ITEMS.genres = \"Horror\" AND ITEMS.genres = \"Action\""}
...
Output
{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\"", "output": {"recommendedUsers": ["25", "78", "108"]}}
{"itemAttributes": "ITEMS.genres = \"Adventure\"", "output": {"recommendedUsers": ["87", "31", "129"]}}
{"itemAttributes": "ITEMS.genres = \"Horror\" AND ITEMS.genres = \"Action\"", "output": {"recommendedUsers": ["8", "442", "435"]}}
...
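
Before uploading an Item-Attribute-Affinity input file, it can help to check it locally against the constraints above. The following is a minimal sketch, assuming one JSON object per line with a string `itemAttributes` field; the function name `validate_attribute_queries` is hypothetical, and the check does not parse the expression syntax itself.

```python
import json

MAX_QUERIES = 10  # limit stated above for Item-Attribute-Affinity input


def validate_attribute_queries(input_text, limit=MAX_QUERIES):
    """Return the query expressions, raising ValueError on malformed input."""
    expressions = []
    for number, line in enumerate(input_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)  # JSONDecodeError is a subclass of ValueError
        if not isinstance(record, dict) or not isinstance(
            record.get("itemAttributes"), str
        ):
            raise ValueError(f"line {number}: expected an itemAttributes string")
        expressions.append(record["itemAttributes"])
    if len(expressions) > limit:
        raise ValueError(f"{len(expressions)} queries exceed the limit of {limit}")
    return expressions


sample_input = "\n".join([
    r'{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\""}',
    r'{"itemAttributes": "ITEMS.genres = \"Horror\" AND ITEMS.genres = \"Action\""}',
])
queries = validate_attribute_queries(sample_input)
print(queries)
```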