Preparing and importing batch input data - Amazon Personalize

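Preparing and importing batch input data

Batch inference jobs and batch segment jobs use a solution version to generate recommendations or user segments from the data you provide in an input JSON file. Before you can get batch recommendations or user segments, you must prepare your JSON file and upload it to an Amazon S3 bucket. We recommend that you create an output folder in your Amazon S3 bucket or use a separate output Amazon S3 bucket. You can then run multiple batch inference jobs using the same input data location.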

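To prepare and import data

  1. Format your batch input data according to the type of batch workflow you are using and the recipe your solution uses. For both workflows, separate input data elements with a new line.

    • For batch recommendations, your input data is a JSON file that contains a list of userIds (USER_PERSONALIZATION recipes), a list of itemIds (RELATED_ITEMS recipes), or a list of userIds each paired with a collection of itemIds (PERSONALIZED_RANKING recipes). For input data examples, see Batch inference job input and output JSON examples.

    • For batch segment jobs, your input data can be a list of itemIds (Item-Affinity recipe) or item attributes (Item-Attribute-Affinity recipe). For item attributes, your input data can include logical expressions with the AND operator to get users for multiple items or attributes per query. For input data examples, see Batch segment job input and output JSON examples.

  2. Upload your input JSON to an input folder in your Amazon S3 bucket. For more information, see Uploading files and folders by using drag and drop in the Amazon Simple Storage Service User Guide.

  3. Create a separate location for your output data, either a folder or a different Amazon S3 bucket. With a separate location for the output JSON, you can run multiple batch inference or batch segment jobs against the same input data location.

  4. Create a batch inference job or a batch segment job. Amazon Personalize writes the recommendations or user segments from your solution version to your output data location.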
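Step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an official implementation: the helper name `to_json_lines` is hypothetical, and it relies only on the standard library to produce one JSON object per line. Saving the result to a file and uploading it to Amazon S3 (step 2) is left out.

```python
import json
from typing import Iterable, Mapping


def to_json_lines(records: Iterable[Mapping[str, object]]) -> str:
    """Serialize each record as one JSON object per line (hypothetical helper)."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Example: batch inference input for a USER_PERSONALIZATION recipe,
# using the user IDs from the examples in this topic.
user_ids = ["4638", "663", "3384"]
payload = to_json_lines({"userId": uid} for uid in user_ids)
print(payload, end="")
```

The same helper would work for the other input shapes, for example records with an `itemId` key or with `userId` plus `itemList`, because it only cares that each record becomes a single line.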

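Input and output JSON examples

How you format your input data depends on the type of batch job you create and the recipe you use. The following sections list correctly formatted JSON input and output examples for batch inference jobs and batch segment jobs.

Batch inference job input and output JSON examples

The following are correctly formatted JSON input and output examples for batch inference jobs, organized by recipe.

User-Personalization and legacy HRNN recipes

Input

Separate each userId with a new line as follows.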

{"userId": "4638"}
{"userId": "663"}
{"userId": "3384"}
...

Output

{"input":{"userId":"4638"},"output":{"recommendedItems":["63992","115149","110102","148626","148888","31685","102445","69526","92535","143355","62374","7451","56171","122882","66097","91542","142488","139385","40583","71530","39292","111360","34048","47099","135137"],"scores":[0.0152238,0.0069081,0.0068222,0.006394,0.0059746,0.0055851,0.0049357,0.0044644,0.0042968,0.004015,0.0038805,0.0037476,0.0036563,0.0036178,0.00341,0.0033467,0.0033258,0.0032454,0.0032076,0.0031996,0.0029558,0.0029021,0.0029007,0.0028837,0.0028316]},"error":null}
{"input":{"userId":"663"},"output":{"recommendedItems":["368","377","25","780","1610","648","1270","6","165","1196","1097","300","1183","608","104","474","736","293","141","2987","1265","2716","223","733","2028"],"scores":[0.0406197,0.0372557,0.0254077,0.0151975,0.014991,0.0127175,0.0124547,0.0116712,0.0091098,0.0085492,0.0079035,0.0078995,0.0075598,0.0074876,0.0072006,0.0071775,0.0068923,0.0066552,0.0066232,0.0062504,0.0062386,0.0061121,0.0060942,0.0060781,0.0059263]},"error":null}
{"input":{"userId":"3384"},"output":{"recommendedItems":["597","21","223","2144","208","2424","594","595","920","104","520","367","2081","39","1035","2054","160","1370","48","1092","158","2671","500","474","1907"],"scores":[0.0241061,0.0119394,0.0118012,0.010662,0.0086972,0.0079428,0.0073218,0.0071438,0.0069602,0.0056961,0.0055999,0.005577,0.0054387,0.0051787,0.0051412,0.0050493,0.0047126,0.0045393,0.0042159,0.0042098,0.004205,0.0042029,0.0040778,0.0038897,0.0038809]},"error":null}
...
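Reading this output back can be sketched as follows. The function name `parse_recommendations` is hypothetical, and treating a non-null `error` value as a failed query is an assumption based only on the `"error":null` field visible in the example output.

```python
import json
from typing import List, Tuple


def parse_recommendations(line: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Return (userId, [(itemId, score), ...]) for one output line."""
    record = json.loads(line)
    # Assumption: a non-null "error" means this query did not produce results.
    if record.get("error") is not None:
        raise ValueError(f"Query failed for {record['input']}: {record['error']}")
    output = record["output"]
    pairs = list(zip(output["recommendedItems"], output["scores"]))
    return record["input"]["userId"], pairs


# Shortened sample line in the same shape as the output shown above.
sample = (
    '{"input":{"userId":"4638"},"output":{"recommendedItems":'
    '["63992","115149"],"scores":[0.0152238,0.0069081]},"error":null}'
)
user_id, ranked = parse_recommendations(sample)
print(user_id, ranked)
```

Pairing each item with its score by position relies on `recommendedItems` and `scores` having the same length and order, as they do in the example.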

Popularity-Count

Input

Separate each userId with a new line as follows.

{"userId": "12"}
{"userId": "105"}
{"userId": "41"}
...

Output

{"input": {"userId": "12"}, "output": {"recommendedItems": ["105", "106", "441"]}}
{"input": {"userId": "105"}, "output": {"recommendedItems": ["105", "106", "441"]}}
{"input": {"userId": "41"}, "output": {"recommendedItems": ["105", "106", "441"]}}
...

PERSONALIZED_RANKING recipes

Input

Separate each userId and its list of itemIds to be ranked with a new line as follows.

{"userId": "891", "itemList": ["27", "886", "101"]}
{"userId": "445", "itemList": ["527", "55", "901"]}
{"userId": "71", "itemList": ["29", "351", "199"]}
...

Output

{"input":{"userId":"891","itemList":["27","886","101"]},"output":{"recommendedItems":["27","101","886"],"scores":[0.48421,0.28133,0.23446]}}
{"input":{"userId":"445","itemList":["527","55","901"]},"output":{"recommendedItems":["901","527","55"],"scores":[0.46972,0.31011,0.22017]}}
{"input":{"userId":"71","itemList":["29","351","199"]},"output":{"recommendedItems":["351","29","199"],"scores":[0.68937,0.24829,0.06232]}}
...
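A simple sanity check on ranking output can be sketched like this. The function name `is_consistent_ranking` is hypothetical, and the check assumes that every candidate in `itemList` comes back exactly once in `recommendedItems` with one score per item, which is what the example output shows but is not stated as a guarantee here.

```python
import json


def is_consistent_ranking(line: str) -> bool:
    """Check that the ranked items are a reordering of the input candidates."""
    record = json.loads(line)
    candidates = record["input"]["itemList"]
    output = record["output"]
    ranked = output["recommendedItems"]
    same_items = sorted(candidates) == sorted(ranked)
    one_score_per_item = len(output["scores"]) == len(ranked)
    return same_items and one_score_per_item


# Sample line copied from the output shown above.
sample = (
    '{"input":{"userId":"891","itemList":["27","886","101"]},'
    '"output":{"recommendedItems":["27","101","886"],'
    '"scores":[0.48421,0.28133,0.23446]}}'
)
print(is_consistent_ranking(sample))
```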

RELATED_ITEMS recipes

Input

Separate each itemId with a new line as follows.

{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...

Output

{"input": {"itemId": "105"}, "output": {"recommendedItems": ["106", "107", "49"]}}
{"input": {"itemId": "106"}, "output": {"recommendedItems": ["105", "107", "49"]}}
{"input": {"itemId": "441"}, "output": {"recommendedItems": ["2", "442", "435"]}}
...

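Batch segment job input and output JSON examples

When you create a batch segment job, your input data can be a list of itemIds (Item-Affinity recipe) or item attributes (Item-Attribute-Affinity recipe). Each line of input data is a separate inference query. Each user segment is sorted in descending order based on the probability that each user will interact with items in your inventory.

For item attributes, you can mix different columns of metadata. For example, one line might query a numerical column and the next line a categorical column. Your input item metadata can also include logical expressions with the AND operator to get a user segment for multiple attributes. For example, a line of your input data might be {"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\""} or {"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.audience = \"teen\""}. When you combine two attributes with the AND operator, you create a segment of users who, based on their interaction history, are more likely to interact with items that have both attributes. Unlike filter expressions, which use the IN operator for string equality, batch segment input expressions support only the = symbol for string matching.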
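Building such a line by hand is error-prone because the inner quotation marks must be escaped. The following is a minimal sketch that lets `json.dumps` handle the escaping. The helper name `build_item_attributes_line` is hypothetical, and the sketch covers only string equality joined with AND, as in the examples above; it does not handle values that themselves contain quotation marks.

```python
import json
from typing import Sequence, Tuple


def build_item_attributes_line(conditions: Sequence[Tuple[str, str]]) -> str:
    """Build one input line from (column, value) pairs joined with AND."""
    expression = " AND ".join(
        f'ITEMS.{column} = "{value}"' for column, value in conditions
    )
    # json.dumps escapes the inner double quotes as \" in the output line.
    return json.dumps({"itemAttributes": expression})


line = build_item_attributes_line([("genres", "Comedy"), ("audience", "teen")])
print(line)
```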

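The following are correctly formatted JSON input and output examples for batch segment jobs, organized by recipe.

Item-Affinity

Input

Separate each itemId with a new line as follows.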

{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...

Output

{"input": {"itemId": "105"}, "output": {"recommendedUsers": ["106", "107", "49"]}}
{"input": {"itemId": "106"}, "output": {"recommendedUsers": ["105", "107", "49"]}}
{"input": {"itemId": "441"}, "output": {"recommendedUsers": ["2", "442", "435"]}}
...

Item-Attribute-Affinity

Input

Separate each attribute expression with a new line as follows.

{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\""}
{"itemAttributes": "ITEMS.genres = \"Comedy\""}
{"itemAttributes": "ITEMS.genres = \"Horror\" AND ITEMS.genres = \"Action\""}
...

Output

{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\"", "output": {"recommendedUsers": ["25", "78", "108"]}}
{"itemAttributes": "ITEMS.genres = \"Comedy\"", "output": {"recommendedUsers": ["87", "31", "129"]}}
{"itemAttributes": "ITEMS.genres = \"Horror\" AND ITEMS.genres = \"Action\"", "output": {"recommendedUsers": ["8", "442", "435"]}}
...
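The two batch segment output shapes differ: the Item-Affinity examples wrap the query in an `input` object, while the Item-Attribute-Affinity examples put `itemAttributes` at the top level. A reader for both can be sketched as follows. The function name `collect_segments` is hypothetical, and the key names are taken only from the examples in this topic.

```python
import json
from typing import Dict, Iterable, List


def collect_segments(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each query (itemId or attribute expression) to its user segment."""
    segments: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if "itemAttributes" in record:
            key = record["itemAttributes"]    # Item-Attribute-Affinity shape
        else:
            key = record["input"]["itemId"]   # Item-Affinity shape
        segments[key] = record["output"]["recommendedUsers"]
    return segments


sample_lines = [
    '{"input": {"itemId": "105"}, "output": {"recommendedUsers": ["106", "107", "49"]}}',
    '{"itemAttributes": "ITEMS.genres = \\"Horror\\" AND ITEMS.genres = \\"Action\\"", '
    '"output": {"recommendedUsers": ["8", "442", "435"]}}',
]
segments = collect_segments(sample_lines)
print(segments)
```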