Custom datasets and schemas - Amazon Personalize

Custom datasets and schemas

When you create a Custom dataset group, you create your own schemas from scratch. Custom dataset group datasets and schemas have fewer required fields and more flexibility. The following topics explain the schema and data requirements for datasets in a Custom dataset group. Each dataset section lists the required data for the dataset type and provides a JSON example of a schema.

For information about the types of data you can import into Amazon Personalize, see Types of data Amazon Personalize can use. For information about general Amazon Personalize schema requirements, such as formatting requirements and available field data types, see Creating schema JSON files for Amazon Personalize schemas. These requirements apply to all Amazon Personalize schemas.

Custom dataset and schema requirements

When you create a dataset for a Custom dataset group, each dataset type has the following required fields and reserved keywords with required data types.

Item interactions (schema example)

  • Required fields: USER_ID (string), ITEM_ID (string), TIMESTAMP (long)

  • Reserved keywords: EVENT_TYPE (string), EVENT_VALUE (float, null), IMPRESSION (string, null), RECOMMENDATION_ID (string, null), EVENT_ATTRIBUTION_SOURCE (string, null)

Users (schema example)

  • Required fields: USER_ID (string), 1 metadata field (categorical string or numerical)

  • Reserved keywords: none

Items (schema example)

  • Required fields: ITEM_ID (string), 1 metadata field (categorical or textual string field or numerical field)

  • Reserved keywords: CREATION_TIMESTAMP (long)

Actions (schema example)

  • Required fields: ACTION_ID (string), 1 metadata field (categorical string or numerical)

  • Reserved keywords: CREATION_TIMESTAMP (long), VALUE (long, null), TYPE (string, null), ACTION_EXPIRATION_TIMESTAMP (long, null), REPEAT_FREQUENCY (long, null)

Action interactions (schema example)

  • Required fields: USER_ID (string), ACTION_ID (string), EVENT_TYPE (string), TIMESTAMP (long)

  • Reserved keywords: IMPRESSION (string, null), RECOMMENDATION_ID (string, null)
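
To illustrate the requirements above, the following Python sketch builds an Item interactions schema as an Avro-style JSON document. The record name, namespace, and version values are assumptions modeled on commonly published Amazon Personalize examples, and the choice of which reserved keywords to include is arbitrary. Treat it as a starting point rather than a definitive schema.

```python
import json

# Hypothetical Item interactions schema: the three required fields plus two
# optional reserved keywords. Name, namespace, and version are assumed values.
item_interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
        # Reserved keyword: plain string, not categorical.
        {"name": "EVENT_TYPE", "type": "string"},
        # Reserved keyword: float that may also be null.
        {"name": "EVENT_VALUE", "type": ["float", "null"]},
    ],
    "version": "1.0",
}

# Serialize the schema to the JSON string you would supply when creating it.
schema_json = json.dumps(item_interactions_schema, indent=2)
print(schema_json)
```

The nullable reserved keywords are written as a union of the base type and null, which is how "(float, null)" in the requirements is expressed in schema JSON.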

Metadata fields

Metadata includes string or non-string fields that aren't required or don't use a reserved keyword. Metadata schemas have the following restrictions:

  • Users, Items, and Actions schemas require at least one metadata field.

  • You can add at most 25 metadata fields for a Users schema, 100 metadata fields for an Items schema, and 10 metadata fields for an Actions schema.

  • If you add your own metadata field of type string, it must include the categorical attribute or the textual attribute (only Items schemas support fields with the textual attribute). Otherwise, Amazon Personalize won't use the field when training a model.
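
The restrictions above can be checked before you create a schema. The following Python sketch is one possible local check, not part of any Amazon Personalize API. The function names and the set of non-metadata field names are assumptions made for this illustration.

```python
# Maximum number of metadata fields per dataset type, from the list above.
MAX_METADATA_FIELDS = {"Users": 25, "Items": 100, "Actions": 10}

# Field names treated as non-metadata. This set is an assumption for the
# sketch, covering ID fields and reserved keywords named in this topic.
NON_METADATA_FIELDS = {
    "USER_ID", "ITEM_ID", "ACTION_ID", "CREATION_TIMESTAMP", "VALUE",
    "ACTION_EXPIRATION_TIMESTAMP", "REPEAT_FREQUENCY",
}


def is_string_type(field_type):
    """Return True for "string" or a union such as ["null", "string"]."""
    if isinstance(field_type, list):
        return "string" in field_type
    return field_type == "string"


def check_metadata_fields(schema, dataset_type):
    """Return a list of problems found in the schema's metadata fields."""
    metadata = [f for f in schema["fields"] if f["name"] not in NON_METADATA_FIELDS]
    problems = []
    if not metadata:
        problems.append("at least one metadata field is required")
    limit = MAX_METADATA_FIELDS[dataset_type]
    if len(metadata) > limit:
        problems.append(f"{len(metadata)} metadata fields exceeds the limit of {limit}")
    for field in metadata:
        if not is_string_type(field["type"]):
            continue  # numerical metadata needs no extra attribute
        if field.get("textual") and dataset_type != "Items":
            problems.append(f"{field['name']}: only Items schemas support textual fields")
        elif not (field.get("categorical") or field.get("textual")):
            problems.append(f"{field['name']}: string field is neither categorical nor textual")
    return problems


# Hypothetical Users schema with one numerical and one categorical field.
users_schema = {
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "AGE", "type": "int"},
        {"name": "GENDER", "type": "string", "categorical": True},
    ]
}
print(check_metadata_fields(users_schema, "Users"))
```

A string metadata field without either attribute is reported as a problem here because Amazon Personalize won't use it when training a model.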

Reserved keywords

Reserved keywords are optional, non-metadata fields. These fields are considered reserved because you must define the fields as their required data type when you use them, and the keywords can't be used as values in your data. Reserved categorical string fields must have categorical set to true, while reserved string fields can't be categorical. The following are reserved keywords:

  • EVENT_TYPE: For Item interactions datasets with one or more event types, such as both click and download, use an EVENT_TYPE field. You must define an EVENT_TYPE field as a string, and you can't set it as categorical.

  • EVENT_VALUE: For Item interactions datasets that include value data for events, such as the percentage of a video a user watched, use an EVENT_VALUE field with type float and optionally null.

  • CREATION_TIMESTAMP: For Items or Actions datasets with a timestamp for each item’s creation date, use a CREATION_TIMESTAMP field with type long. Amazon Personalize uses CREATION_TIMESTAMP data to calculate the age of an item and adjust recommendations accordingly. See Creation timestamp data.

  • IMPRESSION: For Item interactions datasets with explicit impressions data, use an IMPRESSION field with type string and optionally type null. Impressions are lists of items that were visible to a user when they interacted with (for example, clicked or watched) a particular item. For more information, see Impressions data.

  • RECOMMENDATION_ID: For Item interactions datasets that use previous recommendations as implicit impressions data, optionally use a RECOMMENDATION_ID field with type string and optionally type null.

    You don't need to add a RECOMMENDATION_ID field for Amazon Personalize to use implicit impressions when generating recommendations. You can pass a recommendationId in a PutEvents operation without it. For more information, see Impressions data.

  • VALUE: For Actions datasets, if you have value data for some or all of your actions, add a VALUE field to your schema. For its type, use long and optionally type null. For more information about actions and their value, see Value data.

  • ACTION_EXPIRATION_TIMESTAMP: For Actions datasets, if you have an expiration timestamp for some or all of your actions, add an ACTION_EXPIRATION_TIMESTAMP field to your schema. For its type, use long and optionally type null. For more information about expiration timestamps, see Action expiration timestamp data.

  • REPEAT_FREQUENCY: For Actions datasets, if you have repeat frequency data for some or all of your actions, add a REPEAT_FREQUENCY field to your schema. For its type, use long and optionally type null. For more information about repeat frequency data, see Repeat frequency data.
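
The type rules in the list above can be expressed as a small lookup table. The following Python sketch checks that any reserved keyword present in a schema is declared with its required type. The mapping and the function name are assumptions made for this illustration, and the check covers only the keywords described in this list.

```python
# Reserved keyword -> (required base type, whether null is also allowed).
# Built from the list above; a sketch, not an official Amazon Personalize API.
RESERVED_KEYWORD_TYPES = {
    "EVENT_TYPE": ("string", False),
    "EVENT_VALUE": ("float", True),
    "CREATION_TIMESTAMP": ("long", False),
    "IMPRESSION": ("string", True),
    "RECOMMENDATION_ID": ("string", True),
    "VALUE": ("long", True),
    "ACTION_EXPIRATION_TIMESTAMP": ("long", True),
    "REPEAT_FREQUENCY": ("long", True),
}


def check_reserved_fields(schema):
    """Return a list of problems with reserved keyword fields in the schema."""
    problems = []
    for field in schema["fields"]:
        rule = RESERVED_KEYWORD_TYPES.get(field["name"])
        if rule is None:
            continue  # not a reserved keyword covered by this sketch
        base_type, nullable = rule
        declared = field["type"]
        declared_types = set(declared) if isinstance(declared, list) else {declared}
        allowed_types = {base_type, "null"} if nullable else {base_type}
        # The base type must be present, and nothing outside the allowed set.
        if base_type not in declared_types or not declared_types <= allowed_types:
            problems.append(
                f"{field['name']}: expected {sorted(allowed_types)}, "
                f"got {sorted(declared_types)}"
            )
        if field["name"] == "EVENT_TYPE" and field.get("categorical"):
            problems.append("EVENT_TYPE: can't be set as categorical")
    return problems


# Hypothetical Actions schema fragment with two nullable reserved keywords.
actions_schema = {
    "fields": [
        {"name": "ACTION_ID", "type": "string"},
        {"name": "VALUE", "type": ["null", "long"]},
        {"name": "REPEAT_FREQUENCY", "type": ["long", "null"]},
    ]
}
print(check_reserved_fields(actions_schema))
```

Fields that are not in the lookup table, such as ACTION_ID or your own metadata fields, are skipped, so the check can run against any dataset type's schema.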