Custom datasets and schemas
When you create a Custom dataset group, you create your own schemas from scratch. Datasets and schemas in a Custom dataset group have fewer required fields and more flexibility than those in a Domain dataset group. The following topics explain the schema and data requirements for datasets in a Custom dataset group. Each dataset section lists the required data for the dataset type and provides a JSON example of a schema.
For information on the types of data you can import into Amazon Personalize, see Types of data Amazon Personalize can use. For information about general Amazon Personalize schema requirements, such as formatting requirements and available field data types, see Creating schema JSON files for Amazon Personalize schemas. These requirements apply to all Amazon Personalize schemas.
Custom dataset and schema requirements
When you create a dataset for a Custom dataset group, each dataset type has the following required fields and reserved keywords with required data types.
Dataset type | Required fields | Reserved keywords |
---|---|---|
Item interactions | USER_ID (string), ITEM_ID (string), TIMESTAMP (long) | EVENT_TYPE (string), EVENT_VALUE (float, null), IMPRESSION (string, null), RECOMMENDATION_ID (string, null), EVENT_ATTRIBUTION_SOURCE |
Users | USER_ID (string), 1 metadata field (categorical string or numerical) | |
Items | ITEM_ID (string), 1 metadata field (categorical or textual string, or numerical) | CREATION_TIMESTAMP (long) |
Actions | ACTION_ID (string), 1 metadata field (categorical string or numerical) | CREATION_TIMESTAMP (long), VALUE (long, null), TYPE, EXPIRATION_TIMESTAMP (long, null), REPEAT_FREQUENCY (long, null) |
Action interactions | USER_ID (string), ACTION_ID (string), EVENT_TYPE (string), TIMESTAMP (long) | IMPRESSION (string, null), RECOMMENDATION_ID (string, null) |
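As a rough illustration of how these requirements map onto a schema, the following Python sketch builds an Item interactions schema with the three required fields plus a chosen set of optional reserved keywords, then prints it as JSON. The helper name `build_item_interactions_schema` is hypothetical. The record name, namespace, and version values, and the types of the required fields, are assumptions about the Avro-style layout; confirm them against the schema example for the dataset type before use.

```python
import json

# Required fields for an Item interactions dataset.
# Assumption: USER_ID and ITEM_ID are strings and TIMESTAMP is a long.
REQUIRED_FIELDS = [
    {"name": "USER_ID", "type": "string"},
    {"name": "ITEM_ID", "type": "string"},
    {"name": "TIMESTAMP", "type": "long"},
]

# Optional reserved keywords, typed as described in the Reserved keywords section.
OPTIONAL_RESERVED = {
    "EVENT_TYPE": {"name": "EVENT_TYPE", "type": "string"},
    "EVENT_VALUE": {"name": "EVENT_VALUE", "type": ["float", "null"]},
    "IMPRESSION": {"name": "IMPRESSION", "type": ["string", "null"]},
    "RECOMMENDATION_ID": {"name": "RECOMMENDATION_ID", "type": ["string", "null"]},
}


def build_item_interactions_schema(reserved=()):
    """Return a schema dict with the required fields and the requested reserved keywords."""
    fields = [dict(field) for field in REQUIRED_FIELDS]
    for keyword in reserved:
        fields.append(dict(OPTIONAL_RESERVED[keyword]))
    return {
        "type": "record",
        "name": "Interactions",  # assumed record name
        "namespace": "com.amazonaws.personalize.schema",  # assumed namespace
        "fields": fields,
        "version": "1.0",  # assumed version string
    }


schema = build_item_interactions_schema(reserved=["EVENT_TYPE", "EVENT_VALUE"])
print(json.dumps(schema, indent=2))
```

Keeping the required fields separate from the optional reserved keywords makes it harder to drop a required field by accident when you tailor the schema to your data.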
Metadata fields
Metadata includes string or non-string fields that aren't required or don't use a reserved keyword. Metadata schemas have the following restrictions:
- Users, Items, and Actions schemas require at least one metadata field.
- You can add at most 25 metadata fields for a Users schema, 100 metadata fields for an Items schema, and 10 metadata fields for an Actions schema.
- If you add your own metadata field of type string, it must include the categorical attribute or the textual attribute (only Items schemas support fields with the textual attribute). Otherwise, Amazon Personalize won't use the field when training a model.
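The restrictions above can be checked before you create a schema. The following Python sketch is one hedged way to do so: the function `check_metadata_fields` and the sample field names (GENRES, DESCRIPTION, PRICE, BIO) are hypothetical, and the sketch assumes that the categorical and textual attributes appear as boolean keys on a field definition.

```python
# Maximum number of metadata fields per schema type, as listed above.
MAX_METADATA_FIELDS = {"Users": 25, "Items": 100, "Actions": 10}


def check_metadata_fields(schema_type, metadata_fields):
    """Return a list of problems found in the metadata fields of one schema."""
    problems = []
    limit = MAX_METADATA_FIELDS[schema_type]
    if len(metadata_fields) < 1:
        problems.append(f"{schema_type} schema needs at least one metadata field")
    if len(metadata_fields) > limit:
        problems.append(f"{schema_type} schema allows at most {limit} metadata fields")

    for field in metadata_fields:
        declared = field["type"]
        types = declared if isinstance(declared, list) else [declared]
        if "string" not in types:
            continue  # non-string metadata needs no extra attribute
        categorical = field.get("categorical", False)
        textual = field.get("textual", False)
        if textual and schema_type != "Items":
            problems.append(f"{field['name']}: only Items schemas support textual fields")
        elif not (categorical or textual):
            problems.append(f"{field['name']}: string field is neither categorical nor textual")
    return problems


# Hypothetical Items metadata: one categorical, one textual, one numerical field.
items_metadata = [
    {"name": "GENRES", "type": ["null", "string"], "categorical": True},
    {"name": "DESCRIPTION", "type": ["null", "string"], "textual": True},
    {"name": "PRICE", "type": "float"},
]
print(check_metadata_fields("Items", items_metadata))

# Hypothetical Users metadata that misuses the textual attribute.
users_metadata = [{"name": "BIO", "type": "string", "textual": True}]
print(check_metadata_fields("Users", users_metadata))
```

A string field that fails the last check is not necessarily rejected; as noted above, Amazon Personalize simply won't use it when training a model, so a local check like this can surface the problem early.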
Reserved keywords
Reserved keywords are optional, non-metadata fields. These fields are considered reserved because you must define the fields as their required data type when you use them, and the keywords can't be used as values in your data. Reserved categorical string fields must have categorical set to true, while reserved string fields can't be categorical. The following are reserved keywords:
- EVENT_TYPE: For Item interactions datasets with one or more event types, such as both click and download, use an EVENT_TYPE field. You must define an EVENT_TYPE field as a string, and it can't be set as categorical.
- EVENT_VALUE: For Item interactions datasets that include value data for events, such as the percentage of a video a user watched, use an EVENT_VALUE field with type float and optionally null.
- CREATION_TIMESTAMP: For Items or Actions datasets with a timestamp for each item's creation date, use a CREATION_TIMESTAMP field with type long. Amazon Personalize uses CREATION_TIMESTAMP data to calculate the age of an item and adjust recommendations accordingly. See Creation timestamp data.
- IMPRESSION: For Item interactions datasets with explicit impressions data, use an IMPRESSION field with type string and optionally type null. Impressions are lists of items that were visible to a user when they interacted with (for example, clicked or watched) a particular item. For more information, see Impressions data.
- RECOMMENDATION_ID: For Item interactions datasets that use previous recommendations as implicit impressions data, optionally use a RECOMMENDATION_ID field with type string and optionally type null. You don't need to add a RECOMMENDATION_ID field for Amazon Personalize to use implicit impressions when generating recommendations. You can pass a recommendationId in a PutEvents operation without it. For more information, see Impressions data.
- VALUE: For Actions datasets, if you have value data for some or all of your actions, add a VALUE field to your schema. For its type, use long and optionally type null. For more information about actions and their value, see Value data.
- ACTION_EXPIRATION_TIMESTAMP: For Actions datasets, if you have an expiration timestamp for some or all of your actions, add an ACTION_EXPIRATION_TIMESTAMP field to your schema. For its type, use long and optionally type null. For more information about expiration timestamps, see Action expiration timestamp data.
- REPEAT_FREQUENCY: For Actions datasets, if you have repeat frequency data for some or all of your actions, add a REPEAT_FREQUENCY field to your schema. For its type, use long and optionally type null. For more information about repeat frequency data, see Repeat frequency data.
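The type rules in the list above can be collected into a small lookup table and checked field by field. The following Python sketch is illustrative only: the function name `check_reserved_field` is hypothetical, and it takes a strict reading in which null is accepted only where the list says "optionally null".

```python
# Required type for each reserved keyword described above:
# (base type, whether "null" may be added to the type).
RESERVED_KEYWORD_TYPES = {
    "EVENT_TYPE": ("string", False),
    "EVENT_VALUE": ("float", True),
    "CREATION_TIMESTAMP": ("long", False),
    "IMPRESSION": ("string", True),
    "RECOMMENDATION_ID": ("string", True),
    "VALUE": ("long", True),
    "ACTION_EXPIRATION_TIMESTAMP": ("long", True),
    "REPEAT_FREQUENCY": ("long", True),
}


def check_reserved_field(field):
    """Return None if the field is acceptable or not reserved, else an error message."""
    name = field["name"]
    if name not in RESERVED_KEYWORD_TYPES:
        return None
    base, nullable = RESERVED_KEYWORD_TYPES[name]
    declared = field["type"]
    types = set(declared) if isinstance(declared, list) else {declared}
    allowed = {base, "null"} if nullable else {base}
    if base not in types or not types <= allowed:
        suffix = " (optionally null)" if nullable else ""
        return f"{name} must be of type {base}{suffix}"
    if name == "EVENT_TYPE" and field.get("categorical", False):
        return "EVENT_TYPE can't be set as categorical"
    return None


print(check_reserved_field({"name": "EVENT_VALUE", "type": ["float", "null"]}))
print(check_reserved_field({"name": "VALUE", "type": "string"}))
print(check_reserved_field({"name": "EVENT_TYPE", "type": "string", "categorical": True}))
```

Fields whose names are not reserved keywords pass through unchecked, because metadata fields follow the separate rules in the Metadata fields section.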