
Datasets and schemas

Amazon Personalize datasets are containers for data. There are five types of datasets:

  • Item interactions – This dataset stores historical and real-time data from interactions between users and items. In Amazon Personalize, an interaction is an event that you record and then import as training data. For both Domain dataset groups and Custom dataset groups, you must create at least an Item interactions dataset.

  • Users – This dataset stores metadata about your users. This might include information such as age, gender, or loyalty membership.

  • Items – This dataset stores metadata about your items. This might include information such as price, SKU type, or availability.

  • Actions – This dataset stores metadata about your actions. An action is an engagement activity that you might want to recommend to your customers. Actions might include installing your mobile app, completing a membership profile, joining your loyalty program, or signing up for promotional emails. For the Next-Best-Action recipe, the Actions dataset is required. No other custom recipe or domain use case uses Actions data.

  • Action interactions – This dataset stores historical and real-time data from interactions between users and actions. The Next-Best-Action recipe uses this data and the data in your Actions dataset to recommend actions to your users. No other custom recipe or domain use case uses Action interactions data.
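The rules on this page about which datasets a dataset group needs can be summarized as a small local check. This is a minimal sketch, not an Amazon Personalize API: `DatasetType` and `check_dataset_plan` are hypothetical names, and the check encodes only the rules stated on this page (one dataset per type, a required Item interactions dataset, an Actions dataset for the Next-Best-Action recipe, and no action datasets in a Domain dataset group).

```python
from collections import Counter
from enum import Enum


class DatasetType(Enum):
    """The five dataset types that Amazon Personalize supports."""
    ITEM_INTERACTIONS = "Item interactions"
    USERS = "Users"
    ITEMS = "Items"
    ACTIONS = "Actions"
    ACTION_INTERACTIONS = "Action interactions"


# Dataset types that only the Next-Best-Action recipe uses.
ACTION_TYPES = {DatasetType.ACTIONS, DatasetType.ACTION_INTERACTIONS}


def check_dataset_plan(planned, *, domain_group, uses_next_best_action):
    """Return a list of problems with a planned list of datasets (empty if none).

    Hypothetical local check; it does not call Amazon Personalize.
    """
    problems = []
    # A dataset group can have only one dataset of each type.
    for dataset_type, count in Counter(planned).items():
        if count > 1:
            problems.append(f"more than one {dataset_type.value} dataset")
    # Every dataset group needs an Item interactions dataset.
    if DatasetType.ITEM_INTERACTIONS not in planned:
        problems.append("missing Item interactions dataset")
    # The Next-Best-Action recipe requires an Actions dataset.
    if uses_next_best_action and DatasetType.ACTIONS not in planned:
        problems.append("Next-Best-Action requires an Actions dataset")
    # Action datasets can't be created in a Domain dataset group.
    if domain_group and ACTION_TYPES & set(planned):
        problems.append("Action datasets aren't allowed in a Domain dataset group")
    return problems
```

For example, `check_dataset_plan([DatasetType.ACTIONS], domain_group=True, uses_next_best_action=False)` reports both the missing Item interactions dataset and the disallowed Actions dataset.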

Each dataset group can have only one of each dataset type. You can't create next best action resources, including Actions and Action Interactions datasets, in a domain dataset group. Amazon Personalize stores your data in datasets until you delete the datasets. For all use cases (Domain dataset groups) and recipes (Custom dataset groups), your interactions data must have the following:

  • At least 1,000 item interactions records from users interacting with items in your catalog. These interactions can come from bulk imports, streamed events, or both.

  • At least 25 unique user IDs, each with at least two item interactions.

For high-quality recommendations, we recommend at least 50,000 item interactions from at least 1,000 users, each with two or more item interactions.
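Before importing, you could compare your interactions data against these thresholds by counting records per user. This sketch assumes you already have the USER_ID value of every item interaction record in a Python list (bulk and streamed records combined); `interaction_data_status` and its return labels are hypothetical.

```python
from collections import Counter

# Thresholds stated on this page.
MIN_INTERACTIONS = 1_000
MIN_USERS = 25
RECOMMENDED_INTERACTIONS = 50_000
RECOMMENDED_USERS = 1_000


def interaction_data_status(user_ids):
    """Classify interactions data as "insufficient", "minimum", or "recommended".

    `user_ids` holds one USER_ID per item interaction record.
    """
    per_user = Counter(user_ids)
    total = sum(per_user.values())
    # Only users with two or more interactions count toward the user thresholds.
    active_users = sum(1 for n in per_user.values() if n >= 2)
    if total >= RECOMMENDED_INTERACTIONS and active_users >= RECOMMENDED_USERS:
        return "recommended"
    if total >= MIN_INTERACTIONS and active_users >= MIN_USERS:
        return "minimum"
    return "insufficient"
```

Counting per user matters because 1,000 records from a single user meet the record threshold but not the user threshold.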

Before you create a dataset, you define a schema for that dataset. A schema tells Amazon Personalize about the structure of your data and allows Amazon Personalize to parse the data. A schema has a name key whose value must match the dataset type. After you create a schema, you can't make changes to the schema.
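Amazon Personalize schemas are written in Avro format. As an illustration, here is a minimal Item interactions schema built in Python, assuming only the USER_ID, ITEM_ID, and TIMESTAMP fields; its `name` value, `Interactions`, identifies the dataset type. The JSON string it produces is what you would pass as the `schema` argument of the AWS SDK for Python (Boto3) `create_schema` call, which is left out here so the sketch runs offline. The `field_names` helper is hypothetical.

```python
import json

# Avro-format schema for an Item interactions dataset, assuming only
# three fields. The "name" value must match the dataset type.
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

# The schema is sent to Amazon Personalize as a JSON string.
schema_json = json.dumps(interactions_schema)


def field_names(schema):
    """Return the schema's field names in declaration order."""
    return [field["name"] for field in schema["fields"]]
```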

For Domain dataset groups, each dataset type has a default schema with required fields and reserved keywords. Each time you create a dataset, you can either use the domain's default schema or create a new schema by modifying the default one. Use the default schema as a guide for what data to import for your domain. After you define the schema and create the dataset, you can't change the schema.
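Modifying a default schema amounts to copying it and adding fields. In this sketch, `DEFAULT_SCHEMA` is a hypothetical stand-in (use your domain's actual default schema instead), the added `DEVICE` field is only an example, and `extend_schema` is a hypothetical helper. Copying first leaves the default untouched, and rejecting duplicate names prevents two fields with the same key.

```python
import copy

# Hypothetical stand-in for a domain's default Item interactions schema.
DEFAULT_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def extend_schema(default_schema, extra_fields):
    """Return a new schema with extra_fields appended; the input is not modified."""
    schema = copy.deepcopy(default_schema)
    existing = {field["name"] for field in schema["fields"]}
    for field in extra_fields:
        if field["name"] in existing:
            raise ValueError(f"duplicate field: {field['name']}")
        schema["fields"].append(field)
        existing.add(field["name"])
    return schema


# Example only: add a hypothetical DEVICE field to the copied schema.
custom_schema = extend_schema(DEFAULT_SCHEMA, [{"name": "DEVICE", "type": "string"}])
```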

If you import data in bulk, your data must be in comma-separated values (CSV) format. The first row of your CSV file must contain column headers that match your schema. For information about how to format your bulk data for Amazon Personalize, see Data format guidelines.
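A quick local pre-check can catch header mistakes before you upload a CSV file. This sketch uses Python's standard `csv` module; `check_csv_header` is a hypothetical helper. It reports schema fields with no matching column and columns that aren't in the schema. As a simplifying assumption, it compares names exactly and doesn't check column order.

```python
import csv
import io


def check_csv_header(csv_text, schema_fields):
    """Return a list of header problems for a bulk-import CSV (empty if none)."""
    # Read only the first row, which must hold the column headers.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    missing = [name for name in schema_fields if name not in header]
    extra = [name for name in header if name not in schema_fields]
    if missing:
        problems.append(f"missing columns: {', '.join(missing)}")
    if extra:
        problems.append(f"columns not in schema: {', '.join(extra)}")
    return problems
```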