Exporting a dataset
Note
You can't export data in an Action interactions dataset or Actions dataset.
After you import your data into an Amazon Personalize dataset, you can export the data to an Amazon S3 bucket. You might export data to verify and inspect the data that Amazon Personalize uses to generate recommendations, view the item interaction events that you previously recorded in real time, or perform offline analysis on your data.
You can choose to export only the data that you imported in bulk (imported using an Amazon Personalize dataset import job), only the data that you imported individually (records imported using the console or the PutEvents, PutUsers, or PutItems operations), or both.
For records that match exactly for all fields, Amazon Personalize exports just one record. If two records have the same ID but one or more fields are different, Amazon Personalize includes or removes the records depending on the data you choose to export:
-
If you export both bulk and incremental data, Amazon Personalize exports only the newest item for each ID (in Items dataset exports) and only the newest user for each ID (in Users dataset exports). For Item interactions datasets, Amazon Personalize exports all item interactions data.
-
If you export incremental data only, Amazon Personalize exports all item, user, or item interaction data that you imported individually, including items or users with the same IDs. Only records that match exactly for all fields are excluded.
-
If you export bulk data only, Amazon Personalize includes all item, user, or item interaction data that you imported in bulk, including items or users with the same IDs. Only records that match exactly for all fields are excluded.
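The rules above can be illustrated with a small in-memory model. This is only a sketch of the described behavior, not how Amazon Personalize implements it. The record layout (a `source` tag of `BULK` or `PUT` plus a `fields` dictionary), the assumption that list order equals import order, and the function name `simulate_export` are all hypothetical.

```python
from typing import Dict, List

def simulate_export(records: List[dict], dataset_type: str, ingestion_mode: str) -> List[Dict[str, str]]:
    """Model which records an export would contain (illustrative only).

    records: oldest first; each is {"source": "BULK" | "PUT", "fields": {...}}.
    dataset_type: "ITEMS", "USERS", or "INTERACTIONS" (labels chosen for this sketch).
    ingestion_mode: "BULK", "PUT", or "ALL".
    """
    # Keep only the records that match the chosen export type.
    selected = [
        r["fields"] for r in records
        if ingestion_mode == "ALL" or r["source"] == ingestion_mode
    ]

    # Records that match exactly for all fields are exported once.
    # Walking backwards keeps the most recent copy of each duplicate.
    seen = set()
    unique_reversed = []
    for fields in reversed(selected):
        key = tuple(sorted(fields.items()))
        if key not in seen:
            seen.add(key)
            unique_reversed.append(fields)
    unique = list(reversed(unique_reversed))

    # Exporting both bulk and incremental data keeps only the newest
    # item or user per ID; interactions are not reduced this way.
    id_field = {"ITEMS": "ITEM_ID", "USERS": "USER_ID"}.get(dataset_type)
    if ingestion_mode == "ALL" and id_field is not None:
        newest = {}
        for fields in unique:
            newest[fields[id_field]] = fields  # later records overwrite earlier ones
        return list(newest.values())
    return unique
```

With one item imported in bulk and later updated through PutItems, the model returns only the updated record for `ALL`, and each version separately for `BULK` or `PUT`.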
To export a dataset, you create a dataset export job. A dataset export job is a record export tool that outputs the records in a dataset to one or more CSV files in an Amazon S3 bucket. The output CSV file includes a header row with column names that match the fields in the dataset's schema.
You can create a dataset export job with the Amazon Personalize console, Amazon Command Line Interface (Amazon CLI), or Amazon SDKs.
Dataset export job permissions requirements
To export a dataset, Amazon Personalize needs permission to add files to your Amazon S3 bucket. To grant permissions, attach a new Amazon Identity and Access Management (IAM) policy to your Amazon Personalize service role that grants the role permission to use the PutObject and ListBucket actions on your bucket, and attach a bucket policy to your output Amazon S3 bucket that grants the Amazon Personalize principal permission to use the PutObject and ListBucket actions.
If you use Amazon Key Management Service (Amazon KMS) for encryption, you must grant Amazon Personalize and your Amazon Personalize IAM service role permission to use your key. For more information, see Giving Amazon Personalize permission to use your Amazon KMS key.
Service role policy for exporting a dataset
The following example policy grants your Amazon Personalize service role permission to use the PutObject and ListBucket actions. Replace bucket-name with the name of your output bucket. For information about attaching policies to an IAM service role, see Attaching an Amazon S3 policy to your Amazon Personalize service role.
{
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
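If you prefer to generate this policy rather than edit the bucket name by hand, a minimal Python sketch follows. It assumes you attach the document as an inline role policy through the Boto3 IAM client's `put_role_policy` call, which is only one of several ways to attach a policy. The function names, role name, and policy name are hypothetical.

```python
import json

def build_export_role_policy(bucket_name: str) -> dict:
    """Return the example service role policy with bucket-name filled in."""
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }

def attach_export_policy(iam_client, role_name: str, policy_name: str, bucket_name: str) -> None:
    """Attach the policy inline to the service role (assumed approach).

    iam_client is expected to be a Boto3 IAM client, for example
    boto3.client("iam"); it is passed in so the function is easy to test.
    """
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(build_export_role_policy(bucket_name)),
    )
```

Printing `json.dumps(build_export_role_policy("my-output-bucket"), indent=4)` yields a document you can also paste into the IAM console.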
Amazon S3 bucket policy for exporting a dataset
The following example policy grants Amazon Personalize permission to use the PutObject and ListBucket actions on an Amazon S3 bucket. Replace bucket-name with the name of your bucket. For information on adding an Amazon S3 bucket policy to a bucket, see How Do I Add an S3 Bucket Policy? in the Amazon Simple Storage Service User Guide.
{
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
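The bucket policy can also be applied programmatically. The sketch below assumes a Boto3 S3 client and its `put_bucket_policy` call; the function names are hypothetical. Note that `put_bucket_policy` replaces any policy already on the bucket, so merge statements first if the bucket has one.

```python
import json

def build_export_bucket_policy(bucket_name: str) -> dict:
    """Return the example bucket policy with bucket-name filled in."""
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                # The service principal named in the example policy.
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": ["s3:PutObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }

def apply_export_bucket_policy(s3_client, bucket_name: str) -> None:
    """Set the policy on the output bucket, replacing any existing policy.

    s3_client is expected to be a Boto3 S3 client, for example
    boto3.client("s3").
    """
    s3_client.put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(build_export_bucket_policy(bucket_name)),
    )
```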
Creating a dataset export job (console)
After you import your data into a dataset and create an output Amazon S3 bucket, you can export the data to the bucket for analysis. To export a dataset using the Amazon Personalize console, you create a dataset export job. For information about creating an Amazon S3 bucket, see Creating a bucket in the Amazon Simple Storage Service User Guide.
Before you export a dataset, make sure that your Amazon Personalize service role can access and write to your output Amazon S3 bucket. See Dataset export job permissions requirements.
To create a dataset export job (console)
-
Open the Amazon Personalize console at https://console.amazonaws.cn/personalize/home.
-
In the navigation pane, choose Dataset groups.
-
On the Dataset groups page, choose your dataset group.
-
In the navigation pane, choose Datasets.
-
Choose the dataset that you want to export to an Amazon S3 bucket.
-
In Dataset export jobs, choose Create dataset export job.
-
In Dataset export job details, for Dataset export job name, enter a name for the export job.
-
For IAM service role, choose the Amazon Personalize service role that you created in Creating an IAM role for Amazon Personalize.
-
For Amazon S3 data output path, enter the destination Amazon S3 bucket. Use the following syntax:
s3://<name of your S3 bucket>/<folder path>
-
If you are using Amazon KMS for encryption, for KMS key ARN, enter the Amazon Resource Name (ARN) for the Amazon KMS key.
-
For Export data type, choose the type of data to export based on how you originally imported the data.
-
Choose Bulk to export only data that you imported in bulk using a dataset import job.
-
Choose Incremental to export only data that you imported individually using the console or the PutEvents, PutUsers, or PutItems operations.
-
Choose Both to export all of the data in the dataset.
-
For Tags, optionally add any tags. For more information about tagging Amazon Personalize resources, see Tagging Amazon Personalize resources.
-
Choose Create dataset export job.
On the Dataset overview page, in Dataset export jobs, the job is listed with an Export job status. The dataset export job is complete when the status is ACTIVE. You can then download the data from the output Amazon S3 bucket. For information on downloading objects from an Amazon S3 bucket, see Downloading an object in the Amazon Simple Storage Service User Guide.
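Once the job is ACTIVE, you might want to confirm from a script which CSV files landed under the output path. The following is a minimal sketch, assuming a Boto3 S3 client and the `s3://<name of your S3 bucket>/<folder path>` syntax from the procedure above. The function names are hypothetical, and the sketch makes no assumption about how the exported files are named beyond the `.csv` suffix.

```python
from typing import List, Tuple
from urllib.parse import urlparse

def split_output_path(output_path: str) -> Tuple[str, str]:
    """Split s3://<bucket>/<folder path> into (bucket, folder path)."""
    parsed = urlparse(output_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected s3://<bucket>/<folder path>, got {output_path!r}")
    return parsed.netloc, parsed.path.lstrip("/")

def list_exported_csv_keys(s3_client, output_path: str) -> List[str]:
    """Return the keys of CSV objects under the export output path.

    Uses a single list_objects_v2 call, which returns at most 1,000 keys;
    use a paginator if the folder can hold more objects than that.
    """
    bucket, prefix = split_output_path(output_path)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [
        obj["Key"]
        for obj in response.get("Contents", [])
        if obj["Key"].endswith(".csv")
    ]
```

Each returned key can then be passed to the S3 client's `download_file(bucket, key, local_filename)` method.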
Creating a dataset export job (Amazon CLI)
After you import your data into the dataset and create an output Amazon S3 bucket, you can export the dataset to the bucket for analysis. To export a dataset using the Amazon CLI, create a dataset export job using the create-dataset-export-job Amazon CLI command. For information about creating an Amazon S3 bucket, see Creating a bucket in the Amazon Simple Storage Service User Guide.
Before you export a dataset, make sure that the Amazon Personalize service role can access and write to your output Amazon S3 bucket. See Dataset export job permissions requirements.
The following is an example of the create-dataset-export-job Amazon CLI command. Give the job a name, replace dataset ARN with the Amazon Resource Name (ARN) of the dataset that you want to export, and replace role ARN with the ARN of the Amazon Personalize service role that you created in Creating an IAM role for Amazon Personalize. In s3DataDestination, for the kmsKeyArn, optionally provide the ARN for your Amazon KMS key, and for the path, provide the path to your output Amazon S3 bucket.
For ingestion-mode, specify the data to export from the following options:
-
Specify BULK to export only data that you imported in bulk using a dataset import job.
-
Specify PUT to export only data that you imported individually using the console or the PutEvents, PutUsers, or PutItems operations.
-
Specify ALL to export all of the data in the dataset.
For more information, see CreateDatasetExportJob.
aws personalize create-dataset-export-job \
  --job-name job name \
  --dataset-arn dataset ARN \
  --job-output "{\"s3DataDestination\":{\"kmsKeyArn\":\"kms key ARN\",\"path\":\"s3://bucket-name/folder-name/\"}}" \
  --role-arn role ARN \
  --ingestion-mode PUT
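Escaping the quotes in the --job-output value by hand is error-prone. One option is to generate the JSON string and let Python quote it for the shell, as in the sketch below. The function name and example values are hypothetical. The keys `s3DataDestination`, `kmsKeyArn`, and `path` come from the command above.

```python
import json
import shlex
from typing import Optional

def build_job_output_argument(s3_path: str, kms_key_arn: Optional[str] = None) -> str:
    """Return the JSON string to pass to --job-output."""
    destination = {"path": s3_path}
    if kms_key_arn:
        # kmsKeyArn is optional, so it is included only when provided.
        destination["kmsKeyArn"] = kms_key_arn
    return json.dumps({"s3DataDestination": destination})

if __name__ == "__main__":
    job_output = build_job_output_argument("s3://bucket-name/folder-name/")
    # shlex.quote wraps the value so a POSIX shell passes it through unchanged.
    print("--job-output " + shlex.quote(job_output))
```

The printed fragment can be pasted into the command in place of the hand-escaped --job-output argument.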
The dataset export job ARN is displayed.
{
    "datasetExportJobArn": "arn:aws:personalize:us-west-2:acct-id:dataset-export-job/DatasetExportJobName"
}
Use the DescribeDatasetExportJob operation to check the status.
aws personalize describe-dataset-export-job \
  --dataset-export-job-arn dataset export job ARN
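If you would rather wait for the job in a script than rerun the command, the same DescribeDatasetExportJob operation can be polled through Boto3. The following is a minimal sketch, assuming a Boto3 Personalize client and that the response carries the status under `datasetExportJob`, with `CREATE FAILED` as the failure status. The function name, polling interval, and poll limit are hypothetical choices.

```python
import time

def wait_for_export_job(personalize_client, job_arn: str,
                        poll_seconds: int = 60, max_polls: int = 180,
                        sleep=time.sleep) -> str:
    """Poll the export job until it is ACTIVE, fails, or the poll limit is hit.

    personalize_client is expected to be boto3.client("personalize").
    The sleep parameter exists only so tests can skip the real delay.
    """
    for _ in range(max_polls):
        job = personalize_client.describe_dataset_export_job(
            datasetExportJobArn=job_arn
        )["datasetExportJob"]
        status = job["status"]
        if status == "ACTIVE":
            return status
        if status == "CREATE FAILED":
            reason = job.get("failureReason", "no failure reason returned")
            raise RuntimeError(f"Dataset export job failed: {reason}")
        sleep(poll_seconds)
    raise TimeoutError(f"Export job {job_arn} not ACTIVE after {max_polls} polls")
```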
Creating a dataset export job (Amazon SDKs)
After you import your data into the dataset and create an output Amazon S3 bucket, you can export the dataset to the bucket for analysis. To export a dataset using the Amazon SDKs, create a dataset export job using the CreateDatasetExportJob operation. For information about creating an Amazon S3 bucket, see Creating a bucket in the Amazon Simple Storage Service User Guide.
The following code shows how to create a dataset export job using the SDK for Python (Boto3) or the SDK for Java 2.x.
Before you export a dataset, make sure that the Amazon Personalize service role can access and write to your output Amazon S3 bucket. See Dataset export job permissions requirements.
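A minimal SDK for Python (Boto3) sketch of the CreateDatasetExportJob call follows. It assumes a Boto3 Personalize client and mirrors the parameters of the CLI command (job name, dataset ARN, role ARN, job output, ingestion mode). The wrapper function name and its default of `PUT` are choices made for this sketch.

```python
from typing import Optional

VALID_INGESTION_MODES = {"BULK", "PUT", "ALL"}

def create_export_job(personalize_client, job_name: str, dataset_arn: str,
                      role_arn: str, s3_path: str,
                      ingestion_mode: str = "PUT",
                      kms_key_arn: Optional[str] = None) -> str:
    """Create a dataset export job and return its ARN.

    personalize_client is expected to be boto3.client("personalize").
    """
    if ingestion_mode not in VALID_INGESTION_MODES:
        raise ValueError(f"ingestion_mode must be one of {sorted(VALID_INGESTION_MODES)}")

    destination = {"path": s3_path}
    if kms_key_arn:
        # Only needed when the output is encrypted with your KMS key.
        destination["kmsKeyArn"] = kms_key_arn

    response = personalize_client.create_dataset_export_job(
        jobName=job_name,
        datasetArn=dataset_arn,
        roleArn=role_arn,
        ingestionMode=ingestion_mode,
        jobOutput={"s3DataDestination": destination},
    )
    return response["datasetExportJobArn"]
```

To run it, create the client with `boto3.client("personalize")` and pass your own job name, dataset ARN, role ARN, and output path.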