Exporting data from an RDS for PostgreSQL DB instance to Amazon S3
You can query data from an RDS for PostgreSQL DB instance and export it directly into files stored in an Amazon S3 bucket. To do this, you first install the RDS for PostgreSQL aws_s3 extension. This extension provides you with the functions that you use to export the results of queries to Amazon S3. Following, you can find out how to install the extension and how to export data to Amazon S3.
Note
Cross-account export to Amazon S3 isn't supported.
All currently available versions of RDS for PostgreSQL support exporting data to Amazon Simple Storage Service. For detailed version information, see Amazon RDS for PostgreSQL updates in the Amazon RDS for PostgreSQL Release Notes.
If you don't have a bucket set up for your export, see the Amazon Simple Storage Service User Guide.
The upload to Amazon S3 uses server-side encryption by default. If you are using encryption, the Amazon S3 bucket must be encrypted with an Amazon managed key. Currently, you can't export data to a bucket that's encrypted with a customer managed key.
Note
You can save DB snapshot data to Amazon S3 using the Amazon Web Services Management Console, Amazon CLI, or Amazon RDS API. For more information, see Exporting DB snapshot data to Amazon S3.
Installing the aws_s3 extension
Before you can use Amazon Simple Storage Service with your RDS for PostgreSQL DB instance, you need to install the aws_s3 extension. This extension provides functions for exporting data from an RDS for PostgreSQL DB instance to an Amazon S3 bucket. It also provides functions for importing data from Amazon S3. For more information, see Importing data from Amazon S3 into an RDS for PostgreSQL DB instance.
The aws_s3 extension depends on some of the helper functions in the aws_commons extension, which is installed automatically when needed.
To install the aws_s3 extension
Use psql (or pgAdmin) to connect to the RDS for PostgreSQL DB instance as a user that has rds_superuser privileges. If you kept the default name during the setup process, you connect as postgres.
psql --host=111122223333.aws-region.rds.amazonaws.com --port=5432 --username=postgres --password
To install the extension, run the following command.
postgres=> CREATE EXTENSION aws_s3 CASCADE;
NOTICE: installing required extension "aws_commons"
CREATE EXTENSION
To verify that the extension is installed, you can use the psql \dx metacommand.
postgres=> \dx
                                List of installed extensions
    Name     | Version |   Schema   |                  Description
-------------+---------+------------+-------------------------------------------------
 aws_commons | 1.2     | public     | Common data types across Amazon services
 aws_s3      | 1.1     | public     | Amazon S3 extension for importing data from S3
 plpgsql     | 1.0     | pg_catalog | PL/pgSQL procedural language
(3 rows)
The functions for importing data from Amazon S3 and exporting data to Amazon S3 are now available to use.
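As an optional check, you can also list the functions that the two extensions provide by using the psql \df metacommand. The output (not shown here) includes aws_s3.query_export_to_s3 along with the import functions.
postgres=> \df aws_s3.*
postgres=> \df aws_commons.*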
Verify that your RDS for PostgreSQL version supports exports to Amazon S3
You can verify that your RDS for PostgreSQL version supports export to Amazon S3 by using the describe-db-engine-versions command. The following example verifies support for version 10.14.
aws rds describe-db-engine-versions --region us-east-1 \
    --engine postgres --engine-version 10.14 | grep s3Export
If the output includes the string "s3Export", then the engine supports Amazon S3 exports. Otherwise, the engine doesn't support them.
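If you aren't sure which engine version your DB instance runs, you can check from inside the database before running the preceding command. This is an optional sketch; SHOW server_version returns the PostgreSQL version string, which you can then pass as the --engine-version value.
postgres=> SHOW server_version;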
Overview of exporting data to Amazon S3
To export data stored in an RDS for PostgreSQL database to an Amazon S3 bucket, use the following procedure.
To export RDS for PostgreSQL data to S3
-
Identify an Amazon S3 file path to use for exporting data. For details about this process, see Specifying the Amazon S3 file path to export to.
-
Provide permission to access the Amazon S3 bucket.
To export data to an Amazon S3 file, give the RDS for PostgreSQL DB instance permission to access the Amazon S3 bucket that the export will use for storage. Doing this includes the following steps:
-
Create an IAM policy that provides access to an Amazon S3 bucket that you want to export to.
-
Create an IAM role.
-
Attach the policy you created to the role you created.
-
Add this IAM role to your DB instance.
For details about this process, see Setting up access to an Amazon S3 bucket.
-
-
Identify a database query to get the data. Export the query data by calling the aws_s3.query_export_to_s3 function.
After you complete the preceding preparation tasks, use the aws_s3.query_export_to_s3 function to export query results to Amazon S3. For details about this process, see Exporting query data using the aws_s3.query_export_to_s3 function.
Specifying the Amazon S3 file path to export to
Specify the following information to identify the location in Amazon S3 where you want to export data to:
-
Bucket name – A bucket is a container for Amazon S3 objects or files.
For more information on storing data with Amazon S3, see Create a bucket and View an object in the Amazon Simple Storage Service User Guide.
-
File path – The file path identifies where the export is stored in the Amazon S3 bucket. The file path consists of the following:
-
An optional path prefix that identifies a virtual folder path.
-
A file prefix that identifies one or more files to be stored. Larger exports are stored in multiple files, each with a maximum size of approximately 6 GB. The additional file names have the same file prefix but with _partXX appended. The XX represents 2, then 3, and so on.
For example, a file path with an exports folder and a query-1-export file prefix is /exports/query-1-export.
-
-
Amazon Region (optional) – The Amazon Region where the Amazon S3 bucket is located. If you don't specify an Amazon Region value, then Amazon RDS saves your files into Amazon S3 in the same Amazon Region as the exporting DB instance.
Note
Currently, the Amazon Region must be the same as the region of the exporting DB instance.
For a listing of Amazon Region names and associated values, see Regions, Availability Zones, and Local Zones.
To hold the Amazon S3 file information about where the export is to be stored, you can use the aws_commons.create_s3_uri function to create an aws_commons._s3_uri_1 composite structure as follows.
psql=> SELECT aws_commons.create_s3_uri(
   'sample-bucket',
   'sample-filepath',
   'us-west-2'
) AS s3_uri_1 \gset
You later provide this s3_uri_1 value as a parameter in the call to the aws_s3.query_export_to_s3 function. For examples, see Exporting query data using the aws_s3.query_export_to_s3 function.
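If you want to confirm what \gset stored, you can print the psql variable. This is an optional check; the output is the composite value containing the bucket, file path, and Region that you supplied.
psql=> \echo :s3_uri_1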
Setting up access to an Amazon S3 bucket
To export data to Amazon S3, give your PostgreSQL DB instance permission to access the Amazon S3 bucket that the files are to go in.
To do this, use the following procedure.
To give a PostgreSQL DB instance access to Amazon S3 through an IAM role
-
Create an IAM policy.
This policy provides the bucket and object permissions that allow your PostgreSQL DB instance to access Amazon S3.
As part of creating this policy, take the following steps:
-
Include in the policy the following required actions to allow the transfer of files from your PostgreSQL DB instance to an Amazon S3 bucket:
-
s3:PutObject
-
s3:AbortMultipartUpload
-
-
Include the Amazon Resource Name (ARN) that identifies the Amazon S3 bucket and objects in the bucket. The ARN format for accessing Amazon S3 is:
arn:aws-cn:s3:::your-s3-bucket/*
For more information on creating an IAM policy for Amazon RDS for PostgreSQL, see Creating and using an IAM policy for IAM database access. See also Tutorial: Create and attach your first customer managed policy in the IAM User Guide.
The following Amazon CLI command creates an IAM policy named rds-s3-export-policy with these options. It grants access to a bucket named your-s3-bucket.
Warning
We recommend that you set up your database within a private VPC that has endpoint policies configured for accessing specific buckets. For more information, see Using endpoint policies for Amazon S3 in the Amazon VPC User Guide.
We strongly recommend that you do not create a policy with all-resource access. This access can pose a threat to data security. If you create a policy that gives s3:PutObject access to all resources using "Resource":"*", then a user with export privileges can export data to all buckets in your account. In addition, the user can export data to any publicly writable bucket within your Amazon Region.
After you create the policy, note the Amazon Resource Name (ARN) of the policy. You need the ARN for a subsequent step when you attach the policy to an IAM role.
aws iam create-policy --policy-name rds-s3-export-policy --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "s3export",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws-cn:s3:::your-s3-bucket/*"
      ]
    }
  ]
}'
-
-
Create an IAM role.
You do this so Amazon RDS can assume this IAM role on your behalf to access your Amazon S3 buckets. For more information, see Creating a role to delegate permissions to an IAM user in the IAM User Guide.
We recommend using the aws:SourceArn and aws:SourceAccount global condition context keys in resource-based policies to limit the service's permissions to a specific resource. This is the most effective way to protect against the confused deputy problem.
If you use both global condition context keys and the aws:SourceArn value contains the account ID, the aws:SourceAccount value and the account in the aws:SourceArn value must use the same account ID when used in the same policy statement.
-
Use aws:SourceArn if you want cross-service access for a single resource.
-
Use aws:SourceAccount if you want to allow any resource in that account to be associated with the cross-service use.
In the policy, be sure to use the aws:SourceArn global condition context key with the full ARN of the resource. The following example shows how to do so using the Amazon CLI command to create a role named rds-s3-export-role.
Example
For Linux, macOS, or Unix:
aws iam create-role \
   --role-name rds-s3-export-role \
   --assume-role-policy-document '{
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
            "Service": "rds.amazonaws.com"
          },
         "Action": "sts:AssumeRole",
         "Condition": {
             "StringEquals": {
                "aws:SourceAccount": "111122223333",
                "aws:SourceArn": "arn:aws:rds:us-east-1:111122223333:db:dbname"
                }
             }
       }
     ]
   }'
For Windows:
aws iam create-role ^
   --role-name rds-s3-export-role ^
   --assume-role-policy-document '{
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
            "Service": "rds.amazonaws.com"
          },
         "Action": "sts:AssumeRole",
         "Condition": {
             "StringEquals": {
                "aws:SourceAccount": "111122223333",
                "aws:SourceArn": "arn:aws:rds:us-east-1:111122223333:db:dbname"
                }
             }
       }
     ]
   }'
-
Attach the IAM policy that you created to the IAM role that you created.
The following Amazon CLI command attaches the policy created earlier to the role named rds-s3-export-role.
Replace your-policy-arn with the policy ARN that you noted in an earlier step.
aws iam attach-role-policy --policy-arn your-policy-arn --role-name rds-s3-export-role
-
Add the IAM role to the DB instance. You do so by using the Amazon Web Services Management Console or Amazon CLI, as described following.
To add an IAM role for a PostgreSQL DB instance using the console
Sign in to the Amazon Web Services Management Console and open the Amazon RDS console at https://console.amazonaws.cn/rds/.
-
Choose the PostgreSQL DB instance name to display its details.
-
On the Connectivity & security tab, in the Manage IAM roles section, choose the role to add under Add IAM roles to this instance.
-
Under Feature, choose s3Export.
-
Choose Add role.
To add an IAM role for a PostgreSQL DB instance using the CLI
-
Use the following command to add the role to the PostgreSQL DB instance named my-db-instance. Replace your-role-arn with the role ARN that you noted in a previous step. Use s3Export for the value of the --feature-name option.
Example
For Linux, macOS, or Unix:
aws rds add-role-to-db-instance \
   --db-instance-identifier my-db-instance \
   --feature-name s3Export \
   --role-arn your-role-arn \
   --region your-region
For Windows:
aws rds add-role-to-db-instance ^
   --db-instance-identifier my-db-instance ^
   --feature-name s3Export ^
   --role-arn your-role-arn ^
   --region your-region
Exporting query data using the aws_s3.query_export_to_s3 function
Export your PostgreSQL data to Amazon S3 by calling the aws_s3.query_export_to_s3 function.
Prerequisites
Before you use the aws_s3.query_export_to_s3 function, be sure to complete the following prerequisites:
-
Install the required PostgreSQL extensions as described in Overview of exporting data to Amazon S3.
-
Determine where to export your data to Amazon S3 as described in Specifying the Amazon S3 file path to export to.
-
Make sure that the DB instance has export access to Amazon S3 as described in Setting up access to an Amazon S3 bucket.
The examples following use a database table called sample_table. These examples export the data into a bucket called sample-bucket. The example table and data are created with the following SQL statements in psql.
psql=> CREATE TABLE sample_table (bid bigint PRIMARY KEY, name varchar(80));
psql=> INSERT INTO sample_table (bid,name) VALUES (1, 'Monday'), (2,'Tuesday'), (3, 'Wednesday');
Calling aws_s3.query_export_to_s3
The following shows the basic ways of calling the aws_s3.query_export_to_s3 function.
These examples use the variable s3_uri_1 to identify a structure that contains the information identifying the Amazon S3 file. Use the aws_commons.create_s3_uri function to create the structure.
psql=> SELECT aws_commons.create_s3_uri(
   'sample-bucket',
   'sample-filepath',
   'us-west-2'
) AS s3_uri_1 \gset
Although the parameters vary for the following two aws_s3.query_export_to_s3 function calls, the results are the same for these examples. All rows of the sample_table table are exported into a bucket called sample-bucket.
psql=> SELECT * FROM aws_s3.query_export_to_s3('SELECT * FROM sample_table', :'s3_uri_1');
psql=> SELECT * FROM aws_s3.query_export_to_s3('SELECT * FROM sample_table', :'s3_uri_1', options :='format text');
The parameters are described as follows:
-
'SELECT * FROM sample_table'
– The first parameter is a required text string containing an SQL query. The PostgreSQL engine runs this query. The results of the query are copied to the S3 bucket identified in other parameters. -
:'s3_uri_1'
– This parameter is a structure that identifies the Amazon S3 file. This example uses a variable to identify the previously created structure. You can instead create the structure by including the aws_commons.create_s3_uri function call inline within the aws_s3.query_export_to_s3 function call as follows.
SELECT * from aws_s3.query_export_to_s3('select * from sample_table',
   aws_commons.create_s3_uri('sample-bucket', 'sample-filepath', 'us-west-2')
);
-
options :='format text'
– The options parameter is an optional text string containing PostgreSQL COPY arguments. The copy process uses the arguments and format of the PostgreSQL COPY command, as shown in the example following this list.
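Because the options string is passed through as COPY arguments, you can combine any COPY options that are valid for output. The following call is a sketch that reuses the sample_table table and s3_uri_1 variable from the preceding examples; header true is a standard COPY argument that adds a header row to the CSV file.
psql=> SELECT * FROM aws_s3.query_export_to_s3('SELECT * FROM sample_table', :'s3_uri_1', options :='format csv, header true');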
If the file specified doesn't exist in the Amazon S3 bucket, it's created. If the file already exists, it's overwritten. The syntax for accessing the exported data in Amazon S3 is the following.
s3-region://bucket-name[/path-prefix]/file-prefix
Larger exports are stored in multiple files, each with a maximum size of approximately 6 GB. The additional file names have the same file prefix but with _partXX appended. The XX represents 2, then 3, and so on.
For example, suppose that you specify the path where you store data files as the following.
s3-us-west-2://my-bucket/my-prefix
If the export has to create three data files, the Amazon S3 bucket contains the following data files.
s3-us-west-2://my-bucket/my-prefix
s3-us-west-2://my-bucket/my-prefix_part2
s3-us-west-2://my-bucket/my-prefix_part3
For the full reference for this function and additional ways to call it, see aws_s3.query_export_to_s3. For more about accessing files in Amazon S3, see View an object in the Amazon Simple Storage Service User Guide.
Exporting to a CSV file that uses a custom delimiter
The following example shows how to call the aws_s3.query_export_to_s3 function to export data to a file that uses a custom delimiter. The example uses arguments of the PostgreSQL COPY command to specify the comma-separated value (CSV) format and a colon (:) delimiter.
SELECT * from aws_s3.query_export_to_s3('select * from basic_test', :'s3_uri_1', options :='format csv, delimiter $$:$$');
Exporting to a binary file with encoding
The following example shows how to call the aws_s3.query_export_to_s3 function to export data to a binary file that has Windows-1253 encoding.
SELECT * from aws_s3.query_export_to_s3('select * from basic_test', :'s3_uri_1', options :='format binary, encoding WIN1253');
Troubleshooting access to Amazon S3
If you encounter connection problems when attempting to export data to Amazon S3, first confirm that the outbound access rules for the VPC security group associated with your DB instance permit network connectivity. Specifically, the security group must have a rule that allows the DB instance to send TCP traffic to port 443 and to any IPv4 addresses (0.0.0.0/0). For more information, see Provide access to your DB instance in your VPC by creating a security group.
See also the following for recommendations:
-
Troubleshooting Amazon S3 in the Amazon Simple Storage Service User Guide
-
Troubleshooting Amazon S3 and IAM in the IAM User Guide
Function reference
aws_s3.query_export_to_s3
Exports a PostgreSQL query result to an Amazon S3 bucket. The aws_s3 extension provides the aws_s3.query_export_to_s3 function.
The two required parameters are query and s3_info. These define the query to be exported and identify the Amazon S3 bucket to export to. An optional parameter called options provides for defining various export parameters. For examples of using the aws_s3.query_export_to_s3 function, see Exporting query data using the aws_s3.query_export_to_s3 function.
Syntax
aws_s3.query_export_to_s3(
    query text,
    s3_info aws_commons._s3_uri_1,
    options text
)
Input parameters
- query
-
A required text string containing an SQL query that the PostgreSQL engine runs. The results of this query are copied to an S3 bucket identified in the s3_info parameter.
- s3_info
-
An aws_commons._s3_uri_1 composite type containing the following information about the S3 object:
-
bucket
– The name of the Amazon S3 bucket to contain the file.
-
file_path
– The Amazon S3 file name and path.
-
region
– The Amazon Region that the bucket is in. For a listing of Amazon Region names and associated values, see Regions, Availability Zones, and Local Zones.
Currently, this value must be the same Amazon Region as that of the exporting DB instance. The default is the Amazon Region of the exporting DB instance.
To create an aws_commons._s3_uri_1 composite structure, see the aws_commons.create_s3_uri function.
-
- options
-
An optional text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied when exported. For more details, see the PostgreSQL COPY documentation.
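As a brief illustration of this form, the following sketch reuses the sample-bucket and sample-filepath placeholders from earlier in this topic and passes the composite created by aws_commons.create_s3_uri directly as the s3_info argument.
SELECT * FROM aws_s3.query_export_to_s3(
   'SELECT * FROM sample_table',
   aws_commons.create_s3_uri('sample-bucket', 'sample-filepath', 'us-west-2'),
   options :='format csv'
);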
Alternate input parameters
To help with testing, you can use an expanded set of parameters instead of the s3_info parameter. Following are additional syntax variations for the aws_s3.query_export_to_s3 function.
Instead of using the s3_info parameter to identify an Amazon S3 file, use the combination of the bucket, file_path, and region parameters.
aws_s3.query_export_to_s3(
    query text,
    bucket text,
    file_path text,
    region text,
    options text
)
- query
-
A required text string containing an SQL query that the PostgreSQL engine runs. The results of this query are copied to the S3 bucket identified by the bucket parameter.
- bucket
-
A required text string containing the name of the Amazon S3 bucket that contains the file.
- file_path
-
A required text string containing the Amazon S3 file name including the path of the file.
- region
-
An optional text string containing the Amazon Region that the bucket is in. For a listing of Amazon Region names and associated values, see Regions, Availability Zones, and Local Zones.
Currently, this value must be the same Amazon Region as that of the exporting DB instance. The default is the Amazon Region of the exporting DB instance.
- options
-
An optional text string containing arguments for the PostgreSQL COPY command. These arguments specify how the data is to be copied when exported. For more details, see the PostgreSQL COPY documentation.
Output parameters
aws_s3.query_export_to_s3(
    OUT rows_uploaded bigint,
    OUT files_uploaded bigint,
    OUT bytes_uploaded bigint
)
- rows_uploaded
-
The number of table rows that were successfully uploaded to Amazon S3 for the given query.
- files_uploaded
-
The number of files uploaded to Amazon S3. Files are created in sizes of approximately 6 GB. Each additional file created has _partXX appended to the name. The XX represents 2, then 3, and so on as needed.
- bytes_uploaded
-
The total number of bytes uploaded to Amazon S3.
Examples
psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath');
psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath','us-west-2');
psql=> SELECT * from aws_s3.query_export_to_s3('select * from sample_table', 'sample-bucket', 'sample-filepath','us-west-2','format text');
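Because rows_uploaded, files_uploaded, and bytes_uploaded are OUT parameters, each call returns a single row containing those three columns, and you can select them by name instead of using SELECT *. The following sketch reuses the s3_uri_1 variable and sample_table table from the earlier examples.
psql=> SELECT rows_uploaded, files_uploaded, bytes_uploaded FROM aws_s3.query_export_to_s3('SELECT * FROM sample_table', :'s3_uri_1');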
aws_commons.create_s3_uri
Creates an aws_commons._s3_uri_1 structure to hold Amazon S3 file information. You use the results of the aws_commons.create_s3_uri function in the s3_info parameter of the aws_s3.query_export_to_s3 function. For an example of using the aws_commons.create_s3_uri function, see Specifying the Amazon S3 file path to export to.
Syntax
aws_commons.create_s3_uri(
   bucket text,
   file_path text,
   region text
)
Input parameters
- bucket
-
A required text string containing the Amazon S3 bucket name for the file.
- file_path
-
A required text string containing the Amazon S3 file name including the path of the file.
- region
-
A required text string containing the Amazon Region that the file is in. For a listing of Amazon Region names and associated values, see Regions, Availability Zones, and Local Zones.
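For a quick usage sketch with the placeholder values used throughout this topic, you can call the function directly and capture the result in a psql variable, exactly as in the earlier export examples.
psql=> SELECT aws_commons.create_s3_uri(
   'sample-bucket',
   'sample-filepath',
   'us-west-2'
) AS s3_uri_1 \gset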