Using Aurora PostgreSQL extensions with Babelfish
Aurora PostgreSQL provides extensions for working with other Amazon services. These are optional extensions that support various use cases, such as using Amazon S3 with your DB cluster for importing or exporting data.
To import data from an Amazon S3 bucket to your Babelfish DB cluster, you set up the aws_s3 Aurora PostgreSQL extension. This extension also lets you export data from your Aurora PostgreSQL DB cluster to an Amazon S3 bucket.

Amazon Lambda is a compute service that lets you run code without provisioning or managing servers. You can use Lambda functions to do things like process event notifications from your DB instance. To learn more about Lambda, see What is Amazon Lambda? in the Amazon Lambda Developer Guide. To invoke Lambda functions from your Babelfish DB cluster, you set up the aws_lambda Aurora PostgreSQL extension.
To set up these extensions for your Babelfish cluster, you first need to grant permission to the internal Babelfish user to load the extensions. After granting permission, you can then load Aurora PostgreSQL extensions.
Enabling Aurora PostgreSQL extensions in your Babelfish DB cluster
Before you can load the aws_s3 or aws_lambda extensions, you grant the needed privileges to your Babelfish DB cluster.

The following procedure uses the psql PostgreSQL command line tool to connect to the DB cluster. For more information, see Using psql to connect to the DB cluster. You can also use pgAdmin. For details, see Using pgAdmin to connect to the DB cluster.

This procedure loads both aws_s3 and aws_lambda, one after the other. You don't need to load both if you want to use only one of these extensions. The aws_commons extension is required by each, and it's loaded by default as shown in the output.
To set up your Babelfish DB cluster with privileges for the Aurora PostgreSQL extensions
1. Connect to your Babelfish DB cluster. Use the name for the "master" user (-U) that you specified when you created the Babelfish DB cluster. The default (postgres) is shown in the examples.

   For Linux, macOS, or Unix:

   psql -h your-Babelfish.cluster.444455556666-us-east-1.rds.amazonaws.com \
        -U postgres \
        -d babelfish_db \
        -p 5432

   For Windows:

   psql -h your-Babelfish.cluster.444455556666-us-east-1.rds.amazonaws.com ^
        -U postgres ^
        -d babelfish_db ^
        -p 5432

   The command responds with a prompt to enter the password for the user name (-U).

   Password:

2. Enter the password for the user name (-U) for the DB cluster. When you successfully connect, you see output similar to the following.

   psql (13.4)
   SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
   Type "help" for help.

   postgres=>
3. Grant privileges to the internal Babelfish user to create and load extensions.

   babelfish_db=> GRANT rds_superuser TO master_dbo;
   GRANT ROLE
4. Create and load the aws_s3 extension. The aws_commons extension is required, and it's installed automatically when aws_s3 is installed.

   babelfish_db=> create extension aws_s3 cascade;
   NOTICE: installing required extension "aws_commons"
   CREATE EXTENSION
5. Create and load the aws_lambda extension.

   babelfish_db=> create extension aws_lambda cascade;
   CREATE EXTENSION
   babelfish_db=>
Using Babelfish with Amazon S3
If you don't already have an Amazon S3 bucket to use with your Babelfish DB cluster, you can create one. For any Amazon S3 bucket that you want to use, you first grant your DB cluster access to it.
Before trying to import or export data using an Amazon S3 bucket, complete the following one-time steps.
To set up access for your Babelfish DB instance to your Amazon S3 bucket
1. Create an Amazon S3 bucket for your Babelfish instance, if needed. To do so, follow the instructions in Create a bucket in the Amazon Simple Storage Service User Guide. Upload files to your Amazon S3 bucket. To do so, follow the steps in Add an object to a bucket in the Amazon Simple Storage Service User Guide.

2. Set up permissions as needed:

   - To import data from Amazon S3, the Babelfish DB cluster needs permission to access the bucket. We recommend using an Amazon Identity and Access Management (IAM) role and attaching an IAM policy to that role for your cluster. To do so, follow the steps in Using an IAM role to access an Amazon S3 bucket.

   - To export data from your Babelfish DB cluster, your cluster must be granted access to the Amazon S3 bucket. As with importing, we recommend using an IAM role and policy. To do so, follow the steps in Setting up access to an Amazon S3 bucket.

You can now use Amazon S3 with the aws_s3 extension with your Babelfish DB cluster.
To import data from Amazon S3 to Babelfish and to export Babelfish data to Amazon S3
Use the aws_s3 extension with your Babelfish DB cluster. When you do, make sure to reference the tables as they exist in the context of PostgreSQL. That is, if you want to import into a Babelfish table named [database].[schema].[tableA], refer to that table as database_schema_tableA in the aws_s3 function.

For an example of using an aws_s3 function to import data, see Importing data from Amazon S3 to your Aurora PostgreSQL DB cluster. For examples of using aws_s3 functions to export data, see Exporting query data using the aws_s3.query_export_to_s3 function.
Make sure to reference Babelfish tables using PostgreSQL naming when using the aws_s3 extension and Amazon S3, as shown in the following table.

| Babelfish table | Aurora PostgreSQL table |
|---|---|
| [database].[schema].[tableA] | database_schema_tableA |
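The naming rule above is mechanical, so a small helper can illustrate it. The following is a minimal sketch (the function to_postgres_name is ours for illustration, not part of the aws_s3 extension), assuming a simple bracketed three-part name with no embedded dots:

```python
def to_postgres_name(tsql_name: str) -> str:
    """Flatten a T-SQL three-part name such as [database].[schema].[tableA]
    into the database_schema_tableA form that aws_s3 functions expect."""
    # Split on the dots between parts and strip the surrounding brackets.
    parts = [part.strip("[]") for part in tsql_name.split(".")]
    return "_".join(parts)

print(to_postgres_name("[database].[schema].[tableA]"))  # database_schema_tableA
```

This sketch is only meant to make the mapping concrete; in practice you can simply write the flattened name directly in your aws_s3 function call.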
To learn more about using Amazon S3 with Aurora PostgreSQL, see Importing data from Amazon S3 into an Aurora PostgreSQL DB cluster and Exporting data from an Aurora PostgreSQL DB cluster to Amazon S3.
Using Babelfish with Amazon Lambda
After the aws_lambda extension is loaded in your Babelfish DB cluster, but before you can invoke Lambda functions, you give Lambda access to your DB cluster by following this procedure.
To set up access for your Babelfish DB cluster to work with Lambda
This procedure uses the Amazon CLI to create the IAM policy and role, and to associate them with the Babelfish DB cluster.

1. Create an IAM policy that allows access to Lambda from your Babelfish DB cluster.

   aws iam create-policy --policy-name rds-lambda-policy --policy-document '{
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "AllowAccessToExampleFunction",
               "Effect": "Allow",
               "Action": "lambda:InvokeFunction",
               "Resource": "arn:aws:lambda:aws-region:444455556666:function:my-function"
           }
       ]
   }'

2. Create an IAM role that the policy can assume at runtime.

   aws iam create-role --role-name rds-lambda-role --assume-role-policy-document '{
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "rds.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }'

3. Attach the policy to the role.

   aws iam attach-role-policy \
       --policy-arn arn:aws:iam::444455556666:policy/rds-lambda-policy \
       --role-name rds-lambda-role \
       --region aws-region

4. Attach the role to your Babelfish DB cluster.

   aws rds add-role-to-db-cluster \
       --db-cluster-identifier my-cluster-name \
       --feature-name Lambda \
       --role-arn arn:aws:iam::444455556666:role/rds-lambda-role \
       --region aws-region
After you complete these tasks, you can invoke your Lambda functions. For more information and examples of setting up Amazon Lambda for your Aurora PostgreSQL DB cluster, see Step 2: Configure IAM for your Aurora PostgreSQL DB cluster and Amazon Lambda.
To invoke a Lambda function from your Babelfish DB cluster
Amazon Lambda supports functions written in Java, Node.js, Python, Ruby, and other languages. If the function returns text when invoked, you can invoke it from your Babelfish DB cluster. The following example is a placeholder Python function that returns a greeting.

lambda_function.py

import json

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
Currently, Babelfish doesn't support JSON. If your function returns JSON, you use a wrapper to handle it. For example, say that the lambda_function.py shown preceding is stored in Lambda as my-function.
1. Connect to your Babelfish DB cluster using the psql client (or the pgAdmin client). For more information, see Using psql to connect to the DB cluster.

2. Create the wrapper. This example uses PostgreSQL's procedural language for SQL, PL/pgSQL. To learn more, see PL/pgSQL–SQL Procedural Language.

   create or replace function master_dbo.lambda_wrapper()
   returns text
   language plpgsql
   as
   $$
   declare
       r_status_code integer;
       r_payload text;
   begin
       SELECT payload INTO r_payload
         FROM aws_lambda.invoke(
             aws_commons.create_lambda_function_arn('my-function', 'us-east-1'),
             '{"body": "Hello from Postgres!"}'::json);
       return r_payload;
   end;
   $$;
The function can now be run from the Babelfish TDS port (1433) or from the PostgreSQL port (5432).
To invoke (call) this function from your PostgreSQL port:
SELECT * from aws_lambda.invoke(aws_commons.create_lambda_function_arn('my-function', 'us-east-1'), '{"body": "Hello from Postgres!"}'::json);

The output is similar to the following:

 status_code |                        payload                        | executed_version | log_result
-------------+-------------------------------------------------------+------------------+------------
         200 | {"statusCode": 200, "body": "\"Hello from Lambda!\""} | $LATEST          |
(1 row)
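Because the payload column is returned as text, a client application that does want the JSON content can decode it itself. The following is a hedged sketch in Python using the payload string from the example output above; note that the body field is itself a JSON-encoded string, so it's decoded twice:

```python
import json

# Payload text as returned in the example output above.
payload = '{"statusCode": 200, "body": "\\"Hello from Lambda!\\""}'

result = json.loads(payload)       # outer JSON object
body = json.loads(result["body"])  # body is a JSON-encoded string
print(result["statusCode"], body)  # 200 Hello from Lambda!
```

This is client-side handling only; inside Babelfish itself, the wrapper function approach shown above is the way to consume the result.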
To invoke (call) this function from the TDS port, connect to the port using the SQL Server sqlcmd command line client. For details, see Using a SQL Server client to connect to your DB cluster. When connected, run the following:

1> select lambda_wrapper();
2> go
The command returns output similar to the following:
{"statusCode": 200, "body": "\"Hello from Lambda!\""}
To learn more about using Lambda with Aurora PostgreSQL, see Invoking an Amazon Lambda function from an Aurora PostgreSQL DB cluster. For more information about working with Lambda functions, see Getting started with Lambda in the Amazon Lambda Developer Guide.
Using pg_stat_statements in Babelfish
Babelfish for Aurora PostgreSQL supports the pg_stat_statements extension starting with version 3.3.0. To learn more, see pg_stat_statements. For details about the version of this extension supported by Aurora PostgreSQL, see Extension versions.
Creating pg_stat_statements extension
To turn on pg_stat_statements, you must turn on query identifier calculation. This happens automatically if the compute_query_id parameter is set to on or auto in the parameter group. The default value of the compute_query_id parameter is auto.
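If you need to change the setting, you do so in the DB cluster parameter group. The following is a sketch using the Amazon CLI, assuming a custom cluster parameter group named my-babelfish-params (a placeholder, not a real default name):

```shell
# Set compute_query_id to on in a custom DB cluster parameter group.
# ApplyMethod=immediate assumes the parameter is dynamic (no reboot needed).
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name my-babelfish-params \
    --parameters "ParameterName=compute_query_id,ParameterValue=on,ApplyMethod=immediate"
```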
You also need to create the extension to turn on this feature. Use the following command to install the extension from the T-SQL endpoint:

1> EXEC sp_execute_postgresql 'CREATE EXTENSION pg_stat_statements WITH SCHEMA sys';
2> go
You can access the query statistics using the following query:
postgres=> select * from pg_stat_statements;
Note
During installation, if you don't provide the schema name for the extension, it is created in the public schema by default. To access it, you must use square brackets with the schema qualifier, as shown following:
postgres=> select * from [public].pg_stat_statements;
You can also create the extension from PSQL endpoint.
Authorizing the extension
By default, you can see the statistics for queries performed within your T-SQL database without any additional authorization.

To access query statistics created by others, you need the pg_read_all_stats PostgreSQL role. Follow the steps below to construct the GRANT pg_read_all_stats command.
In T-SQL, use the following query that returns the internal PG role name.
SELECT rolname FROM pg_roles WHERE oid = USER_ID();
Connect to the Babelfish for Aurora PostgreSQL database with the rds_superuser privilege and use the following command:

GRANT pg_read_all_stats TO <rolname_from_above_query>;
Example
From T-SQL endpoint:
1> SELECT rolname FROM pg_roles WHERE oid = USER_ID();
2> go
rolname
-------
master_dbo
(1 rows affected)
From PSQL endpoint:
babelfish_db=# grant pg_read_all_stats to master_dbo;
GRANT ROLE
You can access the query statistics using the pg_stat_statements view:
1> create table t1(cola int);
2> go
1> insert into t1 values (1),(2),(3);
2> go

(3 rows affected)

1> select userid, dbid, queryid, query from pg_stat_statements;
2> go
userid dbid queryid query
------ ---- ------- -----
37503 34582 6487973085327558478 select * from t1
37503 34582 6284378402749466286 SET QUOTED_IDENTIFIER OFF
37503 34582 2864302298511657420 insert into t1 values ($1),($2),($3)
10 34582 NULL <insufficient privilege>
37503 34582 5615368793313871642 SET TEXTSIZE 4096
37503 34582 639400815330803392 create table t1(cola int)
(6 rows affected)
Resetting query statistics
You can use pg_stat_statements_reset() to reset the statistics gathered so far by pg_stat_statements. To learn more, see pg_stat_statements. As a user with the rds_superuser privilege, use the following command:

SELECT pg_stat_statements_reset();
Limitations
- Currently, pg_stat_statements() is not supported through the T-SQL endpoint. The pg_stat_statements view is the recommended way to gather the statistics.

- Some queries might be rewritten by the T-SQL parser implemented by the Aurora PostgreSQL engine. In that case, the pg_stat_statements view shows the rewritten query, not the original query. For example:

  select next value for [dbo].[newCounter];

  The preceding query is rewritten as the following in the pg_stat_statements view:

  select nextval($1);

- Based on the execution flow of the statements, some queries might not be tracked by pg_stat_statements and will not be visible in the view. This includes the following statements: use dbname, goto, print, raise error, set, throw, declare cursor.

- For CREATE LOGIN and ALTER LOGIN statements, query and queryid are not shown. The view shows insufficient privileges instead.

- The pg_stat_statements view always contains the following two entries, because these are executed internally by the sqlcmd client:

  SET QUOTED_IDENTIFIER OFF
  SET TEXTSIZE 4096