Vertica connections
You can use Amazon Glue for Spark to read from and write to tables in Vertica in Amazon Glue 4.0 and later versions. You can define what to read from Vertica with a SQL query. You connect to Vertica using username and password credentials stored in Amazon Secrets Manager through a Amazon Glue connection.
For more information about Vertica, consult the Vertica
documentation
Configuring Vertica connections
To connect to Vertica from Amazon Glue, you will need to create and store your Vertica credentials in a Amazon Secrets Manager secret, then associate that secret with a Vertica Amazon Glue connection. If your Vertica instance is in an Amazon VPC, you will also need to provide networking options to your Amazon Glue Vertica connection. You will need an Amazon S3 bucket or folder to use for temporary storage when reading from and writing to the database.
To connect to Vertica from Amazon Glue, you will need some prerequisites:
-
An Amazon S3 bucket or folder to use for temporary storage when reading from and writing to the database, referred to by
tempS3Path
.Note
When using Vertica in Amazon Glue job data previews, temporary files may not be automatically removed from
tempS3Path
. To ensure the removal of temporary files, directly end the data preview session by choosing End session in the Data preview pane.If you cannot guarantee the data preview session is ended directly, consider setting Amazon S3 Lifecycle configuration to remove old data. We recommend removing data older than 49 hours, based on maximum job runtime plus a margin. For more information about configuring Amazon S3 Lifecycle, see Managing your storage lifecycle in the Amazon S3 documentation.
-
An IAM policy with appropriate permissions to your Amazon S3 path you can associate with your Amazon Glue job role.
-
If your Vertica instance is in an Amazon VPC, configure Amazon VPC to allow your Amazon Glue job to communicate with the Vertica instance without traffic traversing the public internet.
In Amazon VPC, identify or create a VPC, Subnet and Security group that Amazon Glue will use while executing the job. Additionally, you need to ensure Amazon VPC is configured to permit network traffic between your Vertica instance and this location. Your job will need to establish a TCP connection with your Vertica client port, (default 5433). Based on your network layout, this may require changes to security group rules, Network ACLs, NAT Gateways and Peering connections.
You can then proceed to configure Amazon Glue for use with Vertica.
To configure a connection to Vertica:
-
In Amazon Secrets Manager, create a secret using your Vertica credentials,
verticaUsername
andverticaPassword
. To create a secret in Secrets Manager, follow the tutorial available in Create an Amazon Secrets Manager secret in the Amazon Secrets Manager documentation. After creating the secret, keep the Secret name,secretName
for the next step.-
When selecting Key/value pairs, create a pair for the key
user
with the valueverticaUsername
. -
When selecting Key/value pairs, create a pair for the key
password
with the valueverticaPassword
.
-
In the Amazon Glue console, create a connection by following the steps in Adding an Amazon Glue connection. After creating the connection, keep the connection name,
connectionName
, for the next step.When selecting a Connection type, select Vertica.
When selecting Vertica Host, provide the hostname of your Vertica installation.
When selecting Vertica Port, the port your Vertica installation is available through.
When selecting an Amazon Secret, provide
secretName
.
-
In the following situations, you may require additional configuration:
-
For Vertica instances hosted on Amazon in an Amazon VPC
-
Provide Amazon VPC connection information to the Amazon Glue connection that defines your Vertica security credentials. When creating or updating your connection, set VPC, Subnet and Security groups in Network options.
-
-
After creating a Amazon Glue Vertica connection, you will need to perform the following steps before calling your connection method.
Grant the IAM role associated with your Amazon Glue job permissions to
tempS3Path
.Grant the IAM role associated with your Amazon Glue job permission to read
secretName
.In your Amazon Glue job configuration, provide
connectionName
as an Additional network connection.
Reading from Vertica
Prerequisites:
-
A Vertica table you would like to read from. You will need the Vertica database name,
dbName
and the table name,tableName
. -
A Amazon Glue Vertica connection configured to provide auth information. Complete the steps in the previous procedure, To configure a connection to Vertica to configure your auth information. You will need the name of the Amazon Glue connection,
connectionName
. -
A Amazon S3 bucket or folder to use for temporary storage, mentioned previously. You will need the name,
tempS3Path
. You will need to connect to this location using thes3a
protocol.
For example:
dynamicFrame = glueContext.create_dynamic_frame.from_options( connection_type="vertica", connection_options={ "connectionName": "
connectionName
", "staging_fs_url": "s3a://tempS3Path
", "db": "dbName
", "table": "tableName
", } )
You can also provide a SELECT SQL query, to filter the results returned to your DynamicFrame or to access a dataset from multiple tables.
For example:
dynamicFrame = glueContext.create_dynamic_frame.from_options( connection_type="vertica", connection_options={ "connectionName": "
connectionName
", "staging_fs_url": "s3a://tempS3Path
", "db": "dbName
", "query": "select * FROMtableName
", }, )
Writing to Vertica tables
This example writes information from an existing DynamicFrame, dynamicFrame
to
Vertica. If the table already has information, Amazon Glue will append data from your DynamicFrame.
Prerequisites:
-
A current or desired table name,
tableName
, you would like to write to. You will also need the corresponding Vertica database name,dbName
. -
A Amazon Glue Vertica connection configured to provide auth information. Complete the steps in the previous procedure, To configure a connection to Vertica to configure your auth information. You will need the name of the Amazon Glue connection,
connectionName
. -
A Amazon S3 bucket or folder to use for temporary storage, mentioned previously. You will need the name,
tempS3Path
. You will need to connect to this location using thes3a
protocol.
For example:
glueContext.write_dynamic_frame.from_options( frame=
dynamicFrame
, connection_type="vertica", connection_options={ "connectionName": "connectionName
", "staging_fs_url": "s3a://tempS3Path
", "db": "dbName
", "table": "tableName
", } )
Vertica connection option reference
-
connectionName
— Required. Used for Read/Write. The name of a Amazon Glue Vertica connection configured to provide auth and networking information to your connection method. -
db
— Required. Used for Read/Write. The name of a database in Vertica your connection method will interact with. -
dbSchema
— Required if needed to identify your table. Used for Read/Write. Default:public
. The name of a schema your connection method will interact with. -
table
— Required for writing, required for reading unlessquery
is provided. Used for Read/Write. The name of a table your connection method will interact with. -
query
— Used for Read. A SELECT SQL query defining what should be retrieved when reading from Teradata. -
staging_fs_url
— Required. Used for Read/Write. Valid Values:s3a
URLs. The URL of a Amazon S3 bucket or folder to use for temporary storage.