Step 1: Create the Amazon S3 bucket, download the required tools, and configure the environment
In this step, you download the external tools and create and configure the Amazon resources required to automatically export an Amazon Keyspaces table to an Amazon S3 bucket using an Amazon Glue job. To perform these tasks efficiently, you run the shell script setup-connector.sh, which is available on GitHub.
The script setup-connector.sh automates the following steps.
Creates an Amazon S3 bucket using Amazon CloudFormation. This bucket stores the downloaded jar and configuration files, as well as the exported table data.
Creates an IAM role using Amazon CloudFormation. Amazon Glue jobs use this role to access Amazon Keyspaces and Amazon S3.
Downloads the Apache Spark Cassandra Connector and uploads it to the Amazon S3 bucket.
Downloads the SigV4 Authentication plugin and uploads it to the Amazon S3 bucket.
Downloads the Apache Spark Extensions and uploads them to the Amazon S3 bucket.
Downloads the Keyspaces Retry Policy from GitHub, compiles the code using Maven, and uploads the output to the Amazon S3 bucket.
Uploads the keyspaces-application.conf file to the Amazon S3 bucket.
Use the setup-connector.sh shell script to automate the setup and configuration steps.
Copy the files from the aws-glue repository on GitHub to your local machine. This directory contains the shell script as well as other required files.
Run the shell script setup-connector.sh. You can specify the following three optional parameters.
SETUP_STACKNAME – This is the name of the Amazon CloudFormation stack used to create the Amazon resources.
S3_BUCKET_NAME – This is the name of the Amazon S3 bucket.
GLUE_SERVICE_ROLE_NAME – This is the name of the IAM service role that Amazon Glue uses to run jobs that connect to Amazon Keyspaces and Amazon S3.
The following command runs the shell script and sets the three parameters to the names cfn-setup, s3-keyspaces, and iam-export-role.
./setup-connector.sh cfn-setup s3-keyspaces iam-export-role
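Because the script creates the bucket through Amazon CloudFormation, an invalid bucket name only surfaces as a failed stack. A small pre-check can catch the most common mistakes before the script runs. The following is a minimal sketch, not part of setup-connector.sh: the helper name is_valid_bucket_name is hypothetical, and it covers only a subset of the Amazon S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; first and last character a letter or digit).

```shell
# Hypothetical helper, not part of setup-connector.sh.
# Checks a subset of the S3 bucket naming rules: 3-63 characters, only
# lowercase letters, digits, dots, and hyphens, starting and ending with
# a letter or digit. It does not check global uniqueness.
is_valid_bucket_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

# Example usage with the names from this step:
# if is_valid_bucket_name "s3-keyspaces"; then
#   ./setup-connector.sh cfn-setup s3-keyspaces iam-export-role
# else
#   echo "invalid bucket name" >&2
# fi
```

Bucket names must also be unique, which this check cannot verify locally.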
To confirm that your bucket was created, you can use the following Amazon CLI command.
aws s3 ls s3://s3-keyspaces
The output of the command should look like this.
PRE conf/
PRE jars/
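If you want to run this check from a script rather than by eye, the listing can be tested for both prefixes. The following is a minimal sketch; the helper name has_expected_prefixes is hypothetical, and it assumes the listing format shown above, where each prefix appears on its own line after the word PRE.

```shell
# Hypothetical helper: reads `aws s3 ls` output on stdin and succeeds only
# if both the conf/ and jars/ prefixes are listed.
has_expected_prefixes() {
  listing=$(cat)
  for prefix in conf/ jars/; do
    printf '%s\n' "$listing" \
      | grep -Eq "^[[:space:]]*PRE[[:space:]]+${prefix}\$" || return 1
  done
}

# Example usage:
# aws s3 ls s3://s3-keyspaces | has_expected_prefixes \
#   && echo "bucket contains conf/ and jars/"
```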
To confirm that the IAM role was created and to review the details, you can use the following Amazon CLI command.
aws iam get-role --role-name "iam-export-role"
{
    "Role": {
        "Path": "/",
        "RoleName": "iam-export-role",
        "RoleId": "AKIAIOSFODNN7EXAMPLE",
        "Arn": "arn:aws:iam::1111-2222-3333:role/iam-export-role",
        "CreateDate": "2025-01-28T16:09:03+00:00",
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {
                        "Service": "glue.amazonaws.com"
                    },
                    "Action": "sts:AssumeRole"
                }
            ]
        },
        "Description": "AWS Glue service role to import and export data from Amazon Keyspaces",
        "MaxSessionDuration": 3600,
        "RoleLastUsed": {
            "LastUsedDate": "2025-01-29T12:03:54+00:00",
            "Region": "us-east-1"
        }
    }
}
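The part of this output that matters for the export job is the trust policy: the service principal must be glue.amazonaws.com so that Amazon Glue can assume the role. The following is a minimal sketch of checking only that field. The helper name is_glue_principal is hypothetical, and the --query expression in the usage comment assumes a trust policy with a single statement, as in the output above.

```shell
# Hypothetical helper: succeeds if the given principal is the Glue service.
is_glue_principal() {
  [ "$1" = "glue.amazonaws.com" ]
}

# Example usage (assumes a single-statement trust policy):
# principal=$(aws iam get-role --role-name "iam-export-role" \
#   --query 'Role.AssumeRolePolicyDocument.Statement[0].Principal.Service' \
#   --output text)
# is_glue_principal "$principal" && echo "Glue can assume this role"
```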
If the Amazon CloudFormation stack creation fails, you can review detailed error information about the failed stack in the Amazon CloudFormation console.
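The same failure details can usually be read from the command line with the describe-stack-events command. The following is a minimal sketch under stated assumptions: the helper name failed_events is hypothetical, the stack name cfn-setup is the one used earlier in this step, and the helper expects tab-separated rows of logical resource ID, status, and status reason, which is what the --query expression in the usage comment should produce with text output.

```shell
# Hypothetical helper: reads tab-separated stack events on stdin
# (logical ID, status, reason) and prints only the failed ones.
failed_events() {
  awk -F '\t' '$2 ~ /FAILED/ { print $1 ": " $3 }'
}

# Example usage:
# aws cloudformation describe-stack-events --stack-name cfn-setup \
#   --query 'StackEvents[].[LogicalResourceId,ResourceStatus,ResourceStatusReason]' \
#   --output text | failed_events
```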
After the Amazon S3 bucket containing all scripts and tools has been created and the IAM role is configured, proceed to Step 2: Configure the Amazon Glue job that exports the Amazon Keyspaces table.