Considerations and limitations when using the Spark connector

The Spark connector supports a variety of ways to manage credentials, configure security, and connect with other Amazon services. Review the recommendations in this list to configure a functional and resilient connection.

- We recommend that you activate SSL for the JDBC connection from Spark on Amazon EMR to Amazon Redshift.
- As a best practice, we recommend that you manage the credentials for the Amazon Redshift cluster in Amazon Secrets Manager. For an example, see Using Amazon Secrets Manager to retrieve credentials for connecting to Amazon Redshift.
- We recommend that you pass an IAM role with the aws_iam_role parameter for Amazon Redshift authentication.
- The tempformat parameter currently doesn't support the Parquet format.
- The tempdir URI points to an Amazon S3 location. This temporary directory isn't cleaned up automatically, so files left in it can add storage cost.
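The recommendations above all end up as connector options. The sketch below collects them in one place: `ssl=true` on the JDBC URL, credentials taken from a Secrets Manager secret, the `aws_iam_role` option, a `tempdir` location, and a guard that rejects Parquet as `tempformat`. The helper names `credentials_from_secret` and `build_redshift_options` are hypothetical. The sketch also assumes the following, none of which this page states:

- The secret is JSON with `username` and `password` keys.
- The connector accepts `user`, `password`, `url`, and `dbtable` options, as in the spark-redshift community project.
- `AVRO` is an acceptable default `tempformat`.

```python
import json
from typing import Dict, Optional

# Formats that the connector's tempformat parameter doesn't accept, per the text above.
UNSUPPORTED_TEMPFORMATS = {"PARQUET"}


def credentials_from_secret(secret_string: str) -> Dict[str, str]:
    """Map a Secrets Manager SecretString to connector user/password options.

    Assumption: the secret is JSON with "username" and "password" keys.
    """
    secret = json.loads(secret_string)
    return {"user": secret["username"], "password": secret["password"]}


def build_redshift_options(
    host: str,
    port: int,
    database: str,
    dbtable: str,
    iam_role_arn: str,
    tempdir: str,
    tempformat: str = "AVRO",  # assumed default, not taken from this page
    credentials: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Assemble connector options that follow the recommendations (hypothetical helper)."""
    fmt = tempformat.upper()
    if fmt in UNSUPPORTED_TEMPFORMATS:
        raise ValueError(f"tempformat {tempformat!r} is not supported")
    options = {
        # ssl=true asks the Redshift JDBC driver for an encrypted connection.
        "url": f"jdbc:redshift://{host}:{port}/{database}?ssl=true",
        "dbtable": dbtable,
        # IAM role that Amazon Redshift uses for authentication to S3.
        "aws_iam_role": iam_role_arn,
        # S3 staging location; it is not cleaned up automatically.
        "tempdir": tempdir,
        "tempformat": fmt,
    }
    if credentials:
        options.update(credentials)
    return options
```

In practice, the secret string could come from boto3's `secretsmanager` client via `get_secret_value(SecretId=...)["SecretString"]`. On Amazon EMR the resulting dict might then be passed as `spark.read.format("io.github.spark_redshift_community.spark.redshift").options(**opts).load()`. Check the format name and option list against the resources linked at the end of this page.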
Consider the following recommendations for Amazon Redshift:
- We recommend that you block public access to the Amazon Redshift cluster.
- We recommend that you turn on Amazon Redshift audit logging.
- We recommend that you turn on Amazon Redshift at-rest encryption.
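The following sketch expresses the three Amazon Redshift recommendations as request parameters. It covers public access and at-rest encryption through `modify_cluster`, and audit logging through `enable_logging`, both from boto3's Redshift client. The helper `hardening_requests` and the default log prefix are hypothetical. The sketch only builds the keyword arguments, so it runs without AWS access. The S3 bucket that receives audit logs must also allow Amazon Redshift to write to it.

```python
from typing import Dict


def hardening_requests(
    cluster_id: str, log_bucket: str, log_prefix: str = "redshift-audit/"
) -> Dict[str, Dict]:
    """Build boto3 Redshift kwargs for the recommendations above (hypothetical helper)."""
    return {
        # redshift_client.modify_cluster(**...): no public endpoint, encryption at rest.
        "modify_cluster": {
            "ClusterIdentifier": cluster_id,
            "PubliclyAccessible": False,
            "Encrypted": True,
        },
        # redshift_client.enable_logging(**...): deliver audit logs to an S3 bucket.
        "enable_logging": {
            "ClusterIdentifier": cluster_id,
            "BucketName": log_bucket,
            "S3KeyPrefix": log_prefix,
        },
    }
```

Keeping the parameters as plain data makes them easy to review before any call is made against a production cluster.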
Consider the following recommendations for Amazon S3:
- We recommend blocking public access to Amazon S3 buckets.
- We recommend that you use Amazon S3 server-side encryption to encrypt the S3 buckets that you use.
- We recommend that you use Amazon S3 lifecycle policies to define the retention rules for the S3 bucket.
- Amazon EMR always verifies open-source code that is imported into the image. For security, we don't support encoding Amazon access keys in the tempdir URI as an authentication method from Spark to Amazon S3.
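The following sketch ties the Amazon S3 recommendations to the tempdir location. It does four things:

- Splits the tempdir URI into bucket and prefix.
- Rejects URIs that embed access keys (`s3a://KEY:SECRET@bucket/...`), which the text says are not supported.
- Builds a lifecycle rule that expires leftover temporary files, since the connector does not clean them up.
- Provides request bodies for blocking public access and for default SSE-S3 encryption.

The helper names and the one-day default retention are hypothetical. The dictionaries follow the shapes that boto3's `put_public_access_block`, `put_bucket_encryption`, and `put_bucket_lifecycle_configuration` expect.

```python
from typing import Dict, Tuple
from urllib.parse import urlparse

# Body for s3_client.put_public_access_block(PublicAccessBlockConfiguration=...).
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

# Body for s3_client.put_bucket_encryption(ServerSideEncryptionConfiguration=...),
# using SSE-S3 (Amazon S3 managed keys) as the bucket default.
SSE_S3_DEFAULT = {
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}


def parse_tempdir(tempdir: str) -> Tuple[str, str]:
    """Split an S3 tempdir URI into (bucket, prefix), rejecting embedded keys."""
    parsed = urlparse(tempdir)
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError(f"tempdir must be an S3 URI, got {tempdir!r}")
    # A user-info part before "@" means access keys were encoded in the URI.
    if "@" in parsed.netloc:
        raise ValueError("do not encode access keys in the tempdir URI")
    if not parsed.netloc:
        raise ValueError("tempdir URI has no bucket")
    return parsed.netloc, parsed.path.lstrip("/")


def tempdir_lifecycle_config(prefix: str, expire_days: int = 1) -> Dict:
    """Lifecycle configuration that expires objects under the tempdir prefix."""
    if not prefix:
        # An empty prefix would expire every object in the bucket.
        raise ValueError("refusing to create a bucket-wide expiration rule")
    if expire_days < 1:
        raise ValueError("expire_days must be at least 1")
    # Normalize so that "spark/tmp" doesn't also match "spark/tmp-other/".
    if not prefix.endswith("/"):
        prefix += "/"
    return {
        "Rules": [
            {
                "ID": "expire-spark-tempdir",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_days},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": expire_days},
            }
        ]
    }
```

A possible usage is to take the bucket and prefix from `parse_tempdir`, then call `s3_client.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=tempdir_lifecycle_config(prefix))`. The rule refuses an empty prefix so that a misconfigured tempdir at the bucket root cannot expire unrelated data.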

For more information on using the connector and its supported parameters, see the following resources:

Amazon Redshift integration for Apache Spark in the Amazon Redshift Management Guide
The spark-redshift community repository on GitHub
