Considerations and limitations when using the Spark connector

  • We recommend that you turn on SSL for the JDBC connection from Spark on Amazon EMR to Amazon Redshift. The PySpark sketch after this list shows one way to do this.

  • As a best practice, we recommend that you manage the credentials for the Amazon Redshift cluster in Amazon Secrets Manager. For an example, see Using Amazon Secrets Manager to retrieve credentials for connecting to Amazon Redshift and the PySpark sketch after this list.

  • We recommend that you pass an IAM role with the aws_iam_role parameter for Amazon Redshift authentication.

  • The tempformat parameter currently doesn't support the Parquet format.

  • The tempdir URI points to an Amazon S3 location. This temporary directory isn't cleaned up automatically and can therefore incur additional storage costs. We recommend using an Amazon S3 lifecycle policy to define retention rules for the tempdir location, as in the boto3 sketch after this list.

  • Consider the following recommendations for Amazon Redshift:

    • We recommend that you block public access to Amazon Redshift clusters.

    • We recommend that you encrypt the Amazon Redshift clusters used.

    • We recommend that you turn on the Amazon Redshift logging features, such as audit logging.

  • Consider the following recommendations for Amazon S3:

    • We recommend that you block public access to Amazon S3 buckets.

    • We recommend that you use Amazon S3 server-side encryption to encrypt the buckets that you use.

    • We recommend that you use Amazon S3 lifecycle policies to define the retention rules for the bucket. The boto3 sketch after this list shows one way to apply these three settings.

    • Amazon EMR always verifies code imported from open source into the image. For security, we don't support the following authentication methods from Spark to Amazon S3:

      • Setting Amazon access keys in the hadoop-env configuration classification

      • Encoding Amazon access keys in the tempdir URI
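
The following is a minimal PySpark sketch that ties the connection recommendations above together: it retrieves the cluster credentials from Amazon Secrets Manager, turns on SSL in the JDBC URL, passes an IAM role with aws_iam_role, and stages data under tempdir with a supported tempformat. The secret name, cluster endpoint, role ARN, bucket, and table are hypothetical placeholders, not values from this guide; substitute your own.

    import json

    import boto3
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("redshift-connector-example").getOrCreate()

    # Retrieve the Amazon Redshift credentials from Secrets Manager instead of
    # hard-coding them in the job (hypothetical secret name).
    secret = boto3.client("secretsmanager").get_secret_value(
        SecretId="example/redshift/credentials"
    )
    creds = json.loads(secret["SecretString"])

    df = (
        spark.read.format("io.github.spark_redshift_community.spark.redshift")
        # ssl=true turns on SSL for the JDBC connection (placeholder endpoint).
        .option(
            "url",
            "jdbc:redshift://example-cluster.abc123.cn-north-1.redshift.amazonaws.com.cn:5439/dev?ssl=true",
        )
        .option("user", creds["username"])
        .option("password", creds["password"])
        # IAM role that Amazon Redshift assumes for COPY/UNLOAD against tempdir.
        .option("aws_iam_role", "arn:aws-cn:iam::123456789012:role/example-redshift-role")
        # Staging location in Amazon S3; the connector doesn't clean it up automatically.
        .option("tempdir", "s3://example-bucket/redshift-temp/")
        # Parquet isn't supported for tempformat; CSV and AVRO are.
        .option("tempformat", "CSV")
        .option("dbtable", "public.sales")
        .load()
    )
    df.show()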

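The Amazon S3 recommendations above can also be applied programmatically. The following boto3 sketch blocks public access to the bucket that backs tempdir, turns on default server-side encryption (SSE-S3 here; SSE-KMS also works), and adds a lifecycle rule that expires the staged connector data. The bucket name, prefix, and seven-day retention period are illustrative assumptions.

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-bucket"  # hypothetical bucket backing tempdir

    # Block all forms of public access to the bucket.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

    # Encrypt new objects at rest by default with S3-managed keys.
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={
            "Rules": [
                {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
            ]
        },
    )

    # Expire staged data under the tempdir prefix after 7 days, since the
    # connector doesn't delete it automatically.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "expire-redshift-tempdir",
                    "Filter": {"Prefix": "redshift-temp/"},
                    "Status": "Enabled",
                    "Expiration": {"Days": 7},
                }
            ]
        },
    )
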
For more information on using the connector and its supported parameters, see the following resources: