Considerations and limitations when using the Spark connector
- We suggest that you turn on SSL for the JDBC connection from Spark on Amazon EMR to Amazon Redshift.
- As a best practice, we suggest that you manage the credentials for the Amazon Redshift cluster in Amazon Secrets Manager. For an example, see Using Amazon Secrets Manager to retrieve credentials for connecting to Amazon Redshift.
- We suggest that you pass an IAM role with the `aws_iam_role` parameter for Amazon Redshift authentication.
- The `tempformat` parameter currently doesn't support the Parquet format.
- The `tempdir` URI points to an Amazon S3 location. This temporary directory isn't cleaned up automatically, so it can incur additional cost.
- Consider the following recommendations for Amazon Redshift:
  - We suggest that you block public access to the Amazon Redshift cluster.
  - We suggest that you turn on Amazon Redshift audit logging.
  - We suggest that you turn on Amazon Redshift at-rest encryption.
 
- Consider the following recommendations for Amazon S3:
  - We suggest that you block public access to Amazon S3 buckets.
  - We suggest that you use Amazon S3 server-side encryption to encrypt the Amazon S3 buckets that the connector uses.
  - We suggest that you use Amazon S3 lifecycle policies to define retention rules for the Amazon S3 bucket.
  - Amazon EMR always verifies code imported from open source into the image. For security, we don't support the following authentication methods from Spark to Amazon S3:
    - Setting Amazon access keys in the `hadoop-env` configuration classification
    - Encoding Amazon access keys in the `tempdir` URI
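
Several of the considerations above can be checked before a job starts. The following is a minimal sketch, not an official API: `build_redshift_options` is a hypothetical helper, the host, bucket, table, and role ARN in the usage note are placeholders, and `ssl=true` is assumed to be the JDBC URL setting that turns on SSL for your driver version. The option names `url`, `dbtable`, `tempdir`, `aws_iam_role`, and `tempformat` are connector parameters; confirm the data source name against the resources listed at the end of this page.

```python
from urllib.parse import urlparse

# Data source name of the spark-redshift community connector
# (assumption: verify it for your Amazon EMR release).
REDSHIFT_FORMAT = "io.github.spark_redshift_community.spark.redshift"


def build_redshift_options(jdbc_url, table, tempdir, iam_role_arn, tempformat="AVRO"):
    """Hypothetical helper: validate connector options, then return them as a dict."""
    # SSL should be turned on for the JDBC connection (assumed here to be ssl=true).
    if "ssl=true" not in jdbc_url.lower():
        raise ValueError("turn on SSL in the JDBC URL, for example with ssl=true")

    parsed = urlparse(tempdir)
    # The tempdir URI must point to an Amazon S3 location.
    if parsed.scheme not in ("s3", "s3a", "s3n"):
        raise ValueError("tempdir must point to an Amazon S3 location")
    # Access keys encoded in the tempdir URI are not a supported authentication method.
    if parsed.username or parsed.password:
        raise ValueError("do not encode access keys in the tempdir URI")

    # The tempformat parameter currently doesn't support Parquet.
    if tempformat.upper() == "PARQUET":
        raise ValueError("tempformat does not support Parquet")

    return {
        "url": jdbc_url,
        "dbtable": table,
        "tempdir": tempdir,
        "aws_iam_role": iam_role_arn,  # authenticate with an IAM role
        "tempformat": tempformat,
    }
```

On a cluster where a `SparkSession` named `spark` is available, the result could be passed to the reader, for example `spark.read.format(REDSHIFT_FORMAT).options(**build_redshift_options("jdbc:redshift://example-host:5439/dev?ssl=true", "public.sales", "s3://example-bucket/redshift-temp/", "arn:aws:iam::123456789012:role/example-role")).load()`.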
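
Because the `tempdir` location isn't cleaned up automatically, a lifecycle rule scoped to the temporary prefix is one way to apply the retention recommendation above. This is a sketch under stated assumptions: the helper names, the rule ID, the `redshift-temp/` prefix, and the seven-day default are all hypothetical, and the boto3 import is deferred so the builder can run without it.

```python
def build_tempdir_lifecycle(prefix, expire_after_days=7, rule_id="expire-redshift-tempdir"):
    """Hypothetical helper: build a lifecycle configuration that expires
    objects under the connector's temporary prefix."""
    if expire_after_days < 1:
        raise ValueError("expire_after_days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # limit the rule to the tempdir prefix
                "Status": "Enabled",
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


def apply_lifecycle(bucket, lifecycle):
    """Apply the configuration with boto3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=lifecycle
    )
```

For example, `apply_lifecycle("example-bucket", build_tempdir_lifecycle("redshift-temp/"))` would expire temporary objects after seven days. Note that `put_bucket_lifecycle_configuration` replaces the bucket's existing lifecycle configuration, so any existing rules need to be merged into the new configuration first.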
                            
 
For more information on using the connector and its supported parameters, see the following resources:
- Amazon Redshift integration for Apache Spark in the Amazon Redshift Management Guide
- The `spark-redshift` community repository on GitHub