Launching a Spark application with the Amazon Redshift integration for Apache Spark
To use the integration with EMR Serverless 6.9.0, you must pass the required Spark-Redshift dependencies with your Spark job. Use --jars to include the Redshift connector-related libraries. To see other file locations supported by the --jars option, see the Advanced Dependency Management section of the Apache Spark documentation.
- spark-redshift.jar
- spark-avro.jar
- RedshiftJDBC.jar
- minimal-json.jar
Amazon EMR releases 6.10.0 and higher don't require the minimal-json.jar
dependency, and automatically install the other dependencies to each cluster by
default. The following examples show how to launch a Spark application with the
Amazon Redshift integration for Apache Spark.
Amazon EMR 6.10.0 +

Launch a Spark job on Amazon EMR Serverless with the Amazon Redshift integration for Apache Spark on EMR Serverless release 6.10.0 and higher.

spark-submit my_script.py
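The entry point script itself reads from or writes to Amazon Redshift through the connector's data source. The following is a minimal sketch of what a script such as my_script.py might contain; the JDBC URL, credentials, S3 temporary directory, and IAM role ARN are placeholder values that you would replace with your own.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("redshift-integration-example").getOrCreate()

# Placeholder connection settings -- replace with your own cluster endpoint,
# database, credentials, S3 temporary directory, and IAM role ARN.
jdbc_url = "jdbc:redshift://redshift-cluster-1.example.us-east-1.redshift.amazonaws.com:5439/dev"
temp_dir = "s3://amzn-s3-demo-bucket/redshift-temp/"
iam_role = "arn:aws:iam::123456789012:role/redshift-s3-access-role"

# Read a Redshift table through the connector's data source.
df = (
    spark.read.format("io.github.spark_redshift_community.spark.redshift")
    .option("url", jdbc_url)
    .option("user", "placeholder_user")
    .option("password", "placeholder_password")
    .option("dbtable", "public.sales")
    .option("tempdir", temp_dir)
    .option("aws_iam_role", iam_role)
    .load()
)

df.show()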
Amazon EMR 6.9.0

To launch a Spark job on Amazon EMR Serverless with the Amazon Redshift integration for Apache Spark on EMR Serverless release 6.9.0, use the --jars option as shown in the following example. Note that the paths listed with the --jars option are the default paths for the JAR files.

--jars
  /usr/share/aws/redshift/jdbc/RedshiftJDBC.jar,
  /usr/share/aws/redshift/spark-redshift/lib/spark-redshift.jar,
  /usr/share/aws/redshift/spark-redshift/lib/spark-avro.jar,
  /usr/share/aws/redshift/spark-redshift/lib/minimal-json.jar
spark-submit \
--jars /usr/share/aws/redshift/jdbc/RedshiftJDBC.jar,/usr/share/aws/redshift/spark-redshift/lib/spark-redshift.jar,/usr/share/aws/redshift/spark-redshift/lib/spark-avro.jar,/usr/share/aws/redshift/spark-redshift/lib/minimal-json.jar \
my_script.py
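On EMR Serverless, spark-submit options such as --jars are passed to a job run through the sparkSubmitParameters field of the job driver. The following is a minimal sketch that submits the job with boto3; the application ID, job execution role ARN, and script location are placeholder values, and it assumes an EMR Serverless application on release 6.9.0 already exists.

import boto3

# Placeholder identifiers -- replace with your EMR Serverless application ID,
# job execution role ARN, and the S3 location of your script.
client = boto3.client("emr-serverless", region_name="us-east-1")

jars = ",".join([
    "/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar",
    "/usr/share/aws/redshift/spark-redshift/lib/spark-redshift.jar",
    "/usr/share/aws/redshift/spark-redshift/lib/spark-avro.jar",
    "/usr/share/aws/redshift/spark-redshift/lib/minimal-json.jar",
])

response = client.start_job_run(
    applicationId="00example-application-id",
    executionRoleArn="arn:aws:iam::123456789012:role/emr-serverless-job-role",
    jobDriver={
        "sparkSubmit": {
            "entryPoint": "s3://amzn-s3-demo-bucket/scripts/my_script.py",
            # The --jars list is only needed on release 6.9.0; releases
            # 6.10.0 and higher install these dependencies by default.
            "sparkSubmitParameters": "--jars " + jars,
        }
    },
)
print(response["jobRunId"])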