Create a cluster with Hudi installed
With Amazon EMR release version 5.28.0 and later, Amazon EMR installs Hudi components by default when Spark, Hive, or Presto is installed. To use Hudi on Amazon EMR, create a cluster with one or more of the following applications installed:
- Hadoop
- Hive
- Spark
- Presto
- Flink
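The following minimal Python sketch makes the default-install rule above concrete. It checks whether a given release label and set of applications would get Hudi by default. The function names and the parsing of `emr-X.Y.Z` labels are illustrative assumptions. The check encodes only the rule as stated in this section and does not consult the release notes for any particular release label.

```python
# Hypothetical helper: encodes only the rule stated in this section, namely that
# on release 5.28.0 and later, Hudi is installed by default when Spark, Hive,
# or Presto is selected. It does not check per-release notes.
HUDI_TRIGGER_APPS = {"Spark", "Hive", "Presto"}
MIN_RELEASE = (5, 28, 0)


def parse_release_label(label: str) -> tuple[int, int, int]:
    """Turn a release label such as 'emr-5.28.0' into (5, 28, 0)."""
    prefix = "emr-"
    if not label.startswith(prefix):
        raise ValueError(f"unexpected release label: {label!r}")
    parts = label[len(prefix):].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected emr-X.Y.Z, got {label!r}")
    major, minor, patch = (int(p) for p in parts)
    return (major, minor, patch)


def hudi_installed_by_default(release_label: str, applications: list[str]) -> bool:
    """True if the release meets the minimum and a triggering app is selected."""
    release_ok = parse_release_label(release_label) >= MIN_RELEASE
    has_trigger_app = bool(HUDI_TRIGGER_APPS.intersection(applications))
    return release_ok and has_trigger_app
```

For example, `hudi_installed_by_default("emr-5.28.0", ["Hadoop", "Spark"])` returns `True`, while the same applications on `emr-5.27.0` return `False`.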
You can create a cluster using the Amazon Web Services Management Console, the Amazon CLI, or the Amazon EMR API.
- Navigate to the new Amazon EMR console and select Switch to the old console from the side navigation. For more information about what to expect when you switch to the old console, see Using the old console.
- Choose Create cluster, and then choose Go to advanced options.
- Under Software Configuration, for Release, choose emr-5.28.0 or later. Select Hadoop, Hive, Spark, Presto, and Tez, along with any other applications that your cluster requires.
- Configure other options as required for your application, and then choose Next.
- Configure options for Hardware and General cluster settings as desired.
- For Security Options, we recommend that you select an EC2 key pair so that you can connect to the command line of the master node using SSH. With SSH access, you can run the Spark shell commands, Hive CLI commands, and Hudi CLI commands described in this guide.
- Choose other security options as desired, and then choose Create cluster.
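You can also express the console procedure above as a single call to the Amazon EMR `RunJobFlow` API. The following Python sketch builds a request for boto3's `run_job_flow` that uses the same release and applications, plus an EC2 key pair for SSH access. The cluster name, key pair name, instance type and count, and the default IAM role names `EMR_EC2_DefaultRole` and `EMR_DefaultRole` are assumptions for illustration. The roles must already exist in your account, for example after you run `aws emr create-default-roles`.

```python
from typing import Any


def build_hudi_cluster_request(
    name: str,
    key_name: str,
    release_label: str = "emr-5.28.0",
    # Same applications as the console steps above.
    applications: tuple[str, ...] = ("Hadoop", "Hive", "Spark", "Presto", "Tez"),
    instance_type: str = "m5.xlarge",  # assumption: pick a type that suits your workload
    instance_count: int = 3,  # assumption: 1 master node + 2 core nodes
) -> dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call (illustrative values)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": app} for app in applications],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            "Ec2KeyName": key_name,  # key pair used to SSH to the master node
            "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster running for interactive use
        },
        # Assumed default role names; they must already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request: dict[str, Any]) -> str:
    """Submit the request and return the new cluster ID (requires AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3 installed

    emr = boto3.client("emr")
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]
```

A typical use is `create_cluster(build_hudi_cluster_request("hudi-cluster", "my-key-pair"))`, where `my-key-pair` is a placeholder for an existing EC2 key pair in the target Region. Keeping the request builder separate from the API call lets you inspect or adjust the parameters before you create a billable cluster.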