Create bootstrap actions to install additional software with an Amazon EMR cluster
You can use a bootstrap action to install additional software or customize the configuration of cluster instances. Bootstrap actions are scripts that run on each cluster instance after Amazon EMR launches the instance using the Amazon Linux Amazon Machine Image (AMI). Bootstrap actions run before Amazon EMR installs the applications that you specify when you create the cluster and before cluster nodes begin processing data. If you add nodes to a running cluster, bootstrap actions also run on those nodes in the same way. You can create custom bootstrap actions and specify them when you create your cluster.
Most predefined bootstrap actions for Amazon EMR AMI versions 2.x and 3.x are not supported in Amazon EMR release 4.x. For example, configure-Hadoop and configure-daemons are not supported in Amazon EMR release 4.x. Instead, Amazon EMR release 4.x provides this functionality natively. For more information about how to migrate bootstrap actions from Amazon EMR AMI versions 2.x and 3.x to Amazon EMR release 4.x, see Customizing cluster and application configuration with earlier AMI versions of Amazon EMR in the Amazon EMR Release Guide.
Bootstrap action basics
Bootstrap actions execute as the Hadoop user by default. You can execute a bootstrap action with root privileges by using sudo.
All Amazon EMR management interfaces support bootstrap actions. You can specify up to 16 bootstrap actions per cluster by providing multiple bootstrap-actions parameters from the console, Amazon CLI, or API.
From the Amazon EMR console, you can optionally specify a bootstrap action while creating a cluster.
When you use the CLI, you can pass references to bootstrap action scripts to Amazon EMR by adding the --bootstrap-actions parameter when you create the cluster with the create-cluster command.
--bootstrap-actions Path="s3://amzn-s3-demo-bucket/filename",Args=[arg1,arg2]
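Amazon EMR passes the values in Args to the script as ordinary command-line arguments, so inside the script they are the positional parameters $1, $2, and so on. The following is a minimal sketch, assuming a script that expects exactly two arguments; parse_bootstrap_args is a hypothetical function name used only for illustration.

```shell
#!/bin/bash
# Sketch: values from Args=[arg1,arg2] arrive as positional parameters.
# parse_bootstrap_args is a hypothetical helper, not part of Amazon EMR.
parse_bootstrap_args() {
  # Require exactly two arguments; returning nonzero lets the caller fail.
  if [ "$#" -ne 2 ]; then
    echo "usage: $0 <arg1> <arg2>" >&2
    return 1
  fi
  local first="$1"    # value of arg1
  local second="$2"   # value of arg2
  echo "first=${first} second=${second}"
}
```

In the bootstrap script itself, you would call it as parse_bootstrap_args "$@" || exit 1, so that missing arguments produce a nonzero exit code.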
If a bootstrap action exits with a nonzero exit code, Amazon EMR treats it as a failure and terminates the instance. If too many instances fail their bootstrap actions, Amazon EMR terminates the cluster. If only a few instances fail, Amazon EMR attempts to reallocate the failed instances and continue. Use the cluster lastStateChangeReason error code to identify failures caused by a bootstrap action.
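Because Amazon EMR judges a bootstrap action by its exit code, a script should stop as soon as one of its steps fails. By default, bash keeps running after a failed command and the script exits with the status of its last command, so an earlier failure can go unnoticed. The following minimal sketch shows one way to handle this with a hypothetical helper named run_step; putting set -euo pipefail at the top of the script is a common alternative with a similar intent.

```shell
#!/bin/bash
# Sketch: stop the bootstrap script at the first failed step and exit with
# that step's status, so Amazon EMR sees a nonzero exit code.
# run_step is a hypothetical helper name, not part of Amazon EMR.
run_step() {
  "$@"
  local rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "bootstrap step failed with exit code $rc: $*" >&2
    exit "$rc"   # ends the whole script, not just this function
  fi
}

# Example use inside a bootstrap script (hypothetical path):
#   run_step sudo mkdir -p /opt/my-tools
#   run_step aws s3 cp s3://amzn-s3-demo-bucket/myfilefolder/myfile.jar /mnt1/myfolder
```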
Conditionally run a bootstrap action
To run a bootstrap action only on the master node, you can use a custom bootstrap action with logic that determines whether the node is the master node.
#!/bin/bash
# grep -q keeps the matched line out of the output shown below
if grep isMaster /mnt/var/lib/info/instance.json | grep -q false; then
  echo "This is not master node, do nothing, exiting"
  exit 0
fi
echo "This is master, continuing to execute script"
# continue with code logic for master node below
The following output will print from a core node.
This is not master node, do nothing, exiting
The following output will print from master node.
This is master, continuing to execute script
To use this logic, upload your bootstrap action, including the code above, to your Amazon S3 bucket. With the Amazon CLI, add the --bootstrap-actions parameter to the aws emr create-cluster command and specify the location of your bootstrap script as the value of Path.
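The same check can be packaged as a reusable function. This is a sketch under two assumptions: instance.json stores the flag as a JSON key and value such as "isMaster": true, and INSTANCE_INFO is a hypothetical variable that lets you point the function at a test file when you are not on a cluster node.

```shell
#!/bin/bash
# Sketch: succeed only when instance.json marks this node as the master node.
# INSTANCE_INFO is a hypothetical override, useful for testing off-cluster.
INSTANCE_INFO="${INSTANCE_INFO:-/mnt/var/lib/info/instance.json}"

is_master() {
  # Match the isMaster key followed by true, regardless of spacing.
  # A missing or unreadable file makes grep fail, so the node counts as non-master.
  grep -Eq '"isMaster"[[:space:]]*:[[:space:]]*true' "$INSTANCE_INFO" 2>/dev/null
}
```

In a bootstrap script, put is_master || exit 0 before the master-only commands. Unlike the grep for false in the earlier example, this version treats a missing file as a non-master node, so master-only work is skipped rather than run on every node.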
Shutdown actions
A bootstrap action script can create one or more shutdown actions by writing scripts to the /mnt/var/lib/instance-controller/public/shutdown-actions/ directory. When a cluster is terminated, all the scripts in this directory are executed in parallel. Each script must run and complete within 60 seconds. Shutdown action scripts are not guaranteed to run if the node terminates with an error.
Note
When using Amazon EMR versions 4.0 and later, you must manually create the /mnt/var/lib/instance-controller/public/shutdown-actions/ directory on the master node. The directory doesn't exist by default; however, once you create it, the scripts in it run before shutdown. For more information about connecting to the master node to create directories, see Connect to the Amazon EMR cluster primary node using SSH.
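A bootstrap action can register a shutdown action by creating that directory and writing a script into it. The following is a minimal sketch. SHUTDOWN_DIR is a hypothetical override for testing, the log file and S3 destination inside the generated script are placeholders, and both marking the file executable and whether mkdir needs sudo on your nodes are assumptions rather than documented requirements.

```shell
#!/bin/bash
# Sketch: install a shutdown action from a bootstrap action.
# SHUTDOWN_DIR is a hypothetical override; by default it is the directory
# named above. Creating it may require sudo on a real node (assumption).
SHUTDOWN_DIR="${SHUTDOWN_DIR:-/mnt/var/lib/instance-controller/public/shutdown-actions}"

install_shutdown_action() {
  local name="$1"   # file name for the shutdown script
  mkdir -p "$SHUTDOWN_DIR" || return 1
  # Quoted heredoc: $(hostname) is evaluated at shutdown, not now.
  # Keep the work short, because each script must finish within 60 seconds.
  cat > "$SHUTDOWN_DIR/$name" <<'EOF'
#!/bin/bash
aws s3 cp /tmp/my-app.log "s3://amzn-s3-demo-bucket/shutdown-logs/$(hostname)-my-app.log"
EOF
  chmod +x "$SHUTDOWN_DIR/$name"
}
```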
Use custom bootstrap actions
You can create a custom script to perform a customized bootstrap action. Any of the Amazon EMR interfaces can reference a custom bootstrap action.
Note
For the best performance, we recommend that you store custom bootstrap actions, scripts, and other files that you want to use with Amazon EMR in an Amazon S3 bucket that is in the same Amazon Web Services Region as your cluster.
Use a custom bootstrap action to copy an object from Amazon S3 to each node
You can use a bootstrap action to copy objects from Amazon S3 to each node in a cluster before your applications are installed. The Amazon CLI is installed on each node of a cluster, so your bootstrap action can call Amazon CLI commands.
The following example demonstrates a simple bootstrap action script that copies a file, myfile.jar, from Amazon S3 to a local folder, /mnt1/myfolder, on each cluster node. The script is saved to Amazon S3 with the file name copymyfile.sh and has the following contents.
#!/bin/bash
aws s3 cp s3://amzn-s3-demo-bucket/myfilefolder/myfile.jar /mnt1/myfolder
When you launch the cluster, you specify the script. The following Amazon CLI example demonstrates this:
aws emr create-cluster --name "Test cluster" --release-label emr-7.5.0 \
--use-default-roles --ec2-attributes KeyName=myKey \
--applications Name=Hive Name=Pig \
--instance-count 3 --instance-type m5.xlarge \
--bootstrap-actions Path="s3://amzn-s3-demo-bucket/myscriptfolder/copymyfile.sh"
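If the destination folder might not exist on a node, the copy script can create it before copying. The following sketch wraps the copy in a hypothetical copy_to_node function. The S3 and local paths are the placeholders from the example above, and depending on who owns /mnt1 on your instances, mkdir may need sudo (assumption).

```shell
#!/bin/bash
# Sketch: copy one object from Amazon S3 into a local folder, creating the
# folder first so that the object lands inside it.
# copy_to_node is a hypothetical helper name.
copy_to_node() {
  local src="$1"    # for example s3://amzn-s3-demo-bucket/myfilefolder/myfile.jar
  local dest="$2"   # for example /mnt1/myfolder
  mkdir -p "$dest" || return 1
  # Trailing slash: treat dest as a directory and keep the object's file name.
  aws s3 cp "$src" "$dest/"
}
```

In copymyfile.sh you would then call copy_to_node s3://amzn-s3-demo-bucket/myfilefolder/myfile.jar /mnt1/myfolder.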