Step 5: Check for suspended groups

An instance group becomes suspended when it encounters too many errors while trying to launch nodes. For example, if new nodes repeatedly fail while performing bootstrap actions, the instance group will — after some time — go into the SUSPENDED state rather than continuously attempt to provision new nodes.

A node could fail to come up if:

Hadoop or the cluster is somehow broken and does not accept a new node into the cluster
A bootstrap action fails on the new node
The node is not functioning correctly and fails to check in with Hadoop

If an instance group is in the SUSPENDED state, and the cluster is in a WAITING state, you can add a cluster step to reset the desired number of core and task nodes. Adding the step resumes processing of the cluster and put the instance group back into a RUNNING state.

For more information about how to reset a cluster in a suspended state, see Suspended state.

Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Step 4: Check Amazon EMR cluster and instance health

Step 6: Review configuration settings for the Amazon EMR cluster