Common errors when running jobs
The following errors may occur when you run the StartJobRun API.
Error Message | Error Condition | Recommended Next Step |
---|---|---|
error: argument -- | Required parameters are missing. | Add the missing arguments to the API request. |
An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: ARN is not authorized to perform: emr-containers:StartJobRun | Execution role is missing. | See Using job execution roles with Amazon EMR on EKS. |
An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: | Caller doesn't have permission to the execution role [valid / not valid format] via condition keys. | See Using job execution roles with Amazon EMR on EKS. |
An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: | Job submitter and execution role ARN are from different accounts. | Ensure that the job submitter and the execution role ARN are from the same Amazon account. |
1 validation error detected: Value | Caller has permissions for the execution role via condition keys, but the role does not satisfy the constraints of the ARN format. | Provide the execution role following the ARN format. See Using job execution roles with Amazon EMR on EKS. |
An error occurred (ResourceNotFoundException) when calling the StartJobRun operation: Virtual cluster | Virtual cluster ID is not found. | Provide a virtual cluster ID registered with Amazon EMR on EKS. |
An error occurred (ValidationException) when calling the StartJobRun operation: Virtual cluster state | Virtual cluster is not ready to execute the job. | See Virtual cluster states. |
An error occurred (ResourceNotFoundException) when calling the StartJobRun operation: Release | The release specified in the job submission is incorrect. | See Amazon EMR on EKS releases. |
An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: | User is not authorized to call StartJobRun. | See Using job execution roles with Amazon EMR on EKS. |
An error occurred (ValidationException) when calling the StartJobRun operation: configurationOverrides.monitoringConfiguration.s3MonitoringConfiguration.logUri failed to satisfy constraint : %s | S3 path URI syntax is not valid. | logUri should be in the format of s3://... |
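
Several of the conditions in this table can be caught on the client before the request is sent. The following is a minimal sketch of such a pre-flight check, not part of the Amazon EMR on EKS API. The function name `preflight_check`, the `REQUIRED_KEYS` list (assumed for a request that does not use a job template), and the `ROLE_ARN_RE` pattern are all illustrative assumptions. The request is a plain dictionary that uses the StartJobRun parameter names.

```python
import re

# Assumption: an IAM role ARN looks like arn:<partition>:iam::<12-digit account>:role/<name>.
ROLE_ARN_RE = re.compile(r"^arn:(aws[a-z-]*):iam::(\d{12}):role/.+$")

# Assumption: keys treated as required for a request that does not use a job template.
REQUIRED_KEYS = ("virtualClusterId", "executionRoleArn", "releaseLabel", "jobDriver")


def preflight_check(request, caller_account_id):
    """Return a list of problems found in a StartJobRun-style request dict."""
    problems = []

    # Required parameters are missing.
    for key in REQUIRED_KEYS:
        if not request.get(key):
            problems.append(f"missing required parameter: {key}")

    # Execution role must follow the ARN format and belong to the caller's account.
    role_arn = request.get("executionRoleArn")
    if role_arn:
        match = ROLE_ARN_RE.match(role_arn)
        if match is None:
            problems.append("executionRoleArn does not follow the ARN format")
        elif match.group(2) != caller_account_id:
            problems.append("execution role and job submitter are in different accounts")

    # logUri, when present, must be an S3 URI.
    log_uri = (
        request.get("configurationOverrides", {})
        .get("monitoringConfiguration", {})
        .get("s3MonitoringConfiguration", {})
        .get("logUri")
    )
    if log_uri is not None and not log_uri.startswith("s3://"):
        problems.append("logUri should be in the format of s3://...")

    return problems
```

An empty list means none of these particular conditions were detected. The service can still reject the request for the other reasons listed in the table, such as an unknown virtual cluster ID or release.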
The following errors may occur when you run the DescribeJobRun API before the job runs.
Error Message | Error Condition | Recommended Next Step |
---|---|---|
stateDetails: JobRun submission failed. Classification failureReason: VALIDATION_ERROR state: FAILED. | Parameters in StartJobRun are not valid. | See Amazon EMR on EKS releases. |
stateDetails: Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | The EKS cluster is not available. | Check if the EKS cluster exists and has the right permissions. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | Amazon EMR does not have permissions to access the EKS cluster. | Verify that permissions are set up for Amazon EMR on the registered namespace. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | The EKS cluster is not reachable. | Check if the EKS cluster exists and has the right permissions. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: JobRun submission failed due to an internal error. failureReason: INTERNAL_ERROR state: FAILED | An internal error has occurred with the EKS cluster. | N/A |
stateDetails: Cluster failureReason: USER_ERROR state: FAILED | There are insufficient resources in the EKS cluster to run the job. | Add more capacity to the EKS node group or set up EKS Autoscaler. For more information, see Cluster Autoscaler. |
The following errors may occur when you run the DescribeJobRun API after the job runs.
Error Message | Error Condition | Recommended Next Step |
---|---|---|
stateDetails: Trouble monitoring your JobRun. Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | The EKS cluster does not exist. | Check if the EKS cluster exists and has the right permissions. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Trouble monitoring your JobRun. Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | Amazon EMR does not have permissions to access the EKS cluster. | Verify that permissions are set up for Amazon EMR on the registered namespace. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Trouble monitoring your JobRun. Cluster failureReason: CLUSTER_UNAVAILABLE state: FAILED | The EKS cluster is not reachable. | Check if the EKS cluster exists and has the right permissions. For more information, see Setting up Amazon EMR on EKS. |
stateDetails: Trouble monitoring your JobRun due to an internal error failureReason: INTERNAL_ERROR state: FAILED | An internal error has occurred and is preventing JobRun monitoring. | N/A |
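
The DescribeJobRun tables can be turned into a small lookup keyed by `failureReason`. The following is a minimal sketch that assumes the response has already been fetched and is available as a dictionary with a `jobRun` entry containing `state` and `failureReason`. The names `NEXT_STEPS` and `triage_job_run` are hypothetical, and the next-step texts are paraphrased from the tables.

```python
# Next steps keyed by failureReason, paraphrased from the DescribeJobRun tables.
NEXT_STEPS = {
    "VALIDATION_ERROR": "Review the StartJobRun parameters, including the release.",
    "CLUSTER_UNAVAILABLE": (
        "Check that the EKS cluster exists, is reachable, and that Amazon EMR "
        "has permissions on the registered namespace."
    ),
    "USER_ERROR": "Add capacity to the EKS node group or set up an autoscaler.",
    "INTERNAL_ERROR": "No documented next step (N/A).",
}


def triage_job_run(response):
    """Return a one-line hint for a failed job run, or None if it has not failed."""
    job_run = response.get("jobRun", {})
    if job_run.get("state") != "FAILED":
        return None
    reason = job_run.get("failureReason", "UNKNOWN")
    # Fall back to a generic message for reasons the tables do not cover.
    step = NEXT_STEPS.get(reason, "No documented next step for this failure reason.")
    return f"{reason}: {step}"
```

Because several rows share the same `failureReason`, the hint for `CLUSTER_UNAVAILABLE` combines their recommended steps. The `stateDetails` text can help narrow down which row applies.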
The following error may occur when a job cannot start and the job waits in the SUBMITTED state for 15 minutes. This can be caused by a lack of cluster resources.
Error Message | Error Condition | Recommended Next Step |
---|---|---|
cluster timeout | The job has been in the SUBMITTED state for 15 minutes or more. | You can override the default setting of 15 minutes for this parameter with the configuration override shown below. |
Use the following configuration to change the cluster timeout setting to 30 minutes. Notice that you provide the new job-start-timeout value in seconds:
    {
      "configurationOverrides": {
        "applicationConfiguration": [{
          "classification": "emr-containers-defaults",
          "properties": {
            "job-start-timeout": "1800"
          }
        }]
      }
    }
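
Because the timeout is given in seconds and passed as a string, it can help to generate the override from a value in minutes. The following is a minimal sketch. The helper name `job_start_timeout_override` is hypothetical, and the output mirrors the structure of the configuration above.

```python
import json


def job_start_timeout_override(minutes):
    """Build the job-start-timeout override for a timeout given in whole minutes."""
    if not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    return {
        "configurationOverrides": {
            "applicationConfiguration": [
                {
                    "classification": "emr-containers-defaults",
                    # The value is expressed in seconds and passed as a string.
                    "properties": {"job-start-timeout": str(minutes * 60)},
                }
            ]
        }
    }


if __name__ == "__main__":
    # Print the override for a 30-minute timeout as JSON.
    print(json.dumps(job_start_timeout_override(30), indent=2))
```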