Amazon IoT Jobs Troubleshooting - Amazon IoT Core

This is the troubleshooting section for Amazon IoT Jobs.

How do I locate an Amazon IoT Jobs endpoint?

How do I locate the Amazon IoT Jobs control plane endpoint?

Amazon IoT Jobs supports control plane API operations over the HTTPS protocol. Verify that you are connected to the correct control plane endpoint over HTTPS.

For a list of Amazon Region-specific endpoints, see Amazon IoT Core - control plane endpoints.

For a list of FIPS-compliant Amazon IoT Jobs control plane endpoints, see FIPS Endpoints by Service.

Note

Amazon IoT Jobs and Amazon IoT Core share the same Amazon Region-specific endpoints.

How do I locate the Amazon IoT Jobs data plane endpoint?

Amazon IoT Jobs supports data plane API operations over the HTTPS and MQTT protocols. Verify that you are connected to the correct data plane endpoint for the protocol you use.

  • HTTPS protocol

    • Use the following describe-endpoint CLI command or the DescribeEndpoint REST API. For the endpoint type, use iot:Jobs.

      aws iot describe-endpoint --endpoint-type iot:Jobs
  • MQTT protocol

    • Use one of the following describe-endpoint CLI commands or the DescribeEndpoint REST API. For the endpoint type, use iot:Data-ATS (recommended) or iot:Data.

      aws iot describe-endpoint --endpoint-type iot:Data-ATS   # recommended
      aws iot describe-endpoint --endpoint-type iot:Data

For a list of FIPS-compliant Amazon IoT Jobs data plane endpoints, see FIPS Endpoints by Service.
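If you look up endpoints from code instead of the CLI, the same DescribeEndpoint operation is available in the SDKs. The following is a minimal Python sketch. It assumes you pass in a boto3 `iot` client, or any object with a compatible `describe_endpoint` method. The helper names are hypothetical.

```python
"""Look up Amazon IoT Jobs data plane endpoints (sketch)."""

# Endpoint types named in this section. iot:Data-ATS is the recommended
# MQTT endpoint type; iot:Data is the alternative.
HTTPS_JOBS_ENDPOINT_TYPE = "iot:Jobs"
MQTT_ENDPOINT_TYPE = "iot:Data-ATS"


def lookup_endpoint(iot_client, endpoint_type):
    """Return the endpoint address for one endpoint type.

    iot_client is assumed to be a boto3 "iot" client (or any object whose
    describe_endpoint method returns {"endpointAddress": ...}).
    """
    response = iot_client.describe_endpoint(endpointType=endpoint_type)
    return response["endpointAddress"]


def jobs_data_plane_endpoints(iot_client):
    """Return the HTTPS data plane URL and the MQTT host name."""
    https_host = lookup_endpoint(iot_client, HTTPS_JOBS_ENDPOINT_TYPE)
    mqtt_host = lookup_endpoint(iot_client, MQTT_ENDPOINT_TYPE)
    return {"https": "https://" + https_host, "mqtt": mqtt_host}
```

For example, you could pass `boto3.client("iot")` and then create the HTTPS data plane client with `boto3.client("iot-jobs-data", endpoint_url=endpoints["https"])`. The client is injected instead of created inside the helper so that the logic can be exercised without network access.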

How do I monitor Amazon IoT Jobs activity and provide metrics?

Monitoring Amazon IoT Jobs activity with Amazon CloudWatch provides real-time visibility into ongoing Amazon IoT Jobs operations and helps control costs with CloudWatch alarms through Amazon IoT Rules. You must configure logging before you can monitor Amazon IoT Jobs activity and set up CloudWatch alarms. For more information on setting up logging, see Configure Amazon IoT logging.

For more information on Amazon CloudWatch and how to set up permissions for an IAM user or role to use CloudWatch resources, see Identity and access management for Amazon CloudWatch.

How do I set up Amazon IoT Jobs metrics and monitoring using Amazon CloudWatch?

To set up Amazon IoT logging, follow the steps in Configure Amazon IoT logging. You can set up Amazon IoT logging in the Amazon Web Services Management Console, the Amazon CLI, or the API. Logging for specific thing groups can be set up only with the Amazon CLI or the API.
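The thing-group case can be sketched in Python as follows. This assumes a boto3 `iot` client and an existing IAM role that allows Amazon IoT to write to CloudWatch Logs. The helper name, role ARN, and log levels are illustrative assumptions. `set_v2_logging_options` sets the account-wide logging options, so skip that call if your account already has logging configured the way you want.

```python
"""Enable Amazon IoT logging for one thing group through the API (sketch)."""


def enable_thing_group_logging(iot_client, role_arn, thing_group_name,
                               log_level="DEBUG", default_log_level="ERROR"):
    """Turn on account-level logging, then override the level for a thing group.

    iot_client is assumed to be a boto3 "iot" client.
    """
    # Account-wide options: the IAM role Amazon IoT uses to write to
    # CloudWatch Logs, and the default log level for all other resources.
    iot_client.set_v2_logging_options(
        roleArn=role_arn,
        defaultLogLevel=default_log_level,
        disableAllLogs=False,
    )
    # Resource-specific override that applies only to this thing group.
    iot_client.set_v2_logging_level(
        logTarget={"targetType": "THING_GROUP", "targetName": thing_group_name},
        logLevel=log_level,
    )
```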

The Amazon IoT Jobs metrics section lists the metrics used to monitor Amazon IoT Jobs activity and explains how to view them in the Amazon Web Services Management Console and the Amazon CLI.

Additionally, you can set up CloudWatch alarms to alert you of specific metrics you want to closely monitor. For guidance on alarm setup, see Using Amazon CloudWatch alarms.
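As a minimal sketch of such an alarm, the following Python helper calls `put_metric_alarm` on a boto3 `cloudwatch` client. The namespace (`AWS/IoT`), the metric name (`FailedJobExecutionTotalCount`), and the `JobId` dimension are assumptions to confirm against the Amazon IoT Jobs metrics section. The Amazon SNS topic ARN used for notifications is hypothetical.

```python
"""Create a CloudWatch alarm on an Amazon IoT Jobs metric (sketch)."""


def alarm_on_failed_executions(cloudwatch_client, job_id, sns_topic_arn,
                               threshold=1.0):
    """Alarm when failed executions for job_id reach the threshold.

    cloudwatch_client is assumed to be a boto3 "cloudwatch" client. The
    namespace, metric name, and dimension below are assumptions.
    """
    alarm_name = f"iot-job-{job_id}-failed-executions"
    cloudwatch_client.put_metric_alarm(
        AlarmName=alarm_name,
        Namespace="AWS/IoT",                        # assumed namespace
        MetricName="FailedJobExecutionTotalCount",  # assumed metric name
        Dimensions=[{"Name": "JobId", "Value": job_id}],
        Statistic="Maximum",
        Period=300,                  # evaluate in 5-minute windows
        EvaluationPeriods=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        TreatMissingData="notBreaching",  # no data is not treated as a failure
        AlarmActions=[sns_topic_arn],     # notify this (hypothetical) SNS topic
    )
    return alarm_name
```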

Device fleets and single device troubleshooting

A job execution maintains a status of QUEUED indefinitely

When a job execution with a status of QUEUED does not move to the next logical status, such as IN_PROGRESS, FAILED, or TIMED_OUT, use the following checks to find the cause:

  • Review your device activity in the CloudWatch logs located in the CloudWatch console. For more information, refer to Monitor Amazon IoT using CloudWatch Logs.

  • The IAM role associated with the job and its job executions may be missing required permissions in the IAM policy attached to that role. Use the describe-job API to identify the IAM role linked to the job, then review that role's IAM policy for the correct permissions. After you update the policy statements, the AssumeRole API operation should succeed on the resource.
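To see which executions are stuck before you dig into logs and IAM policies, you can list a job's QUEUED executions and flag the old ones. This is a sketch that assumes a boto3 `iot` client. The one-hour threshold and the helper name are illustrative choices, not service-defined values.

```python
"""List job executions that have stayed QUEUED longer than expected (sketch)."""
from datetime import datetime, timedelta, timezone


def stale_queued_executions(iot_client, job_id, max_age=timedelta(hours=1),
                            now=None):
    """Return (thingArn, queuedAt) pairs that have been QUEUED longer than max_age.

    iot_client is assumed to be a boto3 "iot" client; queuedAt values are
    expected to be timezone-aware datetimes, as boto3 returns them.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    kwargs = {"jobId": job_id, "status": "QUEUED"}
    while True:
        page = iot_client.list_job_executions_for_job(**kwargs)
        for item in page.get("executionSummaries", []):
            queued_at = item["jobExecutionSummary"]["queuedAt"]
            if now - queued_at > max_age:
                stale.append((item["thingArn"], queued_at))
        # Follow pagination until the service stops returning a token.
        token = page.get("nextToken")
        if not token:
            return stale
        kwargs["nextToken"] = token
```

The things returned are good candidates for the CloudWatch Logs review described in the first check.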

A job execution was not created for my thing or thing group

When a job's status changes to IN_PROGRESS, the job begins rolling out the job document to all devices in your target group and creates a job execution for each target device. If a job execution was not created for one of the target devices, use the following guidance:

  • The thing is directly targeted by the job, the job has a status of IN_PROGRESS, and the job is concurrent. If all three conditions are met, the job is still sending out job executions to all devices in your target group and that specific thing has not received its job execution yet.

    • Review the devices in the job's target group and the job status in the Amazon Web Services Management Console, or use the describe-job API command.

    • Use the describe-job API command to check whether the job's isConcurrent property is true or false. For more information, see Job limits.

  • The thing is not directly targeted by the job.

    • If the thing was added to a thing group and the job targets that thing group, verify that the thing is a member of the thing group.

    • If the job is a snapshot job with a status of IN_PROGRESS and is concurrent, the job is still sending out job executions to all devices in your target group and that specific thing has not received its job execution yet.

    • If the job is a continuous job with a status of IN_PROGRESS and is concurrent, the job is still sending out job executions to all devices in your target group and that specific thing has not received its job execution yet. For continuous jobs only, you can also remove the thing from the thing group and then add it back.

    • If the job is a snapshot job with a status of IN_PROGRESS and is not concurrent, Amazon IoT Jobs has likely not yet acknowledged the thing's membership in the thing group. We recommend waiting several seconds after your AddThingToThingGroup call before you create the job. Alternatively, switch the target selection to continuous so that the service backfills the delayed thing group membership event.
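The checks above can be summarized as a small decision function. This Python sketch is hypothetical. It assumes you pass the `job` dictionary from a `describe_job` response, and that you have already determined whether the thing is listed directly in the job's targets and whether it is a member of the targeted thing group (for example, with `list_things_in_thing_group`).

```python
"""Map the missing-job-execution checks to a likely cause (sketch)."""


def diagnose_missing_execution(job, directly_targeted, thing_in_group=True):
    """Suggest why a target thing has no job execution yet.

    job is assumed to be the "job" dict from a describe_job response, with
    the fields status, isConcurrent, and targetSelection.
    """
    status = job.get("status")
    concurrent = job.get("isConcurrent", False)
    # Assumption: treat a missing targetSelection as a snapshot job.
    selection = job.get("targetSelection", "SNAPSHOT")

    if directly_targeted:
        if status == "IN_PROGRESS" and concurrent:
            return "rollout in progress: wait for the execution to be created"
        return "no matching case in this guide"
    if not thing_in_group:
        return "thing is not a member of the targeted thing group"
    if status == "IN_PROGRESS" and concurrent:
        if selection == "CONTINUOUS":
            return "rollout in progress: wait, or remove and re-add the thing to the group"
        return "rollout in progress: wait for the execution to be created"
    if selection == "SNAPSHOT" and status == "IN_PROGRESS":
        return ("membership not yet acknowledged: wait after "
                "AddThingToThingGroup or use CONTINUOUS target selection")
    return "no matching case in this guide"
```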

New job fails due to LimitExceededException error

If job creation fails with a LimitExceededException error response, call the list-jobs API and review all jobs with isConcurrent=true to determine whether you have reached your job concurrency limit. See Job limits for more information on concurrent jobs. To view your job concurrency limits and request a limit increase, see Amazon IoT Device Management jobs limits and quotas.
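A sketch of that list-jobs review in Python, assuming a boto3 `iot` client. It follows `nextToken` pagination and collects the jobs whose `isConcurrent` flag is true. Compare the resulting count with the quota on the limits page. The helper name is hypothetical.

```python
"""Find jobs that count toward the job concurrency limit (sketch)."""


def concurrent_job_ids(iot_client):
    """Return the IDs of jobs whose isConcurrent flag is true.

    iot_client is assumed to be a boto3 "iot" client.
    """
    ids = []
    kwargs = {}
    while True:
        page = iot_client.list_jobs(**kwargs)
        for summary in page.get("jobs", []):
            if summary.get("isConcurrent"):
                ids.append(summary["jobId"])
        # Keep requesting pages until no nextToken is returned.
        token = page.get("nextToken")
        if not token:
            return ids
        kwargs["nextToken"] = token
```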

Job document size limit

The job document size is limited by the MQTT payload size. If you need a job document larger than 32 kB (32,000 bytes), store the job document in Amazon S3 and specify its Amazon S3 object URL in the documentSource field when you call the CreateJob API or use the Amazon CLI. In the Amazon Web Services Management Console, enter the Amazon S3 object URL in the Amazon S3 URL text box when you create a job.
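You can do the size check before calling CreateJob. This Python sketch builds the request parameters and switches to `documentSource` when the serialized document exceeds 32,000 bytes. It assumes you have already uploaded the large document to Amazon S3. The helper name and the example URL are placeholders.

```python
"""Choose between an inline job document and an Amazon S3 documentSource (sketch)."""
import json

MAX_INLINE_DOCUMENT_BYTES = 32_000  # limit stated in this section


def create_job_params(job_id, targets, document, s3_url=None):
    """Build keyword arguments for CreateJob (for example, boto3 create_job)."""
    body = json.dumps(document)
    params = {"jobId": job_id, "targets": targets}
    # Measure the encoded size, since the limit applies to bytes, not characters.
    if len(body.encode("utf-8")) <= MAX_INLINE_DOCUMENT_BYTES:
        params["document"] = body
    elif s3_url:
        # The document must already be uploaded; s3_url is its object URL.
        params["documentSource"] = s3_url
    else:
        raise ValueError("job document exceeds 32,000 bytes; "
                         "store it in Amazon S3 and pass s3_url")
    return params
```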

Device-side MQTT message request throttle limits

If you receive error code 400 ThrottlingException, the device-side MQTT message request failed because it reached the limit on simultaneous device-side requests. See Amazon IoT Device Management jobs limits and quotas for more information on throttle limits and whether they are adjustable.
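When a device receives a ThrottlingException, a common client-side response is to retry with exponential backoff and jitter instead of retrying immediately. The Python sketch below shows a generic pattern, not part of any Amazon IoT SDK. `ThrottledError` is a hypothetical exception that your own MQTT handling code would raise when a request is rejected with that error code, and the delay values are illustrative.

```python
"""Retry a throttled device-side request with exponential backoff (sketch)."""
import random
import time


class ThrottledError(Exception):
    """Hypothetical error your client code raises on a ThrottlingException."""


def call_with_backoff(request, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call request(); on ThrottledError, wait with full jitter and retry."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttling error
            # Full jitter: a random delay up to an exponentially growing cap.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable so the retry logic can be tested without real delays.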

Connection timeout error

Error code 400 RequestExpired indicates a connection failure caused by high latency or low client-side timeout values.

Invalid API command

If you receive an error message stating that an API command is invalid, confirm that you entered the correct API command. See the Amazon IoT API Reference for a comprehensive list of all Amazon IoT API commands.

Service side connection error

Error code 503 ServiceUnavailable indicates that the error originated on the server side.