Use timeouts to avoid stuck executions
By default, the Amazon States Language doesn't specify timeouts for state machine definitions. Without an explicit timeout, Step Functions often relies solely on a response from an activity worker to know that a task is complete. If something goes wrong and the TimeoutSeconds field isn't specified for an Activity or Task state, an execution is stuck waiting for a response that will never come.
To avoid this situation, specify a reasonable timeout when you create a Task in your state machine. For example:
"ActivityState": {
  "Type": "Task",
  "Resource": "arn:aws-cn:states:us-east-1:123456789012:activity:HelloWorld",
  "TimeoutSeconds": 300,
  "Next": "NextState"
}
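When a Task state exceeds its TimeoutSeconds value, it fails with the States.Timeout error name, which a Catch field can route to another state. The following is a minimal sketch of that pattern built around the ActivityState fragment above. The HandleTimeout state, its Error and Cause strings, and the Succeed placeholder for NextState are hypothetical and only illustrate the idea.

```json
{
  "Comment": "Illustrative sketch. HandleTimeout and the NextState placeholder are hypothetical.",
  "StartAt": "ActivityState",
  "States": {
    "ActivityState": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:us-east-1:123456789012:activity:HelloWorld",
      "TimeoutSeconds": 300,
      "Catch": [
        {
          "ErrorEquals": ["States.Timeout"],
          "Next": "HandleTimeout"
        }
      ],
      "Next": "NextState"
    },
    "HandleTimeout": {
      "Comment": "Hypothetical handler that ends the execution with an explicit failure.",
      "Type": "Fail",
      "Error": "ActivityTimedOut",
      "Cause": "The activity worker did not respond within 300 seconds."
    },
    "NextState": {
      "Comment": "Placeholder for the rest of the workflow.",
      "Type": "Succeed"
    }
  }
}
```

Routing the timeout to a dedicated state is one option. A Retry field that matches States.Timeout may fit better when the work is safe to repeat.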
If you use a callback with a task token (.waitForTaskToken), we recommend that you use heartbeats and add the HeartbeatSeconds field to your Task state definition. Set HeartbeatSeconds to a value less than the task timeout. That way, if your workflow fails with a heartbeat error, you know that the task itself failed rather than that it was only taking a long time to complete.
{
  "StartAt": "Push to SQS",
  "States": {
    "Push to SQS": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::sqs:sendMessage.waitForTaskToken",
      "HeartbeatSeconds": 600,
      "Parameters": {
        "MessageBody": {
          "myTaskToken.$": "$$.Task.Token"
        },
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/push-based-queue"
      },
      "ResultPath": "$.SQS",
      "End": true
    }
  }
}
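The example above sets only HeartbeatSeconds. To show the heartbeat interval sitting below the task timeout, the following sketch adds a TimeoutSeconds field to the same state. The 3600-second timeout is an assumed value chosen for illustration and does not come from the original example.

```json
{
  "Comment": "Illustrative sketch. The TimeoutSeconds value of 3600 is an assumption.",
  "StartAt": "Push to SQS",
  "States": {
    "Push to SQS": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:::sqs:sendMessage.waitForTaskToken",
      "TimeoutSeconds": 3600,
      "HeartbeatSeconds": 600,
      "Parameters": {
        "MessageBody": {
          "myTaskToken.$": "$$.Task.Token"
        },
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/push-based-queue"
      },
      "ResultPath": "$.SQS",
      "End": true
    }
  }
}
```

With these values, the consumer of the queue would need to call the SendTaskHeartbeat API with the task token more often than every 600 seconds. Each heartbeat resets the heartbeat clock, so a worker that stops reporting should fail the state well before the 3600-second timeout elapses.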
For more information, see Task in the Amazon States Language documentation.
Note
You can set a timeout for your state machine using the TimeoutSeconds field in your Amazon States Language definition. For more information, see State machine structure.
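As a sketch of the note above, the following definition places TimeoutSeconds at the top level of the state machine as well as on the Task state. The 900-second execution limit is an assumed value for illustration. If the execution as a whole runs longer than the top-level value, it fails with the States.Timeout error name.

```json
{
  "Comment": "Illustrative sketch. The top-level TimeoutSeconds value of 900 is an assumption.",
  "StartAt": "ActivityState",
  "TimeoutSeconds": 900,
  "States": {
    "ActivityState": {
      "Type": "Task",
      "Resource": "arn:aws-cn:states:us-east-1:123456789012:activity:HelloWorld",
      "TimeoutSeconds": 300,
      "End": true
    }
  }
}
```

The two limits are independent: the state-level value bounds a single task, and the top-level value bounds the entire execution.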