Troubleshooting activities - Amazon Step Functions
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Troubleshooting activities

My state machine execution is stuck at an activity state.

An activity task state doesn't start until a worker polls for a task token by using the GetActivityTask API action. As a best practice, add a task-level timeout to avoid a stuck execution. For more information, see Use timeouts to avoid stuck executions.
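As a rough illustration, the following minimal sketch builds an Amazon States Language Task state for an activity with a task-level TimeoutSeconds field. The activity ARN, the state layout, and the helper name activity_task_state are hypothetical; substitute your own values.

```python
import json
from typing import Optional

# Hypothetical activity ARN used for illustration only.
ACTIVITY_ARN = "arn:aws:states:us-east-1:123456789012:activity:ExampleActivity"


def activity_task_state(activity_arn: str, timeout_seconds: int,
                        next_state: Optional[str] = None) -> dict:
    """Build a Task state that invokes an activity with a task-level timeout."""
    state = {
        "Type": "Task",
        "Resource": activity_arn,
        # If the task doesn't finish within this many seconds,
        # the state fails with a States.Timeout error.
        "TimeoutSeconds": timeout_seconds,
    }
    # A Task state must either end the execution or name the next state.
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


if __name__ == "__main__":
    print(json.dumps(activity_task_state(ACTIVITY_ARN, 300), indent=2))
```

Choose the timeout based on how long your activity is expected to run; the value 300 here is only a placeholder.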

If your execution is stuck at the ActivityScheduled event, this indicates that your activity worker fleet has issues or is underscaled. Monitor the ActivityScheduleTime CloudWatch metric, and set an alarm that fires when that time increases. However, to time out stuck executions in which the activity state never transitions to the ActivityStarted state, define a timeout at the state machine level. To do this, specify a top-level TimeoutSeconds field in the state machine definition, outside the States field.
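A minimal sketch of a definition with a state machine-level timeout follows. The ARN, state name, comment, and helper name state_machine_definition are hypothetical. The point it shows is placement: TimeoutSeconds sits at the top level, next to StartAt and States, rather than inside any individual state.

```python
import json
from typing import Optional

# Hypothetical activity ARN used for illustration only.
ACTIVITY_ARN = "arn:aws:states:us-east-1:123456789012:activity:ExampleActivity"


def state_machine_definition(activity_arn: str, execution_timeout_seconds: int,
                             task_timeout_seconds: Optional[int] = None) -> str:
    """Return an ASL definition (JSON text) with a state machine-level timeout."""
    task_state = {"Type": "Task", "Resource": activity_arn, "End": True}
    if task_timeout_seconds is not None:
        # Optional task-level timeout on the activity state itself.
        task_state["TimeoutSeconds"] = task_timeout_seconds
    definition = {
        "Comment": "Activity workflow with an execution-wide timeout",
        # Top-level timeout: bounds the whole execution, including time
        # spent waiting for a worker to pick up the activity task.
        "TimeoutSeconds": execution_timeout_seconds,
        "StartAt": "RunActivity",
        "States": {"RunActivity": task_state},
    }
    return json.dumps(definition, indent=2)


if __name__ == "__main__":
    print(state_machine_definition(ACTIVITY_ARN, 3600, task_timeout_seconds=300))
```

Because Python dictionaries preserve insertion order, TimeoutSeconds is emitted near the beginning of the JSON, before StartAt.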

My activity worker times out while waiting for a task token.

Workers use the GetActivityTask API action to retrieve a task, for the specified activity ARN, that a running state machine has scheduled for execution. GetActivityTask starts a long poll: the service holds the HTTP connection open and responds as soon as a task becomes available. The service holds the request for at most 60 seconds before responding. If no task becomes available within 60 seconds, the poll returns a taskToken with a null string. To avoid a client-side timeout, configure a socket timeout of at least 65 seconds in the Amazon SDK or in whichever client you use to make the API call.
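The sketch below shows one way a Python worker might apply this, assuming the AWS SDK for Python (boto3), where botocore's Config accepts a read_timeout setting. The helper names make_stepfunctions_client and poll_once, and the 65-second constant, are illustrative assumptions rather than part of any SDK. boto3 is imported lazily so the polling logic can run (and be tested) without it; poll_once accepts any callable with the get_activity_task keyword signature.

```python
from typing import Any, Callable, Dict, Optional

# The service holds a GetActivityTask long poll for at most 60 seconds,
# so the client socket should wait at least 65 seconds before giving up.
SERVICE_MAX_HOLD_SECONDS = 60
CLIENT_READ_TIMEOUT_SECONDS = 65


def make_stepfunctions_client(read_timeout: int = CLIENT_READ_TIMEOUT_SECONDS):
    """Create a boto3 Step Functions client whose read timeout outlasts the long poll."""
    if read_timeout < CLIENT_READ_TIMEOUT_SECONDS:
        raise ValueError("read_timeout should be at least 65 seconds")
    # Imported here so the rest of this sketch works without boto3 installed.
    import boto3
    from botocore.config import Config
    return boto3.client("stepfunctions", config=Config(read_timeout=read_timeout))


def poll_once(get_activity_task: Callable[..., Dict[str, Any]],
              activity_arn: str, worker_name: str) -> Optional[Dict[str, Any]]:
    """Long-poll once; return the response, or None if no task arrived."""
    response = get_activity_task(activityArn=activity_arn, workerName=worker_name)
    # A null or empty taskToken means no task became available
    # within the service's hold time; the caller should simply poll again.
    if not response.get("taskToken"):
        return None
    return response
```

With boto3 installed, a worker loop might call poll_once(client.get_activity_task, activity_arn, "worker-1") repeatedly, where client comes from make_stepfunctions_client() and activity_arn is your activity's ARN, and process each non-None response.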