Troubleshooting activities
My state machine execution is stuck at an activity state.
An activity task state doesn't start until a worker polls for a task token by using the GetActivityTask API action. As a best practice, add a task-level timeout to avoid a stuck execution. For more information, see Use timeouts to avoid stuck executions.
If your execution is stuck at the ActivityScheduled event, your activity worker fleet likely has issues or is underscaled. Monitor the ActivityScheduleTime CloudWatch metric and set an alarm that fires when that time increases. To time out stuck executions in which the Activity state never transitions from the ActivityScheduled event to the ActivityStarted event, define a timeout at the state machine level. To do this, specify a TimeoutSeconds field at the top level of the state machine definition, outside of the States field.
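The two timeout levels described above can be sketched as a state machine definition built in Python and serialized to JSON. This is a minimal illustration, not a recommended configuration: the activity ARN, the state name RunActivity, and the timeout values (3600 and 300 seconds) are assumptions chosen for the example.

```python
import json

# Hypothetical activity ARN, used only for illustration.
ACTIVITY_ARN = "arn:aws:states:us-east-1:123456789012:activity:example-activity"

definition = {
    "Comment": "Sketch: timeouts at the state machine level and at the task level",
    # State machine-level timeout: sits at the top level, outside of "States".
    # The value 3600 is an assumed example, not a recommendation.
    "TimeoutSeconds": 3600,
    "StartAt": "RunActivity",
    "States": {
        "RunActivity": {
            "Type": "Task",
            "Resource": ACTIVITY_ARN,
            # Task-level timeout for this activity state (assumed example value).
            "TimeoutSeconds": 300,
            "End": True,
        }
    },
}

# Serialize to the JSON form that a state machine definition uses.
definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

Choose values that fit your workload. The only structural point the sketch makes is where each TimeoutSeconds field sits relative to the States field.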
My activity worker times out while waiting for a task token.
Workers use the GetActivityTask API action to retrieve a task with the specified activity ARN that is scheduled for execution by a running state machine. GetActivityTask starts a long poll, so the service holds the HTTP connection open and responds as soon as a task becomes available. The maximum time the service holds the request before responding is 60 seconds. If no task is available within 60 seconds, the poll returns a taskToken with a null string. To keep the client from timing out before the service responds, configure a client-side socket timeout of at least 65 seconds in the Amazon SDK or in the client you are using to make the API call.
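As a hedged sketch of the polling behavior described above, the following Python example assumes the SDK for Python (boto3) is the client in use. The helper names create_sfn_client and poll_once, and the worker name example-worker, are hypothetical. The read timeout of 65 seconds follows the guidance above, and an empty taskToken is treated as "no task available" rather than as an error.

```python
def create_sfn_client(read_timeout_seconds=65):
    """Build a Step Functions client whose socket read timeout exceeds the 60-second long poll."""
    # Imported lazily so the polling logic below can be exercised without the SDK installed.
    import boto3
    from botocore.config import Config

    return boto3.client(
        "stepfunctions",
        config=Config(read_timeout=read_timeout_seconds),
    )


def poll_once(client, activity_arn, worker_name="example-worker"):
    """Run one long poll; return the task as a dict, or None if the poll came back empty."""
    response = client.get_activity_task(
        activityArn=activity_arn,
        workerName=worker_name,
    )
    # An empty poll carries a null or empty taskToken instead of raising an error.
    if not response.get("taskToken"):
        return None
    return {"taskToken": response["taskToken"], "input": response.get("input")}
```

A worker would typically call create_sfn_client() once and then call poll_once in a loop, polling again immediately whenever it returns None.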