Working with Amazon SQS messages - Amazon Simple Queue Service
Services or capabilities described in Amazon Web Services documentation might vary by Region. To see the differences applicable to the China Regions, see Getting Started with Amazon Web Services in China (PDF).

Working with Amazon SQS messages

The following guidelines can help you process messages efficiently using Amazon SQS.

Processing messages in a timely manner

Setting the visibility timeout depends on how long it takes your application to process and delete a message. For example, if your application requires 10 seconds to process a message and you set the visibility timeout to 15 minutes, you must wait for a relatively long time to attempt to process the message again if the previous processing attempt fails. Alternatively, if your application requires 10 seconds to process a message but you set the visibility timeout to only 2 seconds, a duplicate message is received by another consumer while the original consumer is still working on the message.

To make sure that there is sufficient time to process messages, use one of the following strategies:

  • If you know (or can reasonably estimate) how long it takes to process a message, extend the message's visibility timeout to the maximum time it takes to process and delete the message. For more information, see Configuring the Visibility Timeout.

  • If you don't know how long it takes to process a message, create a heartbeat for your consumer process: Specify the initial visibility timeout (for example, 2 minutes) and then—as long as your consumer still works on the message—keep extending the visibility timeout by 2 minutes every minute.

    Important

    The maximum visibility timeout is 12 hours from the time that Amazon SQS receives the ReceiveMessage request. Extending the visibility timeout does not reset the 12 hour maximum.

    Additionally, you may be unable to set the timeout on an individual message to the full 12 hours (e.g. 43,200 seconds) since the ReceiveMessage request initiates the timer. For example, if you receive a message and immediately set the 12 hour maximum by sending a ChangeMessageVisibility call with VisibilityTimeout equal to 43,200 seconds, it will likely fail. However, using a value of 43,195 seconds will work unless there is a significant delay between requesting the message via ReceiveMessage and updating the visibility timeout. If your consumer needs longer than 12 hours, consider using Step Functions.

Handling request errors

To handle request errors, use one of the following strategies:

  • If you use an Amazon SDK, you already have automatic retry and backoff logic at your disposal. For more information, see Error Retries and Exponential Backoff in Amazon in the Amazon Web Services General Reference.

  • If you don't use the Amazon SDK features for retry and backoff, allow a pause (for example, 200 ms) before retrying the ReceiveMessage action after receiving no messages, a timeout, or an error message from Amazon SQS. For subsequent use of ReceiveMessage that gives the same results, allow a longer pause (for example, 400 ms).

Setting up long polling

When the wait time for the ReceiveMessage API action is greater than 0, long polling is in effect. The maximum long polling wait time is 20 seconds. Long polling helps reduce the cost of using Amazon SQS by eliminating the number of empty responses (when there are no messages available for a ReceiveMessage request) and false empty responses (when messages are available but aren't included in a response). For more information, see Amazon SQS short and long polling.

For optimal message processing, use the following strategies:

  • In most cases, you can set the ReceiveMessage wait time to 20 seconds. If 20 seconds is too long for your application, set a shorter ReceiveMessage wait time (1 second minimum). If you don't use an Amazon SDK to access Amazon SQS, or if you configure an Amazon SDK to have a shorter wait time, you might have to modify your Amazon SQS client to either allow longer requests or use a shorter wait time for long polling.

  • If you implement long polling for multiple queues, use one thread for each queue instead of a single thread for all queues. Using a single thread for each queue allows your application to process the messages in each of the queues as they become available, while using a single thread for polling multiple queues might cause your application to become unable to process messages available in other queues while the application waits (up to 20 seconds) for the queue which doesn't have any available messages.

Important

To avoid HTTP errors, make sure that the HTTP response timeout for ReceiveMessage requests is longer than the WaitTimeSeconds parameter. For more information, see ReceiveMessage.

Capturing problematic messages

To capture all messages that can't be processed, and to collect accurate CloudWatch metrics, configure a dead-letter queue.

  • The redrive policy redirects messages to a dead-letter queue after the source queue fails to process a message a specified number of times.

  • Using a dead-letter queue decreases the number of messages and reduces the possibility of exposing you to poison pill messages (messages that are received but can't be processed).

  • Including a poison pill message in a queue can distort the ApproximateAgeOfOldestMessage CloudWatch metric by giving an incorrect age of the poison pill message. Configuring a dead-letter queue helps avoid false alarms when using this metric.

Setting up dead-letter queue retention

For standard queues, the expiration of a message is always based on its original enqueue timestamp. When a message is moved to a dead-letter queue, the enqueue timestamp is unchanged. The ApproximateAgeOfOldestMessage metric indicates when the message moved to the dead-letter queue, not when the message was originally sent. For example, assume that a message spends 1 day in the original queue before it's moved to a dead-letter queue. If the dead-letter queue's retention period is 4 days, the message is deleted from the dead-letter queue after 3 days and the ApproximateAgeOfOldestMessage is 3 days. Thus, it is a best practice to always set the retention period of a dead-letter queue to be longer than the retention period of the original queue.

For FIFO queues, the enqueue timestamp resets when the message is moved to a dead-letter queue. The ApproximateAgeOfOldestMessage metric indicates when the message moved to the dead-letter queue. In the same example above, the message is deleted from the dead-letter queue after 4 days and the ApproximateAgeOfOldestMessage is 4 days.

Avoiding inconsistent message processing

Because Amazon SQS is a distributed system, it is possible for a consumer to not receive a message even when Amazon SQS marks the message as delivered while returning successfully from a ReceiveMessage API method call. In this case, Amazon SQS records the message as delivered at least once, although the consumer has never received it. Because no additional attempts to deliver messages are made under these conditions, we don't recommend setting the number of maximum receives to 1 for a dead-letter queue.

Implementing request-response systems

When implementing a request-response or remote procedure call (RPC) system, keep the following best practices in mind:

  • Don't create reply queues per message. Instead, create reply queues on startup, per producer, and use a correlation ID message attribute to map replies to requests.

  • Don't let your producers share reply queues. This can cause a producer to receive response messages intended for another producer.

For more information about implementing the request-response pattern using the Temporary Queue Client, see Request-response messaging pattern (virtual queues).