Burst concurrency - Amazon Lambda

Burst concurrency

As your function receives more requests, Lambda automatically scales up the number of execution environments to handle these requests until you reach your account concurrency limit. However, there’s a limit to how fast Lambda can scale. In most cases, you don’t need to worry about this limitation, but there are a few special cases where you should take this into account:

  • When you deploy a new function and expect a large initial burst of traffic

  • When you expect existing functions to experience a sudden burst of traffic

In response to sudden bursts of traffic, Lambda might not scale up immediately to handle all incoming requests. This protects against over-scaling. Your burst concurrency quota is the maximum rate at which functions in your account can scale in response to bursts, that is, how quickly Lambda can create new execution environments. The burst concurrency quota is an account-level limit. This section describes how Lambda determines your burst concurrency quota and how scaling behaves in specific burst scenarios.

Burst concurrency rate limits

When you deploy new functions, Lambda can immediately scale those functions up to between 500 and 3,000 execution environment instances to handle an initial burst of traffic. The exact maximum depends on the Amazon Web Services Region:

Regions                                                      Burst concurrency limit
US West (Oregon), US East (N. Virginia), Europe (Ireland)    3,000
Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio)     1,000
All other Regions                                            500

If you need a burst concurrency limit increase, contact Amazon Web Services Support. Service Quotas does not currently support changes to burst limits.

Note

Your burst concurrency limit cannot exceed your account concurrency limit. For example, the initial burst concurrency limit of 3,000 in US West (Oregon), US East (N. Virginia), and Europe (Ireland) is higher than the default account concurrency limit of 1,000. So, by default, initial bursts in these three Regions can scale only up to 1,000. To take advantage of the full 3,000 units of burst concurrency available to your functions in these three Regions, request an account concurrency limit increase.
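
The note above amounts to taking the smaller of two numbers. The following is a minimal sketch of that arithmetic, not an official API. The function name `effective_initial_burst` is hypothetical, and the Region codes are an assumed mapping from the Region names in the table.

```python
# Regional burst concurrency limits from the table above.
# The Region codes are an assumed mapping from the Region names.
REGIONAL_BURST_LIMITS = {
    "us-west-2": 3000,       # US West (Oregon)
    "us-east-1": 3000,       # US East (N. Virginia)
    "eu-west-1": 3000,       # Europe (Ireland)
    "ap-northeast-1": 1000,  # Asia Pacific (Tokyo)
    "eu-central-1": 1000,    # Europe (Frankfurt)
    "us-east-2": 1000,       # US East (Ohio)
}
DEFAULT_BURST_LIMIT = 500            # all other Regions
DEFAULT_ACCOUNT_CONCURRENCY = 1000   # default account concurrency limit


def effective_initial_burst(region, account_limit=DEFAULT_ACCOUNT_CONCURRENCY):
    """Return the initial burst that is actually usable in a Region.

    The burst limit cannot exceed the account concurrency limit,
    so the usable value is the smaller of the two.
    """
    regional = REGIONAL_BURST_LIMITS.get(region, DEFAULT_BURST_LIMIT)
    return min(regional, account_limit)


print(effective_initial_burst("us-east-1"))          # default account limit
print(effective_initial_burst("us-east-1", 10000))   # after a limit increase
```

With the default account limit, the first call yields 1,000. With an account limit of 10,000, the second call yields the full 3,000.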

After the initial burst increase, Lambda continues to scale up your function based on the following rules:

  • If your function needs additional scaling, Lambda can scale up by a maximum of 500 additional execution environment instances (burst quota units) per minute, regardless of the Region.

  • Each minute, you continue to accrue 500 burst quota units. If your function doesn’t require this level of scaling, Lambda saves up any unused units in an imaginary “bucket”, up until the bucket reaches the maximum burst concurrency limit in your Region. For instance, your bucket can continue to accrue 500 units per minute until it reaches 3,000 units in US East (N. Virginia). When your function encounters future bursts, Lambda draws from your bucket to scale up your function.

In all cases, Lambda can continue to scale at the fastest rate available to you as long as you haven’t reached your account concurrency limit. If requests come in faster than your function can scale, or if your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
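
The rules above behave much like a token bucket. The following is a simplified sketch of that model, not Lambda's actual implementation. The class `BurstModel` and its methods are hypothetical names, and the model assumes that the bucket starts full and that execution environments are never shut down.

```python
class BurstModel:
    """Toy model of burst scaling: a bucket of burst quota units."""

    def __init__(self, regional_burst_limit, account_limit, refill_per_minute=500):
        self.regional_burst_limit = regional_burst_limit
        self.account_limit = account_limit
        self.refill_per_minute = refill_per_minute
        self.bucket = regional_burst_limit  # assumption: the bucket starts full
        self.provisioned = 0                # execution environments in use

    def tick(self):
        """One minute passes: accrue units, capped at the regional limit."""
        self.bucket = min(self.regional_burst_limit,
                          self.bucket + self.refill_per_minute)

    def handle(self, demand):
        """Scale toward `demand` concurrent requests.

        Returns the number of requests that would be throttled (429).
        """
        needed = max(0, demand - self.provisioned)
        headroom = max(0, self.account_limit - self.provisioned)
        granted = min(needed, self.bucket, headroom)
        self.provisioned += granted
        self.bucket -= granted
        return max(0, demand - self.provisioned)


# A Region with a burst limit of 500 and the default account limit of 1,000.
model = BurstModel(regional_burst_limit=500, account_limit=1000)
print(model.handle(800))   # only 500 units are available at first
model.tick()               # one minute later, 500 more units accrue
print(model.handle(800))   # the remaining demand can now be served
```

In this example, the first call throttles 300 requests because the bucket holds only 500 units. After one minute, the second call throttles none and leaves 200 units in the bucket. Demand above the account limit of 1,000 stays throttled no matter how full the bucket is.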

Understanding and visualizing burst concurrency

This section contains animations of a scaling example to help you understand Lambda burst concurrency behavior. In this scenario, assume that you have active Lambda functions in the US East (N. Virginia) Region, and you have an account concurrency limit of 10,000. Suppose that over the span of several minutes, your functions encounter the following traffic pattern:


Figure: How Lambda responds to a particular traffic pattern involving several spikes in traffic.

The following animations break down how Lambda scales in response to each of the three bursts. To help you read the animations, here is an explanation of each component of the graph:

Component: x-axis
Significance: Time

Component: y-axis
Significance: Concurrent requests

Component: Solid black line
Significance: The actual number of concurrent requests that your functions are handling.
Examples:

  • At 8:59, your functions are handling 0 concurrent requests.

  • At 9:00, your functions are handling 2,000 concurrent requests.

Component: Shaded blue region
Significance: The number of active execution environments that Lambda has provisioned for your functions.
Examples:

  • At 8:59, Lambda has provisioned 0 execution environments.

  • At 9:00, Lambda has provisioned 2,000 execution environments.

Component: Shaded green region
Significance: The theoretical maximum scaling capacity of your functions, given your current burst concurrency quota.
Examples:

  • At 8:59, your functions can scale to 3,000 execution environments at maximum.

  • At 9:00, your functions can still scale to just 3,000 execution environments at maximum.

Component: Orange "bucket" on the right side of the graph
Significance: The burst quota that's currently available to you.
Examples:

  • At 9:07, the end of this scenario, the maximum burst quota available to you is 1,000.

Part 1: Handling burst #1 (8:58 - 9:00)

The first animation depicts how Lambda handles an initial burst of 2,000 concurrent requests.


Figure: An animation illustrating how Lambda scales in response to burst #1.

Here's an explanation of this animation:

  • At 8:58, you have the entire burst quota of 3,000 available to you. Though your functions receive 0 requests until 9:00, Lambda can scale your functions up to 3,000 execution environments immediately if needed.

  • At 9:00, your functions suddenly experience a burst of 2,000 concurrent requests. Lambda uses 2,000 out of the 3,000 burst quota available to provision 2,000 execution environments. Lambda then handles all 2,000 incoming requests. You have a remaining burst quota of 1,000, which Lambda saves for potential use later.

Part 2: Accumulating unused burst quota (9:00 - 9:02)

The second animation depicts how Lambda accumulates burst quota units if they go unused.


Figure: An animation illustrating how Lambda accumulates unused burst quota units.

Here's an explanation of this animation:

  • From 9:00 to 9:01, actual concurrent requests stay at or below 2,000, so there's no need for Lambda to continue to scale up.

  • At 9:01, you get 500 additional burst quota units. Since you don't need to use any right now, Lambda saves all 500 units, bringing your total available burst quota to 1,500. Notice how this increases the theoretical maximum scaling capacity up to 3,500.

  • The same thing occurs from 9:01 to 9:02. At 9:02, you get an additional 500 units, bringing your total available burst quota to 2,000. This increases the theoretical maximum scaling capacity to 4,000.
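
The theoretical maximum scaling capacity (the shaded green region) follows from the numbers above: it is the number of environments already provisioned plus the burst quota currently in the bucket. The following is a small sketch of that calculation. The function name is hypothetical, and capping the result at the account concurrency limit is an assumption based on the scaling rules described earlier.

```python
def theoretical_capacity(provisioned, available_burst_quota, account_limit):
    """Environments already running plus what the bucket could still add.

    Assumption: the result never exceeds the account concurrency limit.
    """
    return min(account_limit, provisioned + available_burst_quota)


ACCOUNT_LIMIT = 10000  # account concurrency limit in this scenario

# (time, provisioned environments, available burst quota) from Parts 1 and 2
for time, provisioned, quota in [("9:00", 2000, 1000),
                                 ("9:01", 2000, 1500),
                                 ("9:02", 2000, 2000)]:
    print(time, theoretical_capacity(provisioned, quota, ACCOUNT_LIMIT))
```

The three values come out as 3,000, 3,500, and 4,000, matching the capacities stated for 9:00, 9:01, and 9:02.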

Part 3: Handling burst #2 (9:02 - 9:03)

The third animation depicts how Lambda uses your accumulated burst quota to handle a second burst.


Figure: An animation illustrating how Lambda scales in response to burst #2.

Here's an explanation of this animation:

  • Shortly after 9:02, your functions suddenly experience a burst of 2,000 additional concurrent requests, or 4,000 total. Lambda uses all 2,000 of your available burst quota units to provision 2,000 more execution environments. Lambda then handles all 4,000 incoming requests. Your remaining burst quota is 0.

  • At 9:03, you again accumulate 500 units. Your available burst quota is now 500.

Part 4: Handling burst #3 and throttling (9:03 - 9:04)

The fourth animation depicts a scenario in which your functions experience temporary throttling in response to a burst.


Figure: An animation illustrating how Lambda scales in response to burst #3. It also involves an example of temporary throttling.

Here's an explanation of this animation:

  • At 9:04, you again accumulate 500 units. Your available burst quota is 1,000.

  • Shortly after 9:04, your functions suddenly experience a burst of 1,500 additional concurrent requests, or 5,500 total. Lambda uses all 1,000 of your available quota to provision 1,000 more execution environments. Lambda can then handle 5,000 incoming requests, but 500 requests experience throttling.

Part 5: Recovery (9:04 - 9:05)

The fifth animation depicts how Lambda quickly uses new burst quota units to recover from temporary throttling.


Figure: An animation illustrating how Lambda recovers from throttling.

Here's an explanation of this animation:

  • At 9:05, you again accumulate 500 units. Your available burst quota is now 500.

  • Immediately, Lambda uses these 500 units to provision 500 execution environments. This ends throttling, since Lambda can handle all 5,500 incoming requests. Your available burst quota is 0.

Part 6: Conclusion (9:05 - 9:07)

The sixth and final animation depicts how Lambda continues to accumulate burst quota units, even if they aren't needed.


Figure: An animation illustrating how Lambda continues accumulating burst quota units.

Here's an explanation of this animation:

  • At 9:06, you again accumulate 500 units. Your available burst quota is now 500.

  • At 9:07, you again accumulate 500 units. Your available burst quota is now 1,000.
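
The whole scenario can be replayed in a few lines to check the numbers in Parts 1 through 6. The following is a rough sketch under simplifying assumptions, not a description of Lambda's internals: each burst is treated as arriving on a minute boundary right after that minute's 500 units accrue, demand holds steady between bursts, and environments are never shut down. The names `TRAFFIC` and `replay` are hypothetical.

```python
REGIONAL_BURST_LIMIT = 3000  # US East (N. Virginia)
ACCOUNT_LIMIT = 10000        # account concurrency limit in this scenario
REFILL_PER_MINUTE = 500      # burst quota units accrued each minute

# (time, concurrent requests), with demand taken from the walkthrough.
TRAFFIC = [
    ("8:58", 0), ("8:59", 0), ("9:00", 2000), ("9:01", 2000),
    ("9:02", 4000), ("9:03", 4000), ("9:04", 5500), ("9:05", 5500),
    ("9:06", 5500), ("9:07", 5500),
]


def replay(traffic):
    """Return (time, provisioned, bucket, throttled) after each minute."""
    bucket = REGIONAL_BURST_LIMIT  # the scenario starts with a full bucket
    provisioned = 0
    rows = []
    for time, demand in traffic:
        # Accrue this minute's units, capped at the regional burst limit.
        bucket = min(REGIONAL_BURST_LIMIT, bucket + REFILL_PER_MINUTE)
        # Spend units to scale toward demand, within the account limit.
        needed = max(0, demand - provisioned)
        granted = min(needed, bucket, ACCOUNT_LIMIT - provisioned)
        provisioned += granted
        bucket -= granted
        throttled = max(0, demand - provisioned)
        rows.append((time, provisioned, bucket, throttled))
    return rows


for time, provisioned, bucket, throttled in replay(TRAFFIC):
    print(f"{time}  provisioned={provisioned:>5}  "
          f"bucket={bucket:>5}  throttled={throttled:>4}")
```

Each row shows the state after any burst in that minute has been handled. The 9:04 row shows 5,000 environments, an empty bucket, and 500 throttled requests. The 9:05 row shows throttling cleared at 5,500 environments, and the 9:07 row ends with 1,000 units in the bucket.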

Note

Unused execution environments are frozen while they're waiting for requests and don't incur any charges. If they sit idle for a prolonged period of time, Lambda automatically shuts them down.