Burst concurrency
As your function receives more requests, Lambda automatically scales up the number of execution environments to handle these requests until you reach your account concurrency limit. However, there’s a limit to how fast Lambda can scale. In most cases, you don’t need to worry about this limitation, but there are a few special cases where you should take this into account:
- When you deploy a new function and expect a large initial burst of traffic
- When you expect existing functions to experience a sudden burst of traffic
In response to sudden bursts of traffic, Lambda might not scale up immediately to handle all incoming requests; this protects against over-scaling. Your burst concurrency quota is the maximum rate at which functions in your account can scale in response to bursts (that is, how quickly Lambda can create new execution environments). Burst concurrency quota is an account-level limit. This section discusses how Lambda determines your burst concurrency quota and details burst concurrency scaling behavior for specific burst scenarios.
Burst concurrency rate limits
When you deploy new functions, Lambda can immediately scale those functions up to between 500 and 3,000 execution environment instances to handle an initial burst of traffic. The exact maximum depends on the Amazon Web Services Region:
| Regions | Burst concurrency limit |
|---|---|
| US West (Oregon), US East (N. Virginia), Europe (Ireland) | 3,000 |
| Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio) | 1,000 |
| All other Regions | 500 |
If you need a burst concurrency limit increase, contact Amazon Web Services Support. Service Quotas doesn't currently support changes to burst limits.
Note
Your burst concurrency limit cannot exceed your account concurrency limit. For example, the initial burst concurrency limit of 3,000 in US West (Oregon), US East (N. Virginia), and Europe (Ireland) is higher than the default account concurrency limit of 1,000. So, by default, initial bursts in these three Regions can scale only up to 1,000. To take advantage of the full 3,000 units of burst concurrency available to your functions in these three Regions, request an account concurrency limit increase.
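The rule in this note amounts to taking the smaller of two numbers: the Regional burst limit and your account concurrency limit. The following minimal sketch assumes the Regional values from the table above and keys them by the standard Region codes (`us-east-1` and so on); the dictionary and the `effective_initial_burst` function are hypothetical helpers for illustration, not part of any AWS SDK.

```python
# Hypothetical lookup of the Regional initial burst limits listed above.
# Lambda does not expose this mapping through an API; it is hard-coded here.
REGIONAL_BURST_LIMITS = {
    "us-west-2": 3000,       # US West (Oregon)
    "us-east-1": 3000,       # US East (N. Virginia)
    "eu-west-1": 3000,       # Europe (Ireland)
    "ap-northeast-1": 1000,  # Asia Pacific (Tokyo)
    "eu-central-1": 1000,    # Europe (Frankfurt)
    "us-east-2": 1000,       # US East (Ohio)
}
DEFAULT_BURST_LIMIT = 500    # All other Regions


def effective_initial_burst(region: str, account_concurrency_limit: int) -> int:
    """Return how far an initial burst can scale: the Regional burst limit,
    capped by the account concurrency limit."""
    regional = REGIONAL_BURST_LIMITS.get(region, DEFAULT_BURST_LIMIT)
    return min(regional, account_concurrency_limit)


print(effective_initial_burst("us-east-1", 1000))   # default account limit -> 1000
print(effective_initial_burst("us-east-1", 10000))  # after an increase -> 3000
```

With the default account concurrency limit of 1,000, the first call returns 1,000 even though the Region allows 3,000; raising the account limit to 10,000 unlocks the full 3,000.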
After the initial burst increase, Lambda continues to scale up your function based on the following rules:
- If your function needs additional scaling, Lambda can scale it up by a maximum of 500 additional execution environment instances (burst quota units) per minute, regardless of the Region, once any saved burst quota is used up.
- Each minute, you continue to accrue 500 burst quota units. If your function doesn't need them, Lambda saves the unused units in an imaginary "bucket" until the bucket reaches the maximum burst concurrency limit for your Region. For example, in US East (N. Virginia) your bucket keeps accruing 500 units per minute until it holds 3,000 units. When your function encounters later bursts, Lambda draws from your bucket to scale up your function.
In all cases, Lambda can continue to scale at the fastest rate available to you as long as you haven’t reached your account concurrency limit. If requests come in faster than your function can scale, or if your function is at maximum concurrency, additional requests fail with a throttling error (429 status code).
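The bucket described in these rules behaves like a token bucket. The following sketch is a simplified model under stated assumptions, not Lambda's implementation: units accrue in whole-minute steps of 500, the bucket is capped at the Regional burst limit, and the account concurrency limit is not modeled. The `BurstBucket` class and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BurstBucket:
    """Hypothetical model of the burst quota "bucket".

    Assumptions: units accrue in whole-minute steps, and the bucket never
    holds more than the Regional burst concurrency limit.
    """

    capacity: int                 # Regional burst limit, e.g. 3,000 in US East (N. Virginia)
    refill_per_minute: int = 500  # units accrued each minute, regardless of Region
    tokens: Optional[int] = None  # current units; None means "start full"

    def __post_init__(self) -> None:
        if self.tokens is None:
            self.tokens = self.capacity

    def refill(self) -> None:
        """Accrue one minute's worth of units, capped at the bucket's capacity."""
        self.tokens = min(self.capacity, self.tokens + self.refill_per_minute)

    def scale_up(self, environments_needed: int) -> int:
        """Spend units to create new environments; return how many were created.

        Any shortfall corresponds to requests that would be throttled (429).
        """
        created = min(environments_needed, self.tokens)
        self.tokens -= created
        return created


bucket = BurstBucket(capacity=3000)
print(bucket.scale_up(2000), bucket.tokens)  # 2000 1000
bucket.refill()
print(bucket.tokens)  # 1500
```

The two calls at the end mirror the start of the example that follows: a 2,000-request burst leaves 1,000 units, and one minute later the bucket holds 1,500.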
Understanding and visualizing burst concurrency
This section contains animations of a scaling example to help you understand Lambda burst concurrency behavior. In this scenario, assume that you have active Lambda functions in the US East (N. Virginia) Region and an account concurrency limit of 10,000. Suppose that over the span of several minutes, your functions encounter a traffic pattern with three bursts: a jump to 2,000 concurrent requests at 9:00, a jump to 4,000 shortly after 9:02, and a jump to 5,500 shortly after 9:04.

In the following animations, we'll break down how Lambda scales in response to each of the three bursts. To follow the animations, here's what each component of the graph represents:
| Component | Significance |
|---|---|
| x-axis | Time |
| y-axis | Concurrent requests |
| Solid black line | The actual number of concurrent requests that your functions are handling. |
| Shaded blue region | The number of active execution environments that Lambda has provisioned for your functions. |
| Shaded green region | The theoretical maximum scaling capacity of your functions, given your current burst concurrency quota. |
| Orange "bucket" on the right side of the graph | The burst quota that's currently available to you. |
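The relationship between the shaded regions can be written as a short formula. This is a reading inferred from the example numbers, not a formula AWS publishes. Let $E(t)$ be the provisioned execution environments (blue), $B(t)$ the available burst quota (orange bucket), $B_{\text{region}}$ the Regional burst limit, and $L_{\text{account}}$ the account concurrency limit. The green region is then $C_{\max}(t)$:

```latex
\begin{aligned}
C_{\max}(t) &= \min\bigl(E(t) + B(t),\; L_{\text{account}}\bigr) \\
B_{\text{next minute}} &= \min\bigl(B_{\text{region}},\; B_{\text{now}} + 500\bigr) \\
C_{\max}(\text{9:01}) &= \min(2000 + 1500,\; 10000) = 3500
\end{aligned}
```

The last line checks the formula against Part 2 below, where 2,000 provisioned environments plus 1,500 saved units give a theoretical maximum of 3,500.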
Part 1: Handling burst #1 (8:58 - 9:00)
The first animation depicts how Lambda handles an initial burst of 2,000 concurrent requests.

Here's an explanation of this animation:
- At 8:58, you have the entire burst quota of 3,000 available to you. Although your functions receive no requests until 9:00, Lambda could immediately scale them up to 3,000 execution environments if needed.
- At 9:00, your functions suddenly experience a burst of 2,000 concurrent requests. Lambda uses 2,000 out of the 3,000 burst quota available to provision 2,000 execution environments. Lambda then handles all 2,000 incoming requests. You have a remaining burst quota of 1,000, which Lambda saves for potential use later.
Part 2: Accumulating unused burst quota (9:00 - 9:02)
The second animation depicts how Lambda accumulates burst quota units if they go unused.

Here's an explanation of this animation:
- From 9:00 to 9:01, actual concurrent requests stay under 2,000, so there's no need for Lambda to continue to scale up.
- At 9:01, you get 500 additional burst quota units. Since you don't need to use any right now, Lambda saves all 500 units, bringing your total available burst quota to 1,500. Notice how this increases the theoretical maximum scaling capacity up to 3,500.
- The same thing occurs from 9:01 to 9:02. At 9:02, you get an additional 500 units, bringing your total available burst quota to 2,000. This increases the theoretical maximum scaling capacity to 4,000.
Part 3: Handling burst #2 (9:02 - 9:03)
The third animation depicts how Lambda uses your accumulated burst quota to handle a second burst.

Here's an explanation of this animation:
- Shortly after 9:02, your functions suddenly experience a burst of 2,000 additional concurrent requests, or 4,000 total. Lambda uses all 2,000 of your available burst quota units to provision 2,000 more execution environments. Lambda then handles all 4,000 incoming requests. Your remaining burst quota is 0.
- At 9:03, you again accumulate 500 units.
Part 4: Handling burst #3 and throttling (9:03 - 9:04)
The fourth animation depicts a scenario in which your functions experience temporary throttling in response to a burst.

Here's an explanation of this animation:
- At 9:04, you again accumulate 500 units. Your available burst quota is 1,000.
- Shortly after 9:04, your functions suddenly experience a burst of 1,500 additional concurrent requests, or 5,500 total. Lambda uses all 1,000 of your available quota to provision 1,000 more execution environments. Lambda can then handle 5,000 of the incoming requests, but the remaining 500 are throttled.
Part 5: Recovery (9:04 - 9:05)
The fifth animation depicts how Lambda quickly uses new burst quota units to recover from temporary throttling.

Here's an explanation of this animation:
- At 9:05, you again accumulate 500 units. Your available burst quota is now 500.
- Immediately, Lambda uses these 500 units to provision 500 execution environments. This ends throttling, since Lambda can handle all 5,500 incoming requests. Your available burst quota is 0.
Part 6: Conclusion (9:05 - 9:07)
The sixth and final animation depicts how Lambda continues to accumulate burst quota units, even if they aren't needed.

Here's an explanation of this animation:
- At 9:06, you again accumulate 500 units. Your available burst quota is now 500.
- At 9:07, you again accumulate 500 units. Your available burst quota is now 1,000.
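To tie the six parts together, the following Python sketch replays the example minute by minute using the scaling rules described earlier. It is a simplified model, not Lambda's actual scheduler. It assumes that units accrue exactly at each minute boundary before any scaling, that provisioned environments are not reclaimed during the replay, and that demand between the three bursts stays at the previous peak. The constant names and the `replay` function are hypothetical.

```python
# Hypothetical replay of the US East (N. Virginia) example: 3,000-unit
# Regional burst limit, 10,000 account concurrency limit, 500 units per minute.
REGIONAL_BURST_LIMIT = 3000
ACCOUNT_LIMIT = 10000
REFILL_PER_MINUTE = 500

# Concurrent requests per minute. The burst values come from the example;
# the values in between are assumptions (demand that needs no new scaling).
DEMAND = [
    ("8:58", 0),
    ("8:59", 0),
    ("9:00", 2000),  # burst #1
    ("9:01", 2000),
    ("9:02", 4000),  # burst #2
    ("9:03", 4000),
    ("9:04", 5500),  # burst #3
    ("9:05", 5500),
    ("9:06", 5500),
    ("9:07", 5500),
]


def replay(demand):
    """Return (minute, environments, bucket, throttled) for each step."""
    bucket = REGIONAL_BURST_LIMIT  # the bucket starts full
    environments = 0               # assumed never reclaimed in this replay
    history = []
    for minute, requests in demand:
        # Accrue this minute's units, never exceeding the Regional limit.
        bucket = min(REGIONAL_BURST_LIMIT, bucket + REFILL_PER_MINUTE)
        # Scale only as far as both the bucket and the account limit allow.
        shortfall = max(0, requests - environments)
        granted = min(shortfall, bucket, ACCOUNT_LIMIT - environments)
        bucket -= granted
        environments += granted
        # Requests beyond the provisioned environments are throttled (429).
        throttled = max(0, requests - environments)
        history.append((minute, environments, bucket, throttled))
    return history


for minute, envs, left, throttled in replay(DEMAND):
    print(f"{minute}: environments={envs}, bucket={left}, throttled={throttled}")
```

Each printed row matches the walkthrough: 1,500 units saved at 9:01, a jump to 4,000 environments after 9:02 that empties the bucket, 500 throttled requests after 9:04, recovery at 9:05, and a bucket of 1,000 units again at 9:07.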
Note
Unused execution environments are frozen while they're waiting for requests and don't incur any charges. If they sit idle for a prolonged period of time, Lambda automatically shuts them down.