

# Understanding Lambda function scaling

**Concurrency** is the number of in-flight requests that your Amazon Lambda function is handling at the same time. For each concurrent request, Lambda provisions a separate instance of your execution environment. As your functions receive more requests, Lambda automatically handles scaling the number of execution environments until you reach your account's concurrency limit. By default, Lambda provides your account with a total concurrency limit of 1,000 concurrent executions across all functions in an Amazon Web Services Region. To support your specific account needs, you can [request a quota increase](https://aws.amazon.com/premiumsupport/knowledge-center/lambda-concurrency-limit-increase/) and configure function-level concurrency controls so that your critical functions don't experience throttling.

This topic explains concurrency concepts and function scaling in Lambda. By the end of this topic, you'll understand how to calculate concurrency, how the two main concurrency control options (reserved and provisioned) work, how to estimate appropriate concurrency control settings, and how to view metrics for further optimization.

**Topics**
+ [Understanding and visualizing concurrency](#understanding-concurrency)
+ [Calculating concurrency for a function](#calculating-concurrency)
+ [Understanding reserved concurrency and provisioned concurrency](#reserved-and-provisioned)
+ [Understanding concurrency and requests per second](#concurrency-vs-requests-per-second)
+ [Concurrency quotas](#concurrency-quotas)
+ [Configuring reserved concurrency for a function](configuration-concurrency.md)
+ [Configuring provisioned concurrency for a function](provisioned-concurrency.md)
+ [Lambda scaling behavior](scaling-behavior.md)
+ [Monitoring concurrency](monitoring-concurrency.md)

## Understanding and visualizing concurrency


Lambda invokes your function in a secure and isolated [execution environment](lambda-runtime-environment.md). To handle a request, Lambda must first initialize an execution environment (the [Init phase](lambda-runtime-environment.md#runtimes-lifecycle-ib)), before using it to invoke your function (the [Invoke phase](lambda-runtime-environment.md#runtimes-lifecycle-invoke)):

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-1-environment.png)


**Note**  
Actual Init and Invoke durations can vary depending on many factors, such as the runtime you choose and the Lambda function code. The previous diagram isn't meant to represent the exact proportions of Init and Invoke phase durations.

The previous diagram uses a rectangle to represent a single execution environment. When your function receives its very first request (represented by the yellow circle with label `1`), Lambda creates a new execution environment and runs the code outside your main handler during the Init phase. Then, Lambda runs your function's main handler code during the Invoke phase. During this entire process, this execution environment is busy and cannot process other requests.

When Lambda finishes processing the first request, this execution environment can then process additional requests for the same function. For subsequent requests, Lambda doesn't need to re-initialize the environment.

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-2-two-requests.png)


In the previous diagram, Lambda reuses the execution environment to handle the second request (represented by the yellow circle with label `2`).

So far, we've focused on just a single instance of your execution environment (that is, a concurrency of 1). In practice, Lambda may need to provision multiple execution environment instances in parallel to handle all incoming requests. When your function receives a new request, one of two things can happen:
+ If a pre-initialized execution environment instance is available, Lambda uses it to process the request.
+ Otherwise, Lambda creates a new execution environment instance to process the request.

For example, let's explore what happens when your function receives 10 requests:

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-3-ten-requests.png)


In the previous diagram, each horizontal plane represents a single execution environment instance (labeled from `A` through `F`). Here's how Lambda handles each request:


| Request | Lambda behavior | Reasoning | 
| --- | --- | --- | 
|  1  |  Provisions new environment **A**  |  This is the first request; no execution environment instances are available.  | 
|  2  |  Provisions new environment **B**  |  Existing execution environment instance **A** is busy.  | 
|  3  |  Provisions new environment **C**  |  Existing execution environment instances **A** and **B** are both busy.  | 
|  4  |  Provisions new environment **D**  |  Existing execution environment instances **A**, **B**, and **C** are all busy.  | 
|  5  |  Provisions new environment **E**  |  Existing execution environment instances **A**, **B**, **C**, and **D** are all busy.  | 
|  6  |  Reuses environment **A**  |  Execution environment instance **A** has finished processing request **1** and is now available.  | 
|  7  |  Reuses environment **B**  |  Execution environment instance **B** has finished processing request **2** and is now available.  | 
|  8  |  Reuses environment **C**  |  Execution environment instance **C** has finished processing request **3** and is now available.  | 
|  9  |  Provisions new environment **F**  |  Existing execution environment instances **A**, **B**, **C**, **D**, and **E** are all busy.  | 
|  10  |  Reuses environment **D**  |  Execution environment instance **D** has finished processing request **4** and is now available.  | 

As your function receives more concurrent requests, Lambda scales up the number of execution environment instances in response. The following animation tracks the number of concurrent requests over time:

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-4-animation.gif)


By freezing the previous animation at six distinct points in time, we get the following diagram:

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-5-animation-summary.png)


In the previous diagram, we can draw a vertical line at any point in time and count the number of environments that intersect this line. This gives us the number of concurrent requests at that point in time. For example, at time `t1`, there are three active environments serving three concurrent requests. The maximum number of concurrent requests in this simulation occurs at time `t4`, when there are six active environments serving six concurrent requests.
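This vertical-line counting can be sketched in code. The following Python snippet is purely illustrative (it isn't part of any Lambda API): it models each request as a start time and a duration, then counts how many requests a vertical line at time `t` would intersect.

```python
# Each simulated request is (start_time, duration), both in seconds.
requests = [
    (0.0, 3.0),  # request 1
    (1.0, 3.0),  # request 2
    (2.0, 3.0),  # request 3
]

def concurrency_at(t, requests):
    """Number of requests in flight at time t (the vertical-line count)."""
    return sum(1 for start, duration in requests if start <= t < start + duration)

print(concurrency_at(2.5, requests))  # all three environments are busy at t = 2.5
print(concurrency_at(0.5, requests))  # only request 1 is in flight at t = 0.5
```

The maximum of `concurrency_at` over all points in time is the peak concurrency of the simulation, which is the quantity Lambda must provision environments for.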

To summarize, your function's concurrency is the number of concurrent requests that it's handling at the same time. In response to an increase in your function's concurrency, Lambda provisions more execution environment instances to meet request demand.

## Calculating concurrency for a function


In general, the concurrency of a system is its ability to process more than one task simultaneously. In Lambda, concurrency is the number of in-flight requests that your function is handling at the same time. A quick and practical way of measuring the concurrency of a Lambda function is to use the following formula:

```
Concurrency = (average requests per second) * (average request duration in seconds)
```

**Concurrency differs from requests per second.** For example, suppose your function receives 100 requests per second on average. If the average request duration is one second, then it's true that the concurrency is also 100:

```
Concurrency = (100 requests/second) * (1 second/request) = 100
```

However, if the average request duration is 500 ms, then the concurrency is 50:

```
Concurrency = (100 requests/second) * (0.5 second/request) = 50
```

What does a concurrency of 50 mean in practice? If the average request duration is 500 ms, then you can think of an instance of your function as being able to handle two requests per second. Then, it takes 50 instances of your function to handle a load of 100 requests per second. A concurrency of 50 means that Lambda must provision 50 execution environment instances to efficiently handle this workload without any throttling. Here's how to express this in equation form:

```
Concurrency = (100 requests/second) / (2 requests/second) = 50
```

If your function receives double the number of requests (200 requests per second), but only requires half the time to process each request (250 ms), then the concurrency is still 50:

```
Concurrency = (200 requests/second) * (0.25 second/request) = 50
```
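The formula is straightforward to encode. This short Python sketch (an illustration, not a Lambda API) reproduces the three scenarios above:

```python
def concurrency(requests_per_second, avg_duration_seconds):
    """Concurrency = (average requests per second) * (average request duration)."""
    return requests_per_second * avg_duration_seconds

print(concurrency(100, 1.0))   # 100.0 -- one-second requests
print(concurrency(100, 0.5))   # 50.0  -- halving the duration halves the concurrency
print(concurrency(200, 0.25))  # 50.0  -- double the traffic, half the duration
```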

### Test your understanding of concurrency


Suppose you have a function that takes, on average, 200 ms to run. During peak load, you observe 5,000 requests per second. What is the concurrency of your function during peak load? 

#### Answer


The average function duration is 200 ms, or 0.2 seconds. Using the concurrency formula, you can plug in the numbers to get a concurrency of 1,000:

```
Concurrency = (5,000 requests/second) * (0.2 seconds/request) = 1,000
```

Alternatively, an average function duration of 200 ms means that your function can process 5 requests per second. To handle the 5,000 request per second workload, you need 1,000 execution environment instances. Thus, the concurrency is 1,000:

```
Concurrency = (5,000 requests/second) / (5 requests/second) = 1,000
```

## Understanding reserved concurrency and provisioned concurrency


By default, your account has a concurrency limit of 1,000 concurrent executions across all functions in a Region. Your functions share this pool of 1,000 units of concurrency on an on-demand basis. Your functions experience throttling (that is, they start to drop requests) if you run out of available concurrency.

Some of your functions might be more critical than others. As a result, you might want to configure concurrency settings to ensure that critical functions get the concurrency that they need. There are two types of concurrency controls available: reserved concurrency and provisioned concurrency.
+ Use **reserved concurrency** to set both the maximum and minimum number of concurrent instances to reserve a portion of your account's concurrency for a function. This is useful if you don't want other functions taking up all the available unreserved concurrency. When a function has reserved concurrency, no other function can use that concurrency. 
+ Use **provisioned concurrency** to pre-initialize a number of environment instances for a function. This is useful for reducing cold start latencies.

### Reserved concurrency


If you want to guarantee that a certain amount of concurrency is available for your function at any time, use reserved concurrency.

Reserved concurrency sets the maximum and minimum number of concurrent instances that you want to allocate to your function. When you dedicate reserved concurrency to a function, no other function can use that concurrency. In other words, setting reserved concurrency can impact the concurrency pool that's available to other functions. Functions that don't have reserved concurrency share the remaining pool of unreserved concurrency.

Configuring reserved concurrency counts towards your overall account concurrency limit. There is no charge for configuring reserved concurrency for a function.

To better understand reserved concurrency, consider the following diagram:

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-6-reserved-concurrency.png)


In this diagram, your account concurrency limit for all the functions in this Region is at the default limit of 1,000. Suppose you have two critical functions, `function-blue` and `function-orange`, that routinely expect to get high invocation volumes. You decide to give 400 units of reserved concurrency to `function-blue`, and 400 units of reserved concurrency to `function-orange`. In this example, all other functions in your account must share the remaining 200 units of unreserved concurrency.

The diagram has five points of interest:
+ At `t1`, both `function-orange` and `function-blue` begin receiving requests. Each function begins to use up its allocated portion of reserved concurrency units.
+ At `t2`, `function-orange` and `function-blue` steadily receive more requests. At the same time, you deploy some other Lambda functions, which begin receiving requests. You don't allocate reserved concurrency to these other functions. They begin using the remaining 200 units of unreserved concurrency.
+ At `t3`, `function-orange` hits the max concurrency of 400. Although there is unused concurrency elsewhere in your account, `function-orange` cannot access it. The red line indicates that `function-orange` is experiencing throttling, and Lambda may drop requests.
+ At `t4`, `function-orange` starts to receive fewer requests and is no longer throttling. However, your other functions experience a spike in traffic and begin throttling. Although there is unused concurrency elsewhere in your account, these other functions cannot access it. The red line indicates that your other functions are experiencing throttling.
+ At `t5`, other functions start to receive fewer requests and are no longer throttling.

From this example, notice that reserving concurrency has the following effects:
+ **Your function can scale independently of other functions in your account.** All of your account's functions in the same Region that don't have reserved concurrency share the pool of unreserved concurrency. Without reserved concurrency, other functions can potentially use up all of your available concurrency. This prevents critical functions from scaling up if needed.
+ **Your function can't scale out of control.** Reserved concurrency acts as both a lower and upper bound: it reserves the specified capacity exclusively for your function while also preventing the function from scaling beyond that limit. This means that your function can't use concurrency reserved for other functions, or concurrency from the unreserved pool. You can reserve concurrency to prevent your function from using all the available concurrency in your account, or from overloading downstream resources.
+ **You may not be able to use all of your account's available concurrency.** Reserving concurrency counts towards your account concurrency limit, but this also means that other functions cannot use that chunk of reserved concurrency. If your function doesn't use up all of the concurrency that you reserve for it, you're effectively wasting that concurrency. This isn't an issue unless other functions in your account could benefit from the wasted concurrency.

To learn how to manage reserved concurrency settings for your functions, see [Configuring reserved concurrency for a function](configuration-concurrency.md).

### Provisioned concurrency


You use reserved concurrency to define the maximum number of execution environments reserved for a Lambda function. However, none of these environments come pre-initialized. As a result, your function invocations may take longer because Lambda must first initialize the new environment before being able to use it to invoke your function. When Lambda has to initialize a new environment in order to carry out an invocation, this is known as a [cold start](lambda-runtime-environment.md#cold-start-latency). To mitigate cold starts, you can use provisioned concurrency.

Provisioned concurrency is the number of pre-initialized execution environments that you want to allocate to your function. If you set provisioned concurrency on a function, Lambda initializes that number of execution environments so that they are prepared to respond immediately to function requests.

**Note**  
Using provisioned concurrency incurs additional charges to your account. If you're working with the Java 11 or Java 17 runtimes, you can also use Lambda SnapStart to mitigate cold start issues at no additional cost. SnapStart uses cached snapshots of your execution environment to significantly improve startup performance. You cannot use both SnapStart and provisioned concurrency on the same function version. For more information about SnapStart features, limitations, and supported Regions, see [Improving startup performance with Lambda SnapStart](snapstart.md).

When using provisioned concurrency, Lambda still recycles execution environments in the background. For example, this can occur [after an invocation failure](lambda-runtime-environment.md#runtimes-lifecycle-invoke-with-errors). However, at any given time, Lambda always ensures that the number of pre-initialized environments is equal to the value of your function's provisioned concurrency setting. Importantly, even if you're using provisioned concurrency, you can still experience a cold start delay if Lambda has to reset the execution environment.

In contrast, when using reserved concurrency, Lambda may completely terminate an environment after a period of inactivity. The following diagram illustrates this by comparing the lifecycle of a single execution environment when you configure your function using reserved concurrency compared to provisioned concurrency.

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-7-reserved-vs-provisioned.png)


The diagram has four points of interest:


| Time | Reserved concurrency | Provisioned concurrency | 
| --- | --- | --- | 
|  t1  |  Nothing happens.  |  Lambda pre-initializes an execution environment instance.  | 
|  t2  |  Request 1 comes in. Lambda must initialize a new execution environment instance.  |  Request 1 comes in. Lambda uses the pre-initialized environment instance.  | 
|  t3  |  After some inactivity, Lambda terminates the active environment instance.  |  Nothing happens.  | 
|  t4  |  Request 2 comes in. Lambda must initialize a new execution environment instance.  |  Request 2 comes in. Lambda uses the pre-initialized environment instance.  | 

To better understand provisioned concurrency, consider the following diagram:

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-8-provisioned-concurrency.png)


In this diagram, you have an account concurrency limit of 1,000. You decide to give 400 units of provisioned concurrency to `function-orange`. All functions in your account, *including* `function-orange`, can use the remaining 600 units of unreserved concurrency.

The diagram has five points of interest:
+ At `t1`, `function-orange` begins receiving requests. Since Lambda has pre-initialized 400 execution environment instances, `function-orange` is ready for immediate invocation.
+ At `t2`, `function-orange` reaches 400 concurrent requests. As a result, `function-orange` runs out of provisioned concurrency. However, since there's still unreserved concurrency available, Lambda can use this to handle additional requests to `function-orange` (there's no throttling). Lambda must create new instances to serve these requests, and your function may experience cold start latencies.
+ At `t3`, `function-orange` returns to 400 concurrent requests after a brief spike in traffic. Lambda is again able to handle all requests without cold start latencies.
+ At `t4`, functions in your account experience a burst in traffic. This burst can come from `function-orange` or any other function in your account. Lambda uses unreserved concurrency to handle these requests.
+ At `t5`, functions in your account reach the maximum concurrency limit of 1,000, and experience throttling.

The previous example considered only provisioned concurrency. In practice, you can set both provisioned concurrency and reserved concurrency on a function. You might do this if you have a function that handles a consistent load of invocations on weekdays, but routinely sees spikes of traffic on weekends. In this case, you could use provisioned concurrency to set a baseline number of environments to handle requests during weekdays, and use reserved concurrency to handle the weekend spikes. Consider the following diagram:

![](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-9-reserved-and-provisioned.png)


In this diagram, suppose that you configure 200 units of provisioned concurrency and 400 units of reserved concurrency for `function-orange`. Because you configured reserved concurrency, `function-orange` cannot use any of the 600 units of unreserved concurrency.

This diagram has five points of interest:
+ At `t1`, `function-orange` begins receiving requests. Since Lambda has pre-initialized 200 execution environment instances, `function-orange` is ready for immediate invocation.
+ At `t2`, `function-orange` uses up all its provisioned concurrency. `function-orange` can continue serving requests using reserved concurrency, but these requests may experience cold start latencies.
+ At `t3`, `function-orange` reaches 400 concurrent requests. As a result, `function-orange` uses up all its reserved concurrency. Since `function-orange` cannot use unreserved concurrency, requests begin to throttle.
+ At `t4`, `function-orange` starts to receive fewer requests, and no longer throttles.
+ At `t5`, `function-orange` drops down to 200 concurrent requests, so all requests are again able to use provisioned concurrency (that is, no cold start latencies).

Both reserved concurrency and provisioned concurrency count towards your account concurrency limit and [Regional quotas](gettingstarted-limits.md). In other words, allocating reserved and provisioned concurrency can impact the concurrency pool that's available to other functions. Configuring provisioned concurrency incurs charges to your Amazon Web Services account.

**Note**  
If the amount of provisioned concurrency on a function's versions and aliases adds up to the function's reserved concurrency, then all invocations run on provisioned concurrency. This configuration also has the effect of throttling the unpublished version of the function (`$LATEST`), which prevents it from executing. You can't allocate more provisioned concurrency than reserved concurrency for a function.

To manage provisioned concurrency settings for your functions, see [Configuring provisioned concurrency for a function](provisioned-concurrency.md). To automate provisioned concurrency scaling based on a schedule or application utilization, see [Using Application Auto Scaling to automate provisioned concurrency management](provisioned-concurrency.md#managing-provisioned-concurency).

### How Lambda allocates provisioned concurrency


Provisioned concurrency doesn't come online immediately after you configure it. Lambda starts allocating provisioned concurrency after a minute or two of preparation. For each function, Lambda can provision up to 6,000 execution environments every minute, regardless of Amazon Web Services Region. This is exactly the same as the [concurrency scaling rate](scaling-behavior.md#scaling-rate) for functions.

When you submit a request to allocate provisioned concurrency, you can't access any of those environments until Lambda completely finishes allocating them. For example, if you request 5,000 provisioned concurrency, none of your requests can use provisioned concurrency until Lambda completely finishes allocating the 5,000 execution environments.
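As a rough back-of-the-envelope sketch (illustrative only; the one-to-two-minute preparation delay and the 6,000-per-minute rate come from the text above), you can estimate how long allocation takes once it begins:

```python
import math

ENVIRONMENTS_PER_MINUTE = 6_000  # per-function allocation rate described above

def allocation_minutes(requested):
    """Rough number of minutes Lambda needs to allocate the requested
    provisioned concurrency, excluding the initial preparation delay."""
    return math.ceil(requested / ENVIRONMENTS_PER_MINUTE)

print(allocation_minutes(5_000))   # 1 -- fits within a single minute of allocation
print(allocation_minutes(20_000))  # 4
```

Remember that none of the environments are usable until the entire allocation completes, so the estimate above is the earliest point at which provisioned concurrency becomes available.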

### Comparing reserved concurrency and provisioned concurrency


The following table summarizes and compares reserved and provisioned concurrency.


| Topic | Reserved concurrency | Provisioned concurrency | 
| --- | --- | --- | 
|  Definition  |  Maximum number of execution environment instances for your function.  |  Set number of pre-provisioned execution environment instances for your function.  | 
|  Provisioning behavior  |  Lambda provisions new instances on an on-demand basis.  |  Lambda pre-provisions instances (that is, before your function starts receiving requests).  | 
|  Cold start behavior  |  Cold start latency possible, since Lambda must create new instances on-demand.  |  Cold start latency not possible, since Lambda doesn't have to create instances on-demand.  | 
|  Throttling behavior  |  Function throttled when reserved concurrency limit reached.  |  If reserved concurrency not set: function uses unreserved concurrency when provisioned concurrency limit reached. If reserved concurrency set: function throttled when reserved concurrency limit reached.  | 
|  Default behavior if not set  |  Function uses unreserved concurrency available in your account.  |  Lambda doesn't pre-provision any instances. Instead, if reserved concurrency not set: function uses unreserved concurrency available in your account. If reserved concurrency set: function uses reserved concurrency.  | 
|  Pricing  |  No additional charge.  |  Incurs additional charges.  | 

## Understanding concurrency and requests per second


As mentioned in the previous section, concurrency differs from requests per second. This is an especially important distinction to make when working with functions that have an average request duration of less than 100 ms.

Across all functions in your account, Lambda enforces a requests per second limit that's equal to 10 times your account concurrency. For example, since the default account concurrency limit is 1,000, functions in your account can handle a maximum of 10,000 requests per second.

For example, consider a function with an average request duration of 50 ms. At 20,000 requests per second, here's the concurrency of this function:

```
Concurrency = (20,000 requests/second) * (0.05 second/request) = 1,000
```

Based on this result, you might expect that the account concurrency limit of 1,000 is sufficient to handle this load. However, because of the 10,000 requests per second limit, your function can only handle 10,000 requests per second out of the 20,000 total requests. This function experiences throttling.

The lesson is that you must consider both concurrency and requests per second when configuring concurrency settings for your functions. In this case, you need to request an account concurrency limit increase to 2,000, since this would increase your total requests per second limit to 20,000.
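Both constraints can be checked together. The following Python sketch (illustrative, not a Lambda API) computes the minimum account concurrency limit that satisfies both the concurrency formula and the 10x requests-per-second rule:

```python
import math

def required_account_concurrency(requests_per_second, avg_duration_seconds):
    """Minimum account concurrency limit satisfying both constraints:
    - concurrency = rps * average duration
    - account rps limit = 10 * account concurrency, so concurrency >= rps / 10
    """
    by_concurrency_formula = requests_per_second * avg_duration_seconds
    by_rps_limit = requests_per_second / 10
    return math.ceil(max(by_concurrency_formula, by_rps_limit))

print(required_account_concurrency(20_000, 0.05))  # 2000 -- the rps limit dominates
print(required_account_concurrency(100, 1.0))      # 100  -- the formula dominates
```

For short-duration functions, the `rps / 10` term tends to dominate, which is why sub-100 ms workloads deserve special attention.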

**Note**  
Based on this requests per second limit, it's incorrect to say that each Lambda execution environment can handle only a maximum of 10 requests per second. Instead of observing the load on any individual execution environment, Lambda only considers overall concurrency and overall requests per second when calculating your quotas.

### Test your understanding of concurrency (sub-100 ms functions)


Suppose that you have a function that takes, on average, 20 ms to run. During peak load, you observe 30,000 requests per second. What is the concurrency of your function during peak load?

#### Answer


The average function duration is 20 ms, or 0.02 seconds. Using the concurrency formula, you can plug in the numbers to get a concurrency of 600:

```
Concurrency = (30,000 requests/second) * (0.02 seconds/request) = 600
```

By default, the account concurrency limit of 1,000 seems sufficient to handle this load. However, the requests per second limit of 10,000 isn't enough to handle the incoming 30,000 requests per second. To fully accommodate the 30,000 requests, you need to request an account concurrency limit increase to 3,000 or higher.

The requests per second limit applies to all quotas in Lambda that involve concurrency. In other words, it applies to synchronous on-demand functions, functions that use provisioned concurrency, and [concurrency scaling behavior](scaling-behavior.md). For example, here are a few scenarios where you must carefully consider both your concurrency and requests per second limits:
+ A function using on-demand concurrency can experience a burst increase of 500 concurrency every 10 seconds, or by 5,000 requests per second every 10 seconds, whichever happens first.
+ Suppose you have a function that has a provisioned concurrency allocation of 10. This function spills over into on-demand concurrency after 10 concurrency or 100 requests per second, whichever happens first.
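The second scenario above can be sketched as a simple check. This illustrative Python snippet (not a Lambda API) tests whether traffic exceeds a provisioned concurrency allocation on either axis, using the 10x requests-per-second rule from this section:

```python
def spills_to_on_demand(provisioned, concurrency, requests_per_second):
    """True if traffic exceeds the provisioned allocation on either axis:
    the concurrency itself, or the matching 10x requests-per-second limit."""
    return concurrency > provisioned or requests_per_second > 10 * provisioned

# With a provisioned concurrency allocation of 10 (the second scenario above):
print(spills_to_on_demand(10, concurrency=8, requests_per_second=120))  # True -- the rps limit is hit first
print(spills_to_on_demand(10, concurrency=8, requests_per_second=80))   # False -- within both limits
```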

## Concurrency quotas


Lambda sets quotas for the total amount of concurrency that you can use across all functions in a Region. These quotas exist on two levels:
+ **At the account level**, your functions can have up to 1,000 units of concurrency by default. To increase this limit, see [Requesting a quota increase](https://docs.amazonaws.cn/servicequotas/latest/userguide/request-quota-increase.html) in the *Service Quotas User Guide*.
+ **At the function level**, you can reserve up to 900 units of concurrency across all your functions by default. Regardless of your total account concurrency limit, Lambda always reserves 100 units of concurrency for your functions that don't explicitly reserve concurrency. For example, if you increased your account concurrency limit to 2,000, then you can reserve up to 1,900 units of concurrency at the function level.
+ At both the account level and the function level, Lambda also enforces a requests per second limit equal to 10 times the corresponding concurrency quota. This applies to account-level concurrency, functions using on-demand concurrency, functions using provisioned concurrency, and [concurrency scaling behavior](scaling-behavior.md). For more information, see [Understanding concurrency and requests per second](#concurrency-vs-requests-per-second).

To check your current account level concurrency quota, use the Amazon Command Line Interface (Amazon CLI) to run the following command:

```
aws lambda get-account-settings
```

You should see output that looks like the following:

```
{
    "AccountLimit": {
        "TotalCodeSize": 80530636800,
        "CodeSizeUnzipped": 262144000,
        "CodeSizeZipped": 52428800,
        "ConcurrentExecutions": 1000,
        "UnreservedConcurrentExecutions": 900
    },
    "AccountUsage": {
        "TotalCodeSize": 410759889,
        "FunctionCount": 8
    }
}
```

`ConcurrentExecutions` is your total account-level concurrency quota. `UnreservedConcurrentExecutions` is the portion of that quota that isn't allocated as reserved concurrency; this is the pool that functions without reserved concurrency share.
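Using the sample output above, you can work out how much additional reserved concurrency you could still allocate. This is an illustrative Python sketch, assuming the 100-unit unreserved floor described earlier in this section (Lambda always keeps 100 units of concurrency unreserved):

```python
# Values taken from the sample get-account-settings output above.
account_limit = {"ConcurrentExecutions": 1000, "UnreservedConcurrentExecutions": 900}

# Assumption from this section: Lambda always keeps 100 units unreserved,
# so the reservable amount is the unreserved pool minus that floor.
ALWAYS_UNRESERVED = 100

still_reservable = account_limit["UnreservedConcurrentExecutions"] - ALWAYS_UNRESERVED
print(still_reservable)  # 800
```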

As your function receives more requests, Lambda automatically scales up the number of execution environments to handle these requests until your account reaches its concurrency quota. However, to protect against over-scaling in response to sudden bursts of traffic, Lambda limits how fast your functions can scale. This **concurrency scaling rate** is the maximum rate at which functions in your account can scale in response to increased requests (that is, how quickly Lambda can create new execution environments). The concurrency scaling rate differs from the account-level concurrency limit, which is the total amount of concurrency available to your functions.

**In each Amazon Web Services Region, and for each function, your concurrency scaling rate is 1,000 execution environment instances every 10 seconds (or 10,000 requests per second every 10 seconds).** In other words, every 10 seconds, Lambda can allocate at most 1,000 additional execution environment instances, or accommodate 10,000 additional requests per second, to each of your functions.
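The scaling rate above implies a rough bound on how long a single function needs to reach a target number of environments. This illustrative Python sketch (not a Lambda API) assumes each batch of 1,000 environments consumes a full 10-second interval, so it's a coarse estimate rather than Lambda's exact behavior:

```python
import math

SCALE_STEP = 1_000     # new execution environments per scaling interval
INTERVAL_SECONDS = 10  # per function, per Region, as described above

def seconds_to_scale(current_environments, target_environments):
    """Rough estimate of the time Lambda needs to scale a single function
    from its current environment count to the target count."""
    shortfall = max(0, target_environments - current_environments)
    return math.ceil(shortfall / SCALE_STEP) * INTERVAL_SECONDS

print(seconds_to_scale(0, 3_000))    # 30 -- three batches of 1,000
print(seconds_to_scale(500, 1_200))  # 10 -- one batch covers the shortfall
```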

Usually, you don't need to worry about this limitation. Lambda's scaling rate is sufficient for most use cases.

Importantly, the concurrency scaling rate is a function-level limit. This means that each function in your account can scale independently of other functions.

For more information about scaling behavior, see [Lambda scaling behavior](scaling-behavior.md).

# Configuring reserved concurrency for a function

In Lambda, [concurrency](lambda-concurrency.md) is the number of in-flight requests that your function is currently handling. There are two types of concurrency controls available:
+ Reserved concurrency – This sets both the maximum and minimum number of concurrent instances allocated to your function. When a function has reserved concurrency, no other function can use that concurrency. Reserved concurrency is useful for ensuring that your most critical functions always have enough concurrency to handle incoming requests. You can also use reserved concurrency to limit a function's concurrency so that it doesn't overwhelm downstream resources, such as database connections. Reserved concurrency acts as both a lower and upper bound: it reserves the specified capacity exclusively for your function while also preventing the function from scaling beyond that limit. Configuring reserved concurrency for a function incurs no additional charges.
+ Provisioned concurrency – This is the number of pre-initialized execution environments allocated to your function. These execution environments are ready to respond immediately to incoming function requests. Provisioned concurrency is useful for reducing cold start latencies, and is designed to make functions available with double-digit millisecond response times. Interactive workloads, such as web and mobile applications where users initiate requests, are the most sensitive to latency and generally benefit the most from this feature. Asynchronous workloads, such as data processing pipelines, are often less latency sensitive and usually don't need provisioned concurrency. Configuring provisioned concurrency incurs additional charges to your Amazon Web Services account.

This topic details how to manage and configure reserved concurrency. For a conceptual overview of these two types of concurrency controls, see [Reserved concurrency and provisioned concurrency](https://docs.amazonaws.cn/lambda/latest/dg/lambda-concurrency.html#reserved-and-provisioned). For information on configuring provisioned concurrency, see [Configuring provisioned concurrency for a function](provisioned-concurrency.md).

**Note**  
Lambda functions linked to an Amazon MQ event source mapping have a default maximum concurrency. For Apache ActiveMQ, the maximum number of concurrent instances is 5. For RabbitMQ, the maximum number of concurrent instances is 1. Setting reserved or provisioned concurrency for your function doesn't change these limits. To request an increase in the default maximum concurrency when using Amazon MQ, contact Amazon Web Services Support.

**Topics**
+ [

## Configuring reserved concurrency
](#configuring-concurrency-reserved)
+ [

## Accurately estimating required reserved concurrency for a function
](#estimating-reserved-concurrency)

## Configuring reserved concurrency


You can configure reserved concurrency settings for a function using the Lambda console or the Lambda API.

**To reserve concurrency for a function (console)**

1. Open the [Functions page](https://console.amazonaws.cn/lambda/home#/functions) of the Lambda console.

1. Choose the function you want to reserve concurrency for.

1. Choose **Configuration** and then choose **Concurrency**.

1. Under **Concurrency**, choose **Edit**. 

1. Choose **Reserve concurrency**. Enter the amount of concurrency to reserve for the function.

1. Choose **Save**.

You can reserve up to the **Unreserved account concurrency** value minus 100. The remaining 100 units of concurrency are for functions that aren't using reserved concurrency. For example, if your account has a concurrency limit of 1,000, you cannot reserve all 1,000 units of concurrency to a single function.

![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-reserve-over-limit.png)


Reserving concurrency for a function impacts the concurrency pool that's available to other functions. For example, if you reserve 100 units of concurrency for `function-a`, other functions in your account must share the remaining 900 units of concurrency, even if `function-a` doesn't use all 100 reserved concurrency units.
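The "unreserved pool minus 100" rule can be sketched as a small calculation. This is an illustrative helper (not a Lambda API); the console and API enforce the limit for you:

```python
ACCOUNT_CONCURRENCY_LIMIT = 1_000   # default account quota (can be raised)
MINIMUM_UNRESERVED_POOL = 100       # Lambda always keeps 100 units unreserved

def max_reservable(current_reservations: list[int]) -> int:
    """Upper bound on the reserved concurrency you can assign to one more
    function: the remaining unreserved concurrency minus the 100 units
    kept for functions without reserved concurrency."""
    unreserved = ACCOUNT_CONCURRENCY_LIMIT - sum(current_reservations)
    return unreserved - MINIMUM_UNRESERVED_POOL

print(max_reservable([]))     # → 900 (no reservations yet)
print(max_reservable([100]))  # → 800 (100 units already reserved elsewhere)
```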

To intentionally throttle a function, set its reserved concurrency to 0. This stops your function from processing any events until you remove the limit.

To configure reserved concurrency with the Lambda API, use the following API operations.
+ [PutFunctionConcurrency](https://docs.amazonaws.cn/lambda/latest/api/API_PutFunctionConcurrency.html)
+ [GetFunctionConcurrency](https://docs.amazonaws.cn/lambda/latest/api/API_GetFunctionConcurrency.html)
+ [DeleteFunctionConcurrency](https://docs.amazonaws.cn/lambda/latest/api/API_DeleteFunctionConcurrency.html)

For example, to configure reserved concurrency with the Amazon Command Line Interface (CLI), use the `put-function-concurrency` command. The following command reserves 100 concurrency units for a function named `my-function`:

```
aws lambda put-function-concurrency --function-name my-function \
    --reserved-concurrent-executions 100
```

You should see output that looks like the following:

```
{
    "ReservedConcurrentExecutions": 100
}
```

## Accurately estimating required reserved concurrency for a function


If your function is currently serving traffic, you can easily view its concurrency metrics using [CloudWatch metrics](https://docs.amazonaws.cn/AmazonCloudWatch/latest/monitoring/working_with_metrics.html). Specifically, the `ConcurrentExecutions` metric shows you the number of concurrent invocations for each function in your account.

![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-concurrent-executions-metrics.png)


The previous graph suggests that this function serves an average of 5 to 10 concurrent requests at any given time, and peaks at 20 requests on a typical day. Suppose that there are many other functions in your account. **If this function is critical to your application and you don't want to drop any requests**, use a number greater than or equal to 20 as your reserved concurrency setting.

Alternatively, recall that you can also [calculate concurrency](https://docs.amazonaws.cn/lambda/latest/dg/lambda-concurrency.html#calculating-concurrency) using the following formula:

```
Concurrency = (average requests per second) * (average request duration in seconds)
```

Multiplying average requests per second by the average request duration in seconds gives you a rough estimate of how much concurrency you need to reserve. You can estimate average requests per second using the `Invocations` metric, and the average request duration in seconds using the `Duration` metric. See [Using CloudWatch metrics with Lambda](monitoring-metrics.md) for more details.
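The formula above translates directly into code. This is a minimal sketch; the inputs would come from your CloudWatch metrics:

```python
def estimated_concurrency(avg_requests_per_second: float,
                          avg_duration_seconds: float) -> float:
    """Concurrency = (average requests per second) * (average request
    duration in seconds). Estimate the first input from the Invocations
    metric and the second from the Duration metric."""
    return avg_requests_per_second * avg_duration_seconds

# 50 requests per second at an average duration of 200 ms
print(estimated_concurrency(50, 0.2))  # → 10.0
```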

You should also be familiar with your upstream and downstream throughput constraints. While Lambda functions scale seamlessly with load, upstream and downstream dependencies may not have the same throughput capabilities. If you need to limit how high your function can scale, configure reserved concurrency on your function.

# Configuring provisioned concurrency for a function
Configuring provisioned concurrency

In Lambda, [concurrency](lambda-concurrency.md) is the number of in-flight requests that your function is currently handling. There are two types of concurrency controls available:
+ Reserved concurrency – This sets both the maximum and minimum number of concurrent instances allocated to your function. When a function has reserved concurrency, no other function can use that concurrency. Reserved concurrency is useful for ensuring that your most critical functions always have enough concurrency to handle incoming requests. You can also use reserved concurrency to limit a function's concurrency so that it doesn't overwhelm downstream resources, such as database connections. Reserved concurrency acts as both a lower and upper bound: it reserves the specified capacity exclusively for your function while also preventing the function from scaling beyond that limit. Configuring reserved concurrency for a function incurs no additional charges.
+ Provisioned concurrency – This is the number of pre-initialized execution environments allocated to your function. These execution environments are ready to respond immediately to incoming function requests. Provisioned concurrency is useful for reducing cold start latencies, and is designed to make functions available with double-digit millisecond response times. Interactive workloads, such as web and mobile applications where users initiate requests, are the most sensitive to latency and generally benefit the most from this feature. Asynchronous workloads, such as data processing pipelines, are often less latency sensitive and usually don't need provisioned concurrency. Configuring provisioned concurrency incurs additional charges to your Amazon Web Services account.

This topic details how to manage and configure provisioned concurrency. For a conceptual overview of these two types of concurrency controls, see [Reserved concurrency and provisioned concurrency](https://docs.amazonaws.cn/lambda/latest/dg/lambda-concurrency.html#reserved-and-provisioned). For more information on configuring reserved concurrency, see [Configuring reserved concurrency for a function](configuration-concurrency.md).

**Note**  
Lambda functions linked to an Amazon MQ event source mapping have a default maximum concurrency. For Apache ActiveMQ, the maximum number of concurrent instances is 5. For RabbitMQ, the maximum number of concurrent instances is 1. Setting reserved or provisioned concurrency for your function doesn't change these limits. To request an increase in the default maximum concurrency when using Amazon MQ, contact Amazon Web Services Support.

**Topics**
+ [

## Configuring provisioned concurrency
](#configuring-provisioned-concurrency)
+ [

## Accurately estimating required provisioned concurrency for a function
](#estimating-provisioned-concurrency)
+ [

## Optimizing function code when using provisioned concurrency
](#optimizing-latency)
+ [

## Using environment variables to view and control provisioned concurrency behavior
](#pc-environment-variables)
+ [

## Understanding logging and billing behavior with provisioned concurrency
](#pc-logging-behavior)
+ [

## Using Application Auto Scaling to automate provisioned concurrency management
](#managing-provisioned-concurency)

## Configuring provisioned concurrency


You can configure provisioned concurrency settings for a function using the Lambda console or the Lambda API.

**To allocate provisioned concurrency for a function (console)**

1. Open the [Functions page](https://console.amazonaws.cn/lambda/home#/functions) of the Lambda console.

1. Choose the function you want to allocate provisioned concurrency for.

1. Choose **Configuration** and then choose **Concurrency**.

1. Under **Provisioned concurrency configurations**, choose **Add configuration**.

1. Choose the qualifier type, and alias or version.
**Note**  
You cannot use provisioned concurrency with the $LATEST version of any function.  
If your function has an event source, make sure that event source points to the correct function alias or version. Otherwise, your function won't use provisioned concurrency environments.

1. Enter a number under **Provisioned concurrency**.

1. Choose **Save**.

You can configure up to the **Unreserved account concurrency** in your account, minus 100. The remaining 100 units of concurrency are for functions that aren't using reserved concurrency. For example, if your account has a concurrency limit of 1,000, and you haven't assigned any reserved or provisioned concurrency to any of your other functions, you can configure a maximum of 900 provisioned concurrency units for a single function.

![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/provisioned-concurrency-over-limit.png)


Configuring provisioned concurrency for a function has an impact on the concurrency pool available to other functions. For instance, if you configure 100 units of provisioned concurrency for `function-a`, other functions in your account must share the remaining 900 units of concurrency. This is true even if `function-a` doesn't use all 100 units.

It's possible to allocate both reserved concurrency and provisioned concurrency for the same function. In such cases, the provisioned concurrency cannot exceed the reserved concurrency.

This limitation extends to function versions. The maximum provisioned concurrency you can assign to a specific function version is the function's reserved concurrency minus the provisioned concurrency on other function versions.
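The version-level cap described above can be sketched as a calculation. This is an illustrative helper, not a Lambda API; the service enforces this limit when you call `PutProvisionedConcurrencyConfig`:

```python
def max_provisioned_for_version(reserved_concurrency: int,
                                other_versions_provisioned: list[int]) -> int:
    """Maximum provisioned concurrency assignable to one function version:
    the function's reserved concurrency minus the provisioned concurrency
    already allocated to its other versions."""
    return reserved_concurrency - sum(other_versions_provisioned)

# Reserved concurrency of 100, with 60 units already provisioned on another version
print(max_provisioned_for_version(100, [60]))  # → 40
```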

To configure provisioned concurrency with the Lambda API, use the following API operations.
+ [PutProvisionedConcurrencyConfig](https://docs.amazonaws.cn/lambda/latest/api/API_PutProvisionedConcurrencyConfig.html)
+ [GetProvisionedConcurrencyConfig](https://docs.amazonaws.cn/lambda/latest/api/API_GetProvisionedConcurrencyConfig.html)
+ [ListProvisionedConcurrencyConfigs](https://docs.amazonaws.cn/lambda/latest/api/API_ListProvisionedConcurrencyConfigs.html)
+ [DeleteProvisionedConcurrencyConfig](https://docs.amazonaws.cn/lambda/latest/api/API_DeleteProvisionedConcurrencyConfig.html)

For example, to configure provisioned concurrency with the Amazon Command Line Interface (CLI), use the `put-provisioned-concurrency-config` command. The following command allocates 100 units of provisioned concurrency for the `BLUE` alias of a function named `my-function`:

```
aws lambda put-provisioned-concurrency-config --function-name my-function \
  --qualifier BLUE \
  --provisioned-concurrent-executions 100
```

You should see output that looks like the following:

```
{
  "RequestedProvisionedConcurrentExecutions": 100,
  "AllocatedProvisionedConcurrentExecutions": 0,
  "Status": "IN_PROGRESS",
  "LastModified": "2023-01-21T11:30:00+0000"
}
```

## Accurately estimating required provisioned concurrency for a function


You can view any active function's concurrency metrics using [CloudWatch metrics](https://docs.amazonaws.cn/AmazonCloudWatch/latest/monitoring/working_with_metrics.html). Specifically, the `ConcurrentExecutions` metric shows you the number of concurrent invocations for functions in your account.

![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-concurrent-executions-metrics.png)


The previous graph suggests that this function serves an average of 5 to 10 concurrent requests at any given time, and peaks at 20 requests. Suppose that there are many other functions in your account. **If this function is critical to your application and you need a low-latency response on every invocation**, configure at least 20 units of provisioned concurrency.

Recall that you can also [calculate concurrency](https://docs.amazonaws.cn/lambda/latest/dg/lambda-concurrency.html#calculating-concurrency) using the following formula:

```
Concurrency = (average requests per second) * (average request duration in seconds)
```

To estimate how much concurrency you need, multiply average requests per second by the average request duration in seconds. You can estimate average requests per second using the `Invocations` metric, and the average request duration in seconds using the `Duration` metric.

When configuring provisioned concurrency, Lambda suggests adding a 10% buffer on top of the amount of concurrency your function typically needs. For example, if your function usually peaks at 200 concurrent requests, set the provisioned concurrency to 220 (200 concurrent requests + 10% = 220 provisioned concurrency).

## Optimizing function code when using provisioned concurrency


If you're using provisioned concurrency, consider restructuring your function code to optimize for low latency. For functions using provisioned concurrency, Lambda runs any initialization code, such as loading libraries and instantiating clients, during allocation time. Therefore, it's advisable to move as much initialization as possible outside of the main function handler to avoid impacting latency during actual function invocations. In contrast, if you initialize libraries or instantiate clients within your main handler code, your function must run that code on every invocation (this occurs regardless of whether you're using provisioned concurrency).

For on-demand invocations, Lambda may need to rerun your initialization code every time your function experiences a cold start. For such functions, you may choose to defer initialization of a specific capability until your function needs it. For example, consider the following control flow for a Lambda handler:

```
def handler(event, context):
    ...
    if some_condition:
        # Initialize CLIENT_A to perform a task
        ...
    else:
        # Do nothing
        pass
```

In the previous example, instead of initializing `CLIENT_A` outside of the main handler, the developer initialized it within the `if` statement. By doing this, Lambda runs this code only if `some_condition` is met. If you initialize `CLIENT_A` outside the main handler, Lambda runs that code on every cold start. This can increase overall latency.
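A fuller, runnable sketch of this lazy-initialization pattern follows. The names (`get_client_a`, `client_a`) are hypothetical, and `object()` stands in for an expensive client constructor such as an SDK client:

```python
# Module-level cache; the client is created only on first use.
client_a = None

def get_client_a():
    """Create and cache the expensive client on first use, so execution
    environments that never need it don't pay the initialization cost."""
    global client_a
    if client_a is None:
        client_a = object()  # placeholder for an expensive constructor
    return client_a

def handler(event, context):
    if event.get("needs_client_a"):
        client = get_client_a()  # initialized at most once per environment
        # ... use client to perform the task ...
    return {"statusCode": 200}
```

Subsequent warm invocations that need the client reuse the cached instance instead of re-initializing it.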

You can measure cold starts as Lambda scales up by adding X-Ray monitoring to your function. A function using provisioned concurrency does not exhibit cold start behavior, since the execution environment is prepared ahead of invocation. However, provisioned concurrency must be applied to a [specific version or alias](https://docs.aws.amazon.com/lambda/latest/dg/configuration-versions.html) of a function, not the $LATEST version. In cases where you continue to see cold start behavior, ensure that you are invoking the version or alias with provisioned concurrency configured.

## Using environment variables to view and control provisioned concurrency behavior


It's possible for your function to use up all of its provisioned concurrency. Lambda uses on-demand instances to handle any excess traffic. To determine the type of initialization Lambda used for a specific environment, check the value of the `AWS_LAMBDA_INITIALIZATION_TYPE` environment variable. This variable has two possible values: `provisioned-concurrency` or `on-demand`. The value of `AWS_LAMBDA_INITIALIZATION_TYPE` is immutable and remains constant throughout the lifetime of the environment. To check the value of an environment variable in your function code, see [Retrieving Lambda environment variables](configuration-envvars.md#retrieve-environment-variables).
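For example, you might log the initialization type from your function code. A minimal sketch; the default value makes the helper usable outside Lambda, where the variable is unset:

```python
import os

def initialization_type() -> str:
    """Return how this execution environment was initialized. In Lambda,
    AWS_LAMBDA_INITIALIZATION_TYPE is "provisioned-concurrency" or
    "on-demand"; outside Lambda it is unset, so default to "on-demand"
    for local testing."""
    return os.environ.get("AWS_LAMBDA_INITIALIZATION_TYPE", "on-demand")
```

Calling `initialization_type()` inside your handler and logging the result lets you see what share of your traffic spills over onto on-demand environments.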

If you're using the .NET 8 runtime, you can configure the `AWS_LAMBDA_DOTNET_PREJIT` environment variable to improve the latency for functions, even if they don't use provisioned concurrency. The .NET runtime employs lazy compilation and initialization for each library that your code calls for the first time. As a result, the first invocation of a Lambda function may take longer than subsequent ones. To mitigate this, you can choose one of three values for `AWS_LAMBDA_DOTNET_PREJIT`:
+ `ProvisionedConcurrency`: Lambda performs ahead-of-time JIT compilation for all environments using provisioned concurrency. This is the default value.
+ `Always`: Lambda performs ahead-of-time JIT compilation for every environment, even if the function doesn't use provisioned concurrency.
+ `Never`: Lambda disables ahead-of-time JIT compilation for all environments.

## Understanding logging and billing behavior with provisioned concurrency


For provisioned concurrency environments, your function's initialization code runs during allocation, and periodically as Lambda recycles instances of your environment. Lambda bills you for initialization even if the environment instance never processes a request. Provisioned concurrency runs continually and incurs separate billing from initialization and invocation costs. For more details, see [Amazon Lambda Pricing](https://aws.amazon.com/lambda/pricing/).

When you configure a Lambda function with provisioned concurrency, Lambda pre-initializes that execution environment so that it's available ahead of invocation requests. Lambda logs the [Init Duration field](lambda-runtime-environment.md#runtimes-lifecycle-ib) of the function in a [platform-initReport](telemetry-schema-reference.md#platform-initReport) log event in JSON logging format every time the environment is initialized. To see this log event, configure your [JSON log level](monitoring-cloudwatchlogs-logformat.md) to at least `INFO`. You can also use the [Telemetry API](telemetry-api-reference.md) to consume platform events where the Init Duration field is reported.

## Using Application Auto Scaling to automate provisioned concurrency management


You can use Application Auto Scaling to manage provisioned concurrency on a schedule or based on utilization. If your function receives predictable traffic patterns, use scheduled scaling. If you want your function to maintain a specific utilization percentage, use a target tracking scaling policy.

**Note**  
If you use Application Auto Scaling to manage your function's provisioned concurrency, ensure that you [configure an initial provisioned concurrency value](#configuring-provisioned-concurrency) first. If your function doesn't have an initial provisioned concurrency value, Application Auto Scaling may not handle function scaling properly.

### Scheduled scaling


With Application Auto Scaling, you can set your own scaling schedule according to predictable load changes. For more information and examples, see [Scheduled scaling for Application Auto Scaling](https://docs.amazonaws.cn/autoscaling/application/userguide/application-auto-scaling-scheduled-scaling.html) in the Application Auto Scaling User Guide, and [Scheduling Amazon Lambda Provisioned Concurrency for recurring peak usage](https://amazonaws-china.com/blogs/compute/scheduling-aws-lambda-provisioned-concurrency-for-recurring-peak-usage/) on the Amazon Compute Blog.

### Target tracking


With target tracking, Application Auto Scaling creates and manages a set of CloudWatch alarms based on how you define your scaling policy. When these alarms activate, Application Auto Scaling automatically adjusts the amount of environments allocated using provisioned concurrency. Use target tracking for applications that don't have predictable traffic patterns.

To scale provisioned concurrency using target tracking, use the `RegisterScalableTarget` and `PutScalingPolicy` Application Auto Scaling API operations. For example, if you're using the Amazon Command Line Interface (CLI), follow these steps:

1. Register a function's alias as a scaling target. The following example registers the BLUE alias of a function named `my-function`:

   ```
   aws application-autoscaling register-scalable-target --service-namespace lambda \
       --resource-id function:my-function:BLUE --min-capacity 1 --max-capacity 100 \
       --scalable-dimension lambda:function:ProvisionedConcurrency
   ```

1. Apply a scaling policy to the target. The following example configures Application Auto Scaling to adjust the provisioned concurrency configuration for an alias to keep utilization near 70 percent. You can apply any target value between 10 and 90 percent.

   ```
   aws application-autoscaling put-scaling-policy \
       --service-namespace lambda \
       --scalable-dimension lambda:function:ProvisionedConcurrency \
       --resource-id function:my-function:BLUE \
       --policy-name my-policy \
       --policy-type TargetTrackingScaling \
       --target-tracking-scaling-policy-configuration '{ "TargetValue": 0.7, "PredefinedMetricSpecification": { "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization" }}'
   ```

You should see output that looks like this:

```
{
    "PolicyARN": "arn:aws:autoscaling:us-east-2:123456789012:scalingPolicy:12266dbb-1524-xmpl-a64e-9a0a34b996fa:resource/lambda/function:my-function:BLUE:policyName/my-policy",
    "Alarms": [
        {
            "AlarmName": "TargetTracking-function:my-function:BLUE-AlarmHigh-aed0e274-xmpl-40fe-8cba-2e78f000c0a7",
            "AlarmARN": "arn:aws:cloudwatch:us-east-2:123456789012:alarm:TargetTracking-function:my-function:BLUE-AlarmHigh-aed0e274-xmpl-40fe-8cba-2e78f000c0a7"
        },
        {
            "AlarmName": "TargetTracking-function:my-function:BLUE-AlarmLow-7e1a928e-xmpl-4d2b-8c01-782321bc6f66",
            "AlarmARN": "arn:aws:cloudwatch:us-east-2:123456789012:alarm:TargetTracking-function:my-function:BLUE-AlarmLow-7e1a928e-xmpl-4d2b-8c01-782321bc6f66"
        }
    ]
}
```

Application Auto Scaling creates two alarms in CloudWatch. The first alarm triggers when the utilization of provisioned concurrency consistently exceeds 70%. When this happens, Application Auto Scaling allocates more provisioned concurrency to reduce utilization. The second alarm triggers when utilization is consistently less than 63% (90 percent of the 70% target). When this happens, Application Auto Scaling reduces the alias's provisioned concurrency.
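The relationship between the target and the two alarm thresholds can be sketched as follows. This is an illustrative calculation of the documented behavior, not the Application Auto Scaling implementation:

```python
def alarm_thresholds(target_utilization: float) -> tuple[float, float]:
    """The high alarm fires when utilization consistently exceeds the
    target; the low alarm fires when it stays below 90 percent of the
    target. For a 0.7 target, the low threshold is 0.63."""
    high = target_utilization
    low = round(target_utilization * 0.9, 4)
    return high, low

print(alarm_thresholds(0.7))  # → (0.7, 0.63)
```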

**Note**  
Lambda emits the `ProvisionedConcurrencyUtilization` metric only when your function is active and receiving requests. During periods of inactivity, no metrics are emitted, and your auto-scaling alarms will enter the `INSUFFICIENT_DATA` state. As a result, Application Auto Scaling won't be able to adjust your function's provisioned concurrency. This may lead to unexpected billing.

In the following example, a function scales between a minimum and maximum amount of provisioned concurrency based on utilization.

![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/features-scaling-provisioned-auto.png)


**Legend**
+ ![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/features-scaling-provisioned.instances.png) Function instances
+ ![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/features-scaling-provisioned.open.png) Open requests
+ ![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/features-scaling-provisioned.provisioned.png) Provisioned concurrency
+ ![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/features-scaling-provisioned.standard.png) Standard concurrency

When the number of open requests increases, Application Auto Scaling increases provisioned concurrency in large steps until it reaches the configured maximum. After this, the function can continue to scale on standard, unreserved concurrency if you haven't reached your account concurrency limit. When utilization drops and stays low, Application Auto Scaling decreases provisioned concurrency in smaller periodic steps.

Both of the Application Auto Scaling alarms use the **Average** statistic by default. Functions that experience quick bursts of traffic may not trigger these alarms. For example, suppose your Lambda function executes quickly (in 20 to 100 ms) and your traffic comes in quick bursts. In this case, the number of requests exceeds the allocated provisioned concurrency during the burst. However, Application Auto Scaling requires the burst load to be sustained for at least 3 minutes in order to provision additional environments. Additionally, both CloudWatch alarms require 3 data points that hit the target average to activate the auto scaling policy. If your function experiences quick bursts of traffic, using the **Maximum** statistic instead of the **Average** statistic can be more effective at scaling provisioned concurrency to minimize cold starts.

For more information on target tracking scaling policies, see [Target tracking scaling policies for Application Auto Scaling](https://docs.amazonaws.cn/autoscaling/application/userguide/application-auto-scaling-target-tracking.html).

# Lambda scaling behavior
Scaling behavior

As your function receives more requests, Lambda automatically scales up the number of execution environments to handle these requests until your account reaches its concurrency quota. However, to protect against over-scaling in response to sudden bursts of traffic, Lambda limits how fast your functions can scale. This **concurrency scaling rate** is the maximum rate at which functions in your account can scale in response to increased requests. (That is, how quickly Lambda can create new execution environments.) The concurrency scaling rate differs from the account-level concurrency limit, which is the total amount of concurrency available to your functions.

## Concurrency scaling rate


**In each Amazon Web Services Region, and for each function, your concurrency scaling rate is 1,000 execution environment instances every 10 seconds (or 10,000 requests per second every 10 seconds).** In other words, every 10 seconds, Lambda can allocate at most 1,000 additional execution environment instances, or accommodate 10,000 additional requests per second, to each of your functions.

Usually, you don't need to worry about this limitation. Lambda's scaling rate is sufficient for most use cases.

Importantly, the concurrency scaling rate is a function-level limit. This means that each function in your account can scale independently of other functions.

**Note**  
In practice, Lambda makes a best attempt to refill your concurrency scaling rate continuously over time, rather than in one single refill of 1,000 units every 10 seconds.

Lambda doesn't accrue unused portions of your concurrency scaling rate. This means that at any instant in time, your scaling rate is always 1,000 concurrency units at maximum. For example, if you don't use any of your available 1,000 concurrency units in a 10-second interval, you won't accrue 1,000 additional units in the next 10-second interval. Your concurrency scaling rate is still 1,000 in the next 10-second interval.

As long as your function continues to receive increasing numbers of requests, then Lambda scales at the fastest rate available to you, up to your account's concurrency limit. You can limit the amount of concurrency that individual functions can use by [configuring reserved concurrency](configuration-concurrency.md). If requests come in faster than your function can scale, or if your function is at maximum concurrency, then additional requests fail with a throttling error (429 status code).

# Monitoring concurrency
Monitoring concurrency

Lambda emits Amazon CloudWatch metrics to help you monitor concurrency for your functions. This topic explains these metrics and how to interpret them.

**Topics**
+ [

## General concurrency metrics
](#general-concurrency-metrics)
+ [

## Provisioned concurrency metrics
](#provisioned-concurrency-metrics)
+ [

## Working with the `ClaimedAccountConcurrency` metric
](#claimed-account-concurrency)

## General concurrency metrics


Use the following metrics to monitor concurrency for your Lambda functions. The granularity for each metric is 1 minute.
+ `ConcurrentExecutions` – The number of active concurrent invocations at a given point in time. Lambda emits this metric for all functions, versions, and aliases. For any function in the Lambda console, Lambda displays the graph for `ConcurrentExecutions` natively in the **Monitoring** tab, under **Metrics**. View this metric using **MAX**.
+ `UnreservedConcurrentExecutions` – The number of active concurrent invocations that are using unreserved concurrency. Lambda emits this metric across all functions in a Region. View this metric using **MAX**.
+ `ClaimedAccountConcurrency` – The amount of concurrency that is unavailable for on-demand invocations. `ClaimedAccountConcurrency` is equal to `UnreservedConcurrentExecutions` plus the amount of allocated concurrency (that is, the total reserved concurrency plus the total provisioned concurrency). If `ClaimedAccountConcurrency` exceeds your account concurrency limit, you can [request a higher account concurrency limit](https://aws.amazon.com/premiumsupport/knowledge-center/lambda-concurrency-limit-increase/). View this metric using **MAX**. For more information, see [Working with the `ClaimedAccountConcurrency` metric](#claimed-account-concurrency).
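The definition of `ClaimedAccountConcurrency` can be expressed as a short calculation. An illustrative sketch with hypothetical example values:

```python
def claimed_account_concurrency(unreserved_in_use: int,
                                total_reserved: int,
                                total_provisioned: int) -> int:
    """ClaimedAccountConcurrency = UnreservedConcurrentExecutions plus all
    allocated concurrency (total reserved plus total provisioned)."""
    return unreserved_in_use + total_reserved + total_provisioned

# 300 in-flight unreserved invocations, 200 units reserved, 100 provisioned
print(claimed_account_concurrency(300, 200, 100))  # → 600
```

If this value approaches your account concurrency limit, on-demand invocations start to throttle even though individual functions may appear under-utilized.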

## Provisioned concurrency metrics


Use the following metrics to monitor Lambda functions using provisioned concurrency. The granularity for each metric is 1 minute.
+ `ProvisionedConcurrentExecutions` – The number of execution environment instances that are actively processing an invocation on provisioned concurrency. Lambda emits this metric for each function version and alias with provisioned concurrency configured. View this metric using **MAX**.

`ProvisionedConcurrentExecutions` is not the same as the total amount of provisioned concurrency that you allocate. For example, suppose you allocate 100 units of provisioned concurrency to a function version. During any given minute, if at most 50 out of those 100 execution environments were handling invocations simultaneously, then the value of **MAX**(`ProvisionedConcurrentExecutions`) is 50.
+ `ProvisionedConcurrencyInvocations` – The number of times Lambda invokes your function code using provisioned concurrency. Lambda emits this metric for each function version and alias with provisioned concurrency configured. View this metric using **SUM**.

`ProvisionedConcurrencyInvocations` differs from `ProvisionedConcurrentExecutions` in that `ProvisionedConcurrencyInvocations` counts the total number of invocations, while `ProvisionedConcurrentExecutions` counts the number of active environments. To understand this distinction, consider the following scenario:

![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/concurrency-metrics-pc-executions-vs-invocations.png)


In this example, suppose that you receive 1 invocation per minute, and each invocation takes 2 minutes to complete. Each orange horizontal bar represents a single request. Suppose that you allocate 10 units of provisioned concurrency to this function, such that each request runs on provisioned concurrency.

In between minutes 0 and 1, `Request 1` comes in. **At minute 1**, the value for **MAX**(`ProvisionedConcurrentExecutions`) is 1, since at most 1 execution environment was active during the past minute. The value for **SUM**(`ProvisionedConcurrencyInvocations`) is also 1, since 1 new request came in during the past minute.

In between minutes 1 and 2, `Request 2` comes in, and `Request 1` continues to run. **At minute 2**, the value for **MAX**(`ProvisionedConcurrentExecutions`) is 2, since at most 2 execution environments were active during the past minute. However, the value for **SUM**(`ProvisionedConcurrencyInvocations`) is 1, since only 1 new request came in during the past minute. This metric behavior continues until the end of the example.
+ `ProvisionedConcurrencySpilloverInvocations` – The number of times Lambda invokes your function on standard (reserved or unreserved) concurrency when all provisioned concurrency is in use. Lambda emits this metric for each function version and alias with provisioned concurrency configured. View this metric using **SUM**. The value of `ProvisionedConcurrencyInvocations` plus `ProvisionedConcurrencySpilloverInvocations` should be equal to the total number of function invocations (i.e. the `Invocations` metric).

+ `ProvisionedConcurrencyUtilization` – The percentage of provisioned concurrency in use (i.e. the value of `ProvisionedConcurrentExecutions` divided by the total amount of provisioned concurrency allocated). Lambda emits this metric for each function version and alias with provisioned concurrency configured. View this metric using **MAX**.

For example, suppose you provision 100 units of provisioned concurrency to a function version. During any given minute, if at most 60 out of those 100 execution environments were handling invocations simultaneously, then the value of **MAX**(`ProvisionedConcurrentExecutions`) is 60, and the value of **MAX**(`ProvisionedConcurrencyUtilization`) is 0.6.

A high value for `ProvisionedConcurrencySpilloverInvocations` may indicate that you need to allocate additional provisioned concurrency for your function. Alternatively, you can [configure Application Auto Scaling to handle automatic scaling of provisioned concurrency](https://docs.amazonaws.cn/lambda/latest/dg/provisioned-concurrency.html#managing-provisioned-concurency) based on predefined thresholds.

Conversely, consistently low values for `ProvisionedConcurrencyUtilization` may indicate that you over-allocated provisioned concurrency for your function.
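The MAX-versus-SUM distinction in the scenario above can be sketched in a few lines of Python. The request log and helper below are illustrative only, not how Lambda computes these metrics internally; they mirror the example of one request per minute, each running for 2 minutes:

```python
# Hypothetical request log mirroring the scenario above:
# one request starts each minute, and each runs for 2 minutes.
requests = [(start, start + 2) for start in range(5)]  # (start, end) minutes

def metrics_for_minute(minute, requests):
    """Return (MAX active environments, SUM new invocations) for the
    one-minute window ending at `minute`."""
    window_start, window_end = minute - 1, minute
    # An environment counts as active if its request overlaps the window.
    active = sum(1 for s, e in requests if s < window_end and e > window_start)
    # An invocation counts toward SUM only if it started within the window.
    new = sum(1 for s, _ in requests if window_start <= s < window_end)
    return active, new

for minute in (1, 2, 3):
    active, new = metrics_for_minute(minute, requests)
    print(f"minute {minute}: MAX(ProvisionedConcurrentExecutions)={active}, "
          f"SUM(ProvisionedConcurrencyInvocations)={new}")
# minute 1: MAX=1, SUM=1
# minute 2: MAX=2, SUM=1  (Request 1 still running, Request 2 just arrived)
# minute 3: MAX=2, SUM=1  (the pattern continues)
```

The overlap check is the key difference: MAX reflects everything still running in the window, while SUM counts only what started in it.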

## Working with the `ClaimedAccountConcurrency` metric


Lambda uses the `ClaimedAccountConcurrency` metric to determine how much concurrency is available in your account for on-demand invocations. Lambda calculates `ClaimedAccountConcurrency` using the following formula:

```
ClaimedAccountConcurrency = UnreservedConcurrentExecutions + (allocated concurrency)
```

`UnreservedConcurrentExecutions` is the number of active concurrent invocations that are using unreserved concurrency. Allocated concurrency is the sum of the following two parts (substituting `RC` as "reserved concurrency" and `PC` as "provisioned concurrency"):
+ The total `RC` across all functions in a Region.
+ The total `PC` across all functions in a Region that use `PC`, excluding functions that use `RC`.

**Note**  
You can’t allocate more `PC` than `RC` for a function. Thus, a function’s `RC` is always greater than or equal to its `PC`. For functions that have both `RC` and `PC`, Lambda counts only `RC` toward allocated concurrency, because `RC` is the greater of the two.
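Putting the formula and the note together, the allocated-concurrency calculation might be sketched as follows. The data shapes and function names here are hypothetical (Lambda performs this calculation internally):

```python
def allocated_concurrency(functions):
    """Sum allocated concurrency across all functions in a Region.

    `functions` maps a function name to a dict with optional
    "rc" (reserved concurrency) and "pc" (provisioned concurrency).
    """
    total = 0
    for cfg in functions.values():
        rc = cfg.get("rc", 0)
        pc = cfg.get("pc", 0)
        # RC >= PC always holds, so for functions with both settings,
        # counting RC alone is the same as counting max(rc, pc).
        total += rc if rc else pc
    return total

def claimed_account_concurrency(functions, unreserved_concurrent_executions):
    return allocated_concurrency(functions) + unreserved_concurrent_executions

# For example: 600 units of RC on one function, 200 units of PC on another.
fns = {"function-orange": {"rc": 600}, "function-blue": {"pc": 200}}
print(allocated_concurrency(fns))                 # 800
print(claimed_account_concurrency(fns, 100))      # 900
```

With a 1,000-unit account limit, this account would have only 200 units of concurrency left to claim for other functions, regardless of how little of the allocated 800 is actually in use.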

Lambda uses the `ClaimedAccountConcurrency` metric, rather than `ConcurrentExecutions`, to determine how much concurrency is available for on-demand invocations. While the `ConcurrentExecutions` metric is useful for tracking the number of active concurrent invocations, it doesn't always reflect your true concurrency availability. This is because Lambda also considers reserved concurrency and provisioned concurrency to determine availability.

To illustrate `ClaimedAccountConcurrency`, consider a scenario where you configure a lot of reserved concurrency and provisioned concurrency across your functions that go largely unused. In the following example, assume that your account concurrency limit is 1,000, and you have two main functions in your account: `function-orange` and `function-blue`. You allocate 600 units of reserved concurrency for `function-orange`. You allocate 200 units of provisioned concurrency for `function-blue`. Suppose that over time, you deploy additional functions and observe the following traffic pattern:

![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/claimed-account-concurrency.png)


In the previous diagram, the black lines indicate the actual concurrency use over time, and the red line indicates the value of `ClaimedAccountConcurrency` over time. Throughout this scenario, `ClaimedAccountConcurrency` is 800 at minimum, despite low actual concurrency utilization across your functions. This is because you allocated 800 total units of concurrency for `function-orange` and `function-blue`. From Lambda's perspective, you have "claimed" this concurrency for use, so you effectively have only 200 units of concurrency remaining for other functions.

For this scenario, allocated concurrency is 800 in the `ClaimedAccountConcurrency` formula. We can then derive the value of `ClaimedAccountConcurrency` at various points in the diagram:
+ At `t1`, `ClaimedAccountConcurrency` is 800 (800 allocated + 0 `UnreservedConcurrentExecutions`).
+ At `t2`, `ClaimedAccountConcurrency` is 900 (800 allocated + 100 `UnreservedConcurrentExecutions`).
+ At `t3`, `ClaimedAccountConcurrency` is again 900 (800 allocated + 100 `UnreservedConcurrentExecutions`).

### Setting up the `ClaimedAccountConcurrency` metric in CloudWatch


Lambda emits the `ClaimedAccountConcurrency` metric in CloudWatch. Use this metric along with the value of `SERVICE_QUOTA(ConcurrentExecutions)` to get the percent utilization of concurrency in your account, as shown in the following formula:

```
Utilization = (ClaimedAccountConcurrency/SERVICE_QUOTA(ConcurrentExecutions)) * 100%
```
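The utilization formula above is straightforward to express as a helper, for example when post-processing metric data you've already retrieved from CloudWatch. The function names below are illustrative, and the 70% threshold mirrors the example alarm described next:

```python
ALARM_THRESHOLD = 70.0  # percent; mirrors the example alarm below

def concurrency_utilization(claimed_account_concurrency, account_quota):
    """Percent of the account concurrency quota currently claimed."""
    return claimed_account_concurrency / account_quota * 100

def should_alarm(claimed, quota, threshold=ALARM_THRESHOLD):
    return concurrency_utilization(claimed, quota) > threshold

print(concurrency_utilization(400, 1000))  # 40.0 (like the ~40% in the screenshot)
print(should_alarm(800, 1000))             # True: 80% exceeds the 70% threshold
```

In practice, you would express the same calculation as CloudWatch metric math so that the alarm evaluates it continuously, rather than computing it in your own code.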

The following screenshot illustrates how you can graph this formula in CloudWatch. The green `claim_utilization` line represents the concurrency utilization in this account, which is at around 40%:

![\[\]](http://docs.amazonaws.cn/en_us/lambda/latest/dg/images/claimed-account-concurrency-cloudwatch-graph.png)


The previous screenshot also includes a CloudWatch alarm that goes into `ALARM` state when the concurrency utilization exceeds 70%. You can use the `ClaimedAccountConcurrency` metric along with similar alarms to proactively determine when you might need to request a higher account concurrency limit.