
How Application Auto Scaling predictive scaling works

To use predictive scaling, create a predictive scaling policy that specifies the CloudWatch metric to monitor and analyze. You can use a predefined metric or a custom metric. For predictive scaling to start forecasting future values, this metric must have at least 24 hours of data.

After you create the policy, predictive scaling starts analyzing metric data from up to the past 14 days to identify patterns. It uses this analysis to generate an hourly forecast of capacity requirements for the next 48 hours. The forecast is updated every 6 hours using the latest CloudWatch data. As new data comes in, predictive scaling can continuously improve the accuracy of future forecasts.
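The timeline in the previous paragraph can be sketched as a small Python model. This is a hypothetical illustration of the figures stated above (a 24-hour minimum, up to 14 days of lookback, 48 hourly forecast points, and a refresh every 6 hours), not the service's forecasting algorithm. The function names are made up for this example, and aligning forecast points to the start of each upcoming hour is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Figures taken from the description above.
MIN_HISTORY = timedelta(hours=24)       # data required before the first forecast
MAX_LOOKBACK = timedelta(days=14)       # most history that is analyzed for patterns
FORECAST_HOURS = 48                     # one forecast point per hour
REFRESH_INTERVAL = timedelta(hours=6)   # how often the forecast is regenerated


def forecast_plan(history_start: datetime,
                  now: datetime) -> Optional[Tuple[datetime, List[datetime]]]:
    """Return (lookback_start, hourly_points) for a forecast generated at `now`.

    Returns None while less than 24 hours of metric data is available.
    """
    if now - history_start < MIN_HISTORY:
        return None
    # At most the most recent 14 days of data are analyzed.
    lookback_start = max(history_start, now - MAX_LOOKBACK)
    # Assumption: forecast points are aligned to the start of each upcoming hour.
    next_hour = now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    points = [next_hour + timedelta(hours=h) for h in range(FORECAST_HOURS)]
    return lookback_start, points


def refresh_times(first_run: datetime, count: int) -> List[datetime]:
    """Times of the first `count` forecast runs, 6 hours apart."""
    return [first_run + REFRESH_INTERVAL * i for i in range(count)]


if __name__ == "__main__":
    now = datetime(2024, 6, 20, 9, 30, tzinfo=timezone.utc)
    lookback_start, points = forecast_plan(now - timedelta(days=30), now)
    print(lookback_start, points[0], points[-1], len(points))
```

With 30 days of history, the model analyzes only the most recent 14 days; with 10 hours of history, it returns None because the 24-hour minimum has not been reached.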

You can start by enabling predictive scaling in forecast only mode. In this mode, predictive scaling generates capacity forecasts but does not scale your capacity based on them. This lets you evaluate the accuracy and suitability of the forecast before it affects your capacity.
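While running in forecast only mode, one simple way to judge a forecast is to compare it with the load you actually observed for the same hours. The sketch below computes the mean absolute percentage error (MAPE) over paired hourly values. MAPE and the 20 percent threshold are illustrative choices for this example, not criteria that Application Auto Scaling uses; in practice the forecast values would come from get-predictive-scaling-forecast and the observed values from CloudWatch.

```python
from typing import Sequence


def mean_absolute_percentage_error(forecast: Sequence[float],
                                   observed: Sequence[float]) -> float:
    """MAPE in percent, skipping hours where the observed value is zero."""
    if len(forecast) != len(observed):
        raise ValueError("forecast and observed must have the same length")
    errors = [abs(f - o) / abs(o) for f, o in zip(forecast, observed) if o != 0]
    if not errors:
        raise ValueError("no non-zero observed values to compare against")
    return 100.0 * sum(errors) / len(errors)


def looks_accurate(forecast: Sequence[float], observed: Sequence[float],
                   threshold_percent: float = 20.0) -> bool:
    """Hypothetical acceptance rule: accept the forecast if MAPE is below the threshold."""
    return mean_absolute_percentage_error(forecast, observed) < threshold_percent
```

Which error measure and threshold make sense depends on how costly under- or over-provisioning is for your application; this sketch only shows the comparison itself.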

After you review the forecast data and decide to start scaling based on that data, switch the scaling policy to forecast and scale mode. In this mode:

  • If the forecast expects an increase in load, predictive scaling increases capacity.

  • If the forecast expects a decrease in load, predictive scaling does not scale in to remove capacity. This ensures that you scale in only when demand actually drops, not merely when a drop is predicted. To remove capacity that is no longer needed, you must also create a target tracking or step scaling policy, because these policies respond to real-time metric data.
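The behavior described in the list above can be expressed as a small function. This is a simplified model for illustration: it assumes predictive scaling raises capacity toward the forecast, caps it at the scalable target's maximum, and never lowers it. The function name and parameters are hypothetical, and the mode strings are assumed to match the Mode values used by the API.

```python
from enum import Enum


class Mode(Enum):
    # Assumed to match the API's Mode values.
    FORECAST_ONLY = "ForecastOnly"
    FORECAST_AND_SCALE = "ForecastAndScale"


def predictive_capacity(mode: Mode, current: int, forecast: int,
                        max_capacity: int) -> int:
    """Capacity after predictive scaling acts on one hourly forecast (simplified model).

    - ForecastOnly: the forecast is produced, but capacity is left unchanged.
    - ForecastAndScale: capacity is raised toward the forecast (capped at
      max_capacity) but never lowered; scale-in is left to target tracking
      or step scaling policies.
    """
    if mode is Mode.FORECAST_ONLY:
        return current
    target = min(forecast, max_capacity)
    return max(current, target)
```

For example, with a current capacity of 8 and a forecast of 3, the model keeps 8; removing the extra capacity is the job of a target tracking or step scaling policy.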

By default, predictive scaling scales your scalable targets at the start of each hour based on the forecast for that hour. You can optionally specify an earlier start time by using the SchedulingBufferTime property in the PutScalingPolicy API operation. This allows you to launch predicted capacity ahead of the forecasted demand, which gives the new capacity adequate time to become ready to handle traffic.
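As a worked example of SchedulingBufferTime, the sketch below computes when capacity for a given forecast hour would be launched. It assumes the buffer is expressed in seconds and limited to at most one hour, as it is for EC2 Auto Scaling predictive scaling; confirm the unit and allowed range in the PutScalingPolicy API reference. The helper name is hypothetical, and a buffer of 0 models the default behavior of scaling at the start of the hour.

```python
from datetime import datetime, timedelta, timezone

# Assumption: SchedulingBufferTime is given in seconds and may not exceed one
# forecast interval (3600 seconds), mirroring EC2 Auto Scaling predictive scaling.
MAX_BUFFER_SECONDS = 3600


def scaling_start_time(forecast_hour: datetime, scheduling_buffer_time: int = 0) -> datetime:
    """When capacity for `forecast_hour` is launched, given a buffer in seconds."""
    if not 0 <= scheduling_buffer_time <= MAX_BUFFER_SECONDS:
        raise ValueError("scheduling_buffer_time must be between 0 and 3600 seconds")
    if forecast_hour.minute or forecast_hour.second or forecast_hour.microsecond:
        raise ValueError("forecast_hour must fall on the start of an hour")
    # Launch earlier by the buffer so the new capacity is ready when demand arrives.
    return forecast_hour - timedelta(seconds=scheduling_buffer_time)


if __name__ == "__main__":
    hour = datetime(2024, 6, 20, 14, 0, tzinfo=timezone.utc)
    print(scaling_start_time(hour, 600))
```

A 600-second buffer for the 14:00 forecast moves the scaling action to 13:50, which suits resources that need several minutes to become ready.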

Maximum capacity limit

By default, scaling policies cannot increase capacity above the scalable target's maximum capacity.

Alternatively, you can allow the scalable target's maximum capacity to be automatically increased if the forecast capacity approaches or exceeds the maximum capacity of the scalable target. To enable this behavior, use the MaxCapacityBreachBehavior and MaxCapacityBuffer properties in the PutScalingPolicy API operation or the Max capacity behavior setting in the Amazon Web Services Management Console.

Warning

Use caution when allowing the maximum capacity to be automatically increased. The maximum capacity does not automatically decrease back to the original maximum.
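The arithmetic behind MaxCapacityBreachBehavior and MaxCapacityBuffer can be sketched as follows. This sketch assumes the semantics documented for EC2 Auto Scaling predictive scaling, where MaxCapacityBuffer is a percentage of the forecast capacity: with IncreaseMaxCapacity, the effective maximum becomes the forecast plus the buffer whenever that exceeds the configured maximum, while HonorMaxCapacity never exceeds it. Rounding up to whole units is also an assumption, and the function name is hypothetical; check the Application Auto Scaling API reference for the exact rules.

```python
import math


def effective_max_capacity(max_capacity: int, forecast: int,
                           behavior: str = "HonorMaxCapacity",
                           buffer_percent: int = 0) -> int:
    """Maximum capacity predictive scaling may use for one forecast point (simplified model)."""
    if behavior == "HonorMaxCapacity":
        return max_capacity
    if behavior != "IncreaseMaxCapacity":
        raise ValueError(f"unknown behavior: {behavior}")
    # Assumption: the buffer is a percentage of the forecast, rounded up to whole units.
    buffered = math.ceil(forecast * (100 + buffer_percent) / 100)
    # The maximum is only ever raised here, never lowered (see the warning above).
    return max(max_capacity, buffered)
```

Under these assumptions, a maximum of 40 with a 10 percent buffer is raised to 55 when the forecast is 50, and to 42 when the forecast is 38, because 38 plus 10 percent already exceeds 40.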

Commonly used commands for scaling policy creation, management, and deletion

The commonly used commands for working with predictive scaling policies include:

  • register-scalable-target to register Amazon or custom resources as scalable targets, to suspend scaling, and to resume scaling.

  • put-scaling-policy to create or update a predictive scaling policy.

  • get-predictive-scaling-forecast to retrieve the forecast data for a predictive scaling policy.

  • describe-scaling-activities to return information about scaling activities in an Amazon Web Services Region.

  • describe-scaling-policies to return information about scaling policies in an Amazon Web Services Region.

  • delete-scaling-policy to delete a scaling policy.
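To show what a put-scaling-policy request for predictive scaling might contain, the sketch below builds the request parameters as a Python dictionary and prints them as JSON without calling AWS. The top-level field names follow the PutScalingPolicy API; the Amazon ECS resource identifiers are hypothetical placeholders, and the predefined metric type string is an assumption to verify against the API reference.

```python
import json
from typing import Optional

# Hypothetical identifiers for an Amazon ECS service; replace with your own.
SERVICE_NAMESPACE = "ecs"
RESOURCE_ID = "service/my-cluster/my-service"
SCALABLE_DIMENSION = "ecs:service:DesiredCount"


def predictive_policy_request(policy_name: str, mode: str = "ForecastOnly",
                              target_value: float = 40.0,
                              scheduling_buffer_time: Optional[int] = None) -> dict:
    """Build PutScalingPolicy parameters for a predictive scaling policy."""
    if mode not in ("ForecastOnly", "ForecastAndScale"):
        raise ValueError(f"unsupported mode: {mode}")
    config = {
        "MetricSpecifications": [{
            "TargetValue": target_value,
            "PredefinedMetricPairSpecification": {
                # Assumed predefined metric pair for ECS CPU utilization.
                "PredefinedMetricType": "ECSServiceCPUUtilization",
            },
        }],
        "Mode": mode,
    }
    if scheduling_buffer_time is not None:
        config["SchedulingBufferTime"] = scheduling_buffer_time
    return {
        "PolicyName": policy_name,
        "ServiceNamespace": SERVICE_NAMESPACE,
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": SCALABLE_DIMENSION,
        "PolicyType": "PredictiveScaling",
        "PredictiveScalingPolicyConfiguration": config,
    }


if __name__ == "__main__":
    # Print the request so it can be reviewed before it is sent with an SDK or the CLI.
    print(json.dumps(predictive_policy_request("my-predictive-policy"), indent=2))
```

If you use boto3, a dictionary like this could be passed as keyword arguments to the put_scaling_policy method of an application-autoscaling client. Because PutScalingPolicy creates or updates a policy, calling it again with the same policy name and mode set to ForecastAndScale is one way to switch modes after you review the forecast.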

Custom metrics

You can use custom metrics to predict the capacity that your application needs. Custom metrics are useful when the predefined metrics are not enough to capture the load on your application.

Considerations

The following considerations apply when working with predictive scaling.

  • Confirm whether predictive scaling is suitable for your application. An application is a good fit for predictive scaling if it exhibits recurring load patterns that are specific to the day of the week or the time of day. Evaluate the forecast before letting predictive scaling actively scale your application.

  • Predictive scaling needs at least 24 hours of historical data to start forecasting. However, forecasts are more effective if historical data spans two full weeks.

  • Choose a load metric that accurately represents the full load on your application and that reflects the aspect of your application that is most important to scale on.
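To help with the first two considerations, you might run a rough check on your metric history before you switch to forecast and scale mode. The sketch below is a hypothetical heuristic, not how predictive scaling evaluates data: given one value per hour, it reports whether the history meets the 24-hour minimum and the recommended two weeks, and computes the correlation between the series and itself shifted by 24 hours (a daily pattern) and by 168 hours (a weekly pattern). Values close to 1 suggest a recurring pattern.

```python
import math
from statistics import fmean
from typing import Optional, Sequence


def lagged_correlation(values: Sequence[float], lag: int) -> Optional[float]:
    """Pearson correlation between the series and itself shifted by `lag` hours.

    Returns None when there is too little data or one of the segments is constant.
    """
    if lag <= 0 or len(values) - lag < 2:
        return None
    x = values[: len(values) - lag]   # earlier hours
    y = values[lag:]                  # the same hours one lag later
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def history_report(hourly_values: Sequence[float]) -> dict:
    """Rough suitability summary for one value per hour of metric history."""
    hours = len(hourly_values)
    return {
        "can_forecast": hours >= 24,          # minimum stated above
        "full_two_weeks": hours >= 14 * 24,   # recommended history length
        "daily_pattern": lagged_correlation(hourly_values, 24),
        "weekly_pattern": lagged_correlation(hourly_values, 7 * 24),
    }
```

A low correlation does not prove that predictive scaling is unsuitable, and a high one does not guarantee an accurate forecast; reviewing the forecast in forecast only mode remains the primary check.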