# Detecting outliers with ML-powered anomaly detection
<a name="anomaly-detection"></a>

Amazon Quick Sight uses proven Amazon technology to continuously run ML-powered anomaly detection across millions of metrics to discover hidden trends and outliers in your data. This tool surfaces deep insights that are often buried in aggregates and that manual analysis can't uncover at scale. With ML-powered anomaly detection, you can find outliers in your data without the need for manual analysis, custom development, or ML domain expertise. 

Amazon Quick Sight notifies you in your visuals if it detects that you can analyze an anomaly or do some forecasting on your data. 

Anomaly detection is not available in the `eu-central-2` Europe (Zurich) region.

**Important**  
ML-powered anomaly detection is a compute-intense task. Before you start using it, you can get an idea of costs by analyzing the amount of data that you want to use. We offer a tiered pricing model that is based on the number of metrics you process per month. 

**Topics**
+ [Concepts for anomaly or outlier detection](anomaly-detection-outliers-and-key-drivers.md)
+ [Setting up ML-powered anomaly detection for outlier analysis](anomaly-detection-using.md)
+ [Exploring outliers and key drivers with ML-powered anomaly detection and contribution analysis](anomaly-exploring.md)

# Concepts for anomaly or outlier detection
<a name="anomaly-detection-outliers-and-key-drivers"></a>

Amazon Quick Sight uses the word *anomaly* to describe data points that fall outside an overall pattern of distribution. *Anomaly* is a scientific term, and there are many other words for it, including outliers, deviations, oddities, exceptions, irregularities, quirks, and many more. The term that you use might be based on the type of analysis you do, the type of data you use, or even just the preference of your group. These outlying data points represent an entity (a person, place, thing, or time) that is exceptional in some way. 

Humans easily recognize patterns and spot things that aren't like the others. Our senses provide this information for us. If the pattern is simple, and there is only a little data, you can easily make a graph to highlight the outliers in your data. Some simple examples include the following:
+ A red balloon in a group of blue ones
+ A racehorse that is far ahead of the others
+ A kid who isn't paying attention during class
+ A day when online orders are up, but shipping is down
+ A person who got well, where others didn't

Some data points represent a significant event, and others represent a random occurrence. Analysis uncovers which data is worth investigating, based on what driving factors (key drivers) contributed to the event. Questions are essential to data analysis. Why did it happen? What's it related to? Did it happen only once or many times? What can you do to encourage or discourage more like it? 

Understanding how and why a variation exists, and whether there is a pattern in the variations, requires more thought. Without the assistance of machine learning, each person might come to a different conclusion, because they have different experience and information. Therefore, each person might make a slightly different business decision. If there is a lot of data or variables to consider, it can require an overwhelming amount of analysis. 

ML-powered anomaly detection identifies the causations and correlations to enable you to make data-driven decisions. You still have control over defining how you want the job to work on your data. You can specify your own parameters, and choose additional options, such as identifying key drivers in a contribution analysis. Or you can use the default settings. The following section walks you through the setup process, and provides explanations for the options available. 

# Setting up ML-powered anomaly detection for outlier analysis
<a name="anomaly-detection-using"></a>

Use procedures in the following sections to start detecting outliers, detecting anomalies, and identifying the key drivers that contribute to them.

**Topics**
+ [Viewing anomaly and forecast notifications](anomaly-detection-adding-from-visuals.md)
+ [Adding an ML insight to detect outliers and key drivers](anomaly-detection-adding-anomaly-insights.md)
+ [Using contribution analysis for key drivers](anomaly-detection-adding-key-drivers.md)

# Viewing anomaly and forecast notifications
<a name="anomaly-detection-adding-from-visuals"></a>

Amazon Quick Sight notifies you on a visual where it detects an anomaly, key drivers, or a forecasting opportunity. You can follow the prompts to set up anomaly detection or forecasting based on the data in that visual.

1. In an existing line chart, look for an insight notification in the menu on the visual widget. 

1. Choose the lightbulb icon to display the notification.

1. If you want more information about the ML insight, you can follow the screen prompts to add an ML insight.

# Adding an ML insight to detect outliers and key drivers
<a name="anomaly-detection-adding-anomaly-insights"></a>

You can add an ML insight that detects *anomalies*, which are outliers that seem significant. To get started, you create a widget for your insight, also known as an *autonarrative*. As you configure your options, you can view a limited screenshot of your insight in the **Preview** pane on the right side of the screen.

In your insight widget, you can add up to five dimension fields that are not calculated fields. In the field wells, values for **Categories** represent the dimensional values that Amazon Quick Sight uses to split the metric. For example, let's say that you are analyzing revenue across all product categories and product SKUs. There are 10 product categories, each with 10 product SKUs. Amazon Quick Sight splits the metric by the 100 unique combinations and runs anomaly detection on each combination for the split.
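
As a rough illustration of how the number of splits grows, the following Python sketch counts the combinations in the example above. The category and SKU names are hypothetical placeholders, and the sketch only counts combinations; it doesn't reflect how Amazon Quick Sight runs the detection internally.

```python
from itertools import product

# Hypothetical values: 10 product categories, each with 10 product SKUs.
categories = [f"category-{i}" for i in range(10)]
skus_per_category = [f"sku-{j}" for j in range(10)]

# Each (category, SKU) pair is one split of the metric, so each pair
# corresponds to one time series to check for anomalies.
combinations = list(product(categories, skus_per_category))

print(len(combinations))  # 100
```

Adding another dimension multiplies the count again, which is worth keeping in mind because pricing is based on the number of metrics processed.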

The following procedure shows how to do this, and also how to add contribution analysis to detect the key drivers that are causing each anomaly. You can add contribution analysis later, as described in [Using contribution analysis for key drivers](anomaly-detection-adding-key-drivers.md).

**To set up outlier analysis, including key drivers**

1. Open your analysis and in the toolbar, choose **Insights**, then **Add**. From the list, choose **Anomaly detection** and **Select**.

1. Follow the screen prompt on the new widget, which tells you to choose fields for the insight. Add at least one date, one measure, and one dimension. 

1. Choose **Get started** on the widget. The configuration screen appears.

1. Under **Compute options**, choose values for the following options.

   1. For **Combinations to be analyzed**, choose one of the following options:

      1. **Hierarchical**

         Choose this option if you want to analyze the fields hierarchically. For example, if you chose a date (T), a measure (N), and three dimension categories (C1, C2, and C3), Quick Sight analyzes the fields hierarchically, as shown following.

         ```
         T-N, T-C1-N, T-C1-C2-N, T-C1-C2-C3-N
         ```

      1. **Exact**

         Choose this option if you want to analyze only the exact combination of fields in the Category field well, as they are listed. For example, if you chose a date (T), a measure (N), and three dimension categories (C1, C2, and C3), Quick Sight analyzes only the exact combination of category fields in the order they are listed, as shown following.

         ```
         T-C1-C2-C3-N
         ```

      1. **All**

         Choose this option if you want to analyze all field combinations in the Category field well. For example, if you chose a date (T), a measure (N), and three dimension categories (C1, C2, and C3), Quick Sight analyzes all combinations of fields, as shown following.

         ```
         T-N, T-C1-N, T-C1-C2-N, T-C1-C2-C3-N, T-C1-C3-N, T-C2-N, T-C2-C3-N, T-C3-N
         ```

      If you chose a date and a measure only, Quick Sight analyzes the fields by date and then by measure.

      In the **Fields to be analyzed** section, you can see a list of fields from the field wells for reference.

   1. For **Name**, enter a descriptive alphanumeric name with no spaces, or choose the default value. This provides a name for the computation.

      If you plan on editing the narrative that automatically displays on the widget, you can use the name to identify this widget's calculation. Customize the name if you plan to edit the autonarrative and if you have other similar calculations in your analysis.

1. In the **Display options** section, choose the following options to customize what is displayed in your insight widget. You can still explore all your results, no matter what you display.

   1. **Maximum number of anomalies to show** – The number of outliers you want to display in the narrative widget. 

   1. **Severity** – The minimum level of severity for anomalies that you want to display in the insight widget.

      A *level of severity* is a range of anomaly scores that is characterized by the lowest actual anomaly score included in the range. All anomalies that score higher are included in the range. If you set severity to **Low**, the insight displays all of the anomalies that rank between low and very high. If you set the severity to **Very high**, the insight displays only the anomalies that have the highest anomaly scores.

      You can use the following options:
      + **Very high** 
      + **High and above** 
      + **Medium and above** 
      + **Low and above** 

   1. **Direction** – The direction on the x-axis or y-axis that you want to identify as anomalous. You can choose from the following:
      + **Higher than expected** to identify higher values as anomalies.
      + **Lower than expected** to identify lower values as anomalies. 
      + **[ALL]** to identify all anomalous values, high and low (default setting).

   1. **Delta** – Enter a custom value to use to identify anomalies. Any amount higher than the threshold value counts as an anomaly. The values here change how the insight works in your analysis. In this section, you can set the following:
      + **Absolute value** – The actual value to use. For example, suppose this is 48. Amazon Quick Sight then identifies values as anomalous when the difference between a value and the expected value is greater than 48. 
      + **Percentage** – The percentage threshold to use. For example, suppose this is 12.5%. Amazon Quick Sight then identifies values as anomalous when the difference between a value and the expected value is greater than 12.5%.

   1. **Sort by** – Choose a sort method for your results. Some methods are based on the anomaly score that Amazon Quick Sight generates. Amazon Quick Sight gives higher scores to data points that look anomalous. You can use any of the following options: 
      + **Weighted anomaly score** – The anomaly score multiplied by the log of the absolute value of the difference between the actual value and the expected value. This score is always a positive number. 
      + **Anomaly score** – The actual anomaly score assigned to this data point.
      + **Weighted difference from expected value** – The anomaly score multiplied by the difference between the actual value and the expected value (default).
      + **Difference from expected value** – The actual difference between the actual value and the expected value (that is, actual−expected).
      + **Actual value** – The actual value with no formula applied.

1. In the **Schedule options** section, set the schedule for automatically running the insight recalculation. The schedule runs only for published dashboards. In the analysis, you can run it manually as needed. Scheduling includes the following settings:
   + **Occurrence** – How often you want the recalculation to run: every hour, every day, every week, or every month.
   + **Start schedule on** – The date and time to start running this schedule.
   + **Timezone** – The time zone that the schedule runs in. To view a list, delete the current entry. 

1. In the **Top contributors** section, set Amazon Quick Sight to analyze the key drivers when an outlier (anomaly) is detected.

   For example, Amazon Quick Sight can show the top customers that contributed to a spike in sales in the US for home improvement products. You can add up to four dimensions from your dataset. These include dimensions that you didn't add to the field wells of this insight widget.

   For a list of dimensions available for contribution analysis, choose **Select fields**.

1. Choose **Save** to confirm your choices. Choose **Cancel** to exit without saving.

1. From the insight widget, choose **Run now** to run the anomaly detection and view your insight.
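
To see how the **Hierarchical**, **Exact**, and **All** options in the preceding procedure differ, the following Python sketch enumerates the field combinations for each one. The function name `field_combinations` and the mode strings are hypothetical, chosen only for illustration. The sketch reproduces the combinations listed in the procedure, not the way Quick Sight computes them.

```python
from itertools import combinations


def field_combinations(date, measure, categories, mode):
    """List the field combinations for one of the three options."""
    if mode == "exact":
        # Only the full category list, in field-well order.
        groups = [tuple(categories)]
    elif mode == "hierarchical":
        # Prefixes of the category list: (), (C1,), (C1, C2), ...
        groups = [tuple(categories[:n]) for n in range(len(categories) + 1)]
    elif mode == "all":
        # Every subset of the categories, keeping field-well order.
        groups = [
            subset
            for n in range(len(categories) + 1)
            for subset in combinations(categories, n)
        ]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ["-".join((date, *group, measure)) for group in groups]


for mode in ("hierarchical", "exact", "all"):
    print(mode, field_combinations("T", "N", ["C1", "C2", "C3"], mode))
```

With three categories, the **All** option yields eight combinations, compared with four for **Hierarchical** and one for **Exact**. The sketch emits the **All** combinations grouped by size, so the order differs from the listing in the procedure even though the set is the same.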

The amount of time that anomaly detection takes to complete varies depending on how many unique data points you are analyzing. The process can take a few minutes for a minimum number of points, or it can take many hours.

While it's running in the background, you can do other work in your analysis. Make sure to wait for it to complete before you change the configuration, edit the narrative, or open the **Explore anomalies** page for this insight.

The insight widget needs to run at least once before you can see results. If you think the status might be out of date, you can refresh the page. The insight can have the following states.


| Appears on the Page | Status | 
| --- | --- | 
| Run now button | The job has not yet started. | 
| Message about Analyzing for anomalies | The job is currently running. | 
| Narrative about the detected anomalies (outliers)  | The job has run successfully. The message says when this widget's calculation was last updated. | 
| Alert icon with an exclamation point (!)  | This icon indicates there was an error during the last run. If the narrative also displays, you can still use Explore anomalies to use data from the previous successful run.  | 

# Using contribution analysis for key drivers
<a name="anomaly-detection-adding-key-drivers"></a>

Amazon Quick Sight can identify the dimensions (categories) that contribute to outliers in measures (metrics) between two points in time. The key driver that contributes to an outlier helps you to answer the question: What happened to cause this anomaly? 

If you are already using anomaly detection without contribution analysis, you can enable the existing ML insight to find key drivers. Use the following procedure to add contribution analysis and identify the key drivers behind outliers. Your insight for anomaly detection needs to include a time field and at least one aggregated metric (SUM, AVERAGE, or COUNT). You can include multiple categories (dimension fields) if you wish, but you can also run contribution analysis without specifying any category or dimension field.

You can also use this procedure to change or remove fields as key drivers in your anomaly detection.

**To add contribution analysis to identify key drivers**

1. Open your analysis and locate an existing ML insight for anomaly detection. Select the insight widget to highlight it.

1. Choose **Menu Options** (**…**) from the menu on the visual.

1. Choose **Configure anomaly** to edit the settings.

1. The **Contribution analysis (optional)** setting allows Amazon Quick Sight to analyze the key drivers when an outlier (anomaly) is detected. For example, Amazon Quick Sight can show you the top customers that contributed to a spike in sales in the US for home improvement products. You can add up to four dimensions from your dataset, including dimensions that you didn't add to the field wells of this insight widget.

   To view a list of dimensions available for contribution analysis, choose **Select fields**.

   If you want to change the fields you're using as key drivers, change the fields that are enabled in this list. If you disable all of them, Quick Sight won't perform any contribution analysis in this insight.

1. To save your changes, scroll to the bottom of the configuration options, and choose **Save**. To exit without saving, choose **Cancel**. To completely remove these settings, choose **Delete**.

# Exploring outliers and key drivers with ML-powered anomaly detection and contribution analysis
<a name="anomaly-exploring"></a>

You can interactively explore the anomalies (also known as outliers) in your analysis, along with the contributors (key drivers). The analysis is available for you to explore after the ML-powered anomaly detection runs. The changes you make in this screen aren't saved when you go back to the analysis.

To begin, choose **Explore anomalies** in the insight. The following screenshot shows the anomalies screen as it appears when you first open it. In this example, contributors analysis is set up and shows two key drivers.

![\[Anomalies analysis with contributors shown.\]](http://docs.amazonaws.cn/en_us/quick/latest/userguide/images/anomaly-exploration-v2.png)


The sections of the screen include the following, from top left to bottom right:
+ **Contributors** displays key drivers. To see this section, you need to have contributors set up in your anomaly configuration. 
+ **Controls** contains settings for anomaly exploration.
+ **Number of anomalies** displays outliers detected over time. You can hide or show this chart section.
+ **Your field names** for category or dimension fields act as titles for charts that show anomalies for each category or dimension. 

The following sections provide detailed information for each aspect of exploring anomalies.

**Topics**
+ [Exploring contributors (key drivers)](exploring-anomalies-key-drivers.md)
+ [Setting controls for anomaly detection](exploring-anomalies-controls.md)
+ [Showing and hiding anomalies by date](exploring-anomalies-by-date.md)
+ [Exploring anomalies per category or dimension](exploring-anomalies-per-category-or-dimension.md)

# Exploring contributors (key drivers)
<a name="exploring-anomalies-key-drivers"></a>

If your anomaly insight is set up to detect key drivers, Quick Sight runs the contribution analysis to determine which categories (dimensions) are influencing the outliers. The **Contributors** section appears on the left. 

**Contributors** contains the following sections:
+ **Narrative** – At top left, a summary describes any changes in the metrics.
+ **Top contributors configuration** – Choose **Configure** to change the contributors and the date range to use in this section.
+ **Sort by** – Sets the sort applied to the results that appear below. You can choose from the following:
  + **Absolute difference** 
  + **Contribution percentage** (default) 
  + **Deviation from expected** 
  + **Percentage difference** 
+ **Top contributor results** – Displays the results of the top contributor analysis for the point in time selected on the timeline at right. 

  Contribution analysis identifies up to four of the top contributing factors or key drivers of an anomaly. For example, Amazon Quick Sight can show you the top customers that contributed to a spike in sales in the US for health products. This panel appears only if you choose to include fields in contribution analysis when you configure the anomaly. 

  If you don't see this panel and you want to display it, you can turn it on. To do so, go to the analysis, choose anomaly configuration from the insight's menu, and choose up to four fields to analyze for contributions. If you make changes in the sheet controls that exclude the contributing drivers, the **Contributions** panel closes.
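
This guide doesn't state the formulas behind the **Sort by** options, so the following Python sketch is only one plausible reading of three of them: absolute difference, percentage difference, and contribution percentage. The function name `contribution_table`, the input shape (one dictionary of metric values per point in time), and the formulas themselves are assumptions made for illustration. **Deviation from expected** is omitted because it depends on expected values from the detection model.

```python
def contribution_table(before, after, top_n=4):
    """Rank categories by their change between two points in time."""
    total_change = sum(after.values()) - sum(before.values())
    rows = []
    for key in sorted(set(before) | set(after)):
        old = before.get(key, 0.0)
        new = after.get(key, 0.0)
        diff = new - old
        rows.append({
            "category": key,
            "absolute_difference": diff,
            # Undefined when the earlier value is zero.
            "percentage_difference": diff / old * 100 if old else None,
            # Share of the overall change attributed to this category.
            "contribution_percentage": (
                diff / total_change * 100 if total_change else None
            ),
        })
    # Largest movers first, limited to the top contributors.
    rows.sort(key=lambda r: abs(r["absolute_difference"]), reverse=True)
    return rows[:top_n]


# Hypothetical sales per customer at two points in time.
before = {"Customer A": 100.0, "Customer B": 50.0, "Customer C": 50.0}
after = {"Customer A": 160.0, "Customer B": 70.0, "Customer C": 40.0}

for row in contribution_table(before, after):
    print(row)
```

In this made-up data, Customer A accounts for most of the overall increase, which is the kind of answer contribution analysis is meant to surface.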

# Setting controls for anomaly detection
<a name="exploring-anomalies-controls"></a>

You can find the settings for anomaly detection in the **Controls** section of the screen. You can open and close this section by clicking the word **Controls**.

The settings include the following:
+ **Controls** – The current settings appear at the top of the workspace. You can expand this section by choosing the double arrow icon on the right side. The following settings are available for exploring outliers generated by ML-powered anomaly detection:
  + **Severity** – Sets how sensitive your detector is to detected anomalies (outliers). You should expect to see more anomalies with the threshold set to **Low and above**, and fewer anomalies when the threshold is set to **High and above**. This sensitivity is determined based on standard deviations of the anomaly score generated by the RCF algorithm. The default is **Medium and above**.
  + **Direction** – The direction on the x-axis or y-axis that you want to identify as anomalous. The default is [ALL]. You can choose the following:
    + Set to **Higher than expected** to identify higher values as anomalies. 
    + Set to **Lower than expected** to identify lower values as anomalies. 
    + Set to **[ALL]** to identify all anomalous values, both high and low. 
  + **Minimum Delta - absolute value** – Enter a custom value to use as the absolute threshold to identify anomalies. Any amount higher than this value counts as an anomaly. 
  + **Minimum Delta - percentage** – Enter a custom value to use as the percentage threshold to identify anomalies. Any amount higher than this value counts as an anomaly. 
  + **Sort by** – Choose the method that you want to apply to sorting anomalies. These are listed in preferred order on the screen. View the following list for a description of each method.
    + **Weighted anomaly score** – The anomaly score multiplied by the log of the absolute value of the difference between the actual value and the expected value. This score is always a positive number.
    + **Anomaly score** – The actual anomaly score assigned to this data point.
    + **Weighted difference from expected value** – (Default) The anomaly score multiplied by the difference between the actual value and the expected value.
    + **Difference from expected value** – The actual difference between the actual value and the expected value (actual−expected).
    + **Actual value** – The actual value with no formula applied.
  + **Categories** – One or more settings can appear at the end of the other settings. There is one for each category field that you added to the category field well. You can use category settings to limit the data that displays in the screen. 
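
The following Python sketch shows one way the controls above could combine: filter by severity, direction, and minimum delta, and then sort. The `Anomaly` class, the function names, and the sample values are hypothetical. The sketch also makes several assumptions that this guide doesn't confirm: the percentage delta is taken relative to the expected value, the log is the natural log, and results are sorted in descending order.

```python
import math
from dataclasses import dataclass

# Severity levels from lowest to highest, as named in the controls.
SEVERITY_ORDER = ["low", "medium", "high", "very high"]


@dataclass
class Anomaly:
    actual: float
    expected: float
    score: float    # anomaly score (hypothetical scale)
    severity: str   # one of SEVERITY_ORDER

    @property
    def difference(self) -> float:
        return self.actual - self.expected


def matches_controls(a, min_severity="medium", direction="all",
                     min_delta_abs=None, min_delta_pct=None):
    """Return True if an anomaly passes the severity, direction, and delta filters."""
    if SEVERITY_ORDER.index(a.severity) < SEVERITY_ORDER.index(min_severity):
        return False
    if direction == "higher" and a.difference <= 0:
        return False
    if direction == "lower" and a.difference >= 0:
        return False
    if min_delta_abs is not None and abs(a.difference) <= min_delta_abs:
        return False
    if min_delta_pct is not None:
        # Assumption: the percentage is relative to the expected value.
        if a.expected == 0:
            return False
        if abs(a.difference) / abs(a.expected) * 100 <= min_delta_pct:
            return False
    return True


# Sort keys that follow the descriptions above. The natural log is an
# assumption, and math.log raises ValueError when the difference is zero.
SORT_KEYS = {
    "weighted_anomaly_score": lambda a: a.score * math.log(abs(a.difference)),
    "anomaly_score": lambda a: a.score,
    "weighted_difference": lambda a: a.score * a.difference,
    "difference": lambda a: a.difference,
    "actual_value": lambda a: a.actual,
}


def rank(anomalies, sort_by="weighted_difference"):
    """Sort from largest to smallest key (descending order is an assumption)."""
    return sorted(anomalies, key=SORT_KEYS[sort_by], reverse=True)


anomalies = [
    Anomaly(actual=150, expected=100, score=2.0, severity="high"),
    Anomaly(actual=40, expected=100, score=3.0, severity="very high"),
    Anomaly(actual=105, expected=100, score=1.0, severity="low"),
]

# Medium and above, both directions, absolute delta greater than 48.
kept = [a for a in anomalies if matches_controls(a, min_delta_abs=48)]
for a in rank(kept, "weighted_anomaly_score"):
    print(a)
```

Here the low-severity point is filtered out, and the remaining two change order depending on the sort method, because the weighted difference keeps the sign of the difference while the weighted anomaly score uses its absolute value.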

# Showing and hiding anomalies by date
<a name="exploring-anomalies-by-date"></a>

The **Number of anomalies** chart shows outliers detected over time. If you don't see this chart, you can display it by choosing **SHOW ANOMALIES BY DATE**. 

This chart shows anomalies (outliers) for the most recent data point in the time series. When expanded, it displays the following components:
+ **Anomalies** – The middle of the screen displays the anomalies for the most recent data point in the time series. One or more graphs appear with a chart showing variations in a metric over time. To use this graph, select a point along the timeline. The currently selected point in time is highlighted in the graph, and includes a menu offering you the option to analyze contributions to the current metric. You can also drag the cursor over the timeline without choosing a specific point to display the metric value for that point in time.
+ **Anomalies by date** – If you choose **SHOW ANOMALIES BY DATE**, another graph appears that shows how many significant anomalies there were for each time point. You can see details in this chart on each bar's context menu. 
+ **Timeline adjustment** – Each graph has a timeline adjustor tool below the dates, which you can use to compress, expand, or choose a period of time to view.
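
Conceptually, the **Anomalies by date** graph is a count of significant anomalies per time point. The following Python sketch shows that tally on made-up data. The dates and category names are hypothetical, and the sketch doesn't represent how Quick Sight stores or queries its results.

```python
from collections import Counter
from datetime import date

# Hypothetical detected anomalies as (date, category) pairs.
detected = [
    (date(2018, 11, 15), "Books"),
    (date(2018, 11, 17), "Books"),
    (date(2018, 11, 17), "Garden"),
    (date(2018, 11, 17), "Toys"),
]

# One bar per date: how many anomalies were detected on that day.
anomalies_by_date = Counter(day for day, _ in detected)

for day in sorted(anomalies_by_date):
    print(day.isoformat(), anomalies_by_date[day])
```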

# Exploring anomalies per category or dimension
<a name="exploring-anomalies-per-category-or-dimension"></a>

The main section of the **Explore anomalies** screen is locked to the lower right of the screen. It remains here no matter how many other sections of the screen are open. If multiple anomalies exist, you can scroll out to highlight them. The chart displays anomalies in color ranges and shows where they occur over a period of time. 

![\[Explore anomalies screen.\]](http://docs.amazonaws.cn/en_us/quick/latest/userguide/images/anomaly-exploration-1.png)


Each category or dimension has a separate chart that uses the field name as the chart title. Each chart contains the following components:
+ **Configure alerts** – If you are exploring anomalies from a dashboard, select this button to subscribe to alerts and contribution analysis (if configured). You can set up the alerts for the level of severity (medium, high, and so on). You can get the top five alerts for **Higher than expected**, **Lower than expected**, or ALL. Dashboard readers can configure alerts for themselves. The **Explore anomalies** page doesn't display this button if you opened the page from an analysis.
**Note**  
The ability to configure alerts is available only in published dashboards.
+ **Status** – Under the **Anomalies** header, the status label displays information on the last run. For example, you might see "Anomalies for Revenue on November 17, 2018." This label tells you how many metrics were processed and how long ago. You can choose the link to learn more about the details, such as how many metrics were ignored.