

# Model evaluation


After you’ve built your model, you can evaluate how well it performed on your data before using it to make predictions. Information such as the model’s accuracy when predicting labels, along with advanced metrics, can help you determine whether the model makes sufficiently accurate predictions for your data.

The section [Evaluate your model's performance](canvas-scoring.md) describes how to view and interpret the information on your model's **Analyze** page. The section [Use advanced metrics in your analyses](canvas-advanced-metrics.md) contains more detailed information about the **Advanced metrics** used to quantify your model’s accuracy.

You can also view more advanced information for specific *model candidates*, which are all of the model iterations that Canvas runs through while building your model. Based on the advanced metrics for a given model candidate, you can select a different candidate to be the default, or the version that is used for making predictions and deploying. For each model candidate, you can view the **Advanced metrics** information to help you decide which model candidate you’d like to select as the default. You can view this information by selecting the model candidate from the **Model leaderboard**. For more information, see [View model candidates in the model leaderboard](canvas-evaluate-model-candidates.md).

Canvas also provides the option to download a Jupyter notebook so that you can view and run the code used to build your model. This is useful if you’d like to make adjustments to the code or learn more about how your model was built. For more information, see [Download a model notebook](canvas-notebook.md).

# Evaluate your model's performance


Amazon SageMaker Canvas provides overview and scoring information for each model type. Your model’s score can help you determine how accurate your model is when it makes predictions, and the additional scoring insights can help you quantify the differences between the actual and predicted values.

To view the analysis of your model, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose the model that you built.

1. In the top navigation pane, choose the **Analyze** tab.

1. Within the **Analyze** tab, you can view the overview and scoring information for your model.

The following sections describe how to interpret the scoring for each model type.

## Evaluate categorical prediction models


The **Overview** tab shows you the column impact for each column. **Column impact** is a percentage score that indicates how much weight a column has in making predictions in relation to the other columns. For a column impact of 25%, Canvas weighs the prediction as 25% for the column and 75% for the other columns.

The following screenshot shows the **Accuracy** score for the model, along with the **Optimization metric**, which is the metric that you choose to optimize when building the model. In this case, the **Optimization metric** is **Accuracy**. You can specify a different optimization metric if you build a new version of your model.

![\[Screenshot of the accuracy score and optimization metric on the Analyze tab in Canvas.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/analyze-tab-2-category.png)


The **Scoring** tab for a categorical prediction model lets you visualize all the predictions. Line segments extend from the left of the page, indicating all the predictions the model has made. In the middle of the page, the line segments converge on a perpendicular segment to indicate the proportion of each prediction to a single category. From the predicted category, the segments branch out to the actual category. You can get a visual sense of how accurate the predictions were by following each line segment from the predicted category to the actual category.

The following image shows an example **Scoring** section for a **3+ category prediction** model.

![\[Screenshot of the Scoring tab for a 3+ category prediction model.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/canvas-analyze/canvas-multiclass-classification.png)


You can also view the **Advanced metrics** tab for more detailed information about your model’s performance, such as the advanced metrics, error density plots, or confusion matrices. To learn more about the **Advanced metrics** tab, see [Use advanced metrics in your analyses](canvas-advanced-metrics.md).

## Evaluate numeric prediction models


The **Overview** tab shows you the column impact for each column. **Column impact** is a percentage score that indicates how much weight a column has in making predictions in relation to the other columns. For a column impact of 25%, Canvas weighs the prediction as 25% for the column and 75% for the other columns.

The following screenshot shows the **RMSE** score for the model on the **Overview** tab, which in this case is the **Optimization metric**. The **Optimization metric** is the metric that you choose to optimize when building the model. You can specify a different optimization metric if you build a new version of your model.

![\[Screenshot of the RMSE optimization metric on the Analyze tab in Canvas.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/analyze-tab-2-numeric.png)


The **Scoring** tab for numeric prediction shows a line indicating the model's predicted value in relation to the data used to make predictions. The width of the purple band around the line indicates the RMSE (root mean squared error) range, and the predicted values often fall within +/- the RMSE of the actual values.

The following image shows the **Scoring** section for numeric prediction.

![\[Screenshot of the Scoring tab for a numeric prediction model.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/canvas-analyze/canvas-analyze-regression-scoring.png)


You can also view the **Advanced metrics** tab for more detailed information about your model’s performance, such as the advanced metrics, error density plots, or confusion matrices. To learn more about the **Advanced metrics** tab, see [Use advanced metrics in your analyses](canvas-advanced-metrics.md).

## Evaluate time series forecasting models


On the **Analyze** page for time series forecasting models, you can see an overview of the model’s metrics. You can hover over each metric for more information, or you can see [Use advanced metrics in your analyses](canvas-advanced-metrics.md) for more information about each metric.

In the **Column impact** section, you can see the score for each column. **Column impact** is a percentage score that indicates how much weight a column has in making predictions in relation to the other columns. For a column impact of 25%, Canvas weighs the prediction as 25% for the column and 75% for the other columns.

The following screenshot shows the time series metrics scores for the model, along with the **Optimization metric**, which is the metric that you choose to optimize when building the model. In this case, the **Optimization metric** is **RMSE**. You can specify a different optimization metric if you build a new version of your model. These metrics scores are taken from your backtest results, which are available for download in the **Artifacts** tab.

![\[Screenshot of the RMSE optimization metric on the Analyze tab in Canvas.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/analyze-tab-2-time-series.png)


The **Artifacts** tab provides access to several key resources that you can use to dive deeper into your model’s performance and continue iterating upon it:
+ **Shuffled training and validation splits** – This section includes links to the artifacts generated when your dataset was split into training and validation sets, enabling you to review the data distribution and potential biases.
+ **Backtest results** – This section includes a link to the forecasted values for your validation dataset, which is used to generate accuracy metrics and evaluation data for your model.
+ **Accuracy metrics** – This section lists the advanced metrics that evaluate your model's performance, such as Root Mean Squared Error (RMSE). For more information about each metric, see [Metrics for time series forecasts](canvas-metrics.md#canvas-time-series-forecast-metrics).
+ **Explainability report** – This section provides a link to download the explainability report, which offers insights into the model's decision-making process and the relative importance of input columns. This report can help you identify potential areas for improvement.

On the **Analyze** page, you can also choose the **Download** button to directly download the backtest results, accuracy metrics, and explainability report artifacts to your local machine.

## Evaluate image prediction models


The **Overview** tab shows you the **Per label performance**, which gives you an overall accuracy score for the images predicted for each label. You can choose a label to see more specific details, such as the **Correctly predicted** and **Incorrectly predicted** images for the label.

You can turn on the **Heatmap** toggle to see a heatmap for each image. The heatmap shows you the areas of interest that have the most impact when your model is making predictions. For more information about heatmaps and how to use them to improve your model, choose the **More info** icon next to the **Heatmap** toggle.

The **Scoring** tab for single-label image prediction models shows you a comparison of what the model predicted as the label versus what the actual label was. You can select up to 10 labels at a time. You can change the labels in the visualization by choosing the labels dropdown menu and selecting or deselecting labels.

You can also view insights for individual labels or groups of labels, such as the three labels with the highest or lowest accuracy, by choosing the **View scores for** dropdown menu in the **Model accuracy insights** section.

The following screenshot shows the **Scoring** information for a single-label image prediction model.

![\[The actual versus predicted labels on the Scoring page for a single-label image prediction model.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/analyze-image-scoring.png)


## Evaluate text prediction models


The **Overview** tab shows you the **Per label performance**, which gives you an overall accuracy score for the passages of text predicted for each label. You can choose a label to see more specific details, such as the **Correctly predicted** and **Incorrectly predicted** passages for the label.

The **Scoring** tab for multi-category text prediction models shows you a comparison of what the model predicted as the label versus what the actual label was.

In the **Model accuracy insights** section, you can see the **Most frequent category**, which tells you the category that the model predicted most frequently and how accurate those predictions were. If your model predicts a label of **Positive** correctly 99% of the time, then you can be fairly confident that your model is good at predicting positive sentiment in text.

The following screenshot shows the **Scoring** information for a multi-category text prediction model.

![\[The actual versus predicted labels on the Scoring page for a multi-category text prediction model.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/analyze-text-scoring.png)


# Use advanced metrics in your analyses


The following section describes how to find and interpret the advanced metrics for your model in Amazon SageMaker Canvas.

**Note**  
Advanced metrics are currently available only for numeric and categorical prediction models.

To find the **Advanced metrics** tab, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose the model that you built.

1. In the top navigation pane, choose the **Analyze** tab.

1. Within the **Analyze** tab, choose the **Advanced metrics** tab.

In the **Advanced metrics** tab, you can find the **Performance** tab. The page looks like the following screenshot.

![\[Screenshot of the advanced metrics tab for a categorical prediction model.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/canvas-analyze-performance.png)


At the top, you can see an overview of the metrics scores, including the **Optimization metric**, which is the metric that you selected (or that Canvas selected by default) to optimize when building the model.

The following section describes the **Performance** tab within **Advanced metrics** in more detail.

## Performance


In the **Performance** tab, you’ll see a **Metrics table**, along with visualizations that Canvas creates based on your model type. For categorical prediction models, Canvas provides a *confusion matrix*, whereas for numeric prediction models, Canvas provides you with *residuals* and *error density* charts.

In the **Metrics table**, you are provided with a full list of your model’s scores for each advanced metric, which is more comprehensive than the scores overview at the top of the page. The metrics shown here depend on your model type. For a reference to help you understand and interpret each metric, see [Metrics reference](canvas-metrics.md).

To understand the visualizations that might appear based on your model type, see the following options:
+ **Confusion matrix** – Canvas uses confusion matrices to help you visualize when a model makes predictions correctly. In a confusion matrix, your results are arranged to compare the predicted values against the actual values. The following example explains how a confusion matrix works for a 2-category prediction model that predicts positive and negative labels:
  + True positive – The model correctly predicted positive when the true label was positive.
  + True negative – The model correctly predicted negative when the true label was negative.
  + False positive – The model incorrectly predicted positive when the true label was negative.
  + False negative – The model incorrectly predicted negative when the true label was positive.
+ **Precision recall curve** – The precision recall curve is a visualization of the model’s precision score plotted against the model’s recall score. Generally, a model that can make perfect predictions would have precision and recall scores that are both 1. The precision recall curve for a decently accurate model is fairly high in both precision and recall.
+ **Residuals** – Residuals are the difference between the actual values and the values predicted by the model. A residuals chart plots the residuals against the corresponding values to visualize their distribution and any patterns or outliers. A normal distribution of residuals around zero indicates that the model is a good fit for the data. However, if the residuals are significantly skewed or have outliers, it may indicate that the model is overfitting the data or that there are other issues that need to be addressed.
+ **Error density** – An error density plot is a representation of the distribution of errors made by a model. It shows the probability density of the errors at each point, helping you to identify any areas where the model may be overfitting or making systematic errors.
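For a 2-category model like the example above, the four confusion-matrix cells can be counted directly from paired actual and predicted labels. The following is a minimal pure-Python sketch with made-up labels, not the Canvas implementation:

```python
# Count the four confusion-matrix cells for a 2-category model.
# The labels here are illustrative, not Canvas output.
from collections import Counter

actual    = ["pos", "pos", "neg", "neg", "pos"]
predicted = ["pos", "neg", "neg", "neg", "pos"]

# Each cell is the count of one (actual, predicted) pair.
cells = Counter(zip(actual, predicted))
tp = cells[("pos", "pos")]   # correctly predicted positive
tn = cells[("neg", "neg")]   # correctly predicted negative
fp = cells[("neg", "pos")]   # predicted positive, actually negative
fn = cells[("pos", "neg")]   # predicted negative, actually positive

print(tp, tn, fp, fn)  # 2 2 0 1
```

Reading the matrix this way, the model above misses one positive (the false negative) and never falsely raises a positive.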

# View model candidates in the model leaderboard


When you do a [Standard build](https://docs.amazonaws.cn/sagemaker/latest/dg/canvas-build-model.html) for tabular and time series forecasting models in Amazon SageMaker Canvas, SageMaker AI trains multiple *model candidates* (different iterations of the model) and by default selects the one with the best score for the optimization metric. For tabular models, Canvas builds up to 250 different model candidates using various algorithms and hyperparameter settings. For time series forecasting models, Canvas builds 7 different models—one for each of the [supported forecasting algorithms](canvas-advanced-settings.md#canvas-advanced-settings-time-series) and one ensemble model that averages the predictions of the other models to try to optimize accuracy.

The default model candidate is the only version that you can use in Canvas for actions like making predictions, registering to the model registry, or deploying to an endpoint. However, you might want to review all of the model candidates and select a different candidate to be the default model. You can view all of the model candidates and more details about each candidate on the **Model leaderboard** in Canvas.

To view the **Model leaderboard**, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **My models**.

1. Choose the model that you built.

1. In the top navigation pane, choose the **Analyze** tab.

1. Within the **Analyze** tab, choose **Model leaderboard**.

The **Model leaderboard** page opens, which for tabular models looks like the following screenshot.

![\[The model leaderboard, which lists all of the model candidates that Canvas trained.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/canvas-model-leaderboard.png)


For time series forecasting models, you see 7 models, which include one for each of the time series forecasting algorithms supported by Canvas and one ensemble model. For more information about the algorithms, see [Advanced time series forecasting model settings](canvas-advanced-settings.md#canvas-advanced-settings-time-series).

In the preceding screenshot, you can see that the first model candidate listed is marked as the **Default model**. This is the model candidate with which you can make predictions or deploy to endpoints.

To view more detailed metrics information about the model candidates to compare them, you can choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) and choose **View model details**.

**Important**  
Loading the model details for non-default model candidates may take a few minutes (typically less than 10 minutes), and SageMaker AI Hosting charges apply. For more information, see [SageMaker AI Pricing](https://www.amazonaws.cn/sagemaker/pricing/).

The model candidate opens in the **Analyze** tab, and the metrics shown are specific to that model candidate. When you’re done reviewing the model candidate’s metrics, you can go back or exit the view to return to the **Model leaderboard**.

If you’d like to set the **Default model** to a different candidate, you can choose the **More options** icon (![\[Vertical ellipsis icon representing a menu or more options.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/studio/canvas/more-options-icon.png)) and choose **Change to default model**. Changing the default model for a model trained using HPO mode might take several minutes.

**Note**  
If your model is already deployed in production, [registered to the model registry](https://docs.amazonaws.cn/sagemaker/latest/dg/canvas-register-model.html), or has [automations](https://docs.amazonaws.cn/sagemaker/latest/dg/canvas-manage-automations.html) set up, you must delete your deployment, model registration, or automations before changing the default model.

# Metrics reference


The following sections describe the metrics that are available in Amazon SageMaker Canvas for each model type.

## Metrics for numeric prediction


The following list defines the metrics for numeric prediction in SageMaker Canvas and gives you information about how you can use them.
+ InferenceLatency – The approximate amount of time between making a request for a model prediction and receiving it from a real-time endpoint to which the model is deployed. This metric is measured in seconds and is only available for models built with the **Ensembling** mode.
+ MAE – Mean absolute error. On average, the prediction for the target column is +/- *MAE* from the actual value.

  Measures how different the predicted and actual values are when they're averaged over all values. MAE is commonly used in numeric prediction to understand model prediction error. If the predictions are linear, MAE represents the average distance from a predicted line to the actual value. MAE is defined as the sum of absolute errors divided by the number of observations. Values range from 0 to infinity, with smaller numbers indicating a better model fit to the data.
+ MAPE – Mean absolute percent error. On average, the prediction for the target column is +/- *MAPE*% from the actual value.

  MAPE is the mean of the absolute differences between the actual values and the predicted or estimated values, divided by the actual values and expressed as a percentage. A lower MAPE indicates better performance, as it means that the predicted or estimated values are closer to the actual values.
+ MSE – Mean squared error, or the average of the squared differences between the predicted and actual values.

  MSE values are always positive. The better a model is at predicting the actual values, the smaller the MSE value is.
+ R2 – The percentage of the variance in the target column that can be explained by the input columns.

  Quantifies how much a model can explain the variance of a dependent variable. Values range from one (1) to negative one (-1). Higher numbers indicate a higher fraction of explained variability. Values close to zero (0) indicate that very little of the dependent variable can be explained by the model. Negative values indicate a poor fit and that the model is outperformed by a constant function (or a horizontal line).
+ RMSE – Root mean squared error, or the standard deviation of the errors.

  Measures the square root of the squared difference between predicted and actual values, and is averaged over all values. It is used to understand model prediction error, and it's an important metric to indicate the presence of large model errors and outliers. Values range from zero (0) to infinity, with smaller numbers indicating a better model fit to the data. RMSE is dependent on scale, and should not be used to compare datasets of different types.
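The metrics defined above follow directly from their formulas. The following pure-Python sketch computes MAE, MAPE, MSE, RMSE, and R2 on a small made-up dataset, not Canvas output:

```python
# Compute the numeric-prediction metrics from first principles.
# The actual/predicted values are illustrative only.
import math

actual    = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 3.0, 6.5]

n = len(actual)
errors = [p - a for p, a in zip(predicted, actual)]

mae  = sum(abs(e) for e in errors) / n                       # mean absolute error
mse  = sum(e * e for e in errors) / n                        # mean squared error
rmse = math.sqrt(mse)                                        # root mean squared error
mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

# R2 = 1 - (residual sum of squares / total sum of squares)
mean_actual = sum(actual) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(round(mae, 3), round(rmse, 3), round(r2, 3))  # 0.5 0.612 0.898
```

Note how RMSE penalizes the single larger error (1.0) more heavily than MAE does, which is why it is useful for flagging outliers.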

## Metrics for categorical prediction


This section defines the metrics for categorical prediction in SageMaker Canvas and gives you information about how you can use them.

The following is a list of available metrics for 2-category prediction:
+ Accuracy – The percentage of correct predictions.

  Or, the ratio of the number of correctly predicted items to the total number of predictions. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates complete inaccuracy.
+ AUC – A value between 0 and 1 that indicates how well your model is able to separate the categories in your dataset. A value of 1 indicates that it was able to separate the categories perfectly.
+ BalancedAccuracy – Measures the ratio of accurate predictions to all predictions.

  This ratio is calculated after normalizing true positives (TP) and true negatives (TN) by the total number of positive (P) and negative (N) values. It is defined as follows: `0.5*((TP/P)+(TN/N))`, with values ranging from 0 to 1. The balanced accuracy metric gives a better measure of accuracy when the number of positives or negatives differ greatly from each other in an imbalanced dataset, such as when only 1% of email is spam.
+ F1 – A balanced measure of accuracy that takes class balance into account.

  It is the harmonic mean of the precision and recall scores, defined as follows: `F1 = 2 * (precision * recall) / (precision + recall)`. F1 scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
+ InferenceLatency – The approximate amount of time between making a request for a model prediction and receiving it from a real-time endpoint to which the model is deployed. This metric is measured in seconds and is only available for models built with the **Ensembling** mode.
+ LogLoss – Log loss, also known as cross-entropy loss, is a metric used to evaluate the quality of the probability outputs, rather than the outputs themselves. Log loss is an important metric to indicate when a model makes incorrect predictions with high probabilities. Values range from 0 to infinity. A value of 0 represents a model that perfectly predicts the data.
+ Precision – Of all the times that *category x* was predicted, the prediction was correct *precision*% of the time.

  Precision measures how well an algorithm predicts the true positives (TP) out of all of the positives that it identifies. It is defined as follows: `Precision = TP/(TP+FP)`, with values ranging from zero (0) to one (1). Precision is an important metric when the cost of a false positive is high. For example, the cost of a false positive is very high if an airplane safety system is falsely deemed safe to fly. A false positive (FP) reflects a positive prediction that is actually negative in the data.
+ Recall – The model correctly predicted *recall*% to be *category x* when *target column* was actually *category x*.

  Recall measures how well an algorithm correctly predicts all of the true positives (TP) in a dataset. A true positive is a positive prediction that is also an actual positive value in the data. Recall is defined as follows: `Recall = TP/(TP+FN)`, with values ranging from 0 to 1. Higher scores reflect a better ability of the model to predict true positives (TP) in the data. Note that it is often insufficient to measure only recall, because predicting every output as a true positive yields a perfect recall score.
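The 2-category formulas above can be combined in a few lines. The following sketch computes accuracy, balanced accuracy, precision, recall, and F1 from hypothetical confusion-matrix counts (an imbalanced dataset, such as the spam example above):

```python
# Derive the 2-category metrics from raw confusion-matrix counts.
# The counts are illustrative: 100 actual positives, 900 actual negatives.
tp, tn, fp, fn = 90, 850, 50, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)                  # fraction correct overall
balanced  = 0.5 * ((tp / (tp + fn)) + (tn / (tn + fp)))      # 0.5*((TP/P)+(TN/N))
precision = tp / (tp + fp)                                   # TP/(TP+FP)
recall    = tp / (tp + fn)                                   # TP/(TP+FN)
f1        = 2 * precision * recall / (precision + recall)    # harmonic mean

print(round(accuracy, 3), round(balanced, 3),
      round(precision, 3), round(recall, 3), round(f1, 3))
# 0.94 0.922 0.643 0.9 0.75
```

Notice that accuracy (0.94) looks better than balanced accuracy (0.922) and far better than precision (0.643): with imbalanced classes, a single headline number can hide weak performance on the minority class.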

The following is a list of available metrics for 3+ category prediction:
+ Accuracy – The percentage of correct predictions.

  Or, the ratio of the number of correctly predicted items to the total number of predictions. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates complete inaccuracy.
+ BalancedAccuracy – Measures the ratio of accurate predictions to all predictions.

  This ratio is calculated after normalizing true positives (TP) and true negatives (TN) by the total number of positive (P) and negative (N) values. It is defined as follows: `0.5*((TP/P)+(TN/N))`, with values ranging from 0 to 1. The balanced accuracy metric gives a better measure of accuracy when the number of positives or negatives differ greatly from each other in an imbalanced dataset, such as when only 1% of email is spam.
+ F1macro – The F1macro score applies F1 scoring by calculating the precision and recall, and then taking their harmonic mean to calculate the F1 score for each class. Then, the F1macro averages the individual scores to obtain the F1macro score. F1macro scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
+ InferenceLatency – The approximate amount of time between making a request for a model prediction and receiving it from a real-time endpoint to which the model is deployed. This metric is measured in seconds and is only available for models built with the **Ensembling** mode.
+ LogLoss – Log loss, also known as cross-entropy loss, is a metric used to evaluate the quality of the probability outputs, rather than the outputs themselves. Log loss is an important metric to indicate when a model makes incorrect predictions with high probabilities. Values range from 0 to infinity. A value of 0 represents a model that perfectly predicts the data.
+ PrecisionMacro – Measures precision by calculating precision for each class and averaging scores to obtain precision for several classes. Scores range from zero (0) to one (1). Higher scores reflect the model's ability to predict true positives (TP) out of all of the positives that it identifies, averaged across multiple classes.
+ RecallMacro – Measures recall by calculating recall for each class and averaging scores to obtain recall for several classes. Scores range from 0 to 1. Higher scores reflect the model's ability to predict true positives (TP) in a dataset, whereas a true positive reflects a positive prediction that is also an actual positive value in the data. It is often insufficient to measure only recall, because predicting every output as a true positive will yield a perfect recall score.

Note that for 3+ category prediction, you also receive the average F1, Accuracy, Precision, and Recall metrics. The scores for these metrics are just the metric scores averaged across all categories.
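Macro averaging, as used by F1macro, PrecisionMacro, and RecallMacro, computes the metric per class and then takes the unweighted mean. The following sketch shows this for recall on a hypothetical 3-category model; the labels are illustrative, not Canvas output:

```python
# Macro-averaged recall for a 3-category model: compute recall for each
# class, then average the per-class scores equally (unweighted mean).
actual    = ["a", "a", "b", "b", "c", "c"]
predicted = ["a", "b", "b", "b", "c", "a"]

classes = sorted(set(actual))
recalls = []
for c in classes:
    # True positives for class c: actual c that was predicted as c.
    tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
    support = sum(1 for a in actual if a == c)   # total actual c
    recalls.append(tp / support)

recall_macro = sum(recalls) / len(classes)
print(recalls, round(recall_macro, 3))  # [0.5, 1.0, 0.5] 0.667
```

Because each class contributes equally regardless of how many rows it has, macro averaging surfaces poor performance on rare classes that a plain average over rows would mask.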

## Metrics for image and text prediction


The following is a list of available metrics for image prediction and text prediction.
+ Accuracy – The percentage of correct predictions.

  Or, the ratio of the number of correctly predicted items to the total number of predictions. Accuracy measures how close the predicted class values are to the actual values. Values for accuracy metrics vary between zero (0) and one (1). A value of 1 indicates perfect accuracy, and 0 indicates complete inaccuracy.
+ F1 – A balanced measure of accuracy that takes class balance into account.

  It is the harmonic mean of the precision and recall scores, defined as follows: `F1 = 2 * (precision * recall) / (precision + recall)`. F1 scores vary between 0 and 1. A score of 1 indicates the best possible performance, and 0 indicates the worst.
+ Precision – Of all the times that *category x* was predicted, the prediction was correct *precision*% of the time.

  Precision measures how well an algorithm predicts the true positives (TP) out of all of the positives that it identifies. It is defined as follows: `Precision = TP/(TP+FP)`, with values ranging from zero (0) to one (1). Precision is an important metric when the cost of a false positive is high. For example, the cost of a false positive is very high if an airplane safety system is falsely deemed safe to fly. A false positive (FP) reflects a positive prediction that is actually negative in the data.
+ Recall – The model correctly predicted *recall*% to be *category x* when *target column* was actually *category x*.

  Recall measures how well an algorithm correctly predicts all of the true positives (TP) in a dataset. A true positive is a positive prediction that is also an actual positive value in the data. Recall is defined as follows: `Recall = TP/(TP+FN)`, with values ranging from 0 to 1. Higher scores reflect a better ability of the model to predict true positives (TP) in the data. Note that it is often insufficient to measure only recall, because predicting every output as a true positive yields a perfect recall score.

Note that for image and text prediction models where you are predicting 3 or more categories, you also receive the *average* F1, Accuracy, Precision, and Recall metrics. The scores for these metrics are just the metric scores averaged across all categories.

## Metrics for time series forecasts


The following defines the advanced metrics for time series forecasts in Amazon SageMaker Canvas and gives you information about how you can use them.
+ Average Weighted Quantile Loss (wQL) – Evaluates the forecast by averaging the accuracy at the P10, P50, and P90 quantiles. A lower value indicates a more accurate model.
+ Weighted Absolute Percent Error (WAPE) – The sum of the absolute error normalized by the sum of the absolute target, which measures the overall deviation of forecasted values from observed values. A lower value indicates a more accurate model, where WAPE = 0 is a model with no errors.
+ Root Mean Square Error (RMSE) – The square root of the average squared errors. A lower RMSE indicates a more accurate model, where RMSE = 0 is a model with no errors.
+ Mean Absolute Percent Error (MAPE) – The percentage error (percent difference of the mean forecasted value versus the actual value) averaged over all time points. A lower value indicates a more accurate model, where MAPE = 0 is a model with no errors.
+ Mean Absolute Scaled Error (MASE) – The mean absolute error of the forecast normalized by the mean absolute error of a simple baseline forecasting method. A lower value indicates a more accurate model, where MASE < 1 is estimated to be better than the baseline and MASE > 1 is estimated to be worse than the baseline.
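The WAPE, RMSE, and MASE definitions above can be sketched directly. The series below is made up, and the MASE baseline is assumed to be the naive lag-1 forecast (a common choice of "simple baseline forecasting method"; Canvas does not specify its baseline here):

```python
# Time series forecast metrics on an illustrative series.
# Assumption: the MASE baseline is the naive lag-1 forecast.
import math

actual   = [100.0, 110.0, 120.0, 130.0]
forecast = [105.0, 108.0, 118.0, 133.0]

abs_err = [abs(f - a) for f, a in zip(forecast, actual)]

# WAPE: sum of absolute errors normalized by sum of absolute actuals.
wape = sum(abs_err) / sum(abs(a) for a in actual)

# RMSE: square root of the mean squared error.
rmse = math.sqrt(sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual))

# MASE: mean absolute error scaled by the naive baseline's mean absolute error.
naive_err = [abs(actual[t] - actual[t - 1]) for t in range(1, len(actual))]
mase = (sum(abs_err) / len(abs_err)) / (sum(naive_err) / len(naive_err))

print(round(wape, 4), round(rmse, 3), round(mase, 3))  # 0.0261 3.24 0.3
```

Here MASE = 0.3 < 1, so the forecast is estimated to be better than simply repeating the previous observed value.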