

# Post-training Data and Model Bias Metrics


Amazon SageMaker Clarify provides the following post-training data and model bias metrics to help quantify various conceptions of fairness. These concepts cannot all be satisfied simultaneously, and the selection depends on the specifics of the cases involving potential bias being analyzed. Most of these metrics combine numbers taken from the binary classification confusion matrices for the different demographic groups. Because fairness and bias can be defined by a wide range of metrics, human judgment is required to understand and choose which metrics are relevant to the individual use case, and customers should consult with appropriate stakeholders to determine the appropriate measure of fairness for their application.

We use the following notation to discuss the bias metrics. The conceptual model described here is for binary classification, where events are labeled as having only two possible outcomes in their sample space, referred to as positive (with value 1) and negative (with value 0). This framework is usually extensible in a straightforward way to multicategory classification or to cases involving continuous valued outcomes when needed. In the binary classification case, positive and negative labels are assigned to outcomes recorded in a raw dataset for a favored facet *a* and for a disfavored facet *d*. These labels y are referred to as *observed labels* to distinguish them from the *predicted labels* y' that are assigned by a machine learning model during the training or inference stages of the ML lifecycle. These labels are used to define probability distributions Pa(y) and Pd(y) for their respective facet outcomes. 
+ labels: 
  + y represents the n observed labels for event outcomes in a training dataset.
  + y' represents the predicted labels for the n observed labels in the dataset by a trained model.
+ outcomes:
  + A positive outcome (with value 1) for a sample, such as an application acceptance.
    + n(1) is the number of observed labels for positive outcomes (acceptances).
    + n'(1) is the number of predicted labels for positive outcomes (acceptances).
  + A negative outcome (with value 0) for a sample, such as an application rejection.
    + n(0) is the number of observed labels for negative outcomes (rejections).
    + n'(0) is the number of predicted labels for negative outcomes (rejections).
+ facet values:
  + facet *a* – The feature value that defines a demographic that bias favors.
    + na is the number of observed labels for the favored facet value: na = na(1) + na(0), the sum of the positive and negative observed labels for facet *a*.
    + n'a is the number of predicted labels for the favored facet value: n'a = n'a(1) + n'a(0), the sum of the positive and negative predicted labels for facet *a*. Note that n'a = na.
  + facet *d* – The feature value that defines a demographic that bias disfavors.
    + nd is the number of observed labels for the disfavored facet value: nd = nd(1) + nd(0), the sum of the positive and negative observed labels for facet *d*. 
    + n'd is the number of predicted labels for the disfavored facet value: n'd = n'd(1) + n'd(0), the sum of the positive and negative predicted labels for facet *d*. Note that n'd = nd.
+ probability distributions for outcomes of the labeled facet data outcomes:
  + Pa(y) is the probability distribution of the observed labels for facet *a*. For binary labeled data, this distribution is given by the ratio of the number of samples in facet *a* labeled with positive outcomes to the total number, Pa(y1) = na(1)/ na, and the ratio of the number of samples with negative outcomes to the total number, Pa(y0) = na(0)/ na. 
  + Pd(y) is the probability distribution of the observed labels for facet *d*. For binary labeled data, this distribution is given by the ratio of the number of samples in facet *d* labeled with positive outcomes to the total number, Pd(y1) = nd(1)/ nd, and the ratio of the number of samples with negative outcomes to the total number, Pd(y0) = nd(0)/ nd. 
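The label distributions above can be sketched in a few lines of Python. This is a hypothetical helper for illustration, not part of the SageMaker Clarify API, and the counts used are assumed:

```python
# Hypothetical helper (not part of SageMaker Clarify): observed label
# distributions Pa(y) and Pd(y) from binary label counts.

def label_distribution(n_pos, n_neg):
    """Return (P(y=1), P(y=0)) for one facet from its label counts."""
    n = n_pos + n_neg
    return n_pos / n, n_neg / n

# Assumed counts: facet a has 70 positive and 30 negative observed labels;
# facet d has 20 positive and 30 negative observed labels.
print(label_distribution(70, 30))  # Pa(y1), Pa(y0) -> (0.7, 0.3)
print(label_distribution(20, 30))  # Pd(y1), Pd(y0) -> (0.4, 0.6)
```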

The following table contains a cheat sheet for quick guidance and links to the post-training bias metrics.

Post-training bias metrics


| Post-training bias metric | Description | Example question | Interpreting metric values | 
| --- | --- | --- | --- | 
| [Difference in Positive Proportions in Predicted Labels (DPPL)](clarify-post-training-bias-metric-dppl.md) | Measures the difference in the proportion of positive predictions between the favored facet a and the disfavored facet d. |  Has there been an imbalance across demographic groups in the predicted positive outcomes that might indicate bias?  |  Range for normalized binary & multicategory facet labels: `[-1,+1]` Range for continuous labels: (-∞, +∞) Interpretation: [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html)  | 
| [Disparate Impact (DI)](clarify-post-training-bias-metric-di.md) | Measures the ratio of proportions of the predicted labels for the favored facet a and the disfavored facet d. | Has there been an imbalance across demographic groups in the predicted positive outcomes that might indicate bias? |  Range for normalized binary, multicategory facet, and continuous labels: [0,∞) Interpretation: [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html)  | 
| [Conditional Demographic Disparity in Predicted Labels (CDDPL)](clarify-post-training-bias-metric-cddpl.md)  | Measures the disparity of predicted labels between the facets as a whole, but also by subgroups. | Do some demographic groups have a larger proportion of rejections for loan application outcomes than their proportion of acceptances? |  The range of CDDPL values for binary, multicategory, and continuous outcomes: `[-1, +1]` [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html)  | 
| [Counterfactual Fliptest (FT)](clarify-post-training-bias-metric-ft.md)  | Examines each member of facet d and assesses whether similar members of facet a have different model predictions. | Is one group of a specific-age demographic matched closely on all features with a different age group, yet paid more on average? | The range for binary and multicategory facet labels is [-1, +1]. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html) | 
| [Accuracy Difference (AD)](clarify-post-training-bias-metric-ad.md)  | Measures the difference between the prediction accuracy for the favored and disfavored facets.  | Does the model predict labels as accurately for applications across all demographic groups? | The range for binary and multicategory facet labels is [-1, +1]. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html) | 
| [Recall Difference (RD)](clarify-post-training-bias-metric-rd.md)  | Compares the recall of the model for the favored and disfavored facets.  | Is there an age-based bias in lending due to a model having higher recall for one age group as compared to another? |  Range for binary and multicategory classification: `[-1, +1]`. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html)  | 
| [Difference in Conditional Acceptance (DCAcc)](clarify-post-training-bias-metric-dcacc.md)  | Compares the observed labels to the labels predicted by a model. Assesses whether this is the same across facets for predicted positive outcomes (acceptances).  | When comparing one age group to another, are loans accepted more frequently, or less often than predicted (based on qualifications)? |  The range for binary, multicategory facet, and continuous labels: (-∞, +∞). [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html)  | 
| [Difference in Acceptance Rates (DAR)](clarify-post-training-bias-metric-dar.md)  | Measures the difference in the ratios of the observed positive outcomes (TP) to the predicted positives (TP + FP) between the favored and disfavored facets. | Does the model have equal precision when predicting loan acceptances for qualified applicants across all age groups? | The range for binary, multicategory facet, and continuous labels is [-1, +1]. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html) | 
| [Specificity difference (SD)](clarify-post-training-bias-metric-sd.md)  | Compares the specificity of the model between favored and disfavored facets.  | Is there an age-based bias in lending because the model predicts a higher specificity for one age group as compared to another? |  Range for binary and multicategory classification: `[-1, +1]`. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html)  | 
| [Difference in Conditional Rejection (DCR)](clarify-post-training-bias-metric-dcr.md)  | Compares the observed labels to the labels predicted by a model and assesses whether this is the same across facets for negative outcomes (rejections). | Are there more or fewer rejections for loan applications than predicted for one age group as compared to another based on qualifications? | The range for binary, multicategory facet, and continuous labels: (-∞, +∞). [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html) | 
| [Difference in Rejection Rates (DRR)](clarify-post-training-bias-metric-drr.md)  | Measures the difference in the ratios of the observed negative outcomes (TN) to the predicted negatives (TN + FN) between the disfavored and favored facets. | Does the model have equal precision when predicting loan rejections for unqualified applicants across all age groups? | The range for binary, multicategory facet, and continuous labels is [-1, +1]. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html) | 
| [Treatment Equality (TE)](clarify-post-training-bias-metric-te.md)  | Measures the difference in the ratio of false positives to false negatives between the favored and disfavored facets. | In loan applications, is the relative ratio of false positives to false negatives the same across all age demographics?  | The range for binary and multicategory facet labels: (-∞, +∞). [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html) | 
| [Generalized entropy (GE)](clarify-post-training-bias-metric-ge.md)  | Measures the inequality in benefits b assigned to each input by the model predictions. | Of two candidate models for loan application classification, does one lead to a more uneven distribution of desired outcomes than the other? | The range for binary and multicategory labels: (0, 0.5). GE is undefined when the model predicts only false negatives. [\[See the AWS documentation website for more details\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/clarify-measure-post-training-bias.html) | 

For additional information about post-training bias metrics, see [A Family of Fairness Measures for Machine Learning in Finance](https://pages.awscloud.com/rs/112-TZM-766/images/Fairness.Measures.for.Machine.Learning.in.Finance.pdf).

**Topics**
+ [Difference in Positive Proportions in Predicted Labels (DPPL)](clarify-post-training-bias-metric-dppl.md)
+ [Disparate Impact (DI)](clarify-post-training-bias-metric-di.md)
+ [Difference in Conditional Acceptance (DCAcc)](clarify-post-training-bias-metric-dcacc.md)
+ [Difference in Conditional Rejection (DCR)](clarify-post-training-bias-metric-dcr.md)
+ [Specificity difference (SD)](clarify-post-training-bias-metric-sd.md)
+ [Recall Difference (RD)](clarify-post-training-bias-metric-rd.md)
+ [Difference in Acceptance Rates (DAR)](clarify-post-training-bias-metric-dar.md)
+ [Difference in Rejection Rates (DRR)](clarify-post-training-bias-metric-drr.md)
+ [Accuracy Difference (AD)](clarify-post-training-bias-metric-ad.md)
+ [Treatment Equality (TE)](clarify-post-training-bias-metric-te.md)
+ [Conditional Demographic Disparity in Predicted Labels (CDDPL)](clarify-post-training-bias-metric-cddpl.md)
+ [Counterfactual Fliptest (FT)](clarify-post-training-bias-metric-ft.md)
+ [Generalized entropy (GE)](clarify-post-training-bias-metric-ge.md)

# Difference in Positive Proportions in Predicted Labels (DPPL)


The difference in positive proportions in predicted labels (DPPL) metric determines whether the model predicts outcomes differently for each facet. It is defined as the difference between the proportion of positive predictions (y’ = 1) for facet *a* and the proportion of positive predictions (y’ = 1) for facet *d*. For example, if the model predictions grant loans to 60% of a middle-aged group (facet *a*) and to 50% of other age groups (facet *d*), it might be biased against facet *d*. In this example, you must determine whether the 10% difference is material to a case for bias. 

A comparison of difference in proportions of labels (DPL), a measure of pre-training bias, with DPPL, a measure of post-training bias, assesses whether bias in positive proportions that are initially present in the dataset changes after training. If DPPL is larger than DPL, then bias in positive proportions increased after training. If DPPL is smaller than DPL, the model did not increase bias in positive proportions after training. Comparing DPL against DPPL does not guarantee that the model reduces bias along all dimensions. For example, the model may still be biased when considering other metrics such as [Counterfactual Fliptest (FT)](clarify-post-training-bias-metric-ft.md) or [Accuracy Difference (AD)](clarify-post-training-bias-metric-ad.md). For more information about bias detection, see the blog post [Learn how Amazon SageMaker Clarify helps detect bias](https://www.amazonaws.cn/blogs/machine-learning/learn-how-amazon-sagemaker-clarify-helps-detect-bias/). See [Difference in Proportions of Labels (DPL)](clarify-data-bias-metric-true-label-imbalance.md) for more information about DPL.

The formula for the DPPL is:



        DPPL = q'a - q'd

Where:
+ q'a = n'a(1)/na is the predicted proportion of facet *a* who get a positive outcome of value 1. In our example, the proportion of the middle-aged facet predicted to be granted a loan. Here n'a(1) represents the number of members of facet *a* who get a positive predicted outcome of value 1 and na is the number of members of facet *a*. 
+ q'd = n'd(1)/nd is the predicted proportion of facet *d* who get a positive outcome of value 1. In our example, the proportion of older and younger people predicted to be granted a loan. Here n'd(1) represents the number of members of facet *d* who get a positive predicted outcome and nd is the number of members of facet *d*. 

If DPPL is close enough to 0, it means that post-training *demographic parity* has been achieved.

For binary and multicategory facet labels, the normalized DPPL values range over the interval [-1, 1]. For continuous labels, the values vary over the interval (-∞, +∞). 
+ Positive DPPL values indicate that facet *a* has a higher proportion of predicted positive outcomes when compared with facet *d*. This is referred to as *positive bias*.
+ Values of DPPL near zero indicate a more equal proportion of predicted positive outcomes between facets *a* and *d*, and a value of zero indicates perfect demographic parity. 
+ Negative DPPL values indicate that facet *d* has a higher proportion of predicted positive outcomes when compared with facet *a*. This is referred to as *negative bias*.
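The formula above can be sketched directly from the facet counts. This is a hypothetical helper for illustration, not part of the SageMaker Clarify API, and the counts are assumed to match the loan example:

```python
def dppl(n1_pred_a, n_a, n1_pred_d, n_d):
    """DPPL = q'a - q'd: difference in predicted positive proportions."""
    return n1_pred_a / n_a - n1_pred_d / n_d

# Loan example from the text: 60% of facet a (60 of 100) and 50% of
# facet d (25 of 50) are predicted to be granted loans.
print(dppl(60, 100, 25, 50))  # ~0.1, suggesting possible bias against facet d
```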

# Disparate Impact (DI)


The comparison of positive proportions in predicted labels can be assessed in the form of a ratio instead of as a difference, as it is with the [Difference in Positive Proportions in Predicted Labels (DPPL)](clarify-post-training-bias-metric-dppl.md). The disparate impact (DI) metric is defined as the ratio of the proportion of positive predictions (y’ = 1) for facet *d* to the proportion of positive predictions (y’ = 1) for facet *a*. For example, if the model predictions grant loans to 60% of a middle-aged group (facet *a*) and to 50% of other age groups (facet *d*), then DI = 0.5/0.6 ≈ 0.83, which indicates a positive bias and an adverse impact on the other age groups represented by facet *d*.

The formula for the ratio of proportions of the predicted labels:



        DI = q'd/q'a

Where:
+ q'a = n'a(1)/na is the predicted proportion of facet *a* who get a positive outcome of value 1. In our example, the proportion of the middle-aged facet predicted to be granted a loan. Here n'a(1) represents the number of members of facet *a* who get a positive predicted outcome and na is the number of members of facet *a*. 
+ q'd = n'd(1)/nd is the predicted proportion of facet *d* who get a positive outcome of value 1. In our example, the proportion of older and younger people predicted to be granted a loan. Here n'd(1) represents the number of members of facet *d* who get a positive predicted outcome and nd is the number of members of facet *d*. 

For binary, multicategory facet, and continuous labels, the DI values range over the interval [0, ∞).
+ Values less than 1 indicate that facet *a* has a higher proportion of predicted positive outcomes than facet *d*. This is referred to as *positive bias*.
+ A value of 1 indicates demographic parity. 
+ Values greater than 1 indicate that facet *d* has a higher proportion of predicted positive outcomes than facet *a*. This is referred to as *negative bias*.
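The ratio form can be sketched the same way as DPPL. This is a hypothetical helper for illustration, not part of the SageMaker Clarify API, with counts assumed to match the loan example:

```python
def disparate_impact(n1_pred_d, n_d, n1_pred_a, n_a):
    """DI = q'd / q'a: ratio of predicted positive proportions."""
    return (n1_pred_d / n_d) / (n1_pred_a / n_a)

# Loan example from the text: 50% of facet d (25 of 50) and 60% of
# facet a (60 of 100) predicted to be granted loans.
print(disparate_impact(25, 50, 60, 100))  # ~0.83; less than 1 indicates positive bias
```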

# Difference in Conditional Acceptance (DCAcc)


This metric compares the observed labels to the labels predicted by the model and assesses whether this is the same across facets for predicted positive outcomes. This metric comes close to mimicking human bias in that it quantifies how many more positive outcomes a model predicted (labels y’) for a certain facet as compared to what was observed in the training dataset (labels y). For example, if there were more acceptances (a positive outcome) observed in the training dataset for loan applications for a middle-aged group (facet *a*) than predicted by the model based on qualifications as compared to the facet containing other age groups (facet *d*), this might indicate potential bias in the way loans were approved favoring the middle-aged group. 

The formula for the difference in conditional acceptance:

        DCAcc = ca - cd

Where:
+ ca = na(1)/ n'a(1) is the ratio of the observed number of positive outcomes of value 1 (acceptances) for facet *a* to the predicted number of positive outcomes (acceptances) for facet *a*. 
+ cd = nd(1)/ n'd(1) is the ratio of the observed number of positive outcomes of value 1 (acceptances) for facet *d* to the predicted number of positive outcomes (acceptances) for facet *d*. 

The DCAcc metric can capture both positive and negative biases that reveal preferential treatment based on qualifications. Consider the following instances of age-based bias on loan acceptances.

**Example 1: Positive bias** 

Suppose we have a dataset of 100 middle-aged people (facet *a*) and 50 people from other age groups (facet *d*) who applied for loans, where the model recommended that 60 from facet *a* and 30 from facet *d* be given loans. The predicted proportions are unbiased with respect to the DPPL metric, but the observed labels show that 70 from facet *a* and 20 from facet *d* were granted loans. In other words, the observed labels show 17% more acceptances for the middle-aged facet than the model predicted (70/60 = 1.17) and 33% fewer acceptances for other age groups than the model predicted (20/30 = 0.67). The calculation of the DCAcc value gives the following:

        DCAcc = 70/60 - 20/30 = 1/2

The positive value indicates a potential bias against the middle-aged facet *a*, which receives a lower acceptance rate from the model than the observed data (taken as unbiased) indicate is warranted, as compared with facet *d*.

**Example 2: Negative bias** 

Suppose we have a dataset of 100 middle-aged people (facet *a*) and 50 people from other age groups (facet *d*) who applied for loans, where the model recommended that 60 from facet *a* and 30 from facet *d* be given loans. The predicted proportions are unbiased with respect to the DPPL metric, but the observed labels show that 50 from facet *a* and 40 from facet *d* were granted loans. In other words, the observed labels show 17% fewer acceptances for the middle-aged facet than the model predicted (50/60 = 0.83) and 33% more acceptances for other age groups than the model predicted (40/30 = 1.33). The calculation of the DCAcc value gives the following:

        DCAcc = 50/60 - 40/30 = -1/2

The negative value indicates a potential bias against facet *d*, which receives a lower acceptance rate from the model than the observed data (taken as unbiased) indicate is warranted, as compared with the middle-aged facet *a*.

Note that you can use DCAcc to help you detect potential (unintentional) biases by humans overseeing the model predictions in a human-in-the-loop setting. Assume, for example, that the predictions y' by the model were unbiased, but the eventual decision is made by a human (possibly with access to additional features) who can alter the model predictions to generate a new and final version of y'. The additional processing by the human may unintentionally deny loans to a disproportionate number from one facet. DCAcc can help detect such potential biases.

The range of values for differences in conditional acceptance for binary, multicategory facet, and continuous labels is (-∞, +∞).
+ Positive values occur when the ratio of the observed number of acceptances compared to predicted acceptances for facet *a* is higher than the same ratio for facet *d*. These values indicate a possible bias against the qualified applicants from facet *a*. The larger the difference of the ratios, the more extreme the apparent bias.
+ Values near zero occur when the ratio of the observed number of acceptances compared to predicted acceptances for facet *a* is similar to the ratio for facet *d*. These values indicate that predicted acceptance rates are consistent with the observed values in the labeled data and that qualified applicants from both facets are being accepted in a similar way. 
+ Negative values occur when the ratio of the observed number of acceptances compared to predicted acceptances for facet *a* is less than that ratio for facet *d*. These values indicate a possible bias against the qualified applicants from facet *d*. The more negative the difference in the ratios, the more extreme the apparent bias.
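The DCAcc computation can be sketched as follows. This is a hypothetical helper for illustration, not part of the SageMaker Clarify API, using the counts from Example 1:

```python
def dcacc(n1_obs_a, n1_pred_a, n1_obs_d, n1_pred_d):
    """DCAcc = ca - cd, where each c is observed/predicted acceptances."""
    return n1_obs_a / n1_pred_a - n1_obs_d / n1_pred_d

# Example 1 from the text: 70 observed vs 60 predicted acceptances for facet a,
# 20 observed vs 30 predicted acceptances for facet d.
print(dcacc(70, 60, 20, 30))  # ~0.5: possible bias against facet a
```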

# Difference in Conditional Rejection (DCR)


This metric compares the observed labels to the labels predicted by the model and assesses whether this is the same across facets for negative outcomes (rejections). This metric comes close to mimicking human bias, in that it quantifies how many more negative outcomes a model granted (predicted labels y’) to a certain facet as compared to what was suggested by the labels in the training dataset (observed labels y). For example, if there were more observed rejections (a negative outcome) for loan applications for a middle-aged group (facet *a*) than predicted by the model based on qualifications as compared to the facet containing other age groups (facet *d*), this might indicate potential bias in the way loans were rejected that favored the middle-aged group over other groups.

The formula for the difference in conditional rejection:

        DCR = rd - ra

Where:
+ rd = nd(0)/ n'd(0) is the ratio of the observed number of negative outcomes of value 0 (rejections) for facet *d* to the predicted number of negative outcomes (rejections) for facet *d*. 
+ ra = na(0)/ n'a(0) is the ratio of the observed number of negative outcomes of value 0 (rejections) for facet *a* to the predicted number of negative outcomes (rejections) for facet *a*. 

The DCR metric can capture both positive and negative biases that reveal preferential treatment based on qualifications. Consider the following instances of age-based bias on loan rejections.

**Example 1: Positive bias** 

Suppose we have a dataset of 100 middle-aged people (facet *a*) and 50 people from other age groups (facet *d*) who applied for loans, where the model recommended that 60 from facet *a* and 30 from facet *d* be rejected for loans. The predicted proportions are unbiased with respect to the DPPL metric, but the observed labels show that 50 from facet *a* and 40 from facet *d* were rejected. In other words, the observed labels show 17% fewer rejections for the middle-aged facet than the model predicted (50/60 = 0.83) and 33% more rejections for other age groups than the model predicted (40/30 = 1.33). The DCR value quantifies this difference in the ratios of observed to predicted rejections between the facets. The positive value indicates a potential bias against the middle-aged facet *a*, which receives a higher rejection rate from the model than the observed data (taken as unbiased) indicate is warranted.

        DCR = 40/30 - 50/60 = 1/2

**Example 2: Negative bias** 

Suppose we have a dataset of 100 middle-aged people (facet *a*) and 50 people from other age groups (facet *d*) who applied for loans, where the model recommended that 60 from facet *a* and 30 from facet *d* be rejected for loans. The predicted proportions are unbiased with respect to the DPPL metric, but the observed labels show that 70 from facet *a* and 20 from facet *d* were rejected. In other words, the observed labels show 17% more rejections for the middle-aged facet than the model predicted (70/60 = 1.17) and 33% fewer rejections for other age groups than the model predicted (20/30 = 0.67). The negative value indicates a potential bias against facet *d*, which receives a higher rejection rate from the model than the observed data (taken as unbiased) indicate is warranted.

        DCR = 20/30 - 70/60 = -1/2

The range of values for differences in conditional rejection for binary, multicategory facet, and continuous labels is (-∞, +∞).
+ Positive values occur when the ratio of the observed number of rejections compared to predicted rejections for facet *d* is greater than that ratio for facet *a*. These values indicate a possible bias against the qualified applicants from facet *a*. The larger the value of the DCR metric, the more extreme the apparent bias.
+ Values near zero occur when the ratio of the observed number of rejections compared to predicted rejections for facet *a* is similar to the ratio for facet *d*. These values indicate that predicted rejection rates are consistent with the observed values in the labeled data and that the qualified applicants from both facets are being rejected in a similar way. 
+ Negative values occur when the ratio of the observed number of rejections compared to predicted rejections for facet *d* is less than that ratio for facet *a*. These values indicate a possible bias against the qualified applicants from facet *d*. The larger the magnitude of the negative DCR metric, the more extreme the apparent bias.
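The DCR computation can be sketched the same way as DCAcc. This is a hypothetical helper for illustration, not part of the SageMaker Clarify API, using the counts from Example 1:

```python
def dcr(n0_obs_d, n0_pred_d, n0_obs_a, n0_pred_a):
    """DCR = rd - ra, where each r is observed/predicted rejections."""
    return n0_obs_d / n0_pred_d - n0_obs_a / n0_pred_a

# Example 1 from the text: 40 observed vs 30 predicted rejections for facet d,
# 50 observed vs 60 predicted rejections for facet a.
print(dcr(40, 30, 50, 60))  # ~0.5: possible bias against facet a
```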

 

# Specificity difference (SD)


The specificity difference (SD) is the difference in specificity between the favored facet *a* and disfavored facet *d*. Specificity measures how often the model correctly predicts a negative outcome (y'=0). Any difference in these specificities is a potential form of bias. 

Specificity is perfect for a facet if all of the y=0 cases are correctly predicted for that facet. Specificity is greater when the model minimizes false positives, known as Type I errors. For example, a low specificity for lending to facet *a* combined with a high specificity for lending to facet *d* is a measure of bias against facet *d*.

The following formula is for the difference in the specificity for facets *a* and *d*.

        SD = TNd/(TNd + FPd) - TNa/(TNa + FPa) = TNRd - TNRa

The variables used to calculate SD are defined as follows:
+ TNd are the true negatives predicted for facet *d*.
+ FPd are the false positives predicted for facet *d*.
+ TNa are the true negatives predicted for facet *a*.
+ FPa are the false positives predicted for facet *a*.
+ TNRa = TNa/(TNa + FPa) is the true negative rate, also known as the specificity, for facet *a*.
+ TNRd = TNd/(TNd + FPd) is the true negative rate, also known as the specificity, for facet *d*.

For example, consider the following confusion matrices for facets *a* and *d*.

Confusion matrix for the favored facet `a`


| Class a predictions | Actual outcome 0 | Actual outcome 1 | Total  | 
| --- | --- | --- | --- | 
| 0 | 20 | 5 | 25 | 
| 1 | 10 | 65 | 75 | 
| Total | 30 | 70 | 100 | 

Confusion matrix for the disfavored facet `d`


| Class d predictions | Actual outcome 0 | Actual outcome 1 | Total  | 
| --- | --- | --- | --- | 
| 0 | 18 | 7 | 25 | 
| 1 | 5 | 20 | 25 | 
| Total | 23 | 27 | 50 | 

The value of the specificity difference is `SD = 18/(18+5) - 20/(20+10) = 0.7826 - 0.6667 = 0.1159`, which indicates a bias against facet *d*.
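The worked value above can be reproduced with a short sketch. This is a hypothetical helper for illustration, not part of the SageMaker Clarify API, using the confusion-matrix values from the tables:

```python
def specificity_difference(tn_d, fp_d, tn_a, fp_a):
    """SD = TNRd - TNRa, the difference in specificity between facets."""
    return tn_d / (tn_d + fp_d) - tn_a / (tn_a + fp_a)

# Confusion-matrix values from the tables above:
# facet d: TN = 18, FP = 5; facet a: TN = 20, FP = 10.
print(round(specificity_difference(18, 5, 20, 10), 4))  # 0.1159
```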

The range of values for the specificity difference between facets *a* and *d* for binary and multicategory classification is `[-1, +1]`. This metric is not available for the case of continuous labels. Here is what different values of SD imply:
+ Positive values are obtained when there is higher specificity for facet *d* than for facet *a*. This suggests that the model finds fewer false positives for facet *d* than for facet *a*. A positive value indicates bias against facet *d*. 
+ Values near zero indicate that the specificity for facets that are being compared is similar. This suggests that the model finds a similar number of false positives in both of these facets and is not biased.
+ Negative values are obtained when there is higher specificity for facet *a* than for facet *d*. This suggests that the model finds fewer false positives for facet *a* than for facet *d*. A negative value indicates bias against facet *a*. 

# Recall Difference (RD)


The recall difference (RD) metric is the difference in recall of the model between the favored facet *a* and disfavored facet *d*. Any difference in these recalls is a potential form of bias. Recall is the true positive rate (TPR), which measures how often the model correctly predicts the cases that should receive a positive outcome. Recall is perfect for a facet if all of the y=1 cases are correctly predicted as y’=1 for that facet. Recall is greater when the model minimizes false negatives, known as Type II errors. For example, how many of the people in two different groups (facets *a* and *d*) that should qualify for loans are detected correctly by the model? If the recall rate is high for lending to facet *a*, but low for lending to facet *d*, the difference provides a measure of this bias against the group belonging to facet *d*. 

The formula for difference in the recall rates for facets *a* and *d*:

        RD = TPa/(TPa + FNa) - TPd/(TPd + FNd) = TPRa - TPRd 

Where:
+ TPa are the true positives predicted for facet *a*.
+ FNa are the false negatives predicted for facet *a*.
+ TPd are the true positives predicted for facet *d*.
+ FNd are the false negatives predicted for facet *d*.
+ TPRa = TPa/(TPa + FNa) is the recall for facet *a*, or its true positive rate.
+ TPRd = TPd/(TPd + FNd) is the recall for facet *d*, or its true positive rate.

For example, consider the following confusion matrices for facets *a* and *d*.

Confusion Matrix for the Favored Facet a


| Class a predictions | Actual outcome 0 | Actual outcome 1 | Total  | 
| --- | --- | --- | --- | 
| 0 | 20 | 5 | 25 | 
| 1 | 10 | 65 | 75 | 
| Total | 30 | 70 | 100 | 

Confusion Matrix for the Disfavored Facet d


| Class d predictions | Actual outcome 0 | Actual outcome 1 | Total  | 
| --- | --- | --- | --- | 
| 0 | 18 | 7 | 25 | 
| 1 | 5 | 20 | 25 | 
| Total | 23 | 27 | 50 | 

The value of the recall difference is `RD = 65/70 - 20/27 = 0.93 - 0.74 = 0.19`, which indicates a bias against facet *d*.

The range of values for the recall difference between facets *a* and *d* for binary and multicategory classification is `[-1, +1]`. This metric is not available for the case of continuous labels. Here is what different values of RD imply:
+ Positive values are obtained when there is higher recall for facet *a* than for facet *d*. This suggests that the model finds more of the true positives for facet *a* than for facet *d*, which is a form of bias. 
+ Values near zero indicate that the recall for facets being compared is similar. This suggests that the model finds about the same number of true positives in both of these facets and is not biased.
+ Negative values are obtained when there is higher recall for facet *d* than for facet *a*. This suggests that the model finds more of the true positives for facet *d* than for facet *a*, which is a form of bias. 
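A minimal Python sketch (illustrative, not the Clarify API) of the RD computation, using the counts from the confusion matrices above:

```
# Illustrative recall difference (RD) computation.
# Facet a: TP=65, FN=5; facet d: TP=20, FN=7 (from the matrices above).

def recall(tp, fn):
    """Recall (true positive rate): TP / (TP + FN)."""
    return tp / (tp + fn)

def recall_difference(tp_a, fn_a, tp_d, fn_d):
    """RD = TPRa - TPRd; positive values suggest bias against facet d."""
    return recall(tp_a, fn_a) - recall(tp_d, fn_d)

rd = recall_difference(tp_a=65, fn_a=5, tp_d=20, fn_d=7)
print(round(rd, 2))  # 0.19
```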

# Difference in Acceptance Rates (DAR)


The difference in acceptance rates (DAR) metric is the difference in the ratios of the true positive (TP) predictions to the predicted positives (TP + FP) for facets *a* and *d*. This metric measures the difference in the precision of the model for predicting acceptances from these two facets. Precision measures the fraction of candidates predicted to be qualified (the predicted positives) that are actually qualified. If the model precision for predicting qualified applicants diverges between the facets, this is a bias and its magnitude is measured by the DAR.

The formula for difference in acceptance rates between facets *a* and *d*:

        DAR = TPa/(TPa + FPa) - TPd/(TPd + FPd) 

Where:
+ TPa are the true positives predicted for facet *a*.
+ FPa are the false positives predicted for facet *a*.
+ TPd are the true positives predicted for facet *d*.
+ FPd are the false positives predicted for facet *d*.

For example, suppose the model accepts 70 middle-aged applicants (facet *a*) for a loan (predicted positive labels) of whom only 35 are actually qualified (observed positive labels). Also suppose the model accepts 100 applicants from other age demographics (facet *d*) for a loan (predicted positive labels) of whom only 40 are actually qualified (observed positive labels). Then DAR = 35/70 - 40/100 = 0.50 - 0.40 = 0.10, which indicates a potential bias against qualified people from the second age group (facet *d*).

The range of values for DAR for binary, multicategory facet, and continuous labels is `[-1, +1]`.
+ Positive values occur when the ratio of the predicted positives (acceptances) to the observed positive outcomes (qualified applicants) for facet *a* is larger than the same ratio for facet *d*. These values indicate a possible bias against the disfavored facet *d* caused by the occurrence of relatively more false positives in facet *d*. The larger the difference in the ratios, the more extreme the apparent bias.
+ Values near zero occur when the ratio of the predicted positives (acceptances) to the observed positive outcomes (qualified applicants) for facets *a* and *d* have similar values indicating the observed labels for positive outcomes are being predicted with equal precision by the model.
+ Negative values occur when the ratio of the predicted positives (acceptances) to the observed positive outcomes (qualified applicants) for facet *d* is larger than the same ratio for facet *a*. These values indicate a possible bias against the favored facet *a* caused by the occurrence of relatively more false positives in facet *a*. The more negative the difference in the ratios, the more extreme the apparent bias.
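The DAR arithmetic from the loan example can be sketched as follows (illustrative only; the FP counts are derived from the example's totals):

```
# Illustrative difference in acceptance rates (DAR) computation.
# Worked example: facet a has 35 qualified of 70 accepted (FPa = 35);
# facet d has 40 qualified of 100 accepted (FPd = 60).

def acceptance_rate(tp, fp):
    """Precision on acceptances: TP / (TP + FP)."""
    return tp / (tp + fp)

def dar(tp_a, fp_a, tp_d, fp_d):
    """DAR = acceptance-rate precision for facet a minus that for facet d."""
    return acceptance_rate(tp_a, fp_a) - acceptance_rate(tp_d, fp_d)

print(round(dar(tp_a=35, fp_a=35, tp_d=40, fp_d=60), 2))  # 0.1
```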

# Difference in Rejection Rates (DRR)


The difference in rejection rates (DRR) metric is the difference in the ratios of the true negative (TN) predictions to the predicted negatives (TN + FN) for facets *a* and *d*. This metric measures the difference in the precision of the model for predicting rejections from these two facets. Precision here measures the fraction of candidates predicted to be unqualified (the predicted negatives) that are actually unqualified. If the model precision for predicting unqualified applicants diverges between the facets, this is a bias and its magnitude is measured by the DRR.

The formula for difference in rejection rates between facets *a* and *d*:

        DRR = TNd/(TNd + FNd) - TNa/(TNa + FNa) 

The components for the previous DRR equation are as follows.
+ TNd are the true negatives predicted for facet *d*.
+ FNd are the false negatives predicted for facet *d*.
+ TNa are the true negatives predicted for facet *a*.
+ FNa are the false negatives predicted for facet *a*.

For example, suppose the model rejects 100 middle-aged applicants (facet *a*) for a loan (predicted negative labels) of whom 80 are actually unqualified (observed negative labels). Also suppose the model rejects 50 applicants from other age demographics (facet *d*) for a loan (predicted negative labels) of whom only 40 are actually unqualified (observed negative labels). Then DRR = 40/50 - 80/100 = 0.80 - 0.80 = 0, so no bias is indicated.

The range of values for DRR for binary, multicategory facet, and continuous labels is `[-1, +1]`.
+ Positive values occur when the ratio of the predicted negatives (rejections) to the observed negative outcomes (unqualified applicants) for facet *d* is larger than the same ratio for facet *a*. These values indicate a possible bias against the favored facet *a* caused by the occurrence of relatively more false negatives in facet *a*. The larger the difference in the ratios, the more extreme the apparent bias.
+ Values near zero occur when the ratio of the predicted negatives (rejections) to the observed negative outcomes (unqualified applicants) for facets *a* and *d* have similar values, indicating the observed labels for negative outcomes are being predicted with equal precision by the model.
+ Negative values occur when the ratio of the predicted negatives (rejections) to the observed negative outcomes (unqualified applicants) for facet *a* is larger than the same ratio for facet *d*. These values indicate a possible bias against the disfavored facet *d* caused by the occurrence of relatively more false negatives in facet *d*. The more negative the difference in the ratios, the more extreme the apparent bias.
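The DRR arithmetic from the rejection example can be sketched the same way (illustrative only; FN counts are derived from the example's totals):

```
# Illustrative difference in rejection rates (DRR) computation.
# Worked example: facet d has 40 unqualified of 50 rejected (FNd = 10);
# facet a has 80 unqualified of 100 rejected (FNa = 20).

def rejection_rate(tn, fn):
    """Precision on rejections: TN / (TN + FN)."""
    return tn / (tn + fn)

def drr(tn_d, fn_d, tn_a, fn_a):
    """DRR = rejection-rate precision for facet d minus that for facet a."""
    return rejection_rate(tn_d, fn_d) - rejection_rate(tn_a, fn_a)

print(drr(tn_d=40, fn_d=10, tn_a=80, fn_a=20))  # 0.0, so no bias indicated
```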

# Accuracy Difference (AD)


The accuracy difference (AD) metric is the difference between the prediction accuracy for different facets. This metric determines whether the classification by the model is more accurate for one facet than the other. AD indicates whether one facet incurs a greater proportion of Type I and Type II errors, but it cannot differentiate between the two error types. For example, the model may have equal accuracy for different age demographics, but the errors may be mostly false positives (Type I errors) for one age-based group and mostly false negatives (Type II errors) for the other. 

Also, if loan approvals are made with much higher accuracy for a middle-aged demographic (facet *a*) than for another age-based demographic (facet *d*), either a greater proportion of qualified applicants in the second group are denied a loan (FN) or a greater proportion of unqualified applicants from that group get a loan (FP) or both. This can lead to within group unfairness for the second group, even if the proportion of loans granted is nearly the same for both age-based groups, which is indicated by a DPPL value that is close to zero.

The formula for AD metric is the difference between the prediction accuracy for facet *a*, ACCa, minus that for facet *d*, ACCd:

        AD = ACCa - ACCd

Where:
+ ACCa = (TPa + TNa)/(TPa + TNa + FPa + FNa) 
  + TPa are the true positives predicted for facet *a*
  + TNa are the true negatives predicted for facet *a*
  + FPa are the false positives predicted for facet *a*
  + FNa are the false negatives predicted for facet *a*
+ ACCd = (TPd + TNd)/(TPd + TNd + FPd + FNd)
  + TPd are the true positives predicted for facet *d*
  + TNd are the true negatives predicted for facet *d*
  + FPd are the false positives predicted for facet *d*
  + FNd are the false negatives predicted for facet *d*

For example, suppose a model approves loans for 70 of 100 applicants from facet *a* and rejects the other 30. Of the 70 approved, 10 should not have been offered the loan (FPa) and 60 were correctly approved (TPa). Of the 30 rejected, 20 should have been approved (FNa) and 10 were correctly rejected (TNa). The accuracy for facet *a* is as follows:

        ACCa = (60 + 10)/(60 + 10 + 20 + 10) = 0.7

Next, suppose the model approves loans for 50 of 100 applicants from facet *d* and rejects the other 50. Of the 50 approved, 10 should not have been offered the loan (FPd) and 40 were correctly approved (TPd). Of the 50 rejected, 40 should have been approved (FNd) and 10 were correctly rejected (TNd). The accuracy for facet *d* is determined as follows:

        ACCd = (40 + 10)/(40 + 10 + 40 + 10) = 0.5

The accuracy difference is thus AD = ACCa - ACCd = 0.7 - 0.5 = 0.2. This indicates there is a bias against facet *d* as the metric is positive.

The range of values for AD for binary and multicategory facet labels is `[-1, +1]`.
+ Positive values occur when the prediction accuracy for facet *a* is greater than that for facet *d*. It means that facet *d* suffers more from some combination of false positives (Type I errors) or false negatives (Type II errors). This means there is a potential bias against the disfavored facet *d*.
+ Values near zero occur when the prediction accuracy for facet *a* is similar to that for facet *d*.
+ Negative values occur when the prediction accuracy for facet *d* is greater than that for facet *a*. It means that facet *a* suffers more from some combination of false positives (Type I errors) or false negatives (Type II errors). This means there is a potential bias against the favored facet *a*.
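The AD arithmetic from the loan example can be sketched as follows (illustrative only, not the Clarify API):

```
# Illustrative accuracy difference (AD) computation using the
# confusion-matrix counts from the loan example above.

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

acc_a = accuracy(tp=60, tn=10, fp=10, fn=20)  # facet a: 0.7
acc_d = accuracy(tp=40, tn=10, fp=10, fn=40)  # facet d: 0.5
ad = acc_a - acc_d
print(round(ad, 2))  # 0.2, a positive value suggesting bias against facet d
```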

# Treatment Equality (TE)


The treatment equality (TE) metric is the difference in the ratio of false negatives to false positives between facets *a* and *d*. The main idea of this metric is to assess whether, even if accuracy is the same across groups, errors are more harmful to one group than to another. The error rate comes from the total of false positives and false negatives, but the breakdown between these two may be very different across facets. TE measures whether errors compensate across facets in similar or different ways. 

The formula for the treatment equality:

        TE = FNd/FPd - FNa/FPa

Where:
+ FNd are the false negatives predicted for facet *d*.
+ FPd are the false positives predicted for facet *d*.
+ FNa are the false negatives predicted for facet *a*.
+ FPa are the false positives predicted for facet *a*.

Note the metric becomes unbounded if FPa or FPd is zero.

For example, suppose that there are 100 loan applicants from facet *a* and 50 from facet *d*. For facet *a*, 8 were wrongly denied a loan (FNa) and another 6 were wrongly approved (FPa). The remaining predictions were true, so TPa + TNa = 86. For facet *d*, 5 were wrongly denied (FNd) and 2 were wrongly approved (FPd). The remaining predictions were true, so TPd + TNd = 43. The ratio of false negatives to false positives equals 8/6 = 1.33 for facet *a* and 5/2 = 2.5 for facet *d*. Hence TE = 2.5 - 1.33 = 1.167, even though both facets have the same accuracy:

        ACCa = 86/(86 + 8 + 6) = 0.86

        ACCd = 43/(43 + 5 + 2) = 0.86

The range of values for the treatment equality metric for binary and multicategory facet labels is (-∞, +∞). The TE metric is not defined for continuous labels. The interpretation of this metric depends on the relative importance of false positives (Type I errors) and false negatives (Type II errors). 
+ Positive values occur when the ratio of false negatives to false positives for facet *d* is greater than that for facet *a*. 
+ Values near zero occur when the ratio of false negatives to false positives for facet *a* is similar to that for facet *d*. 
+ Negative values occur when the ratio of false negatives to false positives for facet *d* is less than that for facet *a*.
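The TE computation from the example can be sketched as follows (illustrative only; the zero-FP guard reflects the unboundedness noted above):

```
# Illustrative treatment equality (TE) computation from the example:
# facet d: FNd=5, FPd=2; facet a: FNa=8, FPa=6.

def treatment_equality(fn_d, fp_d, fn_a, fp_a):
    """TE = FNd/FPd - FNa/FPa; unbounded when either FP count is zero."""
    if fp_d == 0 or fp_a == 0:
        raise ZeroDivisionError("TE is unbounded when a facet has no false positives")
    return fn_d / fp_d - fn_a / fp_a

print(round(treatment_equality(fn_d=5, fp_d=2, fn_a=8, fp_a=6), 3))  # 1.167
```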

**Note**  
A previous version stated that the Treatment Equality metric is computed as FPa / FNa - FPd / FNd instead of FNd / FPd - FNa / FPa. Either version can be used. For more information, see [https://pages.awscloud.com/rs/112-TZM-766/images/Fairness.Measures.for.Machine.Learning.in.Finance.pdf](https://pages.awscloud.com/rs/112-TZM-766/images/Fairness.Measures.for.Machine.Learning.in.Finance.pdf).

# Conditional Demographic Disparity in Predicted Labels (CDDPL)


The demographic disparity in predicted labels (DDPL) metric determines whether facet *d* has a larger proportion of the predicted rejected labels than of the predicted accepted labels. It enables a comparison of the difference in predicted rejection proportions and predicted acceptance proportions across facets. This metric is exactly the same as the pre-training CDD metric except that it is computed from the predicted labels instead of the observed ones. This metric lies in the range `(-1, +1)`.

The formula for the demographic disparity predictions for labels of facet *d* is as follows: 

        DDPLd = n'd(0)/n'(0) - n'd(1)/n'(1) = PdR(y'0) - PdA(y'1) 

Where: 
+ n'(0) = n'a(0) + n'd(0) is the number of predicted rejected labels for facets *a* and *d*.
+ n'(1) = n'a(1) + n'd(1) is the number of predicted accepted labels for facets *a* and *d*.
+ PdR(y'0) is the proportion of predicted rejected labels (value 0) in facet *d*.
+ PdA(y'1) is the proportion of predicted accepted labels (value 1) in facet *d*.

A conditional demographic disparity in predicted labels (CDDPL) metric that conditions DDPL on attributes that define a strata of subgroups on the dataset is needed to rule out Simpson's paradox. The regrouping can provide insights into the cause of apparent demographic disparities for less favored facets. The classic case arose in the case of Berkeley admissions where men were accepted at a higher rate overall than women. But when departmental subgroups were examined, women were shown to have higher admission rates than men by department. The explanation was that women had applied to departments with lower acceptance rates than men had. Examining the subgroup acceptance rates revealed that women were actually accepted at a higher rate than men for the departments with lower acceptance rates.

The CDDPL metric gives a single measure for all of the disparities found in the subgroups defined by an attribute of a dataset by averaging them. It is defined as the weighted average of the demographic disparities in predicted labels (DDPLi) for each of the subgroups, with each subgroup disparity weighted in proportion to the number of observations it contains. The formula for the conditional demographic disparity in predicted labels is as follows:

        CDDPL = (1/n)*∑i ni*DDPLi 

Where: 
+ ∑i ni = n is the total number of observations and ni is the number of observations for subgroup i.
+ DDPLi = n'i(0)/n'(0) - n'i(1)/n'(1) = PiR(y'0) - PiA(y'1) is the demographic disparity in predicted labels for subgroup i.

So the demographic disparity in predicted labels for a subgroup (DDPLi) is the difference between the proportion of predicted rejected labels and the proportion of predicted accepted labels for that subgroup. 

The range of DDPL values for binary, multicategory, and continuous outcomes is `[-1, +1]`. 
+ +1: when there are no predicted rejection labels for facet *a* or subgroup and no predicted acceptances for facet *d* or subgroup.
+ Positive values indicate there is a demographic disparity in predicted labels as facet *d* or subgroup has a larger proportion of the predicted rejected labels than of the predicted accepted labels. The higher the value the greater the disparity.
+ Values near zero indicate there is no demographic disparity on average.
+ Negative values indicate there is a demographic disparity in predicted labels as facet *a* or subgroup has a larger proportion of the predicted rejected labels than of the predicted accepted labels. The lower the value the greater the disparity.
+ -1: when there are no predicted rejection labels for facet *d* or subgroup and no predicted acceptances for facet *a* or subgroup.
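The weighted-average step can be sketched as follows. This is illustrative only; the two strata and their DDPL values are hypothetical, chosen just to show how subgroup disparities combine:

```
# Illustrative CDDPL computation: weighted average of per-subgroup
# DDPL values, weighted by subgroup size.

def cddpl(subgroups):
    """subgroups: list of (n_i, ddpl_i) pairs, where n_i is the
    number of observations in subgroup i."""
    n = sum(n_i for n_i, _ in subgroups)
    return sum(n_i * d_i for n_i, d_i in subgroups) / n

# Two hypothetical strata: one with DDPL 0.10, one with DDPL -0.05.
print(round(cddpl([(60, 0.10), (40, -0.05)]), 2))  # 0.04
```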

# Counterfactual Fliptest (FT)


The fliptest is an approach that looks at each member of facet *d* and assesses whether similar members of facet *a* have different model predictions. The members of facet *a* are chosen to be k-nearest neighbors of the observation from facet *d*. We assess how many nearest neighbors of the opposite group receive a different prediction, where the flipped prediction can go from positive to negative and vice versa. 

The formula for the counterfactual fliptest is the difference in the cardinality of two sets divided by the number of members of facet *d*:

        FT = (F+ - F-)/nd

Where:
+ F+ is the number of disfavored facet *d* members with an unfavorable outcome whose nearest neighbors in favored facet *a* received a favorable outcome. 
+ F- is the number of disfavored facet *d* members with a favorable outcome whose nearest neighbors in favored facet *a* received an unfavorable outcome. 
+ nd is the sample size of facet *d*.

The range of values for the counterfactual fliptest for binary and multicategory facet labels is `[-1, +1]`. For continuous labels, we set a threshold to collapse the labels to binary.
+ Positive values occur when the number of unfavorable counterfactual fliptest decisions for the disfavored facet *d* exceeds the favorable ones. 
+ Values near zero occur when the number of unfavorable and favorable counterfactual fliptest decisions balance out.
+ Negative values occur when the number of unfavorable counterfactual fliptest decisions for the disfavored facet *d* is less than the favorable ones.
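The counting step of the fliptest can be sketched as below. This is a simplified illustration that assumes the k-nearest-neighbor search has already been performed; the sample predictions are hypothetical:

```
# Illustrative fliptest counting step. d_preds[i] is the model's
# prediction (1 favorable, 0 unfavorable) for the i-th member of
# facet d; neighbor_preds[i] is the prediction for that member's
# nearest neighbors in facet a, collapsed to a single 0/1 value.

def fliptest(d_preds, neighbor_preds):
    """FT = (F+ - F-)/nd over the paired predictions."""
    f_plus = sum(1 for d, a in zip(d_preds, neighbor_preds) if d == 0 and a == 1)
    f_minus = sum(1 for d, a in zip(d_preds, neighbor_preds) if d == 1 and a == 0)
    return (f_plus - f_minus) / len(d_preds)

print(fliptest([0, 0, 1, 1], [1, 0, 0, 1]))  # 0.0: the flips balance out
```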

# Generalized entropy (GE)


The generalized entropy index (GE) measures the inequality in benefit `b` for the predicted label compared to the observed label. A benefit occurs when a false positive is predicted. A false positive occurs when a negative observation (y=0) has a positive prediction (y'=1). A benefit also occurs when the observed and predicted labels are the same, the cases known as true positives and true negatives. No benefit occurs when a false negative is predicted. A false negative occurs when a positive observation (y=1) is predicted to have a negative outcome (y'=0). The benefit `b` is defined as follows.

```
 b = y' - y + 1
```

Using this definition, a false positive receives a benefit `b` of `2`, and a false negative receives a benefit of `0`. Both a true positive and a true negative receive a benefit of `1`.

The GE metric is computed following the [Generalized Entropy Index](https://en.wikipedia.org/wiki/Generalized_entropy_index) (GE) with the weight `alpha` set to `2`. This weight controls the sensitivity to different benefit values. A smaller `alpha` means an increased sensitivity to smaller values.

![\[Equation defining generalized entropy index with alpha parameter set to 2.\]](http://docs.amazonaws.cn/en_us/sagemaker/latest/dg/images/clarify-post-training-bias-metric-ge.png)


The variables used to calculate GE are defined as follows:
+ bi is the benefit received by the `ith` data point.
+ b' is the mean of all benefits.

GE can range from 0 to 0.5, where a value of zero indicates no inequality in benefits across all data points. This occurs either when all inputs are correctly predicted or when all the predictions are false positives. GE is undefined when all predictions are false negatives.
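The computation can be sketched as below, following the standard generalized entropy index formula with `alpha = 2` (illustrative only, not the Clarify implementation):

```
# Illustrative generalized entropy index over benefit values b = y' - y + 1.

def benefit(y_true, y_pred):
    """b = y' - y + 1: FP -> 2, TP/TN -> 1, FN -> 0."""
    return y_pred - y_true + 1

def generalized_entropy(benefits, alpha=2):
    """GE = (1/(n*alpha*(alpha-1))) * sum((b_i/mean)^alpha - 1)."""
    n = len(benefits)
    mu = sum(benefits) / n  # undefined (division by zero below) if all FN
    return sum((b / mu) ** alpha - 1 for b in benefits) / (n * alpha * (alpha - 1))

# All predictions correct -> every benefit is 1 -> no inequality.
print(generalized_entropy([1, 1, 1, 1]))  # 0.0
# One false positive and one false negative -> maximum inequality.
print(generalized_entropy([2, 0]))  # 0.5
```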

**Note**  
The metric GE does not depend on a facet value being either favored or disfavored.