As machine learning technology progresses rapidly, there is much interest in understanding its societal impact. A particularly successful branch of machine learning is
supervised learning. With enough past data and computational resources, learning algorithms often produce surprisingly effective predictors of future events. To take one hypothetical example: an algorithm could, for example, be used to predict with high accuracy who will pay back their loan. Lenders might then use such a predictor as an aid in deciding who should receive a loan in the first place. Decisions based on machine learning can be both incredibly useful and have a profound impact on our lives.
Even the best predictors make mistakes. Although machine learning aims to minimize the chance of a mistake, how do we prevent certain groups from experiencing a disproportionate share of these mistakes? Consider the case of a group that we have relatively little data on and whose characteristics differ from those of the general population in ways that are relevant to the prediction task. As prediction accuracy is generally correlated with the amount of data available for training, it is likely that incorrect predictions will be more common in this group. A predictor might, for example, end up flagging too many individuals in this group as ‘high risk of default’ even though they pay back their loan. When group membership coincides with a sensitive attribute, such as race, gender, disability, or religion, this situation can lead to unjust or prejudicial outcomes.
Despite the need, a vetted methodology in machine learning for preventing this kind of discrimination based on sensitive attributes has been lacking. A naive approach might require a set of sensitive attributes to be removed from the data before doing anything else with it. This idea of “fairness through unawareness,” however, fails due to the existence of “redundant encodings.” Even if a particular attribute is not present in the data, combinations of other attributes can act as a proxy.
Another common approach, called demographic parity, asks that the prediction must be uncorrelated with the sensitive attribute. This might sound intuitively desirable, but the outcome itself is often correlated with the sensitive attribute. For example, the incidence of heart failure is substantially more common in men than in women. When predicting such a medical condition, it is therefore neither realistic nor desirable to prevent all correlation between the predicted outcome and group membership.
Taking these conceptual difficulties into account, we’ve proposed a methodology for measuring and preventing discrimination based on a set of sensitive attributes. Our framework not only helps to scrutinize predictors to discover possible concerns. We also show how to adjust a given predictor so as to strike a better tradeoff between classification accuracy and non-discrimination if need be.
At the heart of our approach is the idea that individuals who qualify for a desirable outcome should have an equal chance of being correctly classified for this outcome. In our fictional loan example, it means the rate of ‘low risk’ predictions among people who actually pay back their loan should not depend on a sensitive attribute like race or gender. We call this principle equality of opportunity in supervised learning.
When implemented, our framework also improves incentives by shifting the cost of poor predictions from the individual to the decision maker, who can respond by investing in improved prediction accuracy. Perfect predictors always satisfy our notion, showing that the central goal of building more accurate predictors is well aligned with the goal of avoiding discrimination.