Data as a source of discrimination: Is Artificial Intelligence being inclusive?

Artificial intelligence is already being used to make decisions that improve our lives. However, this use can also produce complex unintended consequences, raising concerns about discrimination.

The word discrimination traces back to the Greek διάκριση, meaning “to distinguish between”. The very concept therefore implies a sense of separation, expressed in real unequal treatment of individuals in society.

Throughout the long history of humanity, discrimination rooted in the belief in the hegemony of one group over another has played a major role in society, preventing the guarantee of individual dignity. For centuries, equality was seen as a mere idealization, a simple utopia, and society was structured accordingly.

It was only in the post-World War II era, after countless events that undermined human dignity, that the foundations of equality among all human beings began to be laid, most notably on 10 December 1948, with the proclamation of the Universal Declaration of Human Rights by the United Nations.

This milestone paved the way for further research on discrimination by UNESCO between 1950 and 1960, which scientifically dismantled the beliefs used to establish or legitimize privileges of one human group over another.

Nonetheless, until the 1970s, race, nationality, and skin color were still routinely used as factors to make predictions about people.

Since then, the concept has evolved through social, political, and economic developments, shaping the protection against discrimination under Portuguese law, namely in Article 13 of the CRP (Constitution of the Portuguese Republic). At the European and international levels, this protection is primarily found in the Charter of Fundamental Rights of the European Union, the OECD principles, the European Convention on Human Rights, and the Universal Declaration of Human Rights.

Despite the efforts to combat discrimination, discriminatory decisions remain systemically reflected in data, and especially in digital data. Today’s research, interpretations, and statistics mirror the conceptions of the past, and they in turn become the data used to train algorithms. As an example, historical employment data shows men being promoted more often than women. Trained on this historically biased record of the labor market, a machine learning system will continue to favor men: based on the information it is fed, it will conclude that women are worse hires, simply because they were promoted less often.
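To make this mechanism concrete, here is a minimal, purely illustrative sketch. The data is synthetic and scikit-learn is assumed as the modelling library; nothing below comes from any real hiring system. A classifier trained to reproduce biased promotion decisions learns gender itself as a “pattern”:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical historical records: a skill score distributed identically
# for both groups, and gender encoded as 1 = man, 0 = woman.
gender = rng.integers(0, 2, n)
skill = rng.normal(0, 1, n)

# Past promotion decisions were biased: at the same skill level,
# men were promoted more often than women.
promoted = (skill + 1.0 * gender + rng.normal(0, 1, n)) > 1.0

# A model trained to reproduce those decisions absorbs the bias as a "pattern".
model = LogisticRegression().fit(np.column_stack([gender, skill]), promoted)

# Two identical candidates, differing only in gender, receive different scores.
candidates = np.array([[1, 0.5], [0, 0.5]])   # [gender, skill]
print(model.predict_proba(candidates)[:, 1])  # the male candidate scores higher
```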

In other words, artificial intelligence systems can perpetuate discrimination when the data sample used is itself already biased. Moreover, through over- or under-sampling, the system ends up working with a poor representation of reality. For instance, a dataset containing records relating only to men will inevitably lead to results that say little about women’s reality, if we are also trying to obtain results for women.
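A second illustrative sketch of the under-sampling problem, again with synthetic data and scikit-learn assumed: when one group dominates the training sample and the underlying pattern differs for the other group, the model’s accuracy collapses precisely for the group that is under-represented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_group(n, direction):
    """Hypothetical group whose feature/outcome relationship has the given sign."""
    x = rng.normal(0, 1, (n, 1))
    y = (direction * x[:, 0] + rng.normal(0, 0.5, n)) > 0
    return x, y

# Training sample: 4,900 records from group A, only 100 from group B.
x_a, y_a = make_group(4_900, +1.0)   # over-represented group
x_b, y_b = make_group(100, -1.0)     # under-represented group
model = LogisticRegression().fit(np.vstack([x_a, x_b]), np.concatenate([y_a, y_b]))

# Evaluated on fresh samples, the model works for A and fails for B.
x_a_test, y_a_test = make_group(1_000, +1.0)
x_b_test, y_b_test = make_group(1_000, -1.0)
print("accuracy, group A:", model.score(x_a_test, y_a_test))  # high
print("accuracy, group B:", model.score(x_b_test, y_b_test))  # far below group A
```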

Technology advances at an overwhelming speed, which exposes and aggravates systemic bias in society through machine learning systems. One can look at the COMPAS[1] software used in the United States, where risk assessment scores have become one of the strongest drivers of discrimination. A study by ProPublica found that the formula used by the algorithm was clearly prone to mislabeling defendants according to their skin color. Black defendants were 77% more likely to be flagged as at higher risk of committing a future violent crime and 45% more likely to be predicted to commit a future crime of any kind, while white defendants who re-offended were mislabeled as low risk almost twice as often as black defendants. In this context, when a decision comes from an algorithm working on discriminatory factors, a wrong result means that either a criminal goes free or a harsher penalty is unjustly applied. Even though it may produce biased or even unlawful results, the model continued to be used because it was considered to be programmed correctly.
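The disparity ProPublica measured can be described as a gap in false positive rates: people who did not re-offend but were nevertheless labelled high risk. Below is a minimal sketch of that kind of audit; the records are invented for illustration and are not the COMPAS data.

```python
# Invented, purely illustrative records: (group, labelled_high_risk, re_offended).
records = [
    ("A", True,  False), ("A", True,  True),  ("A", False, False), ("A", True,  False),
    ("B", False, False), ("B", True,  True),  ("B", False, False), ("B", False, True),
]

def false_positive_rate(group):
    """Share of people in the group who did NOT re-offend but were labelled high risk."""
    did_not_reoffend = [r for r in records if r[0] == group and not r[2]]
    wrongly_flagged = [r for r in did_not_reoffend if r[1]]
    return len(wrongly_flagged) / len(did_not_reoffend)

for group in ("A", "B"):
    print(group, round(false_positive_rate(group), 2))
# A large gap between the two rates is the kind of disparity the study documented.
```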

Artificial intelligence and discrimination also intersect when facial recognition is used to identify, monitor, and locate potential criminal suspects. Communities under-represented in these systems face a clearly increased risk of having their faces detected with less accuracy. In 2019, for instance, a facial recognition system misidentified a student as a suspect in the attacks on churches in Sri Lanka.

Notwithstanding the above, even if machine learning algorithms are trained on transparent data that sufficiently represent the reality under study, their design or implementation may still encode discrimination, depending on the chosen model. An algorithmic model is nothing more than an abstract representation of a process, designed with a specific purpose that guides its decisions and trained to recognize certain types of patterns. To build a model we select relevant data: important facts and actions, a slice of reality considered sufficient for that specific purpose. Inevitably, some important information is left out, as no model can include all the complexity of the real world. Furthermore, the same model can be trained with different data and may behave differently in different contexts, so algorithms that have proven effective in one context may discriminate in another.

Considering the above, the lack of data representativity creates barriers that not only confirm but also exacerbate social injustices, hidden behind algorithmic decisions that become harder to challenge every day. Given the direct impact on our individual and collective lives, all technological development must maintain a citizen-protection approach, ensuring that artificial intelligence is used in a prudent, fair, and responsible manner and contributes to a new era of technologies that serve all of society.
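As a closing illustration of the point that a model’s simplifications are never neutral, here is one more hypothetical sketch (synthetic data, scikit-learn assumed): an outcome that depends on a factor left out of the model, where the omission barely affects one group but clearly hurts the other.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 5_000

# The real outcome depends on two factors, but only one is kept in the model.
included = rng.normal(0, 1, n)
omitted = rng.normal(0, 1, n)      # relevant slice of reality left out of the abstraction
group = rng.integers(0, 2, n)      # the omitted factor only matters for group 1
outcome = (included + omitted * group + rng.normal(0, 0.3, n)) > 0

model = LogisticRegression().fit(included.reshape(-1, 1), outcome)

# The same simplification is harmless for group 0 and costly for group 1.
for g in (0, 1):
    mask = group == g
    print("accuracy, group", g, ":",
          round(model.score(included[mask].reshape(-1, 1), outcome[mask]), 2))
```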


[1] COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), developed by Northpointe, is one of several risk assessment algorithms used in the US to predict the risk of future crime, determine the types of supervision that inmates may need, or provide information that may be useful in sentencing. The algorithm uses the answers to a 137-question questionnaire to predict this risk. Defendants are asked things like “Has one of your parents ever been sent to jail or prison?”, “How many of your friends are taking drugs illegally?” and “How often did you get into fights while at school?”. The questionnaire also asks people to agree or disagree with statements such as “A hungry person has the right to steal” and “If people annoy me or lose their temper, I can be dangerous”.

