AI-led credit checks are a multi-step process built on algorithmic systems. Algorithms have been used in creditworthiness assessments for decades, growing increasingly sophisticated over time. Their regulation and public discussion, however, have not kept pace with their exponential growth and integration into everyday life and business practice, raising the question of whether the necessary guardrails are in place to prevent the continued perpetuation of historical inequalities and bias.
Mechanisms of Unconscious Bias in Algorithmic Systems
Artificial Intelligence and Machine Learning within credit scoring models are used to analyse and process a wider variety of data than traditional rule-based systems. Their ability to identify correlations and nuance has become increasingly sophisticated, extending to so-called “weak signals”: data points not traditionally included in credit scoring but usable to build a more comprehensive assessment of applicants with “thin files”, meaning little to no conventional credit history. These broader sets of information allow credit institutions to offer more dynamic pricing, both for the benefit of the consumer and as a method of risk protection for the lender; yet while this enhances accuracy, it also amplifies unconscious bias.
Bias in algorithmic systems emerges from imperfections in the training data. These imperfections are not intentional but a reflection of the faults and inequalities of modern society. Three main factors drive unconscious bias: (1) historical data ingestion, (2) data quality disparities, and (3) the use of proxy variables.
As Artificial Intelligence and Machine Learning algorithms are inherently retrospective, the perpetuation of historical inequalities and discrimination is all but inevitable. Because systems are calibrated for accuracy using historical data, past systemic biases are incorporated into modern decision-making. This is further exacerbated by data quality disparities, most commonly referred to as the “Thin File Problem”: borrowers from disadvantaged and/or minority backgrounds tend to have limited or volatile credit histories, and there is less historical data available to counteract the impact this has on general risk modelling. Lastly, proxy variables, also known as masked factors, often act as stand-ins for legally protected characteristics such as race or gender. This form of redlining typically involves factors such as the distance between home and workplace, level of completed education, or employment history, creating a misleading air of neutrality even though these factors are frequently tied to a borrower’s socioeconomic background.
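To make the proxy mechanism concrete, the following is a minimal, purely illustrative sketch using synthetic data and hypothetical feature names (commute distance as the proxy, income as the group-linked driver of default). The protected attribute is never shown to the model, yet its risk scores still split along group lines because the “neutral” proxy carries the signal.

```python
# Illustrative only: synthetic data showing how a proxy variable can
# reintroduce a protected characteristic that was excluded from training.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical protected attribute (never shown to the model).
group = rng.integers(0, 2, n)

# Hypothetical proxy: commute distance, correlated with group membership.
commute_km = rng.normal(10 + 8 * group, 3, n)

# Historical defaults driven by group-linked income disadvantage.
income = rng.normal(40_000 - 6_000 * group, 5_000, n)
default = (rng.random(n) < 1 / (1 + np.exp((income - 35_000) / 4_000))).astype(int)

# Train only on the "neutral" proxy feature; the protected attribute is excluded.
X = commute_km.reshape(-1, 1)
model = LogisticRegression().fit(X, default)

# The scores still differ sharply by group, because the proxy carries the signal.
scores = model.predict_proba(X)[:, 1]
print("mean predicted default risk, group 0:", scores[group == 0].mean())
print("mean predicted default risk, group 1:", scores[group == 1].mean())
```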
Risks and Opportunities for Lenders & Borrowers
The interplay of these factors is particularly detrimental to disadvantaged consumers once risk-based pricing is introduced: it amplifies wealth inequality, raises the cost of capital for disadvantaged communities, and results in a vicious cycle of unaffordable and/or predatory lending.
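As a rough illustration of how risk-based pricing turns a model’s risk estimate into a cost of capital, the sketch below prices a loan under a deliberately stylised one-period break-even assumption, with hypothetical values for the base funding rate and loss given default; real pricing models are far more elaborate. The point is simply that a systematically over-estimated probability of default translates directly into a structurally higher quoted rate.

```python
# Stylised one-period break-even pricing rule (illustrative assumptions only):
# the lender recovers funding cost plus expected loss from borrowers who repay.

def offered_rate(pd_estimate: float, base_rate: float = 0.03, lgd: float = 0.6) -> float:
    """Annual rate satisfying (1 - PD)(1 + r) + PD(1 - LGD) = 1 + base_rate."""
    return (base_rate + pd_estimate * lgd) / (1 - pd_estimate)

# A model that inflates PD for one group quotes that group more expensive credit.
print(f"{offered_rate(0.02):.3f}")  # ~0.043 for a low-risk estimate
print(f"{offered_rate(0.10):.3f}")  # ~0.100 for an inflated estimate
```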
With the rise in digital applications for credit products, particularly following the COVID-19 pandemic, a majority of consumers use external brokers to find the best terms on the market. These brokers often have prior insight into the terms each of their partner lenders offers to different credit profiles, at times being granted favourable terms in exchange for highlighting a specific lender’s offering. Through this data, they operate a system of pre-approvals, allowing applicants to receive preliminary terms without a comprehensive creditworthiness assessment being run. Once an individual has shared a significant amount of personal information and has potentially been denied prime credit terms, these institutions have the opportunity to steer them towards credit products with less favourable terms. This practice, known as algorithmic steering, contributes significantly to the vicious cycle of unaffordable capital and predatory lending. Biased AI models not only restrict opportunity and access but also artificially inflate demand for non-prime credit, creating a market gap that predatory agents are uniquely positioned to fill.
Despite the inherent risks of removing human judgement from the process, such systems also provide a distinct opportunity to utilise the so-called “weak signals” mentioned above: alternative markers not traditionally used in credit checks. This mechanism can reduce an institution’s reliance on generic group characteristics, increase approval rates and enable truly personalised credit products, while also expanding financial inclusion across all levels of the population. It is important to note, however, that these considerations must be built in during the algorithm’s development, which requires not only a new system of data collection but also a re-evaluation of previously granted and denied credit, and of the data collected and assessed as part of those applications.
Conflict between Regulations
The obvious question is: what legal protections and guardrails need to be introduced to protect consumers whilst ensuring impartial risk assessments? Within the European Union, the compliance burden no longer rests on the GDPR alone; the AI Act also establishes requirements for the implementation and use of such systems within credit checks.
Due to its potential impact on a borrower’s quality of life and economic opportunities, the use of AI in this context is classified as high-risk (per Annex III, point 5(b)) and is subject to the stringent compliance requirements detailed in Chapter III, Section 2 of the AI Act (Regulation (EU) 2024/1689). These requirements are imposed not only on the provider but also on the deployer of such systems.
The GDPR (Regulation (EU) 2016/679) already establishes a legal foundation under Article 22, giving consumers the right not to be subject to decisions based solely on automated processing. It should be noted that these protections apply only when the use case specifically qualifies as automated decision-making. In a landmark ruling concerning the German credit bureau SCHUFA (Case C-634/21), the CJEU held that where third parties, in this case banks, “[draw] strongly on that probability value to establish, implement or terminate a contractual relationship”, the scoring constitutes ‘automated individual decision-making’, triggering safeguards for borrowers’ rights, including, but not limited to, the right to human intervention and the right to challenge a decision. This decision, whilst not requiring SCHUFA to disclose its algorithms, creates broad protections for consumers and increases transparency within banks’ procedures.
Implementation Challenge
With this dual regime across the AI Act and the GDPR, new challenges arise in implementing at times conflicting requirements in a coherent manner. A major sticking point is the interaction of the AI Act with, in particular, Article 9 of the GDPR, which imposes strict restrictions on the processing of sensitive personal data (e.g., race, gender), restrictions that are echoed in Directive (EU) 2023/2225 (the new Consumer Credit Directive). The AI Act, however, permits, and for bias detection in practice effectively requires, the processing of precisely such data for the purpose of identifying and avoiding discrimination. Within the financial sector, these ambiguous requirements could hinder innovation, and a dominant reliance on the GDPR legal basis could increase fragmentation due to divergent enforcement at member state level.
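The following sketch illustrates why such bias-detection processing presupposes access to the protected attribute: even the crudest group-fairness audit, here a demographic-parity style approval-rate ratio computed on hypothetical decision data, cannot be run without knowing which group each applicant belongs to.

```python
# Illustrative group-fairness check on hypothetical audit data: without the
# protected attribute ("group"), the disparity simply cannot be measured.
import numpy as np

def approval_rate_ratio(approved: np.ndarray, group: np.ndarray) -> float:
    """Ratio of the lowest to the highest approval rate across groups
    (a crude demographic-parity style indicator)."""
    rates = [approved[group == g].mean() for g in np.unique(group)]
    return min(rates) / max(rates)

approved = np.array([1, 1, 1, 0, 1, 1, 0, 0, 1, 0])  # hypothetical decisions
group    = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # the sensitive attribute
print(approval_rate_ratio(approved, group))  # 0.5 -> a large gap between groups
```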
To address the broader technical friction between data protection and the data-heavy needs of AI training, the Commission has proposed a “Digital Omnibus” framework. The proposal aims to relieve some of the administrative burden on relevant players and to resolve the “Data Paradox”: how to feed AI models enough data to eliminate bias without compromising data integrity and the privacy of the individuals within the dataset. Designed to streamline the digital rulebook, this agenda posits two controversial shifts that directly impact credit scoring models:
Suspension of High-Risk Obligations: The proposal seeks to delay the application of high-risk AI system obligations (originally set for 2026) until late 2027 or 2028. While the delay is intended to allow time for the development of harmonised technical standards, critics argue that it creates a regulatory vacuum for sensitive use cases.
Relativising Personal Data: There is a growing push to relativise the notion of personal data, so that its classification depends on the entity holding it rather than on the data itself. Data could then be considered “personal” for a controller who holds the identification key, but “non-personal” for a recipient who cannot re-identify the data subject. This shift aims to facilitate the data flows necessary for AI training but risks diluting the absolute protections traditionally afforded by the GDPR.
Conclusion
The three primary factors of bias (historical data ingestion, data quality disparities, and the utilisation of proxy variables) trap AI-led credit checks in a retrospective loop. Without proactive safeguards, these systems will not only replicate but amplify historical inequalities under a veneer of objectivity.
The current dual regime of the AI Act and the GDPR offers a theoretical safety net, but the regulatory friction creates a stalemate: if lenders cannot access demographic data to audit their models for bias, they cannot prove compliance with the AI Act’s fairness mandates. Resolving this legislative overlap is not merely a matter of legal clarity; it is essential to financial inclusion. To avoid further fragmentation across member states, the EU needs to establish a clear path for harmonised “bias-detection processing”, striking the necessary balance between the right to privacy and an unbiased financial future.