With the increasing digitalisation of today’s world, and especially with the widespread use of machine learning (ML) technologies, there is a growing demand for solutions capable of addressing the key issues surrounding the protection of personal data. Federated Machine Learning (FedML) has emerged as a promising alternative to traditional machine learning methods, offering a distributed approach to model training designed to address these privacy concerns. This Insight analyses FedML’s viability in addressing privacy issues in the cities of the future, considering not only its effectiveness in protecting personal data but also the risks associated with its decentralised approach.
Contextualising FedML
FedML is a paradigm that, like existing ML mechanisms, is designed to train and improve artificial intelligence (AI) models. Its primary difference from those mechanisms is that FedML trains AI algorithms in a decentralised manner, on the devices that hold the data. This means that it is no longer necessary to aggregate sensitive data on a central server for algorithm training purposes, thereby ensuring greater protection of user data.
FedML operates as follows: a global model serves as the primary model and is distributed, individually or via the network, to the devices of the participating organisations. Each device then trains the model locally on its own data, producing an update to the AI algorithm. Only this updated information is transmitted to a central server; the raw data containing personal information never leaves the device. Finally, the central server aggregates the updates to generate an improved global model that accumulates the collective knowledge of all devices. This process is repeated continuously to improve the global model’s ability to meet the needs of FedML member institutions.
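To make this workflow concrete, the sketch below simulates a few rounds of a federated averaging (FedAvg-style) scheme, one common way to implement the aggregation step described above. It is a minimal illustration, not a production FedML system: the function names (local_update, aggregate), the simple linear model, and the synthetic client data are all assumptions introduced for this example, and in a real deployment each client would run on a separate device communicating over a network.

```python
# Minimal sketch of federated averaging, assuming numpy-array model weights
# and synthetic client datasets (all names and data here are illustrative).
import numpy as np

def local_update(global_weights, features, labels, lr=0.1, epochs=5):
    """Client-side step: train a linear model locally on one client's
    private data. Only the updated weights leave the device, never the
    raw data."""
    w = global_weights.copy()
    for _ in range(epochs):
        predictions = features @ w
        gradient = features.T @ (predictions - labels) / len(labels)
        w -= lr * gradient
    return w

def aggregate(client_weights, client_sizes):
    """Server-side step: weighted average of the client updates,
    weighting each client by the size of its local dataset."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Hypothetical setup: three clients with private synthetic data.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (50, 80, 30):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(10):  # repeated rounds gradually improve the global model
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = aggregate(updates, [len(y) for _, y in clients])

print(global_w)  # approaches true_w without ever pooling raw data centrally
```

Note that only model weights flow between clients and server in this loop, which is precisely the property on which FedML’s privacy claim rests.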
FedML is primarily intended for scenarios in which AI algorithms are trained on sensitive data, as is the case in industries such as healthcare, finance, and the Internet of Things (IoT).
However, despite evidence that FedML is more advantageous than more common forms of machine learning, as has been demonstrated, it presents certain risks of its own and does not entirely eliminate others. FedML is indeed more effective in terms of personal data privacy and protection; however, even with a distributed architecture, it is not immune to cyber-attacks during the communication between the participating organisations and the server, which can compromise its security and data privacy. Furthermore, despite being a distributed mechanism, it does not entirely mitigate the risk that certain patterns in personal data could lead to the identification of specific individuals.
One of the main advantages of FedML is the large volume of data it can draw on to improve the global model. However, given this high volume of exchanges between the participating organisations and the models being trained, ensuring that every data subject has given informed consent to the processing becomes a challenge.
The FedML learning method also raises transparency issues, as most ordinary citizens do not understand how AI algorithms process data to arrive at the results presented. Despite the clear need to demonstrate transparently how AI algorithms operate in these circumstances, this is an almost impossible task, given that most existing AI algorithms are black boxes.
As a rule, the emergence of new technologies raises concerns that fundamental human rights may be compromised. The case of FedML is no exception, with concerns arising about the fundamental right to privacy, especially when dealing with sensitive data in a medical context. Other fundamental rights, such as non-discrimination and fairness, which are essential to prevent algorithmic bias, must also be considered, thus ensuring that the results align with ethical and social values.
All forms of machine learning (not just FedML) are subject to the General Data Protection Regulation (GDPR), since their operation involves the processing of personal data, as established in Articles 2 and 3 of the GDPR. Both the participating organisations and the company that owns the FedML mechanism must comply with the GDPR rules, especially those on the rights of data subjects (Chapter 3 of the GDPR) and the transfer of personal data (Chapter 5 of the GDPR). It is also necessary to comply with the AI Act, which, under its Article 6, would classify FedML as a high-risk system when used in critical sectors such as healthcare, making it subject to the obligations set out in its Article 8.
Risk Mitigation
The GDPR and the AI Act are two regulatory frameworks that aim to mitigate some of the risks to data subjects’ privacy and individuals’ fundamental rights arising from the processing of data to train AI systems using FedML. The issues of transparency and of the right to non-discrimination and fairness in FedML can be addressed by implementing the solution proposed by Wulf, A. J. & Seizov, O., namely the AIX-1P (One-Page Information Notice on Artificial Intelligence Explanations). This solution consists of a concise, standardised document presented to the data subject whenever an automated decision-making process is used to make significant assessments and decisions based on their personal data. The document gives data subjects a better understanding of the complex process by which AI algorithms handle their data. By providing retrospective insight into the inputs (user characteristics) and outputs (resulting decisions) of the applications, it acts as a practical tool to bridge the gap between complex AI processes and users’ understanding of the relationship between their processed data and the decisions made, promoting transparency and accountability in AI algorithms.
As shown above, Federated Machine Learning (FedML) is a viable response to concerns about the privacy of personal data in machine learning. Its decentralised training process is crucial for mitigating the risks associated with the centralisation that occurs in common machine learning. However, FedML is not entirely immune to cyber-attacks, and issues such as consent require further development to satisfy the applicable legal bases. In summary, by complying with the GDPR and the AI Act and by adopting a mechanism that promotes transparency, such as the AIX-1P, FedML can offer a more efficient approach than the alternatives, better balancing innovation with the protection of data subjects’ rights.