The Enigma of Enforcing GDPR on LLMs

In the digital age, data privacy is a major concern, and regulations such as the General Data Protection Regulation (GDPR) aim to protect individuals' personal data. However, the emergence of large language models (LLMs) such as GPT-4, BERT, and their kin poses major challenges for GDPR enforcement. These models, which generate text by predicting the next token based on patterns learned from vast quantities of training data, have upended the regulatory landscape. Here is why enforcing the GDPR on LLMs is practically impossible.
The nature of LLMs and data storage
To understand the enforcement dilemma, it is essential to understand how LLMs work. Unlike traditional databases, where data is stored in structured, addressable records, LLMs operate differently. They are trained on massive datasets, and through this training they adjust millions or even billions of parameters (weights and biases). These parameters capture intricate patterns and generalized knowledge from the data, but they do not store the data itself in any retrievable form.
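A minimal sketch of the contrast, with invented record values: a database holds entries you can look up and delete, while a model holds only an undifferentiated mass of weights.

```python
# Traditional storage: a record lives at an addressable location and can be
# read back or deleted verbatim.
database = {"user_42": {"name": "Alice", "email": "alice@example.com"}}
print(database["user_42"])   # exact data out
del database["user_42"]      # exact data gone

# An LLM: the same facts, if seen during training, exist only as a diffuse
# influence on the model's floating-point weights. There is no key to look up
# and no entry to delete.
import torch
weights = torch.randn(1_000_000)  # stand-in; real models have 1e8 to 1e12 of these
# weights[i] does not correspond to any fact, person, or training record.
```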
When an LLM generates text, it does not consult a database of stored phrases or sentences. Instead, it uses its learned parameters to predict the next word, producing the most likely sequence one token at a time. The process is similar to how a person composes text from internalized language patterns rather than recalling exact phrases from memory.
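For concreteness, here is a sketch of that next-token loop using the small public GPT-2 model from Hugging Face transformers; the prompt and the generation length are arbitrary choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

ids = tokenizer("Data protection law", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(10):                       # generate 10 tokens greedily
        logits = model(ids).logits[:, -1, :]  # scores for the next token only
        next_id = logits.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)

print(tokenizer.decode(ids[0]))  # no stored sentence was ever retrieved
```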
The right to be forgotten
One of the cornerstone rights under the GDPR is the "right to be forgotten" (Article 17), which allows individuals to request the deletion of their personal data. In a traditional data storage system, this means locating the relevant records and erasing them. With LLMs, however, identifying and removing specific pieces of personal data embedded in the model's parameters is practically impossible. The data is not stored explicitly; its influence is diffused across countless parameters in a way that cannot be isolated or reversed.
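A sketch of the asymmetry, using an in-memory SQLite table as the traditional system; the schema and values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com')")

# Honouring an erasure request in a conventional system: locate, then delete.
conn.execute("DELETE FROM users WHERE id = ?", (1,))
conn.commit()

# There is no analogous operation for an LLM: nothing like
# "DELETE FROM parameters WHERE subject = 'Alice'" exists or can exist,
# because no parameter belongs to any one record.
```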
Data erasure and model retraining
Even if specific data points could theoretically be located within an LLM, erasing them presents another enormous challenge. Removing data from an LLM would require retraining the model, an expensive and time-consuming process. Retraining from scratch to exclude particular data could demand resources comparable to the original training run, including compute and time, making it impractical.
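A rough back-of-the-envelope estimate of what a single full retrain costs, using the common approximation that training compute is about 6·N·D floating-point operations (N parameters, D training tokens). All figures below are illustrative assumptions, not measurements.

```python
N = 175e9          # parameters (a GPT-3-scale model)
D = 300e9          # training tokens
flops = 6 * N * D  # ~3.15e23 FLOPs for one training run

gpu_flops = 312e12 * 0.4   # A100 peak BF16 ~312 TFLOPs at ~40% utilisation
gpu_seconds = flops / gpu_flops
gpu_years = gpu_seconds / (3600 * 24 * 365)
print(f"{gpu_years:,.0f} GPU-years for one full retrain")  # roughly 80 GPU-years
```

Paying that bill once is a business decision; paying it every time a single person files an erasure request is not a workable compliance mechanism.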
Anonymization and data minimization
The GDPR also emphasizes anonymization and data minimization. While LLMs can be trained on anonymized data, guaranteeing that it stays anonymous is difficult: supposedly anonymous data can still reveal personal information when combined with other datasets, making re-identification possible. Moreover, LLMs need vast amounts of data to perform well, which sits uneasily with the principle of data minimization.
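A toy illustration of re-identification by linking quasi-identifiers, in the spirit of Sweeney's well-known voter-roll linkage study; the records here are fabricated.

```python
# "Anonymised" medical records: names removed, quasi-identifiers kept.
medical = [{"zip": "02138", "dob": "1954-07-31", "sex": "F", "diagnosis": "..."}]
# Public voter roll: names present, same quasi-identifiers.
voters  = [{"zip": "02138", "dob": "1954-07-31", "sex": "F", "name": "Jane Doe"}]

# Linking the two on (zip, dob, sex) re-identifies the "anonymous" patient.
for m in medical:
    for v in voters:
        if (m["zip"], m["dob"], m["sex"]) == (v["zip"], v["dob"], v["sex"]):
            print(f"Re-identified: {v['name']} -> {m['diagnosis']}")
```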
Lack of transparency and explainability
Another GDPR requirement is the ability to explain how personal data is used and how decisions are made. LLMs, however, are often described as "black boxes" because their decision-making processes are not transparent. Understanding why a model generated a specific piece of text involves untangling complex interactions among millions of parameters, a task that exceeds current technical capabilities. This lack of explainability hinders compliance with the GDPR's transparency requirements.
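To make the opacity concrete, here is a small sketch inspecting a public model's weights; the layer and slice indices are arbitrary.

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # a small, public LLM
print(sum(p.numel() for p in model.parameters()))     # ~124,000,000 parameters

# The model's entire "knowledge" is tensors of floats like this slice.
# Nothing here maps to a sentence, a record, or a person.
print(model.transformer.h[0].attn.c_attn.weight[:2, :4])
```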
Moving forward: regulatory and technical adaptations
Given these challenges, enforcing the GDPR on LLMs requires both regulatory and technical adaptation. Regulators need to develop guidelines that account for the unique nature of LLMs, perhaps focusing on the ethical use of AI and on strong safeguards for personal data during training and deployment.
On the technical side, advances in model interpretability and auditing could aid compliance. Techniques for making LLMs more transparent, and methods for tracing the provenance of data within models, are active areas of research. In addition, differential privacy, which ensures that removing or adding a single data point does not significantly affect the model's output, could be a step toward aligning LLM practice with GDPR principles.
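As one concrete direction, here is a minimal sketch of DP-SGD (per-example gradient clipping plus Gaussian noise, after Abadi et al., 2016), written with a microbatch size of one for clarity. The model, data, and hyperparameters are placeholders, not recommendations.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
clip_norm, noise_multiplier, lr = 1.0, 1.1, 0.05

batch_x, batch_y = torch.randn(8, 10), torch.randn(8, 1)
grads = [torch.zeros_like(p) for p in model.parameters()]

for x, y in zip(batch_x, batch_y):
    model.zero_grad()
    loss_fn(model(x), y).backward()
    # Clip each example's gradient so no single record dominates the update.
    norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
    scale = min(1.0, (clip_norm / (norm + 1e-12)).item())
    for g, p in zip(grads, model.parameters()):
        g += p.grad * scale

with torch.no_grad():
    for g, p in zip(grads, model.parameters()):
        g += torch.randn_like(g) * noise_multiplier * clip_norm  # mask any one record
        p -= lr * g / len(batch_x)                               # average and step
```

Production systems would use a library such as Opacus rather than a hand-rolled loop, but the mechanism is the same: bound each record's influence, then add calibrated noise.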