With privacy concerns rising, can we teach AI chatbots to forget?

The way AI systems work means that we can’t easily delete what they have learned. Now, researchers are seeking ways to remove sensitive information without having to retrain them from scratch.

I HAVE been writing on the internet for more than two decades. As a teenager, I left a trail of blogs and social media posts in my wake, ranging from the mundane to the embarrassing. More recently, as a journalist, I have published many stories about social media, privacy and artificial intelligence, among other things. So when ChatGPT told me that my output may have influenced its responses to other people’s prompts, I rushed to wipe my data from its memory.

Pete Reynolds

As I quickly discovered, however, there is no delete button. AI-powered chatbots, which are trained on datasets including vast numbers of websites and online articles, never forget what they have learned.

That means the likes of ChatGPT are liable to divulge sensitive personal information, if it has appeared online, and that the companies behind these AIs will struggle to make good on “right-to-be-forgotten” regulations, which compel organisations to remove personal data on request. It also means we are powerless to stop hackers manipulating AI outputs by planting misinformation or malicious instructions in training data.

All of which explains why many computer scientists are scrambling to teach AIs to forget. While they are finding that it is extremely difficult, “machine unlearning” solutions are beginning to emerge. And the work could prove vital beyond addressing concerns over privacy and misinformation. If we are serious about building AIs that learn and think like humans, we might need to engineer them to forget.

The new generation of AI-powered chatbots like ChatGPT and Google’s Bard, which produce text in response to our prompts, are underpinned by large language models (LLMs). These are trained on mountains of data, most of which is scraped from the internet – from social media posts to about 250,000 books and nearly all publicly available information, including news websites and Wikipedia pages.

From this, they learn to spot statistical patterns, which means they can predict the likeliest next word in a sentence. They function remarkably well, producing fluent answers to our every query.

The trouble is that the way AI chatbots work means that when they learn something, it can’t be unlearned. LLMs generate their responses based on aggregated data, so there is no easy way for them to forget or “delist” specific pieces of information, as search engines like Google can, or even for individuals to track down exactly what an AI app knows about them, says David Zhang, an AI researcher and engineer at CSIRO, Australia’s national science agency.

Privacy and GDPR

This creates a significant problem when it comes to privacy, as Zhang and his colleagues made clear in recent research. They highlighted how difficult it will be for AI companies to comply with the “right to be forgotten”, which the European Union declared a human right back in 2014.

Under the EU’s General Data Protection Regulation (GDPR), people have the right to request that their personal information be removed from an organisation’s records. Typically, on the internet, it is enforced in a variety of scenarios: not only do you have an undo button for your personal online content, such as social media posts or a public journal entry, but people also have the option to ask companies like Meta to offload the data they have collected about them. But such solutions aren’t compatible with AI chatbots, says Zhang. “Not offering a way to delete or forget data from their models’ memories is not upholding an individual’s right to erasure.”

AI companies will struggle to comply with the right to be forgotten
Estelle Lagarde/Millennium Images, UK

The companies behind AI chatbots are going to have to find a way to deal with the issue, however, especially as LLMs begin to be trained on more sensitive information, such as medical data, email inboxes and more, says Florian Tramèr, a computer scientist at ETH Zurich.

It gets worse, though, because AI-powered chatbots are also vulnerable to attacks in which information is concealed in the training data to trick the model into behaving in unintended ways. Security researchers have shown that this technique, known as “indirect prompt injection”, can be used to get chatbots to run code remotely on users’ devices, for example, or ask them to hand over their bank account details. GCHQ, the UK’s intelligence agency, has called attention to the problem. But the risks from malicious prompt injection are only set to increase.

The good news, given the dangers, is that the work to figure out how to selectively delete information from an AI’s knowledge base has begun. The bad news is that it is far from straightforward.

AI companies currently rely on band-aid fixes like “machine silencing”, where they program their services to block access to certain information and withhold responses. “I’m very sorry, but I can’t assist with that request,” says ChatGPT, for example, when I ask it to build a personal dossier on me. This approach can work to an extent, says Luciano Floridi, the director of Yale University’s Digital Ethics Center. But the target data is still there, he stresses, which means there is always the risk that it will surface in responses as a result of glitches or malicious interventions.

Large language models

The difficulty is that the most obvious way to induce amnesia in LLMs, namely to retrain the models with specific data points removed, is wildly impractical. It takes weeks of computation. What we really need to figure out is how to remove, or at least mask, specific pieces of information without having to retrain the model from scratch, says Yacine Jernite at AI firm Hugging Face. “It’s a fantastic research problem.”

Work towards solutions began in 2014, when Yinzhi Cao, then at Columbia University in New York, came up with a simple fix: rather than training the algorithm on the entirety of the available data, you break down what the algorithm learned from it into a series of smaller pieces known as summations. That way, when a person requests some information to be removed, you only have to modify the summation housing the data in question, dramatically reducing computation costs.

The principle is sound. But Cao’s particular method only worked for models that were far simpler than the LLMs behind today’s AI chatbots. In these new LLMs, fragments of data are so deeply intertwined that even isolating them down to a summation is unfeasible, says Cao, now at Johns Hopkins University in Baltimore.

Another method was put forward in 2019 by Nicolas Papernot at Toronto University in Canada and his colleagues. Known as sharded isolated sliced aggregated (SISA), it works with the more complex artificial neural networks behind many LLMs, and makes it easier to locate and delete specific data points. It splits datasets into various smaller chunks, and the model is trained on each separately before the results are combined. It saves its progress like checkpoints in a video game as it goes from one chunk to another. When it is faced with an unlearn request, it can return to a checkpoint, cut off the chunk that houses the data in question and retrain from there.

When Papernot and his team tested SISA training on two large datasets – one containing details of some 600,000 home addresses, another with 300,000 purchase histories – they found that it significantly sped up retraining compared with doing it from scratch.

SISA isn’t without problems, not least the fact that it could have a big negative impact on the AI’s performance. But the principle at its core has since inspired numerous iterations. In 2021, Min Chen at the CISPA Helmholtz Center for Information Security in Saarbrücken, Germany, methodically divided and merged data – rather than doing so randomly, as in SISA training – to more effectively unlearn data without comprimising too much in terms of the AI’s performance.

Pete Reynolds

Elsewhere, other groups are taking a slightly different approach. Because deleting data can be so detrimental to a machine learning model’s performance, some have instead chosen to hide or obscure the relevant data so that it can’t be extracted. Researchers at Microsoft and at Ohio State University, for instance, introduced noise into the information used to train a model in such a way that its subsequent outputs were shaped by broader patterns in the data rather than specific, identifiable examples. “This provides a theoretical guarantee that the model won’t reveal specifics about individuals in the training data,” says team member Xiang Yue at Ohio State University.

Such generalisation tends to undercut some of the statistical learning prowess that makes AI chatbots so powerful. To circumvent that problem, Minjoon Seo at the Korea Advanced Institute of Science and Technology in South Korea and his colleagues chose a post-hoc approach. With their method, which they call “knowledge unlearning”, the idea is to reverse the influence a piece of data had on the algorithm instead of deleting it altogether, so that the chatbot never references it. Knowledge unlearning has emerged as one of the most promising leads in this field, since it does the job with far fewer computing resources in less time, and works on a slightly older counterpart of the underlying design behind ChatGPT.

The truth is that there is no frontrunner in the machine unlearning race. Google has organised a contest to reward those who can come up with efficient solutions, which not only demonstrates the importance of the challenge, but also suggests we might soon have a better idea of which methods could ensure we have a new generation of LLMs that can forget what they have learned.

Selective memory

That is a goal worth pursuing because it could have broader implications than concerns over data protection and malicious misuse of AI chatbots, says Ali Boyle, a philosopher at the London School of Economics who works on AI. Although the human tendency to forget is often viewed as a cognitive glitch, it can sometimes be beneficial, since we don’t need to retain every piece of information we learn. By forgetting certain things, we make the process of retrieving the useful memories more efficient. The same might be true of AI systems, says Boyle.

She has argued that the principle was demonstrated in 2017, when researchers at Google DeepMind developed an AI that could play multiple Atari video games. It was effective at generalising its knowledge because, rather than learning in real time from its stream of experience, it stored memories of its game play that it could recall and learn from later. That is tantamount to memory, of course. But the researchers then refined the model such that it preferentially stored and recalled “surprising” events that diverged from its predictions – forgetting the rest of the data – and saw the system’s performance improve.

The implication is that for an AI, selectively forgetting can improve performance. The trick is to find the right balance between remembering too much and too little. But if the ultimate goal for AI researchers is to build systems that learn and think like humans, which was certainly one of the original targets for the field, then we need to design them to selectively forget. “Forgetting is not a design flaw,” says Boyle. “It is a design feature of an efficient, well-functioning memory system.”

Shubham Agarwal is a technology journalist based in Ahmedabad, India.

Science Hub