
How worried should we be about AI chatbots using our data?

The data protection regulator has raised concerns – we asked experts what’s going on.

DEEPSEEK, THE CHINESE company behind a new open-source AI chatbot, was issued a warning letter last week by the Irish Data Protection Commission (DPC) voicing concerns over how the firm’s large language model used citizens’ personal data.

It’s not the first time this complex issue has come up as the DPC works to regulate fast-moving technology developments.

The Irish data regulator recently requested legal advice from the EU on how to tackle the issue.

Ireland’s request has left quite a bit of uncertainty surrounding the future of AI and how companies and developers train their large language models and robots.

So how likely is it that these robots have used our data? And could regulators pour cold water on developers' dream systems?

What does the opinion say?

Ireland requested legal assistance from the European Data Protection Board (EDPB) late last year, after Meta paused its collection of Facebook user data following a request to do so from the DPC.

The issue boils down to whether companies that develop AI technologies used citizens' private data, without their consent, when building the robots.

If it were found that companies did so, their systems could be deemed unlawful and shut down, and the firms could face hefty fines.

We asked DeepSeek what it thought about all of this.

Following the legal opinion issued by the European Data Protection Board, AI companies must satisfy a four-part test.

Companies must prove:

  1. The data was obtained legally under General Data Protection Regulation (GDPR) rules.
  2. There was a legitimate reason to use the personal data.
  3. The use of the data was necessary for the software to function.
  4. That developers did not override citizens' rights in order to benefit their business interests.

What type of data might be used?

The personal data used by companies could range from social media posts to hospital documents, according to Deepak Padmanabhan, a senior lecturer at the School of Electronics, Electrical Engineering and Computer Science at Queen’s University Belfast.

Companies can collect this data through a method known as 'web scraping', where an automated program scours the internet and gathers information, for example to answer a query inputted by a user.

This, in theory, is already illegal under the EU's GDPR rules, but only in cases where the web scraping is carried out 'indiscriminately'.

Companies can argue that their machines are not in breach of data laws when a user requests specific information and directs the robot to seek out only that data.
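To make that distinction concrete, here is a minimal, hypothetical Python sketch of what 'targeted' scraping can look like: it fetches one named page and keeps only its paragraph text, rather than crawling the web indiscriminately. The URL and page structure are assumptions for the example, and a real scraper would also need to respect a site's robots.txt, its terms of use and, in the EU, GDPR.

```python
# A minimal, hypothetical sketch of 'targeted' web scraping: fetch one
# named page and keep only its paragraph text, rather than crawling the
# web indiscriminately. URL and page structure are illustrative assumptions.
import requests
from bs4 import BeautifulSoup

def scrape_paragraphs(url: str) -> list[str]:
    # Fetch a single, specific page (targeted, not a broad crawl)
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    # Parse the HTML and extract only the visible paragraph text
    soup = BeautifulSoup(response.text, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]

if __name__ == "__main__":
    # example.com is a placeholder; a real scraper should also honour
    # the site's robots.txt and terms of use
    for paragraph in scrape_paragraphs("https://example.com/article"):
        print(paragraph)
```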

This information gathering is quite complex, Padmanabhan said, and has changed over time. 

Padmanabhan said the technology has gotten more complex over the last decade. Alamy

While older systems such as predictive texting used thousands of documents to work out the most likely next word, large language models use such information to capture more subtle things, such as trends, slang and topics.
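As a rough illustration of the older approach, here is a minimal, hypothetical Python sketch of next-word prediction by counting which word most often follows another in a small set of documents. It is a toy example only, not how any named product or large language model actually works.

```python
# A toy sketch of the older 'predictive text' idea: count which word most
# often follows each word in a corpus, then suggest that word. This is an
# illustration only, not any particular product's method.
from collections import Counter, defaultdict

def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    following = defaultdict(Counter)
    for document in corpus:
        words = document.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following

def predict_next(model: dict[str, Counter], word: str) -> str | None:
    counts = model.get(word.lower())
    # Return the single most frequent follower, if the word has been seen
    return counts.most_common(1)[0][0] if counts else None

model = train_bigrams([
    "the cat sat on the mat",
    "the cat chased the mouse",
])
print(predict_next(model, "the"))  # -> 'cat' (most common after 'the')
```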

Developers could use a lot of personal data to teach their machines to look out for these patterns and trends, Padmanabhan said, and some personal information might well be included in their training datasets for that reason.

So should citizens be concerned? Not really, according to Padmanabhan.

“It’s enshrined in the AI community that if you use personal data to build your model, you should develop it in such a way that a user will not be able to tell that their data has been used,” he said.

Padmanabhan said experts tend to steer clear of using personal data for AI systems, except in purpose-built applications.

How likely is it that my data has been used?

TJ McIntyre, an associate professor at the Sutherland School of Law in University College Dublin (UCD), said it is "very likely" that many AI chatbot systems will be found in breach of data protection laws in the future, judged against the standards of the recent EU legal opinion.

This is because developers will probably "fail" at the first step of the EU's test, namely compliance with GDPR, he told The Journal.

McIntyre, a leading data protection academic who also practises as a data protection lawyer at FP Logue LLP, said it is likely many developers carried out "indiscriminate web scraping" at the development stage of their robots.

McIntyre (pictured in 2017) believes there's enough case law to challenge chatbots' use of data. RollingNews.ie

The lawyer disagreed with Padmanabhan’s assessment that AI developers act in good faith, arguing that the issue was not how well the computer systems hide personal data but that the information was used without the owner’s consent in the first place.

He said the ‘out of sight, out of mind’ approach is “simply wrong as a matter of law”.

“It reflects a fundamental contempt for individuals – not ‘users’ as techbros would like to rebrand citizens.”

“How could anyone trust a service which shows such disregard for individuals?

“A central principle of data protection law is transparency, and developers who attempt to conceal their use of personal data should expect higher fines as a result,” he said.

McIntyre added that unless developers can prove they took the mitigating measures outlined in the legal opinion, such as seeking consent first and not putting business interests ahead of the rights of citizens, they are likely to fail the fourth step too.

So, are AI chatbots on a collision course with regulators?

Well, we don’t know.

McIntyre said that there is “quite a bit of case law” on the use of personal data and web scraping, “albeit in other contexts”.

He said that a determination on whether AI chatbots are in breach of data protection laws will depend on individual circumstances. He warned, however, that companies will most likely be found in breach of those laws over their web scraping.

But there's a "key gap" in the EU's opinion, McIntyre said: it did not specifically state whether any type of web scraping used to create these machines is compatible with GDPR, if we are to assume that indiscriminate scraping is itself a breach.

McIntyre pointed to recently published guidelines from the Dutch data protection authority, which said all web scraping must be strictly limited and targeted, alluding to the need for a legal precedent to be set first.
