According to a UBS study, ChatGPT was estimated to have reached 100 million monthly active users just two months after launch, which would make it the fastest-growing consumer application in history. By now, you have no doubt read all the stories, seen all the hype, and likely been in meetings with executives asking “How can we use this technology to further our business?” According to the internet pundits, nothing will ever be the same again.
As the hype cycle continues around ChatGPT and new AI technologies more broadly, many are beginning to take a step back from the creative “what if” discussion and ask questions like “Is this safe?” and “Is this in the best interest of consumers?”
More concretely, the question “Is this legal and compliant?” is being asked more and more loudly.
Privacy concerns began to be voiced after a bug, disclosed by OpenAI on March 24, 2023, exposed personal and payment information for as many as 1.2 percent of ChatGPT Plus subscribers, along with the titles of some users' chat histories. On March 30, the Italian Data Protection Authority (Garante per la protezione dei dati personali) formally raised the question of compliance by effectively banning ChatGPT from operating in Italy pending further investigation of GDPR concerns. Over the following two weeks, the CNIL in France opened its own GDPR investigations, and the Spanish AEPD requested that the European Data Protection Board (EDPB) formally look into the technology. The concern culminated on April 13, when the EDPB launched a dedicated task force on ChatGPT. The task force is meant to foster cooperation and the exchange of information on possible enforcement actions by European data protection authorities.
So what are the privacy concerns being raised? And what insights can we take from them to apply to the use of AI tools and techniques within our own organizations? Let’s explore.
Concern About Data Used to Train Models
AI chatbots such as ChatGPT currently use large language models that are trained on a mix of datasets, including data scraped from the internet. The concern begins with the fact that no one seems to know where all of the data is coming from. Further, some of the scraped information included in training data is likely to contain individuals' personal data. While privacy regulations vary by location, most give individuals some right to understand what personal data is being processed and require a documented reason for that processing. Specifically, the GDPR in the EU requires a lawful basis for processing personal data, including data that has been made public. Every source of personal data would need to be identified, a lawful basis defined for it, and protections put in place. So far, none of these concerns appears to have been addressed.
Privacy regulations across the globe require disclosure to individuals about how their data is collected and processed. If we can’t even identify where data is coming from and what data is included in training datasets, how can we be transparent with consumers about what is collected from them and the results of the processing?
Disclosures must be made available to users at the point where their personal data is collected for these purposes, and descriptions of the processing must be understandable. This includes explaining how the system makes decisions about users so they can understand the impacts and exercise their privacy rights in an informed manner. Such explanations are often very difficult to provide for newer types of AI algorithms.
Questions About How Consumers Can Exercise Privacy Rights
Central to this concern, specifically under the GDPR, is the consumer’s “right to be forgotten”: the right to have their data deleted upon a verifiable request. Beyond the EU, similar deletion rights are common in privacy regulations worldwide. Operationally, it is difficult enough to execute deletion requests in traditional databases; deleting personal data from machine learning models is significantly more difficult, and in some cases impossible without destroying the model's utility.
Inaccuracy in Processing of Personal Data
Another GDPR requirement is the consumer’s right to have their personal data kept accurate and up to date. The Italian DPA specifically cited the concern that, because information provided by the chatbot does not always match real data, the resulting inaccuracy could amount to a GDPR violation. Questions of accuracy also pervade discussions of AI ethics and the risks of disinformation.
To address these privacy concerns and allow the current “ban” to be lifted, on April 13, 2023, the Italian Garante issued a list of requirements that OpenAI must meet to demonstrate compliance with the GDPR:
- Publish an information notice detailing data processing
- Adopt an age gate to prevent minors from accessing ChatGPT and implement robust age verification measures
- Clarify the legal basis claimed for processing personal data for training its AI
  - This includes the stipulation that OpenAI cannot rely on “performance of a contract.” That leaves either “legitimate interest” or “consent,” both of which, at a minimum, require giving people a way to refuse the processing (by objecting or by withdrawing consent).
- Provide ways for users and non-users to exercise privacy rights
- Provide users with the ability to object to OpenAI’s processing of their data for training its algorithms
- Conduct a local awareness campaign to inform Italians that OpenAI is processing their information to train its AI
All of this must be satisfied by April 30, 2023.
Whether OpenAI can or will meet these requirements remains to be seen, as do the answers to the broader legal questions about compliance; many will be settled in the regulatory process. But where can businesses that want to use these technologies, or develop their own capabilities, look for guidance to limit compliance risk?
According to a recent IAPP study, more than 60 percent of organizations have published guidelines for the ethical use of AI. So start with your internal legal teams and processes: there is a better-than-even chance that some guidance is already in place.
From there, consider the laws and regulations already in force. Taking the GDPR as an example, make sure you know what data is being used to train your machine learning models, identify whether and where personal data is included, and confirm that it has been collected and processed in a compliant manner. Be transparent with your consumers about what information is being used, where AI and automated processing are in place, and what the results are for them. Ask yourself: could I meet the requirements the Italian DPA set out for OpenAI in my own data collection and model training? Simply by following the rules as currently written, you’ll be in a good position, however novel the technology.
Beyond existing rules, action always follows hype. We are seeing this in the form of industry and regulatory bodies working on targeted regulations and controls. The EU is working on a new AI Act aimed specifically at regulating AI technology, industry groups are publishing research and best practices, and regulators are issuing their own guidance. This is a nascent area of technology with much more to come. Pay attention to the fundamentals, follow the requirements as written, and respect the privacy rights of your consumers.