UK kicks off review into training AI models on copyrighted content

On Dec. 9, OpenAI made its artificial intelligence video generation model Sora publicly available in the U.S. and other countries.


The U.K. is drawing up measures to regulate the use of copyrighted content by tech companies to train their artificial intelligence models.

The British government on Tuesday kicked off a consultation that aims to give both the creative industries and AI developers more clarity on how intellectual property is obtained and used by AI firms for training purposes.

Some artists and publishers are unhappy with the way their content is being scraped freely by companies like OpenAI and Google to train their large language models — AI models trained on huge quantities of data to generate humanlike responses.

Large language models are the foundational technology behind today’s generative AI systems, including the likes of OpenAI’s ChatGPT, Google’s Gemini and Anthropic’s Claude.

Last year, The New York Times brought a lawsuit against Microsoft and OpenAI accusing the companies of infringing its copyright and abusing intellectual property to train large language models.

In response, OpenAI disputed the NYT’s allegations, stating that the use of open web data for training AI models should be considered “fair use” and that it provides an “opt-out” for rights holders “because it’s the right thing to do.”

Separately, image distribution platform Getty Images sued another generative AI firm, Stability AI, in the U.K., accusing it of scraping millions of images from its websites without consent to train its Stable Diffusion AI model. Stability AI has disputed the suit, noting that the training and development of its model took place outside the U.K.

Proposals to be considered

First, the consultation will consider making an exception to copyright law for AI training for commercial purposes, while still allowing rights holders to reserve their rights so they can control the use of their content.

Second, the consultation will put forward proposed measures to help creators license and be remunerated for the use of their content by AI model makers, as well as give AI developers clarity over what material can be used for training their models.

The government said more work needs to be done by both the creative industries and technology firms to ensure any standards and requirements for rights reservation and transparency are effective, accessible and widely adopted.

The government is also considering proposals that would require AI model makers to be more transparent about their model training datasets and how they’re obtained so that rights holders can understand when and how their content has been used to train AI.

That could prove controversial — technology firms aren’t especially forthcoming when it comes to the data that fuels their coveted algorithms or how they train them up, given the commercial sensitivities involved in revealing those secrets to potential competitors.

Previously, under former Prime Minister Rishi Sunak, the government attempted to agree a voluntary AI copyright code of practice.

AI copyright rules: U.K. versus U.S.

In a recent interview with CNBC, the boss of app development software firm Appian said he thinks the U.K. is well placed to be the “global leader on this issue.”

“The U.K. has put a stake in the ground declaring its prioritization of personal intellectual property rights,” Matt Calkins, Appian’s CEO, told CNBC. He cited 2018’s Data Protection Act as an example of how the U.K. is “closely associated with intellectual property rights.”

The U.K. is also not “subject to the same overwhelming lobbying blitz from domestic AI leaders that the U.S. is,” Calkins added — meaning it might not be as prone to bowing down to pressure from tech giants as politicians stateside.

“In the U.S., anybody who writes a law about AI is going to hear from Amazon, Oracle, Microsoft or Google before that bill even reaches the floor,” Calkins said.

“That’s a powerful force stopping anyone from writing sensible legislation or protecting the rights of individuals whose intellectual property is being taken wholesale by these major AI players.”

The issue of potential copyright infringement by AI firms is becoming more prominent as tech firms move toward a more “multimodal” form of AI, meaning AI systems that can understand and generate content in the form of images and video as well as text.

Last week, OpenAI made its AI video generation model Sora publicly available in the U.S. and “most countries internationally.” The tool allows a user to type out a desired scene and produce a high-definition video clip.