AI chatbots are scraping news reporting and copyrighted content, says US news media trade group
New York, Nov 1: A top US news media trade group is calling out AI technology companies for scraping news material to train their chatbots, a media report said.
The News Media Alliance, which represents nearly 2,000 outlets in the US, published research that found developers of generative artificial intelligence (AI) systems, such as OpenAI and Google, “have copied and used news, magazine, and digital media content to train” their bots, CNN reported.
Importantly, the research indicated that AI companies have trained their bots to give far more credence to information published by credible publishers than to material found elsewhere on the web, the report said.
“The research and analysis we’ve conducted show that AI companies and developers are not only engaging in unauthorised copying of our members’ content to train their products, but they are using it pervasively and to a greater extent than other sources,” Danielle Coffey, Chief Executive of the News Media Alliance, said in a statement, CNN reported.
“This shows they recognise our unique value, and yet most of these developers are not obtaining proper permissions through licensing agreements or compensating publishers for the use of this content,” Coffey added.
“This diminishment of high-quality, human created content harms not only publishers but the sustainability of AI models themselves and the availability of reliable, trustworthy information.”
In the white paper, the trade group also rejected arguments that AI bots have simply “learned” facts by reading various sets of data, as a human being would.
The group said “it is inaccurate” to form such a conclusion “because models retain the expressions of facts that are contained in works in their copied training materials (and which copyright protects) without ever absorbing any underlying concepts”, CNN reported.