Tech Giants Used YouTube Videos Without Consent to Train AI

A recent report by Proof News has found that several major tech companies, including Apple, Nvidia, and Salesforce, used subtitle files from more than 170,000 YouTube videos to train their AI systems without obtaining permission from the content creators. These subtitle files, essentially transcripts of the videos, were collected by EleutherAI, a nonprofit organization that compiles open datasets for AI model training. The dataset, known as "YouTube Subtitles," is part of a larger compilation called The Pile, which also includes sources such as European Parliament materials, English Wikipedia articles, and Enron Corporation emails.

According to the investigation, the dataset includes videos from popular YouTube creators such as MrBeast and Marques Brownlee, as well as from news outlets like ABC News, the BBC, and The New York Times. Using these subtitle files in this way runs against YouTube's rules, which prohibit harvesting content from the platform without permission. Although the dataset was originally intended for academic and research purposes, large tech firms also used it. Salesforce confirmed that it used the dataset to train an AI model for academic research, which it released publicly in 2022. The report raises significant ethical and legal questions about consent and the use of third-party data in AI training.