Apple, Anthropic, and other companies used videos on YouTube to train their machines

An Interactive Tool to Find Your Content in the Dataset, and What It Means for Data-Hungry AI Models

Tech companies are using controversial methods to feed their data-hungry artificial intelligence models, often without the knowledge of the people who created the content.

The dataset includes clips from news outlets such as ABC News and The New York Times, as well as more than 100 videos from The Verge.

Proof News created an interactive tool as part of its investigation. You can use its search feature to see if your content — or your favorite YouTuber’s — appears in the dataset.

OpenAI chief technology officer Mira Murati told The Wall Street Journal that the data was publicly available or licensed, but declined to give more details. When pressed by the Journal about YouTube content specifically, Murati said she "wasn't sure about that."

"We have terms and conditions, and we would expect people to abide by those terms and conditions when you build a product, so that's how I felt about it," Google CEO Sundar Pichai said.

Proof News also found material from YouTube megastars, including MrBeast (289 million subscribers, two videos taken for training), Marques Brownlee (19 million subscribers, seven videos taken), Jacksepticeye (nearly 31 million subscribers, 377 videos taken), and PewDiePie (111 million subscribers, 337 videos taken). Some of the material used to train AI also promoted conspiracy theories such as the "flat-earth theory."

David Pakman, host of The David Pakman Show, which has more than 2 million subscribers and more than 2 billion views, said no one had asked his permission to use his content. Nearly 160 of his videos were swept up into the YouTube Subtitles training dataset.

Four people work full time on Pakman's enterprise, which posts multiple videos each day in addition to producing a podcast, TikTok videos, and material for other platforms. If AI companies are getting paid, Pakman said, he should be compensated for their use of his data. He pointed out that some media companies have recently signed agreements to be paid for the use of their work to train AI.

“This is my livelihood, and I put time, resources, money, and staff time into creating this content,” Pakman said. “There’s really no shortage of work.”