Court documents show not only did Meta torrent terabytes of pirated books to train AI models, employees wouldn’t stop emailing each other about it: ‘Torrenting from a corporate laptop doesn’t feel right’

As first reported by Ars Technica, the copyright case against Facebook parent company Meta over its use of authors’ work to train large language models has unearthed some embarrassing dirty laundry in discovery. Dozens of emails, allegedly between Meta employees, discuss torrenting massive amounts of pirated material (and seeding those torrents to boot) in order to train the company’s AI models.

Court documents revealed last month that Meta had obtained AI training data from LibGen, a large file-sharing database that includes everything from paywalled news and academic articles to whole books. The plaintiffs allege that Meta downloaded over 80 terabytes from LibGen and another so-called “shadow library” by the name of Z-Library. This is, to be clear, internet piracy on a scale that would make a Nintendo lawyer blush, and the lawsuit alleges the emails put in writing “Meta’s decision to take and use copyrighted works without permission that it knew to be pirated, despite clear ethical concerns.”

One of the emails in evidence quotes an alleged Meta employee futilely advising that “using pirated material should be beyond our ethical threshold” before arguing that databases like LibGen “are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they’re infringing it.”

There are repeated examples of emails ascribed to Meta employees flagging the use of LibGen as a concern, either in failed “lone sane man” fashion or in the context of hiding the activity. One researcher proposed only accessing LibGen through a VPN, and later joked that “torrenting from a corporate laptop doesn’t feel right 😂.”

Meta would ultimately operate in “stealth mode,” to quote one AI researcher at the company, concealing the activity by only downloading and seeding the torrents outside official Facebook servers. As an aside: It was real neighborly of them to seed the torrents too! Wonder how good their ratios were.

The plaintiffs further argue that these discovery documents suggest that Meta executives up to and including Mark Zuckerberg were aware of the use of pirated material to train AI models at the company. Another detail that stands out to me: The emails filed as evidence indicate that Meta employees believed OpenAI used LibGen for its own models, framing Meta’s use of the database as a sort of arms race.

If the Internet Archive isn’t allowed to loan books as a digital library, I don’t think companies like Meta should be allowed to swallow up terabytes of pirated material to train a chatbot that will lie to you about how many planets are in the solar system. In a twist of fate, our international copyright regime looks to be one of the most sturdy bulwarks against an AI future. I’m no fan of the Digital Millennium Copyright Act, but I say let them fight.

One other thing I just can’t escape is how low-rent this all is: Our Silicon Valley thought leaders and mavericks need unprecedented injections of capital in order to… do internet piracy and conquer a new frontier in cheating on your homework? The sheer body of written communication allegedly confirming it all is just the cherry on top of a schadenfreude sundae. “Subject: Forwarded: Re:Re:Re:Re: Crimes.” I’m reminded of how Valve was saved from ruin by a similar disregard for opsec on the part of its former publisher Vivendi, or, indeed, that one I Think You Should Leave sketch.
