Blimey!

Judge: Pirate libraries may have profited from Meta torrenting 80TB of books

Meta may defeat authors’ torrenting claim due to lack of evidence.

Ashley Belanger

Now that Meta has largely beaten an AI training copyright lawsuit raised by 13 book authors—including comedian Sarah Silverman and Pulitzer Prize-winning author Junot Diaz—the only matter left to settle in that case is whether Meta violated copyright laws by torrenting books used to train Llama models.

In an order that partly grants Meta's motion for summary judgment, judge Vince Chhabria confirmed that Meta and the authors would meet on July 11 to "discuss how to proceed on the plaintiffs’ separate claim that Meta unlawfully distributed their protected works during the torrenting process."

Chhabria's order suggested that authors may struggle to win this part of the fight, too, due to a lack of evidence, as there has not yet been much discovery on this issue that was raised so late in the case. But he also warned that Meta was wrong to argue its torrenting was completely "irrelevant" to whether its copying of books was fair use.

Chhabria suggested the torrenting—which may have comprised more than 80.6 terabytes of data from one shadow library, LibGen—is "at least potentially relevant" in "a few different ways."

First, Chhabria noted that Meta deciding to pirate books from shadow libraries was "relevant to the issue of bad faith." That’s connected to the first factor of a fair use analysis, which weighs the character of the use.

Authors had argued that Meta had sparked conversations with some publishers about licensing authors' works, but "after failing to acquire licenses," CEO Mark Zuckerberg "escalated" the issue, Chhabria explained. That prompted a decision to acquire books from pirate libraries instead, Chhabria wrote, with Meta admittedly using BitTorrent to seize data after abandoning its pursuit of licensing deals for the same books.

However, that aspect of the trial may not matter much, since Chhabria noted that "the law is in flux about whether bad faith is relevant to fair use."

It could certainly look worse for Meta if authors manage to present evidence supporting the second way that torrenting could be relevant to the case, Chhabria suggested.

"Meta downloading copyrighted material from shadow libraries" would also be relevant to the character of the use, "if it benefitted those who created the libraries and thus supported and perpetuated their unauthorized copying and distribution of copyrighted works," Chhabria wrote.

Counting potential strikes against Meta, Chhabria pointed out that the "vast majority of cases" involving "this sort of peer-to-peer file-sharing" are found to "constitute copyright infringement." And it likely doesn't help Meta's case that "some of the libraries Meta used have themselves been found liable for infringement."

However, Meta may overcome this argument, too, since book authors "have not submitted any evidence" showing that Meta's downloading may be "propping up" or financially benefiting pirate libraries.

Finally, Chhabria noted that the "last issue relating to the character of Meta’s use" of books with regard to its torrenting is "the relationship between Meta’s downloading of the plaintiffs’ books and Meta’s use of the books to train Llama."

Authors had tried to argue that these elements were distinct. But Chhabria said there's no separating the fact that Meta downloaded the books to serve the "highly transformative" purpose of training Llama.

"Because Meta’s ultimate use of the plaintiffs’ books was transformative, so too was Meta’s downloading of those books," Chhabria wrote.

AI training rulings may get more authors paid

Authors only learned of Meta's torrenting through discovery in the lawsuit, and because of that, Chhabria noted that "the record on Meta’s alleged distribution is incomplete."

It's possible that authors may be able to show evidence that Meta "contributed to the BitTorrent network" by providing significant computing power that could've meaningfully assisted shadow libraries, Chhabria said in a footnote.

But Chhabria dinged authors for citing only an outdated Ars Technica article from 2010 that suggested that people only rarely used torrents to pirate books. (E-book piracy has significantly spiked since then, as TorrentFreak has documented in more recent reports that also note research showing that taking pirated books offline can benefit book sales.)

More will be revealed as the Meta case advances next month, but Chhabria noted that one potential outcome, win or lose for authors, could be that publishers become incentivized to make it easier to license authors' works for AI training.

"Publishers may not currently hold the subsidiary rights necessary to make group licensing possible," Chhabria wrote. "But it’s hard to believe that they won’t soon start negotiating those rights with their authors so that they can engage in large-scale negotiation and licensing" with large language model (LLM) developers—"assuming they haven’t already started to do so."

"It seems especially likely that these licensing markets will arise if LLM developers’ only choices are to get licenses or forgo the use of copyrighted books as training data," Chhabria noted.

That could be the outcome if other authors suing AI companies secure victories that Chhabria views as inevitable. They would need to show evidence that AI products dilute markets for their works, which the authors suing Meta failed to do.

In his ruling granting Meta the win against authors' copyright infringement claims, Chhabria suggested that Meta won only because authors raised the "wrong arguments." That suggests Meta may be more inclined to renew licensing talks in the future if a stronger copyright fight is raised, despite winning this landmark copyright battle against a handful of authors this week.

And if AI companies facing that potential reality "instead choose to use only public domain works as training data (instead of licensing copyrighted works), that would indicate that they don’t actually need the copyrighted works as badly as they say they do," Chhabria wrote. And if that's true, there's likely little excuse for torrenting pirated books, which authors had long considered an obvious example of copyright infringement.

Ashley Belanger, Senior Policy Reporter
Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.