Elon Musk's X arguably revolutionized social media fact-checking by rolling out "community notes," a system that crowdsources diverse views on whether posts on X are trustworthy.
But now the platform plans to allow AI to write community notes, and that could erode whatever trust X users have in the fact-checking system, a risk X itself has acknowledged.
In a research paper, X described the initiative as an "upgrade" while explaining everything that could possibly go wrong with AI-written community notes.
In the ideal scenario X describes, AI agents would speed up and increase the number of community notes added to inaccurate posts, ramping up fact-checking efforts platform-wide. Each AI-written note would be rated by human reviewers, providing feedback that makes the AI agents better at writing notes the longer this feedback loop runs. As the AI agents improve, human reviewers would be freed to focus on the more nuanced fact-checking that AI cannot quickly handle, such as posts requiring niche expertise or social awareness. If all goes well, X's paper suggested, humans and AI working together could not only transform X's fact-checking but also provide "a blueprint for a new form of human-AI collaboration in the production of public knowledge."
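To make the proposed loop concrete, here is a minimal sketch of that human-in-the-loop cycle. It is not X's actual pipeline or API; every name (NoteAgent, draft_note, collect_human_ratings) is hypothetical, and the point is simply the cycle the paper describes: an AI drafts a note, humans rate it, and the ratings feed back into the drafting step.

```python
# Hypothetical sketch of the feedback loop described in X's paper.
# None of these names correspond to X's real systems.
from dataclasses import dataclass, field


@dataclass
class NoteAgent:
    """Stand-in for an LLM that drafts community notes."""
    feedback_log: list = field(default_factory=list)

    def draft_note(self, post: str) -> str:
        # A real agent would call an LLM here; we return a placeholder.
        return f"Context for: {post[:40]}..."

    def learn_from(self, note: str, ratings: list[str]) -> None:
        # Human ratings ("helpful" / "not helpful") are folded back into
        # the agent, e.g. as fine-tuning data or prompt examples.
        self.feedback_log.append((note, ratings))


def collect_human_ratings(note: str) -> list[str]:
    # Placeholder for the existing contributor-rating step; humans
    # remain the gate before any note is shown.
    return ["helpful", "helpful", "not helpful"]


def run_cycle(agent: NoteAgent, flagged_posts: list[str]) -> None:
    for post in flagged_posts:
        note = agent.draft_note(post)
        ratings = collect_human_ratings(note)
        agent.learn_from(note, ratings)  # the loop meant to improve drafts over time


agent = NoteAgent()
run_cycle(agent, ["Post claiming a miracle cure", "Post misquoting a study"])
print(len(agent.feedback_log), "notes drafted and rated")
```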
Among the key questions that remain, however, is a big one: X isn't sure whether AI-written notes will be as accurate as notes written by humans. Complicating matters further, AI agents seem likely to generate "persuasive but inaccurate notes," which human raters might score as helpful because AI is "exceptionally skilled at crafting persuasive, emotionally resonant, and seemingly neutral notes." That could corrupt the feedback loop, watering down community notes and making the whole system less trustworthy over time, X's research paper warned.
"If rated helpfulness isn’t perfectly correlated with accuracy, then highly polished but misleading notes could be more likely to pass the approval threshold," the paper said. "This risk could grow as LLMs advance; they could not only write persuasively but also more easily research and construct a seemingly robust body of evidence for nearly any claim, regardless of its veracity, making it even harder for human raters to spot deception or errors."