Can AI fact-check its own lies?

Jun 13, 2025 - 18:30

As AI car crashes go, the recent publishing of a hallucinated book list in the Chicago Sun-Times quickly became a multi-vehicle pile-up. After a writer used AI to create a list of summer reads, the majority of which were made-up titles, the resulting article sailed through lax editorial review at the Sun-Times (and at least one other newspaper) and ended up being distributed to thousands of subscribers. The CEO eventually published a lengthy apology.

The most obvious takeaway from the incident is that it was a badly needed wake-up call about what can happen when AI gets too embedded in our information ecosystem. But CEO Melissa Bell resisted the instinct to simply blame AI, instead putting responsibility on the humans who use it and those who are entrusted with safeguarding readers from its weaknesses. She even included herself as one of those people, explaining how she had approved the publishing of special inserts like the one the list appeared in, assuming at the time there would be adequate editorial review (there wasn’t).

The company has made changes to patch this particular hole, but the affair exposes a gap in the media landscape that is poised to get worse: as the presence of AI-generated content—authorized or not—increases in the world, the need for editorial safeguards also increases. And given the state of the media industry and its continual push to do “more with less,” it’s unlikely that human labor will scale up to meet the challenge. The conclusion: AI will need to fact-check AI.

Fact-checking the fact-checker

I know, it sounds like a horrible idea, somewhere between letting the fox watch the henhouse and sending Imperial Stormtroopers to keep the peace on Endor. But AI fact-checking isn’t a new idea: when Google Gemini first debuted (then called Bard), it shipped with an optional fact-check step you could use to double-check anything it was telling you. Eventually, this kind of step simply became integrated into how AI search engines work, broadly making their results better, though still far from perfect.

Newsrooms, of course, set a higher bar, and they should. Operating a news site comes with the responsibility to ensure the stories you’re telling are true, and for most sites the shrugging disclaimer of “AI can make mistakes,” while good enough for ChatGPT, doesn’t cut it. That’s why for most, if not all, AI-generated outputs (such as ESPN’s AI-written sports recaps), humans check the work.

As AI writing proliferates, though, the inevitable question is: Can AI do that job? Put aside the weirdness for a minute and see it as math, the key number being how often the checker gets things wrong. If an AI fact-checker can reduce the number of errors by as much as a human can, if not more, shouldn’t it do that job?

If you’ve never used AI to fact-check something, the recently launched service isitcap.com offers a glimpse at where the technology stands. It doesn’t just label claims as true or false—it evaluates the article holistically, weighing context, credibility, and bias. It even compares multiple AI search engines to cross-check itself.

You can easily imagine a newsroom workflow that applies an AI fact-checker similarly, sending its analysis back to the writer, highlighting the bits that need shoring up. And if the writer happens to be a machine, revisions could be done lightning fast, and at scale. Stories could go back and forth until they reach a certain accuracy threshold, with anything that falls short held for human review.
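The back-and-forth described above can be sketched as a simple loop. This is only an illustration under stated assumptions, not how any newsroom or the service mentioned earlier actually works: the names `Review`, `review_loop`, `toy_check` and `toy_revise`, the 0.95 threshold, and the three-round cap are all hypothetical, and the "checker" here is a toy stand-in rather than a real AI model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Draft = List[str]  # assumption: a draft is modeled as a list of claims


@dataclass
class Review:
    accuracy: float      # share of claims the checker could support (0.0 to 1.0)
    flagged: List[str]   # claims the checker could not support


def review_loop(
    draft: Draft,
    check: Callable[[Draft], Review],
    revise: Callable[[Draft, List[str]], Draft],
    threshold: float = 0.95,   # hypothetical accuracy bar
    max_rounds: int = 3,       # hypothetical cap on revision rounds
) -> Tuple[Draft, str, int]:
    """Send a draft between checker and writer until it clears the
    threshold or runs out of rounds, then publish or hold for a human."""
    review = check(draft)
    rounds = 0
    while review.accuracy < threshold and rounds < max_rounds:
        draft = revise(draft, review.flagged)
        review = check(draft)
        rounds += 1
    status = "publish" if review.accuracy >= threshold else "hold_for_human_review"
    return draft, status, rounds


# Toy stand-ins so the sketch runs: a claim counts as wrong if it starts with "FAKE".
def toy_check(draft: Draft) -> Review:
    flagged = [claim for claim in draft if claim.startswith("FAKE")]
    return Review(accuracy=1 - len(flagged) / len(draft), flagged=flagged)


def toy_revise(draft: Draft, flagged: List[str]) -> Draft:
    # Removes only the first flagged claim, so each round fixes one problem.
    return [claim for claim in draft if claim != flagged[0]]


if __name__ == "__main__":
    book_list = ["real book A", "FAKE book B", "FAKE book C", "real book D"]
    print(review_loop(book_list, toy_check, toy_revise))
    print(review_loop(book_list, toy_check, toy_revise, max_rounds=1))
```

With the toy list, the default settings clear the bar after two revisions, while capping the loop at one round leaves the draft short of the threshold and routes it to a human. The design choice worth noticing is the cap: without it, a writer and checker that never converge would loop indefinitely instead of escalating.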

All this makes sense in theory, and it could even be applied to what news orgs are doing currently with AI summaries. Nieman Lab has an excellent write-up on how The Wall Street Journal, Yahoo News, and Bloomberg all use AI to generate bullet points or top-line takeaways for their journalism. For both Yahoo and the Journal, there’s some level of human review on the summaries (for Bloomberg, it’s unclear from the article). These organizations are already on the edge of what’s acceptable—balancing speed and scale with credibility. One mistake in a summary might not seem like much, but when trust is already fraying, it’s enough to shake confidence in the entire approach.

Human review helps ensure accuracy, of course, but it also requires more human labor, something in short supply in newsrooms that don’t have a national footprint. AI fact-checking could give smaller outlets more options with respect to public-facing AI content. Similarly, Politico’s union recently criticized the publication’s AI-written subscriber reports, which are based on the work of its journalists, for occasional inaccuracies. A fact-checking layer might prevent at least some embarrassing mistakes, like attributing political stances to groups that don’t exist.

The AI trust problem that won’t go away

Using AI to fight AI hallucination might make mathematical sense if it can prevent serious errors, but there’s another problem that stems from relying even more on machines, and it’s not just a metallic flavor of irony. The use of AI in media already has a trust problem. The Sun-Times’ phantom book list is far from the first AI content scandal, and it certainly won’t be the last. Some publications are even adopting anti-AI policies, forbidding its use for virtually anything.

Because of AI’s well-documented problems, public tolerance for machine error is lower than for human error. If a self-driving car gets into an accident, for example, the scrutiny is much greater than if a person had been driving. You might call this the automation fallout bias, and whether or not you think it’s fair, it’s undoubtedly real. A single high-profile hallucination that slips through the cracks could derail adoption, even if such errors are statistically rare.

Add to that what would probably be painful compute costs for multiple layers of AI writing and fact-checking, not to mention the increased carbon footprint. All to improve AI-generated text—which, let’s be clear, is not the investigative, source-driven journalism that still requires human rigor and judgment. Yes, we’d be lightening the cognitive load for editors, but would it be worth the cost?

Despite all these barriers, it seems inevitable that we will use AI to check AI outputs. All indications point to hallucinations being inherent to generative technology. In fact, newer “thinking” models appear to hallucinate even more than their less sophisticated predecessors. If done right, AI fact-checking would be more than a newsroom tool, becoming part of the infrastructure for the web. The question is whether we can build it to earn trust, not just automate it.

The amount of AI content in the world can only increase, and we’re going to need systems that can scale to keep up. AI fact-checkers can be part of that solution, but only if we manage—and accept—their potential to make errors themselves. We may not yet trust AI to tell the truth, but at least it can catch itself in a lie.