Which AI Hallucinates Most? ChatGPT vs Claude vs Perplexity Tested

A 1,000-prompt test reveals which AI chatbots hallucinate most. ChatGPT scored 12%, Claude 15%, and Perplexity 3.3%—but there's a surprising catch.

By HelloBuilder Team · November 28, 2025 · 5 min read
AI, Hallucination

As AI chatbots become our go-to tools for research, writing, and problem-solving, one critical question emerges: how often do these systems make things up? When an AI confidently states false information—a phenomenon known as "hallucination"—the consequences can range from embarrassing to dangerous, especially in professional or academic contexts.

According to a recent experiment shared on Reddit by user BluebirdFront9797, a comprehensive test of three leading AI models reveals surprising differences in how often they hallucinate—and why raw accuracy numbers don't tell the whole story.

The Testing Methodology: How to Catch an AI in a Lie

The researcher designed a rigorous three-step verification process to test 1,000 identical prompts across ChatGPT, Claude, and Perplexity. Here's how the hallucination detection system worked:

Step 1: Claim Extraction

An LLM analyzed each AI response and extracted all verifiable factual claims—essentially breaking down the answer into individual statements that could be fact-checked.
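A minimal sketch of this extraction step might look like the following. The `call_llm` parameter is a hypothetical stand-in for whatever chat-completion API the researcher used; the prompt wording is an assumption, not the study's actual prompt.

```python
import json

# Hypothetical extraction prompt; the study's real prompt was not published.
EXTRACTION_PROMPT = """Extract every verifiable factual claim from the text below.
Return a JSON list of short, self-contained statements.

Text:
{answer}"""

def extract_claims(answer: str, call_llm) -> list[str]:
    """Ask an LLM to break `answer` into individually checkable claims."""
    raw = call_llm(EXTRACTION_PROMPT.format(answer=answer))
    return json.loads(raw)
```

In practice the JSON parsing would need error handling, since LLMs occasionally return malformed output.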

Step 2: Web Verification

For each extracted claim, Exa (a search technology) scoured the web for the most relevant authoritative sources that could confirm or contradict the statement.
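Conceptually, this step maps each claim to a short list of candidate sources. In the sketch below, `search_web` is a placeholder for the search backend (the study used Exa, but this signature is hypothetical and not Exa's real API):

```python
def gather_sources(claims: list[str], search_web, top_k: int = 3) -> dict:
    """Map each claim to its top-k most relevant sources.

    `search_web` is a hypothetical callable: query string -> ranked list
    of source snippets or URLs.
    """
    return {claim: search_web(claim)[:top_k] for claim in claims}
```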

Step 3: Verdict Assignment

Another LLM compared each claim against the sources found and assigned one of three verdicts:

  • True: Supported by credible sources
  • Unsupported: No evidence found
  • Conflicting: Contradicted by available sources

Each verdict came with a confidence score to account for ambiguity.
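A verdict record pairing the label with its confidence score could be sketched like this. The judging prompt and reply format are assumptions for illustration; the study did not publish its judge prompt.

```python
from dataclasses import dataclass
from typing import Literal

Label = Literal["true", "unsupported", "conflicting"]

@dataclass
class Verdict:
    claim: str
    label: Label
    confidence: float  # 0.0-1.0, to account for ambiguity

def judge_claim(claim: str, sources: list[str], call_llm) -> Verdict:
    """Ask a (hypothetical) judging LLM to compare `claim` against `sources`.

    Assumes the LLM replies with "<label> <confidence>", e.g. "true 0.95".
    """
    reply = call_llm(
        f"Claim: {claim}\nSources: {sources}\n"
        "Reply with one of true/unsupported/conflicting and a confidence score."
    )
    label, conf = reply.split()
    return Verdict(claim, label, float(conf))
```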

The hallucination threshold: As reported in the study, an answer was marked as containing a hallucination if at least one of its claims was either unsupported or conflicting with source material. This is a strict but fair standard—after all, one false statement can undermine an entire response.
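The threshold rule itself is simple to state in code. This sketch assumes each answer has already been reduced to a list of verdict labels as described above:

```python
BAD_LABELS = {"unsupported", "conflicting"}

def is_hallucinated(verdict_labels: list[str]) -> bool:
    """The study's strict rule: one bad claim flags the whole answer."""
    return any(label in BAD_LABELS for label in verdict_labels)

def hallucination_rate(answers: list[list[str]]) -> float:
    """Fraction of answers containing at least one unsupported/conflicting claim."""
    flagged = sum(is_hallucinated(labels) for labels in answers)
    return flagged / len(answers)
```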

The Results: Which AI Lies Most?

Out of 1,000 prompts tested, here's how each model performed:

ChatGPT: The Middle Ground

  • Hallucination rate: 12% (120 out of 1,000 answers)
  • Performance analysis: ChatGPT showed moderate reliability, with roughly one in eight responses containing at least one unverifiable or false claim

Claude: The Worst Performer

  • Hallucination rate: 15% (150 out of 1,000 answers)
  • Performance analysis: According to the test results, Claude had the highest hallucination rate among the three models, with approximately three in twenty responses containing problematic claims

Perplexity: The Complicated Winner

  • Hallucination rate: 3.3% (33 out of 1,000 answers)
  • The catch: While Perplexity appeared to be the clear winner with the lowest hallucination rate, the Exa verification system revealed a significant caveat

As the researcher noted, most of Perplexity's "safe" answers were "low-effort copy-paste jobs, generic summaries or stitched quotes." More tellingly, "in the rare cases where it actually tried to generate original content, the hallucination rate exploded."

What This Really Means: The Trade-Off Between Safety and Usefulness

This is where the results become fascinating from a practical standpoint. Perplexity's strategy appears to be risk avoidance rather than genuine accuracy. (This is my interpretation based on the reported findings.)

Think of it this way: If you ask someone a complex question and they simply read back excerpts from a textbook without synthesizing information, they're technically "accurate"—but are they actually helpful? Perplexity seems to have optimized for not being wrong rather than being genuinely insightful.

This creates an important distinction for users:

When you need:

  • Quick fact verification → Perplexity's approach may work well
  • Original synthesis and analysis → ChatGPT or Claude might be more useful despite higher hallucination rates
  • Critical information requiring verification → Always fact-check regardless of the model

The Broader Implications for AI Trust

This experiment highlights several critical considerations for anyone using AI tools:

1. No Model is Fully Reliable

Even the "best" performer (Perplexity, at 3.3%) still produced 33 hallucinated answers. That works out to roughly 1 in 30 answers containing false information—a rate that's unacceptable for critical applications without human verification.

2. The Copy-Paste Problem

Perplexity's low hallucination rate came at the cost of originality. This raises questions about what we actually want from AI: safe regurgitation of existing content, or creative synthesis with higher risk?

3. Context Matters

The 1,000 prompts tested weren't specified in detail, but hallucination rates likely vary significantly based on:

  • Topic complexity
  • Availability of training data
  • Recency of information required
  • Whether the question requires reasoning vs. recall

4. Verification is Essential

As reported in this study, even sophisticated AI models make things up regularly. The takeaway? Never use AI-generated information for important decisions without independent verification [LINK: AI fact-checking tools].

Key Takeaways for AI Users

✓ ChatGPT hallucinated in 12% of responses tested—roughly 1 in 8 answers contained false claims

✓ Claude showed the highest hallucination rate at 15%—approximately 3 in 20 responses had issues

✓ Perplexity had only 3.3% hallucinations but achieved this primarily through copy-pasting rather than original content generation

✓ When Perplexity attempted original synthesis, its hallucination rate increased dramatically

✓ All three models showed significant reliability issues, reinforcing that human fact-checking remains essential

What You Should Do Next

Based on these findings, here are actionable steps for working with AI chatbots:

  1. Choose your tool based on your task: Use Perplexity for straightforward fact-gathering, but consider ChatGPT or Claude when you need creative synthesis (with appropriate fact-checking)
  2. Implement verification workflows: For any important use case, cross-reference AI outputs with authoritative sources [LINK: source verification methods]
  3. Be skeptical of confident-sounding claims: AI models don't indicate uncertainty well—they often present hallucinations with the same confidence as facts
  4. Consider using hallucination detection tools: Technologies like Exa can help identify unsupported claims in AI-generated content
  5. Stay informed about model updates: These hallucination rates reflect specific versions tested; newer releases may perform differently

The Bottom Line

This experiment, as shared by BluebirdFront9797, provides valuable empirical data about AI reliability. While no model performed perfectly, the results reveal that accuracy and usefulness exist in tension—the safest AI (Perplexity) achieved low hallucination rates partly by avoiding original thought.

For users, the message is clear: AI chatbots are powerful tools, but they're not yet trustworthy enough to use without verification. Understanding each model's strengths and weaknesses—and implementing appropriate safeguards—is essential for anyone relying on AI in their work or research.


Related Resources

ChatGPT


Tools

AI assistant for conversations and tasks

ChatGPT is OpenAI's flagship conversational AI that has revolutionized how people interact with artificial intelligence. Built on advanced language models, it can handle everything from simple questions to complex problem-solving tasks across multiple domains. The tool excels at creative writing, helping users craft everything from emails and articles to stories and poetry. For developers, it's a powerful coding companion that can write, debug, and explain code in dozens of programming languages.

Students and professionals use it for research, analysis, and brainstorming, while content creators leverage its ability to generate ideas and overcome creative blocks. What sets ChatGPT apart is its conversational memory within sessions, allowing for natural back-and-forth discussions. The latest versions include image generation capabilities, file analysis, and web browsing, making it a comprehensive assistant.

Whether you're a beginner looking for explanations or an expert seeking a thinking partner, ChatGPT adapts to your level and needs. The free tier offers substantial functionality, while paid plans unlock faster responses, priority access, and advanced features. It's become an essential tool for millions of users worldwide who want to augment their thinking and productivity.