AI search engine Perplexity says its latest release goes above and beyond for user satisfaction -- especially compared to OpenAI's GPT-4o.
On Tuesday, Perplexity announced a new version of Sonar, its proprietary model. Based on Meta's open-source Llama 3.3 70B, the updated Sonar "is optimized for answer quality and user experience," the company says, having been trained to improve the readability and accuracy of its answers in search mode.
Perplexity claims Sonar scored higher than GPT-4o mini and Claude models on factuality and readability. The company defines factuality as a measure of "how well a model can answer questions using facts that are grounded in search results, and its ability to resolve conflicting or missing information." However, there isn't an external benchmark to measure this.
Instead, Perplexity displays several screenshot examples of side-by-side answers from Sonar and competitor models, including GPT-4o and Claude 3.5 Sonnet. The answers do, in my opinion, differ in directness, completeness, and scannability, often favoring Sonar for its cleaner formatting (a subjective preference) and larger number of citations -- though citation count speaks to quantity, not source quality. The sources a chatbot cites are also shaped by its parent company's publisher and media partner agreements, which both Perplexity and OpenAI maintain.
More importantly, the examples don't include the queries themselves, only the answers, and Perplexity does not explain its methodology for eliciting or evaluating the responses -- how the queries differed, how many were run, etc. -- instead leaving the comparisons up to individuals to "see the difference." We have reached out to Perplexity for comment.
One of Perplexity's "factuality and readability" examples.
Perplexity says that online A/B testing revealed that users were much more satisfied and engaged with Sonar than with GPT-4o mini, Claude 3.5 Haiku, and Claude 3.5 Sonnet, but it didn't expand on the specifics of these results.
"Sonar significantly outperforms models in its class, like GPT-4o mini and Claude 3.5 Haiku, while closely matching or exceeding the performance of frontier models like GPT-4o and Claude 3.5 Sonnet for user satisfaction," Perplexity's announcement states.
According to Perplexity, Sonar's speed of 1,200 tokens per second enables it to answer queries almost instantly and work 10 times faster than Gemini 2.0 Flash. Testing showed Sonar surpassing GPT-4o mini and Claude 3.5 Haiku "by a substantial margin," but the company doesn't clarify the details of that testing. The company also says Sonar outperforms more expensive frontier models like Claude 3.5 Sonnet "while closely approaching the performance of GPT-4o."
Sonar did beat its two competitors, among others, on the academic benchmarks IFEval and MMLU, which evaluate how well a model follows user instructions and its grasp of "world knowledge" across disciplines, respectively.
Want to try it for yourself? The upgraded Sonar is available for all Pro users, who can make it their default model in their settings or access it through the Sonar API.
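For developers, calling Sonar looks much like calling any OpenAI-compatible chat completions API. The sketch below is a minimal illustration, assuming Perplexity's documented endpoint at api.perplexity.ai and a model identifier of "sonar"; check Perplexity's current API docs for the exact model names and response fields.

```python
# Minimal sketch of a Sonar API call, assuming Perplexity's OpenAI-compatible
# chat completions endpoint; endpoint URL and model name should be verified
# against Perplexity's current API documentation.
import os
import requests

API_URL = "https://api.perplexity.ai/chat/completions"  # assumed endpoint
API_KEY = os.environ["PERPLEXITY_API_KEY"]  # your Perplexity API key

payload = {
    "model": "sonar",  # assumed identifier for the updated Sonar model
    "messages": [
        {"role": "system", "content": "Be precise and cite sources."},
        {"role": "user", "content": "What did Perplexity announce about Sonar?"},
    ],
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
response.raise_for_status()

data = response.json()
# Print the model's answer; search-grounded responses typically also return
# source URLs (e.g., a "citations" field), though the exact shape may vary.
print(data["choices"][0]["message"]["content"])
```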