The battle for Medical AI heats up: "Self-disarmament" to conquer hallucinations
Hallucinations are the single biggest bottleneck to the large-scale deployment of artificial intelligence (AI), and have been since the advent of ChatGPT. To this day, while every new generation of models claims a lower hallucination rate, "spouting nonsense with a straight face" still happens suddenly and insidiously.
According to data released by Google DeepMind in December 2025, even the leading Large Language Models (LLMs) achieve only 69% accuracy when answering factual questions without external search. Even with search assistance, accuracy reaches only 84%. In other words, the linguistic and mathematical genius that assists us so tirelessly is also a "compulsive liar" that fabricates an answer on roughly one question in six, even with search.
In certain industries, the damage caused by hallucinations is exponentially amplified. For a doctor, being guided by AI hallucinations in prescribing medication could lead to catastrophic consequences.
Yet medical AI remains one of the most promising sectors for AI application. OpenAI has released data showing that among its 800 million users, a quarter submit medical-related queries weekly, and more than 40 million seek health consultations daily. In January, OpenAI launched a dedicated ChatGPT Health portal; four days later, Anthropic introduced Claude for Healthcare, integrating health data from Apple and Android devices.
In China, DeepSeek had already been deployed in more than 100 hospitals across 20 provinces by February 2025, though updates have been scarce since. According to local medical publications, early adoption has been confined to relatively simple, low-risk workflows such as data collection and structuring. The primary obstacle remains hallucination: when an LLM generates a patient's medical record, it may inadvertently splice in details from someone else's history.
Put simply: unless hallucinations are tamed, there is no path forward in medicine. Today's LLMs offer solutions that are all-encompassing and seemingly omnipotent, yet perilously unreliable. When products like ByteDance's Doubao, Elon Musk's Grok, and DeepSeek can already answer anything about our health with total confidence, another shiny offering that fails to move the needle on accuracy adds little, if any, real value.
The accuracy-first path
The recent $250 million fundraising by OpenEvidence, a US medical AI startup valued at $12 billion, validates a new path forward: placing accuracy above all other metrics. To conquer hallucinations, the company adopted a radical strategy: it disconnected completely from the general internet and relies solely on medical journals.
In model training, cutting off the internet is akin to "self-disarmament": abandoning vast oceans of data. OpenEvidence, however, compensates for the lost breadth with depth. General LLMs read widely but shallowly, often scanning only the abstracts of medical papers. OpenEvidence reads less, but deeper. It has never ingested internet rumors such as "vaccines cause cerebral palsy" or "mung bean soup cures the flu." Instead, it partners with top medical journals, covering 35 million papers with real-time updates.
A purified dataset, however, is only half the equation; the model must also be trained to answer questions based on the clean data. One reason hallucinations persist is that LLMs tend to prioritize semantic fluency over content accuracy. OpenEvidence requires that every sentence in an answer carry a citation to authoritative literature, making all conclusions traceable and verifiable.
These measures have pushed its hallucination rate significantly below the industry average, and the model even scored full marks on the United States Medical Licensing Examination (USMLE). Within a year of launch, it had been adopted by 30% of US doctors. The average session lasts 13 minutes, exceeding the combined usage time of the other mainstream US medical AI tools.
In China, the direct counterpart to this approach is "Hydrogen Ion" (Qing Lizi), launched in January by Alibaba Health. Its business model mirrors OpenEvidence entirely: it relies exclusively on medical literature, refuses to answer without a source, serves only doctors, and is dedicated to building the AI assistant with the lowest hallucination rate.
No commercialization pressure
Launched in 2024, the Hydrogen Ion project was elevated to group strategy in July 2025 and given an independent budget. In an interview last month, Alibaba Health CTO Xiangzhi told local medical media that its hallucination rate is comparable to OpenEvidence's, and two to three times better than domestic peers'. The group's most striking show of support is a three-year runway free of revenue targets.
The KPI, however, is bolder than making money: "When Chinese doctors encounter a medical problem, their first reaction should be to open Hydrogen Ion." The project has assembled an internal medical team that partners with over 100 physicians for testing and receives weekly lectures from specialists to fine-tune the model.
A competing version comes from Baichuan AI, led by Wang Xiaochuan. Baichuan AI also rushed to release a new-generation medical large model, M3 Plus, in January, claiming a hallucination rate lower than OpenEvidence's and ranking first globally. Unlike OpenEvidence, however, Baichuan AI did not adopt strict data isolation; it relies primarily on algorithms to suppress hallucinations.
This is closely related to Wang Xiaochuan’s background. He founded Sogou, the top input method in the Chinese-speaking world, from which he expanded into browsers and search engines. The company was listed in the US in 2017 and acquired by Tencent in 2020. After GPT burst onto the scene, he re-entered the arena to build a "Chinese OpenAI", before pivoting to Medical AI in 2024. Given this trajectory, his starting dataset is inevitably complex and cannot be as pristine as OpenEvidence’s.
The business models reflect different targets. OpenEvidence and Hydrogen Ion aim to be a doctor’s assistant, saving the roughly three hours clinicians often spend searching the literature. Baichuan leans toward the consumer side, aiming to put a “doctor” into ordinary households.
That, in turn, implies different tolerances for hallucinations. Baichuan AI places more value on comprehensive capability; in other words, it sacrifices a degree of accuracy in exchange for stronger reasoning and exploration.
"Medical large models face an unavoidable issue: the stronger the reasoning ability, the easier it is to hallucinate in medical scenarios. Yet, blindly suppressing hallucinations makes the model too conservative when facing complex problems," a company executive said in a recent interview.
The consumer heavyweight: Ant Group’s “Ant Afu”
For now, the most remarkable player on the consumer front may be Ant Group’s “Ant Afu.” It is also under the Alibaba umbrella and is built on Alibaba’s Qianwen model. Ant Afu targets end-consumers and positions itself as a “health companion.” Conquering hallucinations is not framed as its top priority; instead, it emphasizes privacy, having obtained China’s highest level of certification for AI product privacy protection.
Endorsed by the famous TV host Jiong He, its advertisements are everywhere. Since its brand upgrade and launch on December 15 last year, monthly active users have exceeded 30 million.
To counter hallucinations, Ant Afu holds a final trump card: human expertise. Ant Afu emphasizes that its AI cannot replace doctors; instead, it connects users to the 300,000 professional physicians on Ant Group's "Hao Daifu" (Good Doctor) platform for online consultations.
From "Ant Afu" answering daily light inquiries, to "Hao Daifu" online doctors handling complex consultations, and finally to "Hydrogen Ion" assisting top-tier hospitals in solving difficult cases, the Alibaba ecosystem seems to have constructed an impregnable closed-loop medical solution.
Yes, the demand is undeniable, and the growth is remarkable. Yet Liang Chen, the chief marketing officer of Ant's healthcare unit, candidly admits that the firm has not yet crystallized its business model.
“To be frank, there is considerable internal debate about ‘Ant Afu’s’ business model, and we don’t have an answer yet,” Liang stated publicly. However, from the perspective of Ant Health, the answer lies in prioritizing social value.
Liang drew a parallel with Alipay’s own trajectory: “Back in 2008 and 2009, Alipay was still small. But it solved critical pain points—eliminating long bank queues and simplifying bill payments. Once those needs were met, the business model naturally revealed itself.”
For China's seasoned entrepreneurs, surviving in such a hyper-competitive market, often referred to as "involution," requires a specific trifecta: addressing real customer demands, providing ultimate convenience, and maintaining low costs.
In the exploration of medical AI, the tech giants are forging different paths based on their inherent strengths. JD.com has developed a suite of AI assistants, ranging from nutritionists to diagnostic experts; its play is fundamentally about e-commerce: accompanying the customer through the entire journey and facilitating purchases of medication and nutritional supplements.
Baidu aims to leverage its AI to reinforce the "stickiness" of its search engine, keeping users within its information ecosystem. Ant Group, while publicly unsure of its revenue model, is viewed by outsiders as having a distinct advantage in ecosystem integration. The logical endgame for Ant is a seamless loop involving easy payments and deep integration with medical insurance.
All three offer remarkable convenience. But for a user genuinely concerned about their health, purchase friction is a minor nuisance; the paramount requirement is an accurate answer free from hallucinations. The ultimate winner will be the one who solves the trust problem.
This wrestling match over Medical AI has only just begun, but one thing is certain: Medical AI is positioned to be the first vertical to experience truly explosive growth.