I’ve spent my career swimming in data. As the former Chief Data Officer at Kaiser Permanente, UnitedHealthcare, and Optum, I at one point had oversight of nearly 70% of all of America’s healthcare claims. So when I tell you the problem with enterprise AI isn’t the model architecture but the data those models are being fed, believe me: I’ve seen it firsthand.
LLMs are already peaking
The cracks are already showing in LLMs. Take GPT-5. Its launch was plagued with complaints: it failed basic math, missed context that earlier versions handled with ease, and left paying customers calling it “bland” and “generic.” OpenAI even had to restore an older model after users rejected its colder, checklist-driven tone. After two years of delays, many started asking whether OpenAI had lost its edge, or whether the entire LLM approach was simply hitting a wall.
Meta’s LLaMA 4 tells a similar story. In long-context tests, the kind of work enterprises actually need, Maverick showed no improvement over LLaMA 3, and Scout performed “downright atrociously.” Meta claimed these models could handle millions of tokens; in reality, they struggled with just 128,000. Meanwhile, Google’s Gemini sailed past 90% accuracy at the same scale.
The data problem nobody wants to admit
Instead of confronting the limits we’re already seeing with LLMs, the industry keeps scaling up, pouring more compute and electricity into these models. And yet, despite all that power, the results aren’t getting any smarter.
The reason is simple: the internet data these models are built on has already been scraped, cleaned, and retrained on over and over to death. That’s why new releases feel flat; there’s little new to learn. Each cycle just recycles the same patterns back into the model. They’ve already eaten the internet. Now they’re starving on themselves.
Meanwhile, the real gold mine of intelligence sits locked away: private enterprise data. LLMs aren’t failing for lack of data; they’re failing because they don’t use the right data. Think about what’s needed in healthcare: claims, medical records, clinical notes, billing, invoices, prior authorization requests, and call center transcripts, the information that actually reflects how businesses and industries are run.
Until models can train on that kind of data, they’ll always run out of fuel. You can stack parameters, add GPUs, and pour electricity into bigger and bigger models, but it won’t make them smarter.
Small language models are the future
The way forward isn’t bigger models. It’s smaller, smarter ones. Small language models (SLMs) are designed to do what LLMs can’t: learn from enterprise data and tackle specific problems.
Here’s why they work.
First, they’re efficient. SLMs have fewer parameters, which means lower compute costs and faster response times. You don’t need a data center full of GPUs just to get them running.
Second, they’re domain-specific. Instead of trying to answer every question on the internet, they’re trained to do one thing well, like HCC risk coding, prior authorizations, or medical coding. That’s why they deliver accuracy in places where generic LLMs stumble.
Third, they fit enterprise workflows. They don’t sit on the outside as a shiny demo. They integrate with the data that actually drives your business, such as billing data, invoices, claims, and clinical notes, and they do it with governance and compliance in mind.
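To make the first and third points concrete, here is a minimal sketch of what running a small model inside an enterprise workflow can look like: a small open-weight model loaded on a single GPU and prompted to triage a prior-authorization note. The model ID, prompt, and labels are illustrative assumptions, not a recommendation; a real deployment would fine-tune on governed, de-identified enterprise data and sit behind proper compliance controls.

```python
# Minimal sketch: run a small open-weight model locally to triage a
# prior-authorization note. Model ID, prompt, and labels are illustrative
# assumptions only; production systems would be fine-tuned on governed,
# de-identified enterprise data with compliance review.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"  # example small model (~3.8B parameters)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision keeps memory needs modest
    device_map="auto",          # fits on a single workstation GPU
)

# Hypothetical prior-authorization note and triage prompt.
note = (
    "Prior authorization request: MRI lumbar spine for chronic low back pain, "
    "6 weeks of conservative therapy documented, no red-flag symptoms."
)
prompt = (
    "You are a utilization-review assistant. Classify the request below as "
    "APPROVE, PEND, or DENY and give a one-sentence reason.\n\n" + note
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=False)

# Print only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

The point of the sketch is the footprint, not the specific model: something this small runs on commodity hardware, which is what makes per-workflow, domain-tuned deployment practical.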
The future isn’t bigger, it’s smaller
I’ve seen this movie before: big investments, endless hype, and then the realization that scale alone doesn’t solve the problem.
The way forward is to fix the data problem and build smaller, smarter models that learn from the information enterprises already own. That’s how you make AI useful, not by chasing size for its own sake. And I’m not the only one saying it. Even NVIDIA’s own researchers now say the future of agentic AI belongs to small language models.
The industry can keep throwing GPUs at ever-larger models, or it can build better ones that actually work. The choice is clear.
Photo: J Studios, Getty Images
Fawad Butt is the co-founder and CEO of Penguin Ai. He previously served as the Chief Data Officer at Kaiser Permanente, UnitedHealthcare Group, and Optum, leading the industry’s largest group of data and analytics specialists and managing a multi-hundred-million dollar P&L.
This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers. Click here to find out how.