LLMs have eaten up the internet – now they’re starving themselves

I've spent my career in data. As the former chief data officer for Kaiser Permanente, UnitedHealthcare, and Optum, I at one point oversaw nearly 70% of all healthcare claims in the United States. So trust me when I tell you: the problem with enterprise AI is not the model architecture but the data fed into it. I've seen it firsthand.
LLMs have hit their peak
The cracks in LLMs are showing. Take GPT-5. Its launch was met with complaints: it fumbled basic math, lost the easy conversational feel of earlier versions, and left paying customers calling it “boring” and “generic.” OpenAI even had to bring back its older model after users rejected the new one's cold, checklist-driven tone. After two years of buildup, many are asking whether OpenAI has lost its edge, or whether the entire LLM approach has simply hit a wall.
Meta's LLaMA 4 tells a similar story. On long-context tests (the kind of work businesses actually need), Maverick showed no improvement over LLaMA 3, while Scout performed “very poorly.” Meta claims these models can handle millions of tokens; in testing, they faltered at just 128,000. Google's Gemini, meanwhile, topped 90% accuracy at that same scale.
The data problem no one wants to admit
Rather than confronting these limitations, the industry keeps scaling up – pumping more compute and power into the same models. Yet for all that power, the models aren't getting any smarter.
The reason is simple: the internet data these models rely on has been scraped, cleaned, and trained on over and over again. That's why each new version feels bland – there's almost nothing new left to learn. Each cycle just recycles the same patterns back into the model. They've eaten up the internet. Now they're starving.
Meanwhile, the real gold mine of intelligence – private enterprise data – sits locked away. LLMs are failing not for lack of data, but for lack of the right data. Think about what runs healthcare: claims, medical records, clinical notes, bills, invoices, prior authorization requests, call center transcripts – the information that actually reflects how businesses and industries operate.
Until models can train on that kind of data, they will keep running out of fuel. You can stack parameters, add GPUs, and pump power into ever-larger models, but that won't make them any smarter.
Small language models are the future
The way forward is not bigger models. It's smaller, smarter ones. Small language models (SLMs) are built to do what an LLM cannot: learn from enterprise data and focus on a specific problem.
Here's why they work.
First, they are efficient. SLMs have fewer parameters, which means lower compute costs and faster response times. You don't need a data center full of GPUs to run them.
Second, they are domain-specific. Rather than trying to answer every question on the internet, they are trained to do one thing well – HCC risk adjustment coding, prior authorization, or medical coding, for example. That's how they deliver accuracy where general-purpose LLMs struggle.
Third, they fit into enterprise workflows. They don't sit off to the side as shiny demos. They integrate with the data that actually runs your business (billing data, invoices, claims, clinical records) and do so with governance and compliance in mind.
The future isn’t bigger – it’s smaller
I've seen this movie before: massive investment, endless hype, and then the realization that scale alone doesn't solve the problem.
The way forward is to solve the data problem and build smaller, smarter models that learn from the information enterprises already have. That's how you make AI useful – not by chasing scale for its own sake. And I'm not the only one saying this. Even NVIDIA's own researchers now argue that the future of agentic AI belongs to small language models.
The industry can keep pouring GPUs into ever-larger models, or it can build focused models that actually work. The choice is obvious.
Fawad Butt is the co-founder and CEO of Penguin AI. He previously served as chief data officer at Kaiser Permanente, UnitedHealthcare, and Optum, where he led the industry's largest team of data and analytics experts and managed hundreds of millions of dollars in P&L.
This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers.



