Why Small AI Models Will Win in 2026: An Expert’s Case Against the AI Tower of Babel

The loudest story in artificial intelligence right now is still the story of scale.
Bigger models. More parameters. Larger data centers. Training runs that sound less like engineering projects and more like national infrastructure plans.
But when you speak with someone who has worked on intelligence from both sides — biological and computational — a different picture starts to take shape.
That was the case in a recent conversation with Johannes Nagel, Head of AI Innovation at Alexander Thamm and host of the Nagel Mid-Subscription podcast. Johannes began his career in neuroscience, studying how mammalian brains build cognitive maps, before moving into applied AI. His view of where the field is heading is refreshingly different: technical, grounded, and shaped by an understanding of how real biological intelligence actually works.

Here is what our expert had to say.

The “Tower of Babel” Problem With Today’s AI

Johannes uses a sharp metaphor for the current direction of AI: the Tower of Babel.
In his view, the big tech approach is trying to build one massive, monolithic model that contains everything inside a single system. You query it from the outside, and the model is expected to respond to almost anything.
It is impressive. It is also expensive. And, as Johannes sees it, fragile.
“If you take out individual bricks of such systems, the whole thing is at risk,” he explains.

A monolithic model is trained as one large unit. When it encounters something outside its training distribution, its predictions can break in unpredictable ways. There is no simple way to remove one part, repair it, or replace it without affecting the broader system. There is also little room for graceful degradation.
Johannes argues for flipping the architecture.

Instead of building one giant tower, build many smaller ones from reusable, autonomous bricks. Each brick is its own agent. Each agent has a narrow, clearly defined responsibility. And the same agent can be reused across different workflows.
“It’s like you build multiple towers from the same matter,” he says.
The advantage is resilience. If one brick fails, the surrounding system can compensate. Biology already works this way. So do organizations. Johannes believes AI should move in the same direction.
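The brick-and-tower idea can be sketched in a few lines of Python. Everything below is illustrative, not a real framework: the `Agent` and `Workflow` names and the toy text-processing steps are assumptions chosen to show how the same narrow agents can be reused across workflows, and how one failing brick need not topple the tower.

```python
from typing import Callable

class Agent:
    """One 'brick': a narrow, clearly defined responsibility."""
    def __init__(self, name: str, run: Callable[[str], str]):
        self.name = name
        self.run = run

class Workflow:
    """A 'tower' built from reusable bricks, with graceful degradation."""
    def __init__(self, steps: list[Agent]):
        self.steps = steps

    def execute(self, text: str) -> str:
        for agent in self.steps:
            try:
                text = agent.run(text)
            except Exception:
                # One brick failing does not topple the tower:
                # skip the step and let the rest compensate.
                continue
        return text

# The same agents can be reused across different workflows
# ("multiple towers from the same matter").
strip = Agent("strip", lambda s: s.strip())
lowercase = Agent("lowercase", lambda s: s.lower())
shout = Agent("shout", lambda s: s.upper())

clean = Workflow([strip, lowercase])
loud = Workflow([strip, shout])

print(clean.execute("  Hello World  "))  # → hello world
print(loud.execute("  Hello World  "))   # → HELLO WORLD
```

The point is not the toy steps but the shape: each agent is small enough to test, replace, or reuse on its own.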

Task Decomposition Beats Brute Force

Under that architectural argument is a core machine learning idea: the no free lunch theorem. No single model can solve every problem optimally. Every model involves trade-offs.
The industry’s current response is to make the trade-off as large as possible: add more data, more parameters, more compute, and hope the model becomes “good enough” across a wide range of tasks.

Johannes sees a smarter route: task decomposition.
Take a complex problem and break it into the smallest possible operations. Then assign each operation to an agent designed for that specific job, whether through prompting, a smaller specialized model, or fine-tuning. Once those smaller parts are handled well, combine the results.
“You are just a summation agent. You only add numbers that the user gives you,” he says, using an intentionally simple example.

That is a task an LLM can handle reliably. Solving a huge system of differential equations in one step is a different matter. But break that system down into additions, multiplications, substitutions, and checks, and a modest model suddenly becomes much more useful.

Johannes compares this to enzymes replicating DNA. For years, scientists wondered how the error rate could be so low. The answer was not one perfect enzyme. It was cooperation. A small system of enzymes worked together, iterating, checking, and correcting one another. No single enzyme was flawless. The system was effective because it was collaborative.
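To make the decomposition idea concrete, here is a toy sketch in plain Python. The "agents" are ordinary functions (their names and scopes are invented for illustration): one that only multiplies, one that only sums, and one that only checks, cooperating on a task none of them could do alone, much like the proofreading enzymes.

```python
def multiply_agent(a: float, b: float) -> float:
    """Narrow agent: multiplies exactly two numbers."""
    return a * b

def summation_agent(numbers: list[float]) -> float:
    """'You are just a summation agent. You only add numbers.'"""
    return sum(numbers)

def check_agent(value: float, reference: float, tol: float = 1e-9) -> bool:
    """A checking agent, like a proofreading enzyme: verify the result."""
    return abs(value - reference) < tol

# Decompose a dot product into multiplications, a summation, and a check.
x, w = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
partials = [multiply_agent(a, b) for a, b in zip(x, w)]
result = summation_agent(partials)
assert check_agent(result, 32.0)
print(result)  # → 32.0
```

Each step stays inside what a small model (or a simple function) can do reliably; the intelligence lives in how the steps are composed and checked.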

“None of us as an individual person is capable of doing anything,” Johannes says, “but we can train to be specialists in certain domains, and we can bring together multiple specialists in a team, and then we can build crazy things.”
That is the philosophy he wants applied to AI.

Knowledge Belongs Outside the Model

This is where Johannes’s thinking moves furthest away from mainstream AI practice.
The dominant approach treats knowledge as something that should be trained into a model’s weights. Johannes argues for the opposite. As much cognition as possible should live outside the model, in the structure of the system itself.

His own AI setup is built around a wiki: a graph of human-readable markdown pages, each linked to others. Each page can be associated with an agent that gets triggered when the page is accessed. The wiki lives in a Git repository, which means it is version-controlled, branchable, and available from anywhere.
The structure is flexible. If Johannes wants to change the ontology, he does not retrain a model. He adds a link, edits a page, or reorganizes the graph.
That changes the maintenance model completely.
When something breaks, you fix the relevant wiki page. When you find a prompt that works, you ask the system to improve and store it for next time. The system learns through its knowledge layer, not by updating opaque model weights. It becomes a form of online learning without backpropagation, and debugging without a full training pipeline.

“You can never look into the weights of the big large language model and manipulate there,” Johannes points out. “You would need to put on the whole machinery again, the training pipeline, add some more data and retrain. All of those steps you don’t even need when you have this knowledge being put into the system in a human-readable way.”
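A minimal sketch can show the shape of such a wiki-backed knowledge layer. The details below are assumptions for illustration, not Johannes's actual setup: markdown pages on disk, a registry mapping page names to agent functions, and a hook that fires the agent whenever a page is read.

```python
from pathlib import Path
import tempfile

AGENTS = {}      # page name -> agent function triggered on access
ACCESS_LOG = []  # records what the agents did, for the demo below

def on_access(page_name: str):
    """Associate an agent with a wiki page."""
    def decorator(fn):
        AGENTS[page_name] = fn
        return fn
    return decorator

def read_page(wiki_dir: Path, page_name: str) -> str:
    """Read a markdown page; trigger its agent if one is registered."""
    text = (wiki_dir / f"{page_name}.md").read_text()
    if page_name in AGENTS:
        AGENTS[page_name](text)
    return text

@on_access("pricing")
def check_pricing(text: str) -> None:
    # Fixing the system means editing this page, not retraining weights.
    ACCESS_LOG.append("pricing agent ran")

# Demo with a throwaway wiki directory (a real one would be a Git repo,
# so every edit is a version-controlled, branchable commit).
wiki = Path(tempfile.mkdtemp())
(wiki / "pricing.md").write_text("# Pricing\nSee [[materials]].\n")
page = read_page(wiki, "pricing")
print(page.splitlines()[0])  # → # Pricing
print(ACCESS_LOG)            # → ['pricing agent ran']
```

Because the knowledge lives in plain files, "updating the model" is just editing a page and committing, which is the whole maintenance argument in miniature.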

There is another benefit: visibility.
Johannes remembers building his first wiki at fourteen. Years later, he noticed that he had unintentionally created two parallel subnetworks describing the same concept in different language. Because the knowledge was stored as a graph, the duplication became visible. Merging the two was simple.
That kind of cleanup is nearly impossible inside a 70-billion-parameter model.
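A rough sketch shows why graph-shaped knowledge makes this kind of duplication visible: two pages that link to mostly the same neighbours probably describe the same concept. The page graph and the similarity threshold below are invented for illustration.

```python
def link_overlap(links_a: list[str], links_b: list[str]) -> float:
    """Jaccard similarity of two pages' outbound links."""
    a, b = set(links_a), set(links_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# A tiny wiki graph: page -> outbound links. Two pages describe the
# same concept in different language, a third is unrelated.
wiki = {
    "neural-nets": ["backprop", "gradients", "layers"],
    "neuronal-networks": ["backprop", "gradients", "neurons"],
    "carpentry": ["wood", "tools"],
}

duplicates = []
pages = list(wiki)
for i, p in enumerate(pages):
    for q in pages[i + 1:]:
        if link_overlap(wiki[p], wiki[q]) > 0.4:
            duplicates.append((p, q))  # merge candidates

print(duplicates)  # → [('neural-nets', 'neuronal-networks')]
```

Inside a monolithic model the same redundancy would be smeared across billions of weights; here it surfaces as two nodes with nearly identical neighbourhoods, and the fix is a merge.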

The Real AI Democratization Story Is About Carpenters

Most discussions about “AI for everyone” focus on consumer chatbots. Johannes thinks the more important story is happening somewhere else: small businesses.
Carpenters. Electricians. Independent tradespeople.

These are people who often started their careers because they loved the actual work — the material, the tools, the customer site, the practical problem-solving. But as their businesses grow, they spend more and more time behind a desk writing quotes, preparing offers, updating documents, and handling repetitive administrative work.
That is exactly the kind of work small generative AI systems can help with.
A locally runnable model, connected to a knowledge base of past quotes, materials, pricing, and customer details, could give a carpenter hours back each week. It would not need to be a frontier model. It would need to be useful, reliable, and adapted to the way that business actually works.
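A hedged sketch of the retrieval half of such a system: before drafting a new quote, pull the most similar past quote as a starting point. The quote data and the word-overlap scoring are invented for illustration; a real setup would pair a retrieval step like this with a locally runnable language model.

```python
def score(query_words: list[str], quote_text: str) -> int:
    """Count how many query words appear in a past quote."""
    words = quote_text.lower().split()
    return sum(w in words for w in query_words)

# A tiny knowledge base of past quotes (invented example data).
past_quotes = [
    "oak bookshelf 2m custom stain 850 EUR",
    "kitchen cabinet doors replacement 1200 EUR",
    "oak dining table repair 300 EUR",
]

query = "oak bookshelf".lower().split()
best = max(past_quotes, key=lambda q: score(query, q))
print(best)  # the most similar past quote seeds the new draft
```

Nothing here needs a frontier model or a cloud; the value comes from the business's own history being searchable at the moment a new quote is written.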

So why has this not happened at scale?
Johannes is direct about the obstacle: “Time.”
Large consultancies are built around large contracts. If the same engineering effort can serve a Fortune 500 company, the small business request usually gets pushed down the list.

The solution, in Johannes’s view, is the same modular approach he applies to AI architecture: build reusable agents, then adapt them vertically for specific trades instead of starting from scratch every time.
He is also watching a hopeful countertrend. The children of electricians, carpenters, and other tradespeople are starting to build these systems themselves. The tools are becoming easier to use, the barrier to entry is dropping, and entrepreneurs are beginning to fill the gap.

Physical AI and the End of the Cloud-Only Era

Small models also matter because of physical AI.
Johannes points out that language models are surprisingly good at reasoning about the physical world from text alone. But there is a hard limit. They cannot sense the world directly. They cannot read from local sensors. They cannot sample reality as it changes. They exist inside language.

Put smaller, capable models onto local devices, and that limit begins to shift.
A dog with a smart collar. A fridge with its own small agent. A robot vacuum coordinating with a smart speaker and a calendar. Each device can run a narrow embedded model, observe its local environment, act where it is, and coordinate with nearby systems when needed.

In that world, intelligence is not trapped in a centralized cloud. It is distributed across the spaces people already live in.
The household becomes a governance layer, much like a family or small team already functions. Devices can coordinate locally, share context when useful, and remain closer to the people who own them.

This is also where Johannes’s vision pushes back hardest against a centralized AI future. If intelligence lives across thousands of small devices that people own and control, it becomes much harder for all AI capability to be concentrated in the hands of a few companies.

The Real Bottleneck Is Compute Speed

When asked what single obstacle he would remove with one engineering breakthrough, Johannes did not choose a missing model capability or an unsolved scientific problem.

He chose raw compute speed.
“If you can just run 10,000 queries to your local LLM within five seconds, then the playground becomes very nice.”
His reasoning is simple but powerful. With enough speed, you do not always need elegant algorithms. You can brute-force experimentation. You can mutate prompts, test variations, observe what improves the result, and repeat the process thousands of times.

“As if as a kid you just throw your ball 10,000 times to this wall and want to hit this one spot,” he says. “Eventually you maybe learn it.”
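The ball-against-the-wall loop can be written down directly. This is a toy hill-climber, not a real prompt optimizer: the "prompt" is just a string, and the fitness function is a stand-in for whatever would actually score an LLM's output. What matters is the shape of the loop, which only becomes practical when local inference is fast enough to run it thousands of times.

```python
import random

random.seed(0)
TARGET = "summarize in one sentence"  # stand-in for a good prompt
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def fitness(candidate: str) -> int:
    """Stand-in scorer: characters matching the target prompt."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate: str) -> str:
    """Change one random character: one more throw at the wall."""
    i = random.randrange(len(candidate))
    return candidate[:i] + random.choice(ALPHABET) + candidate[i + 1:]

prompt = "x" * len(TARGET)
for _ in range(10_000):  # cheap only if local queries are fast
    trial = mutate(prompt)
    if fitness(trial) >= fitness(prompt):  # keep improvements
        prompt = trial

print(prompt)
```

Swap the toy scorer for a real evaluation of model output and the same loop becomes the brute-force experimentation Johannes describes: mutate, test, keep what improves, repeat.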

That is why he is watching alternative compute architectures closely: optical computers, neuromorphic chips, analog systems, and even fungal computing. More silicon can still help, but the improvements are incremental. A real shift in how computation happens could change the field much more dramatically.

The Future of AI Isn’t Bigger; It’s Smarter, Smaller, and Specialized

What connects all of Johannes’s ideas is a single intuition: intelligence works better when it is distributed, modular, inspectable, and embedded in structure rather than locked inside opaque model weights.
That idea shows up everywhere in his thinking.

It shapes his architecture: agents over monoliths. It shapes his knowledge philosophy: wikis over weights. It shapes his view of access: small businesses, not just enterprise clients. And it shapes his prediction for physical AI: local devices, not only massive data centers.
This is not the dominant view in the industry today. The dominant view is still to scale everything up.
But Johannes is making a serious case for a different future: many small models, governed locally, working over human-readable knowledge, and cooperating inside modular systems.

That future may not win because it sounds more elegant. It may win because it is cheaper, more resilient, easier to inspect, and more useful for everyone who is not a Fortune 500 company.

And that makes it worth paying attention to.

Watch the full interview on our YouTube channel.
