The Limits of AI Thinking Are Closer Than You Think - and That's Your Advantage
By Aleksei Zulin
The most dangerous assumption in the age of AI is that the machine is smarter than you. It's not. Not in the ways that matter when the world gets genuinely strange.
I want to make a stronger claim than that. Not only does AI thinking have hard limits - architectural, cognitive, philosophical - but those limits are precisely where you should be building your edge. Every professional I've seen lose ground to AI automation did so because they were competing on AI's turf. The people who've accelerated? They found the seams.
Let me show you where those seams are.
AI Doesn't Reason. It Completes.
Gary Marcus and Ernest Davis spent years documenting what happens when language models meet the real world. Their conclusion, detailed in Rebooting AI, is uncomfortable for anyone who's watched GPT-4 write a legal brief or pass a bar exam: these systems are extraordinarily sophisticated pattern matchers, and pattern matching is not reasoning.
More precisely - and this is where the argument becomes hard to dismiss - Judea Pearl, Turing Award winner and father of modern causal inference, describes a three-rung ladder of causation in The Book of Why. The first rung is association (what tends to go with what). The second is intervention (what happens if I do this). The third is counterfactual (what would have happened if). Current large language models, Pearl argues, live almost entirely on rung one. They've been trained on human outputs that reflect causal thinking, but the models themselves hold no internal causal model of the world.
This matters practically. Ask an LLM to predict what happens when you remove a variable from a system it's never seen, and it will hallucinate confidently. No mechanism exists for genuine intervention logic - only for producing text that sounds like intervention logic.
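To make the gap between rung one and rung two concrete, here's a minimal simulation - my own toy illustration, not Pearl's code, with a made-up structural model and probabilities chosen only so the two numbers diverge visibly. Conditioning on observed data gives one answer; forcing the variable (Pearl's do-operator) gives another.

```python
import random

random.seed(0)

def simulate(n=100_000, intervene_x=None):
    """Toy structural causal model: Z -> X, Z -> Y, X -> Y.
    If intervene_x is set, X is forced to that value (Pearl's do-operator),
    which cuts the Z -> X edge; otherwise X follows its natural mechanism."""
    rows = []
    for _ in range(n):
        z = random.random() < 0.5                      # hidden confounder
        if intervene_x is None:
            x = random.random() < (0.8 if z else 0.2)  # Z pushes X around
        else:
            x = intervene_x                            # do(X = intervene_x)
        p_y = (0.9 if z else 0.4) if x else (0.7 if z else 0.2)
        y = random.random() < p_y
        rows.append((z, x, y))
    return rows

# Rung one: association. P(Y=1 | X=1) read off observational data.
obs = simulate()
p_seen = sum(y for _, x, y in obs if x) / sum(1 for _, x, _ in obs if x)

# Rung two: intervention. P(Y=1 | do(X=1)) from actually forcing X.
exp = simulate(intervene_x=True)
p_forced = sum(y for _, _, y in exp) / len(exp)

print(f"P(Y=1 | X=1)     ~ {p_seen:.2f}")    # ~0.80, inflated by the confounder
print(f"P(Y=1 | do(X=1)) ~ {p_forced:.2f}")  # ~0.65, the true causal effect
```

A system that only ever sees the observational rows has no way to recover the second number unless someone hands it the causal structure.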
The implication is sharper than most people want to sit with: if the model lives on rung one, then any task that genuinely requires intervention or counterfactual reasoning is a task where its confident output is unreliable by construction. (I should be honest that there's a version of this argument that cuts against my thesis, and I haven't fully resolved it.)
Yann LeCun at Meta has been arguing for years that genuine machine intelligence requires world models - internal simulations that let a system imagine consequences before acting. Today's frontier models lack these. They lack embodied experience. They have never dropped something and felt it land.
Where the Architecture Breaks
Scale is not destiny. The assumption driving most AI investment between 2020 and 2024 was that more compute, more data, and more parameters would eventually produce something like understanding. The scaling laws promised smooth, predictable gains. Those gains have flattened.
Researchers at Epoch AI published analysis in 2024 showing that the era of predictable capability jumps from scale alone may be ending. We're not hitting the ceiling of what's possible - we're hitting the ceiling of what's achievable through next-token prediction at scale, and those are different ceilings.
Attention mechanisms, the core innovation inside transformer models, have a quadratic complexity problem. Context windows can be extended but not indefinitely, and even within a long context, models degrade in their ability to use information from earlier sections. This is sometimes called the "lost in the middle" problem - documented by researchers at Stanford and UC Berkeley. If you've ever had a long conversation with an LLM and noticed it forgetting what you said three pages ago, you've experienced this architecture live.
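If you'd rather see the quadratic cost than take it on faith, here's a toy sketch - a single attention head, a tiny embedding width, all numbers illustrative rather than representative of any production model - showing how the score matrix balloons as the context doubles.

```python
import numpy as np

d_model = 64  # embedding width; real models use far larger values

def attention_scores(n_tokens):
    """Single-head scaled dot-product attention over n_tokens positions.
    The score matrix Q @ K.T has shape (n_tokens, n_tokens), so the memory
    and compute it demands grow with the square of the context length."""
    q = np.random.randn(n_tokens, d_model)
    k = np.random.randn(n_tokens, d_model)
    return q @ k.T / np.sqrt(d_model)

for n in (500, 1_000, 2_000, 4_000):
    scores = attention_scores(n)
    print(f"{n:>5} tokens -> score matrix {scores.shape}, {scores.nbytes / 1e6:>5.0f} MB")
# Doubling the context quadruples the score matrix: 2 MB, 8 MB, 32 MB, 128 MB.
```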
Then there's hallucination. Hallucination is a feature of the architecture, not a bug awaiting a patch. Because these models optimize for plausible-sounding text, they produce confident nonsense in exactly the situations where you most need accuracy - novel domains, edge cases, the outer boundary of the training distribution. Melanie Mitchell at the Santa Fe Institute has been particularly rigorous in documenting how fragile analogical reasoning becomes the moment you push outside familiar territory.
You. Cannot. Fix this by asking nicely.
The Cognitive Gap You Can Actually Exploit
Here's where the article pivots from diagnosis to action - though I'll warn you the actions aren't the productivity hacks you might be expecting.
Karl Friston's work on active inference suggests that biological brains are fundamentally different from current AI in one critical way: we operate as prediction machines that update on surprise. We seek out novelty not just to learn but because the felt experience of being wrong reconfigures our internal model. Humans have metacognition - we know when we don't know, and that discomfort drives inquiry. LLMs have none of this. They have no felt uncertainty.
This means the first concrete edge you can cultivate is comfort with productive confusion. When you hit a problem that makes you genuinely uncertain, that's not a moment to outsource to an LLM for a quick answer. The confusion is doing cognitive work. Sitting with it, mapping its edges, asking why you're confused - this is exactly what a language model cannot do on your behalf. Outsourcing the discomfort means outsourcing the learning.
Genuine interdisciplinary transfer is the second edge. Not the superficial kind where you prompt an LLM to "think like a biologist." Real interdisciplinary thinking happens when you've spent enough time in two different fields that patterns start firing across domains without conscious effort. The philosopher Daniel Dennett called such thinking tools "intuition pumps" - mental instruments you develop only through years of varied reading, conversation, and friction with ideas that don't fit together neatly. No prompt engineer has built a shortcut here.
Adversarial fluency is the third edge, and the one most immediately actionable. Most people use LLMs as oracles. The people extracting the most value from these tools use them as sparring partners - they ask the model to argue against their position, to find the flaw in their logic, to generate the strongest counterexample. Then they evaluate the counterexample using their own judgment. The model generates options. The human adjudicates. That division of labor works because it plays to both sides' strengths.
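Here's one shape the sparring-partner loop can take - a rough sketch, where `call_llm` is a placeholder for whichever chat client you actually use, and the prompt wording is mine rather than a recipe.

```python
# `call_llm` is a stand-in, not a real library function: wire it to whatever
# chat-completion client you actually use.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("connect this to your model provider of choice")

def spar(claim: str, rounds: int = 3) -> list[str]:
    """Ask the model to attack a claim rather than confirm it.
    The model generates objections; deciding which ones hold is your job."""
    objections: list[str] = []
    for _ in range(rounds):
        already_heard = "\n".join(f"- {o}" for o in objections) or "- (none yet)"
        prompt = (
            f"Here is a claim I currently believe:\n\n{claim}\n\n"
            f"Objections I have already heard:\n{already_heard}\n\n"
            f"Give me the single strongest objection I have NOT already heard."
        )
        objections.append(call_llm(prompt))
    return objections

# Usage, once call_llm is wired up:
# for objection in spar("Remote teams ship faster than co-located ones"):
#     print(objection)   # you adjudicate; the model only generates
```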
What Human Intuition Actually Is
Intuition gets a bad rap in rationalist circles. Wrong move.
When Gary Klein studied expert decision-making in firefighters and military commanders, he found that experienced humans don't run through deliberate cost-benefit calculations in high-stakes situations. They pattern-match to previous experience and then run a rapid mental simulation - will this work? The simulation is fast, embodied, and draws on thousands of episodes of real-world feedback. This is what Kahneman called System 1, but Klein's contribution was showing it's not irrational - it's compressed expertise operating below the threshold of conscious deliberation.
LLMs have statistical regularities from text. Humans have compressed expertise from lived consequence. These are not equivalent. When a surgeon has a bad feeling about a patient's trajectory and orders another scan that turns out to be decisive, that feeling is real information. It cannot be replicated by a model trained on deidentified records.
The practical implication - and this is the part that doesn't get said enough - is that you should be investing in your intuition. Not trusting it blindly, but building it through deliberate exposure to feedback loops. Make predictions. Track them. Put yourself in situations where you'll be wrong and find out quickly. The AI is getting better at everything that can be learned from static data. Your edge lives in everything that requires being wrong in real time and updating on the consequences.
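A prediction-tracking habit doesn't need more than a flat file and a few helper functions. This is a minimal sketch - the filename, the field names, and the Brier-score review are my own suggestions, not a prescription.

```python
import datetime
import json
import pathlib

JOURNAL = pathlib.Path("judgment_journal.jsonl")  # filename is just a suggestion

def log_prediction(claim: str, probability: float) -> None:
    """Record a forecast before you know the outcome."""
    entry = {"date": datetime.date.today().isoformat(),
             "claim": claim, "p": probability, "outcome": None}
    with JOURNAL.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def resolve(claim: str, came_true: bool) -> None:
    """Mark a prediction once reality has delivered its verdict."""
    entries = [json.loads(line) for line in JOURNAL.open()]
    for e in entries:
        if e["claim"] == claim and e["outcome"] is None:
            e["outcome"] = int(came_true)
    JOURNAL.write_text("".join(json.dumps(e) + "\n" for e in entries))

def review() -> float | None:
    """Periodic review: Brier score. 0.0 is perfect; 0.25 is coin-flip guessing."""
    entries = [json.loads(line) for line in JOURNAL.open()]
    scored = [e for e in entries if e["outcome"] is not None]
    if not scored:
        return None
    return sum((e["p"] - e["outcome"]) ** 2 for e in scored) / len(scored)

log_prediction("Our Q3 launch slips by at least two weeks", 0.7)
```

The number worth watching is the Brier score trend over months, not any single prediction.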
The Hybrid Mind Isn't a Metaphor
Researchers building next-generation human-AI collaboration systems don't think in terms of tools. They think in terms of cognitive partnership - and the distinction compounds over years.
A tool does what you tell it. A cognitive partner pushes back. The best use of an LLM isn't to generate the answer - it's to stress-test your thinking before you commit to it. Feed your half-formed hypothesis to a model, ask it to falsify your reasoning, then bring your own judgment to bear on whether the falsification actually holds. This workflow requires something the model can't supply: knowing when the model's objection is trivially wrong versus genuinely dangerous to your argument.
That calibration is a skill. It develops through practice. And the people who build it early will have a structural advantage that compounds over time - not because they're using better AI, but because they've developed better judgment about when to trust it and when to override it.
The ceiling on AI thinking is real. The ceiling on your thinking, if you build it against that edge, is somewhere else entirely.
FAQ
Can AI ever truly overcome its reasoning limits with better architectures?
Possibly. Neurosymbolic hybrids and LeCun's proposed world model architectures may eventually close the causal reasoning gap. But "true reasoning" remains philosophically contested, and even optimistic timelines don't eliminate the window - likely years, maybe a decade - where cultivating human judgment delivers compounding returns that no product cycle can erase.
What's the fastest practical way to start thinking beyond AI's capabilities today?
Start a judgment journal: write your prediction, record the outcome, review monthly. This builds feedback-grounded intuition that LLMs cannot replicate because they have no felt consequences. Pair it with adversarial prompting - use AI to attack your ideas rather than confirm them, then evaluate those attacks with your own critical judgment.
About the Author
Aleksei Zulin is the author of The Last Skill, a book on how to think with AI as a cognitive partner rather than use it as a tool. Systems engineer turned writer exploring the frontier of human-AI collaboration.