How to Shift Perspectives on AI Capabilities: The "AI Can't Yet" Trap
By Aleksei Zulin
A client of mine - a senior product manager at a mid-sized logistics company - told me last spring that she'd stopped exploring AI for her core operations. Her reasoning: "It can't handle the nuance of our routing decisions. The edge cases kill it." I nodded, asked a few questions, then showed her what a current reasoning model did with three of her "impossible" edge cases. She was quiet for a long moment. Then: "Okay, but it still can't do the contract interpretation piece."
She was right. That day.
Two months later, it could.
The "AI can't yet" framing feels like epistemic humility. It acknowledges reality. It resists hype. But here's what it actually does: it freezes your mental model at the moment of last contact and treats that snapshot as a reliable forecast. My client wasn't being careful - she was anchored. And anchoring, in the context of a technology changing this fast, is not a neutral error. It has real costs: forgone pilots, delayed investments, teams stuck on manual workflows that could be partially automated right now.
The Cognitive Architecture of "Can't Yet"
Psychologists call it present bias when applied to time, but there's a specific flavor that shows up in technology assessment. We anchor to the last failure we witnessed. We generalize from a specific gap to a category-level verdict. And crucially, we apply human timescales - years, decades - to systems improving on weekly release cycles.
Gary Marcus has been a useful intellectual foil here. His critiques of large language models, from symbol grounding to systematic generalization, were technically precise and, for their moment, defensible. The problem isn't that his criticisms were wrong - some hit real targets. The problem is that criticisms pointing at architectural limitations have a shorter shelf life than they used to. What Marcus identified as a structural flaw in 2022 reasoning capabilities was, in significant part, addressed by chain-of-thought prompting, reinforcement learning from human feedback, and the reasoning models that followed.
Rodney Brooks, who spent decades building robots at MIT and iRobot, made a career of pointing out what embodied AI still lacked. He was usually right about the present. His error - one he acknowledged in some form - was assuming the rate of progress in physical manipulation and spatial reasoning would stay as slow as it had been. It didn't.
The cognitive trap has two components. First: we remember failure more vividly than improvement. Second: we underestimate how many different teams, labs, and techniques are being thrown at any given problem simultaneously. The parallelism of AI research is relentless in a way that's genuinely hard to hold in mind. A single benchmark can be under assault by dozens of independent groups at once, each attacking it from a different angle - different architectures, different training regimes, different prompting strategies. The result is that progress often looks sudden from the outside, even when it's the accumulated outcome of hundreds of concurrent efforts.
What History Actually Shows
Chess fell in 1997. Fine, people said - chess is finite. Go fell in 2016; AlphaGo's victory over Lee Sedol was described by many experts as coming "a decade early." Protein structure prediction was considered one of biology's grand unsolved challenges; in 2020, Demis Hassabis and the AlphaFold team solved it. Reading comprehension benchmarks like SQuAD were supposed to demonstrate how far machines lagged behind human understanding - models surpassed human baselines on SQuAD 2.0 in 2019 and promptly exposed that the benchmark was too narrow.
Each transition followed a similar pattern. Years of slow progress. Confident expert claims that the remaining gap was qualitative, not quantitative. Then a jump - enabled by scale, new training techniques, or better data - that crossed the threshold faster than forecasters expected.
Richard Sutton's 2019 essay "The Bitter Lesson" identified the underlying mechanism. Researchers who built in human knowledge and hard-coded structure consistently lost, over the long run, to researchers who scaled compute and let models learn. The lesson is bitter because it means our intuitions about what's hard are often wrong. We look at a task, identify the complexity, assume it requires human-like reasoning, and underestimate what happens when you throw sufficient gradient descent at the problem.
The Kaplan et al. scaling laws paper from OpenAI (2020) gave this a mathematical form. Performance improves predictably - according to power laws - with increases in model size, data, and compute. There are diminishing returns within any given paradigm. But paradigms shift. And when they do, the "can't yet" claims built on the old paradigm become artifacts. This has happened often enough now that it should register as a pattern rather than a series of surprises. It hasn't, for most people. That gap between historical evidence and updated priors is where a lot of strategic mistakes get made.
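The shape of that power law is worth seeing concretely. A minimal sketch of the parameter-count relationship, using constants close to those reported by Kaplan et al. for illustration only - this is the general form, not a reproduction of the paper's full analysis, which fits separate laws for data and compute as well:

```python
# Illustrative power-law scaling curve: loss falls predictably with model size.
# Constants roughly follow the parameter-count law in Kaplan et al. (2020);
# they are used here only to show the shape, not for real forecasting.

N_C = 8.8e13   # critical scale constant (approximate)
ALPHA = 0.076  # scaling exponent (approximate)

def predicted_loss(n_params: float) -> float:
    """Predicted loss under a simple power law: L(N) = (N_C / N) ** ALPHA."""
    return (N_C / n_params) ** ALPHA

# Each 10x increase in parameters buys a predictable, diminishing improvement.
for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")
```

The point of the curve is exactly the article's point: within the paradigm, improvement is smooth and forecastable; the surprises come from outside the curve.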
A Framework for Updating Your Priors
Stop asking "can it do this?" Start asking "what would need to be true for it to do this in 18 months, and is any of that currently being worked on?"
That reframe sounds simple. The actual practice requires something harder - you have to track the research frontier at least loosely, because otherwise you're updating priors without data.
Benchmark tracking matters more than most practitioners realize. Organizations like Epoch AI and METR publish capability evaluations far more granular than press releases. When you see a benchmark jump from 60% to 85% accuracy in six months, you're looking at a capability trajectory. Trajectories are what matter.
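Even a crude trend fit makes the trajectory explicit. A sketch in pure Python, using hypothetical scores (the data and function are illustrative, not drawn from any real benchmark) - and note the caveat in the comments: accuracy saturates, so a naive linear extrapolation is a floor-level sanity check, not a forecast:

```python
# Naive linear trend over benchmark scores (hypothetical data).
# A real analysis would use more points and a saturating model (e.g. logistic),
# since accuracy cannot exceed 100%.

def fit_slope(points):
    """Least-squares slope for (month, score) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

scores = [(0, 60.0), (2, 68.0), (4, 77.0), (6, 85.0)]  # (month, % accuracy)
slope = fit_slope(scores)
projection = min(100.0, scores[-1][1] + 6 * slope)  # clamp at saturation
print(f"~{slope:.1f} points/month; naive 6-month-ahead projection: {projection:.0f}%")
```

The exercise takes ten minutes per quarter. The question it answers - "is this capability still improving, and how fast?" - is the one "AI can't yet" never asks.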
Counterfactual framing is another tool. When you say "AI can't handle X," ask yourself what evidence would change your view. If you can't name it, the belief isn't really an empirical claim - it's closer to an aesthetic preference. This is worth sitting with. A lot of "AI can't" claims, when pressed, turn out to have unfalsifiable cores.
Synthetic data and multimodal architectures are quietly dismantling two of the biggest claimed barriers - data scarcity in specialized domains and the gap between language and perception. Models trained on synthetic reasoning traces are already outperforming models trained purely on human-generated data in certain mathematical domains. The implications for medicine, law, and engineering - where labeled data is scarce and expertise is expensive - haven't fully landed in most organizations' planning cycles yet.
(I want to say something here about how this isn't just about being an optimist. It's about being calibrated. But maybe those are the same thing, past a certain threshold of evidence. I'm genuinely not sure.)
The Practical Posture
Demis Hassabis has described the goal at DeepMind as building systems that can do science. Not assist with science. Do it. That framing would have sounded grandiose in 2018. By 2025, AlphaFold 3 was predicting molecular interactions across biological macromolecules, and the lab was publishing results in fundamental physics using AI-guided discovery. The goalposts didn't move - the game changed.
For practitioners deciding whether to invest in AI tools or build systems that incorporate AI, the posture looks something like this: treat every "AI can't" claim as having a six-month expiration date by default. Reserve longer timelines for things requiring genuinely new physics or economics, not just better algorithms and more compute. Test quarterly. Document what changed. Build organizational muscle for adaptation rather than for any specific current capability.
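One lightweight way to operationalize the six-month default and the "document what changed" habit is to log each "AI can't" claim with an explicit expiration date. A sketch - the structure and field names here are my own invention, not an established practice:

```python
# Minimal "capability claim" log with a default six-month expiration.
# The schema is illustrative; adapt fields to your own review process.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class CapabilityClaim:
    claim: str                    # e.g. "AI can't do contract interpretation"
    evidence: str                 # what was tested, and what failed
    recorded: date
    review_after_days: int = 180  # six-month expiration by default

    @property
    def review_due(self) -> date:
        return self.recorded + timedelta(days=self.review_after_days)

    def is_stale(self, today: date) -> bool:
        """True once the claim has outlived its expiration date."""
        return today >= self.review_due

claim = CapabilityClaim(
    claim="Model can't parse our routing edge cases",
    evidence="Failed 3/3 edge cases in April pilot",
    recorded=date(2025, 4, 1),
)
print(claim.review_due, claim.is_stale(date(2025, 11, 1)))
```

A spreadsheet does the same job. What matters is that every negative verdict carries a date after which it is no longer admissible as evidence.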
Humans are reasonably good at tracking linear improvements and reasonably bad at tracking exponential ones. We've known this since Kurzweil and before. Knowing about a cognitive bias doesn't automatically correct for it - you have to design around it. Scheduled re-evaluation. Explicit assumptions. Peer checks with people closer to the research frontier than you are.
One structural approach that helps: identify a small number of people in your network who read primary research, not just tech journalism, and talk to them regularly. Not to outsource your judgment, but to get a continuous feed of ground-truth updates that can disrupt comfortable assumptions. The best "AI can't yet" corrections come not from a single dramatic demo but from a drip of incremental evidence that eventually forces recalibration.
My client, the logistics manager, is now running a pilot on the contract interpretation piece. Won't work perfectly. Probably works well enough to restructure how her team allocates time. That gap - between "works perfectly" and "works well enough to matter" - is where most of the real opportunity lives, and it's precisely the gap that "AI can't yet" thinking makes invisible.
FAQ
Why do smart, technically literate people still underestimate AI progress?
Expertise creates anchoring. The more deeply you understand a domain's complexity, the more clearly you see what AI is missing - and the less you naturally track incremental capability improvements. Technical fluency can sharpen perception of current gaps while blurring the trajectory of change. Following benchmark data, not just product announcements, helps correct for this.
How often should I actually re-evaluate what AI can and can't do?
Quarterly is a reasonable default for practitioners, with immediate re-evaluation when you see a significant public benchmark shift in your domain of interest. Annual reviews are too slow for the current pace of capability development. The question isn't whether your assessment will become outdated - it will. The question is how quickly you'll notice.
What's the most common mistake organizations make when assessing AI limitations?
Conflating a single failed test with a durable capability ceiling. One bad demo, one unsatisfying pilot, one awkward output - and teams conclude "not ready." The right response is to document what failed, revisit in three to six months with a current model, and be explicit about what would constitute a pass. Structured failure analysis is very different from a blanket verdict.
About the Author
Aleksei Zulin is the author of The Last Skill, a book on how to think with AI as a cognitive partner rather than use it as a tool. Systems engineer turned writer exploring the frontier of human-AI collaboration.