Why Involve Non-Technologists in Debates on AI Thinking?

Are you wondering whether the people debating AI's cognitive architecture actually need to include historians, nurses, or philosophers - or whether that's just performative inclusion? The answer is sharper than you might expect.

Non-technologists must be involved in debates on AI thinking because the questions being decided are not engineering questions. They are questions about value, judgment, harm, and what it means to know something - questions that engineers are not trained to answer and that technical framing actively obscures. When AI systems make decisions about creditworthiness, medical diagnosis, or content moderation, the people who understand what those decisions feel like from the inside are rarely in the room where the system is designed. That asymmetry produces machines that are locally coherent and structurally blind.

The short answer: without non-technologists, AI debates optimize for what is buildable rather than what is good.

The Problem Isn't Bias. It's Epistemic Monoculture.

Everyone talks about bias. Fewer people talk about the structural condition that produces it.

When AI labs and standards bodies consist primarily of people with similar educational backgrounds, similar career trajectories, and similar mental models of what a "problem" looks like, the outputs of those debates will carry that shape. Not because of malice. Because of something like what cognitive scientists call the availability heuristic, operating at scale: the ideas that come most easily to mind are the ones drawn from experience you actually have.

Scott Page, a complexity theorist at the University of Michigan, has spent decades studying how groups with diverse perspectives - not diversity of identity as a checkbox, but genuine diversity of cognitive toolkit - can outperform homogeneous groups of high-ability individuals on complex problem-solving tasks. His 2007 book The Difference provides formal models showing that, under stated conditions, diverse problem-solving teams find solutions that groups of like-minded specialists miss. The mechanism matters here: it is not that outsiders "bring heart" while insiders "bring rigor." Outsiders bring different heuristics, different failure modes, different definitions of what counts as a solved problem.
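One of the results Page presents, usually called the diversity prediction theorem, is simple enough to check by hand: the squared error of a group's average guess equals the average individual squared error minus the spread of the guesses around the group average. The sketch below verifies that identity on made-up numbers. The function name and the four guesses are illustrative assumptions, not data from the book.

```python
from statistics import mean


def diversity_decomposition(predictions, truth):
    """Return (collective_error, average_error, diversity), all as squared errors."""
    crowd = mean(predictions)  # the group's collective guess
    collective_error = (crowd - truth) ** 2
    average_error = mean((p - truth) ** 2 for p in predictions)
    diversity = mean((p - crowd) ** 2 for p in predictions)
    return collective_error, average_error, diversity


# Hypothetical guesses from four people about a quantity whose true value is 50.
collective, average, diversity = diversity_decomposition([40, 45, 60, 65], truth=50)

# The identity: collective error = average individual error - diversity.
print(collective, average, diversity)
```

The identity is pure arithmetic and says nothing about where useful diversity comes from. It only shows why, for a fixed level of individual accuracy, a group whose guesses differ more will make a better collective estimate.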

Experimental research points in the same direction. A 2009 study by Katherine Phillips, Katie Liljenquist, and Margaret Neale published in Personality and Social Psychology Bulletin found that groups with socially diverse members - people from different backgrounds and life experiences - performed better on a problem-solving task than homogeneous groups, even though the diverse groups reported feeling less confident. The discomfort of difference, the study suggests, is a cognitive signal worth paying attention to.

Applied to AI thinking debates: a philosopher asking "but what does the system actually represent when it represents a concept?" is not slowing down the engineers. She is raising the question that determines whether the system works at all.

What Happened When Technologists Decided Alone

History is instructive. It is not encouraging.

In 2018, Reuters reported that Amazon had scrapped an internal AI recruiting tool after discovering it systematically downranked women's resumes. The system had been trained on resumes submitted to the company over a ten-year period, most of them from men. The engineers who built the tool were solving a real problem: reduce recruiter workload. They solved it. The tool was fast, consistent, and wrong in ways that required someone outside the engineering function to name.

Virginia Eubanks, a political scientist at the University at Albany, documented a series of similar cases in Automating Inequality (2018). Her research tracked automated decision systems in welfare, child protective services, and criminal justice - systems built by technically competent teams that nonetheless produced outcomes disproportionately harmful to poor and minority communities. Her conclusion was specific and worth quoting directly: "The targeting of poor and working-class people by these systems is not a glitch. It is a design feature, invisible to the designers."

Invisible to the designers. That phrase should stop you.

The designers were not incompetent. They were operating inside a frame that made certain questions unaskable - questions about what the system was for, who bore its costs, and whether efficiency was the right metric in the first place. Those are exactly the questions that non-technologists ask.

A complementary line of evidence comes from public health research. A 2019 study in Science led by Ziad Obermeyer found that a widely used health care algorithm, which relied on health care costs as a proxy for health needs, systematically underestimated the severity of illness in Black patients. Ruha Benjamin, a sociologist at Princeton University and the author of Race After Technology (2019), discussed the finding in a commentary published alongside it. The proxy was technically defensible. The outcome was discriminatory. The error was a conceptual one - confusing cost with need - of the kind social scientists are trained to name.
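The cost-versus-need confusion is easy to see in miniature. The sketch below ranks four invented patients twice, once by recorded spending and once by a stand-in for actual need, and picks the top two for extra care each time. Every name and number here is hypothetical and chosen only to make the mechanism visible. None of it comes from the research described above.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    name: str
    group: str
    chronic_conditions: int  # hypothetical stand-in for true health need
    annual_cost: float       # what a cost-based proxy actually ranks on


def top_k(patients, key, k):
    """Return the names of the k patients with the highest value of `key`."""
    ranked = sorted(patients, key=key, reverse=True)
    return [p.name for p in ranked[:k]]


# Toy data: group B patients are sicker but have lower recorded spending,
# for example because they face barriers to accessing care.
patients = [
    Patient("a1", "A", 3, 9000.0),
    Patient("a2", "A", 1, 7000.0),
    Patient("b1", "B", 5, 6000.0),
    Patient("b2", "B", 4, 5000.0),
]

by_cost = top_k(patients, key=lambda p: p.annual_cost, k=2)
by_need = top_k(patients, key=lambda p: p.chronic_conditions, k=2)

print("selected by cost:", by_cost)
print("selected by need:", by_need)
```

Nothing in the ranking code is buggy. The two lists differ only because the target being ranked on is different, which is a decision made before any code is written.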

Philosophy Isn't Decoration

There's a tendency - I've seen it in engineering culture and I've felt it myself - to treat philosophical input as decorative. As the thing you do at the end, when the real work is done, to make the stakeholder presentation feel thoughtful.

That tendency is expensive.

Luciano Floridi, who spent years at the Oxford Internet Institute and now directs the Digital Ethics Center at Yale University, has argued consistently that the conceptual frameworks underpinning AI systems are not neutral. The choice to model cognition as prediction, or to define intelligence as performance on benchmarks, or to treat fairness as statistical parity across groups - each of these is a philosophical position, adopted usually implicitly, that determines what the system can and cannot do. When philosophers are excluded from the debate, the philosophical positions don't disappear. They just go unexamined.
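To make the fairness example concrete: "statistical parity" is one specific definition among several, and the definitions can disagree about the same set of decisions. The sketch below scores identical toy predictions two ways, by approval rate per group (statistical parity) and by approval rate among people who would in fact have repaid (often called equal opportunity). The function names and the data are illustrative assumptions.

```python
def selection_rate(preds):
    """Share of people who received a positive decision."""
    return sum(preds) / len(preds)


def true_positive_rate(preds, labels):
    """Share of truly qualified people (label 1) who received a positive decision."""
    qualified = [p for p, y in zip(preds, labels) if y == 1]
    return sum(qualified) / len(qualified)


# Hypothetical decisions (1 = approve) and outcomes (1 = would repay) for two groups.
group_a = {"preds": [1, 1, 0, 0], "labels": [1, 1, 1, 0]}
group_b = {"preds": [1, 1, 0, 0], "labels": [1, 0, 0, 0]}

# Statistical parity compares approval rates across groups.
parity_gap = selection_rate(group_a["preds"]) - selection_rate(group_b["preds"])

# Equal opportunity compares approval rates among the qualified.
tpr_gap = true_positive_rate(**group_a) - true_positive_rate(**group_b)

print("statistical parity gap:", parity_gap)
print("true positive rate gap:", round(tpr_gap, 3))
```

On this toy data the system is perfectly fair by the first measure and unfair by the second. Which measure to report is not something the code can decide.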

The AI Now Institute, co-founded by Kate Crawford and Meredith Whittaker, has produced detailed structural analyses of how AI systems embed assumptions about the world that their creators never made explicit. Crawford's 2021 book Atlas of AI traces the material and conceptual supply chains of AI - the mines, the labor, the historical categories - and shows that what looks like a technical system is always also a social one. You can disagree with her conclusions. But you cannot build a coherent defense of those implicit assumptions without first having someone name them.

Philosopher Shannon Vallor at the University of Edinburgh, in her 2016 book Technology and the Virtues, makes the related point that technical design is always moral design - every choice about what a system rewards, penalizes, or ignores is a choice about what kind of behavior and thinking the system encourages. That is a philosophical determination. Leaving it to engineers by default does not make it neutral. It makes it unexamined.

Edge Cases: When Inclusion Goes Wrong

Two failure modes are worth naming, because they are common and they corrupt the process.

The first is performative inclusion. A company convenes an ethics board of non-technologists, ignores their recommendations, and uses their participation as a PR shield. The most visible cautionary case is Google's external AI advisory council, announced in 2019 and dissolved about a week later amid controversy over its membership. The lesson is not that non-technologists shouldn't be involved - the lesson is that inclusion without decision-making power produces legitimacy theater. If the non-technologist's input cannot change the outcome, their presence changes nothing except the optics.

The second failure mode is overcorrection into paralysis. Some deliberative processes - particularly in the European regulatory context - have become so committed to including every stakeholder that the forums themselves struggle to produce usable guidance. The EU AI Act, proposed in 2021 and formally adopted in 2024, drew input from an enormous range of voices. The result was a document of genuine breadth that is also, in sections, operationally incoherent. Inclusion requires facilitation. Diversity of perspective requires someone whose job is synthesis.

There is also a population this advice does not apply to cleanly: highly specialized technical debates at the frontier. Whether a particular transformer architecture is more sample-efficient than an alternative is not a question historians should weigh in on. The principle of inclusion is about the framing of systems, not the implementation details.

The Cognitive Partnership Argument

Here is where I want to push further than the standard ethics argument.

The standard argument for inclusion is moral. Diverse voices prevent harm. That is true and it is not sufficient.

The deeper argument is cognitive. When you are designing a system intended to think - or to assist thinking - the range of examples you use to define "good thinking" determines what you get. AI systems trained primarily on academic and technical text, and evaluated primarily by researchers against academic benchmarks, will be optimized for academic and technical cognition. That is a narrow slice of human intelligence.

Clinical intuition, the kind that a nurse develops over years of watching patients before test results confirm what she already suspected - that is a form of intelligence. Legal reasoning, which is less about finding the right answer and more about navigating competing legitimate claims - that is a form of intelligence. Craft knowledge, the tacit understanding a machinist has of what a lathe sounds like when something is wrong - intelligence. None of these maps cleanly onto the metrics currently used to evaluate AI systems, and the people who possess them are largely absent from the conversations defining what AI should be.

Cognitive scientist Barbara Tversky at Columbia University has spent decades documenting how spatial, embodied, and narrative forms of reasoning are fundamental to human cognition - not peripheral decorations on top of propositional logic. Her 2019 book Mind in Motion presents evidence that humans think with their bodies, their environments, and their stories as much as with abstract symbols. AI systems calibrated exclusively against abstract symbolic performance are not measuring general intelligence. They are measuring one kind of intelligence that happens to be easy to benchmark.

When I argue for cognitive partnership between humans and AI - which is the central claim of The Last Skill - I mean that the AI should extend human thinking broadly, not replicate one variety of it efficiently. That goal requires that the people defining the system's cognitive targets include people who think in the ways the system is supposed to partner with.

Limitations

Let me be direct about what the evidence here does not prove.

Including non-technologists does not guarantee better outcomes. It raises the probability by expanding the frame, but inclusion is neither necessary nor sufficient for good AI systems. There are examples of technically homogeneous teams producing thoughtful, careful systems, and examples of diverse multidisciplinary bodies producing outcomes that pleased no one and helped no one.

The research on cognitive diversity - including Scott Page's formal models and the Phillips et al. study - is mostly derived from problem-solving experiments and organizational studies, not from AI development specifically. The transfer is plausible but not demonstrated in controlled settings.

There is also no clear evidence about how much inclusion is enough, what forms of participation are most effective, or how to structure deliberation so that non-technical voices genuinely shape technical decisions rather than annotating them. These are open questions. Anyone claiming a precise formula is selling something.

Finally, this argument assumes that non-technologists possess meaningfully different epistemic frameworks - which is generally true but not universal. A philosopher trained exclusively in formal logic may share more cognitive habits with a machine learning researcher than with a hospice nurse. Diversity of credential is not the same as diversity of cognitive approach.

FAQ

Doesn't involving non-technologists just slow everything down?

Sometimes, yes - and that is occasionally the right outcome. Systems moving fast toward deployment in high-stakes contexts probably should slow down. The cost of deliberation is real. So is the cost of deploying systems that harm people in ways that could have been anticipated by someone with different expertise.

What specific roles should non-technologists play in AI debates?

Frame-setting at the design stage, not just review at the end. Philosophers naming the implicit assumptions. Sociologists specifying what the system's effects will look like in practice. Domain experts - doctors, lawyers, teachers - defining what good performance actually means in their context, which is rarely the same as benchmark accuracy.

How do you prevent non-technical input from becoming veto power that blocks all progress?

Structure matters more than composition. Non-technologists should have input on what is built and for whom, not necessarily on how. The distinction between ends and means is imperfect but functional. Decision rights should be explicit from the start of the process, not negotiated under pressure at the end.

The question of who belongs in AI debates is adjacent to a larger question - who gets to define what intelligence is, and what it is for. That question connects directly to debates about AI consciousness, AI rights, and the long-term trajectory of human-machine collaboration. If those topics are where your thinking is heading, they deserve their own attention.