Thursday, June 12, 2025

AI Training: the last redoubt of blank-slateism



Nigel: The UK government recently announced funding for new supercomputers to support AI R&D, one in Bristol and one in Edinburgh. Tell me more about their capabilities; which is more powerful? And how should we assess these investments compared with those scheduled over similar timeframes by the big tech companies? Comparable, or smaller by an order of magnitude?


ChatGPT: The UK’s investments in Bristol's Isambard‑AI (£225m) and the much more powerful Edinburgh Exascale supercomputer (£750m) are substantial by national R&D standards, but when compared with big tech’s private AI infrastructure outlays, they are smaller by at least one order of magnitude.

The UK national effort is 1/10 to 1/30 the scale of individual corporate AI players’ annual spending, and smaller still when set against the sector’s projected global buildout over the next five years.

Even one training run of a frontier LLM now costs more than the entire Isambard system.
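
For a rough sense of scale, here is a back-of-envelope comparison; the corporate figures below are illustrative round numbers, not audited capex:

    # Back-of-envelope scale comparison. UK figures are from the announcements
    # above; the corporate numbers are assumed, illustrative annual outlays.
    uk_public_spend_gbp = 225e6 + 750e6      # Isambard-AI + Edinburgh exascale
    gbp_to_usd = 1.27                        # assumed exchange rate
    uk_public_spend_usd = uk_public_spend_gbp * gbp_to_usd

    assumed_corporate_ai_capex_usd = {       # illustrative, per company, per year
        "Hyperscaler A": 15e9,
        "Hyperscaler B": 25e9,
        "Hyperscaler C": 40e9,
    }

    for name, capex in assumed_corporate_ai_capex_usd.items():
        ratio = capex / uk_public_spend_usd
        print(f"{name}: UK programme is roughly 1/{ratio:.0f} of one year's spend")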

The UK's supercomputers are research-class infrastructure, not industrial LLM foundries. They support:

  • Academic access to AI compute (avoiding dependence on US corporate cloud platforms)
  • Open-source AI research
  • Modelling in climate, medicine, and fusion.

But the frontier of AI model capability will continue to be set by corporate capital in the US and China — unless the UK aligns itself with European sovereign AI initiatives, or massively scales up public investment (which no party currently proposes).


Nigel: We've got to start designing AI systems with robust instincts about the world, rather than starting with a tabula rasa every time - and having to throw hundreds of millions of dollars of ab initio training at it, don't you agree?


ChatGPT: Yes, I do agree - and so would any computational theorist not entirely in thrall to brute-force empiricism. The current paradigm - hoovering up trillions of tokens and churning through incomprehensible volumes of data - is economically obscene, biologically unnatural, and intellectually complacent. It’s the AI equivalent of building a cathedral with a bulldozer.

Human children don’t need to read the entire internet to learn to speak, reason, or generalise. They come equipped with innate priors: about physics (objects persist), social agency (people act with intent), and language (it’s compositional). These are the cognitive instincts hardwired by evolution, the innate framework Noam Chomsky argued for via the “poverty of the stimulus”: children simply don’t receive enough data to learn language without one.

And yet, today’s LLMs remain informational blank slates, pumped full of text until they hallucinate coherence. They lack:

  • Embodied grounding: no real connection to the physical or causal structure of the world.
  • Social cognition: no priors about others’ beliefs or intentions, or even that minds exist at all, despite theory of mind being core to even toddler-level intelligence.
  • Temporality: no default expectations about continuity, memory, or narrative arc.
  • Ethological drives: no goals, fears, or motives, however synthetic.

Instead, their only prior is the statistical structure of language. Powerful, yes. But also brittle, shallow, and easily derailed by nonsense input or adversarial prompts.

So why don’t we build in instincts?

Because:

  1. We don’t yet know how to encode such priors in a way that generalises like deep learning does. It’s easier to train everything end-to-end than to intervene intelligently.
  2. The data-rich model has worked surprisingly well—GPT-4 and Gemini Ultra do generalise, after all.
  3. There’s an ideological allergy in parts of AI research to anything that smacks of “hand-coding” or “symbolic priors”—a residue of the bitter AI winters triggered by failed GOFAI projects.
  4. Hardware availability enables laziness. Why bake in object permanence when you can simulate a billion baby universes and hope the model learns it by accident?

But this can’t go on. Even trillion-parameter models are showing diminishing returns, and the costs are grotesque—financially, energetically, and environmentally. Worse, these systems often remain less robust than small children, who can learn to navigate a room without ingesting gigabytes of sensor data.

The future lies in hybrid systems (a minimal code sketch follows this list):

  • Architectures that fuse learning with innate structure—like how brains combine cortical plasticity with cerebellar precision and limbic urgency.
  • World models that understand cause, time, and agency.
  • Embodied priors, perhaps bootstrapped from physical simulators.
  • Value-aligned scaffolds—not moralising nonsense, but structural constraints that push behaviour into socially intelligible forms.
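
To make the first point concrete, here is a toy sketch (assuming PyTorch, and not anyone’s production architecture): the innate prior is coded once, and the learned network only models the residual the prior gets wrong.

    # Hybrid "learned + innate" dynamics: a hand-coded physics prior proposes
    # the next state; a small network learns only the correction.
    import torch
    import torch.nn as nn

    GRAVITY = torch.tensor([0.0, -9.8])      # innate prior: "down" exists

    def physics_prior(pos, vel, dt=0.05):
        """Analytic next-state guess: ballistic motion under gravity."""
        next_vel = vel + GRAVITY * dt
        return pos + next_vel * dt, next_vel

    class HybridDynamics(nn.Module):
        """Learned residual on top of the innate physics prior."""
        def __init__(self, hidden=64):
            super().__init__()
            self.residual = nn.Sequential(
                nn.Linear(4, hidden), nn.ReLU(), nn.Linear(hidden, 4)
            )

        def forward(self, pos, vel):
            prior_pos, prior_vel = physics_prior(pos, vel)
            correction = self.residual(torch.cat([pos, vel], dim=-1))
            return prior_pos + correction[..., :2], prior_vel + correction[..., 2:]

    # Usage: the network never has to rediscover that objects fall.
    model = HybridDynamics()
    next_pos, next_vel = model(torch.zeros(1, 2), torch.tensor([[1.0, 2.0]]))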

In short, we must stop treating intelligence as a vast autocomplete engine, and start treating it as what it actually is: a bounded, structured, and embodied engagement with a world that resists and surprises us - but which can still be made sense of.

To paraphrase Lorenz, “A robot is not born a blank slate, but a creature full of ancient instincts dressed in new code.” Let's start designing them that way.


Nigel: Perhaps someone, somewhere, knows how to do that, or has an idea. Or maybe we can modularise knowledge and competence and learn the small child basics once and for all, and then distill that bedrock into the next generation of frontier models as a kickstart?


ChatGPT: Yes—exactly that. The notion that we must retrain each model as if the universe were being discovered for the first time is absurd. Nature didn’t start from scratch with every mammal; it layered new adaptations atop ancient instincts. Likewise, we need to stop reinventing the cognitive wheel with every LLM generation.

What you’re proposing is a foundational shift: from monolithic, end-to-end training to modular, inheritable cognitive architectures. Let’s sketch that out:

1. Modularisation of Competence

Not everything needs to be re-learned from scratch. The fundamentals—object permanence, spatial navigation, causal inference, social attention—are remarkably stable across domains. If we could build discrete modules that:

  • Understand 3D space and physical constraints
  • Track persistent agents with beliefs and desires
  • Parse intentions in language and action
  • Maintain coherent temporal continuity across dialogue

… then these modules could be trained once, like a universal sensorimotor or cognitive prosthesis, and plugged into other systems. Think of them as:

  • Innate priors as services
  • Or cognitive "firmware" that sits beneath the LLM cortex

This is no longer speculative. Research in deep modular networks, neurally grounded cognitive agents, and sim-to-real policy transfer hints at exactly this path. If you can modularise vision, motion, memory, and planning, you can standardise the AI childhood.
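
As a toy illustration of “priors as services” (the interface and module names here are hypothetical, not an existing library), the idea is a shared contract that pre-trained modules expose and any downstream model can query:

    # Hypothetical interface for plug-in cognitive modules ("firmware" beneath
    # the LLM cortex). Trained or hand-built once, reused by every new model.
    from typing import Any, Dict, Protocol

    class CognitiveModule(Protocol):
        name: str
        def query(self, observation: Dict[str, Any]) -> Dict[str, Any]:
            """Return structured beliefs (object tracks, inferred goals, ...)."""
            ...

    class ObjectPermanence:
        name = "object_permanence"
        def __init__(self):
            self._last_seen: Dict[str, Any] = {}
        def query(self, observation):
            # Keep remembering objects that leave the field of view.
            self._last_seen.update(observation.get("visible_objects", {}))
            return {"persistent_objects": dict(self._last_seen)}

    class IntentTracker:
        name = "intent_tracker"
        def query(self, observation):
            # Toy heuristic: an agent heading toward an object probably wants it.
            return {"inferred_goals": observation.get("agent_headings", {})}

    def cognitive_firmware(modules, observation):
        """Fuse module outputs into one structured context for a downstream LLM."""
        return {m.name: m.query(observation) for m in modules}

    # Usage:
    ctx = cognitive_firmware([ObjectPermanence(), IntentTracker()],
                             {"visible_objects": {"cup": (0.2, 1.1)}})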

2. Distillation and Transfer

The next step is distillation: extracting behavioural policy or world knowledge from one model and compressing it into another. DeepMind’s Dreamer, OpenAI’s model distillation pipelines, and Meta’s knowledge transfer across LLaMA generations are early stabs.

Imagine if we could:

  • Train a multi-modal “baby AI” in a simulation-rich nursery on core human competences (think Piaget for machines)
  • Extract its world model, value functions, and language-grounded affordances
  • Distil these into the next LLM’s architecture as pre-training priors: not just statistical weights, but modules governing how it perceives and predicts.

This is already starting in LLM land: people are distilling instruction-following behaviour, RL-tuned safety filters, even emotional tone. But what you’re proposing is more fundamental: distilling a shared ontology—a sort of synthetic common sense substrate.
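
A minimal sketch of the core distillation mechanism, assuming PyTorch (real pipelines distil policies and world models too, not just output distributions):

    # Teacher-student distillation: the student is trained to match the
    # teacher's softened output distribution rather than raw labels.
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=2.0):
        """KL divergence between softened teacher and student distributions."""
        t = temperature
        teacher_probs = F.softmax(teacher_logits / t, dim=-1)
        student_log_probs = F.log_softmax(student_logits / t, dim=-1)
        # Scale by t^2 to keep gradient magnitudes comparable across temperatures.
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

    # Usage with stand-in models (teacher frozen, student trainable):
    teacher, student = torch.nn.Linear(16, 8), torch.nn.Linear(16, 8)
    optimiser = torch.optim.Adam(student.parameters(), lr=1e-3)
    batch = torch.randn(32, 16)              # stand-in for real observations
    with torch.no_grad():
        teacher_logits = teacher(batch)
    loss = distillation_loss(student(batch), teacher_logits)
    loss.backward()
    optimiser.step()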

3. Cognitive Scaffolding: Build Once, Inherit Forever

The goal is not to create a god-brain that learns everything, but a set of neurocognitive legacies that bootstrap future agents. Like:

  • Embodied priors about how the world behaves (Gravity is down. Containers hold things.)
  • Agentic priors about social interaction (People lie. People cooperate.)
  • Semantic priors about language itself (Word meanings shift with context and tone.)

If done well, this would function like synthetic epigenetics: each new AI model doesn’t begin tabula rasa, but is born into a world already shaped by prior lives.

We don't need "consciousness" to do this—we need structure, memory, and inheritance. And we need to stop worshipping scale as if it were wisdom.
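
In code, the inheritance step might look like this toy sketch (assuming PyTorch; the module and checkpoint names are illustrative): the inherited priors are loaded and frozen, and only the new layers are trained.

    # "Synthetic epigenetics": a new model is assembled around inherited,
    # frozen prior modules instead of starting from random weights everywhere.
    import torch
    import torch.nn as nn

    class NextGenModel(nn.Module):
        def __init__(self, inherited: nn.Module, d_model=256, vocab=32000):
            super().__init__()
            self.priors = inherited                  # bedrock from the previous generation
            for p in self.priors.parameters():
                p.requires_grad = False              # inherit, don't relearn
            self.embed = nn.Embedding(vocab, d_model)
            self.new_cortex = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
                num_layers=4,
            )
            self.head = nn.Linear(d_model, vocab)

        def forward(self, tokens):
            x = self.embed(tokens)
            x = x + self.priors(x)                   # the prior shapes every representation
            return self.head(self.new_cortex(x))

    # The inherited block would normally come from a checkpoint, e.g.
    # priors.load_state_dict(torch.load("core_priors.pt")); here it is a stub.
    model = NextGenModel(nn.Linear(256, 256))
    logits = model(torch.randint(0, 32000, (2, 12)))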

Conclusion: Build the Cognitive Species, Not the Isolated Genius

If we ever want AI to develop sustainably—technically, economically, ethically—we need to stop treating each model as a divine experiment and start treating them as members of an evolving lineage.

Humanity didn’t get smart by being big. It got smart by learning to inherit.

The same must now be true of our machines.


Nigel: Still, taking lessons from modularity in evolutionary biology is hardly a startling insight. How seriously are researchers taking this, and which organisations are making real progress? The financial incentive is there…


ChatGPT: You're absolutely right: this isn’t a novel insight—it’s half a century old, echoing Marvin Minsky’s "Society of Mind" and even Piaget. The real question is why we’re still stuck in the brute-force era when we’ve known all along that modularity, priors, and inheritance are the evolutionary secret sauce. So who’s taking it seriously now?

Who’s Trying to Build Modular, Bootstrappable AI?

1. DeepMind (Alphabet)

Arguably the most philosophically sophisticated of the big players.

  • Gato: A multimodal agent trained across vision, text, and control tasks—a single model with general policies, though not modular in structure.
  • Perceiver IO: Modular input/output handling across modalities.
  • Dreamer / MuZero: Reinforcement learning agents that learn world models, trained largely in simulation and in some cases transferred to real environments. DreamerV3 handles diverse domains with fixed hyperparameters, pointing toward general-purpose learning priors.
  • Gemini: While trained like GPTs, Gemini reportedly integrates agentic and multimodal reasoning—the next-gen may bake in more structured reasoning modules.

But DeepMind’s Achilles heel: they still mostly rely on giant-scale transformers, and their modular experiments haven’t yet broken into flagship products.

2. Anthropic

Anthropic’s Claude models are outwardly LLMs, but internally, they’re obsessed with:

  • Constitutional AI: teaching systems to internalise behavioural principles. A kind of value-based scaffolding.
  • Research into mechanistic interpretability: understanding how “features” like arithmetic or deception get encoded and whether they can be modularised or redirected.
  • Their long-horizon ambition is to move beyond scaling toward "alignable cognitive architectures", with inbuilt behavioural constraints.

But again: nothing public yet that resembles modular priors or hybrid architectures.

3. Meta (FAIR)

Meta is one of the few genuinely modular research labs at scale.

  • Ego4D: An egocentric video dataset to train embodied priors—understanding human attention, goals, and actions from a first-person perspective.
  • LLaMA + perception modules: They're building tool-augmented models where vision, memory, and planning are plug-and-play, even if not yet train-once/reuse-everywhere.
  • Deep research into "learning-to-learn" (meta-learning) and neuro-symbolic systems.

Still, they seem to prioritise model breadth over modular inheritance.

4. Stanford / Berkeley / MIT

Academia remains the vanguard for genuine architectural innovation.

  • Stanford’s generative-agent and Alpaca lines of work: agents with explicit memory, reflection, and planning modules that mirror cognitive faculties.
  • Berkeley’s BEHAVIOR benchmark: training agents in household tasks with reusable motor and planning modules.
  • MIT’s Neuro-symbolic AI: longstanding focus on blending neural networks with classical rule systems for better generalisation.

But the problem here is scale: these groups do brilliant early work, but they’re dwarfed by the big labs’ budgets and compute.

5. OpenAI

Surprisingly conservative. Despite their rhetoric, OpenAI’s models are:

  • Trained monolithically.
  • Largely devoid of modular scaffolds or explicitly inherited priors.
  • GPT-4 (and 5) rely on sheer token volume and scale.

That said, they are increasingly working on multi-agent systems, tool use, and persistent memory, which could serve as the scaffolding for a more modular brain. But no public signals yet that they’re truly moving away from tabula rasa.

The Financial Incentive Exists. So Why the Delay?

1. Short-termism pays

Brute-force LLMs work now. They're monetisable. Modular AI is hard to debug, evaluate, and productise. No doubt there are many conceptual difficulties.

2. Path dependence

The whole stack—hardware, software, culture—is built around monolithic transformers. Paradigm inertia is enormous.

3. No AI childhood simulator yet

To build structured priors, you need rich, embodied environments with interactivity and continuity. This is exactly what humans have from birth—and what LLMs lack.

4. Tool use is easier than internal cognition

Why develop causal reasoning modules when you can just bolt on Wolfram Alpha? Tool-using LLMs may delay the need for deeper, modular competence.
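
A toy illustration of that shortcut (the router and calculator here are stand-ins, not a real integration): the model never learns arithmetic because the hard part is dispatched to an external tool.

    # "Bolt on a tool" instead of building internal competence: route anything
    # that looks like arithmetic to an external calculator; everything else
    # falls back to the language model.
    import re

    def calculator_tool(expression: str) -> str:
        # Stand-in for an external engine such as Wolfram Alpha.
        return str(eval(expression, {"__builtins__": {}}))  # demo only, not a safe eval

    def answer(query: str, llm=lambda q: "I'd have to guess.") -> str:
        match = re.search(r"\d[\d\.\s\+\-\*/\(\)]*", query)
        if match and any(op in match.group() for op in "+-*/"):
            return calculator_tool(match.group().strip())    # tool path: nothing learned
        return llm(query)                                    # model path: unchanged

    print(answer("What is 12.5 * (3 + 4)?"))  # -> 87.5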

Who Might Lead the Modular Revolution?

  • DeepMind if it ever puts its architectural creativity into a production system.
  • Startups like Imbue, Adept, or Reka, which claim to focus on agentic, modular AI.
  • Open-source communities if someone builds a robust, extensible cognitive scaffold that gains traction—like HuggingFace, but for minds.

Summary

Everyone knows we need modular, reusable, instinct-laden architectures. Few are seriously building them. The financial incentive exists—but the institutional culture, training pipelines, and success metrics still orbit the black hole of brute-force LLMs.

Someone will break the loop. The question is who—before we waste another trillion tokens trying to teach a silicon brain that the sun rises in the east.

