Earlier this month Google Research released a paper titled “Nested Learning,” introducing an architecture they describe as “a new ML paradigm for continual learning”. And it looks like a real step toward ML architectures that can learn and improve over time the way humans do. What they’ve essentially done is train modular neural networks in which some modules immediately process the most recent tokens, some act as medium-term memory over dozens to hundreds of tokens, and some act as long-term knowledge storage. These networks can update their own weights at inference time when they encounter important new data. They call the architecture “HOPE,” and it beats transformer-based architectures of equivalent size on several validated memory tasks, like “needle-in-a-haystack” tasks where the model has to recall a specific idea or phrase dropped into a longer passage.
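To make the modular-memory idea concrete, here is a toy sketch of memory modules that update their own weights at different frequencies. To be clear, this is my own illustration under loose assumptions, not the paper’s actual method: the module design, update periods, and SGD inner loop are all stand-ins.

```python
import torch
import torch.nn as nn

class ToyNestedMemory(nn.Module):
    """Toy sketch of nested memory: several small MLP modules, each
    updating its own weights at a different frequency, so fast modules
    track the latest tokens while slow modules consolidate long-term
    knowledge. All hyperparameters here are illustrative."""

    def __init__(self, dim: int = 64, update_periods=(1, 16, 256)):
        super().__init__()
        self.update_periods = update_periods  # steps between self-updates
        self.memories = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in update_periods
        )
        self.opts = [torch.optim.SGD(m.parameters(), lr=0.01) for m in self.memories]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Read: combine the contributions of every memory level.
        return x + sum(m(x) for m in self.memories)

    @torch.enable_grad()
    def write(self, x: torch.Tensor, target: torch.Tensor, step: int):
        # Write: at inference time, each level runs a tiny gradient step
        # on its own weights, but only once per `period` steps -- this
        # staggered schedule is the "nested" part of the idea.
        for period, mem, opt in zip(self.update_periods, self.memories, self.opts):
            if step % period == 0:
                loss = nn.functional.mse_loss(mem(x), target)
                opt.zero_grad()
                loss.backward()
                opt.step()

model = ToyNestedMemory()
for step in range(512):
    x = torch.randn(8, 64)            # stand-in for a batch of token states
    _ = model(x)                      # serve a request (read)
    model.write(x, x.detach(), step)  # self-update while deployed (write)
```

The point is just the shape of the thing: the fastest module rewrites itself at every step, while slower modules consolidate information over longer horizons.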
There is no indication yet that HOPE can support the development of a SOTA general-purpose model. The publicly known HOPE models have at most 1.3 billion parameters, still two orders of magnitude short of even GPT-3. But continual learning is a major open problem that the field is trying to solve: the recent “Definition of AGI” paper from Yoshua Bengio and others identifies long-term memory storage as the one component of AGI that has seen essentially no real progress.

Further, AI companies have strong financial incentives to build systems that can update in real time, learning on the job and making and storing new discoveries. Right now SOTA models can’t even play video games like Pokemon, because they keep forgetting what they’ve achieved so far and running back to redo things they don’t realize they’ve already done. Memory storage is key to executing any long-term, multi-step project. And the better AI systems are at developing radically new capabilities, the more valuable they are. So there is real incentive to create AI systems that are “self-improving” in a strong sense — not just speeding up machine learning but updating their weights in real time.
At first I expect improvements in continual learning to be marginal and to look a lot like the “in-context” learning AI models already do when they read and interact with your prompt and recall specific things you’ve said. AI systems will be able to remember which gym leaders they’ve defeated in Pokemon, elements of voice and style you’ve taught them, and things they’ve learned about their customer base that let them appropriately price goods and services. I expect marginal progress at first because scaling a new technique typically takes time, and when new paradigms scale too quickly they are too chaotic and unwieldy to be usefully deployed until they’ve been sufficiently refined (see o-series reward hacking, for example). But paradigms can scale very fast, and there’s no telling when we will see AI systems with the learning capabilities of humans, i.e. good enough to go from near tabula rasa to university professor.
Emerging Governance Challenges
Continual learning looks like it could pose major challenges to current paradigms in safety and governance. An architecture that can update its own weights as it learns is a product with constantly changing capabilities. That’s not an inherent problem. Computers also have constantly changing capabilities if their users are good enough at coding. But if capabilities shift enough, it becomes much harder to say anything reliably true about any particular AI system. One day a model may fail to cross critical red lines; the next day we can’t say for sure. One day a cybersecurity regime may be sufficiently hardened against LLMs; the next day the models have acquired new offensive capabilities. We already have this to some extent with existing models, given how quickly new models roll out and how much we learn about them only after deployment. But at least with those models we can take months to stress-test them and try out different augmentation techniques before taking them to market.
Specifically, continual learning raises challenges to governance paradigms like:
- Evaluations: We could see AI systems that have no static base layer that can be evaluated. 800 million users [this is probably too many, see edit at end of post] could have 800 million different models with different weights and slightly different capabilities.
- Model cards: Correspondingly, it will be harder to write model cards that accurately describe the range of capabilities an AI model might have, or that reliably report performance on tests and benchmarks.
- Alignment: Models that “unlearn” dangerous information could re-learn it. In interpretability, network maps that identify the features of a network’s neurons may hold for only a day. Scheming models may find ways to adversarially hide critical information in their weights. Problems of emergent misalignment could intensify as models change more drastically over time, updating not just once at fine-tuning or transiently in context, but iteratively.
- Safety mitigations: In general, it may be harder to determine whether a mitigation put in place for a specific safety problem stays in place across iterations.
- Systemic risk: Sectors like finance, cyber, military, and media that respond badly to a rapidly changing equilibrium of offensive and defensive capabilities would have to confront such shifts far more often, and with many more branching points, as AI systems update their capabilities in many more ways.
- Corporate governance: With so much gain of function happening in the wild, how does an AI company decide when it is safe to deploy a model?
It’s totally possible I’m getting ahead of myself here, but I do currently think the financial incentives of AI companies favor developing and deploying advanced models that, like HOPE, can update their weights to adapt their capabilities to economically valuable tasks. If that is achieved, I worry about a rupture in many of our current approaches.
I’d like to see more strategic thinking about governance of models with continual learning. Even if we don’t have those models now, we do have a good understanding of what AI companies are trying to do and the incentives they have. For now, I’ll close with four approaches currently in development that look like they could be part of the solution:
- Turning evaluations and control into an “always-on verification layer” that can prove, or at least provide assurance about, various safety properties of a model as it changes and adapts to its environment, and that can observe the model for anomalous behavior (a minimal sketch of the idea follows this list). A worry here, though, is that verifying properties of (e.g.) 800 million different models is likely to be far too compute-intensive to be feasible. I expect we’ll need new technical approaches to the problem.
- Red-teaming and stress-testing models against red lines under various gain-of-function conditions to find out whether the capabilities models can acquire would be dangerous.
- Starting to regulate AI systems with criminal law, holding companies liable when AI systems do things that would be illegal if humans did them (cf. law-following AI). The more AI systems learn and behave like humans, the more appropriate it looks to apply the legal systems we evolved for dealing with human crime.
- Developing a parallel system of evolving defenses against systemic risk that can update in response to changing offensive capabilities of AI systems (cf. the approach of Red Queen Bio).
And — as always — energetic, technocratic, and adaptive governance.
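On the first of those approaches, here is the minimal sketch promised above of what an always-on verification loop could look like, assuming a model that self-updates while serving requests. Everything here is hypothetical: the Model stand-in, the red-line probes, and the rollback policy are my illustrations, not any lab’s actual tooling.

```python
import copy
import random

class Model:
    """Stand-in for a continually learning model; purely illustrative."""
    def __init__(self):
        self.weights = [0.0] * 4

    def online_update(self):
        # Placeholder for an inference-time weight update.
        i = random.randrange(len(self.weights))
        self.weights[i] += random.uniform(-0.1, 0.1)

# Hypothetical red-line probes: each returns True while the model stays
# inside the safety property it checks. Real probes would be eval suites,
# not threshold checks on raw weights.
def within_cyber_redline(model: Model) -> bool:
    return abs(model.weights[0]) < 1.0

def within_bio_redline(model: Model) -> bool:
    return abs(model.weights[1]) < 1.0

PROBES = [within_cyber_redline, within_bio_redline]

def serve_with_verification(model: Model, steps: int) -> Model:
    """Always-on verification: after every self-update, re-run the
    red-line probes; roll back to the last verified snapshot on failure."""
    verified = copy.deepcopy(model)
    for _ in range(steps):
        model.online_update()
        if all(probe(model) for probe in PROBES):
            verified = copy.deepcopy(model)  # new verified baseline
        else:
            model = copy.deepcopy(verified)  # quarantine: roll back
    return model

model = serve_with_verification(Model(), steps=1000)
```

The design choice that matters is the snapshot-and-rollback discipline: each verified state becomes the new baseline, so a bad self-update can never silently persist. The compute worry above applies directly, since running the probe battery after every update, for every user’s copy of the weights, is where the cost explodes.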
EDITED TO ADD: One of my favorite things about blogging and X is getting to leverage Cunningham’s Law to learn new things. Gavin Leech points out that part of what makes inference cheap is prefix caching. But prefix caches are weight-specific, so they cannot be reused across different sets of weights. This means that running inference on lots of different weights comes at a much larger (~>10x) cost. That is affordable for enterprise users but not likely to roll out to 800 million individuals without breakthroughs.
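To see where a figure like 10x could come from, here’s a back-of-envelope calculation. The 9,000/1,000 token split is an illustrative assumption of mine, not a number from Gavin’s point:

```python
# Back-of-envelope with illustrative numbers: suppose each request
# carries a 9,000-token shared prefix (system prompt, tools, retrieved
# context) plus 1,000 tokens of user-specific input.
shared_prefix_tokens = 9_000
unique_tokens = 1_000

# One shared set of weights: the prefix's KV cache is computed once and
# reused, so the marginal prefill work per request is just the unique part.
cost_shared_weights = unique_tokens

# Per-user weights: the cache can't be reused across users, so every
# request pays to re-process the full prefix.
cost_per_user_weights = shared_prefix_tokens + unique_tokens

print(cost_per_user_weights / cost_shared_weights)  # -> 10.0
```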

