Google's Nested Learning
Google's Nested Learning: How Self-Modifying Titans Merge Optimization and Memory

AI has a memory problem. Your brain can learn something new today without wiping yesterday. AI? It forgets instantly. Catastrophic forgetting is its default setting.
For years, our fix was "make it bigger." More layers. More parameters. More GPUs.
Google's latest research says: we've been scaling the wrong dimension.
The Brain Learns Like an Orchestra, Not a Metronome
Your brain runs multiple learning tempos at once:
- Gamma: fast, reactive
- Beta: active thinking
- Theta/Delta: slow, deep storage
AI today forces every "instrument" to learn at the same speed... then shuts learning off entirely after training.
This is the illusion of depth.
Nested Learning: AI With Multiple Learning Tempos
Google's Nested Learning reframes a model as layers of learners, each updating at its own frequency:
- Fast → immediate context
- Medium → structural patterns
- Slow → stable long-term memory
A multi-tempo learning system, just like your brain.
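To make the multi-tempo idea concrete, here is a minimal sketch under assumed, illustrative choices (the block split, update periods, and learning rates are mine; this is not the paper's HOPE implementation): each block gets its own optimizer and only commits its accumulated gradient every few steps.

```python
# Minimal multi-tempo sketch (illustrative assumptions, not the paper's code):
# each block has its own optimizer and applies its accumulated gradient only
# every `period` steps, so slow blocks change far less often than fast ones.
import torch

blocks = torch.nn.ModuleList([
    torch.nn.Linear(16, 64),   # "fast" block: updated every step
    torch.nn.Linear(64, 64),   # "medium" block: updated every 10 steps
    torch.nn.Linear(64, 1),    # "slow" block: updated every 100 steps
])
periods = [1, 10, 100]         # update periods per block (arbitrary choices)
optims = [torch.optim.SGD(b.parameters(), lr=1e-2 / p)   # smaller lr for slow blocks,
          for b, p in zip(blocks, periods)]              # since their grads accumulate

def forward(x):
    h = torch.relu(blocks[0](x))
    h = torch.relu(blocks[1](h))
    return blocks[2](h)

for step in range(1, 301):
    x, y = torch.randn(32, 16), torch.randn(32, 1)        # toy regression batch
    loss = torch.nn.functional.mse_loss(forward(x), y)
    loss.backward()                                        # grads accumulate in every block
    for opt, period in zip(optims, periods):
        if step % period == 0:         # slow blocks act on gradients summed over
            opt.step()                 # many steps, then clear their buffers
            opt.zero_grad()
```

The only point here is the scheduling: every block sees every example, but the slow blocks commit changes far less often, which is the multi-frequency structure Nested Learning formalizes.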
The Breakthrough Insight: Optimizers = Memory Systems
Google shows (a toy illustration follows this list):
- Backprop is memory of surprise
- Momentum is memory of gradient history
- Adam is memory of long-term trends
- Pre-training is massive long-term consolidation
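One way to see the "optimizer = memory" point: momentum and Adam both keep exponential moving averages of past gradients, i.e. a compressed memory of gradient history. The snippet below hand-rolls those EMA updates (with Adam-style constants) purely as an illustration; it is not pulled from any library's internals.

```python
# The optimizer's state buffers are literally a memory of past gradients.
import torch

torch.manual_seed(0)
m = torch.zeros(3)        # first moment: EMA of gradients ("momentum" memory)
v = torch.zeros(3)        # second moment: EMA of squared gradients (long-term trend)
beta1, beta2 = 0.9, 0.999 # Adam's usual decay rates

for t in range(1, 6):
    grad = torch.randn(3)                      # stand-in for a real gradient ("surprise")
    m = beta1 * m + (1 - beta1) * grad         # short-horizon memory of gradient direction
    v = beta2 * v + (1 - beta2) * grad ** 2    # long-horizon memory of gradient magnitude
    print(f"step {t}: m={[round(x, 3) for x in m.tolist()]} "
          f"v={[round(x, 4) for x in v.tolist()]}")
```

Whether these buffers drive a weight update during training or during inference is just a scheduling choice, which is exactly why the training/inference boundary starts to blur once you read optimizers as memory.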
Once you treat optimizers as memory...
- ➡️ the boundary between training and inference disappears.
- ➡️ models can update while they think.
That's the basis of Google's new architecture.
HOPE: The Model Designed to Never Forget
HOPE blends two memory systems:
Titans (Fast Memory)
- Self-modifying blocks that adapt during inference.
- Real-time learning.
Continuum Memory System (Slow Memory)
- A chain of slow-updating memory modules that don't get overwritten.
- Long-term stability.
Together, HOPE learns in multiple tempos, like cognition, not computation (toy sketch below).
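Below is a deliberately tiny two-tempo memory in the spirit of that description, under assumed simplifications: a fast linear associative memory that rewrites itself with one gradient step per token at inference time, plus a slow memory that consolidates only occasionally. Names like fast_write and consolidate are hypothetical; this is not Google's HOPE code.

```python
# Toy two-tempo memory (assumed simplification, not Google's HOPE implementation).
import torch

d = 32
fast_mem = torch.zeros(d, d, requires_grad=True)   # fast memory: rewritten every token
slow_mem = torch.zeros(d, d)                       # slow memory: consolidated rarely

def read(mem, query):
    return query @ mem                             # linear associative recall

def fast_write(key, value, lr=0.5):
    """Self-modification at inference time: one gradient step on the recall error."""
    global fast_mem
    surprise = ((read(fast_mem, key) - value) ** 2).mean()    # prediction error
    (grad,) = torch.autograd.grad(surprise, fast_mem)
    fast_mem = (fast_mem - lr * grad).detach().requires_grad_(True)

def consolidate(decay=0.01):
    """Slow tempo: fold a small fraction of the fast memory into the slow one."""
    global slow_mem
    slow_mem = (1 - decay) * slow_mem + decay * fast_mem.detach()

# Stream key/value pairs: the fast memory learns on every step ("while it thinks"),
# the slow memory moves only every 100 steps, so older content is not overwritten.
for step in range(1, 301):
    k, v = torch.randn(d), torch.randn(d)
    fast_write(k, v)
    if step % 100 == 0:
        consolidate()
```

The real Titans and Continuum Memory System modules are far richer than this, but the division of labor is the same: fast components absorb new information immediately, slow components keep a stable record.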
The Results?
- Continual Learning: Retains old tasks while learning new ones. Zero catastrophic forgetting.
- Needle-in-a-Haystack: Scored 100% where Transformers buckled under long contexts.
- Language Modeling: Outperformed strong Transformer baselines even on standard LM tasks.
We've spent a decade building bigger models that forget easily. Transformers made AI powerful.
Nested Learning could make it alive: adaptive, continuous, and able to remember. And do what your brain does naturally: learn today without losing yesterday.
This isn't a drop-in replacement for Transformers; it's a direction, not a destination (yet).
Reference:
Google Research: "Introducing Nested Learning: A new ML paradigm for continual learning"