This research note is part of the Mediocre Computing series as well as the Protocol Narratives series.
Here’s a hypothesis: Storytelling is how societies dream together, and this collective dreaming is what drives collective learning that continually builds and narrates our world into existence, growing and extrapolating it into the future from the past and present. Our narrative protocols, which determine the when and where of storytelling behaviors, are the “sleep” epochs of society.
The thought was triggered by a tweet that caught my eye, which linked to a paper with the remark “the brain finetunes on synthetic data while it sleeps.” The paper, The brain simulates actions and their consequences during REM sleep, is quite interesting.
Vivid dreams mostly occur during a phase of sleep called REM. During REM sleep, the brain’s internal representation of direction keeps shifting like that of an awake animal moving through its environment. What causes these shifts, given the immobility of the sleeping animal? Here we show that the superior colliculus of the mouse, a motor command center involved in orienting movements, issues motor commands during REM sleep, e.g. turn left, that are similar to those issued in the awake behaving animal. Strikingly, these motor commands, despite not being executed, shift the internal representation of direction as if the animal had turned. Thus, during REM sleep, the brain simulates actions by issuing motor commands that, while not executed, have consequences as if they had been. This study suggests that the sleeping brain, while disengaged from the external world, uses its internal model of the world to simulate interactions with it.
It reflects positively on the rising levels of AI literacy that you can tweet such things in public spaces, and tweet such papers, and expect to be understood.
Finetuning is one of two strategies for making an AI model aware of information it did not encounter during its initial training, the other being RAG (retrieval augmented generation). Finetuning updates the model itself, while RAG simply puts the new information into its “inference context.” It’s like the difference between conscientiously studying a history book to (say) understand the Ukraine conflict better versus scanning a few news articles and posting hot takes. Finetuning takes more effort and requires more computing resources, but generally yields better results.
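To make the contrast concrete, here is a minimal toy sketch in plain Python. The dictionary “model” and the helper names (`finetune`, `rag_answer`) are hypothetical stand-ins for weight updates and retrieval pipelines, not any real library’s API.

```python
# Toy sketch of finetuning vs. RAG. The dict-based "model" and the
# helper names are hypothetical stand-ins, not a real library's API.

BASE_MODEL = {"wwii": "knowledge absorbed during initial training"}
NEW_INFO = {"ukraine_conflict": "information published after training"}

def finetune(model: dict, new_info: dict) -> dict:
    # Finetuning: the new information is baked into the model itself
    # (in a real system, via gradient updates to the weights).
    updated = dict(model)
    updated.update(new_info)
    return updated

def rag_answer(model: dict, query: str, corpus: dict) -> str:
    # RAG: the model is left untouched; relevant documents are
    # retrieved and placed in the inference context at query time.
    retrieved = corpus.get(query, "nothing retrieved")
    return f"[retrieved context: {retrieved}] + whatever the model already knows"

tuned = finetune(BASE_MODEL, NEW_INFO)                        # model changed
print(rag_answer(BASE_MODEL, "ukraine_conflict", NEW_INFO))   # model unchanged
```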
Training on synthetic data is like learning to drive in a video game like Grand Theft Auto or learning to fly in a flight simulator (these examples are frequently used in AI discussions). Where a body of knowledge is rooted in a domain that has enough of a “physics” to it to simulate well, this is a feasible strategy for accelerating training and/or lowering its cost.
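As a toy illustration of the “simulatable physics” point, here is a sketch that mass-produces labeled training examples from a simple kinematics formula. The braking-distance setup is invented purely for illustration.

```python
import random

# Toy sketch of synthetic-data generation: when the domain has a
# cheap, faithful "physics," labeled examples can be mass-produced
# from the simulator instead of collected in the world.

def braking_distance(speed_mps: float, friction: float) -> float:
    g = 9.81  # gravitational acceleration, m/s^2
    return speed_mps ** 2 / (2 * friction * g)  # standard kinematics

synthetic_dataset = []
for _ in range(10_000):
    speed = random.uniform(5, 40)   # m/s
    mu = random.uniform(0.3, 0.9)   # road-surface friction coefficient
    synthetic_dataset.append(((speed, mu), braking_distance(speed, mu)))
# Each entry is a (features, label) pair, ready for supervised training.
```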
Finetuning on synthetic data is a next-level challenge. Initial model training is like humans going to school — a safe learning environment, with no real-world/real-time pressures, where you can acquire foundational knowledge. But finetuning must happen in the context of live activity and information streams. This is dangerous, since new knowledge is generally more fragile and unreliable. On the other hand, sticking to increasingly obsolete old knowledge is also dangerous. This results in what is normally called the exploration/exploitation tradeoff. You must explore in risky ways to gather new knowledge, and exploit existing knowledge while it still has value.
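The standard textbook rendering of this tradeoff is the multi-armed bandit. A minimal epsilon-greedy sketch, with arms and payoffs invented for illustration:

```python
import random

# Epsilon-greedy bandit: a minimal rendering of the exploration/
# exploitation tradeoff. Old knowledge is reliable but mediocre;
# new knowledge is noisy but potentially better.

def pull(arm: str) -> float:
    if arm == "exploit_old_knowledge":
        return random.gauss(0.5, 0.1)
    return random.gauss(0.6, 0.4)   # "explore_new_knowledge"

estimates = {"exploit_old_knowledge": 0.0, "explore_new_knowledge": 0.0}
counts = {"exploit_old_knowledge": 0, "explore_new_knowledge": 0}
EPSILON = 0.1  # fraction of pulls spent exploring

for _ in range(1_000):
    if random.random() < EPSILON:
        arm = random.choice(list(estimates))      # explore
    else:
        arm = max(estimates, key=estimates.get)   # exploit
    reward = pull(arm)
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean

print(estimates)
```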
One way to mitigate the risks is to consciously alternate between exploration and exploitation behaviors in appropriately chosen contexts, a strategy known as dual control in control theory (an idea going back to the 1960s). The example in the Wikipedia link is a good illustration. When learning to drive a new car, you might “explore” its handling characteristics in a relatively safe environment, like an empty parking lot or low-traffic roads, but on higher traffic roads, until you’re used to the car, you’ll probably drive more cautiously.
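In the spirit of that driving example, a crude dual-control flavor can be grafted onto the bandit sketch above by making the exploration rate a function of how safe the current context is. The contexts and risk numbers here are invented for illustration.

```python
# Dual-control flavor: instead of a fixed EPSILON, the exploration
# rate depends on how safe the current context is, the way you'd
# probe a new car's handling in an empty lot but not on the highway.

CONTEXT_RISK = {
    "empty_parking_lot": 0.05,
    "low_traffic_road": 0.30,
    "highway": 0.90,
}

def exploration_rate(context: str) -> float:
    # Explore aggressively where failure is cheap, cautiously
    # where it is expensive.
    return 0.5 * (1.0 - CONTEXT_RISK[context])

print({c: round(exploration_rate(c), 3) for c in CONTEXT_RISK})
```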
If you take the dual control strategy to the limit, you reach the somewhat obvious conclusion that you should do the exploration in the safest place possible — in simulation. For biological organisms, this is REM sleep. The brain makes up and works through simulated scenarios, while putting the body into paralysis so you don’t thrash about dangerously.
I first heard of this idea almost 20 years ago, in a talk by a biologist who had done experiments with songbirds, attaching EEG electrodes to them as they slept. Turns out, the songbirds “practiced” their songs in their dreaming sleep. I suppose that’s lower risk than trying out your new song while awake, messing it up, and turning off the girl birds you’re trying to attract. The paper above seems to be a continuation of this long line of research. It’s not a particularly surprising finding, but it’s nice to have basic intuitions confirmed. This is how things should work, based on everything we know about both biology and computing. But it is by no means self-evident. There are other hypotheses about why we dream (for example, that one part of the brain tries to keep itself busy to avoid takeover by another part; if I recall correctly, this is David Eagleman’s hypothesis).
In the replies to the tweet, I found another paper that applies the general idea to AI system design. This too is not a radical idea; in fact, it’s an obvious thing to attempt. The hard part is getting the strategy to work. The connection between dreaming and modern AI goes back to Google’s DeepDream, but if you want to integrate “dreaming” into a conscious learning strategy, you need more than dream-like images being generated. It’s nice to see the basic intuitions validated with a reduction to practice in a real design.
DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning
The abstract is pretty on-the-nose about the approach. It goes straight at domains that have a solid simulatable “physics” to them:
Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages: systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. A “wake-sleep” learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems. DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton’s and Coulomb’s laws. Concepts are built compositionally from those learned earlier, yielding multi-layered symbolic representations that are interpretable and transferrable to new tasks, while still growing scalably and flexibly with experience.
Neat, huh?
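To get a feel for the wake-sleep rhythm the abstract describes, here is a toy loop, emphatically not DreamCoder itself: the “programs” are arithmetic expressions in x over a tiny library, and the abstraction and dreaming steps are naive stand-ins for the real algorithm.

```python
import random

# Toy wake-sleep loop, not the actual DreamCoder implementation.

library = ["x + 1", "x * 2"]   # the evolving "language" of primitives
replay_buffer = []             # solved (input, output, program) triples

def wake_solve(x_in: int, y_out: int):
    # Wake phase: search the current library for a program that
    # explains the observed task (brute force stands in for the
    # neurally guided search).
    for prog in library:
        if eval(prog, {"x": x_in}) == y_out:
            return prog
    return None

def sleep_abstraction():
    # Sleep phase 1: compress recurring solutions into new library
    # primitives (here: naive composition of two replayed programs).
    if len(replay_buffer) >= 2:
        a, b = random.sample([p for _, _, p in replay_buffer], 2)
        library.append(a.replace("x", f"({b})"))

def sleep_dream():
    # Sleep phase 2: "dream" a synthetic task by running a sampled
    # program on a random input, yielding fresh training data.
    prog = random.choice(library)
    x = random.randint(0, 9)
    return (x, eval(prog, {"x": x}), prog)

for x_in, y_out in [(3, 4), (3, 6), (2, 3)]:   # wake: real tasks
    prog = wake_solve(x_in, y_out)
    if prog:
        replay_buffer.append((x_in, y_out, prog))

sleep_abstraction()                            # sleep: grow the language
print("library:", library)
print("dreamed task:", sleep_dream())          # sleep: synthesize tasks
```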
Just to nail down the connection, the most familiar experience of this whole phenomenon is probably the vivid dreams we all experience when we spend a day working in some highly simulatable domain. For me, the examples that come to mind are playing Tetris and working within mechanical CAD tools. On days (rare now) that I spend many hours playing a game like Tetris, I tend to dream a particular sort of vivid and satisfying dream that is about playing Tetris. With the more complex skill of CAD, it’s even better. These are great dreams to have, and there’s no complicated Jungian shadow unpacking or Freudian archaeology going on. It’s a straightforward connection between what you were trying to do while awake, and what the brain is trying to lock down into long-term procedural memory while “asleep.”
But speaking of complicated Jungian shadow unpacking and Freudian archaeology… what about those, actually? Let’s talk about the thought I opened with.