A brief history of learning

Associative learning in bilaterians

The earliest form of learning is Pavlovian association, where some stimulus (conditioned stimulus) becomes associated with a rewarding/punishing event (unconditioned stimulus). It's conditioned because the association is conditional on past coincident events. This kind of associative learning is seen in even the simplest bilaterians, but not in other animals like corals or anemones. To make associative learning work, they needed to at least partially solve:

  1. Continual learning: extinguished associations can return (spontaneous recovery) and are relearned faster (rapid reacquisition), so extinction suppresses old associations rather than erasing them.
  2. Credit assignment: deciding which stimulus deserves the association. The simplest tricks are eligibility traces, overshadowing, latent inhibition, and blocking.
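The credit-assignment tricks above fall out of a simple shared-error rule. Here is a minimal Rescorla-Wagner sketch (a standard textbook model, not something from the text itself) showing how blocking emerges: once one stimulus fully predicts the reward, a second stimulus trained alongside it acquires almost no associative strength.

```python
# Rescorla-Wagner model of associative learning (textbook formulation):
# each conditioned stimulus (CS) carries an associative strength V, and
# on every trial all present stimuli move toward the reward by a shared
# prediction error.

def rescorla_wagner(trials, alpha=0.3):
    """trials: list of (set_of_present_stimuli, reward). Returns strengths V."""
    V = {}
    for present, reward in trials:
        prediction = sum(V.get(cs, 0.0) for cs in present)
        error = reward - prediction            # shared prediction error
        for cs in present:
            V[cs] = V.get(cs, 0.0) + alpha * error
    return V

# Blocking: pre-train A alone, then train the A+B compound.
# A already predicts the reward, so B acquires almost no strength.
trials = [({"A"}, 1.0)] * 30 + [({"A", "B"}, 1.0)] * 30
V = rescorla_wagner(trials)
```

Overshadowing falls out of the same rule: train A and B as a compound from the start and they split the available associative strength.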

Reinforcement learning in vertebrates

Then came reinforcement learning, first studied systematically by Edward Thorndike in experiments with cats and chickens. He characterized it as trial-and-error learning and formulated it in his law of effect:

Responses that produce a satisfying effect in a particular situation become more likely to occur again in that situation, and responses that produce a discomforting effect become less likely to occur again in that situation.

Satisfying and discomforting have since been replaced with reinforcing and punishing. Reinforcement learning is present in all vertebrates, even fish. Marvin Minsky was among the first to try building an RL agent, and quickly ran into the temporal credit assignment problem that biology must have solved somehow too. The earlier strategies of overshadowing, blocking and so on only worked for stimuli that were simultaneous, not for those separated in time. Richard Sutton found the first good, biologically plausible solution: temporal difference learning. Peter Dayan and Read Montague later connected it to dopamine.
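Temporal difference learning solves the timing problem by letting each moment's prediction learn from the next moment's prediction, so reward information propagates backward in time to the earliest predictive cue. A minimal TD(0) sketch (my illustration, not Sutton's code) on a trace-conditioning episode:

```python
# TD(0) on a simple conditioning episode: a cue appears at t=0 and reward
# arrives at the end of the trial. Each state's value is nudged toward the
# reward plus the next state's value, so over many trials the prediction
# propagates back from the reward to the cue -- the signature that Dayan
# and Montague matched to dopamine neuron firing.

def train_td(n_episodes=500, alpha=0.1, gamma=1.0, T=5):
    V = [0.0] * (T + 1)                    # V[T] is the terminal state
    for _ in range(n_episodes):
        for t in range(T):
            reward = 1.0 if t == T - 1 else 0.0
            delta = reward + gamma * V[t + 1] - V[t]   # TD error
            V[t] += alpha * delta
    return V

V = train_td()
# After training, even the earliest state (the cue) predicts the reward.
```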

These early fish-like vertebrates also developed, for the first time, pattern recognition abilities, allowing them to recognize exponentially more objects in their environment and respond to correspondingly more states of the world. This required being able to discriminate between partially overlapping patterns and generalize from few instances to similar, novel patterns.
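One computational reading of this ability is prototype-based classification: store an averaged template per category from a few instances, then judge novel patterns by similarity. This toy sketch (my own illustration of the claim, with made-up feature vectors) both discriminates partially overlapping patterns and generalizes to unseen ones:

```python
# Nearest-prototype generalization: build a mean "prototype" per class
# from a handful of instances, then classify a novel pattern by squared
# distance to each prototype.

def prototype(examples):
    n = len(examples)
    return [sum(x[i] for x in examples) / n for i in range(len(examples[0]))]

def classify(x, protos):
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(x, p))
    return min(protos, key=lambda label: dist(protos[label]))

# Two partially overlapping classes, a few instances each (hypothetical
# binary feature vectors).
protos = {
    "prey":     prototype([[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]),
    "predator": prototype([[0, 1, 1, 1], [0, 0, 1, 1], [1, 0, 1, 1]]),
}
label = classify([1, 1, 0, 1], protos)   # a novel pattern, never seen before
```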

Alongside reinforcement learning, vertebrates evolved another critical feature of intelligence that makes it work effectively: curiosity, which tackles the exploration-exploitation dilemma. It first appeared in vertebrates; among invertebrates, only advanced groups like insects and cephalopods developed it independently, and later.
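A common computational stand-in for curiosity (my illustration; the text names only the dilemma) is an optimism bonus that shrinks with familiarity: the agent values not just what has paid off, but what it hasn't tried much.

```python
# Count-based exploration on a multi-armed bandit: pick the arm maximizing
# estimated value plus a novelty bonus that decays as the arm is sampled.

import math, random

def curious_bandit(true_means, steps=2000, c=1.0, seed=0):
    rng = random.Random(seed)
    n = [0] * len(true_means)                # visit counts
    q = [0.0] * len(true_means)              # value estimates
    for t in range(1, steps + 1):
        # untried arms are infinitely interesting
        arm = max(range(len(true_means)),
                  key=lambda a: float("inf") if n[a] == 0
                  else q[a] + c * math.sqrt(math.log(t) / n[a]))
        reward = rng.gauss(true_means[arm], 1.0)
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]  # incremental mean
    return n

counts = curious_bandit([0.1, 0.5, 0.9])
# The agent samples every arm, then concentrates on the best one.
```

Purely greedy selection, by contrast, can lock onto whichever arm happened to pay first and never discover the better option.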

Lastly, vertebrates also developed the ability to form spatial maps and models of the world, enabled by neuronal coding of the vestibular sense (inner ear), head-direction (hindbrain) and spatial location (hippocampus). They could now tell the relative location of objects with respect to other objects, marking a massive leap in behavioral intelligence.

World models in mammals

The first small mammals had a hard time surviving around dinosaurs, but presumably had the first-mover advantage in many foraging situations because they were mostly hiding in nooks and crannies by default. They developed a neocortex that allowed them to simulate actions and possible outcomes and to learn by imagination, the first models of the world. Karl Friston and Rajesh Rao's predictive processing models frame perception as constrained hallucination (first proposed by Helmholtz) within this generative model of the world.

World models also enabled model-based RL - planning and learning by simulation, perhaps involving theta sweeps in the hippocampus with steering from the prefrontal cortex and motor cortex. Sutton's Dyna was the first model-based RL algorithm, interleaving real experience with simulated experience from a learned (but simple, tabular) model. Modern approaches like Dreamer and MuZero learn behavior and a rich world model simultaneously.
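A minimal Dyna-Q sketch (after Sutton's Dyna idea; the corridor task and hyperparameters here are my own toy choices): the agent does Q-learning on real transitions, records them in a learned model, and replays simulated transitions from that model to squeeze extra planning out of each real step.

```python
# Dyna-Q on a toy corridor: states 0..n-1, actions 0=left / 1=right,
# reward 1 for reaching the rightmost state. Every real step is followed
# by several "imagined" updates replayed from the learned model.

import random

def dyna_q(n_states=6, episodes=60, planning_steps=20,
           alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_states) for a in (0, 1)}
    model = {}                                   # (s, a) -> (reward, s')

    def step(s, a):                              # the real environment
        s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
        return (1.0 if s2 == n_states - 1 else 0.0), s2

    def update(s, a, r, s2):                     # one Q-learning backup
        target = r + gamma * max(Q[(s2, b)] for b in (0, 1))
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = rng.choice((0, 1)) if rng.random() < eps else \
                max((0, 1), key=lambda b: Q[(s, b)])
            r, s2 = step(s, a)
            update(s, a, r, s2)                  # learn from real experience
            model[(s, a)] = (r, s2)              # learned, not ground-truth
            for _ in range(planning_steps):      # learn from imagination
                ps, pa = rng.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s = s2
    return Q

Q = dyna_q()
```

With `planning_steps=0` this degenerates to plain model-free Q-learning; the planning loop is what lets value information spread without extra real steps.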

The motor hierarchy perhaps also enabled hierarchical RL.

Self-models, theory of mind and imitation learning in primates

Eventually, tree-dwelling primates developed large brains. Two prominent theories for what caused the explosion in brain size:

  1. Social-brain hypothesis: primate mini-societies necessitated theory of mind to navigate complex primate politics - family hierarchies, individual relationships (friendships), non-family political processes that influence the hierarchies, and reconciliation after conflicts - all driven by a new abundance of free time.
    • Compelling evidence for the intimate link between self-models and models of others comes from mirror neurons in the premotor cortex, which presumably enable the imitation learning that allows cultural transmission of tool use. Theory of mind might have been crucial here too, by enabling active teaching, prolonged practice, and discrimination between intended and unintended teacher actions.
  2. Ecological-brain hypothesis: frugivorous primate diets of fruits accessible only in precise temporal windows necessitated sophisticated planning and anticipation of future needs. Modeling a future self may in turn have precipitated theory of mind. Only primates seem able to plan for future needs that engage no current valence response - even in simulation - because the relevant internal state belongs to a different context.

Either way, it seems that there was significant evolutionary pressure to develop theory of mind. An intriguing possibility is that modeling the self or mentalizing with a second-order generative model was the trick the brain used to enable the ability to model the other and the future self.

AI implementations of mentalizing remain limited, but developments in robotics reflect similar functional pressures. For example, directly copying expert behaviors proved a brittle imitation-learning approach to self-driving in Pomerleau and Thorpe's ALVINN system. Active teaching via human expert intervention (Ross, Gordon and Bagnell's DAgger) and inverse reinforcement learning (Abbeel, Coates and Ng) have been found more effective.
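The core DAgger idea can be sketched in a few lines (a toy of my own construction; the drifting-lane task and table-lookup "policy" are stand-ins, not anything from the original systems): the learner drives, visits mistake states that pure behavior cloning would never see, and the expert labels those states, so the aggregated dataset teaches the next policy how to recover.

```python
# Toy DAgger (dataset aggregation) loop: the learner's own rollouts are
# labeled by the expert, so off-distribution states enter the dataset.

import random

def expert(s):
    """Expert lane-keeping: steer back toward center (offset 0)."""
    return -1 if s > 0 else (1 if s < 0 else 0)

def run_dagger(iterations=5, horizon=20, seed=0):
    rng = random.Random(seed)
    data = {}                                   # state -> expert action
    policy = lambda s: data.get(s, 0)           # unseen state: do nothing
    for _ in range(iterations):
        s = 0
        for _ in range(horizon):
            a = policy(s)                       # the LEARNER acts...
            data[s] = expert(s)                 # ...the EXPERT labels the state
            s += a + rng.choice((-1, 0, 1))     # random drift pushes off-center
            s = max(-5, min(5, s))
    return data

data = run_dagger()
policy = lambda s: data.get(s, 0)
# The final policy knows the expert's correction for every state it drifted into.
```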

Language learning in humans

Language is a uniquely complex form of animal communication containing declarative labels (symbols) and grammar for combining them. Language allowed humans to transfer thoughts among themselves with high fidelity, in effect enabling learning from the simulations performed by others. This had enormous benefits in allowing sophisticated coordination at scale, teaching, idea accumulation, and memetics.

The Evolution of Progressively More Complex Sources of Learning

|                         | Reinforcing in early bilaterians      | Simulating in early vertebrates         | Mentalizing in early primates        | Speaking in early humans               |
|-------------------------|---------------------------------------|-----------------------------------------|--------------------------------------|----------------------------------------|
| Source of learning      | Learning from your own actual actions | Learning from your own imagined actions | Learning from others' actual actions | Learning from others' imagined actions |
| Who learning from?      | Yourself                              | Yourself                                | Others                               | Others                                 |
| Actions learning from?  | Actual actions                        | Imagined actions                        | Actual actions                       | Imagined actions                       |

Language seems to be loosely based in Broca's area (language production) and Wernicke's area (language understanding) of the neocortex. These aren't novel anatomical areas but were just repurposed from existing cortical structures. Language learning also crucially depends on curricula that include proto-conversations, joint attention and questioning.

The evolutionary pressures that led to language remain hotly debated. One promising theory, from Robin Dunbar, holds that the pressure arose from the need to transmit tool manufacture and use to children in Homo erectus. This foundation of basic language enabled non-kin linguistic communication, which was then supercharged by a positive feedback loop: gossip to detect and punish defectors rewarded altruistic behavior, which in turn rewarded more sophisticated cooperation, which then increased the returns to punishing defectors. Noam Chomsky, on the other hand, argues that language evolved as a tool for inner thought, not for communication.

Machine learning in neural networks

From Rosenblatt's perceptron to Claude Code and beyond.

