World models

World models are learned internal representations of the outside world, in particular its dynamics, that enable an agent to simulate, plan and generalize.
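As a toy illustration of "simulate and plan", the sketch below fits a tabular model from observed transitions and then scores candidate action sequences purely inside that model. It is not taken from any of the methods in this note: the chain environment, the deterministic tabular model, and the names learn_model, simulate and best_plan are all assumptions made for the example.

```python
from itertools import product

def learn_model(transitions):
    """Fit a toy tabular model: (state, action) -> (next_state, reward).
    Assumes deterministic dynamics, which real world models do not."""
    model = {}
    for state, action, next_state, reward in transitions:
        model[(state, action)] = (next_state, reward)
    return model

def simulate(model, state, action_sequence):
    """Roll an action sequence forward inside the model only and
    return the imagined sum of rewards."""
    total = 0.0
    for action in action_sequence:
        if (state, action) not in model:
            break  # the model has no prediction here, so stop imagining
        state, reward = model[(state, action)]
        total += reward
    return total

def best_plan(model, state, actions, horizon):
    """Exhaustive search over action sequences, scored by imagined return."""
    candidates = product(actions, repeat=horizon)
    return max(candidates, key=lambda seq: simulate(model, state, seq))

# Hypothetical experience from a 4-state chain; reaching state 3 pays 1.0.
experience = [
    (0, "left", 0, 0.0), (0, "right", 1, 0.0),
    (1, "left", 0, 0.0), (1, "right", 2, 0.0),
    (2, "left", 1, 0.0), (2, "right", 3, 1.0),
]
model = learn_model(experience)
chosen = best_plan(model, 0, ["left", "right"], horizon=3)
```

The planner never touches the environment after the model is fitted, which is the point: once the dynamics are learned, candidate behaviours can be evaluated in imagination.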

The most interesting line of work (near state of the art on Atari 100k):

  1. MuZero runs MCTS over rollouts in a learned latent space, but does not include a self-predictive loss.
  2. EfficientZero adds a self-predictive loss.
  3. EfficientZero V2 uses a sampling-based Gumbel search instead of standard MCTS.
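To make "latent space rollouts" and "self-predictive loss" concrete, here is a minimal sketch under loud assumptions: encode and dynamics are hand-written stand-ins for learned networks, and the loss is a plain negative cosine similarity between the predicted next latent and the encoding of the real next observation. The actual methods use trained networks and extra components that are omitted here.

```python
import math

def encode(obs):
    """Hypothetical representation function: observation -> latent state."""
    return [math.tanh(x) for x in obs]

def dynamics(latent, action):
    """Hypothetical latent dynamics: (latent, action) -> predicted next latent."""
    shift = 0.1 if action == 1 else -0.1
    return [math.tanh(z + shift) for z in latent]

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def self_predictive_loss(observations, actions):
    """Unroll the dynamics in latent space from the first observation and
    compare each predicted latent with the encoding of the real next
    observation. Lower is better; the minimum is close to -1."""
    latent = encode(observations[0])
    total = 0.0
    for action, next_obs in zip(actions, observations[1:]):
        latent = dynamics(latent, action)  # imagined step, no environment call
        target = encode(next_obs)          # would be a stop-gradient target in training
        total += -cosine_similarity(latent, target)
    return total / len(actions)

# Hypothetical trajectory: three observations and the two actions between them.
observations = [[0.5, -0.2], [0.6, -0.1], [0.5, -0.2]]
actions = [1, 0]
loss = self_predictive_loss(observations, actions)
```

The rollout itself (repeated calls to dynamics starting from one encoded observation) is what the search operates on; the loss only adds a training signal that keeps those imagined latents consistent with what the encoder produces for real observations.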

Conspicuously missing from this line of work are approaches that learn action abstractions, in the vein of options in hierarchical RL. Some recent work is beginning to address the gap, but overall this remains a significant open research problem.


Links

Sources