Hierarchical value functions in deep reinforcement learning

We just released a paper on deep reinforcement learning with hierarchical value functions. One of the major problems in RL is dealing with sparse reward channels. Until an agent observes a non-zero reward, it has little signal from which to learn a reasonable value function. The rewards an agent observes depend directly on how much it explores, and a high branching factor in the action space makes it difficult to explore the environment efficiently.

This got me thinking about intrinsic notions of reward. My earlier post talks about some of the prior work on intrinsic motivation. There has also been plenty of work on options in the context of reinforcement learning. Our work is inspired by these ideas and by many other papers cited in our arXiv paper.

Our model, called hierarchical-DQN or h-DQN, integrates hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning. A top-level value function learns a policy over intrinsic goals, and a lower-level value function learns a policy over atomic actions to satisfy the given goal. h-DQN allows for flexible goal specifications, such as functions over entities and relations. Exploring over goals rather than over raw actions gives the agent a more efficient space for exploration in complicated environments.
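To make the two levels concrete, here is a minimal tabular sketch of the idea. It is not the h-DQN implementation from the paper: it swaps the deep networks for plain Q-tables, and the corridor environment, the "reach state g" goals, the function names, and all hyperparameters are assumptions made up for illustration. The top-level table picks a goal and is updated with the extrinsic reward collected while that goal was active; the lower-level table picks atomic actions and is updated with an intrinsic reward of 1 when the goal is reached.

```python
import random
from collections import defaultdict

# Hypothetical toy setup (not from the paper): a corridor of 6 states.
N_STATES = 6
ACTIONS = (-1, +1)               # atomic actions: step left / step right
GOALS = tuple(range(N_STATES))   # an intrinsic goal is "reach state g"

def step(state, action):
    """Toy environment: extrinsic reward only on reaching the last state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def intrinsic_reward(state, goal):
    """Internal critic: 1 when the lower level satisfies the current goal."""
    return 1.0 if state == goal else 0.0

def epsilon_greedy(q, key, choices, eps, rng):
    """Random choice with probability eps, else greedy with random tie-breaking."""
    if rng.random() < eps:
        return rng.choice(choices)
    best = max(q[(key, c)] for c in choices)
    return rng.choice([c for c in choices if q[(key, c)] == best])

def q_update(q, key, choice, reward, next_key, next_choices, terminal,
             alpha=0.1, gamma=0.95):
    """One-step Q-learning update on a table indexed by (key, choice)."""
    bootstrap = 0.0 if terminal else gamma * max(q[(next_key, c)] for c in next_choices)
    q[(key, choice)] += alpha * (reward + bootstrap - q[(key, choice)])

def train(episodes=300, eps=0.2, max_steps=50, seed=0):
    rng = random.Random(seed)
    q_meta = defaultdict(float)  # top level: value of choosing a goal in a state
    q_ctrl = defaultdict(float)  # lower level: value of an action given (state, goal)
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < max_steps:
            # Top level acts on a slower time scale: one goal per inner loop.
            start = state
            goal = epsilon_greedy(q_meta, state, GOALS, eps, rng)
            extrinsic, reached = 0.0, False
            while not (done or reached) and steps < max_steps:
                action = epsilon_greedy(q_ctrl, (state, goal), ACTIONS, eps, rng)
                next_state, reward, done = step(state, action)
                reached = next_state == goal
                # Lower level learns only from the intrinsic reward.
                q_update(q_ctrl, (state, goal), action,
                         intrinsic_reward(next_state, goal),
                         (next_state, goal), ACTIONS, done or reached)
                extrinsic += reward
                state = next_state
                steps += 1
            # Top level learns from the extrinsic reward gathered under this goal.
            q_update(q_meta, start, goal, extrinsic, state, GOALS, done)
    return q_meta, q_ctrl
```

Calling `train()` returns the two tables. Keying the lower-level table by `(state, goal)` is the point of the design: one lower-level policy is shared across all goals, and the top level only has to decide which goal to hand it next.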

Current limitations and open questions

Tejas Kulkarni, Cambridge, MA
