NIPS 2016: Deep Reinforcement Learning

Last week, the 29th conference on Neural Information Processing Systems (NIPS) was held in Barcelona. Attendance has swelled in recent years, reaching roughly 6,000 attendees this year, following breakthroughs in applying deep neural networks in industry and academia.

Our primary reason for attending was to absorb the latest research coming out of the deep reinforcement learning community, and many new developments were presented on this topic. The following stood out to us.


Planning

Present deep reinforcement learning algorithms typically optimize action selection over near-term time horizons. However, many scenarios require a sequence of actions (a plan) to solve the learning problem. For example, in billiards it is sometimes preferable to ‘set the ball up’ for a follow-up shot rather than shoot directly towards a pocket. Two notable methods presented at NIPS attempted to address the planning problem.

  1. Value Iteration Networks: a convolutional neural network learns longer-term path-planning objectives (publication).
  2. Predictron: a neural architecture that performs an n-step lookahead to estimate the reward associated with a single action (publication).
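To make the lookahead idea concrete: the simplest form of an n-step value estimate accumulates n discounted (predicted) rewards and then bootstraps from a value estimate at step n; the Predictron learns to perform such rollouts in an abstract internal model. A toy sketch of the n-step estimate (function name and array layout are ours, not the paper's):

```python
def n_step_estimate(rewards, values, n, gamma=0.99):
    """Toy n-step lookahead estimate.

    rewards: predicted rewards r_1..r_n from a model rollout
    values:  predicted state values v_1..v_n; the bootstrap uses values[n-1]
    """
    # Sum the n discounted predicted rewards...
    ret = sum(gamma ** i * rewards[i] for i in range(n))
    # ...then bootstrap from the value predicted at step n.
    return ret + gamma ** n * values[n - 1]
```

With gamma = 1 and rewards [1, 1, 1], bootstrapping from a final value of 10 gives an estimate of 13.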

Deep RL Algorithms

The present state of the art for deep RL, asynchronous advantage actor-critic (A3C), has been surpassed by an algorithm called Retrace(λ).
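Retrace(λ) is a return-based off-policy correction: multi-step temporal-difference errors are weighted by truncated importance ratios c_t = λ · min(1, π(a_t|x_t) / μ(a_t|x_t)), which keeps the update safe under an arbitrary behaviour policy μ while still using full returns when π and μ agree. A minimal sketch of the target computation for one trajectory (the function name and NumPy layout are our own, not taken from any released implementation):

```python
import numpy as np

def retrace_targets(q, actions, rewards, behaviour_probs, target_probs,
                    gamma=0.99, lam=1.0):
    """Sketch of Retrace(lambda) targets for a single finite trajectory.

    q:               (T, A) current Q-value estimates
    actions:         (T,)   actions actually taken
    rewards:         (T,)   observed rewards
    behaviour_probs: (T,)   mu(a_t | x_t) for the taken actions
    target_probs:    (T, A) pi(. | x_t) under the target policy
    """
    T = len(rewards)
    idx = np.arange(T)
    q_taken = q[idx, actions]
    # Truncated importance weights: c_t = lam * min(1, pi(a_t|x_t) / mu(a_t|x_t))
    c = lam * np.minimum(1.0, target_probs[idx, actions] / behaviour_probs)
    # Expected Q under pi at each state; bootstrap with 0 after the terminal step
    exp_q = (target_probs * q).sum(axis=1)
    next_exp_q = np.append(exp_q[1:], 0.0)
    # TD errors: delta_t = r_t + gamma * E_pi[Q(x_{t+1}, .)] - Q(x_t, a_t)
    delta = rewards + gamma * next_exp_q - q_taken
    # Backward recursion: G_t = delta_t + gamma * c_{t+1} * G_{t+1}
    targets = np.empty(T)
    acc = 0.0
    for t in reversed(range(T)):
        c_next = c[t + 1] if t + 1 < T else 0.0
        acc = delta[t] + gamma * c_next * acc
        targets[t] = q_taken[t] + acc
    return targets
```

Note that setting lam = 0 truncates every trace, recovering ordinary one-step expected-SARSA targets.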

Reinforcement Environments

In several contexts (talks, symposia and workshops) researchers discussed the need to go beyond simple environments for training A.I. agents.

Drew Purves, of Google DeepMind, provided a lucid and systematic description of the qualities of biological environments that have led to the evolution of complex intelligence. He argued that in silico environments used for training A.I. could be improved by incorporating certain qualities of natural environments.

Pieter Abbeel, of UC Berkeley, argued for generalised learning across environments, releasing OpenAI Universe as a framework for developing such agents.

