Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

Posted on 11 May 2017

Steven Stenberg Hansen

May 2017

Abstract:

We present a new deep meta-reinforcement learner, which we call Deep Episodic Value Iteration (DEVI). DEVI uses a deep neural network to learn a similarity metric for a non-parametric model-based reinforcement learning algorithm. The model is trained end-to-end via back-propagation. Although DEVI is trained with the model-free Q-learning objective, we show that its model-based internal structure provides "one-shot" transfer to changes in reward and transition structure, even for tasks with very high-dimensional state spaces.
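The abstract does not spell out the planning step, so the following is only a minimal sketch of one way a similarity metric can drive non-parametric value iteration over a memory of stored transitions. It is not the paper's implementation. Everything named here is an assumption for illustration: the function names (`action_kernels`, `episodic_q_values`), the softmax kernel with a `temperature`, the dot-product similarity, and the fixed one-hot `embed` function, which stands in for the deep network that DEVI would learn end-to-end.

```python
import numpy as np

def action_kernels(sim, actions, n_actions):
    """For each action, softmax-normalise similarities over the stored
    transitions that took that action (zero weight elsewhere)."""
    kernels = []
    for a in range(n_actions):
        mask = actions == a
        w = np.zeros_like(sim)
        if mask.any():
            s = sim[:, mask]
            e = np.exp(s - s.max(axis=1, keepdims=True))  # stable softmax
            w[:, mask] = e / e.sum(axis=1, keepdims=True)
        kernels.append(w)
    return kernels

def episodic_q_values(embed, memory, queries, n_actions,
                      gamma=0.9, n_iters=50, temperature=0.01):
    """Kernel-weighted value iteration over a memory of transitions.

    embed  : hypothetical stand-in for a learned embedding network
    memory : (states, actions, rewards, next_states) arrays of length N
    Returns an array of shape (len(queries), n_actions).
    """
    states, actions, rewards, next_states = memory
    z, z_next = embed(states), embed(next_states)
    # Similarity of every stored next-state to every stored start state.
    plan = action_kernels(z_next @ z.T / temperature, actions, n_actions)
    q_next = np.zeros((len(states), n_actions))
    for _ in range(n_iters):
        # One-step backup target for each stored transition.
        target = rewards + gamma * q_next.max(axis=1)
        q_next = np.stack([w @ target for w in plan], axis=1)
    target = rewards + gamma * q_next.max(axis=1)
    # Read out Q-values for the query states with the same kernel.
    read = action_kernels(embed(queries) @ z.T / temperature, actions, n_actions)
    return np.stack([w @ target for w in read], axis=1)

# Toy 3-state chain: action 0 moves right, action 1 stays; state 2 absorbs.
states      = np.array([0, 0, 1, 1, 2, 2])
actions     = np.array([0, 1, 0, 1, 0, 1])
rewards     = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
next_states = np.array([1, 0, 2, 1, 2, 2])
embed = lambda s: np.eye(3)[s]  # assumption: fixed one-hot embedding
queries = np.array([0, 1, 2])

q = episodic_q_values(embed, (states, actions, rewards, next_states), queries, 2)

# Change only the stored rewards and re-plan; no parameters are updated.
new_rewards = np.array([0.0, 2.0, 0.0, 0.0, 0.0, 0.0])
q_new = episodic_q_values(embed, (states, actions, new_rewards, next_states), queries, 2)
```

In this toy setting the greedy action in state 0 switches from moving right to staying once the stored rewards are edited, because the values are recomputed from memory at query time. That is the sense in which a model-based internal structure can adapt to a new reward function without further training; how closely this mirrors DEVI's actual architecture should be checked against the paper itself.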

Attachment:

Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning.pdf

Resource Type:

Academic Paper

Tags:

Machine Learning

Reinforcement Learning

Backpropagation

Deep Episodic Value Iteration

Meta-Reinforcement Learning
