Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning

Muhammad Burhan Hafez 1, Cornelius Weber 1, Matthias Kerzel 1, and Stefan Wermter 1
  • 1 Department of Informatics, University of Hamburg, Germany

Abstract

In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder that is trained to reconstruct the visual input, while its centre-most hidden layer is additionally optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal that is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm on the task of learning robotic reaching and grasping skills, both in a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
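
The following is a minimal PyTorch sketch of the components named in the abstract, not the authors' implementation: a shared convolutional encoder, critic and actor heads on its bottleneck, and an ensemble of one-step forward models whose decrease in prediction error serves as an intrinsic learning-progress reward. All names, layer sizes, the ensemble size, and the mixing weight `eta` are illustrative assumptions.

```python
# Minimal sketch of the described architecture (assumptions, not the authors' code).
# The reconstruction decoder and all training loops are omitted for brevity.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Conv encoder; its bottleneck feeds the decoder, critic, and actor."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.LazyLinear(latent_dim)  # centre-most ("bottleneck") layer

    def forward(self, x):
        return self.fc(self.conv(x))

latent_dim, action_dim, n_models, eta = 64, 5, 5, 0.5  # illustrative values
encoder = Encoder(latent_dim)
critic = nn.Linear(latent_dim, 1)                      # state-value head
actor = nn.Sequential(nn.Linear(latent_dim, action_dim), nn.Tanh())
# Ensemble of one-step forward models predicting the next latent state.
world_models = [nn.Linear(latent_dim + action_dim, latent_dim)
                for _ in range(n_models)]

def intrinsic_reward(z, a, z_next, prev_errors):
    """Learning progress: decrease of the ensemble's prediction error."""
    with torch.no_grad():
        preds = [m(torch.cat([z, a], dim=-1)) for m in world_models]
        errors = torch.stack([(p - z_next).pow(2).mean() for p in preds])
    progress = (prev_errors - errors).clamp(min=0).mean()
    return progress, errors

# The actor-critic learner is then trained on the combined reward
# r = r_extrinsic + eta * r_intrinsic (eta is an assumed mixing weight).
```

In this sketch, sharing the encoder between the reconstruction, value-estimation, actor, and world-model objectives is what keeps the latent state compact; the clamp reflects that only error *reduction* (progress), not error itself, is rewarded.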
