Open Access (CC BY 4.0 license) · Published by De Gruyter, January 1, 2019

Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning

Muhammad Burhan Hafez, Cornelius Weber, Matthias Kerzel and Stefan Wermter

Abstract

In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than the existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm for the task of learning robotic reaching and grasping skills on a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach can achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
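The abstract describes an intrinsic reward derived from the learning progress of an ensemble of predictive world models, added to the extrinsic reward to drive exploration. The following is a minimal sketch of that idea, not the authors' implementation: each ensemble member tracks its recent prediction errors, learning progress is taken as the drop in error between the older and the newer half of that history, and the class name, window size and weighting factor `beta` are illustrative assumptions.

```python
import numpy as np

class LearningProgressReward:
    """Sketch of a learning-progress-based intrinsic reward signal.

    Each predictive model in the ensemble keeps a short history of its
    prediction errors. Learning progress is estimated as the decrease in
    mean error from the older half of the history to the recent half;
    the intrinsic reward is this progress, scaled by `beta` and added to
    the extrinsic reward. Hyperparameters here are illustrative only.
    """

    def __init__(self, n_models, window=20, beta=0.5):
        self.errors = [[] for _ in range(n_models)]
        self.window = window
        self.beta = beta  # weight of intrinsic relative to extrinsic reward

    def update(self, model_idx, prediction_error):
        # Record the latest prediction error, keeping a bounded history.
        hist = self.errors[model_idx]
        hist.append(float(prediction_error))
        if len(hist) > self.window:
            hist.pop(0)

    def learning_progress(self, model_idx):
        # Positive when the model's prediction error is shrinking.
        hist = self.errors[model_idx]
        if len(hist) < self.window:
            return 0.0
        half = self.window // 2
        older = np.mean(hist[:half])
        recent = np.mean(hist[half:])
        return max(0.0, older - recent)

    def combined_reward(self, model_idx, extrinsic_reward):
        # Total reward guiding the actor-critic learner.
        return extrinsic_reward + self.beta * self.learning_progress(model_idx)
```

Under this formulation, a model whose errors are steadily decreasing (i.e., one that is still learning something) yields a positive intrinsic bonus, while a model whose errors have plateaued contributes nothing, which encourages the agent to stay in regions where its world model is improving.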


Received: 2018-06-06
Accepted: 2018-10-29
Published Online: 2019-01-01

© 2019 Muhammad Burhan Hafez, et al., published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
