Paladyn, Journal of Behavioral Robotics

Editor-in-Chief: Schöner, Gregor

Covered by SCOPUS

CiteScore 2018: 2.17

SCImago Journal Rank (SJR) 2018: 0.336
Source Normalized Impact per Paper (SNIP) 2018: 1.707

ICV 2018: 120.52

Open Access

Deep reinforcement learning using compositional representations for performing instructions

Mohammad Ali Zamani
  • Corresponding author
  • Knowledge Technology, Department of Informatics, University of Hamburg, Vogt-Koelln-Str. 30, Hamburg, Germany
/ Sven Magg / Cornelius Weber / Stefan Wermter / Di Fu
  • CAS Key Laboratory of Behavioral Science, Chinese Academy of Sciences, Beijing, China
  • Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
  • Knowledge Technology, Department of Informatics, University of Hamburg, Vogt-Koelln-Str. 30, Hamburg, Germany
Published Online: 2018-12-06 | DOI: https://doi.org/10.1515/pjbr-2018-0026


Spoken language is one of the most efficient ways to instruct robots about performing domestic tasks. However, the state of the environment has to be considered to plan and execute actions successfully. We propose a system that learns to recognise the user’s intention and map it to a goal. A reinforcement learning (RL) system then generates a sequence of actions toward this goal considering the state of the environment. A novel contribution in this paper is the use of symbolic representations for both input and output of a neural Deep Q-network (DQN), which enables it to be used in a hybrid system. To show the effectiveness of our approach, the Tell-Me-Dave corpus is used to train an intention detection model and in a second step an RL agent generates the sequences of actions towards the detected objective, represented by a set of state predicates. We show that the system can successfully recognise command sequences from this corpus as well as train the deep-RL network with symbolic input. We further show that the performance can be significantly increased by exploiting the symbolic representation to generate intermediate rewards.

Keywords: deep reinforcement learning; spoken language instruction
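
The idea of a goal represented by state predicates, with intermediate rewards derived from that symbolic representation, can be illustrated with a minimal sketch. This is not the authors' implementation: the predicate strings, the binary encoding, the function names `encode_state` and `shaped_reward`, and the reward constants are all hypothetical choices made only for illustration.

```python
from typing import FrozenSet, List

# Hypothetical encoding: each predicate is a plain string such as "filled(cup)".
Predicate = str


def encode_state(state: FrozenSet[Predicate], vocabulary: List[Predicate]) -> List[float]:
    """Turn a set of true predicates into a binary vector (a symbolic network input)."""
    return [1.0 if p in state else 0.0 for p in vocabulary]


def shaped_reward(
    prev_state: FrozenSet[Predicate],
    next_state: FrozenSet[Predicate],
    goal: FrozenSet[Predicate],
    step_cost: float = -0.01,   # assumed small penalty per action
    goal_bonus: float = 1.0,    # assumed reward for reaching the goal
) -> float:
    """Illustrative intermediate reward: credit each newly satisfied goal predicate."""
    if goal <= next_state:
        # Every goal predicate holds, so the episode objective is reached.
        return goal_bonus
    prev_hits = len(goal & prev_state)
    next_hits = len(goal & next_state)
    # Fraction of the goal gained (or lost) by this step, plus the step cost.
    return step_cost + (next_hits - prev_hits) / len(goal)


vocabulary = ["grasped(cup)", "filled(cup)", "on(cup, table)"]
goal = frozenset({"filled(cup)", "on(cup, table)"})
empty = frozenset()
half_done = frozenset({"filled(cup)"})

print(encode_state(half_done, vocabulary))
print(shaped_reward(empty, half_done, goal))
print(shaped_reward(half_done, goal, goal))
```

In this sketch an agent receives partial credit as soon as one goal predicate becomes true, rather than only at the end of the action sequence. Any such shaping scheme is an assumption here; the abstract states only that intermediate rewards are generated from the symbolic representation.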


About the article

Received: 2018-01-31

Accepted: 2018-10-30

Published Online: 2018-12-06

Published in Print: 2018-12-01

Citation Information: Paladyn, Journal of Behavioral Robotics, Volume 9, Issue 1, Pages 358–373, ISSN (Online) 2081-4836, DOI: https://doi.org/10.1515/pjbr-2018-0026.

© by Mohammad Ali Zamani, et al., published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
