Paladyn, Journal of Behavioral Robotics

Editor-in-Chief: Schöner, Gregor

1 Issue per year


CiteScore 2017: 0.33

SCImago Journal Rank (SJR) 2017: 0.104

Open Access
Online ISSN: 2081-4836

A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

Nazmul Siddique / Paresh Dhakan / Inaki Rano / Kathryn Merrick
Published Online: 2017-12-07 | DOI: https://doi.org/10.1515/pjbr-2017-0004

Abstract

This paper reviews the tripartite relationship between novelty, intrinsic motivation and reinforcement learning. It first surveys the literature on novelty and the computational models of novelty detection, focusing on the features of stimuli that trigger a hedonic value and thereby generate a novelty signal. It then gives an overview of intrinsic motivation and examines different models, with the aim of exploring the deeper relationships between specific features of a novelty signal and their effect on intrinsic motivation in producing a reward function. Finally, it surveys reinforcement learning, its different models, and their functional relationship with intrinsic motivation.

Keywords: Novelty; intrinsic motivation; reinforcement learning; reward function; habituation; action learning

About the article

Received: 2016-05-03

Accepted: 2017-10-04

Published Online: 2017-12-07

Published in Print: 2017-11-27


Citation Information: Paladyn, Journal of Behavioral Robotics, Volume 8, Issue 1, Pages 58–69, ISSN (Online) 2081-4836, DOI: https://doi.org/10.1515/pjbr-2017-0004.

© 2017 Nazmul Siddique et al. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (CC BY-NC-ND 4.0).
