Open Access. Published by De Gruyter Open Access, December 7, 2017. License: CC BY-NC-ND 4.0.

A Review of the Relationship between Novelty, Intrinsic Motivation and Reinforcement Learning

Nazmul Siddique, Paresh Dhakan, Inaki Rano and Kathryn Merrick

Abstract

This paper reviews the tripartite relationship between novelty, intrinsic motivation and reinforcement learning. It first surveys the literature on novelty and on computational models of novelty detection, focusing on the features of stimuli that trigger a hedonic value used to generate a novelty signal. It then gives an overview of intrinsic motivation and examines different models of it, with the aim of exploring the deeper relationships between specific features of a novelty signal and the way that signal drives intrinsic motivation to produce a reward function. Finally, it surveys reinforcement learning, its main models, and their functional relationship with intrinsic motivation.
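The chain described in the abstract (the novelty of a stimulus is mapped through a hedonic function to an intrinsic reward, which then drives a reinforcement learner) can be illustrated with a small sketch. This is a minimal example under assumed choices, not the authors' model. Novelty comes from a per-state habituation rule of the form τ dy/dt = α(1 − y) − S, in the style of Stanley's habituation model, discretised with one step per observation. The hedonic value uses a Wundt-style curve built as the difference of two sigmoids. The learner is tabular Q-learning on a hypothetical six-state corridor with intrinsic reward only. All names (`hedonic`, `Habituation`, `run`) and parameter values are illustrative assumptions.

```python
import math
import random

# Hypothetical parameters, chosen for illustration only.
TAU, ALPHA = 3.33, 1.05          # habituation time constant and recovery rate
GAMMA, LR, EPS = 0.9, 0.1, 0.1   # discount factor, learning rate, exploration rate


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def hedonic(novelty: float) -> float:
    """Wundt-style curve: highest for moderate novelty, low at both extremes."""
    reward = sigmoid(10.0 * (novelty - 0.3))   # positive (rewarding) term
    punish = sigmoid(10.0 * (novelty - 0.7))   # negative (aversive) term
    return reward - punish


class Habituation:
    """Per-state habituating unit: novelty decays with repeated exposure."""

    def __init__(self, n_states: int):
        self.y = [1.0] * n_states  # 1.0 means the state has never been seen

    def observe(self, s: int) -> float:
        novelty = self.y[s]
        # One Euler step of tau*dy/dt = alpha*(1 - y) - S with stimulus S = 1.
        self.y[s] += (ALPHA * (1.0 - self.y[s]) - 1.0) / TAU
        return novelty


def run(n_states: int = 6, episodes: int = 50, steps: int = 20, seed: int = 0):
    """Tabular Q-learning driven purely by the novelty-based intrinsic reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    memory = Habituation(n_states)
    for _ in range(episodes):
        s = 0
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < EPS:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[s][i])
            s_next = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
            r = hedonic(memory.observe(s_next))  # intrinsic reward only
            target = r + GAMMA * max(q[s_next])
            q[s][a] += LR * (target - q[s][a])
            s = s_next
    return q, memory.y
```

Calling `run()` returns the learned Q-table and the remaining habituation level of each state. Because both fully novel and fully familiar states score low on the hedonic curve, the intrinsic reward for a state peaks after it has been seen only a few times, which is the qualitative behaviour a Wundt-style curve is meant to capture.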


Received: 2016-05-03
Accepted: 2017-10-04
Published Online: 2017-12-07
Published in Print: 2017-11-27

© 2017 Nazmul Siddique et al.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

Available at https://www.degruyter.com/document/doi/10.1515/pjbr-2017-0004/html