OPINION article

Front. Robot. AI, 10 February 2023
Sec. Humanoid Robotics

Intrinsic motivation learning for real robot applications

  • Institut für Fördertechnik und Logistiksysteme, Karlsruher Institut für Technologie, Karlsruhe, Germany

1 Introduction

Humanoid robots are built to resemble the human body and to mimic human motion and interaction (Hirai et al., 1998; Tikhanoff et al., 2010; Kajita et al., 2014). Recent research in this field aims to integrate these robots into our daily lives, e.g., as collaborative robots (Asfour et al., 2019; Ogenyi et al., 2021), social robots (Sandini et al., 2018), and service robots (Van Pinxteren et al., 2019). However, such integration is challenging because pre-programmed tasks and traditional control methods restrict the robots’ adaptability and flexibility. This shifts the research focus toward new machine learning methods for lifelong learning, which enable autonomous online adaptation and continuous data-driven learning (Nguyen and Oudeyer, 2014; Parisi et al., 2019). Since humanoids are designed to closely resemble humans, it is essential to equip them with cognitive capabilities, learning skills, and human-like abilities such as curiosity and self-learning.

Recent developments in robotics and cognitive science may lead to a new generation of more versatile and adaptive robots, named Developmental Robots (Asada et al., 2001; Lungarella et al., 2003). Developmental robotics is a highly interdisciplinary research field linking natural and artificial systems. On the one hand, it aims to develop learning approaches for humanoids inspired by developmental aspects and learning mechanisms observed in children (Kim et al., 2008; Asada et al., 2009; Cangelosi et al., 2015). On the other hand, humanoids also serve as experimental platforms for a better understanding of biological development (Asada et al., 2001; Asada et al., 2009; Cangelosi et al., 2015; Asano et al., 2017).

Developmental robots must autonomously develop, adapt, and acquire their skills throughout their lifetime, i.e., lifelong learning (Lungarella et al., 2003; Mai, 2013; Forestier, 2019). In contrast to industrial robots, which accomplish predefined tasks, developmental robots must solve unpredictable tasks, learn new skills, adapt to new environments, and cope with unforeseen challenges. Intrinsic motivation methods tackle these challenges by driving the robot’s learning and exploration autonomously through internally generated signals in an open-ended (i.e., unbounded) environment (Schmidhuber, 2010; Baranes and Oudeyer, 2013; Santucci et al., 2016; Baldassarre, 2019; Rayyes et al., 2020a; Rayyes, 2020; Rayyes et al., 2021). However, the high sample-complexity of these methods, i.e., the dense sampling required to approximate the learned function with reasonable accuracy, restricts their real-world application. Therefore, the majority of previous work has been demonstrated only in simulation as a proof of concept, and only a few methods have been demonstrated in real robot experiments, e.g., (Tanneberg et al., 2018; Huang et al., 2019; Rayyes et al., 2020a; Rayyes et al., 2021).

In my opinion, the sample-efficiency, and hence the applicability, of intrinsic motivation methods can be increased by combining them with mental replay (Andrychowicz et al., 2017; Rayyes et al., 2020b) and goal-directed methods, e.g., Goal Babbling (Rolf et al., 2011), as the literature so far indicates.

2 Intrinsic motivation

Intrinsic motivation in robotics has been inspired by developmental psychology, where curiosity-driven behavior has been observed in children. Children get easily bored by known items and, driven by their curiosity, seek new ones to improve their knowledge and gain new experience (Schmidhuber, 2010). Intrinsic motivation methods in the literature can be sorted into two categories (Oudeyer and Kaplan, 2007; Santucci et al., 2013; Forestier, 2019; Rayyes, 2020): 1) knowledge-based, where the intrinsic motivation signal is derived from the error between the robot’s prediction and the real outcome; 2) competence-based, where the signal is derived from the robot’s learning progress. However, an experiment in (Baranes et al., 2014) showed that humans learn by maximizing both their knowledge of a task and their competence. Accordingly, a recent intrinsic motivation method named “Interest Measurement” (Rayyes et al., 2020b) combines both knowledge-based and competence-based signals.

2.1 Knowledge-based intrinsic motivation

The knowledge-based intrinsic motivation methods in the literature are either novelty-based or prediction-based (Barto et al., 2013; Baldassarre, 2019). Novelty-based learning refers to learning from novel information: the intrinsic motivation signal is generated by comparing newly acquired knowledge with previously gained knowledge (Baldassarre, 2019; Forestier, 2019), e.g., comparing observed scenes (Huang and Weng, 2004) to guide the robot’s exploration toward discovering new ones. Other examples are the intrinsic motivation signal in (Benureau and Oudeyer, 2016), which maximizes the diversity of the robot’s behaviors, and the signal in (Frank et al., 2014), which maximizes information gain by comparing the state-action distribution before and after a learning update. In (Oudeyer and Kaplan, 2007), novelty is detected based on a specific error threshold.
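
To make the novelty-based idea concrete, the following is a minimal sketch in Python. The distance metric, the memory structure, and the threshold value are illustrative assumptions of mine and do not reproduce the formulation of any of the cited methods.

```python
import numpy as np

class NoveltySignal:
    """Sketch of a novelty-based intrinsic motivation signal: newly
    acquired observations are compared with previously gained ones, and
    an observation counts as novel if its distance to everything seen
    so far exceeds a threshold (cf. Oudeyer and Kaplan, 2007)."""

    def __init__(self, threshold=0.5):
        self.memory = []            # previously gained knowledge
        self.threshold = threshold  # illustrative novelty threshold

    def __call__(self, observation):
        obs = np.asarray(observation, dtype=float)
        if not self.memory:
            self.memory.append(obs)
            return 1.0              # the very first observation is novel
        # distance to the closest known observation measures novelty
        novelty = min(np.linalg.norm(obs - m) for m in self.memory)
        if novelty > self.threshold:
            self.memory.append(obs)  # remember genuinely new experiences
        return float(novelty)
```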

Prediction-based learning refers to learning from the robot’s prediction errors (Forestier, 2019), where high prediction errors indicate a good opportunity to learn (Chentanez et al., 2005; Zhang et al., 2014). For example, the learning signal in (Rayyes et al., 2020a) measures the error between the robot’s performance and its prediction for reaching objects: the higher the error, the more interesting the object becomes. Other authors refer to prediction-based intrinsic motivation as surprise (Oudeyer and Kaplan, 2007; Barto et al., 2013). Further examples are the penalty signal in (Huang et al., 2019), a dynamics-based surprise signal that avoids applying high forces during learning, and Bayesian surprise (Storck et al., 1994), which is used as a curiosity reward. In contrast, the free energy principle (Schwartenbeck et al., 2013; Kaplan and Friston, 2018; Ahmadi and Tani, 2019) assumes that humans try to minimize the long-term average of surprise; for intrinsically motivated agents in the context of decision-making, minimizing surprise amounts to maximizing model evidence.
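
A prediction-based signal can be sketched even more compactly. The `forward_model.predict(state, action)` interface below is a hypothetical stand-in, not the API of any cited work.

```python
import numpy as np

def prediction_based_signal(forward_model, state, action, next_state):
    """Sketch of a prediction-based intrinsic motivation signal: the
    error between the robot's prediction and the real outcome. A high
    value marks a good opportunity to learn from."""
    predicted = forward_model.predict(state, action)  # hypothetical interface
    return float(np.linalg.norm(np.asarray(next_state) - np.asarray(predicted)))
```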

The difference between prediction-based and novelty-based signals has been investigated experimentally (Caligiore et al., 2015). The results showed that novelty-based signals were more effective at driving human learning. Still, there is no clear border between the two categories, since high prediction errors indicate novel situations to learn from, as shown recently in the novelty detection method of (Rayyes et al., 2021). Similarly, (Barto et al., 2004; Oudeyer and Kaplan, 2007) considered a high prediction error to be a novelty-based signal.

2.2 Competence-based intrinsic motivation

Competence-based methods measure the robot’s performance over time instead of taking instantaneous measures of the prediction error (Schmidhuber, 1991; Baranes and Oudeyer, 2013; Rayyes et al., 2020a). For example, (Baranes and Oudeyer, 2013) monitored the performance error over a sliding window during the robot’s exploration. The most interesting regions of the workspace are those where the prediction error changes strongly, regardless of whether it increases or decreases. In other words, the intrinsic motivation signal guides the robot’s exploration toward regions where its performance changes drastically, whether it is improving (learning) or deteriorating (forgetting). The learning progress method in (Santucci et al., 2016) considers only decreases of the error over a sliding window, i.e., only when the robot learns. This drives the robot’s exploration toward easily learnable tasks and avoids learning near the border of the workspace, as shown in (Rayyes, 2020). Its main advantage, however, is that it can avoid unreachable or unlearnable objects and tasks (Santucci et al., 2016). In contrast, the forgetting factor method in (Rayyes et al., 2020a) monitors whether the robot’s performance deteriorates. This allows the robot to refocus on previously learned but forgotten experiences, which might occur in lifelong learning due to continuous model updates (Rayyes, 2020).
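
The sliding-window idea behind these competence-based signals can be sketched as follows. The window size and the slope estimate are my own illustrative choices, not the exact formulations of the cited methods.

```python
from collections import deque
import numpy as np

class SlidingWindowCompetence:
    """Sketch of competence-based signals computed over a sliding window
    of performance errors: a negative error slope indicates learning, a
    positive slope indicates forgetting."""

    def __init__(self, window=20):
        self.errors = deque(maxlen=window)

    def update(self, error):
        self.errors.append(float(error))

    def _slope(self):
        if len(self.errors) < 2:
            return 0.0
        # linear fit over the window; coefficient [0] is the slope
        return np.polyfit(np.arange(len(self.errors)), list(self.errors), 1)[0]

    def interest_in_change(self):
        # cf. Baranes and Oudeyer (2013): any drastic change is interesting
        return abs(self._slope())

    def learning_progress(self):
        # cf. Santucci et al. (2016): only decreasing error (learning) counts
        return max(0.0, -self._slope())

    def forgetting_factor(self):
        # cf. Rayyes et al. (2020a): only increasing error (forgetting) counts
        return max(0.0, self._slope())
```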

Most recent intrinsic motivation methods are competence-based (Schmidhuber, 2010; Baranes and Oudeyer, 2013; Santucci et al., 2013; Nguyen and Oudeyer, 2014; Forestier and Oudeyer, 2016; Santucci et al., 2016). (Santucci et al., 2013) showed that competence-based methods often lead to better performance than knowledge-based ones, comparing both for learning several reaching tasks with a simulated robot manipulator. However, how to transfer these results to more complex real-world robot applications remains an open question.

2.3 Combining knowledge-based with competence-based intrinsic motivation

While most intrinsic motivation methods in the literature are either knowledge-based or competence-based, an experimental study (Baranes et al., 2014) showed that humans tend to learn by improving both their knowledge about the tasks at hand and their competence. Interest Measurement (Rayyes et al., 2020b) is a recent intrinsic motivation method which combines knowledge-based with competence-based signals. The knowledge-based signal, named relative error, drives the robot’s exploration toward difficult-to-attain goals and tasks, e.g., goals near the border of the robot’s workspace. The competence-based signal is the forgetting factor, which monitors where the robot’s performance deteriorates during lifelong learning. This combination of different learning signals led to high sample-efficiency, which facilitates online data-driven learning directly on real robots without any pre-training in simulation, as shown in (Rayyes et al., 2020a; Rayyes et al., 2020b; Rayyes, 2020).
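
Schematically, such a combination can be pictured as a weighted interest score used to sample the next goal. The weighted-sum form, the weights, and the sampling rule below are assumptions for illustration only; the actual Interest Measurement in (Rayyes et al., 2020b) is defined differently.

```python
import numpy as np

def select_interesting_goal(goals, relative_error, forgetting,
                            w_know=0.5, w_comp=0.5, rng=np.random):
    """Sketch: combine a knowledge-based signal (relative error, pulling
    exploration toward hard-to-attain goals) with a competence-based
    signal (forgetting factor, flagging deteriorating skills), then
    sample goals in proportion to the combined interest."""
    interest = w_know * np.asarray(relative_error) + w_comp * np.asarray(forgetting)
    interest = np.clip(interest, 1e-8, None)  # keep the probabilities valid
    probs = interest / interest.sum()
    return goals[rng.choice(len(goals), p=probs)]
```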

3 Intrinsic motivation in real applications

The main challenge for intrinsic motivation methods is their applicability, owing to their high sample-complexity. Therefore, only a few methods have been demonstrated on real robots, e.g., (Oudeyer et al., 2007; Hart and Grupen, 2011; Duminy et al., 2016; Forestier et al., 2017; Tanneberg et al., 2018; Huang et al., 2019). However, not all of these methods have demonstrated efficient learning or goal-directed exploration. For instance, in (Forestier et al., 2017; Seepanomwan et al., 2017), the robot performed random movements during exploration, which is inefficient and incompatible with human motion (von Hofsten, 2004). In contrast, (Tanneberg et al., 2018; Huang et al., 2019; Rayyes et al., 2020a; Rayyes et al., 2021) have demonstrated high sample-efficiency and goal-directed motion. The only methods with notably high sample-efficiency are those which integrate intrinsic motivation with mental replay (Andrychowicz et al., 2017; Rayyes et al., 2020b).

3.1 Mental replay

Mental replay is an essential component of human learning (Foster and Wilson, 2006). Mental replay methods have been proposed in robotics to reduce sample complexity and to speed up the learning process (Lin, 1993; Mnih et al., 2013; Andrychowicz et al., 2017; Riedmiller et al., 2018; Tanneberg et al., 2018; Gerken and Spranger, 2019; Rayyes et al., 2020b). They are therefore essential for deploying data-driven learning methods on real robots, since sampling in real robot applications is very costly in terms of time and hardware. Additionally, mental replay has been used to overcome forgetting in lifelong learning (Parisi et al., 2019).
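
As an illustration, the core of Hindsight Experience Replay (Andrychowicz et al., 2017) can be sketched as relabeling a rollout with the outcome it actually achieved, so that even a failed attempt yields useful training data. The transition format (dicts with 'state', 'action', and 'achieved' keys) is an assumption of mine for illustration.

```python
import numpy as np

def hindsight_relabel(episode):
    """Mental-replay sketch in the spirit of Hindsight Experience Replay:
    replay the episode as if its final achieved outcome had been the
    intended goal all along."""
    reached = episode[-1]["achieved"]  # what the robot actually attained
    replayed = []
    for t in episode:
        replayed.append({
            "state": t["state"],
            "action": t["action"],
            "goal": reached,           # relabeled goal
            # sparse reward: success once the relabeled goal is attained
            "reward": 0.0 if np.allclose(t["achieved"], reached) else -1.0,
        })
    return replayed
```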

4 Discussion

Intrinsic motivation is very promising for integrating humanoids into our daily lives. It is compatible with online and lifelong learning, and it provides adaptability and flexibility for the robots. However, the main challenge for applying intrinsic motivation methods on real robots is their high sample complexity, which is costly in terms of time and hardware wear and tear. The question is therefore how to increase the potential of these methods to be applied in real-world scenarios. The only way to pave the way for real robot applications is to increase sample-efficiency. On the one hand, mental replay methods play a significant role in drastically decreasing the amount of data required to learn a model with reasonable accuracy. On the other hand, learning and exploration should be organized as goal-directed motion, e.g., Goal Babbling (Rolf and Steil, 2014), active learning (Baranes and Oudeyer, 2013), or interest-driven Goal Babbling (Rayyes et al., 2020b). Random exploration to collect data is unrealistic for robots with many degrees of freedom: the respective high-dimensional spaces, e.g., of motor commands, cannot be exhausted through random or systematic exploration owing to combinatorial explosion. Additionally, studies on infants have shown that neonates do not behave randomly, but rather demonstrate goal-directed motion a few days after birth (von Hofsten, 2004). Hence, combining purely goal-directed methods with mental replay and intrinsic motivation can increase sample-efficiency remarkably and accordingly be deployed on real robots.
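
Put together, the combination argued for here amounts to a loop of roughly the following shape. All interfaces (robot.try_reach, interest.select, interest.update) are hypothetical stand-ins, and hindsight_relabel refers to the sketch in Section 3.1; this is a schematic of the opinion, not the algorithm of any single cited method.

```python
def intrinsically_motivated_exploration(robot, goals, interest,
                                        replay_buffer, episodes=1000):
    """Schematic combination of goal-directed exploration, intrinsic
    motivation, and mental replay."""
    for _ in range(episodes):
        goal = interest.select(goals)    # intrinsically motivated goal choice
        episode = robot.try_reach(goal)  # goal-directed motion, not random
        replay_buffer.extend(hindsight_relabel(episode))  # mental replay
        interest.update(goal, episode)   # refresh the interest signals
```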

Author contributions

The author confirms being the sole contributor of this work and has approved it for publication.

Funding

The author’s position is funded by InnovationsCampus Mobilität der Zukunft (ICM).

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Ahmadi, A., and Tani, J. (2019). A novel predictive-coding-inspired variational RNN model for online prediction and recognition. Neural Comput. 31, 2025–2074. doi:10.1162/neco_a_01228

Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., et al. (2017). “Hindsight experience replay,” in Advances in Neural Information Processing Systems. Editors I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (Curran Associates, Inc.), 30. Available at: https://proceedings.neurips.cc/paper/2017/file/453fadbd8a1a3af50a9df4df899537b5-Paper.pdf.

Asada, M., Hosoda, K., Kuniyoshi, Y., Ishiguro, H., Inui, T., Yoshikawa, Y., et al. (2009). Cognitive developmental robotics: A survey. IEEE Trans. Aut. Ment. Dev. 1, 12–34. doi:10.1109/TAMD.2009.2021702

Asada, M., MacDorman, K. F., Ishiguro, H., and Kuniyoshi, Y. (2001). Cognitive developmental robotics as a new paradigm for the design of humanoid robots. Robotics Aut. Syst. 37, 185–193. doi:10.1016/S0921-8890(01)00157-9

Asano, Y., Okada, K., and Inaba, M. (2017). Design principles of a human mimetic humanoid: Humanoid platform to study human intelligence and internal body system. Sci. Robotics 2, eaaq0899. doi:10.1126/scirobotics.aaq0899

Asfour, T., Wächter, M., Kaul, L., Rader, S., Weiner, P., Ottenhaus, S., et al. (2019). ARMAR-6: A high-performance humanoid for human-robot collaboration in real world scenarios. IEEE Robotics Automation Mag. 26, 108–121. doi:10.1109/MRA.2019.2941246

Baldassarre, G. (2019). Intrinsic motivations and open-ended learning. arXiv.

Baranes, A. F., Oudeyer, P.-Y., and Gottlieb, J. (2014). The effects of task difficulty, novelty and the size of the search space on intrinsically motivated exploration. Front. Neurosci. 8, 317. doi:10.3389/fnins.2014.00317

Baranes, A., and Oudeyer, P. (2013). Active learning of inverse models with intrinsically motivated goal exploration in robots. Robot. Auton. Syst. 61, 49–73. doi:10.1016/j.robot.2012.05.008

Barto, A., Mirolli, M., and Baldassarre, G. (2013). Novelty or surprise? Front. Psychol. 4, 907. doi:10.3389/fpsyg.2013.00907

Barto, A. G., Singh, S., and Chentanez, N. (2004). “Intrinsically motivated learning of hierarchical collections of skills,” in Proceedings of the 3rd International Conference on Development and Learning, 112.

Benureau, F., and Oudeyer, P.-Y. (2016). Behavioral diversity generation in autonomous exploration through reuse of past experience. Front. Robotics AI 3, 8. doi:10.3389/frobt.2016.00008

Caligiore, D., Magda Mustile, D. C., Redgrave, P., Triesch, J., Marsico, M. D., Baldassarre, G., et al. (2015). Intrinsic motivations drive learning of eye movements: An experiment with human adults. PLOS ONE 10, e0118705. doi:10.1371/journal.pone.0118705

Cangelosi, A., Schlesinger, M., and Smith, L. B. (2015). Developmental robotics: From babies to robots. Massachusetts: MIT Press.

Chentanez, N., Barto, A. G., and Singh, S. P. (2005). “Intrinsically motivated reinforcement learning,” in Advances in neural information processing systems (Massachusetts: MIT Press), 1281–1288.

Duminy, N., Nguyen, S. M., and Duhaut, D. (2016). “Strategic and interactive learning of a hierarchical set of tasks by the poppy humanoid robot,” in 2016 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Cergy-Pontoise, France, 19-22 September 2016 (IEEE), 204–209.

Forestier, S. (2019). Intrinsically motivated goal exploration in child development and artificial intelligence: Learning and development of speech and tool use. France: Université Bordeaux.

Forestier, S., and Oudeyer, P. (2016). “Modular active curiosity-driven discovery of tool use,” in 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea (South), 09-14 October 2016 (IEEE), 3965–3972.

Forestier, S., Portelas, R., Mollard, Y., and Oudeyer, P-Y. (2017). Intrinsically motivated goal exploration processes with automatic curriculum learning. arXiv e-prints, arXiv:1708.02190.

Foster, D., and Wilson, M. (2006). Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature 440, 680–683. doi:10.1038/nature04587

Frank, M., Leitner, J., Stollenga, M., Förster, A., and Schmidhuber, J. (2014). Curiosity driven reinforcement learning for motion planning on humanoids. Front. Neurorobotics 7, 25. doi:10.3389/fnbot.2013.00025

Gerken, A., and Spranger, M. (2019). “Continuous value iteration (CVI) reinforcement learning and imaginary experience replay (IER) for learning multi-goal, continuous action and state space controllers,” in 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20-24 May 2019 (IEEE).

Hart, S., and Grupen, R. (2011). Learning generalizable control programs. IEEE Trans. Aut. Ment. Dev. 3, 216–231. doi:10.1109/tamd.2010.2103311

Hirai, K., Hirose, M., Haikawa, Y., and Takenaka, T. (1998). “The development of honda humanoid robot,” in Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No.98CH36146), Leuven, Belgium, 20-20 May 1998 (IEEE), 1321–1326. doi:10.1109/ROBOT.1998.677288

Huang, S. H., Zambelli, M., Kay, J., Martins, M. F., Tassa, Y., Pilarski, P. M., et al. (2019). Learning gentle object manipulation with curiosity-driven deep reinforcement learning. CoRR abs/1903.08542.

Huang, X., and Weng, J. (2004). “Motivational system for human-robot interaction,” in Computer vision in human-computer interaction (Berlin, Heidelberg: Springer Berlin Heidelberg), 17–27.

Kajita, S., Hirukawa, H., Harada, K., and Yokoi, K. (2014). Introduction to humanoid robotics. Berlin: Springer.

Kaplan, R., and Friston, K. J. (2018). Planning and navigation as active inference. Biol. Cybern. 112, 323–343. doi:10.1007/s00422-018-0753-2

Kim, H., Jasso, H., Deak, G., and Triesch, J. (2008). “A robotic model of the development of gaze following,” in 2008 7th IEEE International Conference on Development and Learning, Monterey, CA, USA, 09-12 August 2008 (IEEE), 238–243.

Lin, L-J. (1993). Reinforcement learning for robots using neural networks (Technical report, DTIC Document). Pittsburgh: Carnegie Mellon University.

Lungarella, M., Metta, G., Pfeifer, R., and Sandini, G. (2003). Developmental robotics: A survey. Connect. Sci. 15, 151–190. doi:10.1080/09540090310001655110

Mai, N. S. (2013). A curious robot learner for interactive goal-babbling: Strategically choosing what, how, when and from whom to learn. Nouvelle-Aquitaine: Université de Bordeaux.

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., et al. (2013). Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

Nguyen, S. M., and Oudeyer, P. (2014). Socially guided intrinsic motivation for robot learning of motor skills. Aut. Robots 36 (3), 273–294. doi:10.1007/s10514-013-9339-y

Ogenyi, U. E., Liu, J., Yang, C., Ju, Z., and Liu, H. (2021). Physical human–robot collaboration: Robotic systems, learning methods, collaborative strategies, sensors, and actuators. IEEE Trans. Cybern. 51, 1888–1901. doi:10.1109/TCYB.2019.2947532

Oudeyer, P.-Y., and Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Front. Neurorobotics 1, 6. doi:10.3389/neuro.12.006.2007

Oudeyer, P., Kaplan, F., and Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Trans. Evol. Comput. 11, 265–286. doi:10.1109/tevc.2006.890271

Parisi, G., Kemker, R., Part, J., Kanan, C., and Wermter, S. (2019). Continual lifelong learning with neural networks: A review. Neural Netw. 113, 54–71. doi:10.1016/j.neunet.2019.01.012

Rayyes, R., Donat, H., and Steil, J. (2020a). Efficient online interest-driven exploration for developmental robots. IEEE Trans. Cognitive Dev. Syst. 14, 1367–1377. doi:10.1109/TCDS.2020.3001633

Rayyes, R., Donat, H., and Steil, J. (2020b). “Hierarchical interest-driven goal babbling for efficient bootstrapping of sensorimotor skills,” in 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May 2020 - 31 August 2020 (IEEE), 1336–1342.

Rayyes, R., Donat, H., Steil, J., and Spranger, M. (2021). Interest-driven exploration with observational learning for developmental robots. IEEE Trans. Cognitive Dev. Syst. 1. doi:10.1109/TCDS.2021.3057758

Rayyes, R. (2020). Efficient and stable online learning for developmental robots. Ph.D. thesis. Braunschweig: Technische Universität Braunschweig.

Riedmiller, M., Hafner, R., Lampe, T., Neunert, M., Degrave, J., van de Wiele, T., et al. (2018). “Learning by playing solving sparse reward tasks from scratch,” in Proceedings of the 35th International Conference on Machine Learning. (PMLR) 80, 4344–4353. Available at: http://proceedings.mlr.press/v80/riedmiller18a/riedmiller18a.pdf.

Rolf, M., and Steil, J. (2014). Efficient exploratory learning of inverse kinematics on a bionic elephant trunk. IEEE Trans. Neural Netw. Learn. Syst. 25, 1147–1160. doi:10.1109/TNNLS.2013.2287890

Rolf, M., Steil, J. J., and Gienger, M. (2011). “Online goal babbling for rapid bootstrapping of inverse models in high dimensions,” in IEEE Int. Conf. Development and Learning and on Epigenetic Robotics, Frankfurt am Main, Germany, 24-27 August 2011 (IEEE), 1–8.

Sandini, G., Mohan, V., Sciutti, A., and Morasso, P. (2018). Social cognition for human-robot symbiosis—Challenges and building blocks. Front. neurorobotics 12, 34. doi:10.3389/fnbot.2018.00034

Santucci, V., Baldassarre, G., and Mirolli, M. (2013). Which is the best intrinsic motivation signal for learning multiple skills? Front. Neurorobotics 7, 22. doi:10.3389/fnbot.2013.00022

Santucci, V. G., Baldassarre, G., and Mirolli, M. (2016). Grail: A goal-discovering robotic architecture for intrinsically-motivated learning. IEEE Trans. Cognitive Dev. Syst. 8, 214–231. doi:10.1109/TCDS.2016.2538961

Schmidhuber, J. (1991). “Curious model-building control systems,” in IEEE International Joint Conference on Neural Networks, Singapore, 18-21 November 1991 (IEEE), 1458–1463.

Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990 - 2010). IEEE Trans. Aut. Ment. Dev. 2, 230–247. doi:10.1109/TAMD.2010.2056368

Schwartenbeck, P., Fitzgerald, T., Dolan, R. J., and Friston, K. (2013). Exploration, novelty, surprise, and free energy minimization. Front. Psychol. 4, 710–715. doi:10.3389/fpsyg.2013.00710

Seepanomwan, K., Santucci, V. G., and Baldassarre, G. (2017). “Intrinsically motivated discovered outcomes boost user’s goals achievement in a humanoid robot,” in 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Lisbon, Portugal, 18-21 September 2017 (IEEE), 178–183.

Storck, J., Hochreiter, S., and Schmidhuber, J. (1994). Reinforcement driven information acquisition in non-deterministic environments. California: ICANN, 159–164.

Tanneberg, D., Peters, J., and Rueckert, E. (2018). Intrinsic motivation and mental replay enable efficient online adaptation in stochastic recurrent networks. CoRR abs/1802.08013.

Tikhanoff, V., Cangelosi, A., and Metta, G. (2010). Integration of speech and action in humanoid robots: iCub simulation experiments. IEEE Trans. Aut. Ment. Dev. 3, 17–29. doi:10.1109/TAMD.2010.2100390

Van Pinxteren, M. M., Wetzels, R. W., Rüger, J., Pluymaekers, M., and Wetzels, M. (2019). Trust in humanoid robots: Implications for services marketing. J. Serv. Mark. 33, 507–518. doi:10.1108/JSM-01-2018-0045

von Hofsten, C. (2004). An action perspective on motor development. Trends CogSci 8, 266–272. doi:10.1016/j.tics.2004.04.002

Zhang, C., Zhao, Y., Triesch, J., and Shi, B. E. (2014). “Intrinsically motivated learning of visual motion perception and smooth pursuit,” in 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May 2014 - 07 June 2014 (IEEE), 1902–1908.

Keywords: intrinsic motivation (IM), lifelong learning, mental replay, goal-directed learning, open-ended learning, active learning

Citation: Rayyes R (2023) Intrinsic motivation learning for real robot applications. Front. Robot. AI 10:1102438. doi: 10.3389/frobt.2023.1102438

Received: 18 November 2022; Accepted: 13 January 2023;
Published: 10 February 2023.

Edited by:

Giovanni Stellin, Danieli Telerobot Labs, Italy

Reviewed by:

Christian Balkenius, Lund University, Sweden

Copyright © 2023 Rayyes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Rania Rayyes, rania.rayyes@kit.edu
