AUTHOR=Madani, Omid
TITLE=An information theoretic score for learning hierarchical concepts
JOURNAL=Frontiers in Computational Neuroscience
VOLUME=17
YEAR=2023
URL=https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2023.1082502
DOI=10.3389/fncom.2023.1082502
ISSN=1662-5188
ABSTRACT=How do humans learn the regularities of their complex, noisy world in a robust manner? There is ample evidence that much of this learning and development occurs in an unsupervised fashion, via interactions with the environment. Both the structure of the world and the brain appear hierarchical in a number of ways, and structured hierarchical representations offer potential benefits for efficient learning and organization of knowledge, such as concepts (patterns) sharing parts (subpatterns). A major question arises: what drives the processes behind acquiring such hierarchical spatiotemporal concepts? We posit that the goal of advancing one's predictions is a major driver for learning such hierarchies, and we introduce an information-theoretic objective that shows promise in guiding these processes. We have been exploring the challenges of building an integrated learning and development system by implementing one that works on raw text: the system begins at the low level of characters, the hardwired or primitive concepts, and acquires higher-level concepts over time (strings of characters in our current realization). The system is self-supervised: it learns to predict the concepts, as targets of prediction, and to predict using them, as predictors. The learning is scalable and open-ended. For instance, tens of thousands of concepts are learned after hundreds of thousands of episodes. We give an overview of the current implementation, with a focus on the objective, named CORE. CORE is based on comparing the prediction performance of the system to a simple baseline system that is limited to predicting with the primitives. We explain how CORE can handle approximate matching. More generally, CORE incorporates a tradeoff between how strongly a concept is predicted (or how well it fits its context, i.e., nearby concepts) and how well it matches the (ground) "reality", i.e., the lowest-level observations (the characters in the input episode). CORE is applicable to generative models such as probabilistic finite state machines (not just strings). We highlight a few properties of CORE with examples, and touch on a variety of challenges and promising future directions in advancing the approach, in particular the challenge of learning concepts with more sophisticated structure.
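
The abstract describes CORE only at a high level: an information-theoretic score obtained by comparing the system's prediction performance against a baseline restricted to the primitives (characters). The Python sketch below is a rough, generic illustration of that kind of comparison, not the paper's actual CORE definition; the probability tables, the segmentation, and the coding-gain formulation are all assumptions made for the example.

```python
import math

# Illustrative only (assumed, not the paper's CORE formula): score an episode's
# concept-level encoding by its coding gain over a primitives-only baseline.
# A positive gain means the learned concepts predict the episode better than
# predicting character by character.

def bits(p):
    """Code length, in bits, of an event with probability p."""
    return -math.log2(p)

def baseline_bits(episode, char_probs):
    """Baseline system: encode the episode one primitive (character) at a time."""
    return sum(bits(char_probs.get(c, 1e-6)) for c in episode)

def model_bits(segments, concept_probs):
    """Learned system: encode the episode as a sequence of learned concepts (strings)."""
    return sum(bits(concept_probs.get(s, 1e-6)) for s in segments)

def coding_gain(episode, segments, char_probs, concept_probs):
    """Bits saved by the concept model relative to the primitive baseline."""
    return baseline_bits(episode, char_probs) - model_bits(segments, concept_probs)

# Toy usage: a uniform character baseline vs. a model that has learned "the " and "cat".
char_probs = {c: 1.0 / 27 for c in "abcdefghijklmnopqrstuvwxyz "}
concept_probs = {"the ": 0.05, "cat": 0.01}
print(coding_gain("the cat", ["the ", "cat"], char_probs, concept_probs))
```

In this toy setup the concept-level encoding saves roughly 22 bits over the character-level baseline, illustrating the direction of the comparison the abstract describes; the paper's actual objective additionally trades off how strongly a concept is predicted by its context against how well it matches the low-level observations, which this sketch does not model.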