AUTHOR=Madani, Omid
TITLE=An information theoretic score for learning hierarchical concepts
JOURNAL=Frontiers in Computational Neuroscience
VOLUME=17
YEAR=2023
URL=https://www.frontiersin.org/journals/computational-neuroscience/articles/10.3389/fncom.2023.1082502
DOI=10.3389/fncom.2023.1082502
ISSN=1662-5188
ABSTRACT=How do humans learn the regularities of their complex, noisy world in a robust manner? There is ample evidence that much of this learning and development occurs in an unsupervised fashion, via interactions with the environment. Both the structure of the world and the brain appear hierarchical in a number of ways, and structured hierarchical representations offer potential benefits for efficient learning and organization of knowledge, such as concepts (patterns) sharing parts (subpatterns). A major question arises: what drives the processes behind acquiring such hierarchical spatiotemporal concepts? We posit that the goal of advancing one's predictions is a major driver for learning such hierarchies, and we introduce an information-theoretic objective that shows promise in guiding these processes. We have been exploring the challenges of building an integrated learning and development system by implementing one that works on raw text: the system begins at the low level of characters, the hardwired or primitive concepts, and acquires higher-level concepts over time (strings of characters in our current realization). The system is self-supervised: it learns to predict the concepts, as targets of prediction, and to predict using them, as predictors. The learning is scalable and open-ended. For instance, tens of thousands of concepts are learned after hundreds of thousands of episodes. We give an overview of the current implementation, with a focus on the objective, named CORE. CORE is based on comparing the prediction performance of the system to a simple baseline system that is limited to predicting with the primitives. We explain how CORE can handle approximate matching. More generally, CORE incorporates a tradeoff between how strongly a concept is predicted (or how well it fits its context, i.e., nearby concepts) and how well it matches the (ground) "reality", i.e., the lowest-level observations (the characters in the input episode). CORE is applicable to generative models such as probabilistic finite state machines (not just strings). We highlight a few properties of CORE with examples, and touch on a variety of challenges and promising future directions in advancing the approach, in particular the challenge of learning concepts with more sophisticated structure.
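
The abstract describes CORE only at a high level: an information-theoretic score obtained by comparing the system's prediction performance against a baseline restricted to the primitives (characters). The Python sketch below is a rough, generic illustration of that kind of comparison, not the paper's actual CORE definition; the probability tables, the segmentation, and the coding-gain formulation are all assumptions made for the example.

```python
import math

# Illustrative only (assumed, not the paper's CORE formula): score an episode's
# concept-level encoding by its coding gain over a primitives-only baseline.
# A positive gain means the learned concepts predict the episode better than
# predicting character by character.

def bits(p):
    """Code length, in bits, of an event with probability p."""
    return -math.log2(p)

def baseline_bits(episode, char_probs):
    """Baseline system: encode the episode one primitive (character) at a time."""
    return sum(bits(char_probs.get(c, 1e-6)) for c in episode)

def model_bits(segments, concept_probs):
    """Learned system: encode the episode as a sequence of learned concepts (strings)."""
    return sum(bits(concept_probs.get(s, 1e-6)) for s in segments)

def coding_gain(episode, segments, char_probs, concept_probs):
    """Bits saved by the concept model relative to the primitive baseline."""
    return baseline_bits(episode, char_probs) - model_bits(segments, concept_probs)

# Toy usage: a uniform character baseline vs. a model that has learned "the " and "cat".
char_probs = {c: 1.0 / 27 for c in "abcdefghijklmnopqrstuvwxyz "}
concept_probs = {"the ": 0.05, "cat": 0.01}
print(coding_gain("the cat", ["the ", "cat"], char_probs, concept_probs))
```

In this toy setup the concept-level encoding saves roughly 22 bits over the character-level baseline, illustrating the direction of the comparison the abstract describes; the paper's actual objective additionally trades off how strongly a concept is predicted by its context against how well it matches the low-level observations, which this sketch does not model.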