Eight challenges in developing theory of intelligence

Huang, Haiping

doi:10.3389/fncom.2024.1388166

PERSPECTIVE article

Front. Comput. Neurosci., 24 July 2024

Volume 18 - 2024 | https://doi.org/10.3389/fncom.2024.1388166

This article is part of the Research TopicNeural Inspired Computational Intelligence and Its ApplicationsView all 6 articles

Eight challenges in developing theory of intelligence

Haiping Huang^*

PMI Lab, School of Physics, Sun Yat-sen University, Guangzhou, China

A good theory of mathematical beauty is more practical than any current observation, as new predictions about physical reality can be self-consistently verified. This belief applies to the current status of understanding deep neural networks including large language models and even the biological intelligence. Toy models provide a metaphor of physical reality, allowing mathematically formulating the reality (i.e., the so-called theory), which can be updated as more conjectures are justified or refuted. One does not need to present all details in a model, but rather, more abstract models are constructed, as complex systems such as the brains or deep networks have many sloppy dimensions but much less stiff dimensions that strongly impact macroscopic observables. This type of bottom-up mechanistic modeling is still promising in the modern era of understanding the natural or artificial intelligence. Here, we shed light on eight challenges in developing theory of intelligence following this theoretical paradigm. Theses challenges are representation learning, generalization, adversarial robustness, continual learning, causal learning, internal model of the brain, next-token prediction, and the mechanics of subjective experience.

1 Introduction

Brain is one of the most challenging subjects to understand. The brain is complex with many levels of temporal and spatial complexities (Gerstner et al., 2014), allowing for coarse-grained descriptions at different levels, especially in theoretical studies. More abstract models lose the ability to generate predictions on low-level details but bring the conceptual benefits of explaining precisely how the system works, and the mathematical description may be universal, independent of details (or sloppy variables) (Levenstein et al., 2023). One seminal example is the Hopfield model (Hopfield, 1982), where the mechanism underlying the associative memory observed in the brain was precisely isolated (Amit et al., 1987; Griniasty et al., 1993). There is a resurgence of research interests in Hopfield networks in recent years due to the large language models (Krotov and Hopfield, 2020; Ramsauer et al., 2020).

In Marr's viewpoint (Marr, 1982), understanding a neural system can be divided into three levels: computation (which task the brain solves), algorithms (how the brain solves the task, i.e., information processing level), and implementation (neural circuit level). Following the first two levels, researchers designed artificial neural networks to solve challenging real-world problems, such as powerful deep learning (LeCun et al., 2015; Schmidhuber, 2015). However, biological details are also being incorporated into models of neural networks (Abbott et al., 2016; Marblestone et al., 2016; Richards et al., 2019; Lillicrap et al., 2020) and even used to design new learning rules (Schmidgall et al., 2023). Indeed, neuroscience studies of biological mechanisms of perception, cognition, memory, and action have already provided a variety of fruitful insights inspiring the empirical or scientific studies of artificial neural networks, which, in turn, inspire the neuroscience researchers to design mechanistic models to understand the brain (Yamins and DiCarlo, 2016; Hassabis et al., 2017; Saxe et al., 2020). Therefore, it is promising to integrate physics, statistics, computer science, psychology, neuroscience, and engineering to reveal inner working of deep (biological) networks and intelligence with testable predictions (Ma et al., 2022), rather than using a black box (e.g., deep artificial neural networks) to understand the other black boxes (e.g, the brain or mind). In fact, the artificial intelligence may follow different principles from the natural intelligence, but both can inspire each other, which may lead to the establishment of a coherent mathematical physics foundation for either artificial intelligence or biological intelligence.

The goal of providing a unified framework for neural computation is very challenging and even impossible. Due to re-boosted interests in neural networks, there appears a lot of important yet unsolved scientific questions. We shall detail these challenging questions below¹ and provide our personal viewpoints toward a statistical mechanics theory, solving these fundamental questions, based on the first principles in physics. These open scientific questions toward theory of intelligence are presented in Figure 1.

Figure 1

Figure 1. Schematic illustration of eight open challenging problems toward the theory of intelligence.

2 Challenge I—Representation learning

Given raw data (or input–output pairs in supervised learning), one can ask what a good representation is and how the meaningful representation is achieved in deep neural networks. We have not yet satisfied answers for these questions. A promising argument is that entangled manifolds at earlier layers of a deep hierarchy are gradually disentangled into linearly separable features at output layers (DiCarlo and Cox, 2007; Bengio et al., 2013; Brahma et al., 2016; Huang, 2018; Cohen et al., 2020). This manifold separation perspective is also promising in system neuroscience studies of associative learning by separating overlapping patterns of neural activities (Cayco-Gajic and Silver, 2019). However, an analytic theory of the manifold transformation is still lacking, prohibiting us from a full understanding of which key network parameters control the geometry of manifold and how learning reshapes the manifold. For example, the correlation among synapses (e.g., arising during learning) will attenuate the decorrelation process along the network depth but encourage dimension reduction compared with their orthogonal counterparts (Huang, 2018; Zhou and Huang, 2021). This result is derived by using mean-field approximation and coincides with empirical observations (Zhou and Huang, 2021). In addition, there may exist other biological plausible factors such as normalization, attention, and homeostatic control impacting the manifold transformation (Turrigiano and Nelson, 2004; Reynolds and Heeger, 2009), which can be incorporated into a toy model in future to test the manifold transformation hypothesis.

Another argument from information theoretic viewpoints demonstrates that the input information is maximally compressed into a hidden representation, whose task-related information should be maximally retrieved at the output layers, according to the information bottleneck theory (Achille and Soatto, 2017; Shwartz-Ziv and Tishby, 2017). In this sense, an optimal representation must be invariant to nuisance variability, and their components must be maximally independent, which may be related to causal factors (latent causes) explaining the sensory inputs (see the following fifth challenge). In a physics language, a coarse-grained (or more abstract) representation is formed in deeper layers compared with the fine-grained representation in shallower layers. How microscopic interactions among synapses determine this representation transformation remains elusive and thus deserves future studies; a few recent studies started to address the clustering structure in the deep hierarchy (Li and Huang, 2020, 2023; Alemanno et al., 2023; Xie et al., 2024). To conclude, the bottom-up mechanistic modeling would be fruitful in dissecting mechanisms of representation transformation.

3 Challenge II—Generalization

Studying any neural learning system must consider three ingredients: data, network, and algorithm (or DNA of neural learning). The generalization ability refers to the computational performance that the network is able to implement the rule in unseen examples. Therefore, intelligence can be considered to some extent as the ability of generalization, especially given very few examples for learning. Therefore, the generalization is also a hot topic in current studies of deep learning. Traditional statistical learning theory claims that over-fitting effects should be strong when the number of examples is much less than the number of parameters to learn, which could not explain the current success of deep learning. A promising perspective is to study the causal connection between the loss landscape and the generalization properties (Huang and Kabashima, 2014; Baldassi et al., 2016; Spigler et al., 2019; Zou and Huang, 2021). For a single layered perceptron, a statistical mechanics theory can be systematically derived and revealed a discontinuous transition from poor to perfect generalization (Gyorgyi, 1990; Sompolinsky et al., 1990). In contrast to the classical bias-variance trade-off (U-shaped curve of the test error vs. increasing model complexity) (Mehta et al., 2019), the modern deep learning achieves the state-of-the-art performance in the over-parameterized regime (Belkin et al., 2019; Spigler et al., 2019), a regime of the number of parameters much larger than the training data size. However, how to provide an analytic argument about the over-fitting effects vs. different parameterization regimes (e.g., under-, over-, and super-parameterization) for this empirical observation becomes a non-trivial task (Adlam and Pennington, 2020). A recent study of one-hidden-layer networks shows that the first transition occurs at the interpolation point, where perfect fitting becomes possible. This transition reflects the properties of hard-to-sample typical solutions. Increasing the model complexity, the second transition occurs with the discontinuous appearance of atypical solutions. They are wide minima of good generalization properties. This second transition sets an upper bound for the effectiveness of learning algorithms (Baldassi et al., 2022). This statistical mechanics analysis focuses on the average case (average of all realizations of data, network, and algorithm) rather than the worst case. The worst case determines the computational complexity category, while the average case explains us the universal properties of learning, and the statistical mechanics links the computational hardness to a few order parameters in physics (Huang, 2022), and these previous studies show strong evidence (Huang and Kabashima, 2014; Baldassi et al., 2016, 2022; Spigler et al., 2019; Hou and Huang, 2020).

For an infinitely wide neural network, there exists a lazy learning regime, where the overparameterized neural networks can be well approximated by a linear model corresponding to a first-order Taylor expansion around the initialization, and the complex learning dynamics is simply training a kernel machine (Belkin, 2021). However, in a practical training, the dynamics is prone to escape the lazy regime, which has no satisfied theory yet. Therefore, clarifying which of lazy-learning (or neural tangent kernel limit) and feature-learning (or mean-field limit) may explain the success of deep supervised learning remains open and challenging (Jacot et al., 2018; Bartlett et al., 2021; Fang et al., 2021). The mean-field limit can be studied in the field theoretic framework, characterizing how the solution of learning deviates from the initialization through a systematic perturbation of the action in the framework (Segadlo et al., 2022). Another related challenge is out-of-distribution generalization, which can also be studied using statistical mechanics, e.g., in a recent study, a kernel regression was analyzed (Canatar et al., 2021). In addition, the field theoretic method is also promising to write the learning problem of out-of-distribution prediction into propagating correlations and responses (Segadlo et al., 2022).

4 Challenge III—Adversarial vulnerability

Adversarial examples are defined by those inputs with human-imperceptible modifications, leading to unexpected errors in a deep learning decision making system. The test accuracy drops as the perturbation grows; the perturbation can either rely on the trained network or be an independent noise (Szegedy et al., 2014; Goodfellow et al., 2015; Jiang et al., 2021). The current deep learning is argued to learn predictive yet non-robust features in the data (Geirhos et al., 2020). This adversarial vulnerability of deep neural networks poses a significant challenge in the practical applications of both real-world problems and AI4S (artificial intelligence for science) studies. Adversarial training remains the most effective solution to the problem (Madry et al., 2018), in contrast to human learning. However, the training sacrifices the standard discrimination. A recent study applied the physics principle that the hidden representation is clustered-like replica symmetry breaking in the spin glass theory (Mézard et al., 1987), which leads to contrastive learning that is local and adversarial robust, resolving the trade-off between standard accuracy and adversarial robustness (Xie et al., 2024). Furthermore, the adversarial robustness can be theoretically explained in terms of a cluster separation distance. In physics, systems with a huge number of degrees of freedom are able to be captured by a low-dimensional macroscopic description, such as Ising ferromagnetic model. Explaining the layered computation in terms of geometry may finally help to crack the mysterious property of the networks' susceptibility to adversarial examples (Bortolussi and Sanguinetti, 2018; Gilmer et al., 2018; Li and Huang, 2023). Although some recent efforts were devoted to this direction (Bortolussi and Sanguinetti, 2018; Kenway, 2018), more exciting results are expected in near future studies.

5 Challenge IV—Continual learning

A biological brain is good at adapting the acquired knowledge from similar tasks to domains of new tasks, even though only a handful examples are available in the new task domain. This type of learning is called continual learning or multi-task learning (McCloskey and Cohen, 1989; Kirkpatrick et al., 2017), an ability to learn many tasks in sequence, while transfer learning refers to the process of exploiting the previously acquired knowledge from a source task, to improve the generalization performance in a target task (Parisi et al., 2019). However, the stable adaptation to changing environments, an essence of lifelong learning, remains a significant challenge for modern artificial intelligence (Parisi et al., 2019). More precisely, neural networks are, in general, poor at the multi-task learning, although impressive progresses have been achieved in recent years. For example, during learning, a diagonal Fisher information term is computed to measure importances of weights (then a rapid change is not allowed for those important weights) for previous tasks (Kirkpatrick et al., 2017). A later refinement was also proposed by allowing synapses, accumulating task-relevant information over time (Zenke et al., 2017). To reduce the catastrophic-forgetting effects, more machine learning techniques were summarized in the review (Parisi et al., 2019). However, we still do not know the exact mechanisms for principally mitigating the catastrophic-forgetting effects, which calls for theoretical studies of deep learning in terms of adaptation to domain-shift training, i.e., connection weights trained in a solution to one task are transformed to benefit learning on a related task.

Using asymptotic analysis, a recent article studying transfer learning identified a phase transition in the quality of the knowledge transfer (Dhifallah and Lu, 2021). This study reveals how the related knowledge contained in a source task can be effectively transferred to boost the performance in a target task. Other recent theoretical studies interpreted the continual learning with a statistical mechanics framework using Franz-Parisi potential (Li et al., 2023) or as an on-line mean-field dynamics of weight updates (Lee et al., 2021). The Franz-Parisi potential is a thermodynamic potential used to study glass transition (Franz and Parisi, 1995). The recent study assumes that the knowledge from the previous task behaves as a reference configuration (Li et al., 2023), where the previously acquired knowledge serves as an anchor for learning new knowledge. This framework also connects to elastic weight consolidation (Kirkpatrick et al., 2017), heuristic weight-uncertainty modulation (Ebrahimi et al., 2020), and neuroscience-inspired metaplasticity (Laborieux et al., 2021), providing a theory-grounded method for the real-world multi-task learning with deep networks.

6 Challenge V—Causal learning

Deep learning is criticized as being nothing but a fancy curve-fitting tool, making a naive association between inputs and outputs. In other words, this tool could not distinguish correlation from causation. What the deep network learns is not a concept but merely a statistical correlation, prohibiting the network from counterfactual inference (a hallmark ability of intelligence). A human-like AI must be good at retrieving causal relationship among feature components in sensory inputs, thereby carving relevant information from a sea of irrelevant noise (Pearl and Mackenzie, 2018; Schölkopf, 2019; Schölkopf et al., 2021). Therefore, understanding cause and effect in deep learning systems is particularly important for the next-generation artificial intelligence. The question whether the current deep learning algorithm is able to do causal reasoning remains open. Hence, how to design a learning system that can infer the effect of an intervention becomes a key to address this question, although it would be very challenging to make deep learning extract causal structure from observations by applying simple physics principles due to both architecture and learning complexities. This challenge is now intimately related to the astonishing performances of large language models (see the following seventh challenge), and the key question is whether the self-attention mechanism is sufficient for capturing the causal relationships in the training data.

7 Challenge VI—Internal model of the brain

The brain is argued to learn to build an internal model of the outside world, which is reflected by spontaneous neural activities as a reservoir for computing (e.g., sampling) (Ringach, 2009). The agreement between spontaneous activity and stimulus-evoked one increases during development, especially for natural stimuli (Berkes et al., 2011), while the spontaneous activity outlines the regime of evoked neural responses (Luczak et al., 2009). The relationship between the spontaneous fluctuation and task-evoked response causes recent interests in studying brain dynamics (Deco et al., 2023). This can be formulated by the fluctuation–dissipation theorem in physics, and the violation can be a measure of deviation from equilibrium, although a non-equilibrium stationary state exists.

In addition, the stimuli were shown to carve a clustered neural space (Huang and Toyoizumi, 2016; Berry and Tkačik G, 2020). Then, an interesting question is what the spontaneous neural space looks like, and how the space dynamically evolves, especially in the adaptation to changing environments. Furthermore, how sensory inputs combined with the ongoing asynchronous cortical activity determine the animal behavior remains open and challenging. If the reward-mediated learning is considered, reinforcement learning was used to build world models of structured environments (Ha and Schmidhuber, 2018). In the reinforcement learning, observations are used to drive actions, which are evaluated based on reward signals the agent receives from the environment after taking the actions. It is thus interesting to reveal which type of internal models the agent establishes through learning from interactions with the environments. This can be connected to aforementioned representation and generalization challenges. Moreover, a recent study showed a connection between the reinforcement learning and statistical physics (Rahme and Adams, 2019), suggesting that a statistical mechanics theory could be potentially established to understand how an optimal policy is found to maximize the long-term accumulated reward, with an additionally potential impact on studying reward-based neural computations in the brain (Neftci and Averbeck, 2019).

Another angle to look at the internal model of the brain is through the lens of neural dynamics (Buonomano and Maass, 2009; Deco et al., 2009; Sussillo and Abbott, 2009; Vyas et al., 2020), which is placed onto a low-dimensional surface, robust to variations in detailed properties of individual neurons or circuits. The representation of stimuli, tasks, or contexts can be retrieved for deriving experimentally testable hypotheses (Jazayeri and Ostojic, 2021). Although previous theoretical studies were carried out in recurrent rate or spiking activity neural networks (Sompolinsky et al., 1988; Brunel, 2000), a challenging issue remains to address how neural activity and synaptic plasticity interact with each other to yield a low-dimensional internal representation for cognitive functions. The recent development of synaptic plasticity combining connection probability, local synaptic noise, and neural activity can realize a dynamic network in the adaptation to time-dependent inputs (Zou et al., 2023). This study interprets learning as a variational inference problem, making optimal learning under uncertainty possible in a local circuit. Both learning and neural activity are placed on low-dimensional subspaces. Future studies must include more biological plausible factors to test the hypothesis in neurophysiological experiments. Another recent exciting achievement is using dynamical mean-field theory to uncover rich dynamical regimes of coupled neuronal-synaptic dynamics (Clark and Abbott, 2024).

Brain states can be considered as an ensemble of dynamical attractors (von der Malsburg, 2018). The key challenge is how learning shapes the stable attractor landscape. One can interpret the learning as a Bayesian inference in an unsupervised way but not the autoregressive manner (see the next section). The learning can then be driven by synaptic weight symmetry breaking (Hou et al., 2019; Hou and Huang, 2020), separating two phases of recognizing the network itself and the rule hidden in sensory inputs. It is very interesting to observe if this picture still holds in recurrent learning supporting neural trajectories on dynamical attractors, and even predictive learning minimizes a free energy of belief and synaptic weights (the belief leads to error neurons) (Jiang and Rao, 2024). New methods must be developed based on the recently proposed quasi-potential method to study non-equilibrium steady neural dynamics (Qiu and Huang, 2024) or dynamical mean-field theory for learning (Zou and Huang, 2024).

8 Challenge VII—Large language models

The impressive problem-solving capabilities of Chat-GPT, where GPT is a shorthand of generative pretrained transformer, are driving the fourth industrial revolution. The Chat-GPT is based on large language models (LLMs) (OpenAI, 2023), which represent linguistic information as vectors in high-dimensional state space, as trained with a large text corpus in an autoregressive way [in analogy to the hypothesis that the brain is a prediction machine (Clark, 2013)], resulting in a complex statistical model of how the tokens in the training data correlate (Vaswani et al., 2017). The computational model thus shows strong formal linguistic competence (Mahowald et al., 2024). The LLM is also a few-shot or zero-shot learner (Brown et al., 2020; Kojima et al., 2022), i.e., the language model can perform a wide range of computationally challenging tasks with prompting alone [e..g, chain-of-thought prompting (Wei et al., 2022)]. Remarkably, the LLMs display a qualitative leap in capability as the model complexity and sample complexity are both scaled up (Kaplan et al., 2020), akin to phase transitions in thermodynamic systems.

In contrast to the formal linguistic competence, the functional linguistic competence is argued to be weak (Mahowald et al., 2024). This raises a fundamental question what the nature of intelligence is or whether a single next-token context conditional prediction is a standard model of artificial general intelligence (Gerven, 2017; Lake et al., 2017; Sejnowski, 2023). Human's reasoning capabilities in real-world problems rely on non-linguistic information, e.g., it is unpredictable when a creative idea for a scientist would come to a challenging problem at hand, which relies on reasoning about the implications along a sequence of thought. In a biological implementation, the language modules are separated from the other modules involving high-level cognition (Mahowald et al., 2024). The LLM explains hierarchical correlations in word pieces in the training corpora rather than hidden casual dependencies. In other words, the neural network has not constructed a mental model of the world, which requires heterogeneous modular networks, thereby unlike humans. Therefore, the LLM does not know what it generates (as a generative model). Even if some key patterns of statistical regularities are absent in the training data, the model can generate perfect texts in terms of syntax. However, the texts may be different from the truth. Knowing what they know is a crucial hallmark of intelligent systems (Gerven, 2017). In this sense, the inner workings of the LLM are largely opaque, requiring a great effort to mathematically formulate the formal linguistic competence and further identify key elements that must be included to develop a robust model of the world. Mechanisms behind the currently observed false-positive such as hallucination (Chomsky et al., 2023) could then be revealed, which may be related to interpolation between modes of token distributions. A recent study interpreting the attention used in transformer-based LLM as a generalized Potts model in physics seems inspiring (Rende et al., 2024), i.e., tokens as Potts spin vectors.

Most importantly, we currently do not have any knowledge on how to build an additional network that is able to connect performance with awareness (Cleeremans, 2014), which is linked to what makes us conscious (see the last challenge). Following Marr's framework, both computational and neural correlates of consciousness remain unknown (Crick and Koch, 2003; Blum and Blum, 2022; Dwarakanath et al., 2023). A current physical way is to consider a Lyapunov function governing complex neural computation underlying LLMs (Krotov and Hopfield, 2020; Ramsauer et al., 2020). In this way, the Lyapunov function perspective will open the door of many degrees of freedom to control how information is distilled via not only the self-attention but also other potential gating mechanisms, based on the dynamical system theory.

9 Challenge VIII—Theory of consciousness

One of the most controversial questions is the origin of consciousness—whether the consciousness is an emergent behavior of highly heterogeneous and modular brain circuits with various carefully designed regions [e.g., ~10¹⁴ connections for the human brain and many functionally specific modular structures, such as the prefrontal cortex, hippocampus, and cerebellum (Harris and Shepherd, 2015; Luo, 2021)]. The subjectivity of the conscious experience is in contradiction with the objectivity of a scientific explanation. According to Damasio's model (Damasio, 2001), the ability to identify one-self in the world and its relationship with the world is considered to be a central characteristic of conscious state. Whether a machine algorithm can achieve the self-awareness remains elusive. The self-monitoring ability [or meta-cognition (Dehaene et al., 2017)] may endow the machine (such as LLMs) to know what they generate. It may be important to clarify how the model of one-self is related to the internal model of the brain [e.g., through recurrent or predictive processing (Storm et al., 2024)]. For example, Karl Friston argued that the conscious processing can be interpreted as a statistical inference problem of inferring causes of sensory observations. Therefore, minimizing the surprise (negative log probability of an event) may lead to self-consciousness (Friston, 2018), which is consistent with the hypothesis that the brain is a prediction machine (Clark, 2013; Gerven, 2017).

There are currently two major cognitive theories of consciousness. One is the global workspace framework (Dehaene et al., 1998), which relates consciousness to the widespread and sustained propagation of cortical neural activities by demonstrating that consciousness arises from an ignition that leads to global information broadcast among brain regions. This computational functionalism was recently leveraged to discuss possibility of consciousness in non-organic artificial systems (Bengio, 2017; Butlin et al., 2023). The other is the integrated information theory that provides a quantitative characterization of conscious state by integrated information (Tononi, 2004). In this second theory, unconscious states have a low information content, while conscious states have a high information content. The second theory emphasizes the phenomenal properties of consciousness (Albantakis et al., 2023), i.e., the function performed by the brain is not subjective experience. Both theories follow a top-down approach, which is in stark contrast to the statistical mechanics approach following a bottom-up manner building the bridge from microscopic interactions to macroscopic behavior. These hypotheses are still under intensive criticism despite some cognitive experiments they can explain (Koch et al., 2016). We remark that conscious states may be an emergent property of neural activities, lying at a higher level than neural activities. It is currently unknown how to connect these two levels, for which a new statistical mechanics theory is required. An exciting route is to link the spontaneous fluctuation to stimulus-evoked response, and a maximal response is revealed in a recurrent computational model (Qiu and Huang, 2024), which can be thought of as a necessary condition for consciousness, as information-richness of cortical electrodynamics was also observed to be peaked at the edge-of-chaos (dynamics marginal stability) (Toker et al., 2022). This peak thus distinguishes the conscious from unconscious brain states. From an information-theoretic argument, the conscious state may require a diverse range of configurations of interactions between brain networks, which can be linked to the entropy concept in physics (Guevara Erra et al., 2016). The large entropy leads to optimal segregation and integration of information (Zhou et al., 2015).

Taken together, whether the consciousness can be created from an interaction of local dynamics within complex neural substrate is still unsolved (Krauss and Maier, 2020). A statistical mechanics theory, if possible, is always promising in the sense that one can make theoretical predictions from just a few physics parameters (Huang, 2022), which may be possible from a high degree of abstraction, and thus a universal principle could be expected.

10 Conclusion

To sum up, in this viewpoint, we provide some naive thoughts about fundamental important questions related to neural networks, for which building a good theory is different from being completed. The traditional research studies of statistical physics of neural networks bifurcate to two main streams: one is to the engineering side, developing theory-grounded algorithms; and the other is to the neuroscience side, formulating brain computation by mathematical models solved by physics methods. In physics, we have the principle of least action, from which we can deduce the classical mechanics or electrodynamics laws. We are not sure whether in physics of neural networks (and even the brain), there exists general principles that can be expressed in a concise form of mathematics. It is exciting yet challenging to promote the interplay between physics theory and neural computations along these eight open problems discussed in the perspective paper. The advances will undoubtedly lead to a human-interpretable understanding of underlying mechanisms of the artificial intelligent systems, the brain and mind, especially in the era of big experimental data in brain science and rapid progress in AI studies.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

HH: Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was supported by the National Natural Science Foundation of China for Grant numbers 12122515.

Acknowledgments

The author would like to thank all PMI members for discussions lasting for 5 years. This perspective also benefits from discussions with students during the on-line course of statistical mechanics of neural networks (from September 2022 to June 2023). The author are also grateful to invited speakers in the INTheory on-line seminar during the COVID-19 pandemic. The author enjoyed a lot of interesting discussions with Adriano Barra, Yan Fyodorov, Sebastian Goldt, Pulin Gong, Moritz Helias, Kamesh Krishnamurthy, Yi Ma, Alexander van Meegen, Remi Monasson, Srdjan Ostojic, and Riccardo Zecchina.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1. ^Most of them were roughly provided in the book of statistical mechanics of neural networks (Huang, 2022). Here we give a significantly expanded version.

References

Abbott, L. F., DePasquale, B., and Memmesheimer, R.-M. (2016). Building functional networks of spiking model neurons. Nat. Neurosci. 19, 350–355. doi: 10.1038/nn.4241

PubMed Abstract | Crossref Full Text | Google Scholar

Achille, A., and Soatto, S. (2017) A separation principle for control in the age of deep learning. arXiv [Preprint]. arXiv:1711.03321. doi: 10.48550/arXiv.1711.03321