Abstract
Inference of gene regulation mechanisms from gene expression patterns has become increasingly popular in recent years with the advent of microarray technology. Obtaining the states of genes and their regulatory relationships would greatly help scientists investigate and understand the mechanisms of diseases. However, determining these relationships among several thousand genes remains a major challenge. Here, we simplify this complex gene state determination problem to the inference of the distribution of an ensemble of Boolean networks (BNs). To investigate and calculate the distribution of the BNs’ states, we first compute the probabilities of the different BNs’ states and obtain the number of states. Then, we find the distribution of the number of BNs’ states that has the maximum probability and calculate the fluctuation of the distribution. Finally, two representative experiments are conducted to verify the obtained results. The proposed algorithm is conceptually concise and readily applicable to many other realistic models; furthermore, it is highly extensible to various situations.
1 Introduction
Gene networks are an important tool for studying biological systems at the molecular level. A gene network is an interaction network formed by DNA, RNA, proteins, and metabolic intermediates involved in gene regulation. Gene network research is expected to reveal the function and behavior of the genome from a systematic perspective. It helps explain life processes in detail at the genomic level, with the goal of systematically explaining cell activity, life activity, disease, and treatment. Gene networks have therefore attracted great attention in the study of biological growth, development, and disease, and the research results have important theoretical significance and application value.
Genetic regulatory networks (GRNs) have attracted considerable interest over the past years [1–3]. A large proportion of genes regulate or interact with other genes through proteins, and these interactions can be modeled by a GRN. Various types of GRNs, such as Boolean networks (BNs) and their extensions, including probabilistic Boolean networks, stochastic Boolean networks, and multiple-valued networks [4–6], have been developed for different applications. For example, BNs were first proposed by Kauffman [7, 8] to model complex and nonlinear biological systems. Furthermore, various factors, such as gene perturbation, context sensitivity, and asynchronous updating, have also been thoroughly investigated [9, 10].
However, research results on BNs are relatively limited because of the lack of a systematic tool for solving logical dynamic systems [11]. From a biological viewpoint, a huge number of genes are expressed at the same time, which makes it difficult to infer the states of gene expression at a given time point.
Recently, Cheng et al. [11] proposed the semi-tensor product (STP) of matrices, which can not only represent a logical equation as an algebraic equation but also convert the dynamics of a Boolean control network (BCN) into a linear discrete-time control system. Based on this reformulation, many interesting properties of BCNs have been obtained [12–16]. Optimal control is an interesting topic in system control theory. Besides the STP technique, statistical methods have also been developed for solving problems in BNs. Mayer-type optimal control problems for BCNs with multiple inputs and with a single input have been studied in Refs. [17] and [18], respectively. The states of biological networks and electronic networks are often influenced by instantaneous disturbances. In addition, they may experience abrupt changes at certain points because of switching phenomena and sudden noise, that is, impulsive effects. Impulsive dynamical networks have attracted the interest of many researchers because of their various applications in information science, bioinformatics, and automated control systems.
An organ contains many cells with the same function. However, it is hard to obtain the state of every single cell. In this study, we find that the states of a proportion of the cells follow one particular distribution. This helps biologists judge whether an illness is caused by changes in the cell state distribution.
From a biological standpoint, inference of gene regulation mechanisms from expression patterns is becoming increasingly important with the advent of DNA microarray technology. Thus, we need to obtain the ensemble distribution of the BNs and determine the states of genes, which is the key to further exploration of the expression profiles of thousands of genes. Specifically, in this study, we propose an algorithm for inferring the distribution of the BNs’ states. First, we compute the probability of the different BNs’ states and obtain the number of states. Second, we find the distribution of the number of BNs’ states that has the maximum probability, as well as the fluctuation of this distribution. Finally, two representative experiments are conducted to verify the obtained results. Although practical genetic networks differ from the BNs in this study, the theoretical and practical results can be extended easily to real-world scenarios. Moreover, the proposed algorithm is highly extensible to various scenarios because of its computational simplicity.
2 The Finite Number of Boolean Networks
2.1 The States of Boolean Networks
This section provides the background for Section 2.2. The only hypothesis is that every state is equally probable, which is used in the subsequent derivation.
First, we suppose that there are many Boolean networks in one group and that the probability of the different BNs’ states is P, where M is the number of BNs.
We assume that is however the jth state, is the number of in the BNs, and is the weight of . Evidently, the number of states is M, which is calculated as follows:and the value of the cells is gives as
Although we know the number of cells, it is difficult to determine the specific state of each cell, even if a distribution is given. For example, suppose there are three cells in state 1 and five cells in state 2; we do not know which three cells are in state 1 and which five cells are in state 2. Theorem 1 is given as follows to address this problem.
We know the number of BNs in the ensemble is M and the value of the ensemble . Given a distribution {}, it is easy to determine the number of states as
Proof: The system consists of M identical transforms, which have permutations. Given that the total number of states does not change, if there exist n transforms , denoted as state 1 switching to state 2, the number of states will increase by n, while the number of states will decrease by n. Therefore, the state permutation number is , and
Two specific examples are given to illustrate Theorem 1 for an ensemble with 5 BNs. Thus, M = 5.
(1) We assume that there are three Boolean networks in state and two Boolean networks in state , then is
(2) We assume that there are four Boolean networks in state and one Boolean network in state , then is
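The two examples can be checked numerically. The sketch below assumes that Theorem 1 counts the arrangements with the multinomial coefficient M!/(n_1! n_2! ...), which is what the permutation argument in the proof suggests. The function name `count_arrangements` and the symbol n_j for the number of BNs in state j are illustrative choices, not notation confirmed by the source.

```python
from math import factorial


def count_arrangements(occupations):
    """Number of ways to assign M distinguishable BNs to states, given how
    many BNs occupy each state (assumed multinomial form M! / prod n_j!)."""
    total = factorial(sum(occupations))   # M! permutations of all BNs
    for n_j in occupations:
        total //= factorial(n_j)          # permutations within one state are not distinct
    return total


# Example (1): three BNs in one state and two in another (M = 5)
print(count_arrangements([3, 2]))  # 10
# Example (2): four BNs in one state and one in another (M = 5)
print(count_arrangements([4, 1]))  # 5
```

Under this assumption, the more evenly the BNs are spread over the states, the larger the count, which is the property the next section exploits.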
2.2 The Maximum Probabilistic Distribution of Boolean Networks
In this section, we derive the maximum-probability distribution of the Boolean networks. This distribution is Gaussian, from which the distribution of the cells’ states can be determined, as shown in Figure 1.
FIGURE 1
Although , M, and the distribution are given, it is not easy to figure out the particular states the BNs are in. The most probable distribution requires further calculation. From Eq. 1, the more states there are in the system, the larger the probability is. The probability of each distribution of the ensemble networks is proportional to the number of BN states. Thus, to determine the maximum probability, the maximum number of states should be found. Under the constraints of Eqs 2 and 3, we can use differentiation to calculate the maximum number of states. Two Lagrange multipliers α and β are used, and the condition for the peak can be written as follows:
To determine the probability, we need to assume that M is relatively large. In contrast, the data of the BNs do not need to be large. When M goes to infinity, also goes to infinity. For , we can use Stirling’s approximation:
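To see why a large M is required, the leading-order Stirling approximation ln(n!) ≈ n ln n − n can be compared with the exact value. The sketch below is only a numerical check; whether the article uses this leading-order form or a version with further correction terms is an assumption, since the equation itself is not shown here.

```python
from math import lgamma, log


def ln_factorial(n):
    """Exact ln(n!) via the log-gamma function: ln(n!) = lgamma(n + 1)."""
    return lgamma(n + 1)


def stirling(n):
    """Leading-order Stirling approximation: ln(n!) ~ n ln n - n."""
    return n * log(n) - n


for n in (10, 100, 1000, 100000):
    exact = ln_factorial(n)
    approx = stirling(n)
    rel_err = abs(exact - approx) / exact
    print(f"n={n:>6}  exact={exact:.3f}  stirling={approx:.3f}  rel_err={rel_err:.2e}")
```

The relative error shrinks as n grows, so the approximation is only trustworthy when the occupation numbers are large.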
Using Eq. 8 (the specific calculation process is shown in the Appendix), we can get the following equation:
When we compute the partial derivative of , there are two ways to solve Eq. 8: one fixes M, while the other does not. The two solutions differ only by a constant. To simplify the computation, the second way is used to solve Eq. 9.
Substituting Eqs 10–12 into Eq. 7, we can get the following equations:
So there is
Given the number of BNs M, which represents the scale, we can obtain the distribution , provided that the parameters α and β are specified in advance. To prove Theorem 2, two definitions are given as follows.
Definition 1. When is the most probable distribution, the probability of the system being in state j is
Definition 2. The partition function [19] is
where Q indicates the sum of the probability of all the states. The partition function of Eq. 17 plays an important role as a normalization constant. The two expressions for E are equivalent definitions: the first is succinct, while the second, written out as a formula, is clearer and more convenient for the subsequent computation.
After the computation of , α can be eliminated, and β can be expressed by the mean value E:
From Eq. 3, it can be rewritten as
Combining Eq. 19 with Eq. 17, we can get
This result provides the information about a canonical ensemble. When E is given and M tends to infinity, β does not depend on M.
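The role of the partition function as a normalization constant, and the way β follows from the mean value E, can be illustrated numerically. The sketch below assumes the standard canonical form p_j = exp(−β E_j)/Q with Q = Σ_j exp(−β E_j), which is what maximizing the multinomial count under the two constraints usually yields; the exact expressions of the article are not shown here. The weights in `weights` are hypothetical values, and `solve_beta` is an illustrative helper, not part of the original method.

```python
from math import exp


def partition_function(weights, beta):
    """Q(beta) = sum_j exp(-beta * E_j): the normalization constant."""
    return sum(exp(-beta * e) for e in weights)


def state_probabilities(weights, beta):
    """p_j = exp(-beta * E_j) / Q (assumed form of the most probable distribution)."""
    q = partition_function(weights, beta)
    return [exp(-beta * e) / q for e in weights]


def mean_weight(weights, beta):
    """Mean value E = sum_j p_j * E_j for a given beta."""
    return sum(p * e for p, e in zip(state_probabilities(weights, beta), weights))


def solve_beta(weights, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Find beta such that mean_weight(beta) equals target_mean by bisection.
    The mean decreases monotonically in beta, so bisection is applicable."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_weight(weights, mid) > target_mean:
            lo = mid   # mean too large, so beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


weights = [0.0, 1.0, 2.0, 3.0]   # hypothetical state weights E_j
beta = solve_beta(weights, target_mean=1.5)
print(beta, state_probabilities(weights, beta))
```

Note that M does not appear anywhere in `solve_beta`: under the assumed form, β is fixed by the weights and the mean value E alone, in line with the statement that β does not depend on M.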
Theorem 2. When H and E are given and M tends to infinity, the distribution with the maximum probability is the true distribution. In other words, the fluctuation is equal to 0.
Proof: We need to consider a function,
However,
Since the second and third terms of f are linear functions, their second derivatives are equal to zero, which means that the peak is stable.
Using the Taylor series expanded around the point , the following equation can be obtained:
The peak of f is as follows:
Substituting Eqs 21 and 24 into Eq. 23, we can get
Ignoring the term , we can get
So there is
Thus, we complete the proof of this theorem.
2.3 The Fluctuation of the Distribution
This section aims to prove that the cells cannot all be in the same state when the number of cells goes to infinity.
It is easy to see that Eq. 27 is a Gaussian distribution. Now, we need to prove that the function in Eq. 7 is a δ function, that is, that the fluctuation vanishes when M → ∞. Theorem 3 is given as follows:
Theorem 3. When M → ∞, the fluctuation tends to 0, that is,
Proof: There is a distribution that
Obviously, there is
and
Then Eq. 29 can be rewritten as
Comparing Eq. 27 with Eq. 30, we can get
and substituting it into Eq. 28, there is
where . Hence, the proof of Theorem 3 is completed. When H and E are fixed and M → ∞, the distribution with the maximum probability is the true distribution.
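The vanishing fluctuation can be illustrated with a small numerical experiment. The sketch below assumes the simplest possible setting, two equally weighted states, so that the number of arrangements is the binomial coefficient M!/(n!(M−n)!) with its peak at n = M/2; this toy setting and the helper names are illustrative assumptions, not the article's construction. It compares the count at a fixed relative deviation from the peak with the count at the peak.

```python
from math import lgamma, exp


def ln_binomial(m, n):
    """ln of M! / (n! (M - n)!), computed with log-gamma to avoid overflow."""
    return lgamma(m + 1) - lgamma(n + 1) - lgamma(m - n + 1)


def ln_relative_count(m, eps):
    """ln[Omega(n) / Omega(n_peak)] for two equally weighted states, where
    n_peak = M/2 and n deviates from the peak by the relative amount eps."""
    peak = m // 2
    shifted = round(peak * (1 + eps))
    return ln_binomial(m, shifted) - ln_binomial(m, peak)


for m in (100, 1000, 10000, 100000):
    r = ln_relative_count(m, eps=0.1)
    print(f"M={m:>6}  ln ratio={r:10.2f}  ratio={exp(r):.3e}")
```

The log-ratio falls roughly as −ε²M/2, the Gaussian behaviour near the peak, so a distribution that deviates from the most probable one by a fixed fraction becomes negligible as M grows.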
3 Experiments
In this section, we analyze the model of the cells’ state distribution, that is, Eq. 26. Two experiments are conducted to illustrate the distribution of the BNs’ states and to verify our conclusions. Since no practical data are available for the state changes of cells of the same type, we simulate the transformation process of these cells with Boolean networks and then analyze the resulting data on the state changes of these cells.
3.1 A Boolean Network with 100 Cells
In this example, we choose the state change function of Ref. [17]. The number of cells is 100, and the number of identical Boolean networks is 1,000. The state change rule of the Boolean network is as follows:
where indicates the cell’s state, which takes the value 1 or 0, and the function indicates the state change rule. Hence, in this example, there are four states in 100 cells, and the state change rule is shown in Figure 2B.
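As a rough illustration of how a two-variable Boolean rule produces four cell states and a state count per ensemble, consider the sketch below. The update rule in `step` is a hypothetical placeholder chosen for simplicity; it is not the rule of Ref. [17], which is not reproduced in this text. The function names and the random initialization are likewise assumptions.

```python
import random
from collections import Counter


def step(state):
    """One synchronous update of a two-gene Boolean network.
    Hypothetical placeholder rule, not the rule used in the article."""
    x1, x2 = state
    return (x1 and x2, x1 or x2)


def simulate_ensemble(num_cells, num_steps, seed=0):
    """Start every cell in a random state, update all cells synchronously,
    and return how many cells end up in each of the four states."""
    rng = random.Random(seed)
    cells = [(rng.random() < 0.5, rng.random() < 0.5) for _ in range(num_cells)]
    for _ in range(num_steps):
        cells = [step(s) for s in cells]
    return Counter((int(a), int(b)) for a, b in cells)


counts = simulate_ensemble(num_cells=1000, num_steps=5)
print(counts)
```

The resulting counts play the role of the occupation numbers of the four states, which is the quantity tabulated for the initial states in the experiment.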
FIGURE 2
Assume that the numbers of cells in the four initial states are as shown in Table 1.
TABLE 1
| State of cell | ||||
|---|---|---|---|---|
| Number | 198 | 182 | 319 | 301 |
Number of initial states of cells.
From Theorem 1, we can obtain the k combinations.
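For counts as large as those in Table 1, the number of combinations is far too large to print directly, so it is convenient to work with its logarithm. The sketch below again assumes the multinomial form M!/Π n_j! for Theorem 1 and uses the log-gamma function; `ln_arrangements` is an illustrative name. For comparison, it also evaluates the equal-occupation case with 250 cells per state.

```python
from math import lgamma, log


def ln_arrangements(occupations):
    """ln of M! / prod_j n_j!, using log-gamma so that M = 1,000 does not overflow."""
    m = sum(occupations)
    return lgamma(m + 1) - sum(lgamma(n + 1) for n in occupations)


table1 = [198, 182, 319, 301]    # initial counts from Table 1
uniform = [250, 250, 250, 250]   # equal occupation of the four states

print(ln_arrangements(table1) / log(10))    # log10 of the count for Table 1
print(ln_arrangements(uniform) / log(10))   # log10 of the count at equal occupation
```

Under this assumption, the equal-occupation case always gives the larger count, since the multinomial coefficient is maximal when the occupation numbers are equal.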
We generate the network of relationships between cells in a random manner, where each node represents a cell and each edge indicates a connection between two cells. The probability of connecting two cells is set to 0.05. The indicators of the association network between the cells are shown in Table 2.
From Figure 3 and Table 2, we obtain the basic characteristics of this cellular network: there are 1,000 nodes, 2,781 edges, and so on. The visualization of the network is shown in Figure 2B. In this figure, the node colors denote the different states of the cells.
TABLE 2
| Node | Edge | Average degree | Clustering coefficient |
|---|---|---|---|
| 1,000 | 2,781 | 2.45 | 0.04 |
Cellular network statistical characteristics.
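A random cell network of this kind can be sketched as follows, assuming that "random manner" means that every pair of cells is connected independently with the stated probability (an Erdős–Rényi G(n, p) graph). This is an assumption about the construction, and the sketch is not expected to reproduce the statistics in the table; the function names and the smaller network size are illustrative.

```python
import random
from itertools import combinations


def random_network(num_nodes, p, seed=0):
    """Undirected random graph: each pair of nodes is linked independently
    with probability p (Erdos-Renyi G(n, p), an assumed construction)."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(num_nodes), 2) if rng.random() < p]


def mean_degree(num_nodes, edges):
    """Average number of neighbours per node; each edge touches two nodes."""
    return 2 * len(edges) / num_nodes


edges = random_network(num_nodes=200, p=0.05)
print(len(edges), mean_degree(200, edges))
```

With this construction, the expected number of edges is p·n(n−1)/2, so the edge count grows quadratically with the number of cells for a fixed connection probability.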
FIGURE 3
The cells are initialized with random states, and the influence of the states of other cells is modeled as well. If the number of connected cells in an identical state is greater than 10, the other cells skip the changed state and switch directly to the next state. Thus, the function can be obtained as follows:
where indicates the distribution, is the probability of the cell’s state, M is the number of cells, and is the number of cells in the jth state.
The state change rule shown in Figure 2A demonstrates that the end state is , meaning that the cells reach the state twice. In addition, the states of the cells are randomly assigned. In Figure 2B, it is easy to see that when , the distribution reaches its mode, showing that the configuration in which all states are equally occupied is the most probable one for the collection of cells.
3.2 A Boolean Network with 150 Cells
In this example, we choose a state change function similar to the one reported previously [12]. Here, the number of cells is 500, meaning that the number of identical Boolean networks is 500. The state change rule of the Boolean network can be written as
where denote the cell’s state, each taking the value 1 or 0, and the function is the state change rule. In this example, there are eight states in 450 cells, and the state change rule is shown in Figure 4B.
FIGURE 4
Assume that the numbers of cells in the eight initial states are as shown in Table 3.
TABLE 3
| State of cell | ||||
|---|---|---|---|---|
| Number | 54 | 60 | 57 | 48 |
| State of cell | ||||
| Number | 51 | 54 | 66 | 42 |
Number of initial states of cells.
From Theorem 1, we obtain about k combinations.
We generate the network of relationships among cells in a random manner, where each node represents a cell and each edge indicates a connection between two cells. The probability of connecting two cells is 0.05. The indicators of the association network between the cells are shown in Table 4.
TABLE 4
| Node | Edge | Average degree | Clustering coefficient |
|---|---|---|---|
| 450 | 1890 | 3.73 | 0.046 |
Cellular network statistical characteristics.
From Figure 4 and Table 4, we obtain the basic characteristics of this cellular network: there are 450 nodes, 1890 edges, and so on. The visualization of the network is shown in Figure 5B. Here, the node colors denote the different states of the cells.
FIGURE 5
The state change rule is shown in Figure 5A, and the end state is , meaning that the cells reach the state twice. The states of the cells are randomly assigned. In Figure 5B, it is easy to see that when , the distribution reaches its peak. This means that the configuration in which all eight states are equally occupied, each by approximately 18 cells, is the most probable state of the collection of cells.
From these two experiments, we verify that the distribution of these states is a Gaussian distribution and that the cells cannot all be in the same state when the number of cells approaches infinity. Thus, the experiments support the above theorems.
4 Conclusion
In this article, we study and calculate the distribution of the Boolean networks’ states. First, we compute the probability of the different BNs’ states and obtain the number of states. Then, we find the distribution of the number of BNs’ states that has the maximum probability. Furthermore, we calculate the fluctuation of the distribution. Finally, two representative experiments are conducted to verify the obtained results. Although real genetic networks differ from the BNs, the theoretical and practical results in this study may be extended to more realistic models. Since the proposed algorithm is conceptually concise and efficient, it is highly extensible to various situations.
Statements
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
5 Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.
6 Author Contributions
XC drafted the idea. ZL did the derivation, while BR drafted the manuscript. All authors have read through the manuscript.
7 Funding
This work was supported in part by National Natural Science Foundation of China (Grant Nos. 62003273, 62073263), Natural Science Foundation of Shaanxi Province (Grant No. 2020JQ-217), Fundamental Research Funds for the Central Universities (Grant No. 3102019HHZY03002).
References
1. Ideker T, Galitski T, Hood L. A New Approach to Decoding Life: Systems Biology. Annu Rev Genom Hum Genet (2001) 2:343–72. doi: 10.1146/annurev.genom.2.1.343
2. Kim J, Park S-M, Cho K-H. Discovery of a Kernel for Controlling Biomolecular Regulatory Networks. Sci Rep (2013) 3:2223. doi: 10.1038/srep02223
3. Zhang Z, Xia C, Chen Z. On the Stabilization of Nondeterministic Finite Automata via Static Output Feedback. Appl Math Comput (2020) 365:124687. doi: 10.1016/j.amc.2019.124687
4. Shmulevich I, Dougherty ER, Kim S, Zhang W. Probabilistic Boolean Networks: A Rule-Based Uncertainty Model for Gene Regulatory Networks. Bioinformatics (2002) 18(2):261–74. doi: 10.1093/bioinformatics/18.2.261
5. Liang J, Han J. Stochastic Boolean Networks: An Efficient Approach to Modeling Gene Regulatory Networks. BMC Syst Biol (2012) 6:113. doi: 10.1186/1752-0509-6-113
6. Zhu P, Han J. Stochastic Multiple-Valued Gene Networks. IEEE Trans Biomed Circuits Syst (2014) 8(1):42–53. doi: 10.1109/tbcas.2013.2291398
7. Kauffman SA. Metabolic Stability and Epigenesis in Randomly Constructed Genetic Nets. J Theor Biol (1969) 22:437–67. doi: 10.1016/0022-5193(69)90015-0
8. Kauffman SA. The Origins of Order: Self-Organization and Selection in Evolution. Oxford University Press (1993).
9. Zhu P, Liang J, Han J. Gene Perturbation and Intervention in Context-Sensitive Stochastic Boolean Networks. BMC Syst Biol (2014) 8:60. doi: 10.1186/1752-0509-8-60
10. Zhu P, Han J. Asynchronous Stochastic Boolean Networks as Gene Network Models. J Comput Biol (2014) 21(10):771–83. doi: 10.1089/cmb.2014.0057
11. Cheng D. Analysis and Control of Boolean Networks: A Semi-Tensor Product Approach. Berlin: Springer (2010).
12. Cheng D, Qi H. A Linear Representation of Dynamics of Boolean Networks. IEEE Trans Automat Contr (2010) 55:2251–8. doi: 10.1109/tac.2010.2043294
13. Li R, Yang M, Chu T. State Feedback Stabilization for Boolean Control Networks. IEEE Trans Automat Contr (2013) 58:1853–7. doi: 10.1109/tac.2013.2238092
14. Li B, Lu J, Liu Y, Wu Z-G. The Outputs Robustness of Boolean Control Networks via Pinning Control. IEEE Trans Control Netw Syst (2020) 7(1):201–9. doi: 10.1109/tcns.2019.2913543
15. Liu A, Li H. On Feedback Invariant Subspace of Boolean Control Networks. Science China Inf Sci (2020) 63(12):229201. doi: 10.1007/s11432-019-9869-6
16. Zhang Z, Xia C, Chen S, Yang T, Chen Z. Reachability Analysis of Networked Finite State Machine with Communication Losses: A Switched Perspective. IEEE J Select Areas Commun (2020) 38(5):845–53. doi: 10.1109/jsac.2020.2980920
17. Laschov D, Margaliot M. Observability of Boolean Networks: A Graph-Theoretic Approach. Cambridge, U.K.: Cambridge Scientific Publishers (2013).
18. Laschov D, Margaliot M. A Maximum Principle for Single-Input Boolean Control Networks. IEEE Trans Automat Contr (2011) 56:913–7. doi: 10.1109/tac.2010.2101430
19. Baxter RJ. Partition Function of the Eight-Vertex Lattice Model. Ann Phys (2000) 281(1-2):187–222. doi: 10.1006/aphy.2000.6010
Summary
Keywords
network, Gaussian distribution, state pattern, gene expression, Boolean
Citation
Cui X, Ren B and Li Z (2021) Determining the Maximum States of the Ensemble Distribution of Boolean Networks. Front. Phys. 9:690748. doi: 10.3389/fphy.2021.690748
Received
04 April 2021
Accepted
28 May 2021
Published
12 November 2021
Volume
9 - 2021
Edited by
Chengyi Xia, Tianjin University of Technology, China
Reviewed by
Jinling Liang, Southeast University, China
Zhipeng Zhang, Tianjin University of Technology, China
Copyright
© 2021 Cui, Ren and Li.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xiaodong Cui, xdchoi@gmail.com
This article was submitted to Social Physics, a section of the journal Frontiers in Physics