Network structure indexes to forecast epidemic spreading in real-world complex networks

Bellingeri, Michele; Bevacqua, Daniele; Turchetto, Massimiliano; Scotognella, Francesco; Alfieri, Roberto; Nguyen, Ngoc-Kim-Khanh; Le, Thi Trang; Nguyen, Quang; Cassi, Davide

doi:10.3389/fphy.2022.1017015

ORIGINAL RESEARCH article

Front. Phys., 02 November 2022

Sec. Social Physics

Volume 10 - 2022 | https://doi.org/10.3389/fphy.2022.1017015

This article is part of the Research Topic "Epidemic Models on Networks"

Michele Bellingeri¹,²,³*

Daniele Bevacqua⁴

Massimiliano Turchetto²,³

Francesco Scotognella¹,⁵

Roberto Alfieri²,³

Ngoc-Kim-Khanh Nguyen⁶

Thi Trang Le⁷

Quang Nguyen⁷,⁸,⁹

Davide Cassi⁴,⁵

¹Dipartimento di Fisica, Politecnico di Milano, Milano, Italy
²Dipartimento di Scienze Matematiche, Fisiche e Informatiche, Università di Parma, Parma, Italy
³INFN, Gruppo Collegato di Parma, Parma, Italy
⁴PSH, UR 1115, INRAE, Avignon, France
⁵Center for Nano Science and Technology@PoliMi, Istituto Italiano di Tecnologia, Milan, Italy
⁶Faculty of Fundamental Sciences, Van Lang University, Ho Chi Minh City, Vietnam
⁷John von Neumann Institute, Vietnam National University, Ho Chi Minh City, Vietnam
⁸Institute of Fundamental and Applied Sciences, Duy Tan University, Ho Chi Minh City, Vietnam
⁹Faculty of Natural Sciences, Duy Tan University, Da Nang, Vietnam

Complex networks are a preferred framework to model spreading dynamics in several real-world complex systems. They can describe the contacts between individuals that are responsible for disease spreading in real-world systems. Understanding how the network structure affects an epidemic outbreak is therefore of great importance to evaluate the vulnerability of a network and optimize disease control. Here we argue that the best network structure indexes (NSIs) to predict the disease spreading extent in real-world networks are based on the notion of network node distance rather than on network connectivity, as commonly believed. We numerically simulated, via an SIR-type model, epidemic outbreaks spreading on 50 real-world networks. We then tested which NSIs, among 40, could best predict a priori the disease fate. We found that the “average normalized node closeness” and the “average node distance” are the best predictors of the initial spreading pace, whereas indexes of “topological complexity” of the network are the best predictors of both the value of the epidemic peak and the final extent of the spreading. Furthermore, most of the commonly used NSIs are not reliable predictors of the disease spreading extent in real-world networks.

Introduction

The fundamental role of networks in epidemiology has been recognized in recent years [1–12]. Disease spreading can be modeled on a network where nodes (vertices) represent the individuals (i.e., the hosts) and links (edges) indicate the social contacts among them [1–9]. Real-world complex networks display many structural connectivity patterns, such as heavy-tailed degree distributions, the small-world effect, high clustering coefficients, self-similarity, assortativity, and community structure [1, 13–18]. These structural connectivity patterns may affect the evolution of the spreading process [1, 5, 18–21]. Knowing the relationship between network structure indexes (NSIs) and the spreading dynamics is crucial to prevent and control diseases [17].

Field measurements and analyses of real-world complex networks can be extremely costly in terms of both money and time. It is therefore necessary to know which features of the network structure should be measured first to assess the network vulnerability to disease and consequently optimize the control [1, 18–21]. To address this issue, we gathered a dataset of 50 real-world complex systems. They represent archetypical examples of network structures in different domains, including social, computer, internet, transportation, biological, and ecological networks (see Supplementary Materials S1.2 for details). We explicitly simulated a disease spreading over them via a classical compartmental susceptible–infected–recovered (SIR) model [1–5].

We derived three indicators of the speed and magnitude of the disease spread: 1) the time steps needed for the disease to strike 15% of the network nodes, $τ_{15}$; 2) the overall number of nodes eventually affected by the disease, $T I$; and 3) the maximum disease prevalence, i.e., the maximum number of nodes concurrently infected, $ζ$. The first is a measure of the speed of the spreading process. The second is a measure of the impact of the disease on the population, and it is likely to correlate with the number of severe, and possibly fatal, cases. The third is a measure of the peak and can be used, e.g., to predict the pressure on the care structures.

We considered 40 different NSIs and tested, using 4 different regression models, which of them were the best predictors of the epidemic vulnerability simulated by the SIR model. We considered classic NSIs from the network science, graph theory, and chemical graph theory literature, as well as original NSIs conceived in the present work (see Table 2 in the Methods and Supplementary Material S1.1). Regarding the type of relationship between the 3 disease spread indicators $y_{i}$ (the dependent variables) and the 40 candidate NSIs $x_{j}$ (the independent variables), we considered 1) linear $y_{i} = a x_{j} + b$, 2) quadratic $y_{i} = a x_{j}^{2} + b x_{j} + c$, 3) exponential $y_{i} = a \exp (b x_{j})$, and 4) monomolecular $y_{i} = a (1 - b \exp (- c x_{j}))$ regressions.

To select the best NSI predictor among the 40, and the best regression type among the 4, we ranked the 40 × 4 = 160 different models via the Akaike information criterion (AIC). AIC aims to select the model with the best goodness of fit to the data while discouraging overparameterization and model complexity [31]. Finally, for each model, we computed the fraction of variance unexplained (FVU). FVU is a measure of the goodness of fit of the model, with FVU tending to zero for “ideal” models explaining the entire variability in the observations.

Results

The best results of the model selection procedures and the best model performances are reported in Table 1. The forms and fitting of the best regression models, for different spreading indicators and values of transmissibility, are reported in Figure 1. The spreading indicators vs. NSI scatterplots are in Supplementary Figures S3–S7. All the results of the model selection procedures and performances are in Supplementary Tables S2–S5.

TABLE 1. The best ten NSIs to predict epidemic spreading for each spreading index.

FIGURE 1. The best regression models of the network structural indexes (NSI) vs. spreading indicators (SI). Left column: the best regression models for SIR parameters β = 0.03 and γ = 0.04. Right column: the best regression models for SIR parameters β = 0.06 and γ = 0.04. Best for $τ_{15}$: (A) $τ_{15}$ as a function of the average normalized node closeness $n C l o$; the relationship is described by an exponential model $τ_{15} = a \cdot e^{b \cdot n C l o}$ with a = 751.31 and b = −14.38 (FVU = 0.026); (B) $τ_{15}$ as a function of the average node distance $\bar{d}$; the relationship is described by a linear model $τ_{15} = a \cdot \bar{d} + b$ with a = 9.66 and b = −21.86 (FVU = 0.02). Best for $ζ$: (C) Non-linear regression of $ζ$ vs. the $\bar{k} / \bar{d}$ index; the relationship is described by a mono-molecular model $ζ = a \cdot (1 - b \cdot e^{- c \cdot (\bar{k} / \bar{d})})$ with a = 0.63, b = 1.05 and c = 0.46 (FVU = 0.078); (D) Non-linear regression of $ζ$ vs. the $\bar{k} / \bar{d}$ index; the relationship is described by a mono-molecular model $ζ = a \cdot (1 - b \cdot e^{- c \cdot (\bar{k} / \bar{d})})$ with a = 0.7, b = 1.02 and c = 0.7 (FVU = 0.091). Best for $T I$: (E) Non-linear regression of $T I$ vs. the $B B$ index; the relationship is described by a mono-molecular model $T I = a \cdot (1 - b \cdot e^{- c \cdot B B})$ with a = 0.91, b = 1.13 and c = 0.87 (FVU = 0.091); (F) Non-linear regression of $T I$ vs. the $\bar{k_{s}}$ index; the relationship is described by a mono-molecular model $T I = a \cdot (1 - b \cdot e^{- c \cdot {\bar{k}}_{s}})$ with a = 0.95, b = 2.22 and c = 0.81 (FVU = 0.181). Structural indicators key: $n C l o$, average normalized node closeness; $\bar{d}$, average node distance; $\bar{k} / \bar{d}$ index; $B B$ index; $\bar{k_{s}}$, average node coreness. Spreading indicators key: $τ_{15}$, time to reach 15% of infected nodes; $T I$, total fraction of infected; $ζ$, normalized infected peak.

The pace of the disease $τ_{15}$

When considering the initial pace of the disease ($τ_{15}$), the best models use as explanatory variables the average normalized node closeness $n C l o$ (in an exponential form, Figure 1A) for low epidemic transmission (β = 0.03), and the average node distance $\bar{d}$ (linear relationship, Figure 1B) for high epidemic transmission (β = 0.06). The “distance” $d_{uv}$ between two nodes u and v is the shortest path length, i.e., the minimum number of links to travel between them [14]. The average node distance $\bar{d}$, also called the characteristic path length, is the mean shortest path length over the node pairs in the network [14]. Figure 1B shows, for the higher epidemic transmission rate, the strong positive linear relationship between $\bar{d}$ and $τ_{15}$: the higher the average node distance $\bar{d}$, the longer the time needed to infect 15% of the network nodes.

The node closeness (or closeness centrality) is a measure of centrality in a network, calculated as the reciprocal of the sum of the distances (shortest path lengths) between the node and all other nodes in the network [32]. The node closeness is usually normalized by multiplying it by $N - 1$, where $N$ is the number of network nodes. It follows that the normalized node closeness of node i is the inverse of the average distance from node i to all other nodes (see Supplementary Material S1.1). Therefore, the new NSI we propose in this study, the “average normalized node closeness” $n C l o$, can be viewed as a measure of how close the network nodes are to each other, and it is an alternative way to evaluate node distance in the network. For these reasons, even for the lower epidemic transmission rate, a strong negative relationship emerges between the closeness among network nodes ($n C l o$) and the time to reach 15% of infected nodes ($τ_{15}$): the closer the nodes, the faster the spreading (Figure 1A). Notably, both $\bar{d}$ and $n C l o$ return models with very high goodness of fit, explaining almost the entire variability in the $τ_{15}$ observations (FVU∼2%, see Table 1). Taken together, these results show that the most important network structural factor for predicting the initial spreading speed is node distance.
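As an illustration of the two distance-based indexes discussed above, the following minimal sketch computes the average node distance and the average normalized node closeness with a breadth-first search. It assumes a connected, undirected, unweighted network stored as an adjacency dictionary; the function names and the four-node chain are hypothetical and are not taken from the authors' code.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_distance(adj):
    """Mean shortest-path length over all ordered node pairs (connected graph)."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, u).values()) for u in adj)
    return total / (n * (n - 1))

def average_normalized_closeness(adj):
    """Mean over nodes of (N - 1) / (sum of distances from the node)."""
    n = len(adj)
    return sum((n - 1) / sum(bfs_distances(adj, u).values()) for u in adj) / n

# Hypothetical toy network: a chain of four nodes 0-1-2-3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(average_distance(chain))
print(average_normalized_closeness(chain))
```

Because each node's normalized closeness is the inverse of its own average distance, the two indexes carry closely related information, although the mean of inverses is not the inverse of the mean.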

The infected peak $ζ$

When considering the maximum number of concurrently infected nodes ($ζ$), the best models use the predictor $\bar{k} / \bar{d}$ in a mono-molecular form for both low and high epidemic transmission (Figures 1C,D). The accuracy of the $\bar{k} / \bar{d}$ regression model is high: it explains more than 90% of the variability in the $ζ$ observations for both low and high epidemic transmission (FVU < 10%, see Table 1). The infected peak $ζ$ rises quickly with $\bar{k} / \bar{d}$ and reaches a plateau for higher $\bar{k} / \bar{d}$ values. The $\bar{k} / \bar{d}$ index (originally the $A / D$ index), defined as the ratio of the average node degree $\bar{k}$ (i.e., the average number of links per node) to the average node distance $\bar{d}$, was introduced in mathematical graph theory to capture the topological complexity of the network [15]. Thus, the peak of infected individuals in the network $ζ$, that is, the peak prevalence of the epidemic, increases with network connectivity (average node degree $\bar{k}$) and decreases with node distance ($\bar{d}$).

The total infected $T I$

When considering the overall number of nodes that have been infected during an epidemic ($T I$), for low epidemic transmission (β = 0.03) the best predictor is the $B B$ index in a mono-molecular form (Figure 1E). The $B B$ index was introduced by Bonchev and Buck [15] to improve the $\bar{k} / \bar{d}$ measurement, and it follows the same rationale, accounting for the ratio between the node degree and a measure of the node distance (i.e., the farness) in the network. Defining the “farness” of node i as $ν_{i} = \sum_{j = 1}^{N - 1} d_{i j}$, where $d_{i j}$ is the distance between node i and node j, the $B B$ index is $B B = \sum_{i = 1}^{N} \frac{k_{i}}{ν_{i}}$, where $k_{i}$ is the degree of node i. We find that $T I$ follows a saturating function of the $B B$ index, showing how the total number of infected individuals may increase with network connectivity (node degree $k$) and decrease with node distance (here measured by the farness $ν$).
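The definition of the $B B$ index can be sketched in a few lines. The example below is only illustrative: it assumes a connected, undirected network given as an adjacency dictionary, and the helper names and the small star network are hypothetical.

```python
from collections import deque

def farness(adj, source):
    """Sum of shortest-path lengths from source to all other nodes (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values())

def bb_index(adj):
    """BB = sum over nodes of degree / farness (connected, undirected graph)."""
    return sum(len(adj[i]) / farness(adj, i) for i in adj)

# Hypothetical toy network: a star with hub 0 and three leaves.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(bb_index(star))
```

In this toy star the hub contributes 3/3 and each leaf contributes 1/5, so well-connected nodes that are close to everything else dominate the sum.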

For high epidemic transmission (β = 0.06) the best predictor is the average node coreness $\bar{k_{s}}$, in a mono-molecular form (Figure 1F). Node coreness (or coreness centrality) is a node centrality measure that partitions the nodes into nested sub-networks called k-cores. The k-core of a network is the maximal sub-network in which each node has degree at least k [5]. In other terms, the coreness of a node is k if it belongs to the k-core but not to the (k + 1)-core. Kitsak et al. [5] showed that nodes of higher coreness are “influential spreaders” in the network, i.e., the nodes located in the network core determine a higher speed of network spreading. Moreover, an epidemic starting in the network core may reach a large number of nodes, so the coreness centrality may be an efficient measure to identify the nodes acting as efficient spreaders [5]. In this research, we introduce the $\bar{k_{s}}$ index as the average value of node coreness to evaluate the global network spreading. We can interpret networks with higher $\bar{k_{s}}$ as compact structures, where nodes of higher degree are also located in the core of the network. We find that $T I$ is well fitted by a saturating function of $\bar{k_{s}}$, showing how the total number of infected individuals may increase in networks of higher average node coreness. Nonetheless, we note that the performance of $\bar{k_{s}}$ is only slightly better than that of $\bar{k} / \bar{d}$: the two regression models return almost equal goodness of fit, with almost the same AIC and FVU (Table 1).
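The coreness of each node can be obtained by repeatedly peeling off the node of minimum remaining degree. The sketch below is a simple, unoptimized version of this peeling idea, offered as an illustration only; the function names and the toy network (a triangle with one pendant node) are hypothetical.

```python
def core_numbers(adj):
    """Coreness of each node via repeated removal of a minimum-degree node."""
    degree = {u: len(neigh) for u, neigh in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick a node with the smallest degree among the nodes still present.
        u = min(remaining, key=degree.get)
        # The coreness never decreases along the peeling order.
        k = max(k, degree[u])
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core

def average_coreness(adj):
    """Mean node coreness over the whole network."""
    core = core_numbers(adj)
    return sum(core.values()) / len(core)

# Hypothetical toy network: triangle 0-1-2 plus a pendant node 3 attached to 0.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(core_numbers(graph))
print(average_coreness(graph))
```

Node 0 has the highest degree, yet its coreness equals that of the other triangle nodes, which shows how coreness differs from degree as a centrality measure.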

Discussion

Our results show that, to predict network spreading, considering the distance among nodes is more important than focusing on their connectivity level. The most common NSI for evaluating the connectivity level of the network, i.e., the average node degree $\bar{k}$ [13], returns a poor prediction of the network spreading for all three spreading indicators used in this study (Table 2). Specifically, $\bar{k}$ is strongly ineffective at explaining the initial speed of the spreading $τ_{15}$ (FVU∼0.5, Supplementary Tables S2, S4).

TABLE 2. Network structural indexes (NSI) list. For the NSIs from the literature, the reference is indicated in square brackets; for the new NSIs, “new” is indicated together with the number of the NSI from which they are derived.

This seems counter-intuitive, since higher connectivity levels correlate, on average, with lower node distance in the network [1–13].

Focusing on the $\bar{k} / \bar{d}$ and $B B$ indexes and on ideal types of network structure, we can see how the network connectivity level alone (i.e., the density of network links) may lead to misleading predictions of the network epidemic spreading.

Both $\bar{k} / \bar{d}$ and $B B$ increase from the chain network (lower complexity), through the star network (medium complexity), to the complete network (maximum complexity) (Figure 2). Following this simplified ideal scheme, it is possible to identify classes of real-world networks and their epidemic spreading extent: it would be lowest in chain-like networks, owing to the smallest average node degree $\bar{k}$ and the highest average distance $\bar{d}$ (or farness $ν$); intermediate in the star network, which has a $\bar{k}$ similar to the chain network but a lower $\bar{d}$; and highest in the complete network, which maximizes $\bar{k}$ and minimizes $\bar{d}$ (or farness $ν$).

FIGURE 2. Model network examples of increasing complexity following the Bonchev and Buck [16] theory of network complexity. When evaluating the complexity of the network with the rationale of the network structural indexes $A / D$ and $B B$ [16], the chain network has the lowest complexity and a slow spreading pace, the star network has intermediate complexity and a medium spreading pace, and the complete network is the structure of maximum complexity, with the fastest spreading pace. The node distance always decreases with increasing complexity, i.e., passing from the chain to the star, and from the star to the complete network. By contrast, the node connectivity (links per node) stays constant passing from the chain to the star network, whereas it increases from the star to the complete network.
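The comparison among the three ideal network types can be checked numerically. The following sketch builds a chain, a star, and a complete network of the same size and prints the average degree, the average distance, and their ratio. The network size N = 5 and all function names are assumptions chosen for illustration.

```python
from collections import deque

def mean_degree(adj):
    """Average number of links per node."""
    return sum(len(neigh) for neigh in adj.values()) / len(adj)

def mean_distance(adj):
    """Average shortest-path length over all node pairs (connected graph)."""
    n = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (n * (n - 1))

def chain(n):
    """Path graph: node i is linked to i - 1 and i + 1 when they exist."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def star(n):
    """Star graph: node 0 is the hub, all other nodes are leaves."""
    adj = {0: list(range(1, n))}
    adj.update({i: [0] for i in range(1, n)})
    return adj

def complete(n):
    """Complete graph: every node is linked to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}

N = 5  # hypothetical size, chosen only for illustration
for name, build in (("chain", chain), ("star", star), ("complete", complete)):
    g = build(N)
    k, d = mean_degree(g), mean_distance(g)
    print(f"{name:8s} k={k:.2f} d={d:.2f} k/d={k / d:.2f}")
```

With N = 5, the chain and the star have the same number of links and hence the same average degree, while the average distance falls from the chain to the star to the complete network, so the ratio of degree to distance separates the chain from the star even though the degree alone cannot.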

In particular, the star network shows higher spreading than the chain network even though these two ideal types have similar network connectivity; what differs strongly is their node distance. This explains how network connectivity alone may not be a reliable predictor of the spreading extent, and why networks of similar connectivity level may show very different spreading extents. Moreover, our outcomes show that the de-correlation between connectivity and node distance in real-world networks may be large enough to make network connectivity alone a poor predictor of epidemic spreading.

This outcome is particularly important in the context of epidemic spreading research, such as that on SARS-CoV-2. Important recent research by Thurner and colleagues [11], focusing on SIR epidemic spreading on networks, showed that classic epidemiological models formulated as differential equations and based on the mean-field approximation (assuming that every node/individual can in principle infect any other) can produce misleading predictions of the real epidemic spreading extent. Consequently, Thurner et al. [11] questioned the applicability of standard compartmental models, which neglect the network structure, to describe real epidemic spreading and the SARS-CoV-2 containment phase.

On the one hand, the outcomes of our research strongly support the main statement of Thurner et al. [11], showing how neglecting the network structure may lead to erroneous predictions of the real epidemic spreading. On the other hand, our results go further and extend the outcomes of Thurner et al. [11]. We show that network epidemic models of SARS-CoV-2 spreading that focus on the network connectivity density as the main structural feature to parameterize epidemic spreading, as done by Thurner et al. [11] and many recent network epidemic models [6, 8], may yield incomplete or even unrealistic spreading predictions.

Further, most of the non-pharmaceutical interventions (NPIs) implemented to curb the SARS-CoV-2 epidemic follow the rationale of reducing social interactions [33, 34], that is, decreasing the number of network links. Our analyses suggest that implementing NPIs with the aim of spacing out the nodes, i.e., increasing the node distance in the network, would be a more effective strategy to halt the epidemic. This would translate into a reduced peak of infected individuals ($ζ$) and, consequently, a lower number of infected individuals at the end of the epidemic ($T I$).

Finally, we note that many of the NSIs conceived in complex network science to capture important network features that could lead to different spreading extents are not able to provide reliable predictions of SIR epidemic spreading in real-world networks. The modularity ($Q$), the transitivity ($T$), the degree assortativity ($A$), and the different degree heterogeneity indicators ($σ_{k}^{2}$, $σ_{k}$, $A H$, $E H$) of the network, which are assumed to influence network spreading [1, 2, 18, 20], yield poorly fitting models (Supplementary Tables S2, S4). For example, the degree assortativity $A$ returns FVU > 0.8 for all the spreading indicators, and the network modularity $Q$ shows FVU > 0.45 for all the spreading indicators. We argue that the weak outcomes of these NSIs may be due to two main reasons. On the one hand, in real-world networks, the aforementioned NSIs may present non-linear relationships among them, with contrasting effects in determining the network spreading extent. For example, Volz et al. [17] showed that the average node transitivity ($T$) alone is not always sufficient to determine the full epidemiological dynamics, since the epidemic spreading depends not only on the node transitivity, but also on the nature of its interactions with other network structural features [17].

On the other hand, our results show that node distance is the most important factor affecting the network spreading. The aforementioned NSIs may not correlate with node distance and, as explained above for the relationship between average node degree and node distance (Figure 2), real-world networks with the same value of these NSIs may present different node distance $\bar{d}$. For example, the relationship between degree assortativity ($A$) and node distance in the network is not linear, with contrasting effects on the epidemic spreading [18]. For this reason, real-world networks with similar values of these NSIs may present different spreading extents.

Materials and methods

Network structural indexes

In Table 2 we list the network structural indexes (NSIs) used in this study, with a short definition and their reference. For the NSIs coming from the literature, we indicate the literature reference. For the new ones formulated in the present study by modifying or combining notions or indicators from the literature, we list the indicators from which the new ones are derived. In Supplementary Material S1.1, we provide the extended definition of each network structural indicator.

Real-world complex networks database

We analyzed a set of 50 high-quality real-world networks from different fields of science (see Supplementary Material S1.2). The number of real-world networks for different areas of science is: road transportation 6, airports transportation 2, cargo-ship transportation 1, biological 4, ecological 2, social 13, citation 2, phone 2, internet 5, financial 1, computers 9, email 3. The complete list of real-world networks with network type and reference is in Supplementary Table S1.

The susceptible–infectious–recovered dynamic epidemics model

We used a susceptible–infected–recovered (SIR) model to numerically simulate the spreading over real-world networks. SIR-type models can successfully predict the dynamics of many infectious diseases; see Keeling and Rohani [35] for an overview. When considering SIR models over a network, at any time a node can be in one of three possible compartments: susceptible (S), infected (I), or recovered (R). An infected node/individual infects the susceptible nodes linked to it with a transmission rate β. An infected node/individual stays infectious on average for γ⁻¹ consecutive days, i.e., it recovers with a rate equal to γ. A recovered node/individual can no longer infect others and its state no longer changes, which is equivalent to assuming that immunity does not vanish in the considered time horizon. We initialized the system by setting all nodes/individuals as susceptible except one, randomly chosen, whose state is set as infected. The system dynamics can then be solved to model the evolution of the epidemic over time. To simulate the SIR spreading on a network we used the NDlib Python library presented in Rossetti et al. [36]. We fixed the SIR parameters at β = 0.03 or β = 0.06, and γ = 0.04. We adopted two different values of the transmission rate β to describe low and high epidemic transmission; higher values of β represent epidemics with higher transmissibility. We chose relatively small values for β, following Kitsak et al. [5], so that the infected percentage of the population in the network remains small and the simulation can reveal the role of the network structure in the spreading. With larger β values, the spreading would cover almost all the network in a few time steps, thus hiding the role of the topological structure in determining the pace of the spreading. For each real-world network, we ran 10³ independent SIR simulations, each with a different node/individual initially infected.
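To make the simulation scheme concrete, here is a minimal discrete-time SIR sketch in plain Python. It is not the NDlib implementation used in the study: the synchronous update rule, the function name `simulate_sir`, and the toy ring network are assumptions made for illustration, with β and γ treated as per-step probabilities.

```python
import random

def simulate_sir(adj, beta, gamma, seed_node, rng, max_steps=10_000):
    """Return the per-step (S, I, R) counts of one discrete-time SIR run."""
    status = {u: "S" for u in adj}
    status[seed_node] = "I"  # a single initially infected node
    history = []
    for _ in range(max_steps):
        counts = {"S": 0, "I": 0, "R": 0}
        for s in status.values():
            counts[s] += 1
        history.append((counts["S"], counts["I"], counts["R"]))
        if counts["I"] == 0:
            break  # the epidemic has died out
        new_status = dict(status)
        for u, s in status.items():
            if s == "S":
                # Each infected neighbour transmits independently with prob. beta.
                k_inf = sum(1 for v in adj[u] if status[v] == "I")
                if k_inf and rng.random() < 1 - (1 - beta) ** k_inf:
                    new_status[u] = "I"
            elif s == "I" and rng.random() < gamma:
                new_status[u] = "R"  # recovered nodes never change state again
        status = new_status
    return history

# Hypothetical toy network: a ring of 50 nodes.
n = 50
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
history = simulate_sir(ring, beta=0.06, gamma=0.04, seed_node=0,
                       rng=random.Random(42))
print(len(history), history[-1])
```

Repeating the call with a different `seed_node` and averaging the resulting indicators mirrors, in a simplified way, the repeated independent runs described above.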

The pace of the epidemic spreading can also be evaluated by the time to infect a given part of the population [37]. We define $τ_{15}$ as the number of time steps of the SIR simulation needed for 15% of the network nodes to have been infected (counting both the currently infected nodes and the recovered ones). The shorter the time to infect this fixed fraction of nodes/individuals, the faster the epidemic spreading.

Then, we assessed the epidemic spreading by the total number of individuals that have been infected ($T I$) at the end of the simulation, i.e., when there are no more infected nodes [5, 12], and by the maximum number of infected nodes on a given day ($ζ$) [12]. The $T I$ indicator corresponds to the cumulative sum of new cases, which is equivalent to the number of recovered nodes at the end of the dynamics, when, by model construction, no more nodes can be infected. This is the indicator used by Kitsak et al. [5] to quantify the influence of a given node of the network in a SIR spreading process, and it has also been used to evaluate the efficacy of link removal strategies to curb SIR spreading in social networks [12]. The $T I$ indicator provides an estimate of the spread of the disease within a population, and it is likely to correlate with the number of severe, and possibly fatal, cases. The $ζ$ indicator, besides evaluating the spreading pace, provides an estimate of the pressure on the health system, which might collapse when a critical threshold is exceeded, thus causing higher mortality among infected individuals [12]. Since in epidemiology “prevalence” is the fraction of a population currently infected [35], $ζ$ can be defined as the maximum prevalence occurring during the epidemic simulations.
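The three spreading indicators can be read off the time series of compartment sizes. The sketch below assumes the simulation output is a list of per-step (S, I, R) counts; the function name, the normalization by the network size, and the short hand-written trajectory are hypothetical and serve only as an illustration.

```python
def spreading_indicators(history, threshold=0.15):
    """Compute (tau, TI, zeta) from a list of per-step (S, I, R) counts."""
    n = sum(history[0])  # network size, constant over time
    # tau: first step at which infected plus recovered reach the threshold.
    tau = next(
        (t for t, (_, i, r) in enumerate(history) if (i + r) / n >= threshold),
        None,  # the outbreak never reached the threshold
    )
    # TI: fraction of nodes ever infected by the end of the run.
    last_s, last_i, last_r = history[-1]
    total_infected = (last_i + last_r) / n
    # zeta: maximum fraction of concurrently infected nodes.
    peak = max(i for _, i, _ in history) / n
    return tau, total_infected, peak

# Hypothetical trajectory on a 10-node network.
toy_history = [(9, 1, 0), (8, 2, 0), (6, 3, 1), (5, 2, 3), (5, 0, 5)]
print(spreading_indicators(toy_history))
```

Returning `None` when the threshold is never reached keeps small outbreaks distinguishable from fast ones; how such runs were treated in the study is not specified here.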

The list of the spreading indicators with their definition is in Table 3.

TABLE 3. Spreading indicators used in this study.

The regression models

To estimate the goodness of the relationship between the spreading indicator value (response variable Y) and the network structural indicator value (independent variable or predictor X) we performed four types of regression models: linear, quadratic, exponential, and monomolecular.

Linear: $Y = a X + b$ . It represents the simplest relationship between two variables i.e. one increases/decreases proportionally to the other.

Exponential: $Y = a \cdot e^{b X}$ . It is used to model situations in which 1) the response of one variable, to the change of another, begins slowly and then accelerates rapidly without bound, or 2) its decay begins rapidly and then slows down to get closer and closer to zero. A multitude of situations can be modeled by exponential functions, such as investment growth, radioactive decay, atmospheric pressure changes, temperatures of a cooling object, etc.

Quadratic: $Y = a X^{2} + b X + c$. It represents those cases in which the maximum (or minimum) value of a variable is obtained at intermediate values of the independent variable. In biology, the growth rate of organisms is often modelled as a quadratic function of temperature. Such a pattern can arise when the disease spread depends on the interaction of two processes that respond differently to the same NSI.

Monomolecular (also known as Brody or Mitscherlich function): $Y = a (1 - b \cdot e^{- c X})$, where $b$ and $c$ are growth parameters and $a$ is the asymptotic size [38]. The monomolecular function is a special case of the generalised logistic function, and it is a widely used growth curve model for saturating biological phenomena [38]. This pattern typically occurs when other elements of the system interfere with the effect of the considered NSI and smooth it.
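The four functional forms listed above can be written as plain Python functions. In this sketch the parameter values are those reported in the Figure 1 caption, while the evaluation points are arbitrary and chosen only for illustration.

```python
import math

def linear(x, a, b):
    """Y = a X + b"""
    return a * x + b

def quadratic(x, a, b, c):
    """Y = a X^2 + b X + c"""
    return a * x ** 2 + b * x + c

def exponential(x, a, b):
    """Y = a exp(b X); b < 0 gives a decay towards zero."""
    return a * math.exp(b * x)

def monomolecular(x, a, b, c):
    """Y = a (1 - b exp(-c X)); saturates at a for large X when c > 0."""
    return a * (1 - b * math.exp(-c * x))

# Linear fit of tau_15 vs. average distance (Figure 1B); d = 3 is arbitrary.
print(linear(3.0, 9.66, -21.86))
# Monomolecular fit of the infected peak vs. k/d (Figure 1C); k/d = 2 is arbitrary.
print(monomolecular(2.0, 0.63, 1.05, 0.46))
```

The monomolecular form approaches its asymptote $a$ as the predictor grows, which is the plateau visible in the saturating fits.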

The model selection criterion

We selected the best model using the Akaike information criterion (AIC) [31].

$A I C = 2 k - 2 \ln (\hat{L})$ (1)

where k is the number of estimated parameters in the regression model (2 or 3 according to the regression model), and $\hat{L}$ is the maximum value of the likelihood function for the model [31]. Given a set of candidate models for the data, the best model is the one with the minimum AIC value. Thus, AIC rewards goodness of fit (as assessed by the likelihood function), but it also includes a penalty that is an increasing function of the number of estimated parameters. The penalty discourages overfitting, which is desired because increasing the number of parameters in the model almost always improves the goodness of the fit. Minimization was performed using the R program function nlm (Gauss-Newton algorithm).
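As a hedged illustration of how the AIC can be evaluated for a fitted regression, the sketch below assumes independent Gaussian residuals, for which the maximized log-likelihood depends only on the residual sum of squares. This assumption, the function name `gaussian_aic`, and the toy numbers are not taken from the study, which performed the minimization in R.

```python
import math

def gaussian_aic(observed, predicted, k):
    """AIC = 2k - 2 ln(L_hat), assuming i.i.d. Gaussian residuals.

    k counts the regression parameters only (2 or 3 in the text); the
    residual variance is replaced by its maximum-likelihood estimate RSS / n.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    log_likelihood = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return 2 * k - 2 * log_likelihood

# Hypothetical observations and two candidate sets of model predictions.
obs = [1.0, 2.0, 3.0, 4.0]
close_fit = [1.1, 1.9, 3.2, 3.8]
rough_fit = [1.5, 1.5, 3.5, 3.5]
print(gaussian_aic(obs, close_fit, k=2))
print(gaussian_aic(obs, rough_fit, k=2))
```

With the same number of parameters the closer fit has the lower AIC, and adding one parameter without improving the fit raises the AIC by exactly 2, which is the penalty described above.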

Finally, to provide an easily interpretable measure of the goodness of fit of the models across network structural indexes (predictors), we computed the fraction of variance unexplained (FVU), calculated as:

$F V U = \frac{\sum_{i = 1}^{n} (Y_{i}^{o} - Y_{i}^{e})^{2}}{\sum_{i = 1}^{n} (Y_{i}^{o} - \bar{Y^{o}})^{2}}$ (2)

where $Y_{i}^{o}$ is the observed value of the variable $Y_{i}$ (i.e., the observed spreading indicator value for network i), $Y_{i}^{e}$ is the estimated value of the variable $Y_{i}$ (i.e., the value of the spreading indicator estimated by the fitting model for network i), $\bar{Y^{o}}$ is the average observed value of the spreading indicator over the whole set of networks, and $n$ is the total number of networks.
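A minimal sketch of the FVU computation follows, using the standard definition as the residual sum of squares divided by the total sum of squares. The function name and the toy values are hypothetical.

```python
def fvu(observed, estimated):
    """Fraction of variance unexplained: residual SS over total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return ss_res / ss_tot

# Hypothetical spreading-indicator values for four networks.
obs = [0.2, 0.4, 0.6, 0.8]
print(fvu(obs, [0.25, 0.35, 0.65, 0.75]))
```

A model that reproduces every observation gives FVU = 0, whereas a model that always predicts the mean of the observations gives FVU = 1.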

Data availability statement

Publicly available datasets were analyzed in this study. The datasets are available in the “Netzschleuder” repository [https://networks.skewed.de/], in the “Stanford Large Network Dataset Collection” repository [https://snap.stanford.edu/data/index.html], and in “The Colorado Index of Complex Networks (ICON)” repository [https://icon.colorado.edu/#!/].

Author contributions

BM, CD, AR, and BD conceived the research. BM, AR, and TM performed the analyses. All the authors wrote the manuscript.

Funding

This research is funded by a grant from the Italian Ministry of Foreign Affairs and International Cooperation. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme [grant agreement No. (816313)]. This work is supported by the Vietnam’s Ministry of Science and Technology (MOST) under the Vietnam-Italy scientific and technological cooperation program for the period 2021–2023. This work is supported by the Vietnam National University Ho Chi Minh City (VNU-HCM), Ho Chi Minh city, Vietnam under grant number B2018-42-01.

Acknowledgments

BM, TM, CD, and AR acknowledge the Italian Ministry of Foreign Affairs and International Cooperation. We are greatly thankful to Van Lang University, Vietnam for providing the budget for this study. We thank F. Sartori for helpful discussions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphy.2022.1017015/full#supplementary-material

References

1. Pastor-Satorras R, Castellano C, Van Mieghem P, Vespignani A. Epidemic processes in complex networks. Rev Mod Phys (2015) 87:925–79. doi:10.1103/RevModPhys.87.925
