S. Zhang | TU Delft Repository

Estimating nodal spreading influence using partial temporal networks

Journal article (2025) - Tianrui Mao, Shilun Zhang, Alan Hanjalic, Huijuan Wang

Networks facilitate the spread of information and epidemics. The average number of nodes infected via a spreading process on a network starting from a single seed node over a given long period is called the influence of that node. Estimating nodal influence early in time is essential for mitigating epidemics and misinformation. Influence estimation has been investigated in static networks; such work identifies the relation between the topological properties of a node and its influence and assumes that the network is completely known. However, the networks underlying spreading processes, such as social interactions, are not static but temporal: their links are activated and deactivated over time. When predicting nodal influence in the long-term future, the temporal network is usually observable only until the time of prediction and, because of limited data accessibility, only locally around the node. To bridge this gap, we address the question of how to use the partially observed temporal network (local and of short duration) around each node to estimate the ranking of nodes by spreading influence on the full network over a long period. This also enables us to understand which properties of a node in its partially observed temporal network determine its influence. Centrality metrics (nodal properties) have recently been proposed for temporal networks. However, using such a metric, derived for each node from its partial network, to estimate the ranking of nodes by influence is likely to be limiting, because information can spread along any time-respecting path, not only the shortest time-respecting path considered by existing metrics. To address this disparity, we systematically propose a set of novel nodal centrality metrics that encode diverse properties of (time-respecting) walks to predict nodal influence rankings.
The proposed metrics, derived from partial network information, in general outperform classic centrality metrics that use either full or partial temporal network information. Which centrality metric performs best depends on the infection probability of the spreading process. For a broad range of infection probabilities, a node tends to be influential if it can reach many distinct nodes via time-respecting walks and if these nodes can be reached early in time.
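
To illustrate the closing observation (many distinct nodes reached, and reached early), the sketch below computes earliest arrival times along time-respecting paths and a time-discounted reach score. It is not one of the metrics proposed in the article, whose definitions are not given in this abstract. The function names, the `(i, j, t)` contact format, the strictly-increasing-time rule, and the `decay` parameter are all assumptions made for illustration; the computation corresponds to a deterministic spread in which every contact transmits.

```python
def earliest_arrival(contacts, seed, t_start=0):
    """Earliest time each node is reachable from `seed`.

    contacts: iterable of (i, j, t) undirected contacts (hypothetical format).
    Assumption: a contact at time t can be used from node u only if u was
    reached strictly before t, and all contact times exceed t_start.
    """
    arrival = {seed: t_start}
    for i, j, t in sorted(contacts, key=lambda c: c[2]):
        for u, v in ((i, j), (j, i)):
            # Contacts are processed in time order, so the first time a node
            # is reached is also its earliest arrival time.
            if u in arrival and arrival[u] < t and v not in arrival:
                arrival[v] = t
    return arrival


def reach_score(contacts, seed, decay=0.5):
    """Hypothetical score: each reached node counts decay**arrival_time."""
    arrival = earliest_arrival(contacts, seed)
    return sum(decay ** t for v, t in arrival.items() if v != seed)


# A chain of contacts whose times increase from node 0 towards node 3.
contacts = [(0, 1, 1), (1, 2, 2), (2, 3, 3)]
print(earliest_arrival(contacts, 0))  # {0: 0, 1: 1, 2: 2, 3: 3}
print(earliest_arrival(contacts, 3))  # {3: 0, 2: 3}
print(reach_score(contacts, 0), reach_score(contacts, 3))  # 0.875 0.125
```

In this toy example node 0 reaches every other node, while node 3 reaches only node 2 and only at the last contact, so the two seeds differ sharply even though the time-aggregated network is a symmetric chain.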

Spreading Processes on Networks

Roles of Nodes, Links, and Hyperlinks

Doctoral thesis (2025) - S. Zhang, H. Wang, A. Hanjalic

Spreading processes are ubiquitous in nature and society, from the diffusion of information in social platforms to the spread of diseases within populations. Many real-world systems can be represented as networks, where a piece of information or a disease spreads along links connecting nodes. Different nodes and links often differ in their network properties and play distinct roles in a spreading process. Based on network properties of nodes or links, practitioners may be interested in identifying key nodes as the seed nodes to maximally diffuse a piece of information, or removing specific links to mitigate the spreading. In this thesis, we study the roles of a node or a link in a spreading process from three different perspectives and investigate how these roles relate to the properties of nodes and links within the underlying network.

We first explore how the network properties of a node can be used to predict the spreading influence of the node, defined as the average number of nodes that are ultimately infected when this node is the only seed node. Previous studies have shown that combining node properties derived from local and global topological information can better predict nodal influence than using a single metric. In Chapter 2, we investigate whether using relatively local information is sufficient for the prediction. To address this question, we define an iterative metric set by leveraging the iterative process used to derive classical nodal centralities like eigenvector centrality. The iterative metric set progressively incorporates more global information and is used as the feature set in a regression model to predict the spreading influence of a node. We find that, in various networks, the model using an iterative metric set that includes only relatively local information achieves prediction quality comparable to that of the method that includes both local and global information.

A spreading process can be mitigated by blocking social contacts, i.e., time-specific interactions. In Chapter 3, we investigate how the network properties of a contact are associated with the mitigation effect when the contact is blocked. We develop probabilistic contact blocking strategies, which remove contacts (temporal links) based on their properties in a temporal network, to mitigate the spread of a Susceptible-Infected-Recovered spreading process. The removal probability of a contact depends on a given centrality metric of the corresponding link in the time-aggregated network and on the time at which the contact occurs. We propose diverse link centrality metrics, and each centrality metric leads to a unique contact blocking strategy. Our results indicate that the spread of the epidemic is most effectively mitigated when contacts between node pairs with fewer contacts, and contacts that occur earlier in time, are more likely to be removed.

The role of a link in a spreading process can also be reflected by the extent to which the link is used in the process. Many real-world systems may involve interactions among groups of more than two individuals and can therefore be represented as temporal higher-order networks. Chapter 4 explores the Susceptible-Infected threshold spreading process unfolding on temporal higher-order networks with two objectives: (1) to understand the contribution of each hyperlink to the spreading process, defined as the average number of nodes that are directly infected via the activation of the hyperlink starting from an arbitrary seed node, and (2) to investigate which network properties of a hyperlink are associated with a larger contribution to the spreading process. This understanding is crucial for developing effective strategies to mitigate a spreading process. Given a temporal higher-order network, we propose to construct a weighted higher-order network, the so-called diffusion backbone, where the weight of each hyperlink denotes its contribution to the spreading process. We then systematically design centrality metrics for hyperlinks in a temporal higher-order network, where each centrality metric captures a specific property of the hyperlink within the temporal higher-order network and is used to estimate the ranking of hyperlinks by their weights in the backbone. We identify, and explain why, certain centrality metrics better estimate the contributions of hyperlinks under different parameters of the spreading process.

The last chapter reflects on the insights of this thesis and discusses possible future directions related to our research.

Predicting nodal influence via local iterative metrics

Journal article (2024) - Shilun Zhang, Alan Hanjalic, Huijuan Wang

Nodal spreading influence is the capability of a node to activate the rest of the network when it is the seed of spreading. Combining nodal properties (centrality metrics) derived from local and global topological information, respectively, has been shown to predict nodal influence better than a single metric. In this work, we investigate to what extent local and global topological information around a node contributes to the prediction of nodal influence and whether relatively local information is sufficient for the prediction. We show that by leveraging the iterative process used to derive a classical nodal centrality such as eigenvector centrality, we can define an iterative metric set that progressively incorporates more global information around the node. We propose to predict nodal influence using an iterative metric set that consists of the iterative metrics of orders 1 to K produced in an iterative process, encoding gradually more global information as K increases. Three iterative metrics are considered, which converge to three classical node centrality metrics, respectively. In various real-world networks and synthetic networks with community structure, we find that the prediction quality of each iterative-metric-based model converges to its optimum once metrics of relatively low orders (K∼4) are included and increases only marginally as K increases further. This fast convergence of prediction quality with K is further explained by analyzing the correlation between the iterative metric and nodal influence, the convergence rate of each iterative process, and network properties. The prediction quality of the best-performing iterative metric set with K=4 is comparable to that of the benchmark method that combines seven centrality metrics: their prediction quality ratio is within the range [91%,106%] across all three quality measures and networks.
In two spatially embedded networks with an extremely large diameter, however, iterative metrics of higher orders, and thus a larger K, are needed to achieve prediction quality comparable to the benchmark.
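
The idea of an iterative metric set can be sketched with the power-iteration process behind eigenvector centrality. The abstract does not spell out the three iterative metrics or their normalization, so the following is only one plausible reading: starting from an all-ones vector, each order k replaces a node's value by the sum of its neighbours' values, so order 1 is the degree and order k counts walks of length k starting at the node. The function name `iterative_metric_set` and the adjacency-dictionary input are assumptions made for illustration.

```python
def iterative_metric_set(adj, K):
    """Return {node: [x_1, ..., x_K]} for an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours (assumed format).
    x_0 = 1 for every node and x_k(v) = sum of x_{k-1}(u) over neighbours u,
    i.e. repeated multiplication by the adjacency matrix without
    normalization (an illustrative choice, not taken from the article).
    """
    x = {v: 1.0 for v in adj}
    features = {v: [] for v in adj}
    for _ in range(K):
        # One iteration: every node aggregates the previous-order values
        # of its neighbours, pulling in information from one hop further.
        x = {v: sum(x[u] for u in adj[v]) for v in adj}
        for v in adj:
            features[v].append(x[v])
    return features


# Path graph 0 - 1 - 2 - 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
features = iterative_metric_set(adj, K=3)
print(features[0])  # [1.0, 2.0, 3.0]
print(features[1])  # [2.0, 3.0, 5.0]
```

Each row of `features` could then serve as the feature vector of a node in a regression model for nodal influence; increasing K appends columns that depend on progressively larger neighbourhoods.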

Drug Trafficking in Relation to Global Shipping Network

Conference paper (2023) - Louise Leibbrandt, Shilun Zhang, Marijn Roelvink, Stan Bergkamp, Xinqi Li, Lieselot Bisschop, Karin van Wingerde, Huijuan Wang

This paper aims to understand to what extent the amount of drug (e.g., cocaine) trafficking per country can be explained and predicted using the global shipping network. We propose three distinct network approaches, based on topological centrality metrics, a Susceptible-Infected-Susceptible spreading process, and a flow optimization model of drug trafficking on the shipping network, respectively. These approaches derive centrality metrics, infection probability, and inflow of drug traffic per country, respectively, to estimate the amount of drug trafficking. We use the amount of drug seizure as an approximation of the amount of drug trafficking per country to evaluate our methods. Specifically, we investigate to what extent the different methods can predict the ranking of countries by the amount of drug seizure. Furthermore, these three approaches are integrated by a linear regression method in which we combine the nodal properties derived by each method to build a comprehensive model for the cocaine seizure data. Our analysis finds that the unweighted eigenvector centrality metric combined with the inflow derived by the flow optimization method best identifies the countries with a large amount of drug seizure (e.g., rank correlation 0.45 with the drug seizure). Extending this regression model with two extra features, the distance of a country from the source of cocaine production and the country’s income group, further increases the prediction quality (e.g., rank correlation 0.79). This final model provides insights into network-derived properties and complementary country features that are explanatory for the amount of cocaine seized. The model can also be used to identify countries that have no drug seizure data but are possibly susceptible to cocaine trafficking.

Mitigate SIR epidemic spreading via contact blocking in temporal networks

Journal article (2022) - S. Zhang, Xunyi Zhao, H. Wang

Progress has been made in suppressing epidemic spreading on temporal networks by blocking all contacts of targeted nodes or node pairs. In this work, we develop contact blocking strategies that remove a fraction of contacts from a temporal (time-evolving) human contact network to mitigate the spread of a Susceptible-Infected-Recovered epidemic. We define the probability that a contact c(i, j, t) is removed as a function of a given centrality metric of the corresponding link l(i, j) in the aggregated network and the time t of the contact. The aggregated network captures the number of contacts between each node pair. We propose a set of 12 link centrality metrics, and each centrality metric leads to a unique contact removal strategy. These strategies, together with a baseline strategy (random removal), are evaluated in empirical contact networks via the average prevalence, the peak prevalence, and the time to reach the peak prevalence. We find that epidemic spreading is mitigated best when contacts between node pairs that have fewer contacts, and early contacts, are more likely to be removed. A strategy tends to perform better when the average number of contacts removed from each node pair varies less. The aggregated pruned network resulting from the best contact removal strategy tends to have a large largest eigenvalue, a large modularity, and probably a small largest connected component.
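
A probabilistic contact removal strategy of this kind can be sketched as follows. The abstract does not give the functional form of the removal probability or the 12 link centrality metrics, so the sketch makes its own assumptions: the link "centrality" is simply the number of contacts of the node pair in the aggregated network, the removal score of a contact c(i, j, t) is 1 / (w_ij * t) with t >= 1, and scores are rescaled so that the expected fraction of removed contacts equals a target `fraction`. All function names are hypothetical.

```python
import random
from collections import Counter


def removal_probabilities(contacts, fraction):
    """Removal probability per contact (i, j, t), in the order given.

    Assumed form: proportional to 1 / (w_ij * t), where w_ij is the number
    of contacts between i and j, so rare and early contacts score higher.
    Requires t >= 1. Probabilities are clipped at 1.
    """
    weight = Counter(frozenset((i, j)) for i, j, _ in contacts)
    scores = [1.0 / (weight[frozenset((i, j))] * t) for i, j, t in contacts]
    # Rescale so the probabilities sum to fraction * number of contacts
    # (exact when no probability needs clipping).
    scale = fraction * len(contacts) / sum(scores)
    return [min(1.0, scale * s) for s in scores]


def block_contacts(contacts, fraction, rng=None):
    """Return the contacts that survive one random removal round."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    probs = removal_probabilities(contacts, fraction)
    return [c for c, p in zip(contacts, probs) if rng.random() >= p]


contacts = [(0, 1, 1), (0, 1, 2), (0, 1, 3), (1, 2, 1), (2, 3, 4)]
probs = removal_probabilities(contacts, fraction=0.2)
kept = block_contacts(contacts, fraction=0.2)
```

Here the single contact between nodes 1 and 2 at time 1 receives a higher removal probability than the contact between nodes 0 and 1 at the same time, because the pair (0, 1) has three contacts in the aggregated network; within the pair (0, 1), earlier contacts are more likely to be removed than later ones.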