H. Wang
Please Note
42 records found
1
Communications networks such as vehicle and social contact networks are temporal networks, where autos/individuals are connected only when they are close to each other. These networks facilitate the propagation of information. Traffic between any two nodes demanded at any time is routed along the fastest time-respecting path. In this work, we investigate the vulnerability of information transport on temporal networks to link removal. The objective is to understand the removal of which types of links deteriorate the efficiency/speed of information transport the most. Identifying such critical links will enable better intervention to facilitate/prohibit the spread of (mis)information. To identify critical links, we propose link-removal strategies based on transport efficiency between two end nodes of each link, properties of links in the aggregated network, and in routing paths respectively. Each strategy ranks links according to their corresponding properties and removes links with the highest measures. Strategies are evaluated via the relative change in transport efficiency after link removal in real-world networks. We find that the path-based strategy performs the best: links appear more often and occur early in the fastest time-respecting paths tend to be critical. Via comprehensive analysis, we explain this strategy's out-performance and its dependency on network properties.
Temporal networks, whose network topology changes over time, are used to represent, e.g., opportunistic mobile networks, vehicle networks, and social contact networks, where two mobile devices (autos or individuals) are connected only when they are close to (interact with) each other. Such networks facilitate the transfer of information. In this paper, we address the problem of navigation on temporal networks: how to route a traffic demand from a source s to a destination d at time ts, based on the network observed before ts? Whenever the node hosting the information has a contact or interacts with another node, the routing method has to decide whether the information should be forwarded to the contacted node or not. Once the information is forwarded, the contacted node becomes the only node hosting the information. Firstly, we introduce a framework of designing navigation algorithms, in which a distance metric is defined and computed between any node to the target d based on the network observed before ts. Whenever a hosting node has a contact, it forwards the information to the contacted node if the contacted node is closer to the target than the hosting node according to the distance metric. Secondly, we propose systematically distance metrics of a node pair in the temporal network observed, that capture different network properties of a node pair. Thirdly, these metrics or routing strategies are evaluated in empirical contact networks, from the perspective of the time duration of the routing and the probability that the destination can be reached. Their performance is further explained via the correlation between distance metrics and the stability of each metric in ranking nodes’ distance to a target node. This work may serve as inspiration for evaluating and redesigning these strategies in other types of networks beyond physical contact networks.
Networks facilitate the spread of information and epidemics. The average number of nodes infected via a spreading process on a network starting from a single seed node over a given long period is called the influence of that node. Estimating nodal influence early in time is essential for the epidemic/misinformation mitigation. Influence estimation has been investigated in static networks, which identifies the relation between topological properties of a node and its influence and assumes the networks are completely known. However, the networks underlying spreading processes such as social interactions are not static but temporal networks, whose links are activated or deactivated over time. When predicting nodal influence in the long-term future, the temporal network is usually only observable till the time of prediction and only locally around the node due to data accessibility. To bridge this gap, we address the question of how to utilize the partially observed temporal network (local and of short duration) around each node, to estimate the ranking of nodes in spreading influence on the full network over a long period. This would also enable us to understand which network properties of a node, in its partially observed temporal network determine its influence. Centrality metrics (nodal properties) have been proposed recently in temporal networks. However, using such a metric derived for each node from its partial network to estimate the ranking of nodes in influence is likely to be limiting. This is because the spread of information is possibly through any time-respecting path, beyond the shortest time-respecting path considered by existing metrics. To address this disparity, we systematically propose a set of novel nodal centrality metrics that encode diverse properties of (time-respecting) walks to predict nodal influence rankings. The proposed metrics derived from partial network information, in general, outperform classic centrality metrics utilizing either full or partial temporal network information. It is found that distinct centrality metrics perform the best depending on the infection probability of the spreading process. For a broad range of the infection probability, a node tends to be influential if it can reach many distinct nodes via time-respecting walks and if these nodes can be reached early in time.
Quantifying the structural and functional differences of temporal networks remains a fundamental and challenging problem in the era of big data. Traditional network comparison methods, originally developed for static networks, often fall short in capturing the intricate interplay between structural configurations and dynamic temporal patterns inherent in complex systems. This work proposes a temporal dissimilarity measure for temporal network comparison based on the first arrival distance distribution and spectral entropy based Jensen-Shannon divergence. Experimental results on both synthetic and empirical temporal networks show that the proposed measure could discriminate diverse temporal networks with different structures by capturing various topological and temporal properties. Moreover, the proposed measure can discern the functional distinctions and is found effective applications in temporal network classification and spreadability discrimination.
Nodal spreading influence is the capability of a node to activate the rest of the network when it is the seed of spreading. Combining nodal properties (centrality metrics) derived from local and global topological information respectively has been shown to better predict nodal influence than using a single metric. In this work, we investigate to what extent local and global topological information around a node contributes to the prediction of nodal influence and whether relatively local information is sufficient for the prediction. We show that by leveraging the iterative process used to derive a classical nodal centrality such as eigenvector centrality, we can define an iterative metric set that progressively incorporates more global information around the node. We propose to predict nodal influence using an iterative metric set that consists of an iterative metric from order 1 to K produced in an iterative process, encoding gradually more global information as K increases. Three iterative metrics are considered, which converge to three classical node centrality metrics, respectively. In various real-world networks and synthetic networks with community structures, we find that the prediction quality of each iterative based model converges to its optimal when the metric of relatively low orders (K∼4) are included and increases only marginally when further increasing K. This fast convergence of prediction quality with K is further explained by analyzing the correlation between the iterative metric and nodal influence, the convergence rate of each iterative process and network properties. The prediction quality of the best performing iterative metric set with K=4 is comparable with the benchmark method that combines seven centrality metrics: their prediction quality ratio is within the range [91%,106%] across all three quality measures and networks. In two spatially embedded networks with an extremely large diameter, however, iterative metric of higher orders, thus a large K, is needed to achieve comparable prediction quality with the benchmark.
Temporal networks are networks whose topology changes over time. Two nodes in a temporal network are connected at a discrete time step only if they have a contact/interaction at that time. The classic temporal network prediction problem aims to predict the temporal network one time step ahead based on the network observed in the past of a given duration. This problem has been addressed mostly via machine learning algorithms, at the expense of high computational costs and limited interpretation of the underlying mechanisms that form the networks. Hence, we propose to predict the connection of each node pair one step ahead based on the connections of this node pair itself and of node pairs that share a common node with this target node pair in the past. The concrete design of our two prediction models is based on the analysis of the memory property of real-world physical networks, i.e., to what extent two snapshots of a network at different times are similar in topology (or overlap). State-of-the-art prediction methods that allow interpretation are considered as baseline models. In seven real-world physical contact networks, our methods are shown to outperform the baselines in both prediction accuracy and computational complexity. They perform better in networks with stronger memory. Importantly, our models reveal how the connections of different types of node pairs in the past contribute to the connection estimation of a target node pair. Predicting temporal networks like physical contact networks in the long-term future beyond short-term i.e., one step ahead is crucial to forecast and mitigate the spread of epidemics and misinformation on the network. This long-term prediction problem has been seldom explored. Therefore, we propose basic methods that adapt each aforementioned prediction model to address classic short-term network prediction problem for long-term network prediction task. The prediction quality of all adapted models is evaluated via the accuracy in predicting each network snapshot and in reproducing key network properties. The prediction based on one of our models tends to have the highest accuracy and lowest computational complexity.
This paper aims to understand to what extent the amount of drug (e.g., cocaine) trafficking per country can be explained and predicted using the global shipping network. We propose three distinct network approaches, based on topological centrality metrics, Susceptible-Infected-Susceptible spreading process and a flow optimization model of drug trafficking on the shipping network, respectively. These approaches derive centrality metrics, infection probability, and inflow of drug traffic per country respectively, to estimate the amount of drug trafficking. We use the amount of drug seizure as an approximation of the amount of drug trafficking per country to evaluate our methods. Specifically, we investigate to what extent different methods could predict the ranking of countries in drug seizure (amount). Furthermore, these three approaches are integrated by a linear regression method in which we combine the nodal properties derived by each method to build a comprehensive model for the cocaine seizure data. Our analysis finds that the unweighted eigenvector centrality metric combined with the inflow derived by the flow optimization method best identifies the countries with a large amount of drug seizure (e.g., rank correlation 0.45 with the drug seizure). Extending this regression model with two extra features, the distance of a country from the source of cocaine production and a country’s income group, increases further the prediction quality (e.g., rank correlation 0.79). This final model provides insights into network derived properties and complementary country features that are explanatory for the amount of cocaine seized. The model can also be used to identify countries that have no drug seizure data but are possibly susceptible to cocaine trafficking.
Human social interactions are typically recorded as time-specific dyadic interactions, and represented as evolving (temporal) networks, where links are activated/deactivated over time. However, individuals can interact in groups of more than two people. Such group interactions can be represented as higher-order events of an evolving network. Here, we propose methods to characterize the temporal-topological properties of higher-order events to compare networks and identify their (dis)similarities. We analyzed 8 real-world physical contact networks, finding the following: (a) Events of different orders close in time tend to be also close in topology; (b) Nodes participating in many different groups (events) of a given order tend to involve in many different groups (events) of another order; Thus, individuals tend to be consistently active or inactive in events across orders; (c) Local events that are close in topology are correlated in time, supporting observation (a). Differently, in 5 collaboration networks, observation (a) is almost absent; Consistently, no evident temporal correlation of local events has been observed in collaboration networks. Such differences between the two classes of networks may be explained by the fact that physical contacts are proximity based, in contrast to collaboration networks. Our methods may facilitate the investigation of how properties of higher-order events affect dynamic processes unfolding on them and possibly inspire the development of more refined models of higher-order time-varying networks.
Intercity networks and urban performance
A geographical text mining approach
Compared to the burgeoning literature discussing the importance of agglomeration externalities for development, limited attention has been given to network externalities. This is largely due to limited data availability. We propose a general measure to proxy city network externalities based on toponym co-occurrences that indicate the relatedness between cities. This paper extracts intercity relationships based on the co-occurrence of Chinese place names on 2.5 billion webpages. We calculate and map absolute and relative network positions, which we use to explain urban labour productivity. We found that a stronger embeddedness in networks of cities is significantly and positively associated with urban productivity. Smaller cities benefit comparatively more from being well embedded in city networks, suggesting that these relations can compensate for a lack of agglomeration externalities. We also compare the importance for urban performance of city network externalities vis-à-vis agglomeration externalities. City network externalities turn out to be more important in explaining urban performance than agglomeration externalities. This calls for new theorizing on a relational approach to urban and regional development. Rather than stimulating further concentration of urbanization, our findings suggest that fostering relationships between cities is a viable alternative urban development strategy. We conclude with suggestions for a research agenda that delves deeper into city network externalities.
Temporal networks are networks like physical contact networks whose topology changes over time. Predicting future temporal network is crucial e.g., to forecast and mitigate the spread of epidemics and misinformation on the network. Most existing methods for temporal network prediction are based on machine learning algorithms, at the expense of high computational costs and limited interpretation of the underlying mechanisms that form the networks. This motivates us to develop network-based models to predict the temporal network at the next time step based on the network observed in the past. Firstly, we investigate temporal network properties to motivate our network prediction models and to explain how the performance of these models depends on the temporal networks. We explore the similarity between the network topology (snapshot) at any two time steps with a given time lag/interval. We find that the similarity is relatively high when the time lag is small and decreases as the time lag increases. Inspired by such time-decaying memory of temporal networks and recent advances, we propose two models that predict a link’s future activity (i.e., connected or not), based on the past activities of the link itself or also of neighboring links, respectively. Via seven real-world physical contact networks, we find that our models outperform in both prediction quality and computational complexity, and predict better in networks that have a stronger memory. Beyond, our model also reveals how different types of neighboring links contribute to the prediction of a given link’s future activity, again depending on properties of temporal networks.
Community detection of temporal (time-evolving) bipartite networks is challenging because it can be performed either on the temporal bipartite network, or on various projected networks, composed of only one type of nodes, via diverse community detection algorithms. In this paper, we aim to systematically design detection methods addressing both network choices and community detection algorithms, and to compare the community structures detected by different methods. We illustrate our methodology by using a telecommunications network as an example. We find that three methods proposed identify evident community structures: one is performed on each snapshot of the temporal network, and the other two, in temporal projections. We characterise the community structures detected by each method by an evaluation network in which the nodes are the services of the telecommunications network, and the weight of the links between them are the number of snapshots that both services were assigned to the same community. Analysing the evaluation networks of the three methods reveals the similarity and difference among these methods in identifying common node pairs or groups of nodes that often belong to the same community. We find that the two methods that are based on the same projected network identify consistent community structures, whereas the method based on the original temporal bipartite network complements this vision of the community structure. Moreover, we found a non-trivial number of node pairs that belong consistently to the same community in all the methods applied.
Multiple network embedding algorithms have been proposed to perform the prediction of missing or future links in complex networks. However, we lack the understanding of how network topology affects their performance, or which algorithms are more likely to perform better given the topological properties of the network. In this paper, we investigate how the clustering coefficient of a network, i.e., the probability that the neighbours of a node are also connected, affects network embedding algorithms’ performance in link prediction, in terms of the AUC (area under the ROC curve). We evaluate classic embedding algorithms, i.e., Matrix Factorisation, Laplacian Eigenmaps and node2vec, in both synthetic networks and (rewired) real-world networks with variable clustering coefficient. Specifically, a rewiring algorithm is applied to each real-world network to change the clustering coefficient while keeping key network properties. We find that a higher clustering coefficient tends to lead to a higher AUC in link prediction, except for Matrix Factorisation, which is not sensitive to the change of clustering coefficient. To understand such influence of the clustering coefficient, we (1) explore the relation between the link rating (probability that a node pair is the missing link) derived from the aforementioned algorithms and the number of common neighbours of the node pair, and (2) evaluate these embedding algorithms’ ability to reconstruct the original training (sub)network. All the network embedding algorithms that we tested tend to assign higher likelihood of connection to node pairs that share an intermediate or high number of common neighbours, independently of the clustering coefficient of the training network. Then, the predicted networks will have more triangles, thus a higher clustering coefficient. As the clustering coefficient increases, all the algorithms but Matrix Factorisation could also better reconstruct the training network. These two observations may partially explain why increasing the clustering coefficient improves the prediction performance.
Many real-world complex systems including human interactions can be represented by temporal (or evolving) networks, where links activate or deactivate over time. Characterizing temporal networks is crucial to compare different real-world networks and to detect their common patterns or differences. A systematic method that can characterize simultaneously the temporal and topological relations of the time-specific interactions (also called contacts or events) of a temporal network, is still missing. In this article, we propose a method to characterize to what extent contacts that happen close in time occur also close in topology. Specifically, we study the interrelation between temporal and topological properties of the contacts from three perspectives: (1) the correlation (among the elements) of the activity time series which records the total number of contacts in a network that happen at each time step; (2) the interplay between the topological distance and time difference of two arbitrary contacts; (3) the temporal correlation of contacts within the local neighbourhood centred at each link (so-called ego-network) to explore whether such contacts that happen close in topology are also close in time. By applying our method to 13 real-world temporal networks, we found that temporal-Topological correlation of contacts is more evident in virtual contact networks than in physical contact networks. This could be due to the lower cost and easier access of online communications than physical interactions, allowing and possibly facilitating social contagion, that is, interactions of one individual may influence the activity of its neighbours. We also identify different patterns between virtual and physical networks and among physical contact networks at, for example, school and workplace, in the formation of correlation in local neighbourhoods. Patterns and differences detected via our method may further inspire the development of more realistic temporal network models, that could reproduce jointly temporal and topological properties of contacts.
The multiplex relations between cities
A lexicon-based approach to detect urban systems
Cities relate to other cities in many ways, and much scholarly effort goes into uncovering those relationships. Building on the principle that strongly related cities will co-occur frequently in texts, we propose a novel method to classify those toponym co-occurrences using a lexicon-based text-mining method. Millions of webpages are analysed to retrieve how 293 Chinese cities are related in terms of six types: industry, information technology, finance, research, culture and government. Each class displays different network patterns, and this multiplexity is mapped and analysed. Further refinement of this lexicon-based approach can revolutionize the study of inter-urban relationships.
Temporal networks refer to networks like physical contact networks whose topology changes over time. Predicting future temporal network is crucial e.g., to forecast the epidemics. Existing prediction methods are either relatively accurate but black-box, or white-box but less accurate. The lack of interpretable and accurate prediction methods motivates us to explore what intrinsic properties/mechanisms facilitate the prediction of temporal networks. We use interpretable learning algorithms, Lasso Regression and Random Forest, to predict, based on the current activities (i.e., connected or not) of all links, the activity of each link at the next time step. From the coefficients learned from each algorithm, we construct the prediction backbone network that presents the influence of all links in determining each links future activity. Analysis of the backbone, its relation to the link activity time series and to the time aggregated network reflects which properties of temporal networks are captured by the learning algorithms. Via six real-world contact networks, we find that the next step activity of a particular link is mainly influenced by (a) its current activity and (b) links strongly correlated in the time series to that particular link and close in distance (in hops) in the aggregated network.
Small and medium enterprises (SME) are crucial for economy and have a higher exposure rate to default than large corporates. In this work, we address the problem of predicting the default of an SME. Default prediction models typically only consider the previous financial situation of each analysed company. Thus, they do not take into account the interactions between companies, which could be insightful as SMEs live in a supply chain ecosystem in which they constantly do business with each other. Thereby, we present a novel method to improve traditional default prediction models by incorporating information about the insolvency situation of customers and suppliers of a given SME, using a graph-based representation of SME supply chains. We analyze its performance and illustrate how this proposed solution outperforms the traditional default prediction approaches.
In this work, we explore the possibility of using a heterogeneous Susceptible-Infected-Susceptible SIS spreading process on an airline network to model airport congestion contagion with the objective to reproduce airport vulnerability. We derive the vulnerability of each airport from the US Airport Network data as the congestion probability of each airport. In order to capture diverse flight features between airports, e.g. frequency and duration, we construct three types of airline networks. The infection rate of each link in the SIS spreading process is proportional to its corresponding weight in the underlying airline network constructed. The recovery rate of each node is also heterogeneous, dependent on its node strength in the underlying airline network, which is the total weight of the links incident to the node. Such heterogeneous recovery rate is motivated by the fact that large airports may recover fast from congestion due to their well-equipped infrastructures. The nodal infection probability in the meta-stable state is used as a prediction of the vulnerability of the corresponding airport. We illustrate that our model could reproduce the distribution of nodal vulnerability and rank the airports in vulnerability evidently better than the SIS model whose recovery rate is homogeneous. The vulnerability is the largest at airports whose strength in the airline network is neither too large nor too small. This phenomenon can be captured by our heterogeneous model, but not the homogeneous model where a node with a larger strength has a higher infection probability. This explains partially the out-performance of the heterogeneous model. This proposed congestion contagion model may shed lights on the development of strategies to identify vulnerable airports and to mitigate global congestion by e.g. congestion reduction at selected airports.
In this paper, we aim to effectively suppress the spread of epidemic/information via blocking/removing a given fraction of the contacts in a temporal (time evolving) human contact network. We consider the SI (Susceptible- Infected) spreading process, on a temporal contact network to illustrate our methodology: an infected node infects a susceptible node with a probability β when a contact happens between the two nodes. We address the question: which contacts should be blocked in order to minimize the average prevalence over time. We firstly propose systematically a set of link properties (centrality metrics) based on the aggregated network of a temporal network, that captures the number of contacts between each node pair. Furthermore, we define the probability that a contact c(i, j, t) is removed as a function of the centrality of the corresponding link l(i, j) in the aggregated network as well as the time t of the contact. Each of the centrality metrics proposed can be thus regarded as a contact removal strategy. Empirical results on six temporal contact networks show that the epidemic can be better suppressed if contacts between node pairs that have fewer contacts are more likely to be removed and if contacts happened earlier are likely removed. A strategy tends to perform better when the average number contacts removed per node pair has a lower variance. Strategies that lead to a lower largest eigenvalue of the aggregated network after contact removal do not mitigate the spreading better. This contradicts the finding in static networks, that a network with a small largest eigenvalue tends to be robust against epidemic spreading, illustrating the complexity introduced by the underlying temporal networks.