X. Zhan

15 records found

Multiple network embedding algorithms have been proposed to perform the prediction of missing or future links in complex networks. However, we lack the understanding of how network topology affects their performance, or which algorithms are more likely to perform better given the topological properties of the network. In this paper, we investigate how the clustering coefficient of a network, i.e., the probability that the neighbours of a node are also connected, affects network embedding algorithms’ performance in link prediction, in terms of the AUC (area under the ROC curve). We evaluate classic embedding algorithms, i.e., Matrix Factorisation, Laplacian Eigenmaps and node2vec, in both synthetic networks and (rewired) real-world networks with variable clustering coefficient. Specifically, a rewiring algorithm is applied to each real-world network to change the clustering coefficient while keeping key network properties. We find that a higher clustering coefficient tends to lead to a higher AUC in link prediction, except for Matrix Factorisation, which is not sensitive to the change of clustering coefficient. To understand such influence of the clustering coefficient, we (1) explore the relation between the link rating (probability that a node pair is the missing link) derived from the aforementioned algorithms and the number of common neighbours of the node pair, and (2) evaluate these embedding algorithms’ ability to reconstruct the original training (sub)network. All the network embedding algorithms that we tested tend to assign higher likelihood of connection to node pairs that share an intermediate or high number of common neighbours, independently of the clustering coefficient of the training network. Then, the predicted networks will have more triangles, thus a higher clustering coefficient. As the clustering coefficient increases, all the algorithms but Matrix Factorisation could also better reconstruct the training network. 
These two observations may partially explain why increasing the clustering coefficient improves the prediction performance. ...
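The two quantities this abstract relies on, the clustering coefficient and the number of common neighbours of a node pair, can be illustrated with a minimal sketch. The helper names and the toy graph below are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

def common_neighbours(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
adj = build_adjacency(edges)
print(average_clustering(adj))       # average over the four nodes
print(common_neighbours(adj, 0, 3))  # nodes 0 and 3 share neighbour 2
```

Closing the pair (0, 3) would create a new triangle through node 2, which is the mechanism by which predictions that favour pairs with common neighbours raise the clustering coefficient.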
Journal article (2021) - Jialin Bi, Ji Jin, Cunquan Qu, Xiuxiu Zhan, Guanghui Wang, Guiying Yan
Identifying important nodes in networks is essential to analysing their structure and understanding their dynamical processes. In addition, myriad real systems are time-varying and can be represented as temporal networks. Motivated by classic gravity in physics, we propose a temporal gravity model to identify important nodes in temporal networks. In gravity, the attraction between two objects depends on their masses and distance. For the temporal network, we treat basic node properties (e.g., static and temporal properties) as the mass and temporal characteristics (i.e., fastest arrival distance and temporal shortest distance) as the distance. Experimental results on 10 real datasets show that the temporal gravity model outperforms baseline methods in quantifying the structural influence of nodes. When using the temporal shortest distance as the distance between two nodes, the proposed model is more robust and more accurately determines the node spreading influence than baseline methods. Furthermore, when using the temporal information to quantify the mass of each node, we found that a novel robust metric can be used to accurately determine the node influence regarding both network structure and information spreading. ...
Journal article (2021) - Feng Hu, Lin Ma, Xiu Xiu Zhan, Yinzuo Zhou, Chuang Liu, Haixing Zhao, Zi Ke Zhang
The study of citation networks is of interest to the scientific community. However, the underlying mechanism driving individual citation behavior remains imperfectly understood, despite the recent proliferation of quantitative research methods. Traditional network models normally use graph theory to consider articles as nodes and citations as pairwise relationships between them. In this paper, we propose an alternative evolutionary model based on hypergraph theory in which one hyperedge can have an arbitrary number of nodes, combined with an aging effect to reflect the temporal dynamics of scientific citation behavior. Both theoretical approximate solution and simulation analysis of the model are developed and validated using two benchmark datasets from different disciplines, i.e. publications of the American Physical Society (APS) and the Digital Bibliography & Library Project (DBLP). Further analysis indicates that the attraction of early publications will decay exponentially. Moreover, the experimental results show that the aging effect indeed has a significant influence on the description of collective citation patterns. Shedding light on the complex dynamics driving these mechanisms facilitates the understanding of the laws governing scientific evolution and the quantitative evaluation of scientific outputs. ...
Doctoral thesis (2020) - X. Zhan
As an important carrier of information diffusion, social media has seen a huge increase in its number of users and has strongly affected how information diffuses. For example, Facebook and YouTube had attracted more than 1.6 and 1.3 billion users, respectively, by 2020. The use of the internet and online social networks has largely reduced the cost of information propagation and sharing. Besides user- and content-based features, social network properties are critical factors that may affect information diffusion. In this thesis, we focus on the influence of temporal network properties on information spreading. As researchers have shown that similar users tend to spread similar content, we further propose how to design network representation learning algorithms to better capture node similarity in a network. The first part of the thesis is mainly about how the local properties of nodes and links affect information spreading on temporal networks. Chapter 2 studies which links are likely to appear in an information diffusion trajectory. We simulate the information diffusion process by a susceptible-infected (SI) model on various empirical temporal networks. An information diffusion backbone is proposed to characterize the probability that a link appears in the diffusion trajectory. Due to the high complexity of constructing the diffusion backbone, we further propose the time-scaled weight to identify which links would appear in the diffusion backbone. Compared to the centrality metrics derived from static networks, the time-scaled weight shows better identification performance. The conclusions in this chapter may inspire how to maximize information diffusion on temporal networks by deliberately choosing links to transmit information. Chapter 3 investigates which links should be temporally blocked in order to suppress information diffusion on temporal networks. 
We rank the links by different blocking strategies based on the link properties on static and temporal networks, including the ones derived from the information diffusion backbone. We remove the links with high ranking values under each blocking strategy for a given time period. We show that four link blocking strategies outperform the others in suppressing information diffusion. The results show that the effectiveness of the metrics in suppressing information diffusion largely depends on the network properties. In chapter 4, we study how to identify influential nodes, i.e., nodes that can spread information widely when serving as the seed, on temporal networks. The information diffusion process is simulated by the susceptible-infected-recovered (SIR) model on various empirical temporal networks. We propose a temporal information gathering process (Tig-process), which iteratively gathers neighboring information through temporal paths, to identify influential nodes. Compared to the benchmark metrics, the Tig-process can better identify influential nodes across different temporal networks at a small cost. The experimental designs and results in these three chapters further inspire us to study the local surrounding properties of nodes and links for other spreading processes as well as other types of networks. In the second part of the thesis, we work on designing network embedding algorithms that embed nodes into a low-dimensional space in which similar nodes are close to each other. Chapter 5 designs a degree-biased random walk, i.e., DiaRW, to sample walks from a static network. If the source node of a random walk has a higher degree, the walk length tends to be longer. Also, if a random walker walks to a low-degree node, the probability of backtracking to the former high-degree node is higher. The node pairs generated from walks are further used as input for a learning model, i.e., the Skip-Gram model. 
We show that DiaRW outperforms baseline embedding algorithms on tasks such as link prediction and node classification. Chapter 6 proposes SI-spreading-based network embedding algorithms. We apply the SI model on static and temporal networks to sample trajectories. The node pairs generated from the trajectories are also used as input for the Skip-Gram model. We show that SI-spreading-based network embedding algorithms perform better than random-walk-based network embedding algorithms on the missing link prediction task. Both chapters consider node heterogeneity in designing embedding algorithms. The last chapter summarizes the insights of the thesis with respect to the research questions and outlines possible future directions related to our research. ...
Journal article (2020) - Chuang Liu, Nan Zhou, Xiuxiu Zhan, Gui-Quan Sun, Zi-Ke Zhang
There is currently growing interest in modeling information diffusion on social networks across multiple disciplines, including the prediction of news popularity, the detection of rumors and the influence of epidemiological studies. Following the framework of epidemic spreading, information spreading models assume that information can be transmitted from the known individuals (infected) to the unknown individuals (susceptible) through network interactions. During this process, individuals also constantly change their interactions, which in turn greatly influences the information spreading. In this work, we propose a mechanism considering the co-evolution between information states and network topology simultaneously, in which the information diffusion is executed as an SIS process and the network topology evolves based on the adaptive assumption. The theoretical analyses based on the Markov approach are highly consistent with simulation. Both simulation results and theoretical analyses indicate that the adaptive process, in which informed individuals rewire the links between the informed neighbors to a random non-neighbor node, can enhance information diffusion (leading to much broader spreading). In addition, we find that two threshold values exist for the information diffusion on adaptive networks, i.e., if the information propagation probability is less than the first threshold, information cannot diffuse and dies out immediately; if the propagation probability is between the first and second threshold, information will spread to a finite range and die out gradually; and if the propagation probability is larger than the second threshold, information will diffuse to a certain size of population in the network. These results may shed some light on understanding the co-evolution between information diffusion and network topology. ...
Journal article (2020) - Xiu Xiu Zhan, Ziyu Li, Naoki Masuda, Petter Holme, Huijuan Wang
Link prediction can be used to extract missing information, identify spurious interactions as well as forecast network evolution. Network embedding is a methodology to assign coordinates to nodes in a low-dimensional vector space. By embedding nodes into vectors, the link prediction problem can be converted into a similarity comparison task. Nodes with similar embedding vectors are more likely to be connected. Classic network embedding algorithms are random-walk-based. They sample trajectory paths via random walks and generate node pairs from the trajectory paths. The node pair set is further used as the input for a Skip-Gram model, a representative language model that embeds nodes (which are regarded as words) into vectors. In the present study, we propose to replace random walk processes by a spreading process, namely the susceptible-infected (SI) model, to sample paths. Specifically, we propose two susceptible-infected-spreading-based algorithms, i.e., Susceptible-Infected Network Embedding (SINE) on static networks and Temporal Susceptible-Infected Network Embedding (TSINE) on temporal networks. The performance of our algorithms is evaluated by the missing link prediction task in comparison with state-of-the-art static and temporal network embedding algorithms. Results show that SINE and TSINE outperform the baselines across all six empirical datasets. We further find that the performance of SINE is mostly better than TSINE, suggesting that temporal information does not necessarily improve the embedding for missing link prediction. Moreover, we study the effect of the sampling size, quantified as the total length of the trajectory paths, on the performance of the embedding algorithms. The better performance of SINE and TSINE requires a smaller sampling size in comparison with the baseline algorithms. Hence, SI-spreading-based embedding tends to be more applicable to large-scale networks. ...
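The idea of sampling node pairs from an SI spreading process rather than from random walks can be sketched as follows. This is only an illustration under stated assumptions and is not the SINE algorithm itself: the function name, the discrete-time update, the per-contact infection probability `beta` and the step limit `max_steps` are all hypothetical choices, and the Skip-Gram training step is omitted.

```python
import random

def si_sample_pairs(adj, seed, beta, max_steps, rng):
    """Run a discrete-time SI process from `seed` and return (infector, infected) pairs.

    adj: {node: set(neighbours)} for an undirected static network.
    beta: probability that an infected node infects a susceptible neighbour per step.
    """
    infected = {seed}
    pairs = []
    for _ in range(max_steps):
        newly = {}  # susceptible node -> node that infected it in this step
        for u in sorted(infected):
            for v in sorted(adj[u]):
                if v not in infected and v not in newly and rng.random() < beta:
                    newly[v] = u
        for v, u in newly.items():
            pairs.append((u, v))
        infected.update(newly)
        if len(infected) == len(adj):
            break  # every node has been reached
    return pairs

# Hypothetical toy network: a path 0 - 1 - 2 - 3.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
pairs = si_sample_pairs(adj, seed=0, beta=1.0, max_steps=10, rng=random.Random(0))
print(pairs)  # with beta = 1 the spread is deterministic along the path
```

In an embedding pipeline of the kind the abstract describes, pairs collected from many seeds would serve as the training input of a Skip-Gram model.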
Conference paper (2020) - Xiuxiu Zhan, Alan Hanjalic, Huijuan Wang
In this paper, we explore how to effectively suppress the diffusion of (mis)information via blocking/removing the temporal contacts between selected node pairs. Information diffusion can be modelled as, e.g., an SI (Susceptible-Infected) spreading process, on a temporal social network: an infected (information possessing) node spreads the information to a susceptible node whenever a contact happens between the two nodes. Specifically, the link (node pair) blocking intervention is introduced for a given period and for a given number of links, limited by the intervention cost. We address the question: which links should be blocked in order to minimize the average prevalence over time? We propose a class of link properties (centrality metrics) based on the information diffusion backbone [19], which characterizes the contacts that actually appear in diffusion trajectories. Centrality metrics of the integrated static network have also been considered. For each centrality metric, links with the highest values are blocked for the given period. Empirical results on eight temporal network datasets show that the diffusion-backbone-based centrality methods outperform the other metrics, whereas the betweenness of the static network performs reasonably well, especially when the prevalence grows slowly over time. ...
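The effect of blocking a node pair on the average prevalence can be sketched with a deterministic SI process, in which every contact between an infected and a susceptible node transmits the information. The function name, the contact format `(t, u, v)`, the choice to block the pair for the whole observation window and the toy data are assumptions made for illustration; the paper's actual setup may differ.

```python
from itertools import groupby

def si_average_prevalence(contacts, n_nodes, seed, blocked=()):
    """Average fraction of infected nodes over the distinct contact times.

    contacts: iterable of (t, u, v) contacts between nodes u and v at time t.
    blocked: node pairs whose contacts are ignored for the whole window.
    """
    blocked_pairs = {frozenset(p) for p in blocked}
    infected = {seed}
    prevalence = []
    for _, group in groupby(sorted(contacts), key=lambda c: c[0]):
        snapshot = set(infected)  # only nodes infected before time t can transmit
        for _, u, v in group:
            if frozenset((u, v)) in blocked_pairs:
                continue
            if u in snapshot:
                infected.add(v)
            if v in snapshot:
                infected.add(u)
        prevalence.append(len(infected) / n_nodes)
    if not prevalence:
        return len(infected) / n_nodes  # no contacts: only the seed is infected
    return sum(prevalence) / len(prevalence)

# Hypothetical contact sequence on four nodes.
contacts = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3)]
baseline = si_average_prevalence(contacts, n_nodes=4, seed=0)
with_block = si_average_prevalence(contacts, n_nodes=4, seed=0, blocked=[(1, 2)])
print(baseline, with_block)
```

Comparing the two values for each candidate pair is one simple way to rank blocking choices on a small example.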
Journal article (2019) - Yunyi Zhang, Zhan Shi, Dan Feng, Xiuxiu Zhan
Network embedding aims at learning node representation by preserving the network topology. Previous embedding methods do not scale for large real-world networks which usually contain millions of nodes. They generally adopt a one-size-fits-all strategy to collect information, resulting in a large amount of redundancy. In this paper, we propose DiaRW, a scalable network embedding method based on a degree-biased random walk with variable length to sample context information for learning. Our walk strategy can well adapt to the scale-free feature of real-world networks and extract information from them with much less redundancy. In addition, our method can greatly reduce the size of context information, which is efficient for large-scale network embedding. Empirical experiments on node classification and link prediction prove not only the effectiveness but also the efficiency of DiaRW on a variety of real-world networks. Our algorithm is able to learn the network representations with millions of nodes and edges in hours on a single machine, which is tenfold faster than previous methods. ...
Journal article (2019) - Cunquan Qu, Xiuxiu Zhan, Guanghui Wang, Jianliang Wu, Zi-ke Zhang
Many systems are dynamic and time-varying in the real world. Discovering the vital nodes in temporal networks is more challenging than in static networks. In this study, we propose a temporal information gathering (TIG) process for temporal networks. The TIG-process, as a node importance metric, can be used for node ranking. As a framework, the TIG-process can be applied to explore the impact of temporal information on the significance of the nodes. The key point of the TIG-process is that a node's importance relies on the importance of its neighborhood. There are four variables: temporal information gathering depth n, temporal distance matrix D, initial information c, and weighting function f. We observe that the TIG-process can degenerate to classic metrics by a proper combination of these four variables. Furthermore, the fastest-arrival-distance-based TIG-process (fad-tig) performs optimally in quantifying nodes' efficiency and nodes' spreading influence. Moreover, for the fad-tig process, we can find an optimal gathering depth n that makes the TIG-process perform optimally when n is small. ...
Journal article (2019) - Xiu-xiu Zhan, Alan Hanjalic, Huijuan Wang
Progress has been made in understanding how temporal network features affect the percentage of nodes reached by an information diffusion process. In this work, we explore further: which node pairs are likely to contribute to the actual diffusion of information, i.e., appear in a diffusion trajectory? How is this likelihood related to the local temporal connection features of the node pair? Such a deep understanding of the role of node pairs is crucial to tackle challenging optimization problems such as which kind of node pairs or temporal contacts should be stimulated in order to maximize the prevalence of information spreading. We start by using the Susceptible-Infected (SI) model, in which an infected (information possessing) node could spread the information to a susceptible node with a given infection probability β whenever a contact happens between the two nodes, as the information diffusion process. We consider a large number of real-world temporal networks. First, we propose the construction of an information diffusion backbone G_B(β) for an SI spreading process with an infection probability β on a temporal network. The backbone is a weighted network where the weight of each node pair indicates how likely the node pair appears in a diffusion trajectory starting from an arbitrary node. Second, we investigate the relation between the backbones with different infection probabilities on a temporal network. We find that the backbone topologies obtained for low and high infection probabilities approach the backbones G_B(β → 0) and G_B(β = 1), respectively. The backbone G_B(β → 0) equals the integrated weighted network, where the weight of a node pair counts the total number of contacts in between. Finally, we explore which local connection features of node pairs make them tend to appear in G_B(β = 1), and thus actually contribute to the global information diffusion. 
We discover that one local connection feature, among the many features we proposed, can well identify the (high-weight) links in G_B(β = 1). This local feature encodes the time at which each contact occurs, pointing out the importance of temporal features in determining the role of node pairs in a dynamic process. ...
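The contrast between the integrated weighted network and the backbone for β = 1 can be sketched on a small contact sequence. The construction below is a simplified reading of the abstract, not the paper's exact definition: it assumes contacts `(t, u, v)` with distinct timestamps, a deterministic SI process (β = 1), and a backbone weight equal to the fraction of seed nodes whose diffusion tree contains the pair. All function names are hypothetical.

```python
from collections import Counter

def diffusion_tree(contacts, seed):
    """Node pairs through which each node is first infected when beta = 1."""
    infected = {seed}
    tree = set()
    for _, u, v in sorted(contacts):
        if u in infected and v not in infected:
            infected.add(v)
            tree.add(frozenset((u, v)))
        elif v in infected and u not in infected:
            infected.add(u)
            tree.add(frozenset((u, v)))
    return tree

def backbone_beta_one(contacts, nodes):
    """Weight of a pair = fraction of seeds whose diffusion tree contains it."""
    counts = Counter()
    for seed in nodes:
        counts.update(diffusion_tree(contacts, seed))
    return {pair: c / len(nodes) for pair, c in counts.items()}

def integrated_network(contacts):
    """Weight of a pair = total number of contacts between its two nodes."""
    return Counter(frozenset((u, v)) for _, u, v in contacts)

# Hypothetical contact sequence on three nodes.
contacts = [(1, 0, 1), (2, 1, 2), (3, 0, 1), (4, 2, 0)]
backbone = backbone_beta_one(contacts, nodes=[0, 1, 2])
integrated = integrated_network(contacts)
print(backbone)
print(integrated)
```

In this toy sequence the pair (0, 2) has a contact, so it appears in the integrated network, but the contact happens after both nodes are already infected for every seed, so the pair never enters the β = 1 backbone. This illustrates why the timing of contacts, and not only their number, matters.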
Conference paper (2018) - Nan Zhou, Xiuxiu Zhan, Qiang Ma, Song Lin, Jun Zhang, Zi-Ke Zhang
The rapid development of the World Wide Web accelerates information spreading in various ways. Thanks to the emergence of multiple social platforms, events that would have attracted little attention in the past can now become social hot spots. In this paper, we study the information diffusion process of “IP MAN3 box office fraud”, which spread widely on the largest Chinese microblogging system, Sina Weibo, in March 2016. Based on the temporal metric we have proposed, we succeed in finding the sources of the information and constructing the panorama of the diffusion process. In addition, a portion of the nodes that promote the diffusion are identified by using node importance algorithms. Finally, the users with abnormal behaviors during the development of the event are identified. ...
Journal article (2018) - Xiuxiu Zhan, Chuang Liu, Gui-Quan Sun, Zi-Ke Zhang
Research on the interplay between the dynamics on the network and the dynamics of the network has attracted much attention in recent years. In this work, we propose an information-driven adaptive model, where disease and disease information can evolve simultaneously. For the information-driven adaptive process, susceptible (infected) individuals who are able to recognize the disease would break the links to their infected (susceptible) neighbors to prevent the epidemic from further spreading. Simulation results and numerical analyses based on the pairwise approach indicate that the information-driven adaptive process can not only slow down the speed of epidemic spreading, but can also significantly diminish the epidemic prevalence at the final state. In addition, the disease spreading and information diffusion patterns on the lattice as well as on a real-world network give visual representations of how the disease is trapped in an isolated field by the information-driven adaptive process. Furthermore, we perform a local bifurcation analysis on four types of dynamical regions, including healthy, a continuous dynamic behavior, bistable and endemic, to understand the evolution of the observed dynamical behaviors. This work may shed some light on understanding how information affects human activities in responding to epidemic spreading. ...
Journal article (2018) - Xiuxiu Zhan, Chuang Liu, Ge Zhou, Zi-Ke Zhang, Gui-Quan Sun, Jonathan J. H. Zhu, Zhen Jin
The interaction between disease and disease information on complex networks has facilitated an interdisciplinary research area. When a disease begins to spread in the population, the corresponding information would also be transmitted among individuals, which in turn influences the spreading pattern of the disease. In this paper, firstly, we analyze the propagation of two representative diseases (H7N9 and Dengue fever) in the real-world population and their corresponding information on the Internet, suggesting a high correlation between the two types of dynamical processes. Secondly, inspired by empirical analyses, we propose a nonlinear model to further interpret the coupling effect based on the SIS (Susceptible-Infected-Susceptible) model. Both simulation results and theoretical analysis show that a high prevalence of epidemic will lead to a slow information decay, consequently resulting in a high infected level, which shall in turn prevent the epidemic spreading. Finally, further theoretical analysis demonstrates that a multi-outbreak phenomenon emerges via the effect of coupling dynamics, which is in good agreement with empirical results. This work may shed light on the in-depth understanding of the interplay between the dynamics of epidemic spreading and information diffusion. ...
Journal article (2017) - Nan Zhou, Xiuxiu Zhan, Song Lin, Shang-Hui Yang, Chuang Liu, Gui-Quan Sun, Zi-Ke Zhang
Purpose - Information carriers (including mass media and We-Media) play important roles in information diffusion on social networks. The purpose of this paper is to investigate changes in the dissemination of information, combined with data analysis. Design/methodology/approach - This work analyzed nearly 200 years of coverage of different information carriers during different periods of human society, from the period of only mouth-to-mouth communication to the period of modern society. Information diffusion models are built to illustrate how information dynamics change with time, and are combined with box office data of several movies to predict the process of information diffusion. In addition, a metric is defined to identify which information would become news in the future. Findings - Results show that with the development of information carriers, information spreads faster and wider nowadays. The correctness of the proposed metric has been validated. Research limitations/implications - The structure of social networks influences the dissemination of information. There are an enormous number of factors that influence the formation of hotspots. Practical implications - The results and conclusions of this work will be of benefit in predicting the evolution of information carriers. The proposed metric will aid in searching for hot news in the future. Originality/value - This work may shed some light on a better understanding of information diffusion, spreading not only on social networks but also on the carriers used for the information spreading. ...
Review (2016) - Zi-Ke Zhang, Chuang Liu, Xiuxiu Zhan, Xin Lu, Chu-Xu Zhang, Yi-Cheng Zhang
The ongoing rapid expansion of the World Wide Web (WWW) greatly increases the effective transmission of information from heterogeneous individuals to various systems. Extensive research on information diffusion has been conducted by a broad range of communities including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and empirical studies, unification and comparison of different theories and approaches are lacking, which impedes further advances. In this article, we review recent developments in information diffusion and discuss the major challenges. We compare and evaluate available models and algorithms to investigate their physical roles and optimization designs, respectively. Potential impacts and future directions are discussed. We emphasize that information diffusion has great scientific depth and combines diverse research fields, which makes it interesting for physicists as well as interdisciplinary researchers. ...