Topological Properties of Semantic Networks

Abstract

The main goal of this thesis is to understand the topological properties of semantic networks, to find language-specific patterns, and to investigate their connection principles. Interpreting unstructured natural-language text is a crucial task for computers, and Natural Language Processing (NLP) applications rely on semantic networks for structured knowledge representation. Although NLP technologies have been applied to various domains with some success, they still face many challenges due to the ambiguity of human language. To inform better algorithms, we need to understand the fundamental structures of semantic networks in different languages; however, these have not yet been investigated systematically. In this thesis, we extract semantic networks for 7 distinct relations and 11 languages from ConceptNet. We systematically analyze the degree distributions, degree correlations, and clustering of these networks, and we measure their structural similarity and complementarity coefficients. Our findings show that semantic networks share universal basic structural properties: they are sparse, highly clustered, and have power-law degree distributions, and the majority of the considered networks are scale-free. At the same time, networks in different languages exhibit distinct properties shaped by grammatical rules; for example, the networks of highly inflected languages show peaks in their degree distributions that deviate from a pure power law. Furthermore, we find that the connection principles of the networks differ depending on the type of semantic relation and the language: some networks are more similarity-based, while others are more complementarity-based. We conclude the thesis by demonstrating how knowledge of similarity and complementarity can better inform NLP in link prediction tasks.
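
As an illustration of the basic network statistics discussed above (sparsity, clustering, degree correlation, and power-law degree distributions), the following is a minimal Python sketch using the networkx and powerlaw packages. The Barabási–Albert graph here is only a synthetic stand-in for a ConceptNet-derived semantic network; loading the actual ConceptNet edges, as done in the thesis, is omitted.

import networkx as nx
import powerlaw  # pip install powerlaw

# Synthetic stand-in for a semantic network extracted from ConceptNet.
G = nx.barabasi_albert_graph(n=10_000, m=2, seed=42)

density = nx.density(G)                                  # sparsity: density << 1
clustering = nx.average_clustering(G)                    # local clustering
assortativity = nx.degree_assortativity_coefficient(G)   # degree correlation

# Fit a discrete power law to the degree sequence and compare it against
# a log-normal alternative; a positive R favors the power-law hypothesis.
degrees = [d for _, d in G.degree()]
fit = powerlaw.Fit(degrees, discrete=True)
R, p = fit.distribution_compare('power_law', 'lognormal')

print(f"density={density:.5f}, clustering={clustering:.3f}, "
      f"assortativity={assortativity:.3f}")
print(f"alpha={fit.power_law.alpha:.2f}, power law vs. log-normal: "
      f"R={R:.2f} (p={p:.3f})")

The likelihood-ratio comparison against a log-normal alternative is the test the powerlaw package provides for judging whether a degree sequence is plausibly scale-free. The structural similarity and complementarity coefficients mentioned in the abstract require dedicated path-census computations and are not shown in this sketch.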