Identifying memes and their interactions in online communities

Abstract

Memes are theorized to be the building blocks of culture. Due to a lack of empirical validation, however, the theory of memes — memetics — remains in its infancy. We argue that one of the missing components for such empirical validation is a method for the large-scale identification of memes.

In this thesis, we develop a method for identifying scientific memes (ngrams of length 1 through 4 that denote scientific concepts) propagating within online communities. Using science-oriented correspondence extracted from five communities on the online discussion platform Reddit and five communities on the online question-and-answer platform StackExchange, we perform two evaluations. In a large-scale automated evaluation, we find that memes identified in these communities correspond to the titles of Wikipedia articles. In a small-scale human evaluation, we find that the identified memes represent concepts relevant to the community’s scientific field.
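To make the notion of candidate ngrams concrete, the following minimal sketch enumerates every word ngram of length 1 through 4 in a handful of posts and counts them. It is an illustration only, not the identification method developed in the thesis: the tokenizer (lowercased alphanumeric runs), the function names `candidate_ngrams` and `count_candidates`, and the two example posts are all assumptions made here.

```python
import re
from collections import Counter


def candidate_ngrams(text, max_n=4):
    """Yield every word ngram of length 1..max_n found in a text.

    Tokenization is an assumption: lowercase runs of letters and digits.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def count_candidates(posts, max_n=4):
    """Count how often each candidate ngram occurs across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(candidate_ngrams(post, max_n))
    return counts


# Hypothetical posts, used only to exercise the functions.
posts = ["Dark matter bends light.", "Is dark matter made of particles?"]
counts = count_candidates(posts)
```

Even this small example shows why the candidate set grows quickly: a post of k tokens contributes up to 4k - 6 ngrams (for k of at least 4), so millions of candidates are plausible for a corpus of community discussions.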

Furthermore, we introduce a slight adaptation of this method to elucidate one of memetics’ predictions: the occurrence of interactions between memes, where the occurrence of one meme has a positive or negative influence on the propagation of another meme. To evaluate this method for the identification of meme interactions, we construct meme interaction networks, in which we find that the most central memes correspond to the most relevant scientific concepts.

We find that our methods are able to extract key concepts within online communities, identifying thousands of relevant concepts from millions of candidate ngrams. Thus, our method may contribute to contemporary text mining research, and could be used in place of, or in conjunction with, current approaches such as TF-IDF or LDA.
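For comparison, a TF-IDF baseline of the kind mentioned above can be sketched in a few lines. This is the textbook unsmoothed variant (relative term frequency times the logarithm of inverse document frequency), not the thesis's method; treating each community as a single document, the function name `tf_idf`, and the example terms are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score terms per document with unsmoothed TF-IDF.

    docs maps a document name to its list of terms (here, ngrams).
    Returns a dict: name -> {term: score}.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for name, terms in docs.items():
        tf = Counter(terms)
        total = len(terms)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return scores


# Hypothetical communities, each reduced to a short list of ngrams.
docs = {
    "physics": ["dark matter", "energy", "energy"],
    "biology": ["cell", "energy"],
}
scores = tf_idf(docs)
```

A term that appears in every community, such as "energy" here, receives a score of zero, which is the property that makes TF-IDF favor community-specific vocabulary.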