A Knowledge Discovery Framework for Understanding Energy Consumption Behavior using Social Data

Understanding energy consumption behavior provides insight that can improve energy efficiency, promote energy conservation, and, importantly, sustain human life. Currently, however, energy consumption data are gathered by (smart) energy meters at the household level or through surveys. While data gathered by smart meters are highly reliable, they lack semantic information about how energy is consumed (e.g. through the use of appliances). Surveys, on the other hand, yield semantically rich data, but acquiring the data is labor-intensive.

In this context, social media data (e.g. Twitter, Instagram), which are semantically rich and publicly available, can be used as an alternative source of data about energy consumption behavior. However, due to the noisy and ambiguous nature of social media data, extracting energy-related information from microposts is very challenging. The aim of this thesis is to introduce a general framework for discovering knowledge about energy consumption behavior from social media data. The framework explores the suitability of social media data as an alternative data source for capturing energy consumption behavior, and thus as a complement to conventional data sources. Using state-of-the-art methods and approaches from the field of social media data analytics, we compose a framework structured into three main stages: data collection, data enrichment & processing, and data analysis & visualization.

To study the performance of our framework, we set up an experiment aimed at identifying energy consumption behavior patterns in two world cities: Jakarta (Indonesia) and Amsterdam (The Netherlands). In the data collection stage, we collected 1,306,336 tweets from both cities. Next, in the data enrichment & processing stage, we pre-processed the collected tweets and conducted dictionary-based annotation using our 8,329 energy consumption related terms. As a result, we identified 509,471 tweets (39% of the corpus) as energy consumption related, categorized into four types of energy consumption behavior: food, dwelling, mobility and leisure. Using the annotated streams as noisy datasets, we applied a distant supervision machine learning technique with a binomial classifier to identify energy consumption related tweets. With this approach, the classifier achieved good performance in identifying energy consumption related tweets. Finally, in the data analysis & visualization stage, we conducted a statistical analysis and found a strong positive correlation (r = 0.73) between the energy consumption data extracted from social media and the actual electricity load. Based on this result, we show that social media data have the potential to serve as a supplementary source of information for energy consumption studies.
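
As an illustration only, the dictionary-based annotation and the correlation analysis described above might be sketched as follows. This is a minimal sketch and not the thesis implementation: the term lists in `ENERGY_TERMS` are hypothetical stand-ins for the 8,329-term dictionary, the function names `annotate` and `pearson_r` are assumed for this example, and the tokenization is deliberately simple.

```python
import math
import re

# Hypothetical mini-dictionary for illustration; the actual 8,329 terms
# are not listed in this abstract.
ENERGY_TERMS = {
    "food": {"cooking", "oven", "microwave"},
    "dwelling": {"heater", "laundry", "vacuum"},
    "mobility": {"driving", "train", "taxi"},
    "leisure": {"tv", "gaming", "netflix"},
}


def annotate(tweet):
    """Return the set of behavior categories whose terms occur in the tweet."""
    # Lowercase and split into alphanumeric tokens (a simplifying assumption).
    tokens = set(re.findall(r"[a-z0-9]+", tweet.lower()))
    return {cat for cat, terms in ENERGY_TERMS.items() if tokens & terms}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (math.sqrt(var_x) * math.sqrt(var_y))
```

In such a setup, a tweet would count as energy consumption related when `annotate` returns a non-empty set, and `pearson_r` could then compare, for example, hourly counts of annotated tweets with hourly electricity load values.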