Sentiment Analysis

a comparison of feature sets for social data and reviews

Master thesis (2018)

Authors

L.I. Kreuk (Electrical Engineering, Mathematics and Computer Science)

Contributors

N. Tintarev (supervisor 1)

G.J.P.M. Houben (supervisor 2)

Julián Urbano (supervisor 2)

Faculty

Electrical Engineering, Mathematics and Computer Science

Sentiment Analysis, Feature extraction, Reviews, Social data

To reference this document use:

http://resolver.tudelft.nl/uuid:eca6e7b5-a846-424b-ba44-84c060c29d97

Published Date

02-11-2018

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Consumers now share their experiences and opinions about products and brands through various channels, such as review websites and social media. Sentiment analysis predicts the sentiment of consumer text about these products or brands in order to understand the tone customers take towards them. This thesis addresses sentence-level sentiment analysis in the product domain. Three data types, all collected by Unilever, are used: review data, which is text containing a customer's opinion of a specific product; social data, which can be tweets, Facebook messages, Instagram messages, etc.; and phone data, which is a summary of a customer's phone call about a specific product.

One approach to sentiment analysis is to extract features from the data and give them to a machine learning algorithm together with sentiment labels assigned by human annotators. The machine learning algorithm then generates a classifier that can predict a label for a sentence.
In the sentiment analysis literature it is often unclear why certain features are chosen or for which data type they work well. In this research we compare how several feature sets perform on the different data types.
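
The feature-extraction and classification workflow described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the unigram-count feature set, the multinomial Naive Bayes learner, the function names, and the toy sentences are all assumptions chosen only to show how annotated sentences become a classifier.

```python
import math
from collections import Counter, defaultdict


def extract_features(sentence):
    """Hypothetical feature set: counts of lowercased, whitespace-split tokens."""
    return Counter(sentence.lower().split())


def train(sentences, labels):
    """Fit a multinomial Naive Bayes model on human-annotated sentences."""
    label_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # label -> token -> count
    for sentence, label in zip(sentences, labels):
        token_counts[label].update(extract_features(sentence))
    vocabulary = {tok for counts in token_counts.values() for tok in counts}
    return label_counts, token_counts, vocabulary


def predict(model, sentence):
    """Return the label with the highest log-probability for the sentence."""
    label_counts, token_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(token_counts[label].values()) + len(vocabulary)
        for tok, n in extract_features(sentence).items():
            if tok in vocabulary:  # tokens never seen in training are ignored
                score += n * math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data, standing in for annotated review sentences.
train_sentences = [
    "great soap smells lovely",
    "lovely shampoo works great",
    "terrible smell very disappointed",
    "disappointed the bottle leaked terrible",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = train(train_sentences, train_labels)
```

Calling `predict(model, "great lovely soap")` then returns a label for an unseen sentence. Swapping `extract_features` for a different function is one way to compare feature sets while keeping the learner fixed.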

We propose three feature sets for review data and three feature sets for social data. We focus on two aspects: comparing the different feature sets and comparing the data types. Our results show no significant differences in performance between the feature sets. They suggest that some feature sets might improve sentiment analysis for a specific data type, but that a general feature set with standard features can achieve comparable results.

Files

Thesis_report_Repository_.pdf

(.pdf | 1.73 Mb)