Using Skip-Gram Model to Predict from which Show a Given Line is

Bachelor Thesis (2020)

Author(s)

D. Chen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.J. Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

A. Naseri Jahfari – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Stavros Makrodimitris – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Natural Language Processing, Text Classification, Skip-Gram Model

To reference this document use:

https://resolver.tudelft.nl/uuid:82350585-b0ba-4664-a6f8-b77a7340114f

Publication Year

2020

Language

English

Graduation Date

22-06-2020

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Abstract

Text classification has a wide range of uses, such as extracting the sentiment of a product review, identifying the topic of a document, and detecting spam. In this research, the text classification task is to predict which TV show a given line comes from. The skip-gram model, originally used to train the Word2Vec word embeddings [Mikolov et al., 2013], is adapted to estimate how likely a sentence is to occur in a given TV show. A classifier built on this feature performs the prediction task. Cross-validation shows that the classifier reaches an accuracy of 58% on the transcript data of 3 shows and 43% on 4 shows, whereas random guessing would be expected to yield 33% and 25%, respectively. The difference between the neural networks and the skip-gram model shrinks as more shows are added to the evaluation. In the 5-fold cross-validation of both models, the best results appear in the middle iterations.
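The idea of scoring a line by its likelihood under each show and picking the most likely show can be illustrated with a minimal sketch. This is not the thesis implementation: it swaps the trained skip-gram network for a count-based estimate of P(context | center) with add-one smoothing, and the window size, the class and function names, and the toy transcript lines are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

WINDOW = 2  # assumed context window size, not taken from the thesis


def skipgram_pairs(tokens, window=WINDOW):
    """Yield (center, context) pairs within the window, as in skip-gram training."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


class ShowModel:
    """Count-based stand-in for a per-show skip-gram model (hypothetical)."""

    def __init__(self, lines, vocab):
        # +1 reserves probability mass for words never seen in training
        self.vocab_size = len(vocab) + 1
        self.pair_counts = defaultdict(Counter)
        self.center_counts = Counter()
        for line in lines:
            for center, context in skipgram_pairs(line.lower().split()):
                self.pair_counts[center][context] += 1
                self.center_counts[center] += 1

    def log_likelihood(self, line):
        """Sum of log P(context | center) over all pairs in the line."""
        total = 0.0
        for center, context in skipgram_pairs(line.lower().split()):
            seen = self.pair_counts.get(center, Counter())[context]
            # add-one smoothing keeps unseen pairs at non-zero probability
            prob = (seen + 1) / (self.center_counts[center] + self.vocab_size)
            total += math.log(prob)
        return total


def train(lines_by_show):
    """Build one model per show over a shared vocabulary."""
    vocab = {
        word
        for lines in lines_by_show.values()
        for line in lines
        for word in line.lower().split()
    }
    return {show: ShowModel(lines, vocab) for show, lines in lines_by_show.items()}


def predict(models, line):
    """Return the show whose model assigns the line the highest log-likelihood."""
    return max(models, key=lambda show: models[show].log_likelihood(line))


# Toy data invented for this sketch; real input would be show transcripts.
toy = {
    "show_a": ["the dragon burns the city", "winter is near the wall"],
    "show_b": ["the coffee shop is open", "we were on a break"],
}
models = train(toy)
print(predict(models, "the dragon is near"))
```

With k shows of equal size, a uniform random guess is correct with probability 1/k, which gives the 33% and 25% baselines quoted above for 3 and 4 shows.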

Files

Using_Skip_Gram_Dina.pdf

(pdf | 0.231 Mb)

License info not available