Extracting location context from transcripts

A comparison of ELMo and TF-IDF

Bachelor thesis (2020)

Authors

D.V. Happel Electrical Engineering, Mathematics and Computer Science

Contributors

D.M.J. Tax Pattern Recognition and Bioinformatics - (mentor)

M. Loog Pattern Recognition and Bioinformatics - (mentor)

T.J. Viering Pattern Recognition and Bioinformatics - (mentor)

S. Makrodimitris Pattern Recognition and Bioinformatics - (mentor)

A. Naseri Jahfari Pattern Recognition and Bioinformatics - (mentor)

Faculty

Electrical Engineering, Mathematics and Computer Science

Natural Language Processing, Text Classification, Word embedding, TF-IDF, ELMo

To reference this document use:

http://resolver.tudelft.nl/uuid:ad4e3624-4f39-4a64-a678-c232e3f8d7da

Published Date

22-06-2020

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Using transcripts of the TV series FRIENDS, this paper explores the problem of predicting the location in which a sentence was spoken. The research applies feature extraction to the sentences and trains a logistic regression model on the resulting features. Specifically, it compares the performance of ELMo and TF-IDF as feature extractors, which achieve accuracies of 58% and 67%, respectively, on a binary classification task. The paper also explores the effect of several data cleaning techniques on the results.

Git repository containing the source code used in the paper - https://github.com/David-Happel/scene-location-NLP
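
To make the TF-IDF branch of the pipeline described in the abstract concrete, the following is a minimal sketch in standard-library Python: sentences are turned into TF-IDF vectors and a logistic regression model is trained on them for a two-class location label. It is not the code from the repository above. The example sentences, the two location labels, the smoothed IDF variant, and all function names (tokenize, fit_tfidf, transform, train_logreg, predict) are assumptions made purely for illustration.

```python
import math
from collections import Counter

def tokenize(sentence):
    # Lowercase, split on whitespace, and strip basic punctuation.
    words = (w.strip(".,!?\"'").lower() for w in sentence.split())
    return [w for w in words if w]

def fit_tfidf(sentences):
    # Build the vocabulary and a smoothed inverse document frequency per word.
    docs = [tokenize(s) for s in sentences]
    vocab = sorted({w for d in docs for w in d})
    doc_freq = Counter(w for d in docs for w in set(d))
    n = len(docs)
    idf = {w: math.log((1 + n) / (1 + doc_freq[w])) + 1.0 for w in vocab}
    return vocab, idf

def transform(sentence, vocab, idf):
    # Term count times IDF, then L2-normalise; unknown words are ignored.
    counts = Counter(tokenize(sentence))
    vec = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0

# Invented toy data (NOT from the transcripts used in the paper).
# Hypothetical labels: 1 = "coffee house", 0 = "apartment".
train_sentences = [
    "Can I get another coffee please",
    "This muffin and coffee are great",
    "Pass me the remote for the TV",
    "Who left the fridge open again",
]
train_labels = [1, 1, 0, 0]

vocab, idf = fit_tfidf(train_sentences)
X_train = [transform(s, vocab, idf) for s in train_sentences]
w, b = train_logreg(X_train, train_labels)

train_predictions = [predict(x, w, b) for x in X_train]
new_prediction = predict(transform("More coffee please", vocab, idf), w, b)
print(train_predictions, new_prediction)
```

The ELMo variant would replace only the transform step, feeding a sentence embedding from a pretrained model into the same classifier. That step is omitted here because it depends on an external model and library.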

Files

Research_Paper.pdf

(.pdf | 0.358 Mb)