Extracting location context from transcripts
a comparison of ELMo and TF-IDF
D.V. Happel (TU Delft - Electrical Engineering, Mathematics and Computer Science)
David M. J. Tax – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
M. Loog – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Tom J. Viering – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
S. Makrodimitris – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Arman Naseri – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Git repository containing the source code used in the paper.
https://github.com/David-Happel/scene-location-NLP
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Using transcripts of the TV series FRIENDS, this paper explores the problem of predicting the location in which a sentence was said. The research focuses on extracting features from the sentences and training a logistic regression model on those features. Specifically, it compares the performance of ELMo and TF-IDF as feature extractors, which achieve accuracies of 58% and 67% respectively on a binary classification task. The paper also explores the effect of several data cleaning techniques on the results.
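The TF-IDF half of the approach described in the abstract can be illustrated with a minimal sketch. This is not the code from the repository: it uses only the Python standard library, the sentences and labels are invented toy data rather than lines from the transcripts, all function names are hypothetical, the IDF uses a common smoothed variant, and the logistic regression is trained with plain gradient descent without regularisation. ELMo is left out because it requires a pretrained model.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Lowercase, split on whitespace and strip basic punctuation.
    tokens = (w.strip(".,!?\"'").lower() for w in sentence.split())
    return [t for t in tokens if t]


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(doc, idf, vocab):
    # Term frequency times IDF, L2-normalised; words unseen in training are ignored.
    tf = Counter(t for t in doc if t in idf)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    # Batch gradient descent on the log loss.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(sentence, idf, vocab, w, b):
    # Returns 1 or 0 depending on which side of 0.5 the predicted probability falls.
    x = tfidf_vector(tokenize(sentence), idf, vocab)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Invented toy data: 1 = coffee house, 0 = apartment (labels are assumptions).
train_sentences = [
    "Can I get another coffee please",
    "This coffee tastes great",
    "Two muffins and a latte please",
    "Who left the fridge open",
    "Turn off the television",
    "The fridge is empty again",
]
train_labels = [1, 1, 1, 0, 0, 0]

docs = [tokenize(s) for s in train_sentences]
idf = fit_idf(docs)
vocab = sorted(idf)
X = [tfidf_vector(d, idf, vocab) for d in docs]
w, b = train_logistic_regression(X, train_labels)

train_predictions = [predict(s, idf, vocab, w, b) for s in train_sentences]
train_accuracy = sum(p == y for p, y in zip(train_predictions, train_labels)) / len(train_labels)
print("training accuracy:", train_accuracy)
print("'More coffee please' ->", predict("More coffee please", idf, vocab, w, b))
print("'Close the fridge' ->", predict("Close the fridge", idf, vocab, w, b))
```

On real transcripts, accuracy would be measured on held-out sentences rather than on the training set; the training accuracy printed here only confirms that the toy model fits its six examples.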