A Hybrid Approach for Sentence Similarity: Combining Semantic and Structural Similarity Metrics

Bachelor Thesis (2021)
Author(s)

W.G. Haakman (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Pradeep Kumar Murukannaiah – Mentor (TU Delft - Interactive Intelligence)

R. Marroquim – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Wout Haakman
Publication Year
2021
Language
English
Graduation Date
01-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Predicting the similarity of sentence pairs is essential for applications such as recommender systems and plagiarism detection. Existing approaches to sentence similarity fall into several categories. This paper combines approaches from two of them, semantic and structural similarity, to find a hybrid approach that aligns more closely with how humans determine similarity.
Extensive background research is conducted to understand the scope of the problem, the human psychology involved, and the advantages and disadvantages of existing approaches. Based on these insights, we propose a novel hybrid approach that combines semantic and structural similarity metrics. The proposed approach is evaluated on the STSS-131 and MSRP datasets and compared with other common approaches and with SentenceBERT, a deep learning algorithm. It does not perform as well as SentenceBERT, but it is more explainable and predicts sentence similarity more accurately than all other traditional machine learning approaches. The paper also provides a critical conclusion with recommendations for further improvements.
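
To make the idea of combining a semantic and a structural metric concrete, here is a minimal Python sketch. It is not the method proposed in the thesis, whose metrics are not specified in this abstract. As stand-ins it assumes a bag-of-words cosine similarity for the semantic part and a token-order match ratio (from the standard library's difflib) for the structural part. The function names and the weight alpha are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokenize(sentence):
    # Naive whitespace tokenizer (an assumption; a real system would do more).
    return sentence.lower().split()


def semantic_similarity(a, b):
    # Stand-in semantic metric: cosine similarity of bag-of-words counts.
    # It ignores word order entirely.
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def structural_similarity(a, b):
    # Stand-in structural metric: ratio of tokens that match in the same
    # relative order, so reordering words lowers the score.
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def hybrid_similarity(a, b, alpha=0.5):
    # Hypothetical weighted combination; alpha balances the two metrics.
    sem = semantic_similarity(a, b)
    struct = structural_similarity(a, b)
    return alpha * sem + (1 - alpha) * struct


if __name__ == "__main__":
    s1 = "the dog bites the man"
    s2 = "the man bites the dog"
    print(semantic_similarity(s1, s2))
    print(structural_similarity(s1, s2))
    print(hybrid_similarity(s1, s2))
```

In this sketch the two example sentences contain exactly the same words, so the bag-of-words metric alone cannot tell them apart, while the order-sensitive metric scores them lower. A weighted combination lands between the two, which is one way a hybrid score can reflect both meaning overlap and sentence structure while remaining easy to inspect.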

Files

License info not available