A Hybrid Approach for Sentence Similarity: Combining Semantic and Structural Similarity Metrics

Bachelor Thesis (2021)

Author(s)

W.G. Haakman (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Pradeep Kumar Murukannaiah – Mentor (TU Delft - Interactive Intelligence)

R. Marroquim – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Sentence Similarity, Hybrid, Structural, Semantic, Artificial intelligence

To reference this document use:

https://resolver.tudelft.nl/uuid:95d6bd83-a27a-41c3-9531-43145dd57d86

Publication Year

2021

Language

English

Graduation Date

01-07-2021

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Predicting the similarity of sentence pairs is essential for applications such as recommender systems and plagiarism detection. Existing approaches to predicting sentence similarity fall into several categories. This paper combines approaches from two of these categories, semantic and structural similarity, to find a hybrid approach that aligns more closely with how humans determine similarity.
Extensive background research is conducted to understand the scope of the problem, the relevant human psychology, and the advantages and disadvantages of existing approaches. Based on the insights from this research, we propose a novel hybrid approach that combines semantic and structural similarity metrics. The proposed approach is evaluated on the STSS-131 and MSRP datasets and compared with other common approaches and with SentenceBERT, a deep learning algorithm. The proposed approach does not perform as well as SentenceBERT, but it is more explainable and predicts sentence similarity more accurately than all other traditional machine learning approaches. The paper closes with a critical conclusion and recommendations for further improvements.
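
To make the idea of a hybrid score concrete, the following is a minimal sketch of how a semantic and a structural metric might be combined. The abstract does not state which metrics the thesis uses or how they are weighted, so everything here is an assumption for illustration: Jaccard word overlap stands in for the semantic metric, a token-order comparison via `difflib.SequenceMatcher` stands in for the structural metric, and the function names and the weight `alpha` are hypothetical.

```python
from difflib import SequenceMatcher


def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return sentence.lower().split()


def semantic_similarity(a, b):
    """Stand-in semantic metric: Jaccard overlap of the two word sets.

    Assumption: a real system would likely use word embeddings or a
    lexical database instead of exact word matches.
    """
    words_a, words_b = set(tokenize(a)), set(tokenize(b))
    if not words_a and not words_b:
        return 1.0  # two empty sentences are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def structural_similarity(a, b):
    """Stand-in structural metric: how well the token order lines up."""
    return SequenceMatcher(None, tokenize(a), tokenize(b)).ratio()


def hybrid_similarity(a, b, alpha=0.5):
    """Weighted sum of both metrics; alpha is a hypothetical weight in [0, 1]."""
    return alpha * semantic_similarity(a, b) + (1 - alpha) * structural_similarity(a, b)


if __name__ == "__main__":
    s1 = "the dog bit the man"
    s2 = "the man bit the dog"
    # Same words, different order: the semantic stand-in cannot tell the
    # sentences apart, while the structural stand-in penalises the reordering.
    print(semantic_similarity(s1, s2))
    print(structural_similarity(s1, s2))
    print(hybrid_similarity(s1, s2))
```

Because each component score is reported separately before being combined, a reader can see why a pair received its final score, which reflects the kind of explainability the abstract contrasts with deep learning models.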

Files

Wout_Haakman_Research_Paper_Fi... (pdf)

(pdf | 0.203 Mb)

License info not available