Sem2Vec

Semantic Word Vectors with Bidirectional Constraint Propagations

Journal Article (2021)
Author(s)

Taygun Kekec (TU Delft - Electrical Engineering, Mathematics and Computer Science)

David M.J. Tax (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Pattern Recognition and Bioinformatics
DOI related publication
https://doi.org/10.1109/TKDE.2019.2942021 (final published version)
Publication Year
2021
Language
English
Issue number
4
Volume number
33
Article number
8840868
Pages (from-to)
1750-1762

Abstract

Word embeddings learn a vector representation of words, which can be utilized in a large number of natural language processing applications. Learning these vectors shares the drawback of unsupervised learning: representations are not specialized for semantic tasks. In this work, we propose a full-fledged formulation to effectively learn semantically specialized word vectors (Sem2Vec) by creating shared representations of online lexical sources such as Thesaurus and lexical dictionaries. These shared representations are treated as semantic constraints for learning the word embeddings. Our methodology addresses the size limitation and weak informativeness of these lexical sources by employing a bidirectional constraint propagation step. Unlike raw unsupervised embeddings, which exhibit low stability and are easily subject to changes under randomness, our semantic formulation learns word vectors that are quite stable. An extensive empirical evaluation on the word similarity task, comprising 11 word similarity datasets, is provided, where our vectors show notable performance gains over state-of-the-art competitors. We further demonstrate the merits of our formulation in a document text classification task over large collections of documents.
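For readers unfamiliar with the word similarity task mentioned in the abstract, the following is a minimal sketch of how such an evaluation is commonly run: cosine similarities between word vectors are compared against human similarity ratings using Spearman rank correlation. The abstract does not state which metric or protocol the authors use, so the choice of Spearman correlation, the function names (`cosine`, `ranks`, `spearman`, `evaluate`), and the toy vectors and ratings are all assumptions made for illustration, not the paper's setup or data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(vectors, rated_pairs):
    """Correlate model similarities with human ratings.

    Pairs containing a word without a vector are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, rating in rated_pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores)


# Hypothetical toy data, invented for illustration only.
toy_vectors = {
    "car": [1.0, 0.0],
    "auto": [0.9, 0.1],
    "bus": [0.7, 0.3],
    "tree": [0.0, 1.0],
}
toy_pairs = [("car", "auto", 9.0), ("car", "bus", 6.5), ("car", "tree", 1.0)]

if __name__ == "__main__":
    print(evaluate(toy_vectors, toy_pairs))
```

On the toy data the model ordering matches the human ordering exactly, so the correlation is close to 1.0. A real benchmark would typically report one such score per dataset.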