Sem2Vec

Semantic Word Vectors with Bidirectional Constraint Propagations

Journal Article (2021)

Author(s)

Taygun Kekec (TU Delft - Pattern Recognition and Bioinformatics)

David M.J. Tax (TU Delft - Pattern Recognition and Bioinformatics)

Research Group

Pattern Recognition and Bioinformatics

DOI related publication

https://doi.org/10.1109/TKDE.2019.2942021

Constraint propagation, Word embeddings, Embedding stability, Semantic embeddings, Thesaurus

To reference this document use:

https://resolver.tudelft.nl/uuid:270bc579-abbf-4cfb-bea9-d7c1c3821318

Publication Year

2021

Language

English

Issue number

4

Volume number

33

Pages (from-to)

1750-1762

Abstract

Word embeddings learn a vector representation of words, which can be utilized in a large number of natural language processing applications. Learning these vectors shares the drawback of unsupervised learning: representations are not specialized for semantic tasks. In this work, we propose a full-fledged formulation to effectively learn semantically specialized word vectors (Sem2Vec) by creating shared representations of online lexical sources such as Thesaurus and lexical dictionaries. These shared representations are treated as semantic constraints for learning the word embeddings. Our methodology addresses the size limitation and weak informativeness of these lexical sources by employing a bidirectional constraint propagation step. Unlike raw unsupervised embeddings, which exhibit low stability and are easily subject to change under randomness, our semantic formulation learns word vectors that are quite stable. An extensive empirical evaluation on the word similarity task, comprising 11 word similarity datasets, is provided, where our vectors suggest notable performance gains over state-of-the-art competitors. We further demonstrate the merits of our formulation on a document text classification task over large collections of documents.
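
For readers unfamiliar with the word similarity task mentioned in the abstract, the sketch below shows how such an evaluation is commonly scored: cosine similarity between word vectors is compared with human similarity ratings using Spearman rank correlation. This is an illustrative assumption about the standard protocol, not the authors' code. The function names, toy vectors, and ratings are all hypothetical and made up for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(vectors, pairs):
    """Score embeddings on (word1, word2, human_rating) triples.

    Pairs containing an out-of-vocabulary word are skipped.
    """
    predicted, gold = [], []
    for w1, w2, rating in pairs:
        if w1 in vectors and w2 in vectors:
            predicted.append(cosine(vectors[w1], vectors[w2]))
            gold.append(rating)
    return spearman(predicted, gold)


# Hypothetical toy data, invented purely for illustration.
toy_vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.2],
    "car": [0.0, 1.0],
    "truck": [0.3, 0.8],
}
toy_pairs = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.0),
]
score = evaluate(toy_vectors, toy_pairs)
```

A higher correlation means the ordering of word pairs by vector similarity agrees more closely with the ordering given by human raters.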

No files available

Metadata only record. There are no files for this record.