I.T. Kekec | TU Delft Repository

Sem2Vec

Semantic Word Vectors with Bidirectional Constraint Propagations

Journal article (2021) - Taygun Kekec, David M.J. Tax

Word embeddings learn vector representations of words, which can be utilized in a large number of natural language processing applications. Learning these vectors shares the drawback of unsupervised learning: the representations are not specialized for semantic tasks. In this work, we propose a full-fledged formulation to effectively learn semantically specialized word vectors (Sem2Vec) by creating shared representations of online lexical sources such as Thesaurus and lexical dictionaries. These shared representations are treated as semantic constraints for learning the word embeddings. Our methodology addresses the size limitation and weak informativeness of these lexical sources by employing a bidirectional constraint propagation step. Unlike raw unsupervised embeddings, which exhibit low stability and are easily subject to change under randomness, our semantic formulation learns word vectors that are quite stable. An extensive empirical evaluation on the word similarity task, comprising 11 word similarity datasets, is provided, where our vectors suggest notable performance gains over state-of-the-art competitors. We further demonstrate the merits of our formulation in a document text classification task over large collections of documents. ...

Exponential Word Embeddings: Models and Approximate Learning

Doctoral thesis (2019) - Taygun Kekec

The digital era floods us with an excessive amount of text data. To make sense of such data automatically, there is an increasing demand for accurate numerical word representations. The complexity of natural languages motivates representing words with high-dimensional vectors. However, learning in a high-dimensional space is challenging when the training data is noisy and scarce. Additionally, lingual dependencies complicate learning, mostly because computational resources are limited and typically insufficient to account for all possible dependencies. This thesis addresses both challenges by following a probabilistic machine learning approach to find vectors (word embeddings) that perform well under the aforementioned limitations. An important finding of this thesis is that bounding the length of the vector that represents a word, as well as penalizing the discrepancy between vectors representing different words, makes a word embedding robust, which is especially beneficial when only a small amount of noisy training data is available. This finding is important because it shows how sensitive current word embedding methods are to small variations in the training data. Although one might conclude from this finding that more data is no longer necessary, this thesis does show that training on multiple sources, such as dictionaries and thesauri, improves performance. However, each data source should be treated carefully, and it is important to weigh the informative parts of each data source appropriately. To deal with lingual dependencies, this thesis introduces statistical negative sampling, with which the learning objective of a word embedding can be approximated. There are many degrees of freedom in the approximated learning objective, and this thesis argues that current embedding strategies rely on weak heuristics to constrain these freedoms. Novel and more theoretically founded constraints are proposed to constrain the approximations that are based on global statistics and maximum entropy. Finally, many words in a natural language have multiple meanings, and current word embeddings do not address this because they are built on the common assumption that a single vector per word can capture all of its meanings. This thesis shows that a representation based on multiple vectors per word easily overcomes this limitation by having different vectors represent the different meanings of a word. Taken together, this thesis proposes new insights and a more theoretical foundation for word embeddings, which are important for creating more powerful models able to deal with the complexity of natural languages. ...

Boosted negative sampling by quadratically constrained entropy maximization

Journal article (2019) - Taygun Kekec, David Mimno, David M.J. Tax

Learning probability densities for natural language representations is a difficult problem because language is inherently sparse and high-dimensional. Negative sampling is a popular and effective way to avoid intractable maximum likelihood problems, but it requires correct specification of the sampling distribution. Previous state-of-the-art methods rely on heuristic distributions that appear to do well in practice. In this work, we define conditions for optimal sampling distributions and demonstrate how to approximate them using Quadratically Constrained Entropy Maximization (QCEM). Our analysis shows that state-of-the-art heuristics are restrictive approximations to our proposed framework. To demonstrate the merits of our formulation, we apply QCEM to matching synthetic exponential family distributions and to finding high-dimensional word embedding vectors for English. We are able to achieve faster inference on synthetic experiments and improve the correlation on semantic similarity evaluations on the Rare Words dataset by 4.8%. ...
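
For readers unfamiliar with the heuristic sampling distributions the abstract refers to, the following is a minimal sketch of one widely used heuristic: the unigram distribution raised to the power 0.75, as popularized by word2vec. It is not the QCEM method of the paper. The function names, the toy corpus, and the `exclude` parameter are illustrative assumptions.

```python
import random
from collections import Counter


def smoothed_unigram(tokens, power=0.75):
    """Return {word: probability} with probability proportional to count ** power.

    power=0.75 is the common word2vec heuristic; it is NOT the QCEM solution.
    """
    counts = Counter(tokens)
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def draw_negatives(dist, k, rng, exclude=()):
    """Draw k negative words from dist, never returning a word in exclude."""
    words = [w for w in dist if w not in exclude]
    probs = [dist[w] for w in words]
    return rng.choices(words, weights=probs, k=k)


# Toy corpus (hypothetical): "the" occurs 3 times, every other word once.
tokens = "the cat sat on the mat the end".split()
dist = smoothed_unigram(tokens)
negatives = draw_negatives(dist, k=5, rng=random.Random(0), exclude={"cat"})
```

Raising counts to a power below 1 moves probability mass from frequent words to rare ones: in the toy corpus, "the" has raw frequency 3/8 but receives a smaller share under the smoothed distribution.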

Supervising topic models with Gaussian processes

Journal article (2018) - Melih Kandemir, Taygun Kekeç, Reyyan Yeniterzi

Topic modeling is a powerful approach for modeling data represented as high-dimensional histograms. While the high dimensionality of such input data is extremely beneficial in unsupervised applications, including language modeling and text data exploration, it introduces difficulties in cases where class information is available to boost prediction performance. Feeding such input directly to a classifier suffers from the curse of dimensionality. Performing dimensionality reduction and classification disjointly, on the other hand, cannot achieve optimal performance, because information is lost in the gap between these two steps, which are unaware of each other. Existing supervised topic models, introduced as a remedy for such scenarios, have thus far incorporated only linear classifiers in order to keep inference tractable, causing a dramatic sacrifice of expressive power. In this paper, we propose the first Bayesian construction to perform topic modeling and non-linear classification jointly. We use the well-known Latent Dirichlet Allocation (LDA) for topic modeling and sparse Gaussian processes for non-linear classification. We combine these two components by a latent variable encoding the empirical topic distribution of each document in the corpus. We achieve a novel variational inference scheme by adapting ideas from the newly emerging deep Gaussian processes into the realm of topic modeling. We demonstrate that our model outperforms other existing approaches, namely (i) disjoint LDA and non-linear classification, (ii) joint LDA and linear classification, (iii) joint non-LDA linear subspace modeling and linear classification, and (iv) non-linear classification without topic modeling, on three benchmark data sets from two real-world applications: text categorization and image tagging. ...

Markov Random Field for Wind Farm Planning

Conference paper (2017) - Hale Cetinay-Iyicil, Taygun Kekec, Fernando Kuipers, David Tax

Many countries aim to integrate a substantial amount of wind energy in the near future. This requires meticulous planning, which is challenging due to the uncertainty in wind profiles. In this paper, we propose a novel framework to discover and investigate those geographic areas that are well suited for building wind farms. We combine the key indicators of wind farm investment using fuzzy sets, and employ multiple-criteria decision analysis to obtain a coarse wind farm suitability value. We further demonstrate how this suitability value can be refined by a Markov Random Field (MRF) that takes the dependencies between adjacent areas into account. As a proof of concept, we take wind farm planning in Turkey, and demonstrate that our MRF modeling can accurately find promising areas ...

Robust Gram Embeddings

Conference paper (2016) - Taygun Kekec, David Tax

Word embedding models learn vectorial word representations that can be used in a variety of NLP applications. When training data is scarce, these models risk losing their generalization abilities due to the complexity of the models and overfitting to finite data. We propose a regularized embedding formulation, called Robust Gram (RG), which penalizes overfitting by suppressing the disparity between target and context embeddings. Our experimental analysis shows that the RG model trained on small datasets generalizes better than the alternatives, is more robust to variations in the training set, and correlates well with human similarity judgments in a set of word similarity tasks. ...
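
To illustrate the idea of suppressing the disparity between target and context embeddings, here is a minimal sketch under stated assumptions. The abstract does not give the exact objective, so the squared Euclidean form of the penalty, the weight `lam`, and the skip-gram negative-sampling base loss are all assumptions made for illustration, not the paper's formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sgns_loss(target, context, negatives):
    """Skip-gram negative-sampling loss for one (target, context) pair.

    Assumed base loss; the abstract does not specify which one RG uses.
    """
    loss = -math.log(sigmoid(dot(target, context)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(target, neg)))
    return loss


def disparity_penalty(target_vecs, context_vecs):
    """Sum over words of the squared distance between a word's two vectors.

    Assumed penalty form: squared Euclidean distance per word.
    """
    return sum(
        sum((a - b) ** 2 for a, b in zip(t, c))
        for t, c in zip(target_vecs, context_vecs)
    )


def regularized_loss(pair_loss, target_vecs, context_vecs, lam=0.1):
    """Data loss plus lam times the target/context disparity (lam is hypothetical)."""
    return pair_loss + lam * disparity_penalty(target_vecs, context_vecs)


# Two-word toy vocabulary: row i holds the vector of word i.
T = [[1.0, 0.0], [0.0, 1.0]]   # target embeddings
C = [[1.0, 0.0], [0.0, 0.5]]   # context embeddings
pair = sgns_loss(T[0], C[1], negatives=[C[0]])
total = regularized_loss(pair, T, C, lam=0.1)
```

The penalty is zero when every word's target and context vectors coincide and grows as they drift apart, so a larger `lam` ties the two embedding tables more tightly together.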