Language Models With Meta-information

Abstract

Language modeling plays a critical role in natural language processing and understanding. Starting from a general structure, language models learn natural language patterns from rich input data. However, state-of-the-art language models exploit only the words themselves, which is not sufficient to characterize a language. In this thesis, we improve recurrent neural network language models (RNNLMs) by training them with additional information, and we propose different methods for integrating different types of this information into RNNLMs. We call all potential information beyond the word itself that can be used to characterize a language meta-information. We propose to use several types of meta-information: discourse-level information, which is reflected by the discourse as a whole; sentence-level information, which characterizes the patterns of sentences; and morphological information, which represents a word from different perspectives.

As an example, consider the following Dutch paragraph, in which <s> marks the beginning of a sentence and </s> its end:

<s> kan allemaal nog natuurlijk </s>
<s> maar ze ontlopen dan de groepswinnaar in elk geval in de kwartfinale </s>
<s> en vooral Nederland wil graag in Rotterdam die kwartfinale spelen </s>
<s> en dan moet er groepswinst behaald worden </s>
<s> anders verhuizen ze naar Brugge en krijgt het Jan Breydelstadion Oranje dus op bezoek </s>
<s> we gaan er even uit </s>
<s> slotfase zit eraan te komen </s>
<s> twee minuten nog tot het einde plus de toegevoegde tijd </s>
<s> dat is uh toch nog ook wel een paar minuten denk ik </s>
<s> maar de wedstrijd is gespeeld </s>

On the discourse level, this paragraph is labeled as "Live commentaries (broadcast)" from the socio-situational setting (SSS) perspective and as "sport" from the topic perspective. On the sentence level, each word except the sentence-beginning and sentence-ending symbols is annotated with information about its preceding and succeeding words. For example, in the sentence "<s> slotfase zit eraan te komen </s>" ("the final phase is approaching"), the word "slotfase" has the preceding information "<s>" and the succeeding information "zit eraan te komen". On the word level, the word "slotfase" is annotated with a vector containing some of the proposed meta-information.

On the discourse level, we investigate classification methods for socio-situational settings and topics. On the sentence level, we focus on information such as succeeding-word information and whole-sentence information. Throughout the thesis, each word is annotated with a vector containing the collected meta-information.

Different methods are proposed in this thesis to integrate the meta-information into language models. On the discourse level, a curriculum learning method is used to combine the socio-situational settings and topics. On the sentence level, forward-backward recurrent neural network language models are proposed to integrate succeeding-word information and whole-sentence information. On the word level, each word is conditioned on its preceding words as well as on preceding meta-information; a sketch of this conditioning follows below. The results reported in this thesis show that meta-information can improve the effectiveness of language models, at the cost of increased training time.
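To make the word-level conditioning concrete, the following is a minimal sketch, assuming an Elman-style RNNLM in which the meta-information vector is simply concatenated to the word embedding at each time step. All names, dimensions, and the concatenation choice are illustrative assumptions, not the exact models of the thesis.

import numpy as np

# Minimal Elman-style RNNLM step where each word is conditioned on its
# preceding words (via the recurrent state) and on a meta-information
# vector (e.g. SSS/topic or morphological features).
# All dimensions and names are illustrative, not the thesis's exact setup.

rng = np.random.default_rng(0)

V, E, M, H = 1000, 50, 10, 100            # vocab, embedding, meta, hidden sizes
W_emb = rng.normal(0, 0.1, (V, E))        # word embeddings
W_in  = rng.normal(0, 0.1, (E + M, H))    # (embedding + meta) -> hidden
W_rec = rng.normal(0, 0.1, (H, H))        # hidden -> hidden recurrence
W_out = rng.normal(0, 0.1, (H, V))        # hidden -> output logits

def step(word_id, meta_vec, h_prev):
    """One time step: combine the word embedding with its meta-information
    vector, update the hidden state, and predict the next word."""
    x = np.concatenate([W_emb[word_id], meta_vec])   # (E + M,)
    h = np.tanh(x @ W_in + h_prev @ W_rec)           # new hidden state
    logits = h @ W_out
    p_next = np.exp(logits - logits.max())
    p_next /= p_next.sum()                           # softmax over the vocabulary
    return p_next, h

h = np.zeros(H)
meta = rng.normal(0, 1, M)    # meta-information vector for the current word
p_next, h = step(word_id=42, meta_vec=meta, h_prev=h)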
To address the increased training time, we apply parallel processing techniques: a subsampling stochastic gradient descent algorithm is proposed to accelerate the training of recurrent neural network language models. The subsampling idea is sketched below.
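The following is a minimal sketch of subsampling SGD: each epoch performs plain SGD over a random fraction of the training data instead of the full set, trading some gradient quality for much shorter epochs. It is shown on a toy least-squares model rather than an RNNLM, and the sampling fraction, learning rate, and model are illustrative assumptions, not the thesis's exact algorithm.

import numpy as np

# Subsampling SGD sketch: train each epoch on a random subsample of the
# data. Toy least-squares regression stands in for RNNLM training here.

rng = np.random.default_rng(0)

N, D = 10_000, 20
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

w = np.zeros(D)
lr, sample_frac, epochs = 0.01, 0.1, 20   # train on 10% of the data per epoch

for epoch in range(epochs):
    idx = rng.choice(N, size=int(sample_frac * N), replace=False)
    rng.shuffle(idx)
    for i in idx:                         # plain SGD over the subsample only
        err = X[i] @ w - y[i]
        w -= lr * err * X[i]              # gradient of 0.5 * err**2

mse = np.mean((X @ w - y) ** 2)
print(f"MSE after subsampled training: {mse:.4f}")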