Comparison of Linguistic Language Classification based on Origin and Data Driven Language Classification using the IPA and Clustering

Bachelor Thesis (2021)

Author(s)

I.G.M. Rethans (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

T.J. Viering – Mentor

S. Makrodimitris – Mentor

A. Naseri Jahfari – Mentor

Faculty

Electrical Engineering, Mathematics and Computer Science

Clustering, Language Classification, Similarity

To reference this document use

https://resolver.tudelft.nl/uuid:d0ebe73c-b0db-49f0-b8f7-000de3927412

Publication Year

2021

Language

English

Graduation Date

01-07-2021

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Language similarity is useful for data enrichment in both Natural Language Processing (NLP) and Automatic Speech Recognition (ASR). A clustering algorithm could provide an efficient, data-driven means of defining language similarity. This research investigates the relation between linguistic classification by origin and data-driven classification based on the pronunciation of languages, using k-means clustering with a focus on the Indo-European languages. The results show large variation in the clustering outcomes and, consequently, large variation in their correspondence with the linguistic classification. This variation is caused by a relatively even spread of the data over the feature space. Still, the results indicate a significant relation between the two classification methods. Furthermore, this research serves as a foundation and a source of inspiration for future research.
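
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes each language has already been reduced to a numeric feature vector (how the thesis derives features from IPA transcriptions is not stated on this page), it uses a small synthetic data set with placeholder family labels "A" and "B", and it measures agreement with the plain Rand index, which is chosen here only for illustration. The names kmeans, rand_index, points, and families are hypothetical.

```python
import random
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means; returns one cluster index per input point."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def rand_index(a, b):
    """Fraction of item pairs on which two labelings agree
    (both place the pair together, or both place it apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Synthetic feature vectors (NOT real language data) with placeholder
# "family" labels standing in for the classification by origin.
points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.9, 1.0), (1.0, 0.9), (1.0, 1.1)]
families = ["A", "A", "A", "B", "B", "B"]

clusters = kmeans(points, k=2)
score = rand_index(families, clusters)
print("cluster assignments:", clusters)
print("Rand index vs. families:", score)
```

Because k-means depends on its random initialisation, rerunning with different seed values can yield different clusterings on data that is spread evenly over the feature space, which is one way to observe the kind of variation in cluster results that the abstract reports.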
