Scalable machine learning algorithms on a big data infrastructure

Master Thesis (2016)
Author(s)

C. Folkers

Contributor(s)

Z. Al-Ars – Mentor

Copyright
© 2016 Folkers, C.
Publication Year
2016
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine learning and big data are two currently popular topics in computer science. The two are often combined, for example to obtain powerful machines with learning capabilities or high-throughput data analysis programs. This research analyzes which machine learning techniques can be implemented efficiently on a scalable big data infrastructure. Several machine learning algorithms are analyzed and modified to scale on a multi-processor machine. Furthermore, this thesis investigates the scalability potential of an existing image segmentation pipeline, used for cancer diagnostics, that contains an artificial neural network. The neural network is implemented according to one of the proposed scalable algorithms on a PowerPC-7 cluster with 64 CPUs capable of running 256 threads. Despite a large overhead penalty, the pipeline's run time is still greatly reduced and the pipeline shows excellent scalability. This scalability allows larger input sets to be processed in the same execution time by expanding the platform's resources. This provides an opening for future research into improving the pipeline's diagnostic capability.

Files

Thesis.pdf
(pdf | 7.71 Mb)
License info not available