Title: Early DNA Analysis Using Incomplete DNA Data
Author: Li, Minfeng (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Al-Ars, Z. (mentor)
Degree granting institution: Delft University of Technology
Programme: Electrical Engineering | Embedded Systems
Date: 2018-08-23
Abstract: In the past few years, considerable attention has been paid to reducing the computational time of genome data analysis, which has eliminated critical computational bottlenecks in the analysis of DNA information. However, genome analysis remains time-consuming because DNA sequencing machines are slow: sequencing even a single sample can take days. This limits the speed of existing DNA analysis methods, since they all have to wait for the fully sequenced DNA data before starting the analysis. As a result, DNA analysis pipelines cannot benefit from the reduced computational analysis time. Recently, a new method called early DNA analysis was introduced, in which the genome analysis pipeline is started with incomplete DNA data before DNA sequencing finishes. This opens the door to decreasing the total time of DNA analysis, including the sequencing time. In this thesis, a parallel implementation of the early DNA analysis approach, based on the Apache Spark big data framework, is proposed to improve its performance. In addition, using incomplete DNA data sets causes a slight drop in the accuracy of genome analysis. The original method proposed a few simple methods to complete the unknown DNA data, but these can be improved. Therefore, a few new completion algorithms are also proposed and tested in this thesis to increase accuracy.
Results show that the proposed scalability solution for early DNA analysis achieves a 7.6× speed-up with 97.48% correctness when deployed on a 4-node Power7+ cluster, while one of the advanced completion algorithms increases the classification accuracy for unknown DNA data by 0.006%.
To reference this document use: http://resolver.tudelft.nl/uuid:bea21c14-1aa7-4f75-8cd4-b23f17589208
Part of collection: Student theses
Document type: master thesis
Rights: © 2018 Minfeng Li
Files: PDF Early_DNA_Analysis_Using_ ... A_Data.pdf (2.09 MB)