Classification of diverse bacterial populations

Bachelor Thesis (2018)

Author(s)

J. Uljee (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Y.C. de Vries (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Christine Anyansi – Mentor

Thomas Abeel – Coach

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Benchmark, Bioinformatics, Bacteria, Strains, Metagenomics, Sequencing, Metagenomic tools

To reference this document use:

https://resolver.tudelft.nl/uuid:9580b976-fd95-4798-a13d-86ac479ab3eb

Publication Year

2018

Language

English

Graduation Date

16-02-2018

Awarding Institution

Delft University of Technology

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Accurate diagnosis and treatment of patients infected with multiple strains of a pathogen is challenging. Whole genome sequencing has high potential to give insight into the microbial composition of human metagenomic samples, but distinguishing multiple strains of one species is difficult because their genetic content is highly similar. Several tools aimed at identifying different strains in metagenomic sequence data are currently available. We present an independent benchmark that compares the performance of several of these tools. The tools were evaluated on a variety of synthetic metagenomic samples containing strain mixtures of Enterococcus, Escherichia coli and Mycobacterium tuberculosis.
To facilitate this research, we built a benchmark framework in Python 3. The framework makes it possible to test the performance of tools that aim to unravel the composition of sequence data. It can automatically generate batches of metagenomic readsets with custom predefined properties, and the tools can then analyze those reads in a streamlined fashion. The output of each tool is converted to a standardized format to make comparing the tools easier.
This framework was built as part of our Bachelor End Project over the course of 10 weeks. In the first few weeks we became familiar with the domain of bioinformatics and the types of tools to be included in this research. Implementing the framework required a thorough understanding of the tools and took considerable time. Towards the end of the project, the framework was used to run the tools on a large variety of synthetic readsets. Analysis of these outputs resulted in an overview of the tools' capabilities, as presented in this paper.
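
To illustrate why a standardized output format simplifies comparison, the following is a minimal Python 3 sketch and not the framework's actual code. It assumes, hypothetically, that each tool's result has been reduced to a mapping from strain name to abundance, and that a synthetic readset comes with a known ground-truth mixture. The function names `normalize` and `score`, the strain labels, and the choice of metrics (precision, recall, and summed absolute abundance error) are all assumptions made for this example.

```python
from typing import Dict, Tuple


def normalize(abundances: Dict[str, float]) -> Dict[str, float]:
    """Scale raw per-strain values to relative abundances that sum to 1.

    Hypothetical common format: strain name -> relative abundance.
    Strains with a non-positive value are treated as absent.
    """
    present = {s: v for s, v in abundances.items() if v > 0}
    total = sum(present.values())
    if total == 0:
        return {}
    return {s: v / total for s, v in present.items()}


def score(truth: Dict[str, float],
          predicted: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (precision, recall, L1 abundance error) for one readset."""
    truth_n, pred_n = normalize(truth), normalize(predicted)
    # Strains that are both truly present and reported by the tool.
    true_pos = len(truth_n.keys() & pred_n.keys())
    precision = true_pos / len(pred_n) if pred_n else 0.0
    recall = true_pos / len(truth_n) if truth_n else 0.0
    # Sum of absolute abundance differences over every strain seen.
    strains = truth_n.keys() | pred_n.keys()
    l1_error = sum(abs(truth_n.get(s, 0.0) - pred_n.get(s, 0.0))
                   for s in strains)
    return precision, recall, l1_error


# Placeholder strain names; the synthetic mixture is 75% A and 25% B.
truth = {"strain_A": 75, "strain_B": 25}
predicted = {"strain_A": 60, "strain_C": 40}
precision, recall, l1_error = score(truth, predicted)
print(precision, recall, round(l1_error, 3))
```

Because every tool's output is reduced to the same mapping before scoring, the same `score` call can be applied to each tool and each readset in a batch without tool-specific comparison code.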

Files

BEP_mixedbugs_report_Final_2.p... (pdf)

(pdf | 2.95 Mb)

License info not available