Powerful world of (meta-)genomics held back by lack of Standard Operating Procedures

A computer science-oriented analysis of automated metagenomic approaches and pipelines, their common practices, and technical shortcomings

Abstract

Context: The study and analysis of (meta-)genomic data have been providing scientists with valuable insights into the composition and functioning of microbial communities. The latest advances in next-generation, high-throughput sequencing technologies have led to significant growth in the volume of data produced and made available for further research. These advances allow scientists to dive deeper into the analysis of uncultivated microbial populations that may play important roles in their environments. Gap: However, the analysis of such data requires multiple preprocessing and computational steps to interpret the microbial and genetic composition of samples. For most researchers, configuring the required preprocessing tools, linking them with advanced binning and annotation tools, and maintaining the provenance of the processing remain extremely challenging. Moreover, the most common issue with current metagenomic practice is the poor reproducibility of the research, caused by the complexity of setup and configuration. Aim: Our aim is to gain a big-picture understanding of the common practices and approaches for metagenomic analysis and to find out which ones researchers use most often and why. We further aim to compare some of the existing tools, explore the possibilities of developing and/or using a reproducible pipeline, and give general recommendations for doing so. Methods: For this purpose, three main methods were used. First, a literature survey of metagenomic analysis approaches, methodologies, and tools was performed. Next, researchers and scientists active in this field, with different educational backgrounds, were interviewed. Lastly, the process of pipeline construction and its bottlenecks were evaluated through hands-on experience. Findings: This research identified several common pitfalls and shortcomings in metagenomic analysis practice. Since most researchers in this field lack a fundamental computer science and programming background, very few attempt to develop a pipeline from scratch. Those who instead opt for “ready-made” General Purpose Pipelines (GPPs) face various difficulties in setting them up and configuring them to their needs. It was also observed that many existing metagenomic tools are not developed and maintained according to established software engineering standards; as a result, even the more popular tools can suffer from detrimental bugs that render them broken and, consequently, deprecated. However, with the emergence of new “all-in-one”, interface-based online platforms such as Kbase.us, which enable simple point-and-click setup and sharing of workflows, there is hope for entering a new era of reproducible metagenomic analysis.
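
As a concrete illustration of the provenance and reproducibility problem described above, the sketch below shows one possible way a scripted pipeline step could record which command was run, on which input data, and when. It is a minimal Python sketch under stated assumptions, not a tool or method from this work; the command names qc_tool and assembler and their flags are hypothetical placeholders standing in for real quality-control and assembly programs.

    # Minimal sketch of a provenance-aware pipeline step.
    # "qc_tool" and "assembler" below are hypothetical placeholders.
    import hashlib
    import json
    import subprocess
    from datetime import datetime, timezone
    from pathlib import Path

    def sha256(path: Path) -> str:
        """Checksum an input file so the exact data used can be traced later."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def run_step(name: str, cmd: list[str], inputs: list[Path], log: Path) -> None:
        """Run one pipeline step and append its provenance record to a JSON-lines log."""
        record = {
            "step": name,
            "command": cmd,
            "inputs": {str(p): sha256(p) for p in inputs},
            "started": datetime.now(timezone.utc).isoformat(),
        }
        subprocess.run(cmd, check=True)  # fail loudly instead of continuing silently
        record["finished"] = datetime.now(timezone.utc).isoformat()
        with log.open("a") as fh:
            fh.write(json.dumps(record) + "\n")

    if __name__ == "__main__":
        reads = Path("sample_reads.fastq")
        log = Path("provenance.jsonl")
        # Hypothetical commands standing in for real QC and assembly tools.
        run_step("quality_control", ["qc_tool", "--input", str(reads), "--out", "qc/"], [reads], log)
        run_step("assembly", ["assembler", "--reads", str(reads), "--out", "asm/"], [reads], log)

Even such a simple wrapper captures the information that interviewees reported losing most often: the exact command line, the exact input files, and the time of execution. Workflow managers and the interface-based platforms mentioned above automate this bookkeeping, which is precisely why they are attractive to researchers without a programming background.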