Understanding the influence of DNA fragment lengths in detecting cancer

Detection of cancer using blood

Bachelor Thesis (2024)
Author(s)

M. Păun (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

I.B. Pronk – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Daan Hazelaar – Mentor (Erasmus MC)

S. Makrodimitris – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

M.J.T. Reinders – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.A. Pouwelse – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
28-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Detecting cancer at an early stage can change the course of the disease. Liquid biopsy of blood is a non-invasive examination that reveals biomarkers indicating whether or not a tumour is present in the body. This research examines the relevance of DNA fragments, specifically their lengths, to the detection of cancer. It presents an in-depth analysis of the fragment length distribution as a predictor of whether a patient is healthy or has cancer. The distribution was explored from four perspectives: the complete fragment length distribution, the size range from 90 to 150 bp, important lengths selected by feature extraction methods, and the Fourier Transform of the initial data. Each of these was used as input to three machine learning models. Using the fragment lengths between 93 and 98 bp produced accuracy and AUC scores above 0.85 for all supervised classification models. Processing the data with the Fourier Transform and using the spectrum amplitudes as features in the Random Forest model resulted in an AUC of 0.99.
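As an illustration only, the feature views named in the abstract could be derived from a list of fragment lengths roughly as follows. This is a minimal sketch and not the thesis code: the function names, the 1 to 400 bp histogram range, and the toy sample are all assumptions, and the feature-selection step is left out because the abstract does not say which methods were used. The naive DFT keeps the sketch dependency-free; a library routine such as `numpy.fft.rfft` would compute the same amplitudes.

```python
import cmath
from collections import Counter

# Assumed histogram range in bp (hypothetical, not taken from the thesis).
MIN_LEN, MAX_LEN = 1, 400


def length_distribution(lengths, lo=MIN_LEN, hi=MAX_LEN):
    """Normalised histogram: fraction of fragments at each length lo..hi."""
    counts = Counter(l for l in lengths if lo <= l <= hi)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments inside the requested length range")
    return [counts.get(l, 0) / total for l in range(lo, hi + 1)]


def size_range(dist, start, end, lo=MIN_LEN):
    """Slice of the distribution covering lengths start..end (inclusive)."""
    return dist[start - lo:end - lo + 1]


def fourier_amplitudes(dist):
    """Amplitude spectrum of the distribution (naive DFT, non-negative frequencies)."""
    n = len(dist)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(dist)))
        for k in range(n // 2 + 1)
    ]


# Toy sample: hypothetical fragment lengths for a single patient.
sample = [167] * 50 + [145] * 20 + [95] * 10 + [330] * 5

full = length_distribution(sample)   # view 1: complete distribution
short = size_range(full, 90, 150)    # view 2: 90 to 150 bp window
narrow = size_range(full, 93, 98)    # the 93 to 98 bp lengths from the abstract
spectrum = fourier_amplitudes(full)  # view 4: Fourier amplitudes
```

Each resulting list is one feature vector per patient; stacking them across patients would give the input matrix for a supervised classifier such as a Random Forest.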
