Treatment Prediction in PDAC Patients: A Predictive Model for FOLFIRINOX Chemotherapy Response using Random Forest and Integrative Analysis of Blood and Tumor Markers

Abstract

Pancreatic ductal adenocarcinoma (PDAC) is a devastating disease with a high mortality rate and a poor prognosis: its 5-year survival rate is only 7.7% [1], compared to 65% for all cancer types [2]. Approximately 80% of patients are diagnosed at an advanced stage [3], for which palliative chemo(radio)therapy is the only remaining treatment option. However, the efficacy of chemotherapy varies among patients; for example, FOLFIRINOX has a response rate of only 25-30% in metastatic patients and a disease control rate of 70.95% [4]. Stratifying patients is therefore crucial, both for individual benefit and for addressing the socioeconomic challenge of rising healthcare costs and increasing cancer incidence. This master's thesis investigates the relationships between the tumor markers CA19-9 and CEA and blood marker data, measured both before and after one cycle of FOLFIRINOX, and their correlation with the chemotherapy response. The goal is then to develop a robust classification model that improves patient stratification and facilitates personalized treatment approaches. The analyzed cohort comprises 247 PDAC patients, of whom 55% are male and 45% are female. Among them, 152 had (borderline) resectable, 54 locally advanced, and 41 metastatic PDAC. All patients received FOLFIRINOX treatment, and tumor responses were categorized using RECIST 1.1 [5]. First, a thorough data analysis, including outlier analysis and principal component analysis (PCA), was conducted to identify patterns and relationships between and within variables. Subsequently, a robust classification model was built using random forest modeling, accounting for dataset imbalance. Three optimal models are proposed, based on (a) only pre-chemotherapy values, (b) values before and after the first FOLFIRINOX cycle, and (c) only the top 10 variables identified in (b).
For each of the most important variables, partial dependence and accumulated local effect plots were generated to gain further insight into their marginal effect on the classification outcomes. The initial data analysis revealed the prognostic and predictive significance of the tumor marker CA19-9, both before and after one cycle of treatment, as well as of the difference between these two levels. Additionally, various blood markers, including hemoglobin, thrombocytes, and γ-glutamyl transferase, showed associations with treatment outcomes. The assessment of variable importance further confirmed these relationships between tumor and blood markers and their impact on treatment response. However, PCA did not identify significant patterns or relationships within or between groups of blood markers. The developed random forest classification models achieved promising balanced accuracies of 0.97, 0.98, and 0.90 for models (a), (b), and (c), respectively, in stratifying PDAC patients into two distinct response groups (disease control and progressive disease), which can support treatment decision-making. In conclusion, this master's thesis emphasizes the crucial role of comprehensive and rigorous data analysis in PDAC research, particularly when machine learning is used to predict treatment outcomes. Integrating information from measured tumor and blood markers into the random forest models enables prediction of the response to FOLFIRINOX therapy both before and after one cycle. These findings can contribute to improved patient management, efficient allocation of resources, personalized treatment approaches, and further research and development efforts in PDAC. Further studies are required to validate and expand upon the presented results, ultimately advancing the field of personalized medicine in pancreatic cancer.
Keywords: pancreatic cancer, PDAC, prediction model, random forest, data analysis, outlier analysis, principal component analysis.
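
The abstract reports balanced accuracy rather than plain accuracy because the two response groups are imbalanced. The following minimal sketch, which is not the thesis code, illustrates why that choice matters. It assumes balanced accuracy is defined as the mean of the per-class recalls (the usual definition); the function names, the labels "DC" and "PD", and the toy label lists are hypothetical and not taken from the thesis data.

```python
from collections import defaultdict


def plain_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (NOT thesis data): 9 disease control, 1 progressive disease.
y_true = ["DC"] * 9 + ["PD"]
# A trivial classifier that always predicts the majority class.
y_majority = ["DC"] * 10

print(plain_accuracy(y_true, y_majority))     # high despite missing every PD case
print(balanced_accuracy(y_true, y_majority))  # penalized for the missed PD class
```

In this toy case the majority-class predictor looks strong under plain accuracy, yet its balanced accuracy shows that it never identifies a progressive-disease patient, which is the group a stratification model most needs to detect.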