The field of oncology has benefited greatly from the study of mutational signatures, patterns of mutations that appear within the cancer genome. Previous research has focused on using various mathematical models to uncover and understand these mutational signatures from the genetic information of aggregated cells, typically sequenced from a tumor biopsy, which is referred to as bulk data. However, recent developments in sequencing techniques have made it possible to investigate genetic information at the single-cell level rather than in bulk. In this paper, we used machine learning-based tools to examine the effect of performing signature fitting at the single-cell level relative to pseudo-bulk. We found that single cells have a higher degree of expression than the pseudo-bulk, allowing a higher number of active mutational signatures to be identified. We also saw some single cells achieve better accuracy in the reconstruction of their mutational profile than the pseudo-bulk. We found that the heterogeneity across the single cells could be explained by a small number of clusters, which can potentially elucidate the active signatures found at the level of the pseudo-bulk sample. Finally, we found that some pseudo-bulk samples generated from subpopulations of cells unexpectedly deviate from the single cells from which they were created. Based on these findings, we believe that the study of active mutational signatures at the level of single cells has the potential to broaden our understanding of cancer by providing a more in-depth view of the disease. However, further research should be undertaken to either support or refute these findings, mainly because of the relatively small number of mutations characterizing our data and the absence of a ground truth for the bulk data.