Prognostic Molecular Classification of Breast Cancer Based on Features Extracted from a Scale Space

Abstract

Breast cancer is one of the most prevalent cancers among women worldwide. In recent years, many cancer researchers have sought molecular prognostic tools that predict a patient's treatment response and/or chance of survival. In particular, gene expression signatures obtained by applying feature selection methods to large microarray datasets have shown potential. The main purpose of this study is to extend these gene signatures and molecular prognostic classifiers by investigating features constructed from a scale-space representation of the microarray data. We construct the scale space by first mapping all genes to a one-dimensional functional space using protein family information. Next, we apply successive smoothing to the expression values, yielding one scale-space representation of the gene expression data for each sample. At the lowest scale, the scale space contains the original gene expression values, whereas at higher scales meta-features are formed, which are weighted sums of groups of genes. To test whether a scale-space representation is useful, we performed feature selection and classification on a publicly available breast cancer expression dataset. We found that, instead of signatures consisting of single genes, meta-genes (i.e. groups of genes) that exist at higher scales were preferentially selected. Furthermore, we determined cross-validation errors using seven distinct classifiers (NMC, LDC, QDC, FISHERC, PARZENC, 3NNC, and LOGLC) and found that the scale-space representation gave better performance than the traditional representation of the gene expression data. We therefore conclude that scale-space analysis is a potent way of selecting molecular signatures and is useful for prognostic classification.
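
The scale-space construction described above can be illustrated with a minimal sketch for a single sample. It is not the authors' implementation. It assumes the genes have already been placed in a one-dimensional order (here simply the array order, standing in for the protein-family mapping), and it assumes Gaussian smoothing, since the abstract does not name the kernel. The function names `gaussian_kernel` and `scale_space` and the parameter `sigmas` are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized discrete Gaussian weights (assumed kernel; not stated in the source)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (x / sigma) ** 2)
    return weights / weights.sum()


def scale_space(expr, sigmas):
    """Build a scale space for one sample.

    expr   : 1-D expression values, already ordered along the gene axis
             (the ordering itself is assumed to be given).
    sigmas : increasing smoothing widths, one per higher scale (hypothetical parameter).

    Returns an array of shape (len(sigmas) + 1, n_genes). Row 0 holds the
    original values; each later row holds meta-features, i.e. weighted sums
    of neighbouring genes.
    """
    expr = np.asarray(expr, dtype=float)
    levels = [expr]
    for sigma in sigmas:
        kernel = gaussian_kernel(sigma)
        radius = len(kernel) // 2
        # Repeat the edge values so every scale keeps one value per gene position.
        padded = np.pad(expr, radius, mode="edge")
        levels.append(np.convolve(padded, kernel, mode="valid"))
    return np.vstack(levels)


if __name__ == "__main__":
    sample = np.array([0.2, 1.5, 1.7, 0.1, -0.4, 2.0, 1.9, 0.3])
    space = scale_space(sample, sigmas=[1.0, 2.0])
    print(space.shape)
```

Each row of the result could then be concatenated into one feature vector per sample, so that a feature selection method can choose between single genes (row 0) and meta-genes (higher rows). Smoothing the original values with increasing widths is used here in place of repeated smoothing of the previous level; for a Gaussian kernel the two are closely related, but this is a design choice of the sketch.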