Adjuvant systemic therapies such as chemotherapy and hormonal therapy, given to prevent breast cancer recurrence, are not necessary for every patient. In "Gene expression profiling predicts clinical outcome of breast cancer" (Nature 2002, PMID: 11823860), van 't Veer et al. introduced a method based on DNA microarray technology that tries to identify a gene expression profile associated with a high potential for recurrence, so that patients in this poor-prognosis group can be selected to receive adjuvant therapy. As a starting point for my further study of biomarker discovery on immune response protein microarrays, I reproduced the three-step classification procedure from this paper in R. While reproducing it, a random sampling test showed me that the hit gene list depends heavily on the training set. Besides the reproduction, I also tried other popular strategies that select genes as part of classification, such as recursive feature elimination and shrunken centroids.
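The random sampling test mentioned above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the R code used for the reproduction: the data are synthetic, genes are ranked by absolute Pearson correlation with the outcome (an assumed stand-in for the paper's gene selection step), and the helper names `top_genes`, `gene_list_stability` and `make_data` are hypothetical. The sketch repeatedly draws a random training subset, builds a top-k gene list from it, and measures how much the lists overlap.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def top_genes(expr, labels, k):
    """Return the indices of the k genes with the largest |correlation| to the labels."""
    scores = []
    for g in range(len(expr[0])):
        column = [row[g] for row in expr]
        scores.append((abs(pearson(column, labels)), g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:k]}


def gene_list_stability(expr, labels, k, n_rounds, frac, rng):
    """Mean pairwise Jaccard overlap of top-k lists built on random training subsets."""
    n = len(expr)
    m = int(n * frac)
    lists = []
    for _ in range(n_rounds):
        idx = rng.sample(range(n), m)  # random training subset, without replacement
        lists.append(top_genes([expr[i] for i in idx], [labels[i] for i in idx], k))
    overlaps = [len(a & b) / len(a | b) for a, b in combinations(lists, 2)]
    return sum(overlaps) / len(overlaps), lists


def make_data(n_samples, n_genes, n_informative, effect, rng):
    """Synthetic expression matrix: the first n_informative genes are shifted in class 1."""
    labels = [i % 2 for i in range(n_samples)]
    expr = []
    for y in labels:
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        for g in range(n_informative):
            row[g] += effect * y
        expr.append(row)
    return expr, labels


rng = random.Random(0)
expr, labels = make_data(n_samples=60, n_genes=200, n_informative=10, effect=0.8, rng=rng)
stability, lists = gene_list_stability(expr, labels, k=10, n_rounds=20, frac=0.7, rng=rng)
print(f"mean pairwise Jaccard overlap of top-10 gene lists: {stability:.2f}")
```

An overlap close to 1 would mean the hit list barely changes between training subsets; values well below 1 indicate that the list depends on which samples happened to be drawn.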
Conclusions: The hit gene list, the subset of genes most related to disease outcome, depends heavily on the training set. At the same time, the popular statistical methods for gene selection have an unavoidable drawback: because they rely on univariate ranking, they ignore the relationships between genes. Methods that select genes during the classification process, such as random forests and shrunken centroids, are therefore more promising.
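To make the drawback of univariate ranking concrete, here is a small toy construction in plain Python. It is an assumed illustration, not data from the study: two hypothetical binary "genes" `g0` and `g1` determine the outcome only through their combination (an XOR pattern). Each gene on its own has zero correlation with the outcome, so a univariate ranking would discard both, while a feature that combines them tracks the outcome perfectly.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


# Balanced toy design: every (g0, g1) combination appears 10 times.
g0 = [0, 0, 1, 1] * 10
g1 = [0, 1, 0, 1] * 10
outcome = [a ^ b for a, b in zip(g0, g1)]  # outcome depends on the pair, not on either gene

# Univariate view: score each gene separately against the outcome.
r_g0 = pearson(g0, outcome)
r_g1 = pearson(g1, outcome)

# Joint view: a centred interaction term that uses both genes at once.
interaction = [(a - 0.5) * (b - 0.5) for a, b in zip(g0, g1)]
r_joint = pearson(interaction, outcome)

print(f"gene 0 alone: r = {r_g0:.2f}")
print(f"gene 1 alone: r = {r_g1:.2f}")
print(f"interaction:  r = {r_joint:.2f}")
```

Real expression data are noisier and rarely this clean, but the same effect is the reason selection methods embedded in a multivariate classifier can retain genes that a one-gene-at-a-time ranking would miss.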