Random forest based potential k nearest neighbor classifier and its application in gene expression data

Systems Engineering - Theory & Practice

ISSN 1000-6788 CN 11-2267/N

Paper reference: 2012, 32(4): 815-825.

Random forests (RF) have been widely used in bioinformatics, especially in cancer diagnosis. This paper studies the classification scheme of RF from the viewpoint of adaptive k nearest neighbors, analyzes the information loss in RF, and proposes a new voting method, the RF-based potential nearest neighbor (RF-PN) classifier, which uses the information in the out-of-bag (OOB) samples of each tree and yields a significant improvement. Comparison results on 6 cancer gene expression datasets demonstrate that RF-PN achieves better predictive accuracy than RF.
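
The abstract does not spell out the voting rule, so the following is only a minimal sketch of one plausible reading: for each tree, every training sample that lands in the same leaf as the test sample is treated as a potential nearest neighbor and allowed to vote, whether or not it was in that tree's bootstrap sample. The function name `pooled_leaf_votes`, the toy leaf ids, and the pooled per-sample counting are assumptions made for illustration and are not taken from the paper; the in-bag variant is likewise a simplification of ordinary RF voting. In practice the leaf ids could come from any forest implementation, for example the `apply` method of scikit-learn's `RandomForestClassifier`.

```python
from collections import Counter


def pooled_leaf_votes(train_leaves, train_labels, test_leaves, in_bag=None):
    """Count class votes from training samples sharing a leaf with the test sample.

    train_leaves[i][t]: leaf id of training sample i in tree t
    test_leaves[t]:     leaf id of the test sample in tree t
    in_bag[i][t]:       True if sample i was in tree t's bootstrap sample;
                        None lets every training sample vote (in-bag and OOB).
    """
    votes = Counter()
    for t, leaf in enumerate(test_leaves):
        for i, label in enumerate(train_labels):
            if train_leaves[i][t] != leaf:
                continue  # not a neighbor of the test sample in this tree
            if in_bag is not None and not in_bag[i][t]:
                continue  # in-bag-only mode: OOB samples are ignored
            votes[label] += 1
    return votes


# Hypothetical toy forest: 4 training samples, 2 trees (all values are made up).
train_leaves = [[1, 3], [1, 4], [2, 3], [2, 3]]
train_labels = ["A", "B", "B", "B"]
in_bag = [[True, True], [False, True], [True, False], [True, False]]
test_leaves = [1, 3]

in_bag_votes = pooled_leaf_votes(train_leaves, train_labels, test_leaves, in_bag)
all_votes = pooled_leaf_votes(train_leaves, train_labels, test_leaves)

print(in_bag_votes.most_common(1)[0][0])  # A
print(all_votes.most_common(1)[0][0])     # B
```

In this toy case the OOB samples that share leaves with the test sample carry label B, so letting them vote changes the prediction from A to B. It illustrates the kind of information that in-bag-only voting discards, not the paper's reported results.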


Figure 1. Potential Nearest Neighbor & Decision Tree

Table 1. The difference of average classification accuracy between RF-PN and RF

Figure 2. Brain Tumor Dataset
(X axis: Forest Based On Different Decision Trees, Y axis: Average Classification Accuracy Rate)

Figure 3. DLBCL Dataset

Figure 4. Leukemia Dataset

Figure 5. Lung Cancer Dataset

Figure 6. Tumor 1 Dataset

Figure 7. Tumor 2 Dataset