
Random forest based potential k nearest neighbor classifier and its application in gene expression data

Journal: Systems Engineering - Theory & Practice
ISSN 1000-6788 CN 11-2267/N

Paper reference: 2012, 32(4): 815-825.
Publication Date: 2012-04-25

Random forests (RF) have been widely used in bioinformatics, especially in cancer diagnosis. This paper examines the RF classification scheme from the viewpoint of adaptive k nearest neighbors, analyzes the information loss in RF, and proposes a new voting method, the RF-based potential nearest neighbor (RF-PN) classifier, which also uses the information in the out-of-bag (OOB) samples of each tree and yields a significant improvement. Comparisons on 6 cancer gene expression datasets show that RF-PN achieves better predictive accuracy than RF.
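The abstract does not spell out the voting rule, so the following is only a minimal sketch of one plausible reading: in each tree, every training sample that lands in the same leaf as the test sample counts as a potential neighbor, whether it was in-bag or out-of-bag for that tree. The function name `pn_vote`, its arguments, and the per-tree weighting (each tree contributes a total weight of 1, split by the class composition of the leaf) are assumptions made for illustration, not the paper's definition. The leaf ids are assumed to be precomputed, for example by passing all training samples down each fitted tree.

```python
from collections import Counter


def pn_vote(train_leaves, train_labels, test_leaves):
    """Predict one test sample from its potential neighbors (hypothetical sketch).

    train_leaves[i][t]: leaf id of training sample i in tree t,
                        computed for in-bag and OOB samples alike.
    train_labels[i]:    class label of training sample i.
    test_leaves[t]:     leaf id of the test sample in tree t.
    """
    votes = Counter()
    for t, leaf in enumerate(test_leaves):
        # Potential neighbors in tree t: all training samples sharing the leaf.
        neighbors = [
            train_labels[i]
            for i, row in enumerate(train_leaves)
            if row[t] == leaf
        ]
        if not neighbors:
            continue  # no training sample reached this leaf
        # Assumed weighting: each tree distributes a total weight of 1
        # across classes in proportion to the leaf's composition.
        for label, count in Counter(neighbors).items():
            votes[label] += count / len(neighbors)
    # Return the class with the largest pooled weight, or None if no votes.
    return votes.most_common(1)[0][0] if votes else None


# Toy example: 4 training samples, 2 trees.
train_leaves = [[0, 0], [0, 1], [1, 1], [1, 1]]
train_labels = ["A", "A", "B", "B"]
prediction = pn_vote(train_leaves, train_labels, test_leaves=[0, 1])
```

In the toy example, the first tree's leaf holds two "A" samples and the second tree's leaf holds one "A" and two "B" samples, so "A" collects the larger pooled weight. A standard RF vote would instead use only the in-bag samples that shaped each leaf, which is the information loss the abstract refers to.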

                                    

Figure 1. Potential Nearest Neighbor & Decision Tree

Table 1. The difference of average classification accuracy between RF-PN and RF

Figure 2. Brain Tumor Dataset
(X axis: Forest Based On Different Decision Trees, Y axis: Average Classification Accuracy Rate)

Figure 3. DLBCL Dataset

Figure 4. Leukemia Dataset

Figure 5. Lung Cancer Dataset

Figure 6. Tumor 1 Dataset

Figure 7. Tumor 2 Dataset

Authors: