vote algorithm
Recently Published Documents


TOTAL DOCUMENTS

7
(FIVE YEARS 0)

H-INDEX

2
(FIVE YEARS 0)

Author(s):  
Yifei Shen ◽  
Qinjie Chu ◽  
Xinxin Yin ◽  
Yinjun He ◽  
Panpan Bai ◽  
...  

Abstract Gene expression profiling holds great potential as a new approach to histological diagnosis and precision medicine of cancers of unknown primary (CUP). Batch effects and different data types greatly decrease the predictive performance of biomarker-based algorithms, and few methods have been widely applied to identify tissue origin of CUP up to now. To address this problem and assist in more precise diagnosis, we have developed a gene expression rank-based majority vote algorithm for tissue origin diagnosis of CUP (TOD-CUP) of most common cancer types. Based on massive tissue-specific RNA-seq data sets (10 553) found in The Cancer Genome Atlas (TCGA), 538 feature genes (biomarkers) were selected based on their gene expression ranks and used to predict tissue types. The top scoring pairs (TSPs) classifier of the tumor type was optimized by the TCGA training samples. To test the prediction accuracy of our TOD-CUP algorithm, we analyzed (1) two microarray data sets (1029 Agilent and 2277 Affymetrix/Illumina chips) and found 91% and 94% prediction accuracy, respectively, (2) RNA-seq data from five cancer types derived from 141 public metastatic cancer tumor samples and achieved 94% accuracy and (3) a total of 25 clinical cancer samples (including 14 metastatic cancer samples) were able to classify 24/25 samples correctly (96.0% accuracy). Taken together, the TOD-CUP algorithm provides a powerful and robust means to accurately identify the tissue origin of 24 cancer types across different data platforms. To make the TOD-CUP algorithm easily accessible for clinical application, we established a Web-based server for tumor tissue origin diagnosis (http://ibi. zju.edu.cn/todcup/).


Author(s):  
Cagatay Catal ◽  
Serkan Tugul ◽  
Basar Akpinar

Software repositories consist of thousands of applications and the manual categorization of these applications into domain categories is very expensive and time-consuming. In this study, we investigate the use of an ensemble of classifiers approach to solve the automatic software categorization problem when the source code is not available. Therefore, we used three data sets (package level/class level/method level) that belong to 745 closed-source Java applications from the Sharejar repository. We applied the Vote algorithm, AdaBoost, and Bagging ensemble methods and the base classifiers were Support Vector Machines, Naive Bayes, J48, IBk, and Random Forests. The best performance was achieved when the Vote algorithm was used. The base classifiers of the Vote algorithm were AdaBoost with J48, AdaBoost with Random Forest, and Random Forest algorithms. We showed that the Vote approach with method attributes provides the best performance for automatic software categorization; these results demonstrate that the proposed approach can effectively categorize applications into domain categories in the absence of source code.


Sign in / Sign up

Export Citation Format

Share Document