Integrated Use of Statistical-Based Approaches and Computational Intelligence Techniques for Tumors Classification Using Microarray

Discrete Dynamics in Nature and Society ◽

10.1155/2015/261013 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8

Author(s):

Chia-Ding Hou ◽

Yuehjen E. Shao

Keyword(s):

Microarray Data ◽

Computational Intelligence ◽

Cross Validation ◽

Expression Data ◽

Data Set ◽

Microarray Experiments ◽

Tumors Classification ◽

Microarray Chips ◽

Real Microarray Data ◽

Validation Experiments

With the recent development of biotechnologies, cDNA microarray chips are increasingly applied in cancer research. Microarray experiments can lead to a more thorough grasp of the molecular variations among tumors because they allow the expression levels of thousands of genes in cells to be monitored simultaneously. Accordingly, how to successfully discriminate tumor classes using gene expression data is an urgent research issue and plays an important role in the study of carcinogenesis. To reduce the high dimensionality of the gene data and classify tumor classes effectively, this study proposes several hybrid discrimination procedures that combine statistical-based techniques with computational intelligence approaches. A real microarray data set was used to demonstrate the performance of the proposed approaches. In addition, the results of cross-validation experiments reveal that the proposed two-stage hybrid models are more efficient in discriminating the acute leukemia classes than the established single-stage models.
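The abstract does not say which statistical or computational intelligence methods make up the two stages, so the following is only a minimal sketch of the general two-stage idea. It assumes a Welch t statistic for gene filtering (stage 1) and a nearest-centroid rule as a stand-in classifier (stage 2); the function names and both technique choices are illustrative assumptions, not the authors' procedure.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic for one gene measured in two classes."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def select_genes(samples, labels, k):
    """Stage 1 (assumed): indices of the k genes with the largest |t| between two classes."""
    first, second = sorted(set(labels))
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == first]
        b = [s[g] for s, y in zip(samples, labels) if y == second]
        scores.append((abs(welch_t(a, b)), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def nearest_centroid(samples, labels, genes, x):
    """Stage 2 (assumed stand-in): pick the class whose centroid over `genes` is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        members = [s for s, y in zip(samples, labels) if y == label]
        centroid = [mean([s[g] for s in members]) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

With made-up data of two genes per sample, `select_genes(samples, labels, 1)` keeps only the gene that separates the classes, and `nearest_centroid` then classifies a new sample on that gene alone. Each class needs at least two samples and non-constant expression, or the t statistic is undefined.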

Generalized Rank Tests for Replicated Microarray Data

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1093 ◽

2005 ◽

Vol 4 (1) ◽

Cited By ~ 3

Author(s):

Mei-Ling Ting Lee ◽

Robert J Gray ◽

Harry Björkbacka ◽

Mason W Freeman

Keyword(s):

Microarray Data ◽

Block Design ◽

Rank Test ◽

Expression Data ◽

Rank Tests ◽

Microarray Experiments ◽

Randomized Block Design ◽

Signed Rank ◽

Signed Rank Test ◽

Statistical Analysis Software

Gene expression data from microarray experiments have been studied using several statistical models. Significance Analysis of Microarrays (SAM), for example, has proved useful in analyzing microarray data. In the spirit of the SAM procedures, we develop permutation-based rank tests that generalize the Wilcoxon rank-sum test for two-group comparisons of replicated microarray data. For microarray experiments with a randomized block design, we also consider a generalized signed rank test. The statistical analysis software is written in R and is freely available as a package.
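As a rough illustration of the building block named here, the sketch below runs a plain permutation test on the Wilcoxon rank-sum statistic for a single gene in two groups. It is not the generalized test for replicated data described in the paper, and the function names, the default number of permutations, and the fixed seed are assumptions made for the example.

```python
import random

def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum(group_a, group_b):
    """Wilcoxon rank-sum statistic: sum of the pooled ranks of group_a."""
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)])

def permutation_pvalue(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the rank-sum statistic."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    expected = n_a * (len(pooled) + 1) / 2  # mean of the statistic under the null
    observed = abs(rank_sum(group_a, group_b) - expected)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(rank_sum(pooled[:n_a], pooled[n_a:]) - expected) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

Calling `permutation_pvalue(treated, control)` on two lists of expression values for one gene returns a p-value in (0, 1]. A genome-wide analysis would also need a multiple-testing step, which this sketch leaves out.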

Normalization for Two-channel Microarray Data

Methods of Information in Medicine ◽

10.1055/s-0038-1633987 ◽

2005 ◽

Vol 44 (03) ◽

pp. 418-422 ◽

Cited By ~ 1

Author(s):

C. Ittrich

Keyword(s):

Gene Expression ◽

Microarray Data ◽

Single Channel ◽

Single Channels ◽

Data Set ◽

Microarray Experiments ◽

Normalization Methods ◽

Sources Of Variation ◽

Many Sources ◽

Gene Expression Levels

Summary. Objectives: In two-channel microarray experiments, the measured gene expression levels are affected by many sources of systematic variation. Normalization refers to the process of removing such systematic sources of variation to make measured intensities within and between slides comparable. Some commonly used normalization methods that remove intensity-dependent dye bias and adjust for differences in variability between slides are reviewed, with the main focus on intensity-dependent normalization methods. Methods: This article describes different intensity-dependent within-slide normalization methods for the log ratios of red and green channel intensities, and also refers to single-channel normalization methods that incorporate all single channels of the slides at once. Results: The described procedures provide a useful approach to removing systematic sources of variation, such as intensity-dependent dye bias and variability between slides, in cDNA microarray experiments. This is illustrated by an experimental data set. Conclusions: Several reasonable normalization procedures for two-channel microarray data have recently been proposed. Deciding which method would perform well for a concrete experiment is difficult. Designed spike-in experiments or dilution series with known differences for some selected genes would help to assess the different methods, but may be impractical for most laboratories because of the high costs.
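To make "intensity-dependent within-slide normalization of log ratios" concrete, here is a small sketch. It computes the usual M (log ratio) and A (mean log intensity) values and then subtracts a binned median of M as a function of A. The binned median is a deliberately crude stand-in for a smooth fit such as loess, and the function names and bin count are assumptions for illustration.

```python
import math
from statistics import median

def ma_values(red, green):
    """Convert channel intensities to (M, A): log ratio and mean log intensity."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def binned_median_normalize(m, a, n_bins=4):
    """Subtract from each M the median M of spots with similar A.

    Spots are sorted by A and cut into n_bins groups of near-equal size;
    this approximates, very roughly, a smooth fit of M on A.
    """
    if not m:
        return []
    order = sorted(range(len(a)), key=lambda i: a[i])
    normalized = [0.0] * len(m)
    size = math.ceil(len(order) / n_bins)
    for start in range(0, len(order), size):
        chunk = order[start:start + size]
        center = median([m[i] for i in chunk])
        for i in chunk:
            normalized[i] = m[i] - center
    return normalized
```

If the red channel reads uniformly twice as bright as the green channel, every M equals 1 before normalization and 0 afterwards. With real data the bins need enough spots for the median to be stable.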

ASSOCIATION RULE MINING FOR GENE EXPRESSION DATA

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2014.1152 ◽

2014 ◽

pp. 240-243

Author(s):

O. V. KALE ◽

B. F. MOMIN

Keyword(s):

Microarray Data ◽

Association Rule ◽

Search Space ◽

Biological Research ◽

Research Association ◽

Expression Data ◽

Rule Mining ◽

Confidence Measure ◽

Data Set ◽

Space Experiments

Microarray technology has created a revolution in the field of biological research. Association rules can not only group similarly expressed genes but also discern relationships among genes. We propose a new row-enumeration rule mining method to mine high-confidence rules from microarray data. It is a support-free algorithm that directly uses the confidence measure to prune the search space effectively. Experiments on a leukemia microarray data set show that the proposed algorithm outperforms support-based rule mining with respect to scalability and rule extraction.
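The confidence measure that drives the pruning can be shown in a few lines. The sketch below only computes the standard confidence of a rule, the fraction of samples containing the antecedent that also contain the consequent; it does not implement the row-enumeration search itself. The item labels such as `g1_up` are hypothetical.

```python
def confidence(samples, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    samples: list of sets, each holding the discretized items
    (for example 'g1_up') observed in one microarray sample.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    covered = [s for s in samples if antecedent <= s]
    if not covered:
        return 0.0  # the rule never fires; confidence is undefined, report 0
    return sum(1 for s in covered if consequent <= s) / len(covered)
```

A rule can have high confidence while appearing in only a handful of samples, which is why a support-free method can surface rules that a minimum-support threshold would discard.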

RANK-BASED CLUSTERING ANALYSIS FOR THE TIME-COURSE MICROARRAY DATA

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720009004035 ◽

2009 ◽

Vol 07 (01) ◽

pp. 75-91 ◽

Cited By ~ 6

Author(s):

SUNG-GON YI ◽

YOON-JEONG JOO ◽

TAESUNG PARK

Keyword(s):

Microarray Data ◽

Clustering Analysis ◽

Time Course ◽

Testing Procedure ◽

Second Step ◽

Expression Data ◽

Clustering Methods ◽

Microarray Experiments ◽

Cancer Data ◽

Temporal Profiles

Microarray technology allows the monitoring of expression levels for thousands of genes simultaneously. In time-course microarray experiments, in which gene expression is monitored over time, we are interested in clustering genes that show similar temporal profiles and identifying genes that show a pre-specified candidate profile. Unfortunately, many traditional clustering methods used for analyzing microarray data do not effectively detect temporal profiles in time-course microarray data. We propose a rank-based clustering analysis for time-course microarray data. Our clustering method consists of two steps: the first step discretizes the expression data into groups and then transforms them into rank data, and the second step performs the rank-based clustering analysis. Our testing procedure uses bootstrap samples to select the genes that show patterns similar to the candidate profiles. A simulation study is performed to evaluate the performance of the proposed rank-based method. The results are illustrated with breast cancer data and Arabidopsis cold stress data.
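The abstract does not specify how the discretization is done, so the sketch below shows just one plausible reading of the first step: time points whose expression values lie within a tolerance of each other share a group, and groups are then numbered from lowest to highest. The grouping rule, the `tol` parameter, and the function name are all assumptions.

```python
def rank_profile(values, tol=0.5):
    """Discretize a time course into groups, then return each point's group rank.

    Sorted values within `tol` of their predecessor share a group;
    groups are numbered 1, 2, ... from lowest to highest expression.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    group = 0
    previous = None
    for i in order:
        if previous is None or values[i] - previous > tol:
            group += 1
        ranks[i] = group
        previous = values[i]
    return ranks
```

Two genes measured on different scales but with the same up-and-down shape map to the same rank vector, so they can be clustered together or compared against a candidate profile such as `[1, 2, 3]` for steadily increasing expression.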

Finding Groups in Gene Expression Data

Journal of Biomedicine and Biotechnology ◽

10.1155/jbb.2005.215 ◽

2005 ◽

Vol 2005 (2) ◽

pp. 215-225 ◽

Cited By ~ 16

Author(s):

David J. Hand ◽

Nicholas A. Heard

Keyword(s):

Cluster Analysis ◽

Regulatory Networks ◽

Large Scale ◽

Disease Diagnosis ◽

Expression Data ◽

Data Set ◽

Microarray Experiments ◽

Data Analyst ◽

Bump Hunting ◽

Function Discovery

The vast potential of the genomic insight offered by microarray technologies has led to their widespread use since they were introduced a decade ago. Application areas include gene function discovery, disease diagnosis, and inferring regulatory networks. Microarray experiments enable large-scale, high-throughput investigations of gene activity and have thus provided the data analyst with a distinctive, high-dimensional field of study. Many questions in this field relate to finding subgroups of data profiles which are very similar. A popular type of exploratory tool for finding subgroups is cluster analysis, and many different flavors of algorithms have been used and indeed tailored for microarray data. Cluster analysis, however, implies a partitioning of the entire data set, and this does not always match the objective. Sometimes pattern discovery or bump hunting tools are more appropriate. This paper reviews these various tools for finding interesting subgroups.

A Novel Hybrid Approach for Multi-objective Bi-clustering in Microarray data

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200414113840 ◽

2020 ◽

Vol 13 ◽

Author(s):

Naveen Trivedi ◽

Suvendu Kanungo

Keyword(s):

Gene Expression ◽

Microarray Data ◽

Optimization Algorithm ◽

Clustering Algorithm ◽

Expression Patterns ◽

Cell Cycle Gene ◽

Expression Data ◽

Data Set ◽

Multi Objective ◽

Whale Optimization

Background: Bi-clustering plays a vital role in analyzing gene expression data from microarray technology. The technique clusters the rows and columns of expression data simultaneously, determining the expression level of a set of genes under a subset of conditions or samples. The information obtained is collected as a sub-matrix of the microarray data that satisfies coherent expression patterns of a subset of genes with respect to a subset of conditions. These sub-matrices are called bi-clusters, and the overall process is called bi-clustering. In this paper, we propose a new hybrid meta-heuristic, ABC-MWOA-CC, which is based on the artificial bee colony (ABC) algorithm, a modified whale optimization algorithm (MWOA), and the Cheng and Church (CC) algorithm, to optimize the extracted bi-clusters. To validate this algorithm, we also examine the statistical and biological relevance of the extracted genes with respect to various conditions. Most bi-clustering techniques, however, do not address the biological significance of the genes belonging to extracted bi-clusters. Objective: The major aim of the proposed work is to design and develop a novel hybrid multi-objective bi-clustering approach for microarray data that produces the desired number of valid bi-clusters. These extracted bi-clusters are then optimized to obtain an optimal solution. Method: In the proposed approach, a hybrid multi-objective bi-clustering algorithm based on ABC together with MWOA is used to group the data into the desired number of bi-clusters. The ABC with MWOA multi-objective optimization algorithm is then applied to optimize the solutions using a variety of fitness functions. Results: The multi-objective functions employed in the fitness calculation, namely Volume Mean (VM), Mean of Genes (GM), Mean of Conditions (CM), and Mean of MSR (MMSR), improve the performance of the CC bi-clustering algorithm on real-life data such as the yeast Saccharomyces cerevisiae cell cycle gene expression data sets. Conclusion: The effectiveness of the ABC-MWOA-CC algorithm is demonstrated by comparing it with the well-known traditional ABC-CC, OPSM, and CC algorithms in terms of VM, GM, CM, and MMSR.
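The MSR in "Mean of MSR" is the mean squared residue that the Cheng and Church algorithm uses to score how coherent a bi-cluster is. A minimal sketch of that score follows; the function name and the list-of-lists matrix layout are assumptions, and the ABC and MWOA optimization steps are not shown.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng and Church mean squared residue of the sub-matrix rows x cols.

    Each residue is a_ij - (row mean) - (column mean) + (overall mean);
    a perfectly additive bi-cluster scores 0.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)
```

Lower values indicate more coherent bi-clusters, so a search procedure typically tries to keep the MSR below a threshold while making the bi-cluster as large as possible.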

EFFICIENTLY FINDING REGULATORY ELEMENTS USING CORRELATION WITH GENE EXPRESSION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720004000612 ◽

2004 ◽

Vol 02 (02) ◽

pp. 273-288 ◽

Cited By ~ 14

Author(s):

HIDEO BANNAI ◽

SHUNSUKE INENAGA ◽

AYUMI SHINOHARA ◽

MASAYUKI TAKEDA ◽

SATORU MIYANO

Keyword(s):

Gene Expression ◽

Dna Sequences ◽

Regulatory Elements ◽

Microarray Gene Expression Data ◽

Upstream Region ◽

Expression Data ◽

Microarray Gene Expression ◽

Data Set ◽

Microarray Experiments ◽

Time Linear

We present an efficient algorithm for detecting putative regulatory elements in the upstream DNA sequences of genes, using gene expression information obtained from microarray experiments. Based on a generalized suffix tree, our algorithm looks for motif patterns whose appearance in the upstream region is most correlated with the expression levels of the genes. We are able to find the optimal pattern in time linear in the total length of the upstream sequences. We implement our algorithm and apply it to publicly available microarray gene expression data, and show that our method is able to discover biologically significant motifs, including various motifs that have been reported previously using the same data set. We further discuss applications for which the efficiency of the method is essential, as well as possible extensions to our algorithm.

Towards computer-aided severity assessment via deep neural networks for geographic and opacity extent scoring of SARS-CoV-2 chest X-rays

Scientific Reports ◽

10.1038/s41598-021-88538-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

A. Wong ◽

Z. Q. Lin ◽

L. Wang ◽

A. G. Chung ◽

B. Shen ◽

...

Keyword(s):

Neural Networks ◽

Monte Carlo ◽

Lung Disease ◽

Disease Severity ◽

Deep Neural Networks ◽

Cross Validation ◽

X Rays ◽

Computer Aided ◽

Monte Carlo Cross Validation ◽

Validation Experiments

A critical step in effective care and treatment planning for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause for the coronavirus disease 2019 (COVID-19) pandemic, is the assessment of the severity of disease progression. Chest x-rays (CXRs) are often used to assess SARS-CoV-2 severity, with two important assessment metrics being extent of lung involvement and degree of opacity. In this proof-of-concept study, we assess the feasibility of computer-aided scoring of CXRs of SARS-CoV-2 lung disease severity using a deep learning system. Data consisted of 396 CXRs from SARS-CoV-2 positive patient cases. Geographic extent and opacity extent were scored by two board-certified expert chest radiologists (with 20+ years of experience) and a 2nd-year radiology resident. The deep neural networks used in this study, which we name COVID-Net S, are based on a COVID-Net network architecture. 100 versions of the network were independently learned (50 to perform geographic extent scoring and 50 to perform opacity extent scoring) using random subsets of CXRs from the study, and we evaluated the networks using stratified Monte Carlo cross-validation experiments. The COVID-Net S deep neural networks yielded R² of 0.664 ± 0.032 and 0.635 ± 0.044 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively, in stratified Monte Carlo cross-validation experiments. The best performing COVID-Net S networks achieved R² of 0.739 and 0.741 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively. The results are promising and suggest that the use of deep neural networks on CXRs could be an effective tool for computer-aided assessment of SARS-CoV-2 lung disease severity, although additional studies are needed before adoption for routine clinical use.
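Stratified Monte Carlo cross-validation repeatedly draws random train/test splits while keeping each stratum's share of the test set roughly fixed. The sketch below generates such splits and computes R² as one minus the ratio of residual to total sum of squares. The abstract does not state the test fraction, the strata, or which definition of R² was used, so those details and the function names are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(strata, test_fraction, rng):
    """Randomly split indices so each stratum sends about test_fraction of its items to the test set."""
    by_stratum = defaultdict(list)
    for index, label in enumerate(strata):
        by_stratum[label].append(index)
    train, test = [], []
    for indices in by_stratum.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))  # at least one test item per stratum
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

def monte_carlo_splits(strata, n_repeats=50, test_fraction=0.2, seed=0):
    """Yield n_repeats independent stratified (train, test) index splits."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        yield stratified_split(strata, test_fraction, rng)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In use, a model would be trained on each `train` index list and scored with `r_squared` on the matching `test` list, and the mean and standard deviation of the scores across repeats would be reported. Unlike k-fold cross-validation, the test sets of different repeats may overlap.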

Development of Ground Special Vehicle PHM with Case-Based Reason Model

Applied Sciences ◽

10.3390/app11104494 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4494

Author(s):

Qicai Wu ◽

Haiwen Yuan ◽

Haibin Yuan

Keyword(s):

Cross Validation ◽

Optimization Methods ◽

Engineering Practice ◽

Case Based Reasoning ◽

Case Retrieval ◽

Data Set ◽

Attribute Weights ◽

Special Vehicle ◽

Weight Allocation ◽

Case Based

The case-based reasoning (CBR) method can effectively predict the future health condition of a system based on past and present operating data records, so it can be applied to the prognostic and health management (PHM) framework, which is a type of data-driven problem-solving. The establishment of a CBR model for practical application of the Ground Special Vehicle (GSV) PHM framework is in great demand. Since the weight optimization methods of many CBR algorithms are too complicated, and effective knowledge and reasoning models are difficult to establish for engineering practice, this paper introduces an application developed with a CBR model that includes case representation, case retrieval, case reuse, and a simulated annealing algorithm. The purpose is to solve the problems of normal/abnormal determination and health performance degree prediction. Based on the proposed CBR model, optimization methods for attribute weights are described. State classification accuracy rate and root mean square error are adopted to set up objective functions. According to the reasoning steps, attribute weights are trained and put into case retrieval; after that, different rules of case reuse are established for these two kinds of problems. To validate the model performance of the application, a cross-validation test is carried out on a historical data set. A comparative analysis of the even weight allocation CBR (EW-CBR) method, the correlation coefficient weight allocation CBR (CW-CBR) method, and the SA weight allocation CBR (SA-CBR) method is carried out. Cross-validation results show that the proposed method achieves better results than the EW-CBR and CW-CBR models. The developed PHM framework has been applied in practice for over three years, and the proposed CBR model is an effective approach toward the best PHM framework solutions in practical applications.
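To show why attribute weights matter in case retrieval, here is a minimal sketch of weighted nearest-case lookup. It assumes cases are numeric attribute vectors compared by weighted Euclidean distance; the abstract does not give the similarity measure, and the simulated annealing step that tunes the weights is not shown. All names are hypothetical.

```python
import math

def weighted_distance(case_a, case_b, weights):
    """Weighted Euclidean distance between two attribute vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(case_a, case_b, weights)))

def retrieve(case_base, query, weights, k=1):
    """Return the k stored (attributes, outcome) cases closest to the query."""
    ranked = sorted(case_base, key=lambda case: weighted_distance(case[0], query, weights))
    return ranked[:k]
```

The same query can retrieve different cases under different weight vectors, which is what makes weight allocation (even, correlation-based, or optimized) the deciding factor in retrieval quality.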

Interclass Interference Suppression in Multi-Class Problems

Applied Sciences ◽

10.3390/app11010450 ◽

2021 ◽

Vol 11 (1) ◽

pp. 450

Author(s):

Jinfu Liu ◽

Mingliang Bai ◽

Na Jiang ◽

Ran Cheng ◽

Xianling Li ◽

...

Keyword(s):

Classification Accuracy ◽

Cross Validation ◽

Selection Process ◽

Interference Suppression ◽

Generalization Ability ◽

Suppression Effect ◽

Binary Classifiers ◽

The One ◽

Fold Cross Validation ◽

Validation Experiments

Multi-classifiers are widely applied in many practical problems. However, the features that can significantly discriminate a certain class from the others are often deleted in the feature selection process of multi-classifiers, which seriously decreases their generalization ability. This paper refers to this phenomenon as interclass interference in multi-class problems and analyzes its cause in detail. It then summarizes three interclass interference suppression methods, based on all features, one-class classifiers, and binary classifiers, and compares their effects on interclass interference through 10-fold cross-validation experiments on 14 UCI datasets. The experiments show that the method based on binary classifiers suppresses interclass interference efficiently and obtains the best classification accuracy among the three methods. Further experiments compare the suppression effect of two methods based on binary classifiers: the one-versus-one method and the one-versus-all method. The results show that the one-versus-one method achieves a better suppression effect on interclass interference and better classification accuracy. By proposing the concept of interclass interference and studying its suppression methods, this paper significantly improves the generalization ability of multi-classifiers.
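The one-versus-one scheme trains one binary classifier per pair of classes and combines them by voting. The sketch below shows only the voting step; the dictionary layout and function name are assumptions, and the binary classifiers are passed in as plain functions rather than trained here.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(classes, pairwise_classifiers, x):
    """Predict by majority vote over one binary classifier per class pair.

    pairwise_classifiers maps a (class_a, class_b) tuple, ordered as in
    `classes`, to a function that takes x and returns class_a or class_b.
    """
    votes = Counter(pairwise_classifiers[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]
```

Because each pairwise classifier can select its own features, a feature that separates only one pair of classes is not discarded for being uninformative overall. Ties in the vote fall to whichever class `Counter` encountered first, so a real implementation may want an explicit tie-break rule.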
