Microarray Breast Cancer Data Clustering Using Map Reduce Based K-Means Algorithm

Hymavathi Thottathyl; Kanadam Karteeka Pavan; Rajeev Priyatam Panchadula

doi:10.18280/ria.340610


Revue d'Intelligence Artificielle

2020

Vol 34 (6)

pp. 763-769


Keyword(s):

Breast Cancer

Microarray Data

Data Clustering

Clustering Algorithm

Great Increase

Clinical Pathology

Operating Period

Cancer Data

Genomic Knowledge

Micro Array

Breast cancer is one of the most common and most advanced cancers in women worldwide. Early diagnosis makes effective treatment possible, so many studies are developing approaches for detecting breast cancer early. Microarray data processing, an area whose research output has grown greatly over the last decade, is a potent tool for diagnosing diseases. Built on genomic knowledge, microarrays have changed the way clinical pathology recognizes, identifies, and classifies human diseases, particularly cancers. In this article, we examined breast cancer microarray data with the k-means clustering algorithm, but k-means alone was hard to scale to large volumes of microarray data. To address this, we use the MapReduce paradigm to evaluate breast cancer microarray data. Moreover, the efficiency of the parallel k-means model is measured in terms of its operating period (execution time), its scalability, and its total runtime.
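The MapReduce formulation of k-means mentioned above can be sketched in a few lines. This is a minimal single-machine sketch, not the authors' implementation: the data partitions are plain Python lists standing in for distributed data blocks, squared Euclidean distance is an assumed choice, and the function names (`map_partition`, `reduce_partials`, `mapreduce_kmeans`) are hypothetical. Each map task assigns its points to the nearest centroid and emits per-cluster partial sums and counts; the reduce step merges them and divides to get the new centroids; the driver repeats until the centroids stop moving.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Dict[int, Tuple[List[float], int]]  # cluster id -> (coordinate sums, point count)


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance to `point`."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def map_partition(partition: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Map task with a local combiner: per-cluster coordinate sums and counts for one partition."""
    partial: Partial = {}
    for point in partition:
        j = nearest(point, centroids)
        sums, count = partial.get(j, ([0.0] * len(point), 0))
        partial[j] = ([s + p for s, p in zip(sums, point)], count + 1)
    return partial


def reduce_partials(partials: Sequence[Partial], centroids: Sequence[Point]) -> List[Point]:
    """Reduce step: merge the partial sums from all map tasks and compute new centroids."""
    totals: Partial = {}
    for partial in partials:
        for j, (sums, count) in partial.items():
            total_sums, total_count = totals.get(j, ([0.0] * len(sums), 0))
            totals[j] = ([a + b for a, b in zip(total_sums, sums)], total_count + count)
    new_centroids: List[Point] = []
    for j, old in enumerate(centroids):
        if j in totals:
            sums, count = totals[j]
            new_centroids.append(tuple(s / count for s in sums))
        else:
            new_centroids.append(tuple(old))  # empty cluster: keep the previous centroid
    return new_centroids


def mapreduce_kmeans(
    partitions: Sequence[Sequence[Point]],
    initial_centroids: Sequence[Point],
    max_iter: int = 100,
    tol: float = 1e-9,
) -> List[Point]:
    """Driver loop: one map/reduce round per k-means iteration."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # On a real cluster these map calls would run in parallel, one per data block.
        partials = [map_partition(part, centroids) for part in partitions]
        new_centroids = reduce_partials(partials, centroids)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


# Toy example: three partitions of 2-D points (hypothetical values, not microarray data).
partitions = [
    [(1.0, 1.0), (1.5, 2.0)],
    [(8.0, 8.0), (9.0, 8.5)],
    [(1.0, 0.5), (8.5, 9.0)],
]
print(mapreduce_kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

Because each map task needs only the current centroids and its own partition, the map calls are the part a framework such as Hadoop would parallelize; emitting partial sums instead of individual points keeps the data sent to the reducer small.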


Use of microRNA (miR) expression profiling to identify distinct subclasses of triple-negative breast cancers (TNBC).

Journal of Clinical Oncology

10.1200/jco.2012.30.15_suppl.1007

2012

Vol 30 (15_suppl)

pp. 1007-1007

Cited By ~ 1

Author(s):

Charles L. Shapiro

Luciano Cascione

Pierluigi Gasparini

Francesca Lovat

Stefania Carasi

...

Keyword(s):

Breast Cancer

Overall Survival

Expression Profiling

Clustering Algorithm

Medical Center

Expression Profiles

Normal Breast

Rank Test

Cancer Data

Background: TNBC is divided into basal and non-basal subclasses. To further subclassify TNBC, we performed microRNA (miR) expression profiling and linked the profiles to patient overall survival. Methods: During 1996-2005, 365 consecutive TNBC (phenotypically estrogen-, progesterone- and HER2-negative by immunohistochemistry [IHC]) were identified from the NCCN Breast Cancer Data Base/Tumor Registry at OSU Medical Center. One hundred fifty-eight (43%) formalin-fixed paraffin-embedded (FFPE) breast cancer and 40 normal breast tissue blocks were available, and tissue cores were obtained for RNA. RNA was isolated using the Ambion RecoverAll Total Nucleic Acid Isolation Kit, and the expression of ~700 miRs was assessed for each sample using the NanoString nCounter method. A consensus-clustering algorithm (ConsensusClusterPlus, Bioconductor www.bioconductor.org) was used to identify subclasses of TNBC, and Kaplan-Meier overall survival curves were compared using the log-rank test. Censoring occurred at the date of death from causes other than breast cancer or at the time of the last known follow-up, whichever occurred first. The median follow-up was 67 mo. (range 4-171 mo.). Results: The median age was 52 yrs. (range 20-84 yrs.); 81% were white and 9% African-American; stages I, II, and III were 31%, 54%, and 15%, respectively; and most patients received adjuvant anthracycline-based regimens with (25%) or without (75%) taxanes. The algorithm identified 5 distinct subclasses: 1 clustered with normal breast miR expression, whereas the other 4 each had a unique pattern of deregulated miRs. The median overall survivals were significantly different across the 5 cancer subclasses (log-rank p=0.028) (Table). Conclusions: miR expression profiling identifies and discriminates 5 TNBC subclasses, which do not coincide with those identified as basal and non-basal by IHC.
Molecular analyses are ongoing to associate the miR-based subclasses with specific clinical features or the expression of specific pathways. [Table: see text]
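The survival comparison in this abstract rests on two standard tools: the Kaplan-Meier product-limit estimator and the log-rank test. The sketch below is illustrative only; the abstract names ConsensusClusterPlus for the clustering but not the software used for the survival analysis. For brevity the log-rank function handles two groups (comparing five subclasses, as the study did, needs the k-group form with k - 1 degrees of freedom), and the function names and toy follow-up times are hypothetical. An event value of 1 marks a death from breast cancer and 0 a censored observation, following the censoring rule in the Methods.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S(t) at each distinct event time (events[i] = 1: death, 0: censored)."""
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_two_groups(
    times_a: Sequence[float], events_a: Sequence[int],
    times_b: Sequence[float], events_b: Sequence[int],
) -> Tuple[float, float]:
    """Two-sample log-rank test: returns (chi-square statistic, p-value with 1 degree of freedom)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    o_minus_e = 0.0  # observed minus expected deaths in group A
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n_a = sum(1 for u, a in zip(times, in_a) if u >= t and a)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d_a = sum(1 for u, e, a in zip(times, events, in_a) if u == t and e and a)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group-A death count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; the groups cannot be compared")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical follow-up times in months.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
print(logrank_two_groups([4, 10, 12, 20], [1, 1, 0, 1], [30, 45, 60, 70], [1, 0, 0, 0]))
```

Censored patients stay in the risk set up to their last follow-up and then leave it without counting as deaths, which is how both functions use the `events` flags.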


Association Analysis of Obesity/Overweight and Breast Cancer Using Data Mining Techniques

Frontiers in Health Informatics

10.30699/fhi.v10i1.255

2021

Vol 10 (1)

pp. 60

Author(s):

Mahsa Dehghani Soufi

Reza Ferdousi

Keyword(s):

Breast Cancer

Data Mining

Hierarchical Clustering

Cancer Progression

Clustering Algorithm

Breast Cancer Incidence

Research Work

Model Assessment

Cancer Data

Data Mining Algorithms

Introduction: Growing evidence has shown that some overweight factors could be implicated in tumorigenesis and in higher recurrence and mortality. In addition, the association between various overweight factors and breast cancer has not been extensively explored. The goal of this research was to explore and evaluate the association between various overweight/obesity factors and breast cancer, based on an obesity breast cancer dataset. Material and Methods: Several studies show a significantly stronger association between overweight and higher breast cancer incidence, but the role of some overweight factors, such as BMI, insulin resistance, Homeostasis Model Assessment (HOMA), leptin, adiponectin, glucose and MCP-1, is still debatable. This work therefore analyzed several clinical and biochemical overweight factors: age, Body Mass Index (BMI), glucose, insulin, HOMA, leptin, adiponectin, resistin and Monocyte Chemoattractant Protein-1 (MCP-1). Data mining algorithms, including k-means, Apriori and the hierarchical clustering algorithm (HCM), were applied using the open-source data mining tool Orange, version 3.22. Results: The Apriori algorithm generated a list of frequent itemsets and some strong rules from the dataset, and found that insulin, HOMA and leptin are items often seen together in BC patients, a combination that leads to cancer progression. The k-means algorithm divided the samples into three clusters, and its results showed that the pair <Adiponectin, MCP-1> has the greatest effect on the separation of the clusters. In addition, we carried out hierarchical clustering with average linkage without pruning and classified BC patients into 1-32 clusters in order to identify BC patients with similar characteristics. Conclusion: These findings indicate that the algorithms employed in this study can be helpful for our aim.
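The Apriori step can be illustrated with a small sketch of frequent-itemset mining. It is a minimal pure-Python version of the classic level-wise algorithm (join, prune, count), not Orange's implementation, and it assumes the continuous clinical factors have already been discretized into items; the item labels such as `high_insulin` and the four toy records are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions containing it) >= min_support."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    singletons = {frozenset([item]) for t in transactions for item in t}
    current = {s: support(s) for s in singletons}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: keep the candidates that meet the support threshold.
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


records = [
    {"high_insulin", "high_HOMA", "high_leptin"},
    {"high_insulin", "high_HOMA", "high_leptin", "older"},
    {"high_insulin", "high_HOMA"},
    {"older", "high_leptin"},
]
for itemset, sup in sorted(apriori(records, 0.5).items(), key=lambda kv: (-len(kv[0]), -kv[1])):
    print(sorted(itemset), sup)
```

Association rules such as {insulin, HOMA} => leptin would then be derived from these itemsets by checking each rule's confidence, a step omitted here.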


Data Clustering on Breast Cancer Data Using Firefly Algorithm with Golden Ratio Method

Advances in Electrical and Computer Engineering

10.4316/aece.2015.02010

2015

Vol 15 (2)

pp. 75-84

Cited By ~ 3

Author(s):

M. Demir

A. Karci

Keyword(s):

Breast Cancer

Data Clustering

Firefly Algorithm

Golden Ratio

Ratio Method

Breast Cancer Data

Cancer Data


Evaluation of Modified Categorical Data Fuzzy Clustering Algorithm on the Wisconsin Breast Cancer Dataset

Scientifica

10.1155/2016/4273813

2016

Vol 2016

pp. 1-6

Cited By ~ 2

Author(s):

Amir Ahmad

Keyword(s):

Breast Cancer

Fuzzy Clustering

Clustering Algorithm

Distance Measure

Hamming Distance

Machine Learning Techniques

Breast Cancer Dataset

Cancer Dataset

Cancer Data

Feature Values

The early diagnosis of breast cancer is an important step in the fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. Because medical datasets contain data points that cannot be precisely assigned to a single class, fuzzy methods have been useful for studying these datasets. Breast cancer datasets are sometimes described by categorical features, and many fuzzy clustering algorithms have been developed for categorical data. However, most of these methods use the Hamming distance to define the distance between two categorical feature values. In this paper, we use a probabilistic distance measure to compute the distance between a pair of categorical feature values. Experiments demonstrate that this distance measure performs better than the Hamming distance on the Wisconsin breast cancer data.
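To make the comparison concrete, the sketch below shows the Hamming distance that the abstract treats as the baseline, plugged into the membership update commonly used by fuzzy k-modes-style algorithms. The paper's probabilistic distance is not reproduced here; the `distance` argument only marks where such a measure would be swapped in. The function names, the fuzziness exponent `m = 2`, and the toy categorical values are assumptions for illustration.

```python
from typing import Callable, List, Sequence

Record = Sequence[str]


def hamming(a: Record, b: Record) -> int:
    """Number of categorical features on which two records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    return sum(1 for x, y in zip(a, b) if x != y)


def fuzzy_memberships(
    x: Record,
    modes: Sequence[Record],
    m: float = 2.0,
    distance: Callable[[Record, Record], float] = hamming,
) -> List[float]:
    """Degree to which record x belongs to each cluster mode (memberships sum to 1)."""
    d = [distance(x, z) for z in modes]
    exact = [i for i, di in enumerate(d) if di == 0]
    if exact:
        # x coincides with one or more modes: share the membership among them.
        return [1.0 / len(exact) if i in exact else 0.0 for i in range(len(modes))]
    power = 1.0 / (m - 1.0)
    return [
        1.0 / sum((d[i] / d[j]) ** power for j in range(len(modes)))
        for i in range(len(modes))
    ]


modes = [("1", "1", "1"), ("5", "5", "1")]
print(fuzzy_memberships(("2", "1", "1"), modes))
```

With the Hamming distance, a mismatch costs 1 regardless of which values differ; a probabilistic measure can instead weight mismatches by how the values are distributed in the data, which is the kind of refinement the abstract reports.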


The improved fuzzy clustering algorithm based on AFS theory and its applications to Wisconsin breast cancer data

2010 International Conference on Intelligent Control and Information Processing

10.1109/icicip.2010.5564290

2010

Cited By ~ 1

Author(s):

Xianchang Wang

Xiaodong Liu

Lishi Zhang

Keyword(s):

Breast Cancer

Fuzzy Clustering

Clustering Algorithm

Breast Cancer Data

Cancer Data

Fuzzy Clustering Algorithm

AFS Theory


Bayesian Frailty Model for Time to Event Breast Cancer Data

Indian Journal of Applied Research

10.15373/2249555x/feb2014/184

2011

Vol 4 (2)

pp. 8-12

Author(s):

Leo Alexander T

Pari Dayal L

Valarmathi S

Ponnuraja C

...

Keyword(s):

Breast Cancer

Frailty Model

Breast Cancer Data

Time To Event

Cancer Data


Empirical Evaluation of Machine Learning algorithms for Breast Cancer Data Classification

International Journal of Computer Sciences and Engineering

10.26438/ijcse/v6i10.346351

2018

Vol 6 (10)

pp. 346-351

Author(s):

S. Kumaravel

S. Ophilia Domanica Vithya

Keyword(s):

Breast Cancer

Machine Learning

Learning Algorithms

Data Classification

Machine Learning Algorithms

Breast Cancer Data

Cancer Data


Balanced Data Clustering Algorithm for Both Hard and Soft Clustering

International Journal of Computer Sciences and Engineering

10.26438/ijcse/v6i2.176183

2018

Vol 6 (2)

pp. 176-183

Author(s):

Purnendu Das

Bishwa Ranjan Roy

Saptarshi Paul

...

Keyword(s):

Data Clustering

Clustering Algorithm

Soft Clustering


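Comparison of CART and Random Forest for Cancer Detection Based on Microarray Data Classification (Perbandingan CART dan Random Forest untuk Deteksi Kanker berbasis Klasifikasi Data Microarray)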

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

10.29207/resti.v4i5.2083

2020

Vol 4 (5)

pp. 805-812

Author(s):

Riska Chairunisa

Adiwijaya

Widi Astuti

Keyword(s):

Breast Cancer

Lung Cancer

Ovarian Cancer

Random Forest

Classification And Regression Tree

Classification Method

Discrete Wavelet

Prostate Tumors

Cancer Data

Colon Tumors

Cancer is one of the deadliest diseases in the world, with a mortality rate of 57.3% in Asia in 2018. Early diagnosis is therefore needed to prevent cancer mortality from rising. As machine learning develops, cancer gene data obtained from microarrays can be processed for the early detection of cancer. The problem with microarray data, however, is that the number of attributes is so large that dimensionality reduction is necessary. To overcome this problem, this study used the Discrete Wavelet Transform (DWT) for dimensionality reduction, with Classification and Regression Tree (CART) and Random Forest (RF) as classification methods. The purpose of using these two classification methods is to find out which one performs best when combined with DWT dimensionality reduction. This research uses five microarray datasets, namely Colon Tumors, Breast Cancer, Lung Cancer, Prostate Tumors and Ovarian Cancer, from the Kent-Ridge Biomedical Dataset. The best accuracies obtained in this study were 76.92% for breast cancer with CART-DWT, 90.1% for colon tumors with RF-DWT, 100% for lung cancer with RF-DWT, 95.49% for prostate tumors with RF-DWT, and 100% for ovarian cancer with RF-DWT. From these results it can be concluded that RF-DWT performs better than CART-DWT.
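The role of the DWT here is to shrink each microarray profile before classification. A minimal sketch, assuming the Haar wavelet and keeping only the approximation coefficients at each level (the abstract does not say which wavelet or how many decomposition levels were used): every level halves the number of features, so three levels turn, for example, 2,000 expression values into 250. The function name and the padding rule for odd lengths are assumptions; the reduced vectors would then be passed to CART or Random Forest, which are not shown.

```python
import math
from typing import List, Sequence


def haar_approximation(profile: Sequence[float], levels: int = 1) -> List[float]:
    """Approximation coefficients of a multi-level Haar DWT (each level halves the length)."""
    coeffs = [float(v) for v in profile]
    for _ in range(levels):
        if len(coeffs) < 2:
            break
        if len(coeffs) % 2:
            coeffs.append(coeffs[-1])  # assumption: pad odd lengths by repeating the last value
        # Scaled pairwise sums; the discarded detail coefficients would be (a - b) / sqrt(2).
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2) for i in range(0, len(coeffs), 2)]
    return coeffs


# Hypothetical expression profile with 2,000 genes reduced to 250 features.
profile = [math.sin(g / 50.0) for g in range(2000)]
reduced = haar_approximation(profile, levels=3)
print(len(profile), "->", len(reduced))
```

Keeping only the approximation coefficients acts as a smoothing low-pass filter, so neighbouring features are merged rather than selected; the number of levels is the knob that trades compactness against detail.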


Age Features of Molecular-Biological Subtypes of Breast Cancer in the Republic of Uzbekistan

Problems in Oncology

10.37469/0507-3758-2018-64-2-196-199

2018

Vol 64 (2)

pp. 196-199

Author(s):

Gulya Miryusupova

G. Khakimov

N. Shayusupov

Keyword(s):

Breast Cancer

Cancer Care

Indigenous Women

Morbidity And Mortality

Breast Cancer Data

Female Population

Cancer Data

Age Related

The Republic

Age Features

Breast cancer data from the Republic of Uzbekistan show not only an increase in breast cancer morbidity and mortality among women, but also age-specific features among indigenous women: a shift of the disease toward younger ages (“rejuvenation”) across all molecular-biological (phenotypic) subtypes of breast cancer. Within these age-related features, the least favorable breast cancer phenotypes, the HER2/neu-overexpressing and triple-negative subtypes, were found to be prevalent among indigenous women. The data obtained made it possible to build a so-called population “portrait” of breast cancer in the Republic, which in turn should contribute to further improvement of cancer care for the country's female population.
