Microarray Breast Cancer Data Clustering Using Map Reduce Based K-Means Algorithm

2020 ◽  
Vol 34 (6) ◽  
pp. 763-769
Author(s):  
Hymavathi Thottathyl ◽  
Kanadam Karteeka Pavan ◽  
Rajeev Priyatam Panchadula

Breast cancer is one of the world's most common cancers in women. Early diagnosis makes effective treatment possible; therefore, several studies are under way to establish approaches for the early detection of breast cancer. Microarray data processing, in which research has grown greatly over the last decade, is a potent tool for diagnosing diseases. Based on genomic knowledge, microarrays have changed the way clinical pathology recognizes, identifies, and classifies human diseases, particularly cancer. In this article, we examined microarray data for breast cancer with the k-means clustering algorithm, but k-means alone was hard to scale to a large volume of microarray data. To this end, we use the MapReduce paradigm to evaluate microarray data on breast cancer. Moreover, the efficiency of the parallel k-means model is measured in terms of its execution time, scalability, and overall runtime.
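The MapReduce formulation of k-means named in the title can be sketched in a few lines. The following is a minimal single-process illustration, not the authors' implementation: the function names, the toy samples, and the starting centroids are all assumptions made for the example. In a real cluster the map step would run in parallel over data splits and the reduce step would run once per centroid key.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    new_centroids = list(centroids)  # a centroid with no points stays where it is
    for key, members in groups.items():
        new_centroids[key] = tuple(sum(col) / len(members) for col in zip(*members))
    return new_centroids


def kmeans_mapreduce(points, centroids, max_iter=100):
    """Repeat map and reduce until the centroids stop moving."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Toy expression profiles (made-up numbers, two genes per sample).
samples = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
final_centroids = kmeans_mapreduce(samples, [(0.0, 0.0), (10.0, 10.0)])
```

Only the (key, point) pairs cross the map/reduce boundary, which is what lets the assignment step be distributed across machines.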

2012 ◽  
Vol 30 (15_suppl) ◽  
pp. 1007-1007 ◽  
Author(s):  
Charles L. Shapiro ◽  
Luciano Cascione ◽  
Pierluigi Gasparini ◽  
Francesca Lovat ◽  
Stefania Carasi ◽  
...  

1007 Background: TNBC is divided into basal and non-basal subclasses. To further subclassify TNBC, we performed microRNA (miR) expression profiling and linked the profiles to patient overall survival. Methods: During 1996-2005, 365 consecutive TNBC cases (phenotypically estrogen, progesterone and HER2 negative by immunohistochemistry [IHC]) were identified from the NCCN Breast Cancer Data Base/Tumor Registry at OSU Medical Center. One hundred fifty-eight (43%) formalin-fixed paraffin-embedded (FFPE) breast cancer tissue blocks and 40 normal breast tissue blocks were available, and tissue cores were obtained for RNA. RNA was isolated using the Ambion RecoverAll Total Nucleic Acid Isolation Kit, and the expression of ~700 miRs was assessed for each sample using the NanoString nCounter method. A consensus-clustering algorithm (ConsensusClusterPlus, Bioconductor www.bioconductor.org) was used to identify subclasses of TNBC, and Kaplan-Meier overall survival curves were compared using the log-rank test. Censoring occurred at the date of death from causes other than breast cancer or at the time of the last known follow-up, whichever occurred first. The median follow-up was 67 mo. (range 4-171 mo.). Results: The median age was 52 yrs. (range 20-84 yrs.); 81% of patients were white and 9% African-American; stages I, II, and III accounted for 31%, 54% and 15%, respectively; and most patients received adjuvant anthracycline-based regimens with (25%) or without (75%) taxanes. The algorithm identified 5 distinct subclasses: 1 clustered with normal breast miR expression, whereas the other 4 each had a unique pattern of deregulated miRs. The median overall survivals were significantly different across the 5 cancer subclasses (log-rank p=0.028) (Table). Conclusions: miR expression profiling identifies and discriminates 5 TNBC subclasses, which do not coincide with those identified as basal and non-basal by IHC. Molecular analyses are ongoing to associate the miR-based subclasses with specific clinical features or the expression of specific pathways. [Table: see text]
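For readers unfamiliar with the Kaplan-Meier curves mentioned in the Methods, the estimator can be sketched as follows. This is an illustrative product-limit calculation only: the function name and the follow-up times are hypothetical and are not data from the study, and the log-rank comparison between subclasses is not shown.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival probability)] at each time with at least one event.

    durations: follow-up time per patient
    events: 1 if death from the disease was observed, 0 if censored
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # deaths plus censored at time t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical follow-up times in months; not data from the study.
months = [5, 8, 8, 12, 20]
observed = [1, 1, 0, 1, 0]
curve = kaplan_meier(months, observed)
```

Censored patients lower the number at risk without lowering the survival estimate, which is how the censoring rule described in the abstract enters the calculation.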


2021 ◽  
Vol 10 (1) ◽  
pp. 60
Author(s):  
Mahsa Dehghani Soufi ◽  
Reza Ferdousi

Introduction: Growing evidence shows that some overweight factors may be implicated in tumorigenesis, higher recurrence, and mortality. However, the association between the various overweight factors and breast cancer has not been extensively explored. The goal of this research was to explore and evaluate the association between various overweight/obesity factors and breast cancer, based on an obesity breast cancer data set. Material and Methods: Several studies show a significantly stronger association between overweight and higher breast cancer incidence, but the role of some overweight factors such as BMI, insulin resistance, Homeostasis Model Assessment (HOMA), leptin, adiponectin, glucose, and MCP.1 is still debatable. For this study, several clinical and biochemical overweight factors were therefore analyzed, including age, Body Mass Index (BMI), glucose, insulin, Homeostatic Model Assessment (HOMA), leptin, adiponectin, resistin, and monocyte chemoattractant protein-1 (MCP-1). Data mining algorithms including k-means, Apriori, and a hierarchical clustering algorithm (HCM) were applied using Orange version 3.22, an open-source data mining tool. Results: The Apriori algorithm generated a list of frequent item sets and some strong rules from the data set and found that insulin, HOMA, and leptin are items that were often seen together in BC patients, a co-occurrence that leads to cancer progression. The k-means algorithm divided the samples into three clusters, and its results showed that the pair <Adiponectin, MCP.1> has the highest effect on the separation of clusters. In addition, hierarchical clustering with average linkage and without pruning was carried out and classified BC patients into 1–32 clusters in order to identify BC patients with similar characteristics. Conclusion: These findings suggest that the algorithms employed in this study can be helpful to our aim.
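The frequent item sets and rules reported above come from an Apriori-style search. The sketch below shows the level-wise idea in plain Python rather than in Orange; the function name, the discretized item labels, and the four patient records are hypothetical and chosen only to make the support and confidence arithmetic visible.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search: return {itemset: support} for frequent itemsets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        # Prune step: every k-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent


# Hypothetical, already-discretized patient records (not the study's data).
patients = [
    {"insulin_high", "HOMA_high", "leptin_high"},
    {"insulin_high", "HOMA_high"},
    {"insulin_high", "HOMA_high", "leptin_high"},
    {"leptin_high"},
]
support = frequent_itemsets(patients, min_support=0.5)

# Confidence of the rule {insulin_high, HOMA_high} -> {leptin_high}.
antecedent = frozenset({"insulin_high", "HOMA_high"})
rule_items = antecedent | {"leptin_high"}
confidence = support[rule_items] / support[antecedent]
```

A rule counts as "strong" when both its support and its confidence exceed chosen thresholds; the thresholds used in the study are not stated in the abstract.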


Scientifica ◽  
2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Amir Ahmad

The early diagnosis of breast cancer is an important step in the fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying these datasets. Sometimes breast cancer datasets are described by categorical features. Many fuzzy clustering algorithms have been developed for categorical datasets. However, most of these methods use Hamming distance to define the distance between two categorical feature values. In this paper, we use a probabilistic distance measure to compute the distance between a pair of categorical feature values. Experiments demonstrate that this distance measure performs better than Hamming distance on Wisconsin breast cancer data.
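The contrast between Hamming distance and a probabilistic distance can be illustrated with a small example. The measure below is an assumption made for illustration, not necessarily the one used in the paper: it scores two values of a categorical feature by the total variation distance between the distributions of a second feature that co-occurs with them. All names and records are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Hamming distance between two categorical records: count of mismatched features."""
    return sum(x != y for x, y in zip(a, b))


def conditional(records, attr, value, other):
    """Empirical distribution of feature `other` among records where feature `attr` == value."""
    counts = Counter(r[other] for r in records if r[attr] == value)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def value_distance(records, attr, x, y, other):
    """Total variation distance between P(other | attr=x) and P(other | attr=y)."""
    px = conditional(records, attr, x, other)
    py = conditional(records, attr, y, other)
    return 0.5 * sum(abs(px.get(k, 0.0) - py.get(k, 0.0)) for k in set(px) | set(py))


# Toy records with two categorical features (made-up values).
records = [("a", "u"), ("a", "u"), ("b", "u"), ("b", "v"), ("c", "v"), ("c", "v")]

d_ab = value_distance(records, 0, "a", "b", 1)
d_ac = value_distance(records, 0, "a", "c", 1)
```

Hamming distance treats "a" versus "b" and "a" versus "c" as the same single mismatch, whereas the co-occurrence based measure separates them because "a" and "c" never share a value of the second feature.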


2011 ◽  
Vol 4 (2) ◽  
pp. 8-12
Author(s):  
Leo Alexander T ◽  
Pari Dayal L ◽  
Valarmathi S ◽  
Ponnuraja C ◽  
...  

2018 ◽  
Vol 6 (2) ◽  
pp. 176-183
Author(s):  
Purnendu Das ◽  
Bishwa Ranjan Roy ◽  
Saptarshi Paul ◽  
...  

2020 ◽  
Vol 4 (5) ◽  
pp. 805-812
Author(s):  
Riska Chairunisa ◽  
Adiwijaya ◽  
Widi Astuti

Cancer is one of the deadliest diseases in the world, with a mortality rate of 57.3% in Asia in 2018. Early diagnosis is therefore needed to avoid an increase in cancer mortality. As machine learning develops, cancer gene data obtained from microarrays can be processed for the early detection of cancer. The problem with microarray data is that the number of attributes is so large that dimension reduction is necessary. To overcome this problem, this study used the Discrete Wavelet Transform (DWT) for dimension reduction, with Classification and Regression Tree (CART) and Random Forest (RF) as classification methods. The purpose of using these two classification methods is to find out which one produces the best performance when combined with DWT dimension reduction. This research uses five microarray data sets, namely Colon Tumors, Breast Cancer, Lung Cancer, Prostate Tumors, and Ovarian Cancer, from the Kent-Ridge Biomedical Dataset. The best accuracy obtained in this study was 76.92% for breast cancer with CART-DWT, 90.1% for colon tumors with RF-DWT, 100% for lung cancer with RF-DWT, 95.49% for prostate tumors with RF-DWT, and 100% for ovarian cancer with RF-DWT. From these results it can be concluded that RF-DWT is better than CART-DWT.
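How a wavelet transform shrinks a long gene-expression vector can be shown with a small sketch. The abstract does not say which wavelet family was used, so the Haar wavelet below is an assumption, as are the function names and the toy expression values; the CART and Random Forest classification steps are not shown.

```python
from math import sqrt


def haar_dwt(signal):
    """One level of the Haar wavelet transform: (approximation, detail) coefficients."""
    if len(signal) % 2:
        signal = list(signal) + [signal[-1]]  # pad odd-length input by repeating the last value
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / sqrt(2) for a, b in pairs]
    detail = [(a - b) / sqrt(2) for a, b in pairs]
    return approx, detail


def reduce_features(expression, levels):
    """Keep only the approximation coefficients after `levels` Haar decompositions."""
    for _ in range(levels):
        expression, _detail = haar_dwt(expression)
    return expression


# Made-up expression values for eight genes of one sample.
genes = [4.0, 2.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
reduced = reduce_features(genes, levels=2)  # 8 features become 2
```

Each decomposition level halves the number of features, so the reduced vector can then be passed to a classifier in place of the full attribute set.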


2018 ◽  
Vol 64 (2) ◽  
pp. 196-199
Author(s):  
Gulya Miryusupova ◽  
G. Khakimov ◽  
N. Shayusupov

Breast cancer data from the Republic of Uzbekistan show that, in addition to rising morbidity and mortality from breast cancer among women, indigenous women exhibit age-specific features: the disease is shifting toward younger ages (“rejuvenation”) across all molecular-biological (phenotypic) subtypes of breast cancer. Within these age-related features, the least favorable phenotypes of breast cancer were found to be prevalent among indigenous women: the HER2/neu-overexpressing and triple-negative subtypes. The data obtained made it possible to build a so-called population “portrait” of breast cancer in the Republic, which in turn should contribute to further improvement of cancer care for the country's female population.

