Model-Based Clustering with Measurement or Estimation Errors

Genes ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 185 ◽  
Author(s):  
Wanli Zhang ◽  
Yanming Di

Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When the objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling, called MCLUST-ME, that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we found that, under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data compared with simply ignoring the errors. The degree of improvement depends on factors such as the distribution of the error covariance matrices.
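The effect of an observation-specific error covariance on the classification boundary can be illustrated with a small sketch. It assumes, as one reading of the abstract, that observation i in component k is Gaussian with covariance equal to the component covariance plus that observation's known error covariance (the sum of two independent Gaussian terms). The function names `log_gauss` and `responsibilities` and all parameter values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import numpy as np

def log_gauss(x, mean, cov):
    """Log-density of a multivariate normal, computed with NumPy only."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.shape[0] * np.log(2.0 * np.pi) + logdet + maha)

def responsibilities(X, S, weights, means, covs):
    """Posterior membership probabilities when observation i carries a
    known error covariance S[i] (assumed model: cov = covs[k] + S[i])."""
    n, G = X.shape[0], len(weights)
    log_r = np.empty((n, G))
    for i in range(n):
        for k in range(G):
            log_r[i, k] = np.log(weights[k]) + log_gauss(
                X[i], means[k], covs[k] + S[i]
            )
    # Normalize in log space for numerical stability.
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

# Toy one-dimensional mixture (hypothetical values):
# a tight component at 0 and a wide component at 3.
weights = np.array([0.5, 0.5])
means = np.array([[0.0], [3.0]])
covs = np.array([[[0.25]], [[4.0]]])

# The same observed value, once with no error and once with a large error.
X = np.array([[1.2], [1.2]])
S = np.array([[[0.0]], [[4.0]]])

post = responsibilities(X, S, weights, means, covs)
labels = post.argmax(axis=1)
print(post.round(3), labels)
```

In this toy setting the two observations share the same value but receive different labels: with zero error the point is assigned to the wide component, while with a large error covariance it is assigned to the tight one. This is consistent with the statement that each error covariance induces its own classification boundary.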

2010 ◽  
Vol 4 (0) ◽  
pp. 80-116 ◽  
Author(s):  
Volodymyr Melnykov ◽  
Ranjan Maitra

2018 ◽  
Vol 29 (4) ◽  
pp. 791-819 ◽  
Author(s):  
Michael Fop ◽  
Thomas Brendan Murphy ◽  
Luca Scrucca

2014 ◽  
Vol 2014 ◽  
pp. 1-6
Author(s):  
Hua Li ◽  
Jie Zhou

This paper considers the robust estimation fusion problem for distributed multisensor systems with uncertain correlations of local estimation errors. For an uncertainty class characterized by the Kullback-Leibler (KL) divergence from the actual model to the nominal model of the local estimation error covariance, the robust estimation fusion problem is formulated as finding a linear minimum variance unbiased estimator for the least favorable model. It is proved that the optimal fuser under the nominal correlation model is robust when the estimation error has a relative entropy uncertainty.
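To make the KL-based uncertainty class concrete, the following sketch evaluates the standard closed-form KL divergence between two zero-mean Gaussian models that differ only in the cross-correlation of two local estimation errors. Gaussian errors, unit variances, the direction of the divergence (actual relative to nominal), and the helper names `gaussian_kl` and `joint_cov` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def gaussian_kl(cov_p, cov_q):
    """KL( N(0, cov_p) || N(0, cov_q) ) for zero-mean Gaussians."""
    d = cov_p.shape[0]
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    return 0.5 * (trace_term - d + logdet_q - logdet_p)

def joint_cov(rho):
    """Joint covariance of two scalar local errors with correlation rho."""
    return np.array([[1.0, rho], [rho, 1.0]])

# Nominal model: uncorrelated local errors (hypothetical choice).
nominal = joint_cov(0.0)

# Divergence of candidate "actual" models from the nominal one.
for rho in (0.0, 0.3, 0.6):
    print(rho, gaussian_kl(joint_cov(rho), nominal))
```

Under these assumptions the divergence grows with the magnitude of the unmodeled correlation, so a bound on the KL divergence translates into a bound on how far the actual correlation may stray from the nominal one.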


2015 ◽  
Vol 8 (2) ◽  
pp. 191-203 ◽  
Author(s):  
J. Vira ◽  
M. Sofiev

Abstract. This paper describes the assimilation of trace gas observations into the chemistry transport model SILAM (System for Integrated modeLling of Atmospheric coMposition) using the 3D-Var method. Assimilation results for the year 2012 are presented for the prominent photochemical pollutants ozone (O₃) and nitrogen dioxide (NO₂). Both species are covered by the AirBase observation database, which provides the observational data set used in this study. Attention was paid to the background and observation error covariance matrices, which were obtained primarily by the iterative application of a posteriori diagnostics. The diagnostics were computed separately for 2 months representing summer and winter conditions, and further disaggregated by time of day. This enabled the derivation of background and observation error covariance definitions, which included both seasonal and diurnal variation. The consistency of the obtained covariance matrices was verified using χ² diagnostics. The analysis scores were computed for a control set of observation stations withheld from assimilation. Compared to a free-running model simulation, the correlation coefficient for daily maximum values was improved from 0.8 to 0.9 for O₃ and from 0.53 to 0.63 for NO₂.
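As a rough illustration of how the background and observation error covariances enter a 3D-Var analysis, the sketch below uses the textbook closed-form minimizer of the 3D-Var cost function for a linear observation operator, together with a χ² statistic of the innovations whose expected value equals the number of observations when both covariances are well specified. The function name `analysis_3dvar`, the three-cell state, and all numbers are hypothetical; this is not SILAM code, and the exact diagnostic used in the paper may differ.

```python
import numpy as np

def analysis_3dvar(x_b, B, y, H, R):
    """Closed-form 3D-Var analysis for a linear observation operator H.

    x_b: background state, B: background error covariance,
    y: observations, R: observation error covariance.
    Returns the analysis state and the innovation chi-square statistic.
    """
    innovation = y - H @ x_b
    S = H @ B @ H.T + R  # covariance of the innovation
    x_a = x_b + B @ H.T @ np.linalg.solve(S, innovation)
    chi2 = innovation @ np.linalg.solve(S, innovation)
    return x_a, chi2

# Toy setup (hypothetical): three grid cells, cells 0 and 2 observed.
idx = np.arange(3)
B = np.exp(-np.abs(idx[:, None] - idx[None, :]))  # correlated background errors
R = 0.25 * np.eye(2)                              # uncorrelated observation errors
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
x_b = np.array([40.0, 42.0, 45.0])
y = np.array([43.0, 44.0])

x_a, chi2 = analysis_3dvar(x_b, B, y, H, R)
print(x_a, chi2, "expected chi2 is about", len(y))
```

Because B carries spatial correlations, the unobserved middle cell is also adjusted. Averaged over many assimilation cycles, a χ² value far from the number of observations would suggest that B or R is mis-scaled.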


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 1686-1686
Author(s):  
Meong Hi Son ◽  
Taehyung Kim ◽  
Marc Tyndel ◽  
Mark D. Minden ◽  
Zhaolei Zhang ◽  
...  

Abstract. Disruption of core binding factor (CBF) in acute myeloid leukemia (AML) has been identified as a favourable prognostic biomarker in AML. Consequently, CBF AML patients are typically treated less intensively than AML patients in most other prognostic groups. Nevertheless, a subset of CBF AML patients fail initial therapy, and treatment of these patients after relapse is challenging. Few prognostic markers are available to stratify risk in CBF AML other than t(8;21) vs. inv(16), where t(8;21) is the unfavourable marker, and KIT mutation status in t(8;21). In this study, we aimed to utilize miRNA expression profiles in CBF AML to identify patients with favourable vs. unfavourable prognoses.

Patients and Methods
We analyzed small RNA sequencing data from a discovery cohort of 188 de novo AML patients, of which 19 were CBF AML patients, from The Cancer Genome Atlas study (NEJM, 2013). We selected miRNAs whose expression had a mean greater than 10 reads per 10^6 mapped miRNA reads and a coefficient of variation greater than 2. Forty-eight miRNAs met these criteria and were further used for model-based clustering. As a validation cohort, we enrolled 38 CBF AML patients diagnosed from 1998 to 2014 at the Princess Margaret Cancer Centre (PMCC). Diagnosis of CBF AML was confirmed with conventional cytogenetic analyses at a clinical genetics laboratory at the PMCC. PerfeCTa microRNA assays were applied to quantify the expression of miRNA normalized to RNU6.

Results
Discovery of the poor prognosis group in CBF AML patients
Model-based clustering with 48 miRNAs from the TCGA AML data set identified 4 distinct patient clusters (Figure A). Cluster 2 (C2) contained exclusively acute promyelocytic leukemia (APL) patients with high expression of all miRNAs and was excluded from further analysis. Notably, Cluster 1 (C1) was characterized by low expression of most of the 48 miRNAs on the list (Figure A). CBF AML patients were present in C1 (C1-CBF group; n=10) and C3 (C3-CBF group; n=9) only, with none in cluster 4. Figures B and C illustrate survival analyses of C1-CBF and C3-CBF patients compared to several other groups. Notably, C1-CBF patients had significantly worse overall survival (OS) (p=0.001) and disease free survival (DFS) (p=0.005) compared to C3-CBF, with survival statistics comparable to intermediate or poor risk AML subgroups, as measured by Kaplan-Meier analysis.

Validation of C1-CBF and C3-CBF group in PMCC patients
We identified 3 miRNAs, miR-127, miR-494, and miR-495, from the original 48, that were sufficient to perfectly reproduce the C1 and C3 subgroupings when clustering on them alone. We performed real time qPCR to measure the expression of these 3 miRNAs in the 38 CBF AML patients from the PMCC cohort and assigned them to the clusters. The PMCC C1-CBF (n=13) patients had significantly worse 3-yr OS and 3-yr DFS than the C3-CBF (n=25) patients (OS: C1-CBF, 23.1±11.7% vs C3-CBF, 72±9%; p=0.0062 and DFS: C1-CBF, 27.8±13.6% vs C3-CBF, 79.1±8.3%; p=0.0092, Figures D & E). Using multivariate analysis with covariates including age, KIT mutation and translocation status, we demonstrated that these 3 miRNAs form an independent prognostic biomarker in CBF-AML for both OS and DFS (Table).

Conclusion
We demonstrate that 3 miRNAs (miR-127, miR-494, miR-495) can be used to stratify CBF-AML patients into favourable and unfavourable prognostic subgroups. We show that expression levels of this trio of miRNAs can be used as a clinical tool, and we propose that CBF AML patients in the unfavourable miRNA group, C1-CBF, should no longer be treated as favourable risk patients.

Disclosures
No relevant conflicts of interest to declare.


2011 ◽  
Vol 29 (6) ◽  
pp. 1189-1196
Author(s):  
J. Vierinen

Abstract. We present a novel approach for modulating radar transmissions in order to improve target range and Doppler estimation accuracy. This is achieved by using non-uniform baud lengths. With this method it is possible to increase the sub-baud range resolution of phase-coded radar measurements while maintaining a narrow transmission bandwidth. We first derive the target backscatter amplitude estimation error covariance matrix for arbitrary targets when estimating backscatter in the amplitude domain. We define target optimality and discuss different search strategies that can be used to find well-performing transmission envelopes. We give several simulated examples of the method, showing that fractional baud-length coding results in smaller estimation errors than conventional uniform baud-length transmission codes when estimating the target backscatter amplitude at sub-baud range resolution. We also demonstrate the method in practice by analyzing the range-resolved power of a low-altitude meteor trail echo that was measured using a fractional baud-length experiment with the EISCAT UHF system.


2016 ◽  
Vol 28 (6) ◽  
pp. 1141-1162
Author(s):  
Akifumi Notsu ◽  
Shinto Eguchi

Contamination by scattered observations, which are either featureless or unlike the other observations, frequently degrades the performance of standard methods such as K-means and model-based clustering. In this letter, we propose a clustering method, called Gamma-clust, that is robust in the presence of scattered observations. Gamma-clust is based on robust estimation of cluster centers using gamma-divergence. It provides a proper solution for clustering in which the distributions of the clustered data are nonnormal, such as t-distributions with different variance-covariance matrices and degrees of freedom. As demonstrated in a simulation study and data analysis, Gamma-clust is more flexible and provides superior results compared to the robustified K-means and model-based clustering.


2019 ◽  
Vol 13 (4) ◽  
pp. 1053-1082
Author(s):  
Derek S. Young ◽  
Xi Chen ◽  
Dilrukshi C. Hewage ◽  
Ricardo Nilo-Poyanco

2009 ◽  
Vol 3 (0) ◽  
pp. 1473-1496 ◽  
Author(s):  
Hui Zhou ◽  
Wei Pan ◽  
Xiaotong Shen
