Missing Values in Monotone Data Sets

Author(s):  
Viara Popova
2013 ◽  
Vol 240 ◽  
pp. 115-128 ◽  
Author(s):  
Emil Eirola ◽  
Gauthier Doquire ◽  
Michel Verleysen ◽  
Amaury Lendasse

1996 ◽  
Vol 5 (2) ◽  
pp. 113 ◽  
Author(s):  
Antony Unwin ◽  
George Hawkins ◽  
Heike Hofmann ◽  
Bernd Siegl

2020 ◽  
Author(s):  
Christopher Kadow ◽  
David Hall ◽  
Uwe Ulbrich

Climate change research relies on climate information from the past. Historic temperature observations form global gridded datasets such as HadCRUT4, which is examined, for example, in the IPCC reports. However, datasets that combine these records are sparse in the past, and even today they contain missing values. Here we show that machine learning can be applied to refill these missing climate values in observational datasets. We found that image inpainting using partial convolutions in a CUDA-accelerated deep neural network can be trained on large Earth system model experiments from the NOAA reanalysis (20CR) and the Coupled Model Intercomparison Project phase 5 (CMIP5). The resulting deep neural networks are capable of independently refilling missing values added to these experiments. The analysis shows a very high degree of reconstruction, even when a network trained on one dataset is used to reconstruct the other. The network reconstruction evaluates better than other typical methods in climate science. Finally, we show the newly reconstructed observational dataset HadCRUT4 and discuss further investigations.
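As a rough illustration of the partial-convolution idea mentioned in this abstract, the sketch below applies one masked convolution pass to a small 2D grid, following the commonly cited formulation in which the weighted sum over observed cells is rescaled by window size divided by the number of observed cells. The function name `partial_conv2d`, the averaging kernel, and the toy grid are assumptions made for this example; the abstract does not describe the authors' network or code.

```python
def partial_conv2d(grid, mask, kernel, bias=0.0):
    """One partial-convolution pass over a 2D grid (illustrative sketch).

    grid   : 2D list of floats; values at unobserved cells are ignored
    mask   : 2D list of 0/1, where 1 marks an observed cell
    kernel : square 2D list with odd side length
    Returns (output, new_mask). Cells outside the grid count as unobserved.
    """
    rows, cols = len(grid), len(grid[0])
    k = len(kernel)
    half = k // 2
    window_size = k * k
    out = [[0.0] * cols for _ in range(rows)]
    new_mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            weighted_sum = 0.0
            valid = 0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                        weighted_sum += kernel[dr + half][dc + half] * grid[rr][cc]
                        valid += 1
            if valid > 0:
                # Rescale so that missing cells do not shrink the response.
                out[r][c] = weighted_sum * (window_size / valid) + bias
                # The cell counts as observed for the next layer.
                new_mask[r][c] = 1
    return out, new_mask


# Toy example: a constant field with one missing cell in the centre.
grid = [[2.0, 2.0, 2.0], [2.0, 0.0, 2.0], [2.0, 2.0, 2.0]]
mask = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
kernel = [[1 / 9] * 3 for _ in range(3)]
filled, updated_mask = partial_conv2d(grid, mask, kernel)
```

With an averaging kernel the missing centre cell is filled from its observed neighbours, and the updated mask marks it as observed. In a real inpainting network the kernels are learned and many such layers are stacked.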


2016 ◽  
Vol 25 (3) ◽  
pp. 431-440 ◽  
Author(s):  
Archana Purwar ◽  
Sandeep Kumar Singh

Abstract Data quality is an important concern in data mining. The validity of mining algorithms is reduced if the data are not of good quality. The quality of data can be assessed in terms of missing values (MV) as well as the noise present in the data set. Various imputation techniques have been studied for MV, but little attention has been given to noise in earlier work. Moreover, to the best of our knowledge, no one has used density-based spatial clustering of applications with noise (DBSCAN) for MV imputation. This paper proposes a novel technique, density-based imputation (DBSCANI), built on density-based clustering to deal with incomplete values in the presence of noise. The density-based clustering algorithm proposed by Kriegel groups objects according to their density in spatial databases. The high-density regions are known as clusters, and the low-density regions correspond to the noise objects in the data set. Extensive experiments have been performed on the Iris data set from the life science domain and Jain’s (2D) data set from the shape data sets. The performance of the proposed method is evaluated using root mean square error (RMSE) and compared with existing K-means imputation (KMI). Results show that our method is more noise resistant than KMI on the data sets under study.
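The sketch below illustrates the general idea of density-aware imputation scored by RMSE. It is a simplified neighbourhood rule, not the DBSCANI algorithm from the paper, whose details the abstract does not give. The names `density_impute` and `rmse`, the parameters `eps` and `min_pts`, and the toy data are all assumptions made for this example.

```python
import math


def rmse(true_vals, imputed_vals):
    """Root mean square error between true and imputed values."""
    squared = [(t - p) ** 2 for t, p in zip(true_vals, imputed_vals)]
    return math.sqrt(sum(squared) / len(squared))


def density_impute(rows, eps=1.0, min_pts=2):
    """Fill None entries from complete rows lying within eps of the row.

    Distance is measured on the features observed in the incomplete row.
    If fewer than min_pts neighbours are found, the row is treated as
    lying in a low-density (noise) region and column means are used.
    Assumes at least one complete row exists.
    """
    complete = [r for r in rows if None not in r]
    n_features = len(rows[0])
    col_means = [sum(r[j] for r in complete) / len(complete)
                 for j in range(n_features)]
    filled = []
    for row in rows:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        neighbours = [
            r for r in complete
            if math.sqrt(sum((r[j] - row[j]) ** 2 for j in observed)) <= eps
        ]
        dense = len(neighbours) >= min_pts
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
            elif dense:
                new_row.append(sum(r[j] for r in neighbours) / len(neighbours))
            else:
                new_row.append(col_means[j])
        filled.append(new_row)
    return filled


# Toy data: two dense groups, followed by two rows with a missing value.
rows = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
        (5.0, 5.0), (5.2, 4.8), (4.8, 5.2),
        (1.0, None), (5.0, None)]
filled = density_impute(rows, eps=1.0, min_pts=2)
error = rmse([1.0, 5.0], [filled[6][1], filled[7][1]])
```

Because each incomplete row borrows values only from its own dense neighbourhood, points from the other group do not pull the estimate away, which is the intuition behind preferring a density-based scheme when noise is present.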


2014 ◽  
Vol 39 (2) ◽  
pp. 107-127 ◽  
Author(s):  
Artur Matyja ◽  
Krzysztof Siminski

Abstract Missing values are not uncommon in real data sets. The algorithms and methods used for analysing complete data sets cannot always be applied to data with missing values. To use the existing methods for complete data, data sets with missing values are preprocessed. The other solution to this problem is the creation of new algorithms dedicated to data sets with missing values. The objective of our research is to compare the preprocessing techniques and specialised algorithms and to find their most advantageous usage.


2020 ◽  
Author(s):  
Weiping Ma ◽  
Sunkyu Kim ◽  
Shrabanti Chowdhury ◽  
Zhi Li ◽  
Mi Yang ◽  
...  

Abstract Deep proteomics profiling using labelled LC-MS/MS experiments has proven powerful for studying complex diseases. However, due to the dynamic nature of discovery mass spectrometry, the generated data contain a substantial fraction of missing values. This poses great challenges for data analysis, as many tools, especially those for high-dimensional data, cannot deal with missing values directly. To address this problem, the NCI-CPTAC Proteogenomics DREAM Challenge was carried out to develop effective imputation algorithms for labelled LC-MS/MS proteomics data through crowd learning. The final resulting algorithm, DreamAI, is based on an ensemble of six different imputation methods. The imputation accuracy of DreamAI, as measured by correlation, is about 15%-50% greater than that of existing tools among less abundant proteins, which are more likely to be missing in proteomics data sets. This new tool enhances data analysis capabilities in proteomics research.


2019 ◽  
Vol 6 (339) ◽  
pp. 73-98
Author(s):  
Małgorzata Aleksandra Misztal

The problem of incomplete data and its implications for drawing valid conclusions from statistical analyses is not tied to any particular scientific domain; it arises in economics, sociology, education, behavioural sciences and medicine. Almost all standard statistical methods presume that every object has information on every variable to be included in the analysis, and the typical approach to missing data is simply to delete the affected objects. However, this leads to inefficient and biased analysis results and is not recommended in the literature. The state-of-the-art technique for handling missing data is multiple imputation. In the paper, some selected multiple imputation methods were taken into account. Special attention was paid to using principal components analysis (PCA) as an imputation method. The goal of the study was to assess the quality of PCA‑based imputations as compared to two other multiple imputation techniques: multivariate imputation by chained equations (MICE) and missForest. The comparison was made by artificially simulating different proportions (10–50%) and mechanisms of missing data using 10 complete data sets from the UCI repository of machine learning databases. Then, missing values were imputed with the use of MICE, missForest and the PCA‑based method (MIPCA). The normalised root mean square error (NRMSE) was calculated as a measure of imputation accuracy. On the basis of the conducted analyses, missForest can be recommended as a multiple imputation method providing the lowest rates of imputation errors for all types of missingness. PCA‑based imputation does not perform well in terms of accuracy.
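The evaluation protocol described in this abstract (mask known values, impute, score with NRMSE) can be sketched as follows. Several points are assumptions for illustration: the masking is completely at random, column-mean imputation stands in for MICE, missForest and MIPCA, and NRMSE is taken as RMSE divided by the standard deviation of the true masked values, which is one common normalisation. The abstract does not state the exact definition used, and the helper names are hypothetical.

```python
import math
import random


def mask_values(data, proportion, seed=0):
    """Return a copy of data with a share of cells set to None, chosen
    completely at random, together with the masked (row, col) positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    n_mask = int(round(proportion * len(cells)))
    masked = set(rng.sample(cells, n_mask))
    incomplete = [[None if (i, j) in masked else v for j, v in enumerate(row)]
                  for i, row in enumerate(data)]
    return incomplete, sorted(masked)


def mean_impute(data):
    """Baseline stand-in imputer: replace None with the column mean."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in data]


def nrmse(complete, imputed, positions):
    """RMSE over the masked cells, normalised by the standard deviation
    of the true values at those cells (assumed definition)."""
    truth = [complete[i][j] for i, j in positions]
    guess = [imputed[i][j] for i, j in positions]
    mse = sum((t - g) ** 2 for t, g in zip(truth, guess)) / len(truth)
    mean_t = sum(truth) / len(truth)
    var_t = sum((t - mean_t) ** 2 for t in truth) / len(truth)
    if var_t == 0:
        raise ValueError("true masked values have zero variance")
    return math.sqrt(mse / var_t)


# Toy run on a small complete data set with 30% of cells masked.
complete = [[float(i), float(2 * i + 1)] for i in range(10)]
incomplete, positions = mask_values(complete, proportion=0.3, seed=1)
score = nrmse(complete, mean_impute(incomplete), positions)
```

Repeating this over several missingness proportions and swapping in different imputers gives the kind of comparison the study reports. A lower NRMSE indicates more accurate imputation.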


2021 ◽  
Vol 5 (Supplement_1) ◽  
pp. 884-884
Author(s):  
Mohammed Abahussain ◽  
Priya Nambisan ◽  
Colleen Galambos ◽  
Bo Zhang ◽  
Elizabeth Bukowy

Abstract COVID-19 has been devastating for nursing homes (NHs). The concentration of older adults with underlying chronic conditions inevitably made the setting highly vulnerable, leading to high rates of mortality among residents. However, some nursing homes fared better than others. This study examines several quality measures and organizational factors to understand whether these factors are associated with COVID-19 cases in Wisconsin. We combined three datasets from the Centers for Medicare & Medicaid Services (CMS): the Star Rating dataset, the Provider Information dataset and the COVID-19 Nursing Home dataset. The data cover the period from Jan 1 to Oct 25, 2020 for the state of Wisconsin. The analysis includes 331 free-standing NHs with no missing values in the data sets. The variables used were self-reported information on nursing home ratings, staff shortage, staff reported hours, occupancy rate, number of beds and ownership. Of the 331 NHs examined, shortages were reported for licensed nurse staff (25.4%), nurse aides (31.1%), clinical staff (3.2%) and other staff (15.6%). Additionally, there was a significant (p<.05) positive correlation between number of beds and COVID-19 cases, and there was no statistically significant association between occupancy rate and COVID-19 cases. NHs with better star ratings were also found to have fewer COVID-19 cases. Interestingly, private NHs had significantly more COVID-19 cases than for-profit and government-owned NHs, a finding that is congruent with other studies in this area. Recommendations for practice will be discussed.

