Data Preprocessing and Data Processing in Microspectral Analysis

2012 ◽  
Vol 251 ◽  
pp. 185-190
Author(s):  
Dun Hong Yao ◽  
Xiao Ning Peng ◽  
Jia He

In every field that relies on data processing, data sparseness is an essential problem that must be resolved, especially for movie and shopping sites. Evaluation data become valuable when users share the same commodity preferences; without such evaluation information, the data obtained are sparsely distributed. This article introduces a method used in collaborative filtering for processing sparse data, the project-based rating prediction algorithm, and extends it to rough sets, namely to the processing of sparse information tables and to the sparsity problem in rough set data preprocessing.
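As a rough illustration of rating prediction on a sparse table, the sketch below fills a missing rating with a similarity-weighted average of the same user's ratings of other items. The abstract does not give the authors' formulas, so the cosine similarity, the function names, and the toy ratings are all assumptions made for illustration.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two items, computed over users who rated both."""
    common = [u for u in a if u in b]
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Predict a missing rating as a similarity-weighted average of the
    user's ratings of other items. Returns None if nothing is comparable."""
    target = ratings.get(item, {})
    num = den = 0.0
    for other, column in ratings.items():
        if other == item or user not in column:
            continue
        sim = cosine_similarity(target, column)
        num += sim * column[user]
        den += abs(sim)
    return num / den if den else None


# Hypothetical sparse table: ratings[item][user] = score (missing = unrated)
ratings = {
    "film_a": {"u1": 5, "u2": 4, "u3": 1},
    "film_b": {"u1": 4, "u2": 5},
    "film_c": {"u1": 1, "u3": 5},
}
predicted = predict_rating(ratings, "u3", "film_b")
```

The same idea could be applied to a sparse information table by treating attributes as items and objects as users, though whether that matches the article's rough set extension is not stated in the abstract.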


2019 ◽  
Vol 8 (12) ◽  
pp. 556
Author(s):  
Wen Chen ◽  
Zhiyun Xu ◽  
Xiaoyao Zheng ◽  
Yonglong Luo

Technological advances have led to numerous developments in data sources. Geo-tagged photo metadata has provided a new source of mass research data for tourism studies. A series of data processing methods centering on the various types of information contained in geo-tagged photo metadata have thus been proposed; as a result, the development of tourism studies based on such data has advanced. However, an in-depth study of the data processing methods designed to conduct tourist flow prediction based on geo-tagged photo metadata has not yet been conducted. In order to acquire accurate substitutive data regarding inbound flows in cities, this paper introduces and designs several methods, including data screening, text data similarity calculation, geographical location clustering, and time series data modelling, in order to realize a data preprocessing model for inbound tourist flows in cities based on geo-tagged photo metadata. Specifically, an entropy filtering method was introduced to aid in determining whether the data were posted by inbound tourists; whether the inbound persons’ activities were related to tourism was judged through the calculation of tag text similarity; an efficient clustering method based on geographic grid partition was designed for cases in which the tag values were empty; finally, the time series of the inbound tourist flows of a certain region and period were obtained through data statistics and normalization. For the empirical research, Beijing City in China was selected as the research case, after which the feasibility and accuracy of the methods proposed in this paper were verified through data correlation analysis between Flickr data and real statistical yearbook data, as well as analysis of the prediction results based on a machine learning algorithm. The data preprocessing method introduced and designed in this paper provides a reference for the study of geo-tagged photo metadata in the field of tourism flow prediction. These methods can effectively filter inbound tourist flow data out of geo-tagged photo metadata, thus providing a novel, reliable, and low-cost research data source for urban inbound tourism flow forecasting.
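To make the grid partition step concrete, here is a minimal sketch of one common form of grid-based clustering: photos are binned into fixed-size latitude/longitude cells, sparse cells are dropped, and adjacent dense cells are merged into clusters. The cell size, the density threshold, the 8-neighbour merge rule, and the coordinates are assumptions for illustration; the abstract does not describe the paper's actual procedure.

```python
from collections import defaultdict, deque
from math import floor


def grid_clusters(points, cell_deg=0.01, min_points=2):
    """Group (lat, lon) points into clusters of adjacent dense grid cells."""
    # 1. Partition the plane into square cells and bin every point.
    cells = defaultdict(list)
    for lat, lon in points:
        cells[(floor(lat / cell_deg), floor(lon / cell_deg))].append((lat, lon))

    # 2. Keep only cells holding at least `min_points` photos.
    dense = {cell for cell, pts in cells.items() if len(pts) >= min_points}

    # 3. Merge neighbouring dense cells (8-neighbourhood) by breadth-first search.
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            row, col = queue.popleft()
            members.extend(cells[(row, col)])
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    neighbour = (row + d_row, col + d_col)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        clusters.append(members)
    return clusters


# Hypothetical photo coordinates (lat, lon); values are illustrative only.
photos = [
    (39.9042, 116.4074), (39.9048, 116.4079),  # dense cell
    (39.9151, 116.4042), (39.9158, 116.4049),  # adjacent dense cell
    (40.4319, 116.5704), (40.4315, 116.5708),  # separate dense cell
    (39.5011, 116.0023),                       # isolated photo, dropped
]
clusters = grid_clusters(photos)
```

Binning costs one pass over the data, which is why grid methods are often preferred over pairwise-distance clustering for large photo collections.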


2010 ◽  
Vol 61 (4) ◽  
pp. 247-251 ◽  
Author(s):  
Peter Benický ◽  
Ladislav Jurišica

Real Time Motion Data Preprocessing

An image, and likewise a motion picture, contains a lot of data that is redundant for image processing. The more data there is to process, the more time preprocessing takes, so it is necessary to work with the important data only. Identifying or classifying motion requires data processing in real time.
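The abstract does not say how the important data are selected. One simple and widely used option is frame differencing, which keeps only the pixels that changed between consecutive frames. The sketch below assumes that technique, with a hypothetical threshold and tiny grayscale frames stored as nested lists.

```python
def changed_pixels(prev_frame, frame, threshold=10):
    """Return {(row, col): value} for pixels whose grayscale value differs
    from the previous frame by more than `threshold`."""
    changed = {}
    for r, (prev_row, row) in enumerate(zip(prev_frame, frame)):
        for c, (old, new) in enumerate(zip(prev_row, row)):
            if abs(new - old) > threshold:
                changed[(r, c)] = new
    return changed


# Hypothetical 3x3 grayscale frames: one pixel changes strongly,
# one changes only slightly (treated as noise and discarded).
prev_frame = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
frame = [[10, 10, 10], [10, 200, 12], [10, 10, 10]]
moving = changed_pixels(prev_frame, frame)
```

Later stages then only need to process the entries in `moving` rather than the full frame.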


2017 ◽  
Vol 12 (8) ◽  
pp. 1934578X1701200
Author(s):  
Bing-Peng Yan ◽  
Chun-Mei Cao ◽  
Jin-Jun Hou ◽  
Qi-Rui Bi ◽  
Min Yang ◽  
...  

A systematic study was performed to investigate the processing workflow for LC-MS-based metabolomics data by optimizing parameter settings in XCMS software and comparing different preprocessing methods. A spike-in experiment was combined with design of experiment (DoE) approaches to optimize the XCMS software parameters. A trusted index, based on accuracy evaluation of the spike-in data, was employed to assess the optimization process. After optimizing the XCMS settings, the trusted index improved from 3.67 to 30 and the positive rate of spike-in standards increased from 20% to 100%. Moreover, different data preprocessing methods, such as normalization and different scaling methods, were also investigated on the spike-in data, since they were found to affect the outcome of the data analysis and the identification of ion features. Accordingly, UN-normalization and Pareto scaling were chosen as appropriate preprocessing methods for LC-MS data through the evaluation of a match index (mainly applying multivariate statistical methods). Finally, the optimized workflow was applied to experimental samples that were acquired from a metabolomics experiment and analyzed in random order together with the spike-in sample, which indicated better applicability in a formal metabolomics experiment. It is concluded that the proposed data processing workflow could be used as a feasible approach for improving the quality of LC-MS-based metabolomics data and, to a certain extent, for ensuring the veracity of metabolite identification in data processing procedures.
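Pareto scaling, as usually defined, mean-centers each feature and divides it by the square root of its standard deviation, which reduces the dominance of high-intensity peaks less aggressively than unit-variance scaling. The sketch below follows that textbook definition; the use of the sample standard deviation and the toy intensity table are assumptions, and nothing here reproduces the paper's XCMS workflow.

```python
from math import sqrt
from statistics import mean, stdev


def pareto_scale(column):
    """Mean-center one feature and divide by the square root of its
    sample standard deviation."""
    centre = mean(column)
    spread = stdev(column)
    if spread == 0:
        # A constant feature carries no variation; map it to zeros.
        return [0.0 for _ in column]
    return [(x - centre) / sqrt(spread) for x in column]


def pareto_scale_table(rows):
    """Apply Pareto scaling column-wise to a samples x features table."""
    columns = list(zip(*rows))
    scaled_columns = [pareto_scale(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical peak intensities: 3 samples x 2 features.
intensities = [
    [100.0, 1.0],
    [200.0, 2.0],
    [300.0, 3.0],
]
scaled = pareto_scale_table(intensities)
```

After scaling, the high-intensity feature still has a larger spread than the low-intensity one, but the ratio between them shrinks from 100 to 10.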


1974 ◽  
Vol 13 (03) ◽  
pp. 125-140 ◽  
Author(s):  
Ch. Mellner ◽  
H. Selajstder ◽  
J. Wolodakski

The paper gives a report on the Karolinska Hospital Information System in three parts. In part I, the information problems in health care delivery are discussed, and the approach to systems design at the Karolinska Hospital is reported and contrasted with the traditional approach. In part II, the data base and the data processing system, named T1—J 5, are described. In part III, the applications of the data base and the data processing system are illustrated by a broad description of the contents and use of the patient data base at the Karolinska Hospital.


1978 ◽  
Vol 17 (01) ◽  
pp. 36-40 ◽  
Author(s):  
J.-P. Durbec ◽  
Jaqueline Cornée ◽  
P. Berthezene

The practice of systematic examinations in hospitals and the increasing development of automatic data processing permit the storing of a great deal of information about a large number of patients belonging to different diagnosis groups. To predict or to characterize these diagnosis groups, some descriptors are particularly useful, while others carry no information. Data screening based on the properties of mutual information and on the log cross-product ratios in contingency tables is developed. The most useful descriptors are selected, and for each one the groups it characterizes are specified. This approach has been applied to a set of binary (presence/absence) radiological variables. Four diagnosis groups are concerned: cancer of the pancreas, chronic calcifying pancreatitis, non-calcifying pancreatitis and probable pancreatitis. Only twenty of the three hundred and forty initial radiological variables are selected. The presence of each corresponding sign is associated with one or more diagnosis groups.
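The two screening quantities named above can be sketched as follows: mutual information measures how much a binary sign tells us about the diagnosis group overall, and the log cross-product ratio (log odds ratio) of a 2x2 table shows which group the sign's presence points to. The function names, the 0.5 continuity correction for empty cells, and the toy data are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log


def mutual_information(signs, groups):
    """Mutual information (in nats) between a binary sign and the group label."""
    n = len(signs)
    joint = Counter(zip(signs, groups))
    sign_counts = Counter(signs)
    group_counts = Counter(groups)
    mi = 0.0
    for (sign, group), count in joint.items():
        p_joint = count / n
        # p_joint / (p_sign * p_group), written with raw counts
        ratio = count * n / (sign_counts[sign] * group_counts[group])
        mi += p_joint * log(ratio)
    return mi


def log_cross_product_ratio(signs, groups, group, smoothing=0.5):
    """Log odds ratio of the 2x2 table (sign present/absent) x (in group / not).

    `smoothing` is added to every cell so that empty cells do not cause a
    division by zero (an assumption made for this sketch)."""
    a = b = c = d = smoothing
    for sign, label in zip(signs, groups):
        if sign and label == group:
            a += 1  # sign present, patient in group
        elif sign:
            b += 1  # sign present, patient in another group
        elif label == group:
            c += 1  # sign absent, patient in group
        else:
            d += 1  # sign absent, patient in another group
    return log((a * d) / (b * c))


# Hypothetical data: one radiological sign recorded for six patients.
signs = [1, 1, 1, 0, 0, 0]
groups = ["cancer", "cancer", "cancer",
          "pancreatitis", "pancreatitis", "pancreatitis"]
mi = mutual_information(signs, groups)
lcr = log_cross_product_ratio(signs, groups, "cancer")
```

A descriptor with mutual information near zero could be discarded, while a large positive log cross-product ratio for a group suggests that the sign's presence is associated with that group.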


Metrologiya ◽  
2020 ◽  
pp. 16-24
Author(s):  
Alexandr D. Chikmarev

A single program has been developed to ensure that the final result of processing the data of a measurement calibration protocol is obtained under normal conditions. The calibration result contains a calibration function or a correction function in the form of a continuous power series and a calibration chart based on typical additive error probabilities. The problem of the statistical treatment of the calibration protocol of a measuring instrument under normal conditions is solved within a single program, “MMI–calibration 3.0”, which includes identification of the calibration function as a continuous power series in the indications of the measuring instrument and construction of the calibration chart. An example is given of calibrating a thermometer against a working standard of the 3rd grade with the help of the “MMI–calibration 3.0” program.
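A correction function in the form of a power series can be illustrated with an ordinary least-squares polynomial fit of the correction (reference value minus indication) against the indication. This is only a generic sketch: it does not reproduce the “MMI–calibration 3.0” program, and the polynomial degree, the function names, and the thermometer readings are assumptions.

```python
def fit_power_series(x, y, degree):
    """Least-squares coefficients c[0..degree] of y ~ sum(c[k] * x**k),
    found by solving the normal equations with Gaussian elimination."""
    n = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the power series at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Hypothetical thermometer calibration points (degrees Celsius).
indications = [0.0, 10.0, 20.0, 30.0, 40.0]
reference = [0.10, 10.22, 20.38, 30.58, 40.82]
corrections = [r - i for r, i in zip(reference, indications)]

coeffs = fit_power_series(indications, corrections, degree=2)
correction_at_25 = evaluate(coeffs, 25.0)
```

The corrected reading is then the indication plus the evaluated correction; for real data, the degree would have to be chosen by a statistical criterion rather than fixed in advance.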

