Estimating error rates in the classification of paired organs

Alexander Brenning; Berthold Lausen

doi:10.1002/sim.3310

Supervised linear classification of Gaussian spatio-temporal data

Lietuvos matematikos rinkinys ◽

10.15388/lmr.2021.25214 ◽

2021 ◽

Vol 62 ◽

pp. 9-15

Author(s):

Marta Karaliutė ◽

Kęstutis Dučinskas

Keyword(s):

Time Moment ◽

Gaussian Random Field ◽

Covariance Structure ◽

Simulated Data ◽

Spatial Location ◽

Error Rates ◽

Prior Probabilities ◽

Structure Factors ◽

Spatio Temporal

In this article we focus on the supervised classification of a spatio-temporal Gaussian random field observation into one of two classes specified by different mean parameters. The main distinctive feature of the proposed approach is that the class label is allowed to depend on the spatial location as well as on the time moment. It is assumed that the spatio-temporal covariance structure factors into a purely spatial component and a purely temporal component following an AR(p) model. In numerical illustrations with simulated data, the influence of the spatial and temporal covariance parameter values on the derived error rates is studied for several prior probability models.
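A separable covariance of the kind described in this abstract can be sketched as a Kronecker product of a temporal and a spatial matrix. The sketch below is illustrative only: it assumes an AR(1) temporal correlation (the simplest case of AR(p)), an exponential spatial correlation on one-dimensional locations, and hypothetical function names; none of these choices are taken from the paper.

```python
import math

def spatial_corr(coords, range_param):
    """Exponential spatial correlation exp(-d / range) for 1-D locations (assumed model)."""
    return [[math.exp(-abs(a - b) / range_param) for b in coords] for a in coords]

def ar1_corr(n_times, rho):
    """Correlation matrix of a stationary AR(1) process: rho ** |t - s|."""
    return [[rho ** abs(t - s) for s in range(n_times)] for t in range(n_times)]

def kronecker(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    nb = len(b)
    size = len(a) * nb
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(size)]
            for i in range(size)]

# Two locations, three time moments; row/column index = time * n_locations + location.
c_s = spatial_corr([0.0, 1.0], range_param=1.0)
c_t = ar1_corr(3, rho=0.5)
c_st = kronecker(c_t, c_s)
```

Under these assumptions, every entry of `c_st` is the product of one temporal and one spatial correlation, which is what "factors into a purely spatial component and a purely temporal component" means in matrix form.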

Classification of Pneumonia Cell Images Using Improved ResNet50 Model

Traitement du signal ◽

10.18280/ts.380117 ◽

2021 ◽

Vol 38 (1) ◽

pp. 165-173

Author(s):

Ahmet Çınar ◽

Muhammed Yıldırım ◽

Yeşim Eroğlu

Keyword(s):

Error Rates ◽

Lung Imaging ◽

Imaging Method ◽

Data Sets ◽

Accuracy Rate ◽

Traditional Methods ◽

X Ray ◽

Qualified Personnel ◽

Patient Will

Pneumonia is a disease caused by inflammation of the lung tissue that is transmitted by various means, primarily bacteria. Early and accurate diagnosis is important in reducing the morbidity and mortality of the disease. The primary imaging method used for the diagnosis of pneumonia is lung x-ray. Although lung imaging may show typical findings of pneumonia, the findings may also be nonspecific. In addition, many health units may not have qualified personnel to perform this procedure, or there may be errors in diagnoses made by traditional methods. For this reason, computer systems can be used to reduce the error rates that may occur with traditional methods. Many methods have been developed for training models on such data sets. In this article, a new model has been developed based on the layers of ResNet50. The developed model was compared with the architectures InceptionV3, AlexNet, GoogleNet, ResNet50 and DenseNet201. The developed model achieved a maximum accuracy of 97.22%. It was followed, in order of accuracy, by DenseNet201, ResNet50, InceptionV3, GoogleNet and AlexNet. With these developed models, the diagnosis of pneumonia can be made early and accurately, and the treatment management of the patient can be determined quickly.

Simulated rRNA/DNA Ratios Show Potential To Misclassify Active Populations as Dormant

Applied and Environmental Microbiology ◽

10.1128/aem.00696-17 ◽

2017 ◽

Vol 83 (11) ◽

Cited By ~ 25

Author(s):

Blaire Steven ◽

Cedar Hesse ◽

John Soghigian ◽

La Verne Gallegos-Graves ◽

John Dunbar

Keyword(s):

Community Structure ◽

Error Rates ◽

Growth Strategies ◽

Sequencing Data ◽

Sampling Depth ◽

Bacterial Populations ◽

Misclassification Errors ◽

High Throughput Dna Sequencing ◽

Ratio Approach

ABSTRACT The use of rRNA/DNA ratios derived from surveys of rRNA sequences in RNA and DNA extracts is an appealing but poorly validated approach to infer the activity status of environmental microbes. To improve the interpretation of rRNA/DNA ratios, we performed simulations to investigate the effects of community structure, rRNA amplification, and sampling depth on the accuracy of rRNA/DNA ratios in classifying bacterial populations as “active” or “dormant.” Community structure was an insignificant factor. In contrast, the extent of rRNA amplification that occurs as cells transition from dormant to growing had a significant effect (P < 0.0001) on classification accuracy, with misclassification errors ranging from 16 to 28%, depending on the rRNA amplification model. The error rate increased to 47% when communities included a mixture of rRNA amplification models, but most of the inflated error was false negatives (i.e., active populations misclassified as dormant). Sampling depth also affected error rates (P < 0.001). Inadequate sampling depth produced various artifacts that are characteristic of rRNA/DNA ratios generated from real communities. These data show important constraints on the use of rRNA/DNA ratios to infer activity status. Whereas classification of populations as active based on rRNA/DNA ratios appears generally valid, classification of populations as dormant is potentially far less accurate.

IMPORTANCE The rRNA/DNA ratio approach is appealing because it extracts an extra layer of information from high-throughput DNA sequencing data, offering a means to determine not only the seedbank of taxa present in communities but also the subset of taxa that are metabolically active. This study provides crucial insights into the use of rRNA/DNA ratios to infer the activity status of microbial taxa in complex communities. Our study shows that the approach may not be as robust as previously supposed, particularly in complex communities composed of populations employing different growth strategies, and identifies factors that inflate the erroneous classification of active populations as dormant.
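The classification rule and the error types discussed in this abstract can be illustrated with a toy calculation. The sketch below assumes a ratio threshold of 1.0 for calling a population "active"; the abstract does not state a threshold, and the function names and abundances are hypothetical, not data from the study.

```python
def classify_by_ratio(rrna, dna, threshold=1.0):
    """Label a population 'active' when its rRNA/DNA ratio exceeds the (assumed) threshold."""
    return "active" if rrna / dna > threshold else "dormant"

def error_rates(populations, threshold=1.0):
    """populations: list of (rrna_abundance, dna_abundance, true_state) tuples."""
    false_neg = false_pos = 0
    for rrna, dna, truth in populations:
        predicted = classify_by_ratio(rrna, dna, threshold)
        if truth == "active" and predicted == "dormant":
            false_neg += 1  # active population misclassified as dormant
        elif truth == "dormant" and predicted == "active":
            false_pos += 1  # dormant population misclassified as active
    n = len(populations)
    return {"error": (false_neg + false_pos) / n,
            "false_negative": false_neg / n,
            "false_positive": false_pos / n}

# Hypothetical relative abundances with known true states.
toy = [(0.04, 0.02, "active"), (0.01, 0.02, "active"),
       (0.005, 0.02, "dormant"), (0.03, 0.01, "dormant")]
rates = error_rates(toy)
```

Separating false negatives from false positives, as here, is what lets the overall error be attributed mainly to active populations labeled dormant.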

Error rates in spatial classification of Gaussian data with random labeling

Lietuvos matematikos rinkinys ◽

10.15388/lmr.2010.77 ◽

2010 ◽

Vol 51 ◽

Author(s):

Lijana Stabingienė ◽

Kęstutis Dučinskas

Keyword(s):

Random Field ◽

Error Rate ◽

Field Model ◽

Gaussian Random Field ◽

Parametric Uncertainty ◽

Error Rates ◽

Spatial Classification ◽

Random Labeling ◽

Gaussian Data

In spatial classification it is usually assumed that feature observations are independently distributed given the labels. We relax this assumption by proposing a stationary Gaussian random field model for the feature observations. The labels are assumed to follow a Discrete Random Field (DRF) model. A formula for the exact error rate based on the Bayes discriminant function (BDF) is derived. In the case of partial parametric uncertainty (mean parameters and variance are unknown), an approximation of the expected error rate associated with the plug-in BDF is also derived. The dependence of the considered error rates on the values of the range and clustering parameters is investigated numerically for training locations that are second-order neighbors of the location of the observation to be classified.

Forest Type and Tree Species Classification of Nemoral Forests With Sentinel-1 and 2 Time Series Data

10.20944/preprints202101.0235.v1 ◽

2021 ◽

Author(s):

Kristian Skau Bjerreskov ◽

Thomas Nord-Larsen ◽

Rasmus Fensholt

Keyword(s):

Tree Species ◽

Forest Cover ◽

Forest Type ◽

Error Rates ◽

Sensor Data ◽

Series Data ◽

Forest Types ◽

Multi Temporal ◽

Species Groups

Mapping forest extent and forest cover classification are important for the assessment of forest resources in socio-economic as well as ecological terms. Novel developments in the availability of remotely sensed data, computational resources, and advances in areas of statistical learning have enabled fusion of multi-sensor data, often yielding superior classification results. Most previous studies of nemoral forests fusing multi-sensor and multi-temporal data have been limited in spatial extent and typically to a simple classification of landscapes into major land cover classes. We hypothesize that multi-temporal, multi-sensor data will have a specific strength in further classification of nemoral forest landscapes owing to the distinct seasonal patterns of the phenology of broadleaves. This study aimed to classify the Danish landscape into forest/non-forest and further into forest types (broadleaved/coniferous) and species groups, using a cloud-based approach based on multi-temporal Sentinel 1 and 2 data and machine learning (random forest) trained with National Forest Inventory (NFI) data. Mapping of non-forest and forest resulted in producer accuracies of 99% and 90%, respectively. The mapping of forest types (broadleaf and conifer) within the forested area resulted in producer accuracies of 95% for conifer and 96% for broadleaf forest. Tree species groups were classified with producer accuracies ranging from 34% to 74%. Species groups with coniferous species were the least confused, whereas the broadleaf groups, especially Oak, had higher error rates. The results are applied in Danish national accounting of greenhouse gas emissions from forests, resource assessment and assessment of forest biodiversity potentials.
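Producer accuracy, the metric reported in this abstract, is the share of reference samples of a class that the map labels correctly. A minimal sketch follows; the function name and the confusion counts are hypothetical and are not the study's data.

```python
def producer_accuracies(confusion):
    """confusion[ref][mapped] = number of reference samples of class `ref` mapped as `mapped`."""
    result = {}
    for ref_class, row in confusion.items():
        total = sum(row.values())
        # Correctly mapped reference samples divided by all reference samples of the class.
        result[ref_class] = row.get(ref_class, 0) / total if total else float("nan")
    return result

# Hypothetical counts, chosen only to illustrate the calculation.
toy = {
    "broadleaf": {"broadleaf": 45, "conifer": 5},
    "conifer": {"broadleaf": 8, "conifer": 32},
}
acc = producer_accuracies(toy)
```

The per-class error rate in this sense (the omission error) is one minus the producer accuracy.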

Effect of anisotropy coefficient on error rates of linear discriminant functions

Lietuvos matematikos rinkinys ◽

10.15388/lmr.2007.24228 ◽

2021 ◽

Vol 47 ◽

Author(s):

Kęstutis Dučinskas ◽

Lina Dreižienė

Keyword(s):

Spatial Data ◽

Supervised Classification ◽

Statistical Approach ◽

Gaussian Random Field ◽

Error Rates ◽

Discriminant Functions ◽

Linear Discriminant ◽

Recognition Error ◽

Two Populations

This paper deals with the statistical classification of spatial data as part of a widely applicable statistical approach to pattern recognition. Error rates in the supervised classification of a Gaussian random field observation into one of two populations, specified by different constant means and a common stationary geometric anisotropic covariance, are considered. A formula for the exact Bayesian error rate is derived. The influence of the anisotropy ratio on the error rates is evaluated numerically for the case of complete parametric certainty.
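For orientation, the textbook Bayes error rate for two Gaussian populations with a common covariance can be computed in a few lines. This is a sketch of the classical formula for a single observation without spatial correlation, not the formula derived in the paper; `delta` denotes the Mahalanobis distance between the two means, and the function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_error(delta, prior1):
    """Bayes error for two Gaussian classes with common covariance.

    delta  : Mahalanobis distance between the class means (> 0)
    prior1 : prior probability of class 1 (0 < prior1 < 1)
    """
    prior2 = 1.0 - prior1
    gamma = math.log(prior1 / prior2)  # log prior odds shifts the decision threshold
    return (prior1 * std_normal_cdf(-delta / 2.0 - gamma / delta)
            + prior2 * std_normal_cdf(-delta / 2.0 + gamma / delta))

equal_priors = bayes_error(2.0, 0.5)    # reduces to Phi(-delta / 2)
unequal_priors = bayes_error(2.0, 0.8)
```

In the spatial setting of the paper, the covariance (and hence the anisotropy ratio) would enter through the distance term; that dependence is not modeled in this sketch.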

Actual Error Rates in Classification of the T-Distributed Random Field Observation Based on Plug-in Linear Discriminant Function

Informatica ◽

10.15388/informatica.2015.64 ◽

2015 ◽

Vol 26 (4) ◽

pp. 557-568 ◽

Cited By ~ 2

Author(s):

Kęstutis Dučinskas ◽

Eglė Zikarienė

Keyword(s):

Random Field ◽

Discriminant Function ◽

Field Observation ◽

Error Rates ◽

Linear Discriminant Function ◽

Linear Discriminant ◽

Actual Error

Interface Style and Error Rates - Some Experimental Results

Proceedings of the Human Factors and Ergonomics Society Annual Meeting ◽

10.1177/154193120004402226 ◽

2000 ◽

Vol 44 (22) ◽

pp. 595-598

Author(s):

Peter F. Elzer ◽

Badi Boussoffara ◽

Carsten Beuthel

Keyword(s):

Comparative Study ◽

Error Rate ◽

Error Rates ◽

Experimental Results ◽

Research Project

At the IPP, a number of new forms of visualization of process values have been developed. Several of them have been evaluated in user experiments. In the context of a research project supported by the Volkswagen Foundation in Germany (Ref. Nr.: I/69 886), a comparative study of the influence of the interface style on the error rate during classification of process states was undertaken. The paper describes the results and discusses them in a taxonomy context.

DECISION TREE BASED INFORMATION INTEGRATION FOR AUTOMATED PROTEIN CLASSIFICATION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720005001259 ◽

2005 ◽

Vol 03 (03) ◽

pp. 717-742 ◽

Cited By ~ 15

Author(s):

ORHAN ÇAMOĞLU ◽

TOLGA CAN ◽

AMBUJ K. SINGH ◽

YUAN-FANG WANG

Keyword(s):

Protein Structure ◽

Information Integration ◽

Ensemble Classifier ◽

Error Rates ◽

Structure Comparison ◽

Fold Level ◽

The Family ◽

Scop Classification ◽

The Individual

We propose a novel technique for automatically generating the SCOP classification of a protein structure with high accuracy. We achieve accurate classification by combining the decisions of multiple methods using the consensus of a committee (or an ensemble) classifier. Our technique, based on decision trees, is rooted in machine learning, which shows that by judiciously employing component classifiers, an ensemble classifier can be constructed to outperform its components. We use two sequence- and three structure-comparison tools as component classifiers. Given a protein structure and using the joint hypothesis, we first determine if the protein belongs to an existing category (family, superfamily, fold) in the SCOP hierarchy. For the proteins that are predicted as members of the existing categories, we compute their family-, superfamily-, and fold-level classifications using the consensus classifier. We show that we can significantly improve the classification accuracy compared to the individual component classifiers. In particular, we achieve error rates that are 3–12 times lower than the individual classifiers' error rates at the family level, 1.5–4.5 times lower at the superfamily level, and 1.1–2.4 times lower at the fold level.
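The idea that an ensemble can outperform its components can be illustrated with a simple majority vote. Note that this is a swapped-in technique for illustration: the paper combines its component classifiers with decision trees, not by majority vote, and the labels, classifiers, and function names below are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    label, _ = Counter(predictions).most_common(1)[0]
    return label

def error_rate(predicted, truth):
    """Fraction of items whose predicted label differs from the true label."""
    return sum(p != t for p, t in zip(predicted, truth)) / len(truth)

# Three hypothetical component classifiers, each wrong on a different item.
truth = ["a", "b", "c", "a"]
c1 = ["a", "b", "c", "b"]
c2 = ["a", "c", "c", "a"]
c3 = ["b", "b", "c", "a"]

ensemble = [majority_vote(votes) for votes in zip(c1, c2, c3)]
component_errors = [error_rate(c, truth) for c in (c1, c2, c3)]
ensemble_error = error_rate(ensemble, truth)
```

The gain in this toy case comes from the components making errors on different items; components that fail on the same items would gain nothing from voting.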

Error rates in physician dictation: quality assurance and medical record production

International Journal of Health Care Quality Assurance ◽

10.1108/ijhcqa-06-2012-0056 ◽

2014 ◽

Vol 27 (2) ◽

pp. 99-110 ◽

Cited By ~ 8

Author(s):

Gary C. David ◽

Donald Chand ◽

Balaji Sankaranarayanan

Keyword(s):

Quality Assurance ◽

Medical Record ◽

Medical Errors ◽

Medical Records ◽

Error Rates ◽

Content Type ◽

Standard Work ◽

Practical Implications ◽

Made In

Purpose – The purpose of the paper is to determine the incidence of errors made in physician dictation of medical records. Design/methodology/approach – A purposive sampling method was employed to select medical transcriptionists (MTs) as “experts” to identify the frequency and types of medical errors in dictation files. Seventy-nine MTs examined 2,391 dictation files during one standard work day, and used a common template to record errors. Findings – The results demonstrated that, on average, on the order of 315,000 errors per one million dictations were identified. This shows that medical errors occur in dictation, and quality assurance measures are needed in dealing with those errors. Research limitations/implications – It was not possible to assess inter-coder reliability or to confirm the error codes assigned by individual MTs. This study only examined the presence of errors in the dictation-transcription model. Finally, the project was done with the cooperation of MTSOs and transcription industry organizations. Practical implications – Anecdotal evidence points to the belief that records created directly by physicians alone will have fewer errors and thus be more accurate. This research demonstrates this is not necessarily the case when it comes to physician dictation. As a result, the place of quality assurance in the medical record production workflow needs to be carefully considered before implementing a “once-and-done” (i.e. physician-based) model of record creation. Originality/value – No other research has been published on the presence of errors or classification of errors in physician dictation. The paper questions the assumption that direct physician creation of medical records in the absence of secondary QA processes will result in higher quality documentation and fewer medical errors.
