Learning Susceptibility of a Pathogen to Antibiotics Using Data from Similar Pathogens

S. Andreassen; L. Leibovici; M. Paul; A. Zalounina

doi:10.3414/me9226

Methods of Information in Medicine ◽

10.3414/me9226 ◽

2009 ◽

Vol 48 (03) ◽

pp. 242-247 ◽

Cited By ~ 2

Author(s):

S. Andreassen ◽

L. Leibovici ◽

M. Paul ◽

A. Zalounina

Keyword(s):

Maximum Likelihood ◽

Antibiotic Therapy ◽

Cross Validation ◽

Optimal Size ◽

Empirical Antibiotic Therapy ◽

Training Set ◽

Validation Set ◽

Using Data ◽

Selection Of

Summary Objectives: Selection of empirical antibiotic therapy relies on knowledge of the in vitro susceptibilities of potential pathogens to antibiotics. In this paper the limitations of this knowledge are outlined and a method that can reduce some of the problems is developed. Methods: We propose hierarchical Dirichlet learning for estimation of pathogen susceptibilities to antibiotics, using data from a group of similar pathogens in a bacteremia database. Results: A threefold cross-validation showed that maximum likelihood (ML) estimates of susceptibilities based on individual pathogens gave a distance between estimates obtained from the training set and observed frequencies in the validation set of 16.3%. Estimates based on the initial grouping of pathogens gave a distance of 16.7%. Dirichlet learning gave a distance of 15.6%. Inspection of the pathogen groups led to subdivision of three groups, Citrobacter, Other Gram Negatives and Acinetobacter, out of 26 groups. Estimates based on the subdivided groups gave a distance of 15.4% and Dirichlet learning further reduced this to 15.0%. The optimal size of the imaginary sample inherited from the group was 3. Conclusion: Dirichlet learning improved estimates of susceptibilities relative to ML estimators based on individual pathogens and to classical grouped estimators. The initial pathogen grouping was well founded and improvement by subdivision of the groups was only obtained in three groups. Dirichlet learning was robust to these revisions of the grouping, giving improved estimates in both cases, while the group-based estimates only gave improved estimates after the revision of the groups.
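
The idea of an "imaginary sample inherited from the group" can be illustrated with a simple shrinkage estimate. This is a minimal sketch, assuming a two-outcome (susceptible or resistant) posterior-mean formula; it is not taken from the paper, and the function name, parameters, and counts are hypothetical. Only the imaginary sample size of 3 comes from the abstract.

```python
def shrunk_susceptibility(n_susceptible, n_tested, group_rate, imaginary_n=3.0):
    """Posterior-mean susceptibility estimate with a group-derived prior.

    The group rate is treated as `imaginary_n` extra isolates observed
    before the pathogen's own isolates are counted (an assumption made
    for illustration, not the authors' exact model).
    """
    if n_tested < 0 or not 0 <= n_susceptible <= n_tested:
        raise ValueError("counts must satisfy 0 <= n_susceptible <= n_tested")
    return (n_susceptible + imaginary_n * group_rate) / (n_tested + imaginary_n)


# Hypothetical counts, not data from the bacteremia database.
ml_estimate = 2 / 2  # ML on two susceptible isolates out of two
shrunk = shrunk_susceptibility(2, 2, group_rate=0.7)   # (2 + 2.1) / 5
no_data = shrunk_susceptibility(0, 0, group_rate=0.7)  # falls back to the group rate
```

With only two isolates the ML estimate jumps to 1.0, while the shrunk estimate stays close to the group rate. As the pathogen's own sample grows, the group's influence fades.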

Ascertainment of the number of samples in the validation set in Monte Carlo cross validation and the selection of model dimension with Monte Carlo cross validation

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2005.07.004 ◽

2006 ◽

Vol 82 (1-2) ◽

pp. 83-89 ◽

Cited By ~ 15

Author(s):

Yi Ping Du ◽

Sumaporn Kasemsumran ◽

Katsuhiko Maruo ◽

Takehiro Nakagawa ◽

Yukihiro Ozaki

Keyword(s):

Monte Carlo ◽

Cross Validation ◽

Validation Set ◽

Monte Carlo Cross Validation ◽

Model Dimension ◽

Selection Of

Evaluating the accuracy of equivalent-source predictions using cross-validation

10.5194/egusphere-egu2020-15729 ◽

2020 ◽

Author(s):

Leonardo Uieda ◽

Santiago Soler

Keyword(s):

Prediction Accuracy ◽

Cross Validation ◽

Point Sources ◽

Magnetic Data ◽

Random Permutations ◽

Training Set ◽

Equivalent Source ◽

Upward Continuation ◽

Reduction To The Pole ◽

Validation Set

We investigate the use of cross-validation (CV) techniques to estimate the accuracy of equivalent-source (also known as equivalent-layer) models for interpolation and processing of potential-field data. Our preliminary results indicate that some common CV algorithms (e.g., random permutations and k-folds) tend to overestimate the accuracy. We have found that blocked CV methods, where the data are split along spatial blocks instead of randomly, provide more conservative and realistic accuracy estimates. Beyond evaluating an equivalent-source model's performance, cross-validation can be used to automatically determine configuration parameters, like source depth and amount of regularization, that maximize prediction accuracy and avoid over-fitting. Widely used in gravity and magnetic data processing, the equivalent-source technique consists of a linear model (usually point sources) used to predict the observed field at arbitrary locations. Upward-continuation, interpolation, gradient calculations, leveling, and reduction-to-the-pole can be performed simultaneously by using the model to make predictions (i.e., forward modelling). Likewise, the use of linear models to make predictions is the backbone of many machine learning (ML) applications. The predictive performance of ML models is usually evaluated through cross-validation, in which the data are split (usually randomly) into a training set and a validation set. Models are fit on the training set and their predictions are evaluated on the validation set using a goodness-of-fit metric, like the mean square error or the R² coefficient of determination. Many cross-validation methods exist in the literature, varying in how the data are split and how this process is repeated. Prior research from the statistical modelling of ecological data suggests that prediction accuracy is usually overestimated by traditional CV methods when the data are spatially auto-correlated. This issue can be mitigated by splitting the data along spatial blocks rather than randomly. We conducted experiments on synthetic gravity data to investigate the use of traditional and blocked CV methods in equivalent-source interpolation. We found that the overestimation problem also occurs and that more conservative accuracy estimates are obtained when applying blocked versions of random permutations and k-fold. Further studies need to be conducted to generalize these findings to upward-continuation, reduction-to-the-pole, and derivative calculation. Open-source software implementations of the equivalent-source and blocked cross-validation (in progress) methods are available in the Python libraries Harmonica and Verde, which are part of the Fatiando a Terra project (www.fatiando.org).
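
The difference between random and blocked splitting can be sketched in a few lines. The example below is an illustration using only the Python standard library; it does not use the Harmonica or Verde APIs, and the function name, block size, and grid of points are assumptions made for the sketch.

```python
import random
from collections import defaultdict


def blocked_folds(coords, block_size, n_folds, seed=0):
    """Assign points to folds so that each spatial block stays in one fold."""
    blocks = defaultdict(list)
    for index, (x, y) in enumerate(coords):
        # Points falling in the same square cell share a block key.
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append(index)
    keys = sorted(blocks)
    random.Random(seed).shuffle(keys)  # shuffle whole blocks, not single points
    folds = [[] for _ in range(n_folds)]
    for position, key in enumerate(keys):
        folds[position % n_folds].extend(blocks[key])
    return folds


# Hypothetical 6 x 6 grid of observation points.
coords = [(x + 0.5, y + 0.5) for x in range(6) for y in range(6)]
folds = blocked_folds(coords, block_size=2.0, n_folds=3)
```

Each fold is used once as the validation set while the remaining folds form the training set. Because neighbouring points land in the same fold, a validation point is less likely to have a nearly identical neighbour in the training set, which is the mechanism the abstract credits for the more conservative accuracy estimates.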

Drug repositioning by prediction of drug’s anatomical therapeutic chemical code via network-based inference approaches

Briefings in Bioinformatics ◽

10.1093/bib/bbaa027 ◽

2020 ◽

Cited By ~ 4

Author(s):

Yayuan Peng ◽

Manjiong Wang ◽

Yixiang Xu ◽

Zengrui Wu ◽

Jiye Wang ◽

...

Keyword(s):

Cross Validation ◽

Drug Repositioning ◽

Anatomical Therapeutic Chemical ◽

External Validation ◽

Chemical Properties ◽

Glucose Deprivation ◽

Target Drug ◽

Validation Set ◽

Fold Cross Validation

Abstract Drug discovery and development is a time-consuming and costly process. Drug repositioning has therefore become an effective way to address these issues by identifying new therapeutic or pharmacological actions for existing drugs. The anatomical therapeutic chemical (ATC) code of a drug comes from a hierarchical classification system with five levels, organized according to the organs or systems on which drugs act and their pharmacological, therapeutic and chemical properties. The 2nd-, 3rd- and 4th-level ATC codes retain the therapeutic and pharmacological information of drugs. Under the hypothesis that drugs with similar structures or targets possess similar ATC codes, we used a network-based approach to predict the 2nd-, 3rd- and 4th-level ATC codes by constructing substructure drug-ATC (SD-ATC), target drug-ATC (TD-ATC) and Substructure&Target drug-ATC (STD-ATC) networks. In 10-fold cross validation and two external validations, the STD-ATC models outperformed the SD-ATC and TD-ATC ones. Furthermore, with KR as fingerprint, the STD-ATC model was identified as the optimal model, with AUC values of 0.899 ± 0.015, 0.916 and 0.893 for 10-fold cross validation, external validation set 1 and external validation set 2, respectively. To illustrate the predictive capability of the STD-ATC model with KR fingerprint, as a case study, we used that model to predict 25 FDA-approved drugs (22 of which were actually purchased) to have potential activity in heart failure. In vitro experiments confirmed that 8 of the 22 old drugs showed mild to potent cardioprotective activity in both a hypoxia model and an oxygen–glucose deprivation model, which demonstrates that our STD-ATC prediction model could be an effective tool for drug repositioning.

Comparison of Single-Breed and Multi-Breed Training Populations for Infrared Predictions of Novel Phenotypes in Holstein Cows

Animals ◽

10.3390/ani11071993 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1993

Author(s):

Lucio Flavio Macedo Mota ◽

Sara Pegolo ◽

Toshimi Baba ◽

Gota Morota ◽

Francisco Peñagaricano ◽

...

Keyword(s):

Cross Validation ◽

Predictive Ability ◽

Holstein Cows ◽

Reference Scenario ◽

Training Population ◽

Training Set ◽

Dual Purpose ◽

Brown Swiss ◽

Validation Set ◽

Holstein Population

In general, Fourier-transform infrared (FTIR) predictions are developed using a single-breed population split into a training and a validation set. However, using populations formed of different breeds is an attractive way to design cross-validation scenarios aimed at improving prediction for difficult-to-measure traits in the dairy industry. This study aimed to evaluate the potential of FTIR prediction using a training set combining specialized and dual-purpose dairy breeds to predict phenotypes that differ in biological meaning, variability, and heritability, such as body condition score (BCS), serum β-hydroxybutyrate (BHB), and kappa casein (κ-CN), in the major cattle breed, i.e., Holstein-Friesian. Data were obtained from specialized dairy breeds, Holstein (468 cows) and Brown Swiss (657 cows), and dual-purpose breeds, Simmental (157 cows), Alpine Grey (75 cows), and Rendena (104 cows), giving a total of 1461 cows from 41 multi-breed dairy herds. The FTIR prediction model was developed using a gradient boosting machine (GBM), and predictive ability for the target phenotype in Holstein cows was assessed using different cross-validation (CV) strategies: a within-breed scenario using 10-fold cross-validation, for which the Holstein population was randomly split into 10 folds, one for validation and the remaining nine for training (10-fold_HO); an across-breed scenario (BS_HO), where the Brown Swiss cows were used as the training set and the Holstein cows as the validation set; a specialized multi-breed scenario (BS+HO_10-fold), where the entire Brown Swiss and Holstein populations were combined and then split into 10 folds; and a multi-breed scenario (Multi-breed), where the training set comprised specialized (Holstein and Brown Swiss) and dual-purpose (Simmental, Alpine Grey, and Rendena) dairy cows, combined with nine folds of the Holstein cows. Lastly, a Multi-breed CV2 scenario was implemented, assuming the same number of records as the reference scenario and using the same proportions as the multi-breed scenario. Within Holstein, FTIR predictions had a predictive ability of 0.63 for BCS, 0.81 for BHB, and 0.80 for κ-CN. Using a specific breed (Brown Swiss) as the training set for prediction in the Holstein population reduced the prediction accuracy by 10% for BCS, 7% for BHB, and 11% for κ-CN. Notably, the combination of Holstein and Brown Swiss cows in the training set increased the predictive ability of the model by 6%, giving 0.66 for BCS, 0.85 for BHB, and 0.87 for κ-CN. Using multiple specialized and dual-purpose animals in the training set outperformed the 10-fold_HO (standard) approach, with an increase in predictive ability of 8% for BCS, 7% for BHB, and 10% for κ-CN. When the Multi-breed CV2 scenario was implemented, no improvement was observed. Our findings suggest that FTIR prediction of different phenotypes in the Holstein breed can be improved by including different specialized and dual-purpose breeds in the training population. Our study also shows that predictive ability is enhanced when the size of the training population and the phenotypic variability are increased.

Impact of the Choice of Cross-Validation Techniques on the Results of Machine Learning-Based Diagnostic Applications

Healthcare Informatics Research ◽

10.4258/hir.2021.27.3.189 ◽

2021 ◽

Vol 27 (3) ◽

pp. 189-199

Author(s):

Ilias Tougui ◽

Abdelilah Jilbab ◽

Jamal El Mhamdi

Keyword(s):

Machine Learning ◽

Clinical Study ◽

Cross Validation ◽

Learning Technologies ◽

Data Availability ◽

Support Vector ◽

Training Set ◽

The Subject ◽

Validation Set ◽

Diagnostic Applications

Objectives: With advances in data availability and computing capabilities, artificial intelligence and machine learning technologies have evolved rapidly in recent years. Researchers have taken advantage of these developments in healthcare informatics and created reliable tools to predict or classify diseases using machine learning-based algorithms. To correctly quantify the performance of those algorithms, the standard approach is to use cross-validation, where the algorithm is trained on a training set, and its performance is measured on a validation set. Both datasets should be subject-independent to simulate the expected behavior of a clinical study. This study compares two cross-validation strategies, the subject-wise and the record-wise techniques; the subject-wise strategy correctly mimics the process of a clinical study, while the record-wise strategy does not. Methods: We started by creating a dataset of smartphone audio recordings of subjects diagnosed with and without Parkinson’s disease. This dataset was then divided into training and holdout sets using the subject-wise and the record-wise divisions. The training set was used to measure the performance of two classifiers (support vector machine and random forest) to compare six cross-validation techniques that simulated either the subject-wise process or the record-wise process. The holdout set was used to calculate the true error of the classifiers. Results: The record-wise division and the record-wise cross-validation techniques overestimated the performance of the classifiers and underestimated the classification error. Conclusions: In a diagnostic scenario, the subject-wise technique is the proper way of estimating a model’s performance, and record-wise techniques should be avoided.
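
A subject-wise division can be sketched as follows. This is an illustration in plain Python rather than the study's code; the function name, the holdout fraction, and the subject labels are hypothetical.

```python
import random


def subject_wise_split(subject_ids, holdout_fraction=0.25, seed=0):
    """Split record indices so that no subject appears on both sides."""
    subjects = sorted(set(subject_ids))
    random.Random(seed).shuffle(subjects)
    n_holdout = max(1, round(len(subjects) * holdout_fraction))
    holdout_subjects = set(subjects[:n_holdout])
    train = [i for i, s in enumerate(subject_ids) if s not in holdout_subjects]
    holdout = [i for i, s in enumerate(subject_ids) if s in holdout_subjects]
    return train, holdout


# Hypothetical recordings: several records per subject.
subject_ids = ["s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s4", "s4"]
train_idx, holdout_idx = subject_wise_split(subject_ids)
```

A record-wise split would instead shuffle the ten record indices directly, so recordings from the same person could land in both sets. That leakage is the reason the abstract gives for the overestimated performance.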

Optimisation in the regularisation of ill-posed problems

The Journal of the Australian Mathematical Society Series B Applied Mathematics ◽

10.1017/s0334270000005221 ◽

1986 ◽

Vol 28 (1) ◽

pp. 114-133 ◽

Cited By ~ 19

Author(s):

A. R. Davies ◽

R. S. Anderssen

Keyword(s):

Maximum Likelihood ◽

Tikhonov Regularization ◽

Cross Validation ◽

Discrepancy Principle ◽

Asymptotic Estimates ◽

Deconvolution Problem ◽

Ill Posed ◽

The Relationship ◽

Asymptotic Analyses ◽

Selection Of

We survey the role played by optimization in the choice of parameters for Tikhonov regularization of first-kind integral equations. Asymptotic analyses are presented for a selection of practical optimizing methods applied to a model deconvolution problem. These methods include the discrepancy principle, cross-validation and maximum likelihood. The relationship between optimality and regularity is emphasized. New bounds on the constants appearing in asymptotic estimates are presented.

Arginase-1 Is Increased in Hodgkin Lymphoma, Associated to Poor Outcome and Positively Correlated to Semiquantitative PET Parameters

Blood ◽

10.1182/blood.v124.21.4401.4401 ◽

2014 ◽

Vol 124 (21) ◽

pp. 4401-4401

Author(s):

Alessandra Romano ◽

Alessandro Stefano ◽

Cosentino Sebastiano ◽

Giorgio Russo ◽

Nunziatina Laura Parrinello ◽

...

Keyword(s):

Hodgkin Lymphoma ◽

T Cell Activation ◽

Cell Activation ◽

Conflicts Of Interest ◽

Training Set ◽

Low Contrast ◽

Arginase 1 ◽

Response Status ◽

Validation Set

Abstract Background: In Hodgkin Lymphoma (HL), an elevated neutrophil (HL-N) count is a well-known negative prognostic factor, but its biological meaning has not been elucidated. Our previous work showed that HL-N are dysfunctional and can suppress T-cell activation in vitro, as a consequence of an increased amount of Arginase-1 (Arg-1). Aim: To investigate the clinical meaning of the arginase increase in HL by correlating its amount with features at diagnosis, including some semiquantitative parameters of 18-FDG positron emission tomography (PET) acquired with a novel operator-independent algorithm. Material and methods: We prospectively measured soluble Arg-1 (s-Arg-1) in 135 sera obtained from 90 patients with HL, divided into a training set (N=35) and a validation set (N=55), and 20 healthy participants. In the training set, blood was taken at three fixed time points: before, during, and after first-line therapy. In the first ten patients, a correlation between s-Arg-1 and semiquantitative parameters of PET at diagnosis was explored, including the Metabolic Tumor Volume (MTV) and the Total Lesion Glycolysis (TLG). Briefly, our group developed an ad hoc, operator-independent tool: PET images are represented as a graph in which the voxels are the nodes and the edges are defined by a cost function that maps a change in image intensity to edge weights. This approach is an efficient and accurate method to segment lesions in low-contrast images characterized by noise and weak edges, such as metabolic images (Stefano, 2013). Results: s-Arg-1 was increased in HL patients compared to healthy subjects, reduced after therapy in responders and increased in relapsed patients (p<0.0001). s-Arg-1 was positively correlated with the amount of neutrophils and with Arg-1 in N detected by RT-PCR. A cut-off level of 205 ng/mL for Arg-1 was chosen (equal to 2 times the 95th percentile in controls and the ROC value with sensitivity and specificity of at least 80%) to predict response status at 24 months. In the training set, 32% of patients had high s-Arg-1, and 24% had positive PET-2 and were directed to early salvage therapy according to the BEACOPP scheme. A level of 205 ng/mL s-Arg-1 resulted in 83% (95% C.I. 58-96) sensitivity and 81% (95% C.I. 42-96) specificity in predicting response status in the training set (area under curve, AUC, 0.81, p=0.02). In the validation set, baseline levels of s-Arg-1 > 205 ng/mL resulted in 83% (95% C.I. 62-95) sensitivity and 87% (95% C.I. 47-99) specificity in predicting response status. Patients with s-Arg-1 ≥ 205 ng/mL had shorter PFS than patients with s-Arg-1 < 205 ng/mL (although neither group reached the median because of the short follow-up, p=0.005). In the first 10 patients enrolled in the study, semiquantitative parameters of PET at diagnosis were explored: SUVmax was 12.7 (range 5.9-14.2), MTV median was 45.5 (range 8.9-308.7), and TLG mean was 43.7 (range 25.2-2475.3). MTV and TLG, but not SUVmax, were positively correlated with s-Arg-1 (respectively, r=0.68, p=0.003 and r=0.59, p=0.002). Conclusion: s-Arg-1 is a predictor of PFS in HL and it is positively correlated with MTV in PET scans at baseline, calculated with a novel operator-independent tool for imaging analysis. An update will be provided at the conference. Disclosures: No relevant conflicts of interest to declare.
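
How sensitivity and specificity follow from a fixed cut-off can be shown with a short sketch. Only the 205 ng/mL threshold comes from the abstract. The measurements and outcome labels below are invented for illustration, and treating values equal to the cut-off as "high" is an assumption, since the abstract uses both > and ≥.

```python
def sensitivity_specificity(values, is_positive, cutoff):
    """Return (sensitivity, specificity) for the rule `value >= cutoff`."""
    tp = fn = tn = fp = 0
    for value, positive in zip(values, is_positive):
        predicted_positive = value >= cutoff
        if positive and predicted_positive:
            tp += 1
        elif positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical s-Arg-1 levels (ng/mL) and outcomes; True marks the event
# being predicted. These are not patient data from the study.
levels = [310, 250, 205, 180, 150, 120, 230, 90]
event = [True, True, True, True, False, False, False, False]
sens, spec = sensitivity_specificity(levels, event, cutoff=205)
```

In this toy sample three of four events and three of four non-events are classified correctly, so both figures are 0.75. Repeating the calculation over a range of cut-offs yields the points of an ROC curve.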

The Development of Nomograms to Predict Blastulation Rate Following Cycles of In Vitro Fertilization in Patients With Tubal Factor Infertility, Polycystic Ovary Syndrome, or Endometriosis

Frontiers in Endocrinology ◽

10.3389/fendo.2021.751373 ◽

2021 ◽

Vol 12 ◽

Author(s):

Haixia Jin ◽

Xiaoxue Shen ◽

Wenyan Song ◽

Yan Liu ◽

Lin Qi ◽

...

Keyword(s):

Polycystic Ovary Syndrome ◽

Polycystic Ovary ◽

Area Under The Curve ◽

Blastocyst Formation ◽

Training Set ◽

Ovary Syndrome ◽

Vitro Fertilization ◽

Tubal Factor ◽

Validation Set

It is well known that the transfer of embryos at the blastocyst stage is superior to the transfer of embryos at the cleavage stage in many respects. However, the rate of blastocyst formation remains low in clinical practice. To reduce the possibility of wasting embryos and to accurately predict the possibility of blastocyst formation, we constructed nomograms based on a range of clinical characteristics to predict blastocyst formation rates in patients with different types of infertility. We divided patients into three groups based on female etiology: a tubal factor group, a polycystic ovary syndrome group, and an endometriosis group. Multiple logistic regression was used to analyze the relationship between patient characteristics and blastocyst formation. Each group of patients was divided into a training set and a validation set. The training set was used to construct the nomogram, while the validation set was used to test the performance of the model in terms of discrimination and calibration. The area under the curve (AUC) for the three groups indicated that the models performed fairly and that calibration was acceptable in each model.

Inappropriate empirical antibiotic therapy for bloodstream infections based on discordant in-vitro susceptibilities: a retrospective cohort analysis of prevalence, predictors, and mortality risk in US hospitals

The Lancet Infectious Diseases ◽

10.1016/s1473-3099(20)30477-1 ◽

2020 ◽

Author(s):

Sameer S Kadri ◽

Yi Ling Lai ◽

Sarah Warner ◽

Jeffrey R Strich ◽

Ahmed Babiker ◽

...

Keyword(s):

Antibiotic Therapy ◽

Mortality Risk ◽

Retrospective Cohort ◽

Cohort Analysis ◽

Bloodstream Infections ◽

Empirical Antibiotic Therapy ◽

Retrospective Cohort Analysis

Novel Microdilution Method to Assess Double and Triple Antibiotic Combination Therapy In Vitro

International Journal of Microbiology ◽

10.1155/2016/4612021 ◽

2016 ◽

Vol 2016 ◽

pp. 1-10

Author(s):

Mohamed El-Azizi

Keyword(s):

Pseudomonas Aeruginosa ◽

Antibiotic Therapy ◽

Fold Increase ◽

Early Selection ◽

Microdilution Method ◽

Severe Infections ◽

Time Kill ◽

Selection Of ◽

Antibiotic Combination

An in vitro microdilution method was developed to assess double and triple combinations of antibiotics. Five antibiotics including ciprofloxacin, amikacin, ceftazidime, piperacillin, and imipenem were tested against 10 clinical isolates of Pseudomonas aeruginosa. Each isolate was tested against ten double and nine triple combinations of the antibiotics. A 96-well plate was used to test three antibiotics, each one alone and in double and triple combinations against each isolate. The minimum bacteriostatic and bactericidal concentrations in combination were determined with respect to the most potent antibiotic. An Interaction Code (IC) was generated for each combination, where a numerical value was designated based on the 2-fold increase or decrease in the MICs with respect to the most potent antibiotic. The results of the combinations were verified by time-kill assay at constant concentrations of the antibiotics and in a chemostat. Only 13% of the double combinations were synergistic, whereas 5% showed antagonism. Forty-three percent of the triple combinations were synergistic with no antagonism observed, and 100% synergism was observed in combination of ciprofloxacin, amikacin, and ceftazidime. The presented protocol is simple and fast and can help the clinicians in the early selection of the effective antibiotic therapy for treatment of severe infections.
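
The combination counts can be checked by enumeration. The sketch below lists every unordered pair and triple of the five named antibiotics; it is only a counting aid and says nothing about how the plates were actually laid out.

```python
from itertools import combinations

antibiotics = ["ciprofloxacin", "amikacin", "ceftazidime", "piperacillin", "imipenem"]

# All unordered pairs and triples that can be drawn from the five drugs.
doubles = list(combinations(antibiotics, 2))
triples = list(combinations(antibiotics, 3))

for combo in doubles + triples:
    print(" + ".join(combo))
```

Five drugs give ten possible pairs, matching the ten double combinations reported. They also give ten possible triples, whereas the abstract reports nine triple combinations tested; it does not say which triple was left out.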
