Assessment of statistical methods used in library-based approaches to microbial source tracking

Kerry J. Ritter; Ethan Carruthers; C. Andrew Carson; R. D. Ellender; Valerie J. Harwood; Kyle Kingsley; Cindy Nakatsu; Michael Sadowsky; Brian Shear; Brian West; John E. Whitlock; Bruce A. Wiggins; Jayson D. Wilbur

doi:10.2166/wh.2003.0022

Journal of Water and Health, 2003, Vol 1 (4), pp. 209-223. Cited By ~ 36

Keyword(s): Statistical Methods; Large Scale; Final Analysis; Microbial Source Tracking; Source Tracking; Effective Sample Size; Correct Classification; Average Similarity; Maximum Similarity; Threshold Criteria

Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
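As a rough illustration of two of the matching rules named in this abstract, the following Python sketch contrasts maximum similarity with average similarity and applies an optional similarity threshold. It is a minimal sketch, not the study's implementation: the Jaccard coefficient, the `classify` function, its parameters, and the toy band sets are all assumptions chosen for illustration.

```python
from statistics import mean


def jaccard(a, b):
    """Jaccard similarity between two sets of band positions."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify(unknown, library, rule="max", threshold=0.0):
    """Assign `unknown` to a source in `library` (dict: source -> fingerprints).

    rule="max": score each source by its single most similar isolate.
    rule="avg": score each source by its mean similarity to the unknown.
    Returns None when the best score falls below `threshold`,
    i.e. the isolate is excluded from the final analysis.
    """
    scores = {}
    for source, prints in library.items():
        sims = [jaccard(unknown, p) for p in prints]
        scores[source] = max(sims) if rule == "max" else mean(sims)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical toy library: each fingerprint is a set of band positions.
library = {
    "human": [{1, 2, 3}, {1, 2, 4}],
    "cow": [{1, 2, 3, 4}, {7, 8, 9}],
}
unknown = {1, 2, 3, 4}

by_max = classify(unknown, library, rule="max")            # "cow"
by_avg = classify(unknown, library, rule="avg")            # "human"
excluded = classify(unknown, library, rule="avg", threshold=0.8)  # None
```

In this toy case the two rules disagree on the same unknown, and the threshold removes the isolate entirely, which mirrors the trade-off between classification rate and effective sample size described in the abstract.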

A statistical appraisal of disproportional versus proportional microbial source tracking libraries

Journal of Water and Health, 2007, Vol 5 (4), pp. 503-509. Cited By ~ 6

doi:10.2166/wh.2007.044

Author(s): Brian J. Robinson; Kerry J. Ritter; R. D. Ellender

Keyword(s): Nearest Neighbor; Unknown Origin; Microbial Source Tracking; Source Tracking; Correct Prediction; Statistical Algorithm; Average Similarity; Source Category; Matching Criteria; Maximum Similarity

Library-based microbial source tracking (MST) can assist in reducing or eliminating fecal pollution in waters by predicting sources of fecal-associated bacteria. Library-based MST relies on an assembly of genetic or phenotypic “fingerprints” from pollution-indicative bacteria cultivated from known sources to compare with and identify fingerprints of unknown origin. The success of the library-based approach depends on how well each source candidate is represented in the library and which statistical algorithm or matching criterion is used to match unknowns. Because known source libraries are often built based on convenience or cost, some sources may be better represented in the library than others. Depending on the statistical algorithm or matching criterion, predictions may become severely biased toward classifying unknowns into the library's dominant source category. We examined prediction bias for four of the most commonly used statistical matching algorithms in library-based MST when applied to disproportionately represented known source libraries: maximum similarity (MS), average similarity (AS), discriminant analysis (DA), and k-nearest neighbor (k-NN). MS was particularly sensitive to disproportionate source representation. AS and DA were more robust. k-NN offered a compromise between correct prediction and sensitivity to disproportionate libraries, with increased matching success and stability, and should be considered when matching to disproportionately represented libraries.
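The sensitivity of single-best-match rules to a dominant source can be sketched with a small k-NN majority vote. This is only an illustrative sketch under stated assumptions: the Jaccard coefficient, the `knn_classify` function, and the toy library (six cow isolates against three human isolates) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of band positions."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def knn_classify(unknown, library, k=3):
    """Majority vote among the k library isolates most similar to `unknown`.

    `library` is a list of (source, fingerprint) pairs. With k=1 this
    reduces to a maximum-similarity rule.
    """
    ranked = sorted(library, key=lambda item: jaccard(unknown, item[1]),
                    reverse=True)
    votes = Counter(source for source, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical disproportionate library: cow is over-represented, and one
# cow isolate happens to resemble the unknown more than any human isolate.
library = [
    ("human", {1, 2, 3}), ("human", {1, 2, 4}), ("human", {2, 3, 4}),
    ("cow", {1, 2, 3, 4, 9}), ("cow", {7, 8}), ("cow", {8, 9}),
    ("cow", {7, 9}), ("cow", {6, 7}), ("cow", {6, 8}),
]
unknown = {1, 2, 3, 4}

single_best = knn_classify(unknown, library, k=1)  # "cow"
three_votes = knn_classify(unknown, library, k=3)  # "human"
```

The larger a source's share of the library, the more chances it has to supply one close match by coincidence; voting over several neighbors dampens that effect in this toy example.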

Use of Antibiotic Resistance Analysis for Representativeness Testing of Multiwatershed Libraries

Applied and Environmental Microbiology, 2003, Vol 69 (6), pp. 3399-3405. Cited By ~ 76

doi:10.1128/aem.69.6.3399-3405.2003

Author(s): Bruce A. Wiggins; Philip W. Cash; Wes S. Creamer; Scott E. Dart; Preston P. Garcia; ...

Keyword(s): Antibiotic Resistance; Natural Waters; Cross Validation; Microbial Source Tracking; Wild Animal; Source Tracking; Correct Classification; Resistance Analysis; Resistance Patterns; Antibiotic Resistance Analysis

The use of antibiotic resistance analysis (ARA) for microbial source tracking requires the generation of a library of isolates collected from known sources in the watershed. The size and composition of the library are critical in determining whether it represents the diversity of patterns found in the watershed. This study was performed to determine how large an ARA library must be to be representative of the watersheds for which it will be used and to determine whether libraries from different watersheds can be merged to create multiwatershed libraries. Fecal samples from known human, domesticated, and wild animal sources were collected from six Virginia watersheds. From these samples, enterococci were isolated and tested by ARA. Based on cross-validation discriminant analysis, only the largest of the libraries (2,931 isolates) was found to classify nonlibrary isolates as well as library isolates (i.e., was representative). Small libraries tended to have higher average rates of correct classification but were much less able to correctly classify nonlibrary isolates. A merged multiwatershed library (6,587 isolates) was created and was found to be large enough to be representative of the isolates from the contributing watersheds. When isolates that were collected from the contributing watersheds approximately 1 year later were analyzed with the multiwatershed library, they were classified as well as the isolates in the library, suggesting that the resistance patterns are temporally stable for at least 1 year. The ability to obtain a representative, temporally stable library demonstrates that ARA can be used to identify sources of fecal pollution in natural waters.
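The representativeness check described here, comparing how well a library classifies its own isolates with how well it classifies nonlibrary isolates, can be sketched as follows. This is a hedged sketch only: a nearest-centroid classifier is swapped in for discriminant analysis to keep the example dependency-free, and the function names (`fit`, `rcc`, `loo_rcc`) and the 0/1 growth profiles are hypothetical.

```python
from statistics import mean


def fit(isolates):
    """Return one centroid per source from (source, profile) pairs."""
    by_source = {}
    for source, profile in isolates:
        by_source.setdefault(source, []).append(profile)
    return {s: [mean(col) for col in zip(*rows)]
            for s, rows in by_source.items()}


def predict(profile, centroids):
    """Assign a profile to the source with the nearest centroid."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(profile, c))
    return min(centroids, key=lambda s: dist2(centroids[s]))


def rcc(train, test):
    """Rate of correct classification of `test` using a model fit to `train`."""
    model = fit(train)
    return sum(predict(p, model) == s for s, p in test) / len(test)


def loo_rcc(isolates):
    """Leave-one-out cross-validated rate of correct classification."""
    hits = 0
    for i, (source, profile) in enumerate(isolates):
        rest = isolates[:i] + isolates[i + 1:]
        hits += predict(profile, fit(rest)) == source
    return hits / len(isolates)


# Hypothetical ARA profiles: 1 = growth, 0 = no growth on each antibiotic plate.
library = [
    ("human", [1, 1, 0, 0]), ("human", [1, 1, 1, 0]),
    ("cow", [0, 0, 1, 1]), ("cow", [0, 1, 1, 1]),
]
nonlibrary = [("human", [1, 0, 0, 0]), ("cow", [0, 0, 0, 1])]

library_rate = loo_rcc(library)             # cross-validated, library isolates
nonlibrary_rate = rcc(library, nonlibrary)  # isolates never seen by the library
```

Under this reading, a library would be called representative when the two rates are close; a large gap suggests the library is too small to capture the diversity of patterns in the watershed.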

Methods To Increase Fidelity of Repetitive Extragenic Palindromic PCR Fingerprint-Based Bacterial Source Tracking Efforts

Applied and Environmental Microbiology, 2005, Vol 71 (1), pp. 512-518. Cited By ~ 26

doi:10.1128/aem.71.1.512-518.2005

Author(s): Wail M. Hassan; Shiao Y. Wang; Rudolph D. Ellender

Keyword(s): Quality Factor; Source Tracking; Blind Test; E. coli; Bacterial Source Tracking; Correct Assignment; Average Similarity; Maximum Similarity; Repetitive Extragenic Palindromic PCR; Repetitive Extragenic Palindromic

The goal of the study was to determine which similarity coefficient and statistical method produce the highest rate of correct assignment (RCA) in repetitive extragenic palindromic PCR-based bacterial source tracking. In addition, the use of standards for deciding whether to accept or reject source assignments was investigated. The curve-based coefficients, Cosine Coefficient and Pearson's Product Moment Correlation, yielded higher RCAs than the band-based coefficients, Jaccard, Dice, Jeffrey's x, and Ochiai. When enterococcal and Escherichia coli isolates from known sources were used in a blind test, maximum similarity produced consistently higher RCAs than average similarity. We also found that using a similarity value threshold and/or a quality factor threshold (the ratio of the average fingerprint similarity within a source to the average similarity of this source's isolates to an unknown) to decide whether to accept source assignments of unknowns increases the reliability of source assignments. Applying a similarity value threshold improved the overall RCA (ORCA) by 15 to 27% when enterococcal fingerprints were used and 8 to 29% when E. coli fingerprints were used. Applying the quality factor threshold resulted in a 22 to 32% improvement in the ORCA, depending on the fingerprinting technique used. This increase in reliability was, however, achieved at the expense of fewer isolates being assigned a source.
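For readers unfamiliar with the two families of coefficients, the sketch below gives the standard textbook definitions of three band-based coefficients (Jaccard, Dice, Ochiai) and the two curve-based coefficients (cosine, Pearson). It is an illustration only: the function names and toy data are assumptions, Jeffrey's x is omitted, and nothing here reproduces the fingerprinting software used in the study.

```python
import math


def jaccard(a, b):
    """Band-based: shared bands / bands present in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def dice(a, b):
    """Band-based: 2 * shared bands / (bands in a + bands in b)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def ochiai(a, b):
    """Band-based: shared bands / geometric mean of the two band counts."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 1.0


def cosine(x, y):
    """Curve-based: cosine of the angle between two densitometric curves."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0


def pearson(x, y):
    """Curve-based: cosine similarity of the mean-centered curves."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([p - mx for p in x], [q - my for q in y])


# Hypothetical data: band sets for band-based coefficients,
# intensity profiles sampled along the lane for curve-based ones.
a, b = {1, 2, 3, 4}, {3, 4, 5, 6}
x, y, z = [1, 2, 3], [2, 4, 6], [3, 2, 1]

band_scores = (jaccard(a, b), dice(a, b), ochiai(a, b))
curve_scores = (cosine(x, y), pearson(x, y), pearson(x, z))
```

Band-based coefficients see only which bands are present, whereas curve-based coefficients compare whole intensity profiles, so the two families can rank the same pair of fingerprints differently.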

Applicability of DNA based quantitative microbial source tracking (QMST) evaluated on a large scale in the Danube River and its important tributaries

River Systems, 2008, Vol 18 (1-2), pp. 117-125. Cited By ~ 4

doi:10.1127/lr/18/2008/117

Author(s): G. H. Reischer; G. G. Kavka; D. C. Kasper; Ch. Winter; R. L. Mach; ...

Keyword(s): Large Scale; Microbial Source Tracking; Danube River; Source Tracking; The Danube River

0638 - Selecting microbial source tracking markers to identify fecal contamination from agricultural sources in Laguna Lake, Philippines

2018

doi:10.26226/morressier.5b5199bfb1b87b000ecef785

Author(s): Windell L. Rivera, PhD

Keyword(s): Fecal Contamination; Microbial Source Tracking; Source Tracking; Laguna Lake

Investigating Sources of Fecal Contamination in Storm Drain Outfalls: Application of Genotypic Microbial Source Tracking

Proceedings of the Water Environment Federation, 2017, Vol 2017 (6), pp. 4813-4825

doi:10.2175/193864717822156497

Author(s): Darshan Baral; Xu Li; David Admiraal; Bruce Dvorak

Keyword(s): Fecal Contamination; Microbial Source Tracking; Source Tracking; Storm Drain

Microbial Source Tracking in Small Farms: Use of Different Methods for Adenovirus Detection

Water Air & Soil Pollution, 2021, Vol 232 (2)

doi:10.1007/s11270-021-05011-8

Author(s): Meriane Demoliner; Juliana Schons Gularte; Viviane Girardi; Ana Karolina Antunes Eisen; Fernanda Gil de Souza; ...

Keyword(s): Microbial Source Tracking; Source Tracking; Small Farms

Improving Community Health through Microbial Source Tracking (EPA)

Federal Grants & Contracts, 2021, Vol 45 (14), pp. 6-6

doi:10.1002/fgc.31794

Keyword(s): Community Health; Microbial Source Tracking; Source Tracking

Specificity and sensitivity evaluation of novel and existing Bacteroidales and Bifidobacteria-specific PCR assays on feces and sewage samples and their application for microbial source tracking in Ireland

Water Research, 2009, Vol 43 (19), pp. 4980-4988. Cited By ~ 28

doi:10.1016/j.watres.2009.08.050

Author(s): Siobhán Dorai-Raj; Justin O'Grady; Emer Colleran

Keyword(s): Microbial Source Tracking; Source Tracking; Specific PCR; Specificity And Sensitivity; Sensitivity Evaluation; PCR Assays

The Effects of Indicator Organism Type on Phenotypic Characterization of Host-Specificity and the Implications for Microbial Source Tracking

Proceedings of the Water Environment Federation, 2007, Vol 2007 (11), pp. 7063-7071. Cited By ~ 1

doi:10.2175/193864707787223556

Author(s): Deniz Yurtsever; Berat Z. Haznedaroglu; Timur Dunaev; Metin Duran

Keyword(s): Host Specificity; Microbial Source Tracking; Source Tracking; Phenotypic Characterization; Indicator Organism
