Assessment of statistical methods used in library-based approaches to microbial source tracking

2003 ◽  
Vol 1 (4) ◽  
pp. 209-223 ◽  
Author(s):  
Kerry J. Ritter ◽  
Ethan Carruthers ◽  
C. Andrew Carson ◽  
R. D. Ellender ◽  
Valerie J. Harwood ◽  
...  

Several commonly used statistical methods for fingerprint identification in microbial source tracking (MST) were examined to assess the effectiveness of pattern-matching algorithms to correctly identify sources. Although numerous statistical methods have been employed for source identification, no widespread consensus exists as to which is most appropriate. A large-scale comparison of several MST methods, using identical fecal sources, presented a unique opportunity to assess the utility of several popular statistical methods. These included discriminant analysis, nearest neighbour analysis, maximum similarity and average similarity, along with several measures of distance or similarity. Threshold criteria for excluding uncertain or poorly matched isolates from final analysis were also examined for their ability to reduce false positives and increase prediction success. Six independent libraries used in the study were constructed from indicator bacteria isolated from fecal materials of humans, seagulls, cows and dogs. Three of these libraries were constructed using the rep-PCR technique and three relied on antibiotic resistance analysis (ARA). Five of the libraries were constructed using Escherichia coli and one using Enterococcus spp. (ARA). Overall, the outcome of this study suggests a high degree of variability across statistical methods. Despite large differences in correct classification rates among the statistical methods, no single statistical approach emerged as superior. Thresholds failed to consistently increase rates of correct classification and improvement was often associated with substantial effective sample size reduction. Recommendations are provided to aid in selecting appropriate analyses for these types of data.
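The difference between two of the matching rules named in this abstract, maximum similarity and average similarity, can be illustrated with a minimal sketch. The toy library, the band-position fingerprints, and the choice of Jaccard similarity are all assumptions made for illustration; they are not the study's data or its exact procedure.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity between two sets of band positions."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def maximum_similarity(unknown, library, sim=jaccard):
    """Assign the source of the single most similar library isolate."""
    best = max(library, key=lambda entry: sim(unknown, entry[1]))
    return best[0]

def average_similarity(unknown, library, sim=jaccard):
    """Assign the source whose isolates are most similar on average."""
    scores = defaultdict(list)
    for source, fingerprint in library:
        scores[source].append(sim(unknown, fingerprint))
    return max(scores, key=lambda s: sum(scores[s]) / len(scores[s]))

# Hypothetical toy library: (source, set of band positions).
library = [
    ("human", {1, 2, 3, 4}),
    ("human", {7, 8, 9}),
    ("cow", {1, 2, 5}),
    ("cow", {1, 2, 6}),
]
unknown = {1, 2, 3}

ms_call = maximum_similarity(unknown, library)
as_call = average_similarity(unknown, library)
print(ms_call, as_call)
```

In this toy case the two rules disagree: the closest single isolate is human, while the cow isolates are more similar on average. That is one way large differences in classification rates between methods can arise.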

2007 ◽  
Vol 5 (4) ◽  
pp. 503-509 ◽  
Author(s):  
Brian J. Robinson ◽  
Kerry J. Ritter ◽  
R. D. Ellender

Library-based microbial source tracking (MST) can assist in reducing or eliminating fecal pollution in waters by predicting sources of fecal-associated bacteria. Library-based MST relies on an assembly of genetic or phenotypic “fingerprints” from pollution-indicative bacteria cultivated from known sources to compare with and identify fingerprints of unknown origin. The success of the library-based approach depends on how well each source candidate is represented in the library and which statistical algorithm or matching criterion is used to match unknowns. Because known source libraries are often built based on convenience or cost, some sources may be better represented in the library than others. Depending on the statistical algorithm or matching criterion, predictions may become severely biased toward classifying unknowns into the library's dominant source category. We examined prediction bias for four of the most commonly used statistical matching algorithms in library-based MST when applied to disproportionately represented known source libraries: maximum similarity (MS), average similarity (AS), discriminant analysis (DA), and k-nearest neighbor (k-NN). MS was particularly sensitive to disproportionate source representation. AS and DA were more robust. k-NN provided a compromise between correct prediction and sensitivity to disproportionate libraries, including increased matching success and stability, and should be considered when matching against disproportionately represented libraries.
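As a rough illustration of how a nearest-neighbor vote can temper the influence of an over-represented source, the sketch below classifies an unknown by majority vote among its k closest library isolates. The numeric fingerprints, the Euclidean distance, and the library contents are hypothetical and chosen only to make the effect visible; the study's own implementation may differ.

```python
import math
from collections import Counter

def knn_classify(unknown, library, k=3):
    """Majority vote among the k library isolates closest to the unknown.

    Ties are broken in favor of the source encountered first in the
    distance-ranked list, i.e. the one with the nearer isolate.
    """
    ranked = sorted(library, key=lambda entry: math.dist(unknown, entry[1]))
    votes = Counter(source for source, _ in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical library in which "dog" is over-represented (5 vs 3 isolates)
# and one dog isolate happens to sit next to the human cluster.
library = [
    ("dog", (0.0, 0.0)),
    ("dog", (0.1, 0.0)),
    ("dog", (0.0, 0.1)),
    ("dog", (0.2, 0.2)),
    ("dog", (0.9, 0.9)),
    ("human", (1.0, 1.0)),
    ("human", (1.1, 1.0)),
    ("human", (1.0, 1.2)),
]
unknown = (0.95, 0.92)

single_match = knn_classify(unknown, library, k=1)  # behaves like a best-match rule
voted_match = knn_classify(unknown, library, k=3)
print(single_match, voted_match)
```

With k=1 the call follows the single closest isolate, which here comes from the dominant source; with k=3 the two nearby human isolates outvote it.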


2003 ◽  
Vol 69 (6) ◽  
pp. 3399-3405 ◽  
Author(s):  
Bruce A. Wiggins ◽  
Philip W. Cash ◽  
Wes S. Creamer ◽  
Scott E. Dart ◽  
Preston P. Garcia ◽  
...  

ABSTRACT The use of antibiotic resistance analysis (ARA) for microbial source tracking requires the generation of a library of isolates collected from known sources in the watershed. The size and composition of the library are critical in determining if it represents the diversity of patterns found in the watershed. This study was performed to determine how large an ARA library must be to represent the watersheds for which it will be used and to determine if libraries from different watersheds can be merged to create multiwatershed libraries. Fecal samples from known human, domesticated, and wild animal sources were collected from six Virginia watersheds. From these samples, enterococci were isolated and tested by ARA. Based on cross-validation discriminant analysis, only the largest of the libraries (2,931 isolates) was found to classify nonlibrary isolates as well as library isolates (i.e., was representative). Small libraries tended to have higher average rates of correct classification, but were much less able to correctly classify nonlibrary isolates. A merged multiwatershed library (6,587 isolates) was created and was found to be large enough to be representative of the isolates from the contributing watersheds. When isolates that were collected from the contributing watersheds approximately 1 year later were analyzed with the multiwatershed library, they were classified as well as the isolates in the library, suggesting that the resistance patterns are temporally stable for at least 1 year. The ability to obtain a representative, temporally stable library demonstrates that ARA can be used to identify sources of fecal pollution in natural waters.


2005 ◽  
Vol 71 (1) ◽  
pp. 512-518 ◽  
Author(s):  
Wail M. Hassan ◽  
Shiao Y. Wang ◽  
Rudolph D. Ellender

ABSTRACT The goal of the study was to determine which similarity coefficient and statistical method to use to produce the highest rate of correct assignment (RCA) in repetitive extragenic palindromic PCR-based bacterial source tracking. In addition, the use of standards for deciding whether to accept or reject source assignments was investigated. The use of curve-based coefficients Cosine Coefficient and Pearson's Product Moment Correlation yielded higher RCAs than the use of band-based coefficients Jaccard, Dice, Jeffrey's x, and Ochiai. When enterococcal and Escherichia coli isolates from known sources were used in a blind test, the use of maximum similarity produced consistently higher RCAs than the use of average similarity. We also found that the use of a similarity value threshold and/or a quality factor threshold (the ratio of the average fingerprint similarity within a source to the average similarity of this source's isolates to an unknown) to decide whether to accept source assignments of unknowns increases the reliability of source assignments. Applying a similarity value threshold improved the overall RCA (ORCA) by 15 to 27% when enterococcal fingerprints were used and 8 to 29% when E. coli fingerprints were used. Applying the quality factor threshold resulted in a 22 to 32% improvement in the ORCA, depending on the fingerprinting technique used. This increase in reliability was, however, achieved at the expense of decreased numbers of isolates that were assigned a source.
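The distinction between band-based and curve-based coefficients, and the effect of a similarity value threshold, can be sketched as follows. This is a simplified illustration under stated assumptions: band-based fingerprints are modeled as sets of band positions, curve-based fingerprints as equal-length lists of intensities, and `assign_with_threshold` is a hypothetical helper. The quality factor threshold from the abstract is not implemented here.

```python
import math

def dice(a, b):
    """Band-based Dice coefficient on two sets of band positions."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0

def cosine(x, y):
    """Curve-based cosine coefficient on two equal-length intensity curves."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

def pearson(x, y):
    """Pearson correlation, computed as the cosine of mean-centered curves."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([p - mx for p in x], [q - my for q in y])

def assign_with_threshold(unknown, library, sim, threshold):
    """Return the best-matching source, or None if the match is too weak."""
    source, score = max(
        ((s, sim(unknown, f)) for s, f in library),
        key=lambda pair: pair[1],
    )
    return source if score >= threshold else None

# Hypothetical two-isolate library of band-position sets.
library = [("human", {1, 2, 3}), ("cow", {4, 5, 6})]
unknown = {1, 2, 7}

accepted = assign_with_threshold(unknown, library, dice, threshold=0.5)
rejected = assign_with_threshold(unknown, library, dice, threshold=0.8)
print(accepted, rejected)
```

Raising the threshold leaves the unknown unassigned, which mirrors the trade-off described in the abstract: more reliable assignments at the cost of fewer isolates being assigned a source.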


River Systems ◽  
2008 ◽  
Vol 18 (1-2) ◽  
pp. 117-125 ◽  
Author(s):  
G. H Reischer ◽  
G. G Kavka ◽  
D. C. Kasper ◽  
Ch Winter ◽  
R. L Mach ◽  
...  

2021 ◽  
Vol 232 (2) ◽  
Author(s):  
Meriane Demoliner ◽  
Juliana Schons Gularte ◽  
Viviane Girardi ◽  
Ana Karolina Antunes Eisen ◽  
Fernanda Gil de Souza ◽  
...  
