The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data
In this study we developed a genome-based method for detecting Staphylococcus aureus subtypes from metagenome shotgun sequence data. We used a binomial mixture model and the coverage counts at >100,000 known S. aureus SNP (single nucleotide polymorphism) sites derived from prior comparative genomic analysis to estimate the proportion of 40 subtypes in metagenome samples. We were able to obtain >87% sensitivity and >94% specificity at 0.025X coverage for S. aureus. We found that 321 and 149 metagenome samples from the Human Microbiome Project (HMP) and the metaSUB analysis of the New York City (NYC) subway, respectively, contained S. aureus at genome coverage >0.025X. In both projects, CC8 and CC30 were the most common S. aureus clonal complexes encountered. We found evidence that the subtype compositions at different body sites of the same individual were more similar than expected under random sampling, and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage often associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are possibly associated with S. aureus carriage, but there was limited power to identify factors linked to carriage of even the most common subtype. In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering, but other unknown factors influence the taxonomic distribution of the species around the city.
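To make the idea of estimating subtype proportions from allele counts at known SNP sites concrete, here is a minimal sketch of one way such a mixture could be fitted with expectation-maximization. It is not the authors' implementation. The function names, the 0/1 genotype encoding, the fixed per-read `error_rate`, and the toy simulation settings are all assumptions made for illustration. Each read at a site is treated as coming from one latent subtype, so the alternate-allele count at a site is binomial with a success probability that depends on the mixture proportions.

```python
import random


def estimate_subtype_proportions(alt_counts, depths, genotypes,
                                 error_rate=0.01, n_iter=100):
    """EM estimate of subtype proportions (illustrative sketch only).

    alt_counts[i]   reads showing the alternate allele at SNP site i
    depths[i]       total reads covering site i
    genotypes[i][j] 1 if hypothetical subtype j carries the alternate
                    allele at site i, else 0
    error_rate      assumed per-read miscall probability (0 < rate < 1)
    """
    n_subtypes = len(genotypes[0])
    pi = [1.0 / n_subtypes] * n_subtypes
    # Sites with zero depth carry no information, so drop them up front.
    covered = [(k, n, g) for k, n, g in zip(alt_counts, depths, genotypes)
               if n > 0]
    total_reads = sum(n for _, n, _ in covered)
    if total_reads == 0:
        return pi
    for _ in range(n_iter):
        new_pi = [0.0] * n_subtypes
        for k, n, g in covered:
            # P(read shows alternate allele | read came from subtype j)
            p_alt = [1.0 - error_rate if a else error_rate for a in g]
            w_alt = [pi[j] * p_alt[j] for j in range(n_subtypes)]
            w_ref = [pi[j] * (1.0 - p_alt[j]) for j in range(n_subtypes)]
            s_alt, s_ref = sum(w_alt), sum(w_ref)
            # E-step: share each read among subtypes by posterior weight.
            for j in range(n_subtypes):
                if k > 0:
                    new_pi[j] += k * w_alt[j] / s_alt
                if n - k > 0:
                    new_pi[j] += (n - k) * w_ref[j] / s_ref
        # M-step: proportions are the expected fraction of reads per subtype.
        pi = [x / total_reads for x in new_pi]
    return pi


def simulate_counts(genotypes, true_pi, cover_prob, error_rate, rng):
    """Toy low-coverage simulation: each site gets at most one read."""
    alt_counts, depths = [], []
    subtypes = list(range(len(true_pi)))
    for g in genotypes:
        if rng.random() < cover_prob:
            j = rng.choices(subtypes, weights=true_pi)[0]
            allele = g[j]
            if rng.random() < error_rate:
                allele = 1 - allele  # simulated sequencing error
            alt_counts.append(allele)
            depths.append(1)
        else:
            alt_counts.append(0)
            depths.append(0)
    return alt_counts, depths


if __name__ == "__main__":
    rng = random.Random(0)
    # Three hypothetical subtypes with random genotypes at 20,000 sites.
    genotypes = [[rng.randint(0, 1) for _ in range(3)] for _ in range(20000)]
    alt, dep = simulate_counts(genotypes, [0.7, 0.3, 0.0], 0.1, 0.01, rng)
    print(estimate_subtype_proportions(alt, dep, genotypes))
```

Because most sites are uncovered at very low coverage, the sketch pools evidence across all covered sites rather than calling a genotype at any single site. A real panel of 40 subtypes and more than 100,000 sites would also need to handle subtypes with highly similar genotype profiles, which this toy example does not address.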