UNCROSS2: identification of cross-talk in 16S rRNA OTU tables
Abstract
Next-generation amplicon sequencing is widely used for surveying biological diversity in applications such as microbial metagenomics, immune system repertoire analysis and targeted tumor sequencing of cancer-associated genes. In such studies, assignment of reads to incorrect samples (cross-talk) is a well-documented problem that is rarely considered in practice. Here, I describe UNCROSS2, an algorithm designed to detect and filter cross-talk in OTU tables generated by next-generation sequencing of the 16S ribosomal RNA gene. On eight published datasets, cross-talk rates are estimated to range from 0.4% to 1.5% of reads mis-assigned. On a mock community test, UNCROSS2 identifies spurious counts due to cross-talk with sensitivity ∼80% to 90% and error rate from ∼1% to ∼20%, but it is not clear whether the accuracy of the algorithm is sufficient to decisively improve diversity estimates in practice.