New synthetic-diploid benchmark for accurate variant calling evaluation

2017 ◽  
Author(s):  
Heng Li ◽  
Jonathan M Bloom ◽  
Yossi Farjoun ◽  
Mark Fleharty ◽  
Laura Gauthier ◽  
...  

Constructed from the consensus of multiple variant callers based on short-read data, existing benchmark datasets for evaluating variant calling accuracy are biased toward easy regions accessible by known algorithms. We derived a new benchmark dataset from the de novo PacBio assemblies of two human cell lines that are homozygous across the whole genome. This benchmark provides a more accurate and less biased estimate of the error rate of small variant calls in a realistic context.

2021 ◽  
Author(s):  
Ying Chen ◽  
Nadia M. Davidson ◽  
Yuk Kei Wan ◽  
Harshil Patel ◽  
Fei Yao ◽  
...  

The human genome contains more than 200,000 gene isoforms. However, different isoforms can be highly similar and, with an average length of 1.5 kb, remain difficult to study with short-read sequencing. To systematically evaluate the ability to study the transcriptome at the resolution of individual isoforms, we profiled 5 human cell lines with short-read cDNA sequencing and with Nanopore long-read direct RNA, amplification-free direct cDNA, and PCR-cDNA sequencing. The long-read protocols showed a high level of consistency, with amplification-free RNA and cDNA sequencing being most similar. While short and long reads generated comparable gene expression estimates, they differed substantially for individual isoforms. We find that increased read length improves read-to-transcript assignment, identifies interactions between alternative promoters and splicing, enables the discovery of novel transcripts from repetitive regions, facilitates the quantification of full-length fusion isoforms, and enables the simultaneous profiling of m6A RNA modifications when RNA is sequenced directly. Our study demonstrates the advantage of long-read RNA sequencing and provides a comprehensive resource that will enable the development and benchmarking of computational methods for profiling complex transcriptional events at isoform-level resolution.


Genes ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 486 ◽  
Author(s):  
Adam Ameur ◽  
Huiwen Che ◽  
Marcel Martin ◽  
Ignas Bunikis ◽  
Johan Dahlberg ◽  
...  

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.


2019 ◽  
Vol 58 ◽  
pp. 26-34 ◽  
Author(s):  
I.O. Suchkova ◽  
L.K. Sasina ◽  
N.I. Dergacheva ◽  
G.A. Sofronov ◽  
E.L. Patkin

2004 ◽  
Vol 10 (5-6) ◽  
pp. 226-228
Author(s):  
L.M. Nosach ◽  
O.Yu. Povnitsa ◽  
V.L. Zhovnovata ◽  
...  

2021 ◽  
Vol 570 ◽  
pp. 206-213
Author(s):  
Ryohei Saito ◽  
Hiromasa Satoh ◽  
Kayo Aoba ◽  
Hajime Hirasawa ◽  
Naofumi Miwa
