BCFtools/csq: Haplotype-aware variant consequences

2016 ◽  
Author(s):  
Petr Danecek ◽  
Shane A. McCarthy

Abstract Motivation: Prediction of functional variant consequences is an important part of sequencing pipelines, allowing the categorization and prioritization of genetic variants for follow-up analysis. However, current predictors analyze variants as isolated events, which can lead to incorrect predictions when adjacent variants alter the same codon, or when a frame-shifting indel is followed by a frame-restoring indel. Exploiting known haplotype information when making consequence predictions can resolve these issues. Results: BCFtools/csq is a fast program for haplotype-aware consequence calling which can take into account known phase. Consequence predictions are changed for 501 of 5019 compound variants found in the 81.7M variants in the 1000 Genomes Project data, with an average of 139 compound variants per haplotype. Predictions match existing tools when run in localized mode, but the program is an order of magnitude faster and requires an order of magnitude less memory. Availability: The program is freely available for commercial and non-commercial use in the BCFtools package, which is available for download from http://samtools.github.io/bcftools Contact: [email protected]
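
The codon problem described in the abstract above can be illustrated with a minimal sketch. This is not the BCFtools/csq implementation; the function names (`apply_snvs`, `consequence`, `isolated_calls`, `haplotype_call`) and the example codon are hypothetical, chosen only for illustration. It assumes the standard genetic code and two phased SNVs that fall in the same codon on the same haplotype.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def apply_snvs(codon, snvs):
    """Return the codon after applying each (offset, alt_base) substitution."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


def consequence(ref_codon, alt_codon):
    """Classify a codon change using a simplified set of consequence labels."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def isolated_calls(ref_codon, snvs):
    """Predict each SNV on its own, ignoring the other variants (hypothetical helper)."""
    return [consequence(ref_codon, apply_snvs(ref_codon, [snv])) for snv in snvs]


def haplotype_call(ref_codon, snvs):
    """Predict the joint effect of all SNVs carried on one haplotype (hypothetical helper)."""
    return consequence(ref_codon, apply_snvs(ref_codon, snvs))


# Assumed example: reference codon CAG (Gln) with two SNVs on the same haplotype.
snvs = [(0, "T"), (1, "G")]
print(isolated_calls("CAG", snvs))   # each variant judged alone
print(haplotype_call("CAG", snvs))   # both variants judged together
```

In this assumed example the first SNV alone turns CAG into the stop codon TAG, so an isolated predictor reports a stop gain, whereas the two SNVs together give TGG (Trp), a missense change. This is the kind of discrepancy that known phase resolves.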

Author(s):  
Jouni Sirén ◽  
Erik Garrison ◽  
Adam M Novak ◽  
Benedict Paten ◽  
Richard Durbin

Abstract Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are non-biological, unlikely recombinations of true haplotypes. Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform. We demonstrate the scalability of the new implementation by building a whole-genome index of the 5008 haplotypes of the 1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Availability and implementation: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and https://github.com/jltsiren/gcsa2. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Peter Pfaffelhuber ◽  
Elisabeth Sester-Huss ◽  
Franz Baumdicker ◽  
Jana Naue ◽  
Sabine Lutz-Bonengel ◽  
...  

Abstract The inference of biogeographic ancestry (BGA) has become a focus of forensic genetics. Mis-inference of BGA can have profound unwanted consequences for investigations and society. We show that recent admixture can lead to misclassification and erroneous inference of ancestry proportions, using state-of-the-art analysis tools with (i) simulations, (ii) 1000 Genomes Project data, and (iii) two individuals analyzed using the ForenSeq DNA Signature Prep Kit. Subsequently, we extend existing tools for estimation of individual ancestry (IA) by allowing for different IA in both parents, leading to estimates of parental individual ancestry (PIA), and a statistical test for recent admixture. Estimation of PIA outperforms IA in most scenarios of recent admixture. Furthermore, additional information about parental ancestry can be acquired with PIA that may guide casework.


2012 ◽  
Vol 9 (5) ◽  
pp. 459-462 ◽  
Author(s):  
Laura Clarke ◽  
◽  
Xiangqun Zheng-Bradley ◽  
Richard Smith ◽  
Eugene Kulesha ◽  
...  

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
A. M. Bea ◽  
E. Franco-Marín ◽  
V. Marco-Benedí ◽  
E. Jarauta ◽  
I. Gracia-Rubio ◽  
...  

Abstract Angiopoietin-like 3 (ANGPTL3) plays an important role in lipid metabolism in humans. Loss-of-function variants in ANGPTL3 cause a monogenic disease named familial combined hypolipidemia. However, the potential contribution of the ANGPTL3 gene in subjects with familial combined hyperlipidemia (FCHL) has not been studied. For that reason, the aim of this work was to investigate the potential contribution of ANGPTL3 in the aetiology of FCHL by identifying gain-of-function (GOF) genetic variants in the ANGPTL3 gene in FCHL subjects. The ANGPTL3 gene was sequenced in 162 unrelated subjects with severe FCHL and 165 normolipemic controls. Pathogenicity of genetic variants was predicted with PredictSNP2 and FruitFly. The frequency of identified variants in FCHL was compared with that of normolipemic controls and that described in the 1000 Genomes Project. No GOF mutations in ANGPTL3 were present in subjects with FCHL. Four variants were identified in FCHL subjects, showing a different frequency from that observed in normolipemic controls: c.607-109T>C, c.607-47_607-46delGT, c.835+41C>A and c.*52_*60del. This last variant, c.*52_*60del, is a microRNA-associated sequence in the 3′UTR of ANGPTL3, and it was present 2.7 times more frequently in normolipemic controls than in FCHL subjects. Our research shows that no GOF mutations in ANGPTL3 were found in a large group of unrelated subjects with FCHL.


2019 ◽  
Vol 4 ◽  
pp. 50 ◽  
Author(s):  
Ernesto Lowy-Gallego ◽  
Susan Fairley ◽  
Xiangqun Zheng-Bradley ◽  
Magali Ruffier ◽  
Laura Clarke ◽  
...  

We present a set of biallelic SNVs and INDELs, from 2,548 samples spanning 26 populations from the 1000 Genomes Project, called de novo on GRCh38. We believe this will be a useful reference resource for those using GRCh38. It represents an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date by encompassing all of the GRCh38 primary assembly autosomes and pseudo-autosomal regions, including novel, medically relevant loci. Here, we describe how the data set was created and benchmark our call set against that produced by the final phase of the 1000 Genomes Project on GRCh37 and the lift-over of that data to GRCh38.


2019 ◽  
Vol 4 ◽  
pp. 50 ◽  
Author(s):  
Ernesto Lowy-Gallego ◽  
Susan Fairley ◽  
Xiangqun Zheng-Bradley ◽  
Magali Ruffier ◽  
Laura Clarke ◽  
...  

We present biallelic SNVs called from 2,548 samples across 26 populations from the 1000 Genomes Project, called directly on GRCh38. We believe this will be a useful reference resource for those using GRCh38, representing an improvement over the “lift-overs” of the 1000 Genomes Project data that have been available to date and providing a resource necessary for the full adoption of GRCh38 by the community. Here, we describe how the call set was created and provide benchmarking data describing how our call set compares to that produced by the final phase of the 1000 Genomes Project on GRCh37.


2019 ◽  
Author(s):  
Jouni Sirén ◽  
Erik Garrison ◽  
Adam M. Novak ◽  
Benedict Paten ◽  
Richard Durbin

Abstract Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes. Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform (GBWT). We demonstrate the scalability of the new implementation by building a whole-genome index of the 5,008 haplotypes of the 1000 Genomes Project, and an index of all 108,070 TOPMed Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Availability: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt, and https://github.com/jltsiren/gcsa2. Contact: [email protected] Supplementary information: Supplementary data are available.


PLoS ONE ◽  
2014 ◽  
Vol 9 (1) ◽  
pp. e85899 ◽  
Author(s):  
Giuseppe Indolfi ◽  
Giusi Mangone ◽  
Elisa Bartolini ◽  
Gabriella Nebbia ◽  
Pier Luigi Calvo ◽  
...  
