BarleyVarDB: a database of barley genomic variation

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Cong Tan ◽  
Brett Chapman ◽  
Penghao Wang ◽  
Qisen Zhang ◽  
Gaofeng Zhou ◽  
...  

Abstract Barley (Hordeum vulgare L.) is one of the first domesticated grain crops and represents the fourth most important cereal source for human and animal consumption. BarleyVarDB is a database of barley genomic variation. It is publicly accessible through the website at http://146.118.64.11/BarleyVar. This database mainly provides three sets of information. First, BarleyVarDB includes 57 754 224 single nucleotide polymorphisms (SNPs) and 3 600 663 insertions or deletions (InDels), which were identified from high-coverage whole genome sequencing of 21 barley germplasm accessions, including 8 wild barley accessions from 3 barley evolutionary original centers and 13 barley landraces from different continents. Second, it uses the latest publicly accessible barley reference genome and its annotation, produced by the International Barley Genome Sequencing Consortium (IBSC). Third, the database also includes 522 212 genome-wide microsatellites/simple sequence repeats (SSRs), which were identified in the reference barley pseudo-molecular genome sequence. Additionally, several useful web-based applications are provided, including JBrowse, BLAST and Primer3. Users can design PCR primers to assess polymorphic variants deposited in this database and use a user-friendly interface for accessing the barley reference genome. We envisage that BarleyVarDB will benefit the barley genetic research community by providing access to all publicly available barley genomic variation information and the barley reference genome, as well as an ultra-high density of SNP and InDel markers for molecular breeding and identification of functional genes with important agronomic traits in barley. Database URL: http://146.118.64.11/BarleyVar
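As an illustration of what identifying SSRs in a reference sequence involves, the following is a minimal sketch of a perfect-repeat scan. The abstract does not say which tool or thresholds were used; the function names and the minimum repeat counts below are hypothetical choices made only for this example.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp).
# These thresholds are an assumption for illustration, not taken from the abstract.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = sequence.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-bp motif followed by at least n-1 further exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence with three embedded repeats (made-up data).
example = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "TTT" + "CAG" * 5 + "T"
ssrs = find_ssrs(example)
```

This sketch reports only perfect repeats and ignores compound or interrupted microsatellites, which dedicated SSR tools usually handle.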

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xiaoting Xia ◽  
Shunjin Zhang ◽  
Huaju Zhang ◽  
Zijing Zhang ◽  
Ningbo Chen ◽  
...  

Abstract Background Native cattle breeds are an important source of genetic variation because they might carry alleles that enable them to adapt to the local environment and tough feeding conditions. Jiaxian Red, a Chinese native cattle breed, is reported to have originated from crossbreeding between taurine and indicine cattle; their history as a draft and meat animal dates back at least 30 years. Using whole-genome sequencing (WGS) data of 30 animals from the core breeding farm, we investigated the genetic diversity, population structure and genomic regions under selection of Jiaxian Red cattle. Furthermore, we used 131 published genomes of world-wide cattle to characterize the genomic variation of Jiaxian Red cattle. Results The population structure analysis revealed that Jiaxian Red cattle harboured ancestry from East Asian taurine (0.493), Chinese indicine (0.379), European taurine (0.095) and Indian indicine (0.033). Three methods (nucleotide diversity, linkage disequilibrium decay and runs of homozygosity) implied relatively high genomic diversity in Jiaxian Red cattle. We used θπ, CLR, FST and XP-EHH methods to look for candidate signatures of positive selection in Jiaxian Red cattle. A total of 171 (θπ and CLR) and 17 (FST and XP-EHH) shared genes were identified using different detection strategies. Functional annotation analysis revealed that these genes are potentially responsible for growth and feed efficiency (CCSER1), meat quality traits (ROCK2, PPP1R12A, CYB5R4, EYA3, PHACTR1), fertility (RFX4, SRD5A2) and immune system response (SLAMF1, CD84 and SLAMF6). Conclusion We provide a comprehensive overview of sequence variations in Jiaxian Red cattle genomes. Selection signatures were detected in genomic regions that are possibly related to economically important traits in Jiaxian Red cattle. We observed a high level of genomic diversity and low inbreeding in Jiaxian Red cattle. These results provide a basis for further resource protection and breeding improvement of this breed.


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Agata Stodolna ◽  
Miao He ◽  
Mahesh Vasipalli ◽  
Zoya Kingsbury ◽  
Jennifer Becq ◽  
...  

Abstract Background Clinical-grade whole-genome sequencing (cWGS) has the potential to become the standard of care within the clinic because of its breadth of coverage and lack of bias towards certain regions of the genome. Colorectal cancer presents a difficult treatment paradigm, with over 40% of patients presenting at diagnosis with metastatic disease. We hypothesised that cWGS coupled with 3′ transcriptome analysis would give new insights into colorectal cancer. Methods Patients underwent PCR-free whole-genome sequencing and alignment and variant calling using a standardised pipeline to output SNVs, indels, SVs and CNAs. Additional insights into the mutational signatures and tumour biology were gained by the use of 3′ RNA-seq. Results Fifty-four patients were studied in total. Driver analysis identified the Wnt pathway gene APC as the only consistently mutated driver in colorectal cancer. Alterations in the PI3K/mTOR pathways were seen as previously observed in CRC. Multiple private CNAs, SVs and gene fusions were unique to individual tumours. Approximately 30% of patients had a tumour mutational burden of > 10 mutations/Mb of DNA, suggesting suitability for immunotherapy. Conclusions Clinical whole-genome sequencing offers a potential avenue for the identification of private genomic variation that may confer sensitivity to targeted agents and offer patients new options for targeted therapies.
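The tumour mutational burden threshold quoted in the Results can be made concrete with a small calculation. This is a sketch under stated assumptions: the abstract gives only the cut-off (> 10 mutations/Mb), so the function names, the choice of denominator (callable bases) and the example numbers are hypothetical.

```python
def tumour_mutational_burden(mutation_count, callable_bases):
    """Mutations per megabase, given a mutation count and the number of callable bases."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return mutation_count / (callable_bases / 1_000_000)


def is_high_tmb(tmb, threshold=10.0):
    """Apply the > 10 mutations/Mb cut-off mentioned in the abstract."""
    return tmb > threshold


# Made-up example: 45,000 mutations over 3,000 Mb of callable sequence.
tmb = tumour_mutational_burden(45_000, 3_000_000_000)
flag = is_high_tmb(tmb)
```

Which variant classes are counted in the numerator is not stated in the abstract, so a real pipeline would need to fix that definition before comparing values across studies.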


2020 ◽  
Author(s):  
Agata Stodolna ◽  
Miao He ◽  
Mahesh Vasipalli ◽  
Zoya Kingsbury ◽  
Jennifer Becq ◽  
...  

Abstract Introduction Clinical grade whole genome sequencing (cWGS) has the potential to become standard of care within the clinic because of its breadth of coverage and lack of bias towards certain regions of the genome. Colorectal cancer presents a difficult treatment paradigm, with over 40% of patients presenting at diagnosis with metastatic disease. We hypothesised that cWGS coupled with 3’ transcriptome analysis would give new insights into colorectal cancer. Methods Patients underwent PCR-free whole genome sequencing and alignment and variant calling using a standardised pipeline to output SNVs, indels, SVs and CNAs. Additional insights into mutational signatures and tumour biology were gained by the use of 3’ RNAseq. Results Fifty-four patients were studied in total. Driver analysis identified the Wnt pathway gene APC as the only consistently mutated driver in colorectal cancer. Alterations in the PI3K/mTOR pathways were seen as previously observed in CRC. Multiple private CNAs, SVs and gene fusions were unique to individual tumours. Approximately 20% of patients had a tumour mutational burden of >10 mutations/Mb of DNA, suggesting suitability for immunotherapy. Conclusions Clinical whole genome sequencing offers a potential avenue for identification of private genomic variation that may confer sensitivity to targeted agents and offer patients new options for targeted therapies.


Author(s):  
Marta Byrska-Bishop ◽  
Uday S. Evani ◽  
Xuefang Zhao ◽  
Anna O. Basile ◽  
Haley J. Abel ◽  
...  

ABSTRACT The 1000 Genomes Project (1kGP), launched in 2008, is the largest fully open resource of whole genome sequencing (WGS) data consented for public distribution of raw sequence data without access or use restrictions. The final (phase 3) 2015 release of 1kGP included 2,504 unrelated samples from 26 populations, representing five continental regions of the world and was based on a combination of technologies including low coverage WGS (mean depth 7.4X), high coverage whole exome sequencing (mean depth 65.7X), and microarray genotyping. Here, we present a new, high coverage WGS resource encompassing the original 2,504 1kGP samples, as well as an additional 698 related samples that result in 602 complete trios in the 1kGP cohort. We sequenced this expanded 1kGP cohort of 3,202 samples to a targeted depth of 30X using Illumina NovaSeq 6000 instruments. We performed SNV/INDEL calling against the GRCh38 reference using GATK’s HaplotypeCaller, and generated a comprehensive set of SVs by integrating multiple analytic methods through a sophisticated machine learning model, upgrading the 1kGP dataset to current state-of-the-art standards. Using this strategy, we defined over 111 million SNVs, 14 million INDELs, and ∼170 thousand SVs across the entire cohort of 3,202 samples with estimated false discovery rate (FDR) of 0.3%, 1.0%, and 1.8%, respectively. By comparison to the low-coverage phase 3 callset, we observed substantial improvements in variant discovery and estimated FDR that were facilitated by high coverage re-sequencing and expansion of the cohort. Specifically, we called 7% more SNVs, 59% more INDELs, and 170% more SVs per genome than the phase 3 callset. Moreover, we leveraged the presence of families in the cohort to achieve superior haplotype phasing accuracy and we demonstrate improvements that the high coverage panel brings especially for INDEL imputation. We make all the data generated as part of this project publicly available and we envision this updated version of the 1kGP callset to become the new de facto public resource for the worldwide scientific community working on genomics and genetics.


2021 ◽  
Author(s):  
Daniel DiCorpo ◽  
Sheila M Gaynor ◽  
Emily M Russell ◽  
Kenneth E Westerman ◽  
Laura M Raffield ◽  
...  

ABSTRACT The genetic determinants of fasting glucose (FG) and fasting insulin (FI) have been studied mostly through genome and exome arrays, resulting in over 100 associated variants. We extended this work with a high-coverage whole genome sequencing (WGS) analysis from fifteen cohorts in the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. More than 23,000 non-diabetic individuals from five self-reported race/ethnicities (African, Asian, European, Hispanic and Samoan) were included for each trait. We analyzed 60M variants in race/ethnicity-specific and pooled single variant and rare variant aggregate tests. Twenty-two variants across sixteen gene regions were found significantly associated with FG or FI, eight of which were rare (Minor Allele Frequency, MAF<0.05). Functional annotations from resources including the Diabetes Epigenome Atlas were compiled for each signal (chromatin states, annotation principal components, and others) to elucidate variant-to-function hypotheses. Near the G6PC2 locus we identified a distinct FG signal at rare variant rs2232326 (MAF=0.01) after conditioning on known common variants. Functional annotations show rs2232326 to be disruptive and likely damaging while being weakly transcribed in islets. A pair of FG-associated variants was identified near the SLC30A8 locus. These variants, one of which was rare (MAF=0.001) and Asian race/ethnicity-specific, were shown to be in islet-specific active enhancer regions. Other associated regions include rare variants near ROBO1 and PTPRT, and common variants near MTNR1B, GCK, GCKR, FOXA2, APOB, TCF7L2, and ADCY5. We provide a catalog of nucleotide-resolution genomic variation spanning intergenic and intronic regions down to a minor allele count of 20, creating a foundation for future sequencing-based investigation of glycemic traits.
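To relate the minor allele count floor of 20 to the MAF threshold used for "rare" variants, here is a minimal sketch of the arithmetic. It assumes diploid autosomal genotypes and uses a round sample size of 23,000 (the abstract says only "more than 23,000"); the function names are hypothetical.

```python
def minor_allele_stats(alt_allele_count, n_samples):
    """Return (minor allele count, minor allele frequency) for diploid samples."""
    total_alleles = 2 * n_samples
    if not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("allele count out of range")
    # The minor allele is whichever of the two alleles is less frequent.
    mac = min(alt_allele_count, total_alleles - alt_allele_count)
    return mac, mac / total_alleles


def is_rare(maf, threshold=0.05):
    """Rare-variant definition used in the abstract (MAF < 0.05)."""
    return maf < threshold


# Assumed round cohort size; the true per-trait sample sizes are not given here.
mac, maf = minor_allele_stats(20, 23_000)
rare = is_rare(maf)
```

Under these assumptions, a minor allele count of 20 corresponds to a MAF of 20/46,000, roughly 0.0004, far below the 0.05 cut-off.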


2018 ◽  
Vol 59 ◽  
pp. 1-6 ◽  
Author(s):  
Mohamed M.H. Abdelbary ◽  
Laurence Senn ◽  
Estelle Moulin ◽  
Guy Prod'hom ◽  
Antony Croxatto ◽  
...  

2018 ◽  
Author(s):  
Shweta Ramdas ◽  
Ayse Bilge Ozel ◽  
Mary K. Treutelaar ◽  
Katie Holl ◽  
Myrna Mandel ◽  
...  

Abstract We performed whole-genome sequencing for eight inbred rat strains commonly used in genetic mapping studies. They are the founders of the NIH heterogeneous stock (HS) outbred colony. We provide their sequences and variant calls to the rat genomics community. When analyzing the variant calls we identified regions with unusually high levels of heterozygosity. These regions are consistent across the eight inbred strains, including Brown Norway, which is the basis of the rat reference genome. These regions show higher read depths than other regions in the genome and contain higher rates of apparent tri-allelic variant sites. The evidence suggests that these regions may correspond to duplicated segments that were incorrectly overlaid as a single segment in the reference genome. We provide masks for these regions of suspected mis-assembly as a resource for the community to flag potentially false interpretations of mapping or functional results.


2019 ◽  
Author(s):  
Pooja Bangar ◽  
Neetu Tyagi ◽  
Bhavana Tiwari ◽  
Sanjay Kumar ◽  
Paramananda Barman ◽  
...  

Abstract Mungbean [Vigna radiata (L.) R. Wilczek var. radiata] is a vital grain legume with nutritional and socio-economic importance, especially in developing countries. We performed whole genome re-sequencing of three mungbean accessions, representing the wild progenitor species, a released variety and a landrace, to identify SNPs relevant to analyses of genetic relationships. Approximately 9.3 million raw reads were obtained using the Ion Torrent PGM™ platform, and more than 92% of the reads were mapped to the reference mungbean genome. We identified a total of 233,799 single nucleotide polymorphisms in relation to the reference genome (SNPs: 103,341 in wild, 93,078 in released and 37,380 in landrace accessions) and 9,544 insertions and deletions (InDels: 4,742 in wild, 3,608 in released and 1,194 in landrace accessions) in the coding and non-coding regions. In all accessions, genomic variants were unevenly distributed within and across the mungbean chromosomes. Among these, 5,339, 4,739 and 1,795 SNPs were non-synonymous, in 815, 790 and 317 genes of the wild, released and landrace accessions, respectively. These polymorphisms might contribute to variation in important pathways of genes for abiotic and biotic stress tolerance and in important agronomic traits such as seed dormancy, flowering time and seed size in mungbean. A randomly selected subset of SNPs was validated using Sanger sequencing. The genomic variations among mungbean wild, released and landrace accessions constitute a powerful tool to support genetic research and molecular breeding of mungbean.


2019 ◽  
Author(s):  
Gianpiero Marconi ◽  
Stefano Capomaccio ◽  
Cinzia Comino ◽  
Alberto Acquadro ◽  
Ezio Portis ◽  
...  

Abstract Methods for investigating DNA methylation nowadays either require a reference genome and high coverage, or investigate only CG methylation. Moreover, no large-scale analysis can be performed for N6-methyladenosine (6mA). Here we describe the methylation content sensitive enzyme double-digest restriction-site-associated DNA (ddRAD) technique (MCSeEd), a reduced-representation, reference-free, cost-effective approach for characterizing whole genome methylation patterns across different methylation contexts (e.g., CG, CHG, CHH, 6mA). MCSeEd can also detect genetic variations among hundreds of samples. MCSeEd is based on parallel restrictions carried out by combinations of methylation insensitive and sensitive endonucleases, followed by next-generation sequencing. Moreover, we present a robust bioinformatic pipeline (available at https://bitbucket.org/capemaster/mcseed/src/master/) for differential methylation analysis combined with single nucleotide polymorphism calling without or with a reference genome.

