Finding functional disease-associated non-coding variation using next-generation sequencing

10.1101/060285 ◽

2016 ◽

Author(s):

Paolo Devanna ◽

Xiaowei Sylvia Chen ◽

Joses Ho ◽

Dario Gajewski ◽

Alessandro Gialluisi ◽

...

Keyword(s):

Next Generation Sequencing ◽

Binding Sites ◽

Large Scale ◽

Sequence Data ◽

Whole Genome Sequence ◽

Next Generation Sequencing Data ◽

Whole Genome ◽

Next Generation ◽

Whole Exome ◽

Generation Sequencing

Abstract Next generation sequencing has opened the way for the large scale interrogation of cohorts at the whole exome or whole genome level. Currently, the field largely focuses on potential disease causing variants that fall within coding sequences and that are predicted to cause protein sequence changes, generally discarding non-coding variants. However, non-coding DNA makes up ~98% of the genome and contains a range of sequences essential for controlling the expression of protein coding genes. Thus, potentially causative non-coding variation is currently being overlooked. To address this, we have designed an approach to assess variation in one class of non-coding regulatory DNA: the 3′UTRome. Variants in the 3′UTRs of genes are of particular interest because 3′UTRs modulate protein expression levels via their interactions with microRNAs. Furthermore, they are amenable to large scale analysis, as 3′UTR-microRNA interactions are based on complementary base pairing and can therefore be predicted in silico at the genome-wide level. We report a strategy for identifying and functionally testing variants in microRNA binding sites within the 3′UTRome and demonstrate the efficacy of this pipeline in a cohort of language impaired children. Using whole exome sequence data from 43 probands, we extracted variants that lay within 3′UTR microRNA binding sites. We identified a common variant (SNP) in a microRNA binding site and found this SNP to be associated with an endophenotype of language impairment (non-word repetition). We showed that this variant disrupted microRNA regulation in cells and was linked to altered gene expression in the brain, suggesting it may represent a risk factor contributing to specific language impairment (SLI). This work demonstrates that biologically relevant variants are currently being under-investigated despite the wealth of next-generation sequencing data available, and presents a simple strategy for interrogating non-coding regions of the genome. We propose that this strategy should be routinely applied to whole exome and whole genome sequence data in order to broaden our understanding of how non-coding genetic variation underlies complex phenotypes such as neurodevelopmental disorders.
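
The extraction step described in this abstract (keeping only variants that lie within predicted 3′UTR microRNA binding sites) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the tuple layouts, the 1-based inclusive coordinates, and the site labels are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict


def build_site_index(sites):
    """Group binding-site intervals by chromosome, sorted by start.

    Each site is (chrom, start, end, label) with 1-based inclusive
    coordinates. This layout is an assumption of the sketch.
    """
    index = defaultdict(list)
    for chrom, start, end, label in sites:
        index[chrom].append((start, end, label))
    for chrom in index:
        index[chrom].sort()
    return index


def variants_in_sites(variants, index):
    """Return (variant, site_label) pairs for variants inside a site.

    Each variant is (chrom, pos, ref, alt) with a 1-based position.
    A variant overlapping several sites is reported once per site.
    """
    hits = []
    for variant in variants:
        chrom, pos = variant[0], variant[1]
        for start, end, label in index.get(chrom, []):
            if start > pos:
                # Sites are sorted by start, so no later site can contain pos.
                break
            if pos <= end:
                hits.append((variant, label))
    return hits
```

In practice the site intervals would come from an in silico binding-site prediction and the variants from a VCF file; the linear scan per chromosome is chosen here for clarity rather than speed.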

Abstract 45: Detection of focal somatic copy number variants in whole genome, whole exome, and targeted next-generation sequencing data of tumor/normal pairs

10.1158/1557-3265.pmsclingen15-45 ◽

2016 ◽

Cited By ~ 1

Author(s):

Jessica Aldrich ◽

Jonathan J. Keats ◽

Winnie S. Liang ◽

John D. Carpten ◽

David W. Craig

Keyword(s):

Next Generation Sequencing ◽

Copy Number ◽

Copy Number Variants ◽

Next Generation Sequencing Data ◽

Whole Genome ◽

Next Generation ◽

Sequencing Data ◽

Targeted Next Generation Sequencing ◽

Whole Exome ◽

Generation Sequencing

Speeding Up Large-Scale Next Generation Sequencing Data Analysis with pBWA

Journal of Applied Bioinformatics & Computational Biology ◽

10.4172/2329-9533.1000101 ◽

2017 ◽

Vol 01 (01) ◽

Cited By ~ 4

Author(s):

Darren Peters ◽

Xuemei Luo ◽

Ke Qiu ◽

Ping Liang

Keyword(s):

Data Analysis ◽

Next Generation Sequencing ◽

Large Scale ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing ◽

Sequencing Data Analysis

Validation of variants using cost effective high-resolution melting (HRM) analysis predicted from target re-sequencing in Eucalyptus

Acta Botanica Croatica ◽

10.37427/botcro-2020-019 ◽

2020 ◽

Vol 79 (2) ◽

pp. 105-113

Author(s):

Abdul Bari Muneera Parveen ◽

Divya Lakshmanan ◽

Modhumita Ghosh Dasgupta

Keyword(s):

Next Generation Sequencing ◽

Large Scale ◽

Sequence Data ◽

Cost Effective ◽

Nucleotide Polymorphisms ◽

Next Generation ◽

Time Saving ◽

Hrm Analysis ◽

The Cost ◽

Generation Sequencing

The advent of next-generation sequencing has facilitated large-scale discovery and mapping of genomic variants for high-throughput genotyping. Several research groups working in tree species are presently employing next generation sequencing (NGS) platforms for marker discovery, since it is a cost effective and time saving strategy. However, most trees lack a chromosome level genome map and validation of variants for downstream application becomes obligatory. The cost associated with identifying potential variants from the enormous amount of sequence data is a major limitation. In the present study, high resolution melting (HRM) analysis was optimized for rapid validation of single nucleotide polymorphisms (SNPs), insertions or deletions (InDels) and simple sequence repeats (SSRs) predicted from exome sequencing of parents and hybrids of Eucalyptus tereticornis Sm. × Eucalyptus grandis Hill ex Maiden generated from controlled hybridization. The cost per data point was less than 0.5 USD, providing great flexibility in terms of cost and sensitivity, when compared to other validation methods. The sensitivity of this technology in variant detection can be extended to other applications including Bar-HRM for species authentication and TILLING for detection of mutants.

Extraction of Mitochondrial Genome from Whole Genome Next Generation Sequencing Data and Unveiling of Forensically Relevant Markers

Russian Journal of Genetics ◽

10.1134/s1022795420080128 ◽

2020 ◽

Vol 56 (8) ◽

pp. 982-991

Author(s):

S. Rauf ◽

N. Zahra ◽

S. S. Malik ◽

S. A. e Zahra ◽

K. Sughra ◽

...

Keyword(s):

Next Generation Sequencing ◽

Mitochondrial Genome ◽

Next Generation Sequencing Data ◽

Whole Genome ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Exogene: A performant workflow for detecting viral integrations from paired-end next-generation sequencing data

10.1101/2021.04.19.440427 ◽

2021 ◽

Author(s):

Jean-Pierre Kocher ◽

Zachary Stephens ◽

Daniel O'Brien ◽

Mrunal Dehankar ◽

Lewis Roberts ◽

...

Keyword(s):

Next Generation Sequencing ◽

Sequence Data ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Long Read ◽

Breakpoint Detection ◽

Targeted Capture ◽

Genome Heterogeneity ◽

Generation Sequencing

The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but the accurate detection of integration breakpoints from short read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next generation sequencing data. Exogene's read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with those found in long read validation sets. We demonstrate this concordance across six TCGA hepatocellular carcinoma (HCC) tumor samples, identifying integrations of hepatitis B virus that are validated by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequence data, including genome, exome, RNA-Seq or targeted capture.

NGSPERL: a semi-automated framework for large scale next generation sequencing data analysis

International Journal of Computational Biology and Drug Design ◽

10.1504/ijcbdd.2015.072082 ◽

2015 ◽

Vol 8 (3) ◽

pp. 203

Author(s):

Quanhu Sheng ◽

Shilin Zhao ◽

Mingsheng Guo ◽

Yu Shyr

Keyword(s):

Data Analysis ◽

Next Generation Sequencing ◽

Large Scale ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing ◽

Sequencing Data Analysis

Replicate whole-genome next-generation sequencing data derived from Caucasian donor saliva samples

Data in Brief ◽

10.1016/j.dib.2021.107349 ◽

2021 ◽

pp. 107349

Author(s):

Marcus Høy Hansen ◽

Charlotte Guldborg Nyvold

Keyword(s):

Next Generation Sequencing ◽

Next Generation Sequencing Data ◽

Whole Genome ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing

Whole-genome sequencing data of Kazakh individuals

BMC Research Notes ◽

10.1186/s13104-021-05464-4 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Ulykbek Kairov ◽

Askhat Molkenov ◽

Saule Rakhimova ◽

Ulan Kozhamkulov ◽

Aigul Sharip ◽

...

Keyword(s):

Next Generation Sequencing ◽

Whole Genome Sequence ◽

Whole Genome ◽

Next Generation ◽

Sequencing Data ◽

Sequencing Platform ◽

Next Generation Sequencing Platform ◽

Central Asian ◽

Whole Genomes ◽

Generation Sequencing

Abstract Objectives: Kazakhstan is a Central Asian crossroads of European and Asian populations, situated along the Great Silk Way. The territory of Kazakhstan has historically been inhabited by nomadic tribes, and today it is a multi-ethnic country in which Kazakhs are the dominant ethnic group. We sequenced and analyzed the whole genomes of five healthy ethnic Kazakh individuals at high coverage using a next-generation sequencing platform. These whole-genome sequence data can be a valuable reference for biomedical studies investigating disease associations and for population-wide genomic studies of the ethnically diverse Central Asian region. Data description: Blood samples were collected from five healthy ethnic Kazakh individuals living in Kazakhstan. Genomic DNA was extracted from blood and sequenced on the Illumina HiSeq2000 next-generation sequencing platform. Coverage ranged from 26 to 32X. Between 98.85% and 99.58% of base pairs were mapped and aligned to the human reference genome GRCh37 (hg19). Het/Hom and Ts/Tv ratios for each whole genome ranged from 1.35 to 1.49 and from 2.07 to 2.08, respectively. Sequencing data are available in the National Center for Biotechnology Information SRA database under the accession number PRJNA374772.
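
The Ts/Tv and Het/Hom ratios reported in this abstract are standard per-genome quality summaries. The sketch below shows one common way to compute them from a list of single-nucleotide variants and VCF-style genotype strings. It is an illustration under stated assumptions, not the authors' code: the function names and input layouts are hypothetical, and Het/Hom is taken here as heterozygous calls divided by homozygous non-reference calls.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snvs):
    """Ratio of transitions to transversions for (ref, alt) pairs.

    Entries that are not single-base A/C/G/T substitutions are skipped.
    Returns NaN when there are no transversions.
    """
    ts = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("nan")


def het_hom_ratio(genotypes):
    """Ratio of heterozygous to homozygous non-reference diploid calls.

    Genotypes are VCF-style strings such as "0/1" or "1|1".
    Missing calls ("./.") and homozygous reference calls are ignored.
    Returns NaN when there are no homozygous non-reference calls.
    """
    het = hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue
        first, second = alleles
        if first != second:
            het += 1
        elif first != "0":
            hom_alt += 1
    return het / hom_alt if hom_alt else float("nan")
```

For whole-genome human data, a Ts/Tv ratio close to 2 is commonly expected, which is consistent with the 2.07 to 2.08 range reported in the abstract.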

GenoPheno: cataloging large-scale phenotypic and next-generation sequencing data within human datasets

Briefings in Bioinformatics ◽

10.1093/bib/bbaa033 ◽

2020 ◽

Author(s):

Alba Gutiérrez-Sacristán ◽

Carlos De Niz ◽

Cartik Kothari ◽

Sek Won Kong ◽

Kenneth D Mandl ◽

...

Keyword(s):

Next Generation Sequencing ◽

Web Application ◽

Large Scale ◽

Human Subjects ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Phenotypic Data ◽

Data Repositories ◽

Generation Sequencing

Abstract Precision medicine promises to revolutionize treatment, shifting therapeutic approaches from the classical one-size-fits-all to those more tailored to the patient’s individual genomic profile, lifestyle and environmental exposures. Yet, to advance precision medicine’s main objective (ensuring the optimum diagnosis, treatment and prognosis for each individual), investigators need access to large-scale clinical and genomic data repositories. Despite the vast proliferation of these datasets, locating and obtaining access to many remains a challenge. We sought to provide an overview of available patient-level datasets that contain both genotypic data, obtained by next-generation sequencing, and phenotypic data, and to create a dynamic, online catalog for consultation, contribution and revision by the research community. Datasets included in this review meet six inclusion criteria: they (i) contain data from more than 500 human subjects; (ii) contain both genotypic and phenotypic data from the same subjects; (iii) include whole genome sequencing or whole exome sequencing data; (iv) include at least 100 recorded phenotypic variables per subject; (v) are accessible through a website or collaboration with investigators; and (vi) make access information available in English. Using these criteria, we identified 30 datasets, reviewed them and provided results in the release version of a catalog, which is publicly available through a dynamic Web application and on GitHub. Users can review as well as contribute new datasets for inclusion (Web: https://avillachlab.shinyapps.io/genophenocatalog/; GitHub: https://github.com/hms-dbmi/GenoPheno-CatalogShiny).
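
The six inclusion criteria listed in this abstract amount to a simple filter over dataset descriptions. The following Python sketch encodes them as one predicate. It is only an illustration: the `Dataset` record, its field names, and the `"WGS"`/`"WES"` and access-mode labels are hypothetical and are not taken from the GenoPheno catalog itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a candidate dataset."""
    name: str
    n_subjects: int
    has_genotypes: bool
    has_phenotypes: bool          # from the same subjects as the genotypes
    sequencing: frozenset         # e.g. frozenset({"WGS"}) or frozenset({"WES"})
    n_phenotype_vars: int         # recorded phenotypic variables per subject
    access: str                   # "website", "collaboration", or "none"
    access_info_in_english: bool


def meets_inclusion_criteria(d: Dataset) -> bool:
    """Apply criteria (i) to (vi) as stated in the abstract."""
    return (
        d.n_subjects > 500                               # (i)
        and d.has_genotypes and d.has_phenotypes         # (ii)
        and bool(d.sequencing & {"WGS", "WES"})          # (iii)
        and d.n_phenotype_vars >= 100                    # (iv)
        and d.access in {"website", "collaboration"}     # (v)
        and d.access_info_in_english                     # (vi)
    )
```

Note that criterion (i) is a strict inequality ("more than 500") while criterion (iv) is inclusive ("at least 100"), which the comparison operators above reflect.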

Proteogenomic strategies for identification of aberrant cancer peptides using large-scale next-generation sequencing data

PROTEOMICS ◽

10.1002/pmic.201400206 ◽

2014 ◽

Vol 14 (23-24) ◽

pp. 2719-2730 ◽

Cited By ~ 45

Author(s):

Sunghee Woo ◽

Seong Won Cha ◽

Seungjin Na ◽

Clark Guest ◽

Tao Liu ◽

...

Keyword(s):

Next Generation Sequencing ◽

Large Scale ◽

Next Generation Sequencing Data ◽

Next Generation ◽

Sequencing Data ◽

Generation Sequencing
