variant2literature: full text literature search for genetic variants

Mapping Intimacies ◽

10.1101/583450 ◽

2019 ◽

Author(s):

Yin-Hung Lin ◽

Yu-Chen Lu ◽

Ting-Fu Chen ◽

Jacob Shujui Hsu ◽

Ko-Han Lee ◽

...

Keyword(s):

Genetic Variants ◽

Literature Search ◽

Recall Rate ◽

Biomedical Literature ◽

Text Recognition ◽

Command Line ◽

False Negatives ◽

Web Based ◽

File Formats ◽

Generation Sequencing

Motivation: Whole genome sequencing (WGS) by next-generation sequencing produces millions of variants for an individual. Retrieving the biomedical literature for such a large number of genetic variants remains challenging, because in many cases the variants appear only in tables rendered as images, or in supplementary documents whose file formats are diverse. Results: The proposed tool, variant2literature, from TaiGenomics (Toolkits for AI genomics) addresses the problem by combining text recognition with image processing. In addition to image-based text retrieval, the recall rate of finding the literature that contains the variants of interest is further improved by variant normalization: different variant presentations are transformed into chromosome coordinates (standard VCF format) so that false negatives can be largely avoided. variant2literature is available in two ways. First, a web-based interface is provided to search all the literature in the PMC Open Access Subset. Second, a command-line executable can be downloaded so that users can search all the files in a specified local directory. Availability: http://variant2literature.taigenomics.com/ Contact: [email protected]
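
The normalization idea described in the abstract can be illustrated with a minimal sketch. This is not the variant2literature implementation: the patterns, the `Variant` type and the `normalize` function are hypothetical, they cover only single-nucleotide variants already written in genomic coordinates, and the coordinates in the usage note are purely illustrative. Transcript-level or protein-level notations would additionally need a transcript-to-genome mapping, which is out of scope here.

```python
import re
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    """VCF-style key for a variant: chromosome, 1-based position, ref, alt."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical patterns for a few ways a genomic variant may be written in text.
_CHROM = r"(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|MT?)"
_PATTERNS = [
    # HGVS-like genomic notation, e.g. chr7:g.140453136A>T
    re.compile(
        r"^" + _CHROM + r":g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$",
        re.IGNORECASE,
    ),
    # Delimited notation, e.g. 7-140453136-A-T or chr7:140453136 A>T
    re.compile(
        r"^" + _CHROM
        + r"[-:\s]+(?P<pos>\d+)[-:\s]+(?P<ref>[ACGT]+)[-:\s>/]+(?P<alt>[ACGT]+)$",
        re.IGNORECASE,
    ),
]


def normalize(text: str) -> Optional[Variant]:
    """Map a textual variant mention to a Variant, or None if unrecognized."""
    # Drop surrounding whitespace and thousands separators in positions.
    cleaned = text.strip().replace(",", "")
    for pattern in _PATTERNS:
        match = pattern.match(cleaned)
        if match:
            chrom = match.group("chrom").upper()
            if chrom == "M":
                chrom = "MT"  # use a single label for the mitochondrial genome
            return Variant(
                chrom,
                int(match.group("pos")),
                match.group("ref").upper(),
                match.group("alt").upper(),
            )
    return None
```

Under these assumptions, `normalize("chr7:g.140453136A>T")`, `normalize("7-140453136-A-T")` and `normalize("chr7:140,453,136 A>T")` all map to the same key, so a search on one spelling would also match documents that use the others.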

easyfm: An easy software suite for file manipulation of Next Generation Sequencing data on desktops

10.22541/au.163845474.49811073/v1 ◽

2021 ◽

Author(s):

Hyungtaek Jung ◽

Brendan Jeon ◽

Daniel Ortiz-Barrientos

Keyword(s):

Next Generation Sequencing ◽

Life Sciences ◽

Next Generation Sequencing Data ◽

Command Line ◽

Next Generation ◽

Web Based ◽

File Formats ◽

Wide Range ◽

Ngs Data ◽

Generation Sequencing

Storing and manipulating Next Generation Sequencing (NGS) file formats for understanding biological phenomena is an essential but difficult task in the life sciences. Yet, most methods for analysing NGS data require complex command-line tools in high-performance computing (HPC) or web-based servers and have not yet been implemented in comprehensive, easy-to-use software. Here we present easyfm (easy file manipulation), a free standalone Graphical User Interface (GUI) software with Python support that can be used to facilitate the rapid discovery of target sequences (or user’s interest) in NGS datasets for novice users (more accessible to biologists). It enables them to perform end-to-end reproducible data analyses using a desktop application (Windows, Mac and Linux). Unlike existing tools, the GUI-based easyfm is not dependent on any HPC system and can be operated without an internet connection. For user-friendliness and convenience, easyfm was developed with four work modules and a secondary GUI window, covering different aspects of NGS data analysis, including post-processing, filtering, format conversion, generating results, real-time log, and help. In combination with the executable tools (BLAST+ and BLAT) and Python, easyfm allows the user to set analysis parameters, select/extract regions of interest, examine the input and output results, and convert to a wide range of file formats. To help augment the functionality of existing web-based and command-line tools, easyfm, a self-contained program, comes with extensive documentation (https://github.com/TaekAndBrendan/easyfm). This specific benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.

HubMed: a web-based biomedical literature search interface

Nucleic Acids Research ◽

10.1093/nar/gkl037 ◽

2006 ◽

Vol 34 (Web Server) ◽

pp. W745-W747 ◽

Cited By ~ 37

Author(s):

A. D. Eaton

Keyword(s):

Literature Search ◽

Biomedical Literature ◽

Search Interface ◽

Web Based

PATHOGENIC GENETIC VARIANTS IDENTIFIED BY TARGETED NEXT-GENERATION SEQUENCING IN SOME UNDEFINED PRIMARY IMMUNODEFICIENCY CASES (IZMIR EXPERIENCE)

10.26226/morressier.594a7d45d462b8028d89348d ◽

2017 ◽

Author(s):

Kutukculer Necil

Keyword(s):

Next Generation Sequencing ◽

Primary Immunodeficiency ◽

Genetic Variants ◽

Next Generation ◽

Targeted Next Generation Sequencing ◽

Generation Sequencing

Large expert-curated database for benchmarking document similarity detection in biomedical literature search

Database ◽

10.1093/database/baz085 ◽

2019 ◽

Vol 2019 ◽

Author(s):

Peter Brown ◽

Aik-Choon Tan ◽

Mohamed A El-Esawi ◽

Thomas Liehr ◽

Oliver Blanck ◽

...

Keyword(s):

Literature Search ◽

Relevant Literature ◽

Biomedical Literature ◽

Medical Subject Headings ◽

Document Similarity ◽

Inverse Document Frequency ◽

Research Fields ◽

Experience Levels ◽

Document Frequency ◽

Systematic Biases

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
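
As an illustration of one of the baseline families named above, the following is a minimal sketch of Term Frequency–Inverse Document Frequency ranking with cosine similarity. It is a generic textbook variant with a smoothed IDF, not the configuration used by the consortium; the function names and the whitespace tokenizer are assumptions, and Okapi BM25 and PubMed Related Articles are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a term -> weight dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms present in every document keep a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: freq * idf[t] for t, freq in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(seed, candidates):
    """Rank candidate texts by similarity to the seed text, best first."""
    vectors = tfidf_vectors([seed] + list(candidates))
    seed_vec = vectors[0]
    scored = [(cosine(seed_vec, vec), text)
              for vec, text in zip(vectors[1:], candidates)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank(seed_title, candidate_titles)` returns `(score, text)` pairs, with candidates that share no terms with the seed scoring zero. A benchmark such as the one described would then compare the top-ranked candidates against the expert relevance annotations.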

Measuring Pharmacogene Variant Function at Scale Using Multiplexed Assays

The Annual Review of Pharmacology and Toxicology ◽

10.1146/annurev-pharmtox-032221-085807 ◽

2021 ◽

Vol 62 (1) ◽

Author(s):

Renee C. Geck ◽

Gabriel Boyle ◽

Clara J. Amorosi ◽

Douglas M. Fowler ◽

Maitreya J. Dunham

Keyword(s):

Next Generation Sequencing ◽

Functional Data ◽

Genetic Variants ◽

Genetic Information ◽

Annual Review ◽

Publication Date ◽

Drug Selection ◽

Functional Consequences ◽

Pharmacology And Toxicology ◽

Generation Sequencing

As costs of next-generation sequencing decrease, identification of genetic variants has far outpaced our ability to understand their functional consequences. This lack of understanding is a central challenge to a key promise of pharmacogenomics: using genetic information to guide drug selection and dosing. Recently developed multiplexed assays of variant effect enable experimental measurement of the function of thousands of variants simultaneously. Here, we describe multiplexed assays that have been performed on nearly 25,000 variants in eight key pharmacogenes (ADRB2, CYP2C9, CYP2C19, NUDT15, SLCO1B1, TPMT, VKORC1, and the LDLR promoter), discuss advances in experimental design, and explore key challenges that must be overcome to maximize the utility of multiplexed functional data. Expected final online publication date for the Annual Review of Pharmacology and Toxicology, Volume 62 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

PubMed Labs: an experimental system for improving biomedical literature search

Database ◽

10.1093/database/bay094 ◽

2018 ◽

Vol 2018 ◽

Cited By ~ 9

Author(s):

Nicolas Fiorini ◽

Kathi Canese ◽

Rostyslav Bryzgunov ◽

Ievgeniia Radetska ◽

Asta Gindulyte ◽

...

Keyword(s):

Literature Search ◽

Experimental System ◽

Biomedical Literature

Google Scholar Could Be Used as a Stand-Alone Resource for Systematic Reviews

Evidence Based Library and Information Practice ◽

10.18438/b8k31f ◽

2015 ◽

Vol 10 (2) ◽

pp. 147

Author(s):

Saori Wendy Herman

Keyword(s):

Systematic Reviews ◽

Recall Rate ◽

Biomedical Literature ◽

Google Scholar ◽

Cochrane Library ◽

Bibliographic Database ◽

Predictive Values ◽

Advanced Search ◽

Search Tool ◽

Meta Analyses

A Review of: Gehanno, J. F., Rollin, L., & Darmoni, S. (2013). Is the coverage of Google Scholar enough to be used alone for systematic reviews. BMC Medical Informatics and Decision Making, 13(1): 7. doi: 10.1186/1472-6947-13-7 Abstract: Objective – To determine if Google Scholar (GS) is sensitive enough to be used as the sole search tool for systematic reviews. Design – Citation analysis. Setting – Biomedical literature. Subjects – Original studies included in 29 systematic reviews published in the Cochrane Library or JAMA. Methods – The authors searched MEDLINE for any systematic reviews published in the 2008 and 2009 issues of JAMA or in the July 8, 2009 issue of the Cochrane Database of Systematic Reviews. They chose 29 systematic reviews for the study and included these reviews in a gold standard database created specifically for this project. The authors searched GS for the title of each of the original references for the 29 reviews. They computed and noted the recall of GS for each reference. Main Results – The authors searched GS for 738 original studies and retrieved all of them, a recall rate of 100%. They also made a side discovery of a number of major errors in the bibliographic references. Conclusion – Researchers could use GS as a stand-alone database for systematic reviews or meta-analyses. With a couple of improvements to the rate of positive predictive values and advanced search features, GS could become the leading medical bibliographic database.

Assay of frequency and spectrum of genetic variants in TTN in healthy russian population

Translational Medicine ◽

10.18705/2311-4495-2021-8-5-29-37 ◽

2021 ◽

Vol 8 (5) ◽

pp. 29-37

Author(s):

Yu. A. Vakhrushev ◽

A. A. Kozyreva ◽

S. V. Zhuk ◽

O. P. Rotar ◽

A. A. Kostareva

Keyword(s):

Next Generation Sequencing ◽

Genetic Variants ◽

Sanger Sequencing ◽

Heart Diseases ◽

Genetic Research ◽

Russian Population ◽

Next Generation ◽

Data Bases ◽

International Data ◽

Generation Sequencing

Background. The TTN gene is associated with all types of cardiomyopathy; however, its large size (294 b.p.) gives rise to many unique or low-frequency genetic variants, which complicates their interpretation. In addition, no data are currently available on the spectrum of variants in this gene in the healthy Russian population. Determining the frequency and spectrum of TTN variants in the healthy Russian population will make it possible to use these data when interpreting the results of molecular genetic testing in patients with various heart pathologies and when defining the prognosis for different heart diseases. Objective. To determine the frequency and spectrum of single-nucleotide and truncating variants in the TTN gene in the healthy Russian population, to compare them with international databases, and to evaluate the pathogenicity of these variants and their distribution across the titin structure. Design and methods. A total of 192 men aged 55.8 ± 6.6 years were tested with next-generation sequencing. Identified genetic variants were confirmed by Sanger sequencing. Results. The allele frequency of missense variants (with frequency less than 0.1%) in TTN in the healthy Russian population was 15.1%, and that of truncating variants was 0.52%. Of these, 37.9% were variants of unknown significance, 62% were likely benign and 0.1% were benign. No pathogenic or likely pathogenic variants were found. The identified genetic variants were distributed throughout the titin structure. Conclusion. The results are consistent with international databases and studies. The laboratory method used (next-generation sequencing with confirmation by Sanger sequencing) can be applied both in clinical practice and in creating databases of genetic variants in the healthy Russian population.

Interactive Web-Based Resource for Annotation of Genetic Variants Causing Hereditary Angioedema (HADA): Database Development, Implementation, and Validation (Preprint)

10.2196/preprints.19040 ◽

2020 ◽

Author(s):

Alejandro Mendoza-Alvarez ◽

Adrián Muñoz-Barrera ◽

Luis Alberto Rubio-Rodríguez ◽

Itahisa Marcelino-Rodriguez ◽

Almudena Corrales ◽

...

Keyword(s):

Hereditary Angioedema ◽

Genetic Condition ◽

First Line ◽

International Consensus ◽

Web Based ◽

Genetics And Genomics ◽

User Friendly ◽

Current Guidelines ◽

Generation Sequencing ◽

Esterase Inhibitor

BACKGROUND Hereditary angioedema is a rare genetic condition caused by C1 esterase inhibitor deficiency, dysfunction, or kinin cascade dysregulation, leading to an increased bradykinin plasma concentration. Hereditary angioedema is a poorly recognized clinical entity and is very often misdiagnosed as a histaminergic angioedema. Despite its genetic nature, first-line genetic screening is not integrated in routine diagnosis. Consequently, delayed diagnosis and inaccurate or incomplete diagnosis and treatment of hereditary angioedema are common. OBJECTIVE In agreement with recent recommendations from the International Consensus on the Use of Genetics in the Management of Hereditary Angioedema, to facilitate the clinical diagnosis and adapt it to the paradigm of precision medicine and next-generation sequencing–based genetic tests, we aimed to develop a genetic annotation tool, termed Hereditary Angioedema Database Annotation (HADA). METHODS HADA is built on top of a database of known variants affecting function, including precomputed pathogenic assessment of each variant and a ranked classification according to the current guidelines from the American College of Medical Genetics and Genomics. RESULTS HADA is provided as a freely accessible, user-friendly web-based interface with versatility for the entry of genetic information. The underlying database can also be incorporated into automated command-line stand-alone annotation tools. CONCLUSIONS HADA enables rapid detection of variants affecting function for different hereditary angioedema types and integrates further useful information to shorten the diagnostic odyssey and reduce diagnostic delay.

Association of genetic variants in enamel-formation genes with dental caries: A meta- and gene-cluster analysis

10.1101/2020.09.19.20198044 ◽

2020 ◽

Author(s):

Xueyan Li ◽

DI LIU ◽

Sun Yang ◽

Jingyun Yang ◽

Youcheng Yu

Keyword(s):

Cluster Analysis ◽

Dental Caries ◽

Gene Cluster ◽

Genetic Variants ◽

Literature Search ◽

Genetic Variant ◽

Meta Analysis ◽

Cochrane Library ◽

Systematic Literature Search ◽

Enamel Formation

Previous studies have reported the association between multiple genetic variants in enamel-formation-related genes and the risk of dental caries with inconsistent results. We performed a systematic literature search of the PubMed, Cochrane Library, HuGE and Google Scholar databases for studies published before March 21, 2020 and conducted meta-, gene-based and gene-cluster analyses on the association between genetic variants in enamel-formation-related genes and the risk of dental caries. Our systematic literature search identified 21 relevant publications including a total of 24 studies for analysis. The genetic variant rs17878486 in AMELX was significantly associated with dental caries risk (OR=1.40, 95% CI: 1.02-1.93, P=0.037). We found no significant association between the risk of dental caries and rs12640848 in ENAM (OR=1.15, 95% CI: 0.88-1.52, P=0.310), rs1784418 in MMP20 (OR=1.07, 95% CI: 0.76-1.49, P=0.702) or rs3796704 in ENAM (OR=1.06, 95% CI: 0.96-1.17, P=0.228). Gene-based analysis indicated that multiple genetic variants in AMELX showed joint association with the risk of dental caries (6 variants; P<10⁻⁵), as did genetic variants in MMP13 (3 variants; P=0.004), MMP2 (3 variants; P<10⁻⁵), MMP20 (2 variants; P<10⁻⁵) and MMP3 (2 variants; P<10⁻⁵). The gene-cluster analysis indicated a significant association between the genetic variants in this enamel-formation gene cluster and the risk of dental caries (P<10⁻⁵). The present meta-analysis revealed that the genetic variant rs17878486 in AMELX was associated with dental caries, and that multiple genetic variants in enamel-formation-related genes jointly contribute to the risk of dental caries, supporting the role of genetic variants in the enamel-formation genes in the etiology of dental caries.
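
The reported odds ratios, confidence intervals and P values can be cross-checked with a short calculation. The sketch below assumes the intervals are Wald-type 95% intervals on the log-odds scale and that the P values are two-sided, which the abstract does not state. The function name is hypothetical, and because the published figures are rounded, the recovered values are only approximate.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover an approximate z statistic and two-sided P value
    from an odds ratio and its 95% confidence interval.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    # The interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# rs17878486 in AMELX: OR=1.40, 95% CI 1.02-1.93 (reported P=0.037)
z_amelx, p_amelx = p_from_or_ci(1.40, 1.02, 1.93)

# rs12640848 in ENAM: OR=1.15, 95% CI 0.88-1.52 (reported P=0.310)
z_enam, p_enam = p_from_or_ci(1.15, 0.88, 1.52)
```

Under these assumptions the recovered P values should fall close to the reported ones, below 0.05 for the AMELX variant and well above it for the ENAM variant. Small differences are expected from rounding of the published odds ratios and interval bounds.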
