variant2literature: full text literature search for genetic variants

2019 ◽  
Author(s):  
Yin-Hung Lin ◽  
Yu-Chen Lu ◽  
Ting-Fu Chen ◽  
Jacob Shujui Hsu ◽  
Ko-Han Lee ◽  
...  

Abstract. Motivation: Whole genome sequencing (WGS) by next-generation sequencing produces millions of variants for an individual. Retrieving the biomedical literature for such a large number of genetic variants remains challenging, because in many cases the variants appear only in tables rendered as images, or in supplementary documents whose file formats are diverse. Results: The proposed tool, variant2literature, from TaiGenomics (Toolkits for AI genomics) addresses the problem by combining text recognition with image processing. In addition to advanced image-based text retrieval, the recall of literature containing the variants of interest is further improved by variant normalization: different variant presentations are transformed into chromosome coordinates (standard VCF format) so that false negatives can be largely avoided. variant2literature is available in two ways. First, a web-based interface is provided to search all the literature in the PMC Open Access Subset. Second, a command-line executable can be downloaded so that users can search all the files in a specified local directory. Availability: http://variant2literature.taigenomics.com/ Contact: [email protected]
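
The normalization idea in this abstract can be illustrated with a minimal sketch. It is not the tool's implementation, which the abstract does not describe: the regular expressions, the `VcfKey` type and the `normalize` function are hypothetical, the coordinates are illustrative, and only simple genomic substitutions are handled (transcript- or protein-level notations would need a mapping step not shown here).

```python
import re
from typing import NamedTuple, Optional


class VcfKey(NamedTuple):
    """VCF-style key for a single-nucleotide substitution (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical patterns for two textual notations of the same substitution.
_HGVS_G = re.compile(
    r"^(?:chr)?([0-9]{1,2}|[XYM]):g\.(\d+)([ACGT])>([ACGT])$", re.IGNORECASE
)
_DELIMITED = re.compile(
    r"^(?:chr)?([0-9]{1,2}|[XYM])[-:](\d+)[-:]([ACGT])[-:>]([ACGT])$", re.IGNORECASE
)


def normalize(text: str) -> Optional[VcfKey]:
    """Map a variant string to a (chrom, pos, ref, alt) key, or None if unrecognized."""
    cleaned = text.strip()
    for pattern in (_HGVS_G, _DELIMITED):
        match = pattern.match(cleaned)
        if match:
            chrom, pos, ref, alt = match.groups()
            return VcfKey(chrom.upper(), int(pos), ref.upper(), alt.upper())
    return None


# Two spellings of one variant collapse to the same key, so a search on
# either spelling can retrieve documents that use the other.
same = normalize("chr7:g.140453136A>T") == normalize("7-140453136-A-T")
```

Indexing documents by the normalized key rather than by the raw string is what lets differently written mentions of one variant match each other.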

Author(s):  
Hyungtaek Jung ◽  
Brendan Jeon ◽  
Daniel Ortiz-Barrientos

Storing and manipulating Next Generation Sequencing (NGS) file formats for understanding biological phenomena is an essential but difficult task in the life sciences. Yet, most methods for analysing NGS data require complex command-line tools in high-performance computing (HPC) or web-based servers and have not yet been implemented in comprehensive, easy-to-use software. Here we present easyfm (easy file manipulation), a free standalone Graphical User Interface (GUI) software with Python support that can be used to facilitate the rapid discovery of target sequences (or user’s interest) in NGS datasets for novice users (more accessible to biologists). It enables them to perform end-to-end reproducible data analyses using a desktop application (Windows, Mac and Linux). Unlike existing tools, the GUI-based easyfm is not dependent on any HPC system and can be operated without an internet connection. For user-friendliness and convenience, easyfm was developed with four work modules and a secondary GUI window, covering different aspects of NGS data analysis, including post-processing, filtering, format conversion, generating results, real-time log, and help. In combination with the executable tools (BLAST+ and BLAT) and Python, easyfm allows the user to set analysis parameters, select/extract regions of interest, examine the input and output results, and convert to a wide range of file formats. To help augment the functionality of existing web-based and command-line tools, easyfm, a self-contained program, comes with extensive documentation (https://github.com/TaekAndBrendan/easyfm). This specific benefit allows easyfm to seamlessly integrate visual and interactive representations of NGS files, supporting a wider scope of bioinformatics applications in the life sciences.


Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Peter Brown ◽  
Aik-Choon Tan ◽  
Mohamed A El-Esawi ◽  
Thomas Liehr ◽  
Oliver Blanck ◽  
...  

Abstract Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
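
For readers unfamiliar with the first baseline named above, the following is a minimal sketch of Okapi BM25 scoring over pre-tokenized documents. It is a textbook formulation, not the consortium's implementation; the function name `bm25_scores` and the parameter values `k1=1.5` and `b=0.75` are assumptions chosen as common defaults.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(
    query: Sequence[str],
    docs: Sequence[Sequence[str]],
    k1: float = 1.5,  # term-frequency saturation (assumed default)
    b: float = 0.75,  # document-length normalization strength (assumed default)
) -> List[float]:
    """Return one BM25 score per document for the given query terms."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed inverse document frequency; the +1 keeps it positive.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


docs = [
    ["variant", "search", "literature"],
    ["protein", "folding"],
    ["literature", "review", "search", "engine"],
]
scores = bm25_scores(["variant", "search"], docs)
```

In this toy corpus the first document matches both query terms and ranks highest, while the second matches neither and scores zero.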


Author(s):  
Renee C. Geck ◽  
Gabriel Boyle ◽  
Clara J. Amorosi ◽  
Douglas M. Fowler ◽  
Maitreya J. Dunham

As costs of next-generation sequencing decrease, identification of genetic variants has far outpaced our ability to understand their functional consequences. This lack of understanding is a central challenge to a key promise of pharmacogenomics: using genetic information to guide drug selection and dosing. Recently developed multiplexed assays of variant effect enable experimental measurement of the function of thousands of variants simultaneously. Here, we describe multiplexed assays that have been performed on nearly 25,000 variants in eight key pharmacogenes (ADRB2, CYP2C9, CYP2C19, NUDT15, SLCO1B1, TPMT, VKORC1, and the LDLR promoter), discuss advances in experimental design, and explore key challenges that must be overcome to maximize the utility of multiplexed functional data. Expected final online publication date for the Annual Review of Pharmacology and Toxicology, Volume 62 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


Database ◽  
2018 ◽  
Vol 2018 ◽  
Author(s):  
Nicolas Fiorini ◽  
Kathi Canese ◽  
Rostyslav Bryzgunov ◽  
Ievgeniia Radetska ◽  
Asta Gindulyte ◽  
...  

2015 ◽  
Vol 10 (2) ◽  
pp. 147
Author(s):  
Saori Wendy Herman

A Review of: Gehanno, J. F., Rollin, L., & Darmoni, S. (2013). Is the coverage of Google Scholar enough to be used alone for systematic reviews. BMC Medical Informatics and Decision Making, 13(1): 7. doi: 10.1186/1472-6947-13-7 Abstract Objective – To determine if Google Scholar (GS) is sensitive enough to be used as the sole search tool for systematic reviews. Design – Citation analysis. Setting – Biomedical literature. Subjects – Original studies included in 29 systematic reviews published in the Cochrane Library or JAMA. Methods – The authors searched MEDLINE for any systematic reviews published in the 2008 and 2009 issues of JAMA or in the July 8, 2009 issue of the Cochrane Database of Systematic Reviews. They chose 29 systematic reviews for the study and included these reviews in a gold standard database created specifically for this project. The authors searched GS for the title of each of the original references for the 29 reviews. They computed and noted the recall of GS for each reference. Main Results – The authors searched GS for 738 original studies with a 100% recall rate. They also made a side discovery of a number of major errors in the bibliographic references. Conclusion – Researchers could use GS as a stand-alone database for systematic reviews or meta-analyses. With a couple improvements to the rate of positive predictive values and advanced search features, GS could become the leading medical bibliographic database.


2021 ◽  
Vol 8 (5) ◽  
pp. 29-37
Author(s):  
Yu. A. Vakhrushev ◽  
A. A. Kozyreva ◽  
S. V. Zhuk ◽  
O. P. Rotar ◽  
A. A. Kostareva

Background. The TTN gene is associated with all types of cardiomyopathy; however, its large size (294 b.p.) gives rise to many individually unique or low-frequency genetic variants, which complicates their interpretation. In addition, there are currently no data on the spectrum of variants in this gene in the healthy Russian population. Determining the frequency and spectrum of TTN variants in the healthy Russian population will make it possible to use them when interpreting the results of molecular genetic testing in patients with various heart pathologies and to define the prognosis for different heart diseases. Objective. To determine the frequency and spectrum of single nucleotide and truncating variants in TTN in the healthy Russian population, compare them with international databases, and evaluate the pathogenicity of these variants and their distribution across the titin structure. Design and methods. A total of 192 men aged 55.8 ± 6.6 years were tested with next-generation sequencing. Identified genetic variants were confirmed by Sanger sequencing. Results. The allele frequency of missense variants (with a frequency of less than 0.1%) in TTN in the healthy Russian population amounted to 15.1%, and that of truncating variants to 0.52%. Of these, 37.9% were variants of unknown significance, 62% were likely benign and 0.1% were benign. No pathogenic or likely pathogenic variants were found. The identified genetic variants were distributed throughout the titin structure. Conclusion. The results are consistent with international databases and studies. The laboratory approach used (next-generation sequencing with confirmation by Sanger sequencing) can be applied both in clinical practice and in creating databases of genetic variants in the healthy Russian population.


2020 ◽  
Author(s):  
Alejandro Mendoza-Alvarez ◽  
Adrián Muñoz-Barrera ◽  
Luis Alberto Rubio-Rodríguez ◽  
Itahisa Marcelino-Rodriguez ◽  
Almudena Corrales ◽  
...  

BACKGROUND Hereditary angioedema is a rare genetic condition caused by C1 esterase inhibitor deficiency, dysfunction, or kinin cascade dysregulation, leading to an increased bradykinin plasma concentration. Hereditary angioedema is a poorly recognized clinical entity and is very often misdiagnosed as a histaminergic angioedema. Despite its genetic nature, first-line genetic screening is not integrated in routine diagnosis. Consequently, delayed, inaccurate, or incomplete diagnosis and treatment of hereditary angioedema are common. OBJECTIVE In agreement with recent recommendations from the International Consensus on the Use of Genetics in the Management of Hereditary Angioedema, to facilitate the clinical diagnosis and adapt it to the paradigm of precision medicine and next-generation sequencing–based genetic tests, we aimed to develop a genetic annotation tool, termed Hereditary Angioedema Database Annotation (HADA). METHODS HADA is built on top of a database of known variants affecting function, including precomputed pathogenic assessment of each variant and a ranked classification according to the current guidelines from the American College of Medical Genetics and Genomics. RESULTS HADA is provided as a freely accessible, user-friendly web-based interface with versatility for the entry of genetic information. The underlying database can also be incorporated into automated command-line stand-alone annotation tools. CONCLUSIONS HADA can achieve the rapid detection of variants affecting function for different hereditary angioedema types, and further integrates useful information to shorten the diagnostic odyssey and reduce diagnostic delay.


2020 ◽  
Author(s):  
Xueyan Li ◽  
Di Liu ◽  
Sun Yang ◽  
Jingyun Yang ◽  
Youcheng Yu

Previous studies have reported the association between multiple genetic variants in enamel-formation-related genes and the risk of dental caries with inconsistent results. We performed a systematic literature search of the PubMed, Cochrane Library, HuGE and Google Scholar databases for studies published before March 21, 2020 and conducted meta-, gene-based and gene-cluster analyses of the association between genetic variants in enamel-formation-related genes and the risk of dental caries. Our systematic literature search identified 21 relevant publications including a total of 24 studies for analysis. The genetic variant rs17878486 in AMELX was significantly associated with dental caries risk (OR=1.40, 95% CI: 1.02-1.93, P=0.037). We found no significant association between the risk of dental caries and rs12640848 in ENAM (OR=1.15, 95% CI: 0.88-1.52, P=0.310), rs1784418 in MMP20 (OR=1.07, 95% CI: 0.76-1.49, P=0.702) or rs3796704 in ENAM (OR=1.06, 95% CI: 0.96-1.17, P=0.228). Gene-based analysis indicated that multiple genetic variants in AMELX showed joint association with the risk of dental caries (6 variants; P<10⁻⁵), as did genetic variants in MMP13 (3 variants; P=0.004), MMP2 (3 variants; P<10⁻⁵), MMP20 (2 variants; P<10⁻⁵) and MMP3 (2 variants; P<10⁻⁵). The gene-cluster analysis indicated a significant association between the genetic variants in this enamel-formation gene cluster and the risk of dental caries (P<10⁻⁵). The present meta-analysis revealed that the genetic variant rs17878486 in AMELX was associated with dental caries, and that multiple genetic variants in enamel-formation-related genes jointly contribute to the risk of dental caries, supporting the role of genetic variants in the enamel-formation genes in the etiology of dental caries.
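
The odds ratios, confidence intervals and P values quoted above are linked by a standard normal approximation on the log-odds scale. The sketch below shows that relationship. It assumes a Wald-type 95% interval, which the abstract does not state, and the function name `p_from_or_ci` is hypothetical.

```python
import math


def p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided P value from an odds ratio and its 95% CI.

    Assumes the interval is symmetric on the log scale (Wald-type),
    so its width equals 2 * 1.96 standard errors.
    """
    std_err = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = math.log(odds_ratio) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Values quoted in the abstract for rs17878486 (AMELX) and rs12640848 (ENAM).
p_amelx = p_from_or_ci(1.40, 1.02, 1.93)
p_enam = p_from_or_ci(1.15, 0.88, 1.52)
```

Under this assumption both results land close to the reported P values (0.037 and 0.310); small differences are expected because the published figures are rounded.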

