Review of pathogenic gene prediction based on Text Mining

Youliang Huang; Fengying Guo; Xing Zhai; Renquan Liu

doi:10.25196/adcp20178

Advances in Disease Control and Prevention ◽

10.25196/adcp20178 ◽

2017 ◽

Vol 2 (1) ◽

pp. 27

Author(s):

Youliang Huang ◽

Fengying Guo ◽

Xing Zhai ◽

Renquan Liu

Keyword(s):

Text Mining ◽

Early Stage ◽

Gene Prediction ◽

Genetic Diseases ◽

Meaningful Work ◽

Difficult Problem ◽

Prediction Tools ◽

Research Task ◽

Biomedical Industry ◽

Research Questions

Background: Whether for single-gene, multi-gene or acquired genetic diseases, the discovery and prediction of disease-causing genes is a difficult problem in the biomedical industry. It has also become an important research task in biomedical text mining. Methods: In the early stage of disease-causing gene prediction, researchers relied on two major methods: linkage analysis and association analysis. Conclusion: Disease-causing genes can be predicted with different prediction methods and different bioinformatics data, depending on the research question and the situation of concern. Researchers have now developed a number of related prediction tools to help understand, detect and predict disease-causing genes, disease pathogenesis and related issues. Predicting disease-causing genes remains difficult but meaningful work.

GPRED-GC: a Gene PREDiction model accounting for 5′-3′ GC gradient

BMC Bioinformatics ◽

10.1186/s12859-019-3047-3 ◽

2019 ◽

Vol 20 (S15) ◽

Cited By ~ 1

Author(s):

Prapaporn Techa-Angkoon ◽

Kevin L. Childs ◽

Yanni Sun

Keyword(s):

Ab Initio ◽

Gene Annotation ◽

Gene Prediction ◽

Source Code ◽

Gc Content ◽

Prediction Tools ◽

Homologous Sequences ◽

Manual Intervention ◽

Grass Genomes ◽

Gc Contents

Background: Gene prediction is a key step in genome annotation. Ab initio gene prediction enables gene annotation of new genomes regardless of the availability of homologous sequences. A number of ab initio gene prediction tools exist, and they have been widely used for gene annotation in various species. However, existing tools are not optimized for identifying genes with highly variable GC content. In addition, some genes in grass genomes exhibit a sharp 5′-3′ decreasing GC content gradient, which is not carefully modeled by available gene prediction tools. Thus, there is still room to improve the sensitivity and accuracy of predicting genes with GC gradients. Results: In this work, we designed and implemented a new hidden Markov model (HMM)-based ab initio gene prediction tool, which is optimized for finding genes with highly variable GC content, such as the genes with negative GC gradients in grass genomes. We tested the tool on three datasets from Arabidopsis thaliana and Oryza sativa. The results showed that our tool can identify genes missed by existing tools because of their highly variable GC content. Conclusions: GPRED-GC can effectively predict genes with highly variable GC content without manual intervention. It provides a useful complementary tool to existing ones such as Augustus for more sensitive gene discovery. The source code is freely available at https://sourceforge.net/projects/gpred-gc/.
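
To make the idea of a negative 5′-3′ GC gradient concrete, the following is a minimal Python sketch that computes a windowed GC profile along a sequence and fits a straight line to it. It is not the GPRED-GC hidden Markov model; the function names, the window size and the toy sequence are all assumptions chosen for illustration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in seq (case-insensitive); 0.0 if seq is empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list[float]:
    """GC fraction of each full window, walking from the 5' end to the 3' end."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def gc_slope(values: list[float]) -> float:
    """Least-squares slope of GC fraction against window index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Hypothetical toy sequence: GC-rich at the 5' end, AT-rich at the 3' end.
toy_gene = "GCGCGCGCGC" + "GCGCATGCGC" + "ATGCATATGC" + "ATATATATAT"
profile = windowed_gc(toy_gene, window=10, step=10)
slope = gc_slope(profile)
print(profile, slope)
```

A negative slope indicates GC content falling from the 5′ end toward the 3′ end, which is the pattern the abstract describes for some grass genes. A real gene finder would model this inside its state emissions rather than with a single fitted line.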

A systematic review of text mining approaches applied to various application areas in the biomedical domain

Journal of Knowledge Management ◽

10.1108/jkm-09-2019-0524 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Sudha Cheerkoot-Jalim ◽

Kavi Kumar Khedo

Keyword(s):

Social Media ◽

Text Mining ◽

Biomedical Application ◽

Data Sources ◽

Biomedical Text ◽

Biomedical Text Mining ◽

Biomedical Domain ◽

Content Type ◽

Health Related ◽

Research Questions

Purpose: This work presents the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed. Design/methodology/approach: The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each paper was analyzed and information relevant to the research questions was extracted. Findings: It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the ambiguity of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums. Originality/value: To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate the most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably advancing biomedical research.

Constructing a database for the relations between CNV and human genetic diseases via systematic text mining

BMC Bioinformatics ◽

10.1186/s12859-018-2526-2 ◽

2018 ◽

Vol 19 (S19) ◽

Cited By ~ 4

Author(s):

Xi Yang ◽

Zhuo Song ◽

Chengkun Wu ◽

Wei Wang ◽

Gen Li ◽

...

Keyword(s):

Text Mining ◽

Genetic Diseases

ArtiFuse—computational validation of fusion gene detection tools without relying on simulated reads

Bioinformatics ◽

10.1093/bioinformatics/btz613 ◽

2019 ◽

Author(s):

Patrick Sorn ◽

Christoph Holtsträter ◽

Martin Löwer ◽

Ugur Sahin ◽

David Weber

Keyword(s):

Fusion Gene ◽

Gene Prediction ◽

Supplementary Information ◽

Fusion Genes ◽

Rna Seq ◽

High Coverage ◽

Prediction Tools ◽

Novel Approach ◽

Tool Performance ◽

Transcriptional Variants

Motivation: Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear due to the lack of known positive and negative events, especially with regard to fusion genes in individual samples. Simulated reads are often used, but these cannot account for all technical biases in RNA-seq data generated from real samples. Results: Here, we present ArtiFuse, a novel approach that simulates fusion genes by sequence modification of the genomic reference and can therefore be applied to any RNA-seq dataset without the need for any simulated reads. We demonstrate our approach on eight RNA-seq datasets for three fusion gene prediction tools: average recall values peak for all three tools between 0.4 and 0.56 for high-quality and high-coverage datasets. As ArtiFuse affords total control over the involved genes and breakpoint position, we also assessed performance with regard to gene-related properties, showing a drop in recall for low-expressed genes in high-coverage samples and for genes with co-expressed paralogues. Overall tool performance assessed from ArtiFusions is lower than previously reported estimates based on simulated reads. Due to the use of real RNA-seq datasets, we believe that ArtiFuse provides a more realistic benchmark that can be used to develop more accurate fusion gene prediction tools for application in clinical settings. Availability and implementation: ArtiFuse is implemented in Python. The source code and documentation are available at https://github.com/TRON-Bioinformatics/ArtiFusion. Supplementary information: Supplementary data are available at Bioinformatics online.
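
The recall figures quoted above follow the usual definition: the share of known (here, artificially introduced) fusions that a tool reports. The Python sketch below shows that arithmetic under simplifying assumptions. Matching fusions by an ordered gene pair is an assumption made for illustration, the gene names are placeholders, and none of this reflects ArtiFuse's actual matching criteria or code.

```python
Fusion = tuple[str, str]  # (5' partner gene, 3' partner gene), an assumed representation


def recall(planted: set[Fusion], predicted: set[Fusion]) -> float:
    """Recall = true positives / (true positives + false negatives)."""
    if not planted:
        raise ValueError("recall is undefined when no fusions were planted")
    true_positives = len(planted & predicted)
    return true_positives / len(planted)


# Hypothetical example: four planted fusions, a tool recovers two of them
# and also reports one fusion that was never planted.
planted = {("GENE_A", "GENE_B"), ("GENE_C", "GENE_D"),
           ("GENE_E", "GENE_F"), ("GENE_G", "GENE_H")}
predicted = {("GENE_A", "GENE_B"), ("GENE_E", "GENE_F"), ("GENE_X", "GENE_Y")}

tool_recall = recall(planted, predicted)
print(tool_recall)
```

The extra prediction does not affect recall; it would count against precision, which this sketch does not compute.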

Text Mining for Big Data Analysis in Financial Sector: A Literature Review

Sustainability ◽

10.3390/su11051277 ◽

2019 ◽

Vol 11 (5) ◽

pp. 1277 ◽

Cited By ~ 18

Author(s):

Mirjana Pejić Bach ◽

Živko Krstić ◽

Sanja Seljan ◽

Lejla Turulja

Keyword(s):

Big Data ◽

Text Mining ◽

Literature Review ◽

Financial Sector ◽

Structured Data ◽

Decision Making Process ◽

Strong Impact ◽

Big Data Technologies ◽

Research Questions ◽

Data Investigation

Big data technologies have had a strong impact on different industries since the last decade, an impact that continues today and tends toward omnipresence. The financial sector, like most other sectors, has concentrated its operating activities mostly on the investigation of structured data. However, with the support of big data technologies, information stored in diverse sources of semi-structured and unstructured data can be harvested. Recent research and practice indicate that such information can be of interest for the decision-making process. How and to what extent research on data mining in the financial sector has developed, and which tools are used for these purposes, remain largely unexplored. This study aims to answer three research questions: (i) What is the intellectual core of the field? (ii) Which techniques are used in the financial sector for text mining, especially in the era of the Internet, big data, and social media? (iii) Which data sources are most often used for text mining in the financial sector, and for which purposes? To answer these questions, a qualitative analysis of the literature is carried out using a systematic literature review and citation and co-citation analysis.

Gender-Responsive Financing of Education in Punjab

Journal of Education ◽

10.1177/00220574211031971 ◽

2021 ◽

pp. 002205742110319

Author(s):

Rabia Manzoor ◽

Rabia Tabbasum ◽

Vaqar Ahmed ◽

Junaid Zahid ◽

Shujaat Ahmed Syed

Keyword(s):

Early Stage ◽

Gender Disparities ◽

Education Sector ◽

Budget Process ◽

Gender Analysis ◽

The Public ◽

Key Informant ◽

Research Questions ◽

Secondary Information ◽

Gender Lens

This study provides a gender analysis of public sector budgets in the education sector of Punjab for the period 2016 to 2018, from the pre-primary to the secondary level. The research methodology is based on a review of secondary information and data, key informant interviews, stakeholder consultations, and a review of the budgetary process. This methodology helps in approaching the research questions systematically. The study finds gender disparities in budgets for the public education sector, with a key focus on reconfiguration of the budget process. The gender lens should be introduced at a very early stage, when budget call circulars are being sent to the departments concerned.

Comparative Analysis of Gene Prediction Tools: RAST, Genmark hmm and AMIgene

International Journal of Engineering Trends and Technology ◽

10.14445/22315381/ijett-v43p238 ◽

2017 ◽

Vol 43 (4) ◽

pp. 234-237

Author(s):

Chander Jyoti ◽

Sandeep Saini ◽

Varinder Kumar ◽

Kajal Abrol ◽

Kanchan Pandey ◽

...

Keyword(s):

Comparative Analysis ◽

Gene Prediction ◽

Prediction Tools

The Study on the Control of the Early-Stage Crack of the Concrete Poured in Winter in Shenyang Subway

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.671-674.1135 ◽

2013 ◽

Vol 671-674 ◽

pp. 1135-1139 ◽

Cited By ~ 2

Author(s):

Li Han ◽

Wen Zhao ◽

Yu Zhao

Keyword(s):

Temperature Stress ◽

Civil Engineering ◽

Early Stage ◽

Difficult Problem ◽

Construction Quality ◽

Concrete Crack ◽

Combination Of Methods

The control of concrete cracking is a complicated and difficult problem in civil engineering. It is even more difficult to control cracking in concrete poured in winter. Through an analysis of the construction of the Shenyang subway in winter, a combination of methods is applied. Frost proofing and reducing the temperature stress of the concrete are considered together. The combination of antifreezing measures and reduced temperature stress brings the construction quality to a satisfactory level.

DIFFERENT GENOMIC SIGNAL PROCESSING METHODS FOR EUKARYOTIC GENE PREDICTION: A SYSTEMATIC REVIEW

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237217300012 ◽

2017 ◽

Vol 29 (01) ◽

pp. 1730001 ◽

Cited By ~ 4

Author(s):

Mai S. Mabrouk ◽

Safaa M. Naeem ◽

Mohamed A. Eldosoky

Keyword(s):

Systematic Review ◽

Signal Processing ◽

Cancer Detection ◽

Dna Sequences ◽

Early Stage ◽

Gene Prediction ◽

Malignant Neoplasm ◽

Biological Information ◽

Genomic Signal Processing ◽

Genomic Signal

The field of bioinformatics has now firmly established itself as a discipline in molecular biology and encompasses a wide range of subject areas, from structural biology and genomics to gene expression studies. Bioinformatics is the application of computer technology to the management of biological information. Genomic signal processing (GSP) techniques have been applied widely in bioinformatics and will continue to play an essential role in the investigation of biomedical problems. GSP refers to the use of digital signal processing (DSP) methods for the analysis of genomic data (e.g. DNA sequences). Recently, applications of GSP in bioinformatics have received considerable attention, such as identification of DNA protein coding regions, identification of reading frames, cancer detection and others. Cancer, known medically as malignant neoplasm, is one of the most dangerous diseases the world faces and has raised the death rate in recent years, so detecting it at an early stage offers a promising way to identify the risk and take action to treat it. GSP is a method that can be used to detect cancerous cells, which are often caused by genetic abnormality. This systematic review discusses some of the applications of GSP in bioinformatics in general. The GSP techniques used for cancer detection in particular are presented to gather recent results and what has been achieved so far, as a new subject of research.
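
One well-known way DSP methods are applied to DNA sequences is the period-3 test for protein coding regions: each base is mapped to a binary indicator sequence, and the spectral power at frequency N/3 is summed over the four bases. The abstract does not say which GSP methods the review covers in detail, so the Python sketch below is only an assumed illustration of this general technique, with hypothetical function names and toy sequences.

```python
import cmath


def indicator(seq: str, base: str) -> list[float]:
    """Binary indicator sequence: 1.0 where seq has the given base, else 0.0."""
    return [1.0 if b == base else 0.0 for b in seq]


def dft_power(signal: list[float], k: int) -> float:
    """Squared magnitude of the k-th discrete Fourier transform coefficient."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
    return abs(coeff) ** 2


def period3_power(seq: str) -> float:
    """Total spectral power at frequency N/3, summed over the A, C, G, T indicators."""
    seq = seq.upper()
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    k = n // 3
    return sum(dft_power(indicator(seq, base), k) for base in "ACGT")


# Toy inputs: a strictly codon-periodic sequence versus a homopolymer.
periodic = period3_power("ATG" * 10)
flat = period3_power("A" * 30)
print(periodic, flat)
```

The periodic toy sequence produces a strong peak at N/3 while the homopolymer produces essentially none. Real coding regions show a weaker, statistical version of this peak, which is usually evaluated over a sliding window.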

High-Dimensional Single-Cell Transcriptomics in Melanoma and Cancer Immunotherapy

Genes ◽

10.3390/genes12101629 ◽

2021 ◽

Vol 12 (10) ◽

pp. 1629

Author(s):

Camelia Quek ◽

Xinyu Bai ◽

Georgina V. Long ◽

Richard A. Scolyer ◽

James S. Wilmott

Keyword(s):

Single Cell ◽

Patient Outcomes ◽

Checkpoint Inhibitors ◽

Early Stage ◽

Tumour Microenvironment ◽

Advanced Melanoma ◽

High Dimensional ◽

Clinical Settings ◽

Research Questions ◽

Cellular Phenotypes

Recent advances in single-cell transcriptomics have greatly improved knowledge of complex transcriptional programs, rapidly expanding our knowledge of cellular phenotypes and functions within the tumour microenvironment and immune system. Several new single-cell technologies developed in recent years have enabled an expanded understanding of the mechanistic cells and biological pathways targeted by immunotherapies such as immune checkpoint inhibitors, which are now routinely used in the management of patients with high-risk early-stage or advanced melanoma. These technologies have method-specific strengths, weaknesses and capabilities that need to be considered when utilising them to answer translational research questions. Here, we provide guidance for the implementation of single-cell transcriptomic analysis platforms by reviewing the currently available experimental and analysis workflows. We then highlight the use of these technologies to dissect the tumour microenvironment in the context of cancer patients treated with immunotherapy. The strategic use of single-cell analytics in clinical settings is discussed, and potential future opportunities are explored with a focus on their use to rationalise the design of novel immunotherapeutic drug therapies that will ultimately lead to improved cancer patient outcomes.
