Mian: Interactive Web-Based 16S rRNA Operational Taxonomic Unit Table Data Visualization and Discovery Platform

Mapping Intimacies ◽

10.1101/416073 ◽

2018 ◽

Author(s):

Boyang Tom Jin

Keyword(s):

Feature Selection ◽

Operational Taxonomic Unit ◽

Host Responses ◽

Exact Test ◽

Alpha And Beta Diversity ◽

Sequencing Technologies ◽

Correlation Networks ◽

Statistical Measures ◽

Table Data ◽

Taxonomic Groups

ABSTRACT In recent years, there has been strong interest in examining the microbiome and its impact on human health and the environment. By leveraging modern sequencing technologies, investigators can quickly determine the composition of a given microbial sample. At the same time, the same investigations often yield an array of categorical and numerical metadata derived from the sequenced samples such as immunohistochemical measures or locality information. Understanding how the microbiome data is associated with this external metadata is essential in developing targeted treatments for chronic diseases or proposing bacteria-modulated host responses. While many R or Python libraries and command-line tools have been developed for specific analysis purposes, there are still relatively few tools to facilitate open-ended data exploration and hypothesis generation. Here we introduce Mian, an open-source web framework to interactively visualize or run a suite of statistical and feature selection tools on the microbiome to identify important taxonomic groups in the context of any provided categorical or numerical metadata. Visualizations include boxplots, correlation networks, and PCA or NMDS scatterplots. Tools include Fisher’s Exact Test, Boruta feature selection, alpha and beta diversity, and differential and correlational analysis. Mian supports multiple standard representations of the OTU table as input and optionally subsamples the data during the upload process. Users can also filter and aggregate the OTU table at different taxonomic levels and dynamically adjust analysis parameters to see how the visualizations, results, and statistical measures change in real time. Mian is freely available at: miandata.org
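
As a rough illustration of the alpha and beta diversity measures mentioned in the abstract above, the sketch below computes a Shannon index per sample and a Bray-Curtis dissimilarity between two samples. These two indices are common choices and are assumed here; the abstract does not say which indices Mian implements. The OTU table, sample names, and function names are hypothetical.

```python
from math import log


def shannon_index(counts):
    """Shannon alpha diversity (natural log) of one sample's OTU counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    # OTUs with zero counts contribute nothing to the sum.
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples' OTU count vectors."""
    if len(u) != len(v):
        raise ValueError("samples must cover the same OTUs")
    denom = sum(u) + sum(v)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


# Hypothetical OTU table: one row of counts per sample, one column per OTU.
otu_table = {
    "sample_A": [10, 10, 0],
    "sample_B": [0, 5, 15],
}

alpha = {name: shannon_index(row) for name, row in otu_table.items()}
beta = bray_curtis(otu_table["sample_A"], otu_table["sample_B"])
```

Alpha diversity summarizes a single sample, while beta diversity compares two samples, which is why the first function takes one count vector and the second takes a pair.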

DeepCPP: a deep neural network based on nucleotide bias information and minimum distribution similarity feature selection for RNA coding potential prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbaa039 ◽

2020 ◽

Cited By ~ 3

Author(s):

Yu Zhang ◽

Cangzhi Jia ◽

Melissa Jane Fullwood ◽

Chee Keong Kwoh

Keyword(s):

Neural Network ◽

Feature Selection ◽

Deep Neural Network ◽

Feature Selection Method ◽

Classification Problem ◽

Open Reading Frames ◽

Nucleotide Bias ◽

Sequencing Technologies ◽

Type Data ◽

Coding Potential

Abstract The development of deep sequencing technologies has led to the discovery of novel transcripts. Many in silico methods have been developed to assess the coding potential of these transcripts to further investigate their functions. Existing methods perform well on distinguishing the majority of long noncoding RNAs (lncRNAs) and coding RNAs (mRNAs) but poorly on RNAs with small open reading frames (sORFs). Here, we present DeepCPP (deep neural network for coding potential prediction), a deep learning method for RNA coding potential prediction. Extensive evaluations on four previous datasets and six new datasets constructed in different species show that DeepCPP outperforms other state-of-the-art methods, especially on sORF type data, which overcomes the bottleneck of sORF mRNA identification by improving more than 4.31, 37.24 and 5.89% on its accuracy for newly discovered human, vertebrate and insect data, respectively. Additionally, we revealed that discontinuous k-mer, and our newly proposed nucleotide bias and minimal distribution similarity feature selection method, play crucial roles in this classification problem. Taken together, DeepCPP is an effective method for RNA coding potential prediction.

Feature selection based on community detection in feature correlation networks

Computing ◽

10.1007/s00607-019-00705-8 ◽

2019 ◽

Vol 101 (10) ◽

pp. 1513-1538

Author(s):

Miloš Savić ◽

Vladimir Kurbalija ◽

Zoran Bosnić ◽

Mirjana Ivanović

Keyword(s):

Feature Selection ◽

Community Detection ◽

Correlation Networks ◽

Feature Correlation

Effect of probiotic on innate inflammatory response and viral shedding in experimental rhinovirus infection – a randomised controlled trial

Beneficial Microbes ◽

10.3920/bm2016.0160 ◽

2017 ◽

Vol 8 (2) ◽

pp. 207-215 ◽

Cited By ~ 19

Author(s):

R.B. Turner ◽

J.A. Woodfolk ◽

L. Borish ◽

J.W. Steinke ◽

J.T. Patrie ◽

...

Keyword(s):

Controlled Trial ◽

Treated Group ◽

Subjective Symptom ◽

Nasal Lavage ◽

Host Responses ◽

Controlled Study ◽

Fisher Exact Test ◽

Exact Test ◽

Virus Challenge ◽

Rhinovirus Infection

Ingestion of probiotics appears to have modest effects on the incidence of viral respiratory infection. The mechanism of these effects is not clear; however, there is evidence from animal models that the probiotic may have an effect on innate immune responses to pathogens. The purpose of this randomised, placebo-controlled study was to determine the effect of administration of Bifidobacterium animalis subspecies lactis Bl-04 on innate and adaptive host responses to experimental rhinovirus challenge. The effect on the response of chemokine (C-X-C motif) ligand 8 (CXCL8) to rhinovirus infection was defined as the primary endpoint for the study. 152 seronegative volunteers who had been supplemented for 28 days, 73 with probiotic and 79 with placebo, were challenged with RV-A39. Supplement or placebo administration was then continued for five days during collection of specimens for assessment of host response, infection, and symptoms. 58 probiotic and 57 placebo-supplemented volunteers met protocol-defined criteria for analysis. Probiotic resulted in higher nasal lavage CXCL8 on day 0 prior to virus challenge (90 vs 58 pg/ml, respectively, P=0.04, ANCOVA). The CXCL8 response to rhinovirus infection in nasal lavage was significantly reduced in the probiotic treated group (P=0.03, ANCOVA). Probiotic was also associated with a reduction in nasal lavage virus titre and the proportion of subjects shedding virus in nasal secretions (76% in the probiotic group, 91% in the placebo group, P=0.04, Fisher Exact test). The administration of probiotic did not influence lower respiratory inflammation (assessed by exhaled nitric oxide), subjective symptom scores, or infection rate. This study demonstrates that ingestion of Bl-04 may have an effect on the baseline state of innate immunity in the nose and on the subsequent response of the human host to rhinovirus infection. Clinicaltrials.gov registry number: NCT01669603.

Fungi from the Rhynie chert: a view from the dark side

Transactions of the Royal Society of Edinburgh Earth Sciences ◽

10.1017/s026359330000081x ◽

2003 ◽

Vol 94 (4) ◽

pp. 457-473 ◽

Cited By ~ 96

Author(s):

T. N. Taylor ◽

S. D. Klavins ◽

M. Krings ◽

E. L. Taylor ◽

H. Kerp ◽

...

Keyword(s):

Fossil Record ◽

Molecular Data ◽

Host Responses ◽

Dark Side ◽

Early Devonian ◽

Mycorrhizal Associations ◽

Rhynie Chert ◽

Complex Ecosystem ◽

Taxonomic Groups ◽

Fungal Interactions

ABSTRACT The exquisite preservation of organisms in the Early Devonian Rhynie chert ecosystem has permitted the documentation of the morphology and life history biology of fungi belonging to several major taxonomic groups (e.g., Chytridiomycota, Ascomycota, Glomeromycota). The Rhynie chert also provides the first unequivocal evidence in the fossil record of fungal interactions that can in turn be compared with those in modern ecosystems. These interactions in the Rhynie chert involve both green algae and macroplants, with examples of saprophytism, parasitism, and mutualism, including the earliest mycorrhizal associations and lichen symbiosis known to date in the fossil record. Especially significant are several types of specific host responses to fungal infection that indicate that these plants had already evolved methods of defence similar and perhaps analogous to those of extant plants. This suggests that mechanisms underlying the establishment and sustenance of associations of fungi with land plants were well in place prior to the Early Devonian. In addition, a more complete understanding of the microbial organisms involved in this complex ecosystem can also provide calibration points for phylogenies based on molecular data analysis. The richness of the microbial community in the Rhynie chert holds tremendous potential for documenting additional fungal groups, which permits speculation about further interactions with abiotic and biotic components of the environment.

On the Stability of Feature Selection Methods in Software Quality Prediction: An Empirical Investigation

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194015400288 ◽

2015 ◽

Vol 25 (09n10) ◽

pp. 1467-1490 ◽

Cited By ~ 7

Author(s):

Huanjing Wang ◽

Taghi M. Khoshgoftaar ◽

Naeem Seliya

Keyword(s):

Feature Selection ◽

Software Quality ◽

Software Metrics ◽

Subset Selection ◽

Feature Subset ◽

Selection Methods ◽

Statistical Measures ◽

Quality Modeling ◽

The Stability ◽

Stable Feature

Software quality modeling is the process of using software metrics from previous iterations of development to locate potentially faulty modules in current under-development code. This has become an important part of the software development process, allowing practitioners to focus development efforts where they are most needed. One difficulty encountered in software quality modeling is the problem of high dimensionality, where the number of available software metrics is too large for a classifier to work well. In this case, many of the metrics may be redundant or irrelevant to defect prediction results, so selecting a subset of software metrics that are the best predictors becomes important. This process is called feature (metric) selection. There are three major forms of feature selection: filter-based feature rankers, which use statistical measures to assign a score to each feature and present the user with a ranked list; filter-based feature subset evaluation, which uses statistical measures on feature subsets to find the best feature subset; and wrapper-based subset selection, which builds classification models using different subsets to find the one which maximizes performance. Software practitioners are interested in which feature selection methods are best at providing the most stable feature subset in the face of changes to the data (here, the addition or removal of instances). In this study we select feature subsets using fifteen feature selection methods and then use our newly proposed Average Pairwise Tanimoto Index (APTI) to evaluate the stability of the feature selection methods. We evaluate the stability of feature selection methods on a pair of subsamples generated by our fixed-overlap partitions algorithm. Four different levels of overlap are considered in this study. Thirteen software metric datasets from two real-world software projects are used in this study. Results demonstrate that ReliefF (RF) is the most stable feature selection method and wrapper-based feature subset selection shows the least stability. In addition, as the overlap of partitions increased, the stability of the feature selection strategies increased.
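
To make the stability idea concrete, here is a minimal sketch of a pairwise Tanimoto-style stability score over feature subsets. It assumes the set-based Tanimoto (Jaccard) form, |A ∩ B| / |A ∪ B|, averaged over all pairs of subsets; the abstract does not give the exact APTI definition, so this may differ from the authors' formula. The function names and metric names are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Set-based Tanimoto (Jaccard) similarity of two feature subsets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty subsets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def average_pairwise_tanimoto(subsets):
    """Mean Tanimoto similarity over every unordered pair of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(tanimoto(x, y) for x, y in pairs) / len(pairs)


# Hypothetical subsets selected by one method on three perturbed datasets.
selected = [
    {"loc", "cyclomatic"},
    {"loc", "cyclomatic"},
    {"loc", "fan_in"},
]
stability = average_pairwise_tanimoto(selected)
```

A score of 1 means the method picked the same features every time; lower values mean the chosen subset shifts as instances are added or removed.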

Transcript-Specific Loss-of-Function Variants in VPS16 Are Enriched in Patients With Dystonia

Neurology Genetics ◽

10.1212/nxg.0000000000000644 ◽

2021 ◽

Vol 8 (1) ◽

pp. e644

Author(s):

Joohyun Park ◽

Annemarie Reilaender ◽

Jan N. Petry-Schmelzer ◽

Petra Stöbe ◽

Isabell Cordts ◽

...

Keyword(s):

Fisher Exact Test ◽

Data Sets ◽

Variant Interpretation ◽

Loss Of Function ◽

Splice Isoforms ◽

Exact Test ◽

Expression Levels ◽

Genome Data ◽

Statistical Measures ◽

Clinical Description

Background and Objectives: Our objective was to improve rare variant interpretation using statistical measures as well as publicly accessible annotation of expression levels and tissue specificity of different splice isoforms. We describe rare VPS16 variants observed in patients with dystonia and patients without dystonia, elaborate on our interpretation of VPS16 variants affecting different transcripts, and provide detailed clinical description of the movement disorder caused by VPS16 variants. Methods: In-house exome and genome data sets (n = 11,539) were screened for rare heterozygous missense and putative loss-of-function (pLoF) variants in VPS16. Using pext (proportion expressed across transcripts) values from the Genome Aggregation Database (gnomAD), we differentiated variants affecting weakly and highly expressed exons/transcripts and applied statistical measures to systematically identify disease-associated genetic variation among patients with dystonia (n = 280). Results: Six different heterozygous pLoFs in VPS16 transcripts were identified in 13 individuals. Three of these pLoFs occurred in 9 individuals with different phenotypes, and 3 pLoFs were identified in 4 unrelated individuals with early-onset dystonia. Although pLoFs were enriched in the dystonia cohort (n = 280; p = 2.04 × 10⁻⁴; 4/280 cases vs 9/11,259 controls; Fisher exact test), it was not exome-wide significant. According to the pext values in gnomAD, all 3 pLoFs observed in the patients with dystonia were located in the highly expressed canonical transcript ENST00000380445.3, whereas 2 of 3 pLoFs detected in 8 individuals without dystonia were located in the first exon of the noncanonical transcript ENST00000380443.3 that is weakly expressed across all tissues. Taking these biological implications into account, pLoFs involving the canonical transcript were exome-wide significantly enriched in patients with dystonia (p = 1.67 × 10⁻⁶; 4/280 cases vs 1/11,259 controls; Fisher exact test). All VPS16 patients showed mild progressive dystonia with writer's cramp as the presenting symptom between age 7 and 34 years (mean 20 years) that often progressed to generalized dystonia and was even accompanied by hyperkinetic movements and myoclonus in 1 patient. Discussion: Our data provide strong evidence for VPS16 pLoFs to be implicated in dystonia and knowledge on exon resolution expression levels as well as statistical measures proved to be useful for variant interpretation.
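
The case/control counts quoted in this abstract can be turned into 2×2 tables and run through a Fisher exact test. The sketch below is a plain standard-library implementation of the two-sided test, written for illustration; the authors' actual software and test settings are not stated in the abstract, and the function name is hypothetical. The tables assume each individual carries at most one pLoF, so 4/280 becomes 4 carriers and 276 non-carriers.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Any pLoF: 4 of 280 dystonia cases vs 9 of 11,259 controls.
p_all = fisher_exact_two_sided(4, 276, 9, 11250)

# Canonical transcript only: 4 of 280 cases vs 1 of 11,259 controls.
p_canonical = fisher_exact_two_sided(4, 276, 1, 11258)
```

Both values should land on the same order of magnitude as the p-values reported in the abstract, and the drop between them shows how restricting to the canonical transcript sharpens the enrichment signal.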

Feature Selection Method based on Fisher’s Exact Test for Agricultural Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d1104.1284s219 ◽

2019 ◽

Vol 8 (4S2) ◽

pp. 558-564

Keyword(s):

Feature Selection ◽

Statistical Methods ◽

Selection Process ◽

Large Data ◽

Chi Square ◽

Data Set ◽

Exact Test ◽

Fisher's Exact Test ◽

Large Data Set

This paper analyzes the feature selection process based on different statistical methods, viz. Correlation, Gain Ratio, Information Gain, OneR, Chi-square MapReduce model, and Fisher’s exact test, for agricultural data. In the recent past, Fisher’s exact test was commonly used for feature selection. However, it is suited only to small data sets. To handle large data sets, Chi-square, one of the most popular statistical methods, is used. But it also selects irrelevant data, and thus the resulting accuracy is not as expected. As a novelty, Fisher’s exact test is combined with the MapReduce model to handle large data sets. In addition, the simulation outcome shows that the proposed Fisher’s exact test finds the significant attributes with greater accuracy and reduced time complexity when compared to other existing methods.

The environmental niche of soil bacterial, archaeal, fungal and protist communities differ along edaphic and topoclimatic gradients in the Alps

10.21203/rs.3.rs-609984/v1 ◽

2021 ◽

Author(s):

Lucie A Malard ◽

Heidi K Mod ◽

Nicolas Guex ◽

Olivier Broennimann ◽

Erika Yashiro ◽

...

Keyword(s):

Niche Breadth ◽

Environmental Changes ◽

Environmental Gradients ◽

Community Diversity ◽

Environmental Niche ◽

Alpha And Beta Diversity ◽

Niche Concept ◽

The Alps ◽

Microbial Groups ◽

Taxonomic Groups

Abstract Background: The niche concept describes the range of conditions supporting the establishment and persistence of species in the environment. Although widely used in ecology, it has not been often applied to microbes, for which comparative niche analyses are still lacking. Yet, quantifying the niche of microbial taxa is necessary to forecast how taxa and the communities they compose might respond to environmental changes. In this study, we identified important topoclimatic, edaphic, spatial and biotic drivers of the alpha and beta diversity of bacterial, archaeal, fungal and protist communities. Then, we established a method to calculate the niche breadth and position of each taxon along environmental gradients to determine whether microorganisms have distinct environmental niches. Results: For all microbial groups, edaphic properties were identified as the most important drivers of both community diversity and composition. Protists presented the largest niche breadths, followed by bacteria and archaea, with fungi displaying the smallest. Niche breadth generally decreased towards environmental extremes, especially along edaphic gradients, suggesting increased specialisation of all microbial taxa in highly selective environments. Conclusion: In this study, we showed that microorganisms have well defined niches, as do macro-organisms, and that these likely drive part of the observed spatial patterns of community variations, but with notable differences among taxonomic groups. Applying the niche concept more widely to microbial ecology should open many novel perspectives, especially to tackle global change challenges.

Gait Recognition Analysis for Human Identification Analysis- A Hybrid Deep Learning Process

10.21203/rs.3.rs-549846/v1 ◽

2021 ◽

Author(s):

Mathivanan B ◽

Perumal P

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Feature Selection ◽

Median Filter ◽

Gait Recognition ◽

Human Gait ◽

Scale Invariant ◽

Inference System ◽

Statistical Measures ◽

Hu Moments

Abstract Gait is an individual biometric behavior that can be detected at a distance and has various applications in social security, forensic detection and crime prevention. Hence, in this paper, an Advanced Deep Belief Neural Network with Black Widow Optimization (ADBNN-BWO) algorithm is developed to identify human emotions from images of human walking style. The proposed methodology works in four stages: pre-processing, feature extraction, feature selection and classification. For pre-processing, a contrast enhancement median filter is used; for feature extraction, Hu Moments, GLCM, Fast Scale-invariant feature transform (F-SIFT) and skeleton features are used. The choice of feature extraction algorithm is often essential to extracting the features efficiently. After that, feature selection is performed. The classification is then carried out using the proposed ADBNN-BWO algorithm. With the proposed method, human gait recognition is achieved and used to identify emotions from walking style. The proposed method is validated using open source gait databases. It is implemented on the MATLAB platform and the corresponding performance is evaluated. Moreover, the statistical measures of the proposed method are determined and compared with those of existing methods: Artificial Neural Network (ANN), Mayfly algorithm with Particle Swarm Optimization (MA-PSO), Recurrent Neural Network-PSO (RNN-PSO) and Adaptive Neuro Fuzzy Inference System (ANFIS).

Fuzzy-FishNet: A highly precise distribution-free network approach for feature selection in clinical proteomics

10.1101/024430 ◽

2015 ◽

Author(s):

Wilson Wen Bin Goh

Keyword(s):

Feature Selection ◽

Small Sample Size ◽

Small Sample ◽

Clinical Proteomics ◽

P Value ◽

Fisher Exact Test ◽

Proteomics Data ◽

Exact Test ◽

Genomics And Proteomics ◽

T Distribution

Network-based analysis methods can help resolve coverage and inconsistency issues in proteomics data. Previously, it was demonstrated that a suite of rank-based network approaches (RBNAs) provides unparalleled consistency and reliable feature selection. However, reliance on the t-statistic/t-distribution and hypersensitivity (coupled to a relatively flat p-value distribution) makes feature prioritization for validation difficult. To address these concerns, a refinement based on the fuzzified Fisher exact test, Fuzzy-FishNet, was developed. Fuzzy-FishNet is highly precise (providing probability values that allow exact ranking of features). Furthermore, feature ranks are stable, even in small sample size scenarios. Comparison of features selected by genomics and proteomics data respectively revealed that in spite of relative feature stability, cross-platform overlaps are extremely limited, suggesting that networks may not be the answer towards bridging the proteomics-genomics divide.
