Dynamic species classification of microorganisms across time, abiotic and biotic environments: a sliding window approach

2017 ◽  
Author(s):  
Frank Pennekamp ◽  
Jason I. Griffiths ◽  
Emanuel A. Fronhofer ◽  
Aurélie Garnier ◽  
Mathew Seymour ◽  
...  

Summary
1. Technological advances have greatly simplified taking and analyzing digital images and videos, and ecologists increasingly use these techniques for trait, behavioral and taxonomic analyses. The development of techniques to automate biological measurements from the environment opens up new possibilities to infer species numbers, observe presence/absence patterns and recognize individuals based on audio-visual information.
2. Streams of quantitative data, such as temporal species abundances, are processed by machine learning (ML) algorithms into meaningful information. Machine learning approaches learn to distinguish classes (e.g., species) from observed quantitative features (phenotypes), and in turn predict the distinguished classes in subsequent observations. However, in biological systems, the environment changes, often driving phenotypic changes in behaviour and morphology.
3. Here we describe a framework for classifying species under dynamic biotic and abiotic conditions using a novel sliding window approach. We train a random forest classifier on subsets of the data, covering restricted temporal, biotic and abiotic ranges (i.e. windows). We test our approach by applying the classification framework to experimental microbial communities where results were validated against manual classification. Individuals from one to six ciliate species were monitored over hundreds of generations in dozens of different species combinations and over a temperature gradient. We describe the steps of our classification pipeline and systematically explore the effects of the abiotic and biotic environments as well as temporal effects on classification success.
4. Differences in biotic and abiotic conditions caused simplistic classification approaches to be unsuccessful. In contrast, the sliding window approach allowed classification to be highly successful, because phenotypic differences driven by environmental change could be captured in the learning algorithm. Importantly, automatic classification achieved success comparable to that of manual identification.
5. Our framework allows for reliable classification even in dynamic environmental contexts, and may help to improve long-term monitoring of species from environmental samples. It therefore has application in disciplines with automatic enumeration and phenotyping of organisms such as eco-toxicology, ecology and evolutionary ecology, and broad-scale environmental monitoring.
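The sliding window idea in the summary above can be illustrated with a minimal sketch. It is not the authors' pipeline: a nearest-centroid rule stands in for the random forest, the window covers time only, and the data are synthetic (two hypothetical species, "A" and "B", whose single trait drifts with time). All function names are assumptions made for illustration.

```python
from statistics import mean

def fit_centroids(rows):
    """Return {species: mean trait value} for rows of (time, trait, species)."""
    by_species = {}
    for _, trait, species in rows:
        by_species.setdefault(species, []).append(trait)
    return {sp: mean(values) for sp, values in by_species.items()}

def predict(centroids, trait):
    """Assign the species whose centroid is closest to the observed trait."""
    return min(centroids, key=lambda sp: abs(centroids[sp] - trait))

def accuracy_global(train, test):
    """Train once on all data, ignoring when each observation was made."""
    centroids = fit_centroids(train)
    hits = sum(predict(centroids, trait) == sp for _, trait, sp in test)
    return hits / len(test)

def accuracy_sliding(train, test, half_width):
    """Train a separate model per test point, using only nearby time steps."""
    hits = 0
    for t, trait, sp in test:
        window = [row for row in train if abs(row[0] - t) <= half_width]
        hits += predict(fit_centroids(window), trait) == sp
    return hits / len(test)

# Synthetic data (assumption): both species' traits increase by 1 per time step,
# so late "A" individuals look like early "B" individuals.
SPECIES = (("A", 10.0), ("B", 20.0))
train = [(t, base + t, sp) for t in range(21) for sp, base in SPECIES]
test = [(t, base + t + 0.5, sp) for t in range(21) for sp, base in SPECIES]

global_acc = accuracy_global(train, test)
sliding_acc = accuracy_sliding(train, test, half_width=2)
print(f"global: {global_acc:.3f}, sliding window: {sliding_acc:.3f}")
```

In this toy setting the single global model confuses the two species wherever the drifting trait crosses its fixed decision boundary, while the windowed models track the drift. A real application would window over the biotic and abiotic axes as well and would use a proper classifier.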

Author(s):  
Sheela Rani P ◽  
Dhivya S ◽  
Dharshini Priya M ◽  
Dharmila Chowdary A

Machine learning is a new analysis discipline that uses knowledge to improve learning, optimizing the training method and developing the environment within which learning happens. There are two types of machine learning approaches, supervised and unsupervised, that are used to extract knowledge that helps decision-makers take the correct intervention in the future. This paper introduces a prediction model for the factors that influence students' academic performance, using supervised machine learning algorithms such as support vector machine, KNN (k-nearest neighbors), Naïve Bayes and logistic regression. The results obtained with the various algorithms are compared, and it is shown that the support vector machine and Naïve Bayes perform well, achieving improved accuracy compared to the other algorithms. The final prediction model in this paper can have fairly high prediction accuracy. The objective is not just to predict the future performance of students but also to provide the best technique for finding the most impactful features that influence students while studying.


2014 ◽  
Vol 53 (11) ◽  
pp. 2457-2480 ◽  
Author(s):  
Meike Kühnlein ◽  
Tim Appelhans ◽  
Boris Thies ◽  
Thomas Nauß

Abstract: A new rainfall retrieval technique for determining rainfall rates in a continuous manner (day, twilight, and night) resulting in a 24-h estimation applicable to midlatitudes is presented. The approach is based on satellite-derived information on cloud-top height, cloud-top temperature, cloud phase, and cloud water path retrieved from Meteosat Second Generation (MSG) Spinning Enhanced Visible and Infrared Imager (SEVIRI) data and uses the random forests (RF) machine-learning algorithm. The technique is realized in three steps: (i) precipitating cloud areas are identified, (ii) the areas are separated into convective and advective-stratiform precipitating areas, and (iii) rainfall rates are assigned separately to the convective and advective-stratiform precipitating areas. Validation studies were carried out for each individual step as well as for the overall procedure using collocated ground-based radar data. Regarding each individual step, the models for rain area and convective precipitation detection produce good results. Both retrieval steps show a general tendency toward elevated prediction skill during summer months and daytime. The RF models for rainfall-rate assignment exhibit similar performance patterns, yet it is noteworthy how well the model is able to predict rainfall rates during nighttime and twilight. The performance of the overall procedure shows a very promising potential to estimate rainfall rates at high temporal and spatial resolutions in an automated manner. The near-real-time continuous applicability of the technique with acceptable prediction performances at 3–8-hourly intervals is particularly remarkable. This provides a very promising basis for future investigations into precipitation estimation based on machine-learning approaches and MSG SEVIRI data.


Molecules ◽  
2019 ◽  
Vol 24 (13) ◽  
pp. 2414
Author(s):  
Weixing Dai ◽  
Dianjing Guo

Machine learning plays an important role in ligand-based virtual screening. However, conventional machine learning approaches tend to be inefficient when dealing with such problems where the data are imbalanced and features describing the chemical characteristic of ligands are high-dimensional. We here describe a machine learning algorithm LBS (local beta screening) for ligand-based virtual screening. The unique characteristic of LBS is that it quantifies the generalization ability of screening directly by a refined loss function, and thus can assess the risk of over-fitting accurately and efficiently for imbalanced and high-dimensional data in ligand-based virtual screening without the help of resampling methods such as cross validation. The robustness of LBS was demonstrated by a simulation study and tests on real datasets, in which LBS outperformed conventional algorithms in terms of screening accuracy and model interpretation. LBS was then used for screening potential activators of HIV-1 integrase multimerization in an independent compound library, and the virtual screening result was experimentally validated. Of the 25 compounds tested, six were proved to be active. The most potent compound in experimental validation showed an EC50 value of 0.71 µM.


Computation ◽  
2019 ◽  
Vol 7 (1) ◽  
pp. 13 ◽  
Author(s):  
Francesco Rundo ◽  
Sergio Rinella ◽  
Simona Massimino ◽  
Marinella Coco ◽  
Giorgio Fallica ◽  
...  

The development of detection methodologies for reliable drowsiness tracking is a challenging task requiring both appropriate signal inputs and accurate and robust algorithms of analysis. The aim of this research is to develop an advanced method to detect the drowsiness stage in electroencephalogram (EEG), the most reliable physiological measurement, using the promising Machine Learning methodologies. The methods used in this paper are based on Machine Learning methodologies such as stacked autoencoder with softmax layers. Results obtained from 62 volunteers indicate 100% accuracy in drowsy/wakeful discrimination, proving that this approach can be very promising for use in the next generation of medical devices. This methodology can be extended to other uses in everyday life in which the maintaining of the level of vigilance is critical. Future works aim to perform extended validation of the proposed pipeline with a wide-range training set in which we integrate the photoplethysmogram (PPG) signal and visual information with EEG analysis in order to improve the robustness of the overall approach.


Author(s):  
Zhixiang Chen ◽  
Binhai Zhu ◽  
Xiannong Meng

In this chapter, machine-learning approaches to real-time intelligent Web search are discussed. The goal is to build an intelligent Web search system that can find the user’s desired information with as little relevance feedback from the user as possible. The system can achieve a significant search precision increase with a small number of iterations of user relevance feedback. A new machine-learning algorithm is designed as the core of the intelligent search component. This algorithm is applied to three different search engines with different emphases. This chapter presents the algorithm, the architectures, and the performances of these search engines. Future research issues regarding real-time intelligent Web search are also discussed.


2020 ◽  
Author(s):  
Dana Azouri ◽  
Shiran Abadi ◽  
Yishay Mansour ◽  
Itay Mayrose ◽  
Tal Pupko

Abstract: Inferring a phylogenetic tree, which describes the evolutionary relationships among a set of organisms, genes, or genomes, is a fundamental step in numerous evolutionary studies. With the aim of making tree inference feasible for problems involving more than a handful of sequences, current algorithms for phylogenetic tree reconstruction utilize various heuristic approaches. Such approaches rely on performing costly likelihood optimizations, and thus evaluate only a subset of all potential trees. Consequently, all existing methods suffer from the known tradeoff between accuracy and running time. Here, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides means to safely discard a large set of the search space, thus avoiding numerous expensive likelihood computations. Our analyses suggest that machine-learning approaches can make heuristic tree searches substantially faster without losing accuracy and thus could be incorporated for narrowing down the examined neighboring trees of each intermediate tree in any tree search methodology.
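The search strategy described in this abstract, ranking neighbors with a cheap predictor and computing the expensive likelihood only for the most promising ones, can be sketched on a toy problem. Everything here is an assumption for illustration: integers stand in for trees, `likelihood` is a made-up objective, and `predicted_gain` is a hand-written proxy rather than a trained model.

```python
def pruned_hill_climb(start, neighbors, likelihood, predicted_gain, keep=2):
    """Hill climbing that scores only the `keep` best-ranked neighbors per step.

    Returns (best state, its likelihood, number of likelihood evaluations).
    """
    current, current_ll = start, likelihood(start)
    evaluations = 1
    while True:
        # Cheap step: rank all neighbors by the predictor, keep the top few.
        ranked = sorted(neighbors(current), key=predicted_gain, reverse=True)[:keep]
        best, best_ll = None, current_ll
        # Expensive step: compute the true likelihood only for the shortlist.
        for candidate in ranked:
            candidate_ll = likelihood(candidate)
            evaluations += 1
            if candidate_ll > best_ll:
                best, best_ll = candidate, candidate_ll
        if best is None:  # no shortlisted neighbor improves on the current state
            return current, current_ll, evaluations
        current, current_ll = best, best_ll

# Toy stand-ins (assumptions): the optimum is at 7, each state has 6 neighbors.
def likelihood(x):
    return -(x - 7) ** 2

def neighbors(x):
    return [x + step for step in (-3, -2, -1, 1, 2, 3)]

def predicted_gain(x):
    return -abs(x - 7)

result = pruned_hill_climb(0, neighbors, likelihood, predicted_gain, keep=2)
print(result)
```

With `keep=2` the toy search reaches the optimum while scoring two neighbors per step instead of six. How much can be discarded safely depends on the quality of the predictor, which is what the study evaluates.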


Author(s):  
Ram D. Joshi ◽  
Chandra K. Dhakal

Diabetes mellitus is one of the most common human diseases worldwide and may cause several health-related complications. It is responsible for considerable morbidity, mortality, and economic loss. A timely diagnosis and prediction of this disease could provide patients with an opportunity to take the appropriate preventive and treatment strategies. To improve the understanding of risk factors, we predict type 2 diabetes for Pima Indian women utilizing a logistic regression model and a decision tree, a machine learning algorithm. Our analysis finds five main predictors of type 2 diabetes: glucose, pregnancy, body mass index (BMI), diabetes pedigree function, and age. We further explore a classification tree to complement and validate our analysis. The six-fold classification tree indicates glucose, BMI, and age are important factors, while the ten-node tree implies glucose, BMI, pregnancy, diabetes pedigree function, and age as the significant predictors. Our preferred specification yields a prediction accuracy of 78.26% and a cross-validation error rate of 21.74%. We argue that our model can be applied to make a reasonable prediction of type 2 diabetes, and could potentially be used to complement existing preventive measures to curb the incidence of diabetes and reduce associated costs.


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 88
Author(s):  
Charles Carslake ◽  
Jorge A. Vázquez-Diosdado ◽  
Jasmeet Kaler

Previous research has shown that sensors monitoring lying behaviours and feeding can detect early signs of ill health in calves. There is evidence to suggest that monitoring change in a single behaviour might not be enough for disease prediction. In calves, multiple behaviours such as locomotor play, self-grooming, feeding and activity whilst lying are likely to be informative. However, these behaviours can occur rarely in the real world, which means simply counting behaviours based on the prediction of a classifier can lead to overestimation. Here, we equipped thirteen pre-weaned dairy calves with collar-mounted sensors and monitored their behaviour with video cameras. Behavioural observations were recorded and merged with sensor signals. Features were calculated for 1–10-s windows and an AdaBoost ensemble learning algorithm was implemented to classify behaviours. Finally, we developed an adjusted count quantification algorithm to predict the prevalence of locomotor play behaviour on a test dataset with low true prevalence (0.27%). Our algorithm identified locomotor play (99.73% accuracy), self-grooming (98.18% accuracy), ruminating (94.47% accuracy), non-nutritive suckling (94.96% accuracy), nutritive suckling (96.44% accuracy), active lying (90.38% accuracy) and non-active lying (90.38% accuracy). Our results detail recommended sampling frequencies, feature selection and window size. The quantification estimates of locomotor play behaviour were highly correlated with the true prevalence (0.97; p < 0.001) with a total overestimation of 18.97%. This study is the first to implement machine learning approaches for multi-class behaviour identification as well as behaviour quantification in calves. This has potential to contribute towards new insights to evaluate the health and welfare in calves by use of wearable sensors.
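The overestimation problem mentioned in this abstract can be made concrete with a small sketch. The abstract does not give the authors' formula, so the code below uses the standard adjusted-count correction from the quantification literature, which rescales the raw positive rate using the classifier's true and false positive rates. The rates chosen here (0.90 and 0.01) are hypothetical; only the 0.27% prevalence comes from the abstract.

```python
def adjusted_count(observed_rate, tpr, fpr):
    """Correct a raw classify-and-count prevalence estimate.

    observed_rate: fraction of windows the classifier labelled positive
    tpr, fpr: the classifier's true and false positive rates, estimated
              on held-out data (hypothetical values in this example)
    """
    if tpr <= fpr:
        raise ValueError("classifier must have tpr > fpr for the correction")
    estimate = (observed_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion

true_prevalence = 0.0027   # 0.27%, as reported for locomotor play
tpr, fpr = 0.90, 0.01      # assumed classifier error rates

# Expected raw positive rate: true positives plus false positives.
observed = true_prevalence * tpr + (1 - true_prevalence) * fpr
corrected = adjusted_count(observed, tpr, fpr)

print(f"raw count: {observed:.4%}, adjusted count: {corrected:.4%}")
```

Even a 1% false positive rate makes the raw count several times larger than the true prevalence when the behaviour is this rare, which is why simple counting overestimates. In practice the true and false positive rates are themselves estimates, so the corrected value carries some error.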


2012 ◽  
pp. 1090-1107
Author(s):  
Artem A. Lenskiy ◽  
Jong-Soo Lee

The use of visual information for the navigation of unmanned ground vehicles in a cross-country environment recently received great attention. However, until now, the use of textural information has been somewhat less effective than color or laser range information. This chapter reviews the recent achievements in cross-country scene segmentation and addresses their shortcomings. It then describes a problem related to classification of high dimensional texture features. Finally, it compares three machine learning algorithms aimed at resolving this problem. The experimental results for each machine learning algorithm with the discussion of comparisons are given at the end of the chapter.

