scholarly journals ViraMiner: Deep Learning on Raw DNA Sequences for Identifying Viral Genomes in Human Samples

2019 ◽  
Author(s):  
Ardi Tampuu ◽  
Zurab Bzhalava ◽  
Joakim Dillner ◽  
Raul Vicente

ABSTRACTDespite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as “unknown” since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. The training dataset included sequences obtained from 19 metagenomic experiments which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs ViraMiner achieves 0.923 area under the ROC curve. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information of genome composition, and can be used as a recommendation system to further investigate sequences labeled as “unknown” by conventional alignment methods. Exploring these highly-divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases.

PLoS ONE ◽  
2019 ◽  
Vol 14 (9) ◽  
pp. e0222271 ◽  
Author(s):  
Ardi Tampuu ◽  
Zurab Bzhalava ◽  
Joakim Dillner ◽  
Raul Vicente

Animals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1549
Author(s):  
Robert D. Chambers ◽  
Nathanael C. Yoder ◽  
Aletha B. Carson ◽  
Christian Junge ◽  
David E. Allen ◽  
...  

Collar-mounted canine activity monitors can use accelerometer data to estimate dog activity levels, step counts, and distance traveled. With recent advances in machine learning and embedded computing, much more nuanced and accurate behavior classification has become possible, giving these affordable consumer devices the potential to improve the efficiency and effectiveness of pet healthcare. Here, we describe a novel deep learning algorithm that classifies dog behavior at sub-second resolution using commercial pet activity monitors. We built machine learning training databases from more than 5000 videos of more than 2500 dogs and ran the algorithms in production on more than 11 million days of device data. We then surveyed project participants representing 10,550 dogs, which provided 163,110 event responses to validate real-world detection of eating and drinking behavior. The resultant algorithm displayed a sensitivity and specificity for detecting drinking behavior (0.949 and 0.999, respectively) and eating behavior (0.988, 0.983). We also demonstrated detection of licking (0.772, 0.990), petting (0.305, 0.991), rubbing (0.729, 0.996), scratching (0.870, 0.997), and sniffing (0.610, 0.968). We show that the devices’ position on the collar had no measurable impact on performance. In production, users reported a true positive rate of 95.3% for eating (among 1514 users), and of 94.9% for drinking (among 1491 users). The study demonstrates the accurate detection of important health-related canine behaviors using a collar-mounted accelerometer. We trained and validated our algorithms on a large and realistic training dataset, and we assessed and confirmed accuracy in production via user validation.


Webology ◽  
2021 ◽  
Vol 18 (Special Issue 04) ◽  
pp. 1470-1478
Author(s):  
R. Lavanya ◽  
Ebani Gogia ◽  
Nihal Rai

Recommendation system is a crucial part of offering items especially in services that offer streaming. For streaming movie services on OTT, RS are a helping hand for users in finding new movies for leisure. In this paper, we propose a machine learning an approach based on auto encoders to produce a CF system which outputs movie rating for a user based on a huge DB of ratings from other users. Utilising Movie Lens dataset, we explore the use of deep learning neural network based Stacked Auto encoders to predict user s ratings on new movies, thereby enabling movie recommendations. We consequently implement Singular Value Decomposition (SVD) to recommend movies to users. The experimental result showcase that our R S out performs a user-based neighbourhood baseline in terms of MSE on predicted ratings and in a survey in which user judge between recommendation s from both systems.


2021 ◽  
Author(s):  
Marco Luca Sbodio ◽  
Natasha Mulligan ◽  
Stefanie Speichert ◽  
Vanessa Lopez ◽  
Joao Bettencourt-Silva

There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient’s data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable image-like structure useful for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network. Our initial experiments show the accuracy of the CNN on a prediction task is comparable or better than other autoencoders trained on the same data, while requiring significantly less computational resources for training. We also assess the impact of the size of the training dataset on autoencoders performances. The source code for generating pathways from health records is provided as open source.


Author(s):  
A. Wichmann ◽  
A. Agoub ◽  
M. Kada

Machine learning methods have gained in importance through the latest development of artificial intelligence and computer hardware. Particularly approaches based on deep learning have shown that they are able to provide state-of-the-art results for various tasks. However, the direct application of deep learning methods to improve the results of 3D building reconstruction is often not possible due, for example, to the lack of suitable training data. To address this issue, we present RoofN3D which provides a new 3D point cloud training dataset that can be used to train machine learning models for different tasks in the context of 3D building reconstruction. It can be used, among others, to train semantic segmentation networks or to learn the structure of buildings and the geometric model construction. Further details about RoofN3D and the developed data preparation framework, which enables the automatic derivation of training data, are described in this paper. Furthermore, we provide an overview of other available 3D point cloud training data and approaches from current literature in which solutions for the application of deep learning to unstructured and not gridded 3D point cloud data are presented.


Sensors ◽  
2019 ◽  
Vol 19 (3) ◽  
pp. 521 ◽  
Author(s):  
Alejandro Baldominos ◽  
Alejandro Cervantes ◽  
Yago Saez ◽  
Pedro Isasi

We have compared the performance of different machine learning techniques for human activity recognition. Experiments were made using a benchmark dataset where each subject wore a device in the pocket and another on the wrist. The dataset comprises thirteen activities, including physical activities, common postures, working activities and leisure activities. We apply a methodology known as the activity recognition chain, a sequence of steps involving preprocessing, segmentation, feature extraction and classification for traditional machine learning methods; we also tested convolutional deep learning networks that operate on raw data instead of using computed features. Results show that combination of two sensors does not necessarily result in an improved accuracy. We have determined that best results are obtained by the extremely randomized trees approach, operating on precomputed features and on data obtained from the wrist sensor. Deep learning architectures did not produce competitive results with the tested architecture.


2021 ◽  
Author(s):  
Sidhant Idgunji ◽  
Madison Ho ◽  
Jonathan L. Payne ◽  
Daniel Lehrmann ◽  
Michele Morsilli ◽  
...  

<p>The growing digitization of fossil images has vastly improved and broadened the potential application of big data and machine learning, particularly computer vision, in paleontology. Recent studies show that machine learning is capable of approaching human abilities of classifying images, and with the increase in computational power and visual data, it stands to reason that it can match human ability but at much greater efficiency in the near future. Here we demonstrate this potential of using deep learning to identify skeletal grains at different levels of the Linnaean taxonomic hierarchy. Our approach was two-pronged. First, we built a database of skeletal grain images spanning a wide range of animal phyla and classes and used this database to train the model. We used a Python-based method to automate image recognition and extraction from published sources. Second, we developed a deep learning algorithm that can attach multiple labels to a single image. Conventionally, deep learning is used to predict a single class from an image; here, we adopted a Branch Convolutional Neural Network (B-CNN) technique to classify multiple taxonomic levels for a single skeletal grain image. Using this method, we achieved over 90% accuracy for both the coarse, phylum-level recognition and the fine, class-level recognition across diverse skeletal grains (6 phyla and 15 classes). Furthermore, we found that image augmentation improves the overall accuracy. This tool has potential applications in geology ranging from biostratigraphy to paleo-bathymetry, paleoecology, and microfacies analysis. Further improvement of the algorithm and expansion of the training dataset will continue to narrow the efficiency gap between human expertise and machine learning.</p>


2020 ◽  
Vol 10 (23) ◽  
pp. 8466
Author(s):  
Marcel Neuhausen ◽  
Dennis Pawlowski ◽  
Markus König

Keeping an overview of all ongoing processes on construction sites is almost unfeasible, especially for the construction workers executing their tasks. It is difficult for workers to concentrate on their work while paying attention to other processes. If their workflows in hazardous areas do not run properly, this can lead to dangerous accidents. Tracking pedestrian workers could improve the productivity and safety management on construction sites. For this, vision-based tracking approaches are suitable, but the training and evaluation of such a system requires a large amount of data originating from construction sites. These are rarely available, which complicates deep learning approaches. Thus, we use a small generic dataset and juxtapose a deep learning detector with an approach based on classical machine learning techniques. We identify workers using a YOLOv3 detector and compare its performance with an approach based on a soft cascaded classifier. Afterwards, tracking is done by a Kalman filter. In our experiments, the classical approach outperforms YOLOv3 on the detection task given a small training dataset. However, the Kalman filter is sufficiently robust to compensate for the drawbacks of YOLOv3. We found that both approaches generally yield a satisfying tracking performances but feature different characteristics.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Qi Wang ◽  
Bryce Kille ◽  
Tian Rui Liu ◽  
R. A. Leo Elworth ◽  
Todd J. Treangen

AbstractWith advances in synthetic biology and genome engineering comes a heightened awareness of potential misuse related to biosafety concerns. A recent study employed machine learning to identify the lab-of-origin of DNA sequences to help mitigate some of these concerns. Despite their promising results, this deep learning based approach had limited accuracy, was computationally expensive to train, and wasn’t able to provide the precise features that were used in its predictions. To address these shortcomings, we developed PlasmidHawk for lab-of-origin prediction. Compared to a machine learning approach, PlasmidHawk has higher prediction accuracy; PlasmidHawk can successfully predict unknown sequences’ depositing labs 76% of the time and 85% of the time the correct lab is in the top 10 candidates. In addition, PlasmidHawk can precisely single out the signature sub-sequences that are responsible for the lab-of-origin detection. In summary, PlasmidHawk represents an explainable and accurate tool for lab-of-origin prediction of synthetic plasmid sequences. PlasmidHawk is available at https://gitlab.com/treangenlab/plasmidhawk.git.


Sign in / Sign up

Export Citation Format

Share Document