ViraMiner: Deep Learning on Raw DNA Sequences for Identifying Viral Genomes in Human Samples

10.1101/602656 ◽

2019 ◽

Cited By ~ 4

Author(s):

Ardi Tampuu ◽

Zurab Bzhalava ◽

Joakim Dillner ◽

Raul Vicente

Keyword(s):

Machine Learning ◽

Deep Learning ◽

DNA Sequences ◽

Recommendation System ◽

Clinical Importance ◽

Training Dataset ◽

Viral Genomes ◽

Human Samples ◽

Types Of Information ◽

Improved Accuracy

Despite its clinical importance, detection of highly divergent or yet unknown viruses is a major challenge. When human samples are sequenced, conventional alignments classify many assembled contigs as “unknown” since many of the sequences are not similar to known genomes. In this work, we developed ViraMiner, a deep learning-based method to identify viruses in various human biospecimens. ViraMiner contains two branches of Convolutional Neural Networks designed to detect both patterns and pattern-frequencies on raw metagenomics contigs. The training dataset included sequences obtained from 19 metagenomic experiments which were analyzed and labeled by BLAST. The model achieves significantly improved accuracy compared to other machine learning methods for viral genome classification. Using 300 bp contigs ViraMiner achieves 0.923 area under the ROC curve. To our knowledge, this is the first machine learning methodology that can detect the presence of viral sequences among raw metagenomic contigs from diverse human samples. We suggest that the proposed model captures different types of information of genome composition, and can be used as a recommendation system to further investigate sequences labeled as “unknown” by conventional alignment methods. Exploring these highly-divergent viruses, in turn, can enhance our knowledge of infectious causes of diseases.
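
The distinction between detecting a pattern and detecting a pattern frequency can be illustrated with a small sketch. It is not the ViraMiner implementation: the abstract does not say how the two branches differ, so the use of global max pooling for "pattern" and global average pooling for "pattern frequency" is an assumption, and the `GATA` filter, the toy contig, and the helper names `one_hot` and `conv1d_valid` are hypothetical.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) array; unknown bases such as N become all-zero rows."""
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def conv1d_valid(x, kernel):
    """Slide a (k, 4) filter along a (length, 4) sequence; returns length - k + 1 activations."""
    k = kernel.shape[0]
    return np.array([np.sum(x[i:i + k] * kernel) for i in range(x.shape[0] - k + 1)])

# Hypothetical filter that responds most strongly to the motif "GATA".
motif_filter = one_hot("GATA")

# Hypothetical toy contig (real inputs in the abstract are 300 bp contigs).
x = one_hot("ACGATATTGATAGG")
acts = np.maximum(conv1d_valid(x, motif_filter), 0.0)  # ReLU activation

# Assumed "pattern" feature: global max pooling asks whether the motif occurs anywhere.
pattern_feature = acts.max()
# Assumed "pattern-frequency" feature: global average pooling reflects how often it (partly) matches.
frequency_feature = acts.mean()

print(pattern_feature, frequency_feature)
```

In a trained network the filters would be learned rather than hand-written, and the pooled features from both branches would feed a final classifier.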

ViraMiner: Deep learning on raw DNA sequences for identifying viral genomes in human samples

PLoS ONE ◽

10.1371/journal.pone.0222271 ◽

2019 ◽

Vol 14 (9) ◽

pp. e0222271 ◽

Cited By ~ 13

Author(s):

Ardi Tampuu ◽

Zurab Bzhalava ◽

Joakim Dillner ◽

Raul Vicente

Keyword(s):

Deep Learning ◽

DNA Sequences ◽

Viral Genomes ◽

Human Samples

Deep Learning Classification of Canine Behavior Using a Single Collar-Mounted Accelerometer: Real-World Validation

Animals ◽

10.3390/ani11061549 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1549

Author(s):

Robert D. Chambers ◽

Nathanael C. Yoder ◽

Aletha B. Carson ◽

Christian Junge ◽

David E. Allen ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Real World ◽

Learning Algorithm ◽

Drinking Behavior ◽

True Positive Rate ◽

Training Dataset ◽

Activity Levels ◽

Accelerometer Data ◽

Activity Monitors

Collar-mounted canine activity monitors can use accelerometer data to estimate dog activity levels, step counts, and distance traveled. With recent advances in machine learning and embedded computing, much more nuanced and accurate behavior classification has become possible, giving these affordable consumer devices the potential to improve the efficiency and effectiveness of pet healthcare. Here, we describe a novel deep learning algorithm that classifies dog behavior at sub-second resolution using commercial pet activity monitors. We built machine learning training databases from more than 5000 videos of more than 2500 dogs and ran the algorithms in production on more than 11 million days of device data. We then surveyed project participants representing 10,550 dogs, which provided 163,110 event responses to validate real-world detection of eating and drinking behavior. The resultant algorithm displayed a sensitivity and specificity for detecting drinking behavior (0.949 and 0.999, respectively) and eating behavior (0.988, 0.983). We also demonstrated detection of licking (0.772, 0.990), petting (0.305, 0.991), rubbing (0.729, 0.996), scratching (0.870, 0.997), and sniffing (0.610, 0.968). We show that the devices’ position on the collar had no measurable impact on performance. In production, users reported a true positive rate of 95.3% for eating (among 1514 users), and of 94.9% for drinking (among 1491 users). The study demonstrates the accurate detection of important health-related canine behaviors using a collar-mounted accelerometer. We trained and validated our algorithms on a large and realistic training dataset, and we assessed and confirmed accuracy in production via user validation.
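
For readers unfamiliar with the metrics quoted above, sensitivity and specificity follow directly from confusion-matrix counts. The sketch below uses hypothetical counts chosen only so that the result matches the reported drinking figures (0.949 and 0.999); the study's actual counts are not given in the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)  # fraction of true events that were detected
    specificity = tn / (tn + fp)  # fraction of non-events correctly rejected
    return sensitivity, specificity

# Hypothetical counts, not taken from the paper.
sens, spec = sensitivity_specificity(tp=949, fn=51, tn=999, fp=1)
print(sens, spec)
```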

Comparison Study on Improved Movie Recommender Systems

Webology ◽

10.14704/web/v18si04/web18285 ◽

2021 ◽

Vol 18 (Special Issue 04) ◽

pp. 1470-1478

Author(s):

R. Lavanya ◽

Ebani Gogia ◽

Nihal Rai

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Recommendation System ◽

Singular Value ◽

Experimental Result ◽

Comparison Study ◽

Deep Learning Neural Network ◽

Value Decomposition ◽

Crucial Part

A recommendation system (RS) is a crucial part of offering items, especially in streaming services. For movie streaming services on over-the-top (OTT) platforms, an RS helps users find new movies for leisure. In this paper, we propose a machine learning approach based on autoencoders to produce a collaborative filtering (CF) system that outputs a movie rating for a user based on a large database of ratings from other users. Using the MovieLens dataset, we explore the use of deep learning neural network-based stacked autoencoders to predict users' ratings of new movies, thereby enabling movie recommendations. We then implement Singular Value Decomposition (SVD) to recommend movies to users. The experimental results show that our RS outperforms a user-based neighbourhood baseline in terms of MSE on predicted ratings and in a survey in which users judge between recommendations from both systems.
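
As a rough illustration of the SVD step and the MSE metric mentioned above, the following sketch factorizes a small rating matrix and scores the reconstruction. It makes several assumptions that are not in the abstract: the toy ratings, the mean-filling of missing entries, the rank `k = 2`, and the helper name `low_rank_approximation` are all hypothetical, and the paper's stacked autoencoder is not reproduced here.

```python
import numpy as np

def low_rank_approximation(matrix, k):
    """Reconstruct `matrix` from its top-k singular values and vectors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Hypothetical toy ratings (rows: users, columns: movies); 0 marks "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

observed = ratings > 0
# Assumed preprocessing: fill missing entries with each user's mean observed rating.
user_means = ratings.sum(axis=1) / observed.sum(axis=1)
filled = np.where(observed, ratings, user_means[:, None])

predicted = low_rank_approximation(filled, k=2)

# Mean squared error on the entries that were actually rated.
mse = np.mean((predicted[observed] - ratings[observed]) ** 2)
print(predicted.round(2))
print(mse)
```

The entries of `predicted` at unrated positions serve as rating estimates, so the highest-valued unrated movie in a user's row would be the recommendation.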

Encoding Health Records into Pathway Representations for Deep Learning

10.3233/shti210800 ◽

2021 ◽

Author(s):

Marco Luca Sbodio ◽

Natasha Mulligan ◽

Stefanie Speichert ◽

Vanessa Lopez ◽

Joao Bettencourt-Silva

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Source Code ◽

Training Dataset ◽

Health Records ◽

Learning Tasks ◽

Patient Pathways ◽

Computational Resources ◽

The Impact

There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient’s data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable image-like structure useful for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network. Our initial experiments show that the accuracy of the CNN on a prediction task is comparable to or better than that of other autoencoders trained on the same data, while requiring significantly fewer computational resources for training. We also assess the impact of the size of the training dataset on autoencoder performance. The source code for generating pathways from health records is provided as open source.

ROOFN3D: DEEP LEARNING TRAINING DATA FOR 3D BUILDING RECONSTRUCTION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-1191-2018 ◽

2018 ◽

Vol XLII-2 ◽

pp. 1191-1198 ◽

Cited By ~ 5

Author(s):

A. Wichmann ◽

A. Agoub ◽

M. Kada

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Point Cloud ◽

Geometric Model ◽

Training Data ◽

Computer Hardware ◽

Training Dataset ◽

3D Point Cloud ◽

Learning Methods ◽

Building Reconstruction

Machine learning methods have gained in importance through the latest development of artificial intelligence and computer hardware. In particular, approaches based on deep learning have shown that they are able to provide state-of-the-art results for various tasks. However, the direct application of deep learning methods to improve the results of 3D building reconstruction is often not possible due, for example, to the lack of suitable training data. To address this issue, we present RoofN3D, which provides a new 3D point cloud training dataset that can be used to train machine learning models for different tasks in the context of 3D building reconstruction. It can be used, among other things, to train semantic segmentation networks or to learn the structure of buildings and the geometric model construction. Further details about RoofN3D and the developed data preparation framework, which enables the automatic derivation of training data, are described in this paper. Furthermore, we provide an overview of other available 3D point cloud training data and approaches from current literature in which solutions for the application of deep learning to unstructured and non-gridded 3D point cloud data are presented.

A Comparison of Machine Learning and Deep Learning Techniques for Activity Recognition using Mobile Devices

Sensors ◽

10.3390/s19030521 ◽

2019 ◽

Vol 19 (3) ◽

pp. 521 ◽

Cited By ~ 6

Author(s):

Alejandro Baldominos ◽

Alejandro Cervantes ◽

Yago Saez ◽

Pedro Isasi

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Activity Recognition ◽

Leisure Activities ◽

Machine Learning Techniques ◽

Learning Networks ◽

Extremely Randomized Trees ◽

Learning Techniques ◽

Improved Accuracy ◽

Learning Architectures

We have compared the performance of different machine learning techniques for human activity recognition. Experiments were made using a benchmark dataset where each subject wore a device in the pocket and another on the wrist. The dataset comprises thirteen activities, including physical activities, common postures, working activities and leisure activities. We apply a methodology known as the activity recognition chain, a sequence of steps involving preprocessing, segmentation, feature extraction and classification for traditional machine learning methods; we also tested convolutional deep learning networks that operate on raw data instead of using computed features. Results show that the combination of the two sensors does not necessarily result in improved accuracy. We have determined that the best results are obtained by the extremely randomized trees approach, operating on precomputed features and on data obtained from the wrist sensor. Deep learning architectures did not produce competitive results with the tested architecture.
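
The segmentation and feature-extraction steps of the activity recognition chain can be sketched as follows. This is a minimal illustration, not the study's pipeline: the window size, step, choice of mean and standard deviation as features, the random stand-in signal, and the helper names `sliding_windows` and `extract_features` are all assumptions.

```python
import numpy as np

def sliding_windows(signal, size, step):
    """Segment a (samples, channels) array into overlapping windows of `size` samples."""
    starts = range(0, signal.shape[0] - size + 1, step)
    return np.stack([signal[s:s + size] for s in starts])

def extract_features(windows):
    """Per-window, per-channel mean and standard deviation, concatenated column-wise."""
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=1)

# Hypothetical stand-in for a 3-axis wrist accelerometer recording.
rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3))

windows = sliding_windows(signal, size=20, step=10)  # segmentation
features = extract_features(windows)                 # feature extraction

print(windows.shape, features.shape)
```

Each row of `features` would then go to a classifier; scikit-learn's `ExtraTreesClassifier` is one available implementation of extremely randomized trees, though the abstract does not say which library the authors used.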

Deep Neural Networks for Hierarchical Taxonomic Fossil Classification of Carbonate Skeletal Grains

10.5194/egusphere-egu21-16394 ◽

2021 ◽

Author(s):

Sidhant Idgunji ◽

Madison Ho ◽

Jonathan L. Payne ◽

Daniel Lehrmann ◽

Michele Morsilli ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Algorithm ◽

Training Dataset ◽

Single Class ◽

Deep Learning Algorithm ◽

Human Ability ◽

Wide Range ◽

Potential Applications ◽

Animal Phyla

The growing digitization of fossil images has vastly improved and broadened the potential application of big data and machine learning, particularly computer vision, in paleontology. Recent studies show that machine learning is capable of approaching human abilities of classifying images, and with the increase in computational power and visual data, it stands to reason that it can match human ability but at much greater efficiency in the near future. Here we demonstrate this potential of using deep learning to identify skeletal grains at different levels of the Linnaean taxonomic hierarchy. Our approach was two-pronged. First, we built a database of skeletal grain images spanning a wide range of animal phyla and classes and used this database to train the model. We used a Python-based method to automate image recognition and extraction from published sources. Second, we developed a deep learning algorithm that can attach multiple labels to a single image. Conventionally, deep learning is used to predict a single class from an image; here, we adopted a Branch Convolutional Neural Network (B-CNN) technique to classify multiple taxonomic levels for a single skeletal grain image. Using this method, we achieved over 90% accuracy for both the coarse, phylum-level recognition and the fine, class-level recognition across diverse skeletal grains (6 phyla and 15 classes). Furthermore, we found that image augmentation improves the overall accuracy. This tool has potential applications in geology ranging from biostratigraphy to paleo-bathymetry, paleoecology, and microfacies analysis. Further improvement of the algorithm and expansion of the training dataset will continue to narrow the efficiency gap between human expertise and machine learning.

Recommendation System Based on Machine Learning and Deep Learning in Varied Perspectives: A Systematic Review

Information and Communication Technology for Competitive Strategies (ICTCS 2020) - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-0882-7_36 ◽

2021 ◽

pp. 419-432

Author(s):

T. B. Lalitha ◽

P. S. Sreeja

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Deep Learning ◽

Recommendation System

Comparing Classical and Modern Machine Learning Techniques for Monitoring Pedestrian Workers in Top-View Construction Site Video Sequences

Applied Sciences ◽

10.3390/app10238466 ◽

2020 ◽

Vol 10 (23) ◽

pp. 8466

Author(s):

Marcel Neuhausen ◽

Dennis Pawlowski ◽

Markus König

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Kalman Filter ◽

Safety Management ◽

Machine Learning Techniques ◽

Training Dataset ◽

Learning Approaches ◽

Construction Sites ◽

Learning Techniques ◽

Different Characteristics

Keeping an overview of all ongoing processes on construction sites is almost unfeasible, especially for the construction workers executing their tasks. It is difficult for workers to concentrate on their work while paying attention to other processes. If their workflows in hazardous areas do not run properly, this can lead to dangerous accidents. Tracking pedestrian workers could improve the productivity and safety management on construction sites. For this, vision-based tracking approaches are suitable, but the training and evaluation of such a system requires a large amount of data originating from construction sites. Such data are rarely available, which complicates deep learning approaches. Thus, we use a small generic dataset and juxtapose a deep learning detector with an approach based on classical machine learning techniques. We identify workers using a YOLOv3 detector and compare its performance with an approach based on a soft cascaded classifier. Afterwards, tracking is done by a Kalman filter. In our experiments, the classical approach outperforms YOLOv3 on the detection task given a small training dataset. However, the Kalman filter is sufficiently robust to compensate for the drawbacks of YOLOv3. We found that both approaches generally yield satisfactory tracking performance but feature different characteristics.
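
To show how a Kalman filter can bridge gaps left by a detector, here is a minimal sketch of a constant-velocity tracker over 2D detections. The motion model, the noise covariances `Q` and `R`, the initial state, and the toy detections (including one missed frame) are all assumptions for illustration; the abstract does not describe the authors' filter configuration.

```python
import numpy as np

dt = 1.0  # assumed time between frames
# State: [x, y, vx, vy]; constant-velocity motion model (an assumption).
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # only position is measured
Q = 0.01 * np.eye(4)  # hypothetical process noise
R = 1.0 * np.eye(2)   # hypothetical measurement noise

def predict(x, P):
    """Propagate the state and its covariance one frame forward."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the prediction with a detected position z."""
    innovation = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ innovation, (np.eye(4) - K @ H) @ P

# Hypothetical detections of one worker; None marks a frame the detector missed.
detections = [(1.0, 1.0), (2.0, 2.0), None, (4.0, 4.0), (5.0, 5.0)]

x, P = np.zeros(4), 10.0 * np.eye(4)
for z in detections:
    x, P = predict(x, P)
    if z is not None:
        x, P = update(x, P, np.array(z))

print(x)
```

On the missed frame the filter relies on the prediction alone, which is one way a tracker can compensate for weaker detection performance.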

PlasmidHawk improves lab of origin prediction of engineered plasmids using sequence alignment

Nature Communications ◽

10.1038/s41467-021-21180-w ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Qi Wang ◽

Bryce Kille ◽

Tian Rui Liu ◽

R. A. Leo Elworth ◽

Todd J. Treangen

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Synthetic Biology ◽

DNA Sequences ◽

Prediction Accuracy ◽

Genome Engineering ◽

Learning Approach ◽

Machine Learning Approach ◽

Computationally Expensive ◽

Limited Accuracy

With advances in synthetic biology and genome engineering comes a heightened awareness of potential misuse related to biosafety concerns. A recent study employed machine learning to identify the lab-of-origin of DNA sequences to help mitigate some of these concerns. Despite their promising results, this deep learning based approach had limited accuracy, was computationally expensive to train, and wasn’t able to provide the precise features that were used in its predictions. To address these shortcomings, we developed PlasmidHawk for lab-of-origin prediction. Compared to a machine learning approach, PlasmidHawk has higher prediction accuracy; PlasmidHawk can successfully predict unknown sequences’ depositing labs 76% of the time and 85% of the time the correct lab is in the top 10 candidates. In addition, PlasmidHawk can precisely single out the signature sub-sequences that are responsible for the lab-of-origin detection. In summary, PlasmidHawk represents an explainable and accurate tool for lab-of-origin prediction of synthetic plasmid sequences. PlasmidHawk is available at https://gitlab.com/treangenlab/plasmidhawk.git.
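
The two accuracy figures above (correct lab ranked first, and correct lab within the top 10 candidates) are instances of top-k accuracy. The sketch below shows how such a metric is typically computed; the lab names and rankings are hypothetical, and `top_k_accuracy` is an illustrative helper, not part of PlasmidHawk.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of cases where the true label is among the first k ranked candidates."""
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# Hypothetical ranked lab candidates for four plasmids, best guess first.
ranked = [
    ["lab_a", "lab_b", "lab_c"],
    ["lab_b", "lab_a", "lab_c"],
    ["lab_c", "lab_b", "lab_a"],
    ["lab_a", "lab_c", "lab_b"],
]
truth = ["lab_a", "lab_a", "lab_a", "lab_b"]

print(top_k_accuracy(ranked, truth, k=1))
print(top_k_accuracy(ranked, truth, k=2))
```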
