Chiron: Translating nanopore raw signal directly into nucleotide sequence using deep learning

2017 ◽  
Author(s):  
Haotian Teng ◽  
Minh Duc Cao ◽  
Michael B. Hall ◽  
Tania Duarte ◽  
Sheng Wang ◽  
...  

Abstract Sequencing by translocating DNA fragments through an array of nanopores is a rapidly maturing technology which offers faster and cheaper sequencing than other approaches. However, accurately deciphering the DNA sequence from the noisy and complex electrical signal is challenging. Here, we report Chiron, the first deep learning model to achieve end-to-end basecalling: directly translating the raw signal to DNA sequence without the error-prone segmentation step. Trained with only a small set of 4000 reads, we show that our model provides state-of-the-art basecalling accuracy even on previously unseen species. Chiron achieves basecalling speeds of over 2000 bases per second using desktop computer graphics processing units.

2020 ◽  
Vol 41 (Supplement_2) ◽  
Author(s):  
S Rao ◽  
Y Li ◽  
R Ramakrishnan ◽  
A Hassaine ◽  
D Canoy ◽  
...  

Abstract Background/Introduction Predicting incident heart failure has been challenging. Deep learning models, when applied to rich electronic health records (EHR), offer some theoretical advantages. However, empirical evidence for their superior performance is limited and they remain commonly uninterpretable, hampering their wider use in medical practice. Purpose We developed a deep learning framework for more accurate and yet interpretable prediction of incident heart failure. Methods We used longitudinally linked EHR from practices across England, involving 100,071 patients, 13% of whom had been diagnosed with incident heart failure during follow-up. We investigated the predictive performance of a novel transformer deep learning model, “Transformer for Heart Failure” (BEHRT-HF), and validated it using both an external held-out dataset and an internal five-fold cross-validation mechanism, measuring area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). Predictor groups included all outpatient and inpatient diagnoses within their temporal context, medications, age, and calendar year for each encounter. By treating diagnoses as anchors, we alternately removed different modalities (ablation study) to understand the importance of individual modalities to the performance of incident heart failure prediction. Using perturbation-based techniques, we investigated the importance of associations between selected predictors and heart failure to improve model interpretability. Results BEHRT-HF achieved high accuracy with AUROC 0.932 and AUPRC 0.695 for external validation, and AUROC 0.933 (95% CI: 0.928, 0.938) and AUPRC 0.700 (95% CI: 0.682, 0.718) for internal validation. BEHRT-HF outperformed the state-of-the-art recurrent deep learning model, RETAIN-EX, by 0.079 and 0.030 in terms of AUPRC and AUROC, respectively. The ablation study showed that medications were strong predictors, and calendar year was more important than age. 
Utilising perturbation, we identified and ranked the intensity of associations between diagnoses and heart failure. For instance, the method showed that established risk factors including myocardial infarction, atrial fibrillation and flutter, and hypertension were all strongly associated with the heart failure prediction. Additionally, when the population was stratified into different age groups, the incident occurrence of a given disease generally made a higher contribution to heart failure prediction at younger ages than when diagnosed later in life. Conclusions Our state-of-the-art deep learning framework outperforms the predictive performance of existing models whilst enabling a data-driven way of exploring the relative contribution of a range of risk factors in the context of other temporal information. Funding Acknowledgement Type of funding source: Private grant(s) and/or Sponsorship. Main funding source(s): National Institute for Health Research, Oxford Martin School, Oxford Biomedical Research Centre
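The two validation metrics named in this abstract, AUROC and AUPRC, can be illustrated with a small sketch. This is not the authors' evaluation code: the function names and the toy labels and scores are hypothetical, AUROC is computed here in its pairwise-ranking (Mann-Whitney) form, and AUPRC is approximated by average precision, which may differ from the interpolation the study used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Assumption: tied scores are ordered by input position.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each true positive
    return ap / total_pos


# Toy data (hypothetical): 1 = incident heart failure, 0 = no event.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

With only 13% of patients in the positive class, AUPRC is the more demanding of the two metrics, which is one plausible reason the abstract reports both.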


Author(s):  
Usman Ahmed ◽  
Jerry Chun-Wei Lin ◽  
Gautam Srivastava

Deep learning methods have led to state-of-the-art medical applications, such as image classification and segmentation. Data-driven deep learning applications can help stakeholders to collaborate. However, limited labelled data restricts the ability of deep learning algorithms to generalize from one domain to another. To handle this problem, meta-learning helps a model learn from a small set of data. We propose a meta-learning-based image segmentation model that combines the learning of state-of-the-art models and uses it to achieve domain adaptation and high accuracy. We also propose a preprocessing algorithm to increase the usability of the segmented parts and remove noise from new test images. The proposed model achieves 0.94 precision and 0.92 recall, an increase of 3.3% over the state-of-the-art algorithms.


2021 ◽  
Vol 14 (11) ◽  
pp. 1950-1963
Author(s):  
Jie Liu ◽  
Wenqian Dong ◽  
Qingqing Zhou ◽  
Dong Li

Cardinality estimation is a fundamental and critical problem in databases. Recently, many estimators based on deep learning have been proposed to solve this problem and they have achieved promising results. However, these estimators struggle to provide accurate results for complex queries because they do not capture real inter-column and inter-table correlations. Furthermore, none of these estimators provides uncertainty information about its estimations. In this paper, we present a join cardinality estimator called Fauce. Fauce learns the correlations across all columns and all tables in the database. It also provides the uncertainty information of each estimation. Among all studied learned estimators, our results are promising: (1) Fauce is a lightweight estimator with 10× faster inference speed than the state-of-the-art estimator; (2) Fauce is robust to complex queries, providing 1.3× to 6.7× smaller estimation errors for complex queries compared with the state-of-the-art estimator; (3) to the best of our knowledge, Fauce is the first estimator that incorporates uncertainty information for cardinality estimation into a deep learning model.
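Why inter-column correlations matter for cardinality estimation can be shown with a toy sketch. It does not reproduce Fauce in any way: it only contrasts the true result size of a conjunctive predicate with the classical estimate that multiplies per-column selectivities as if the columns were independent. The table contents and function names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def selectivity(rows: List[Row], predicate: Predicate) -> float:
    """Fraction of rows that satisfy a single predicate."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


def true_cardinality(rows: List[Row], predicates: List[Predicate]) -> int:
    """Exact number of rows that satisfy every predicate."""
    return sum(1 for r in rows if all(p(r) for p in predicates))


def independent_estimate(rows: List[Row], predicates: List[Predicate]) -> float:
    """Estimate that assumes the predicates are statistically independent."""
    estimate = float(len(rows))
    for p in predicates:
        estimate *= selectivity(rows, p)
    return estimate


# Hypothetical table in which `city` fully determines `country`.
rows = [{"city": "Paris", "country": "FR"}] * 50 + [{"city": "Berlin", "country": "DE"}] * 50
predicates = [lambda r: r["city"] == "Paris", lambda r: r["country"] == "FR"]

print(true_cardinality(rows, predicates))      # 50
print(independent_estimate(rows, predicates))  # 25.0
```

The independence assumption underestimates the result by a factor of two here, and the gap compounds as more correlated columns or joined tables are added, which is the kind of error a correlation-aware estimator aims to reduce.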


Author(s):  
Yang Liu ◽  
Yachao Yuan ◽  
Jing Liu

Abstract Automatic defect classification is vital to ensure product quality, especially for steel production. In the real world, the number of collected samples with labels is limited due to high labor costs, and the gathered dataset is usually imbalanced, making accurate steel defect classification very challenging. In this paper, a novel deep learning model for imbalanced multi-label surface defect classification, named ImDeep, is proposed. It can be deployed easily in steel production lines to identify different defect types on the steel's surface. ImDeep incorporates three key techniques, i.e., Imbalanced Sampler, Fussy-FusionNet, and Transfer Learning. It improves the model's multi-label classification performance and reduces the model's complexity over small datasets with low latency. The performance of different fusion strategies and of the three key techniques of ImDeep is verified. Simulation results show that ImDeep performs better than the state of the art on the public dataset at varied sizes. Specifically, ImDeep achieves about 97% accuracy in steel surface defect classification over a small imbalanced dataset with low latency, an improvement of about 10% over the state of the art.


2020 ◽  
Vol 12 (2) ◽  
pp. 21-34
Author(s):  
Mostefai Abdelkader

In recent years, increasing attention has been paid to sentiment analysis on microblogging platforms such as Twitter. Sentiment analysis refers to the task of detecting whether a textual item (e.g., a tweet) contains an opinion about a topic. This paper proposes a probabilistic deep learning approach for sentiment analysis. The deep learning model used is a convolutional neural network (CNN). The main contribution of this approach is a new probabilistic representation of the text to be fed as input to the CNN. This representation is a matrix that stores, for each word composing the message, the probability that it belongs to a positive class and the probability that it belongs to a negative class. The proposed approach is evaluated on four well-known datasets: HCR, OMD, STS-gold, and a dataset provided by the SemEval-2017 Workshop. The results of the experiments show that the proposed approach competes with the state-of-the-art sentiment analyzers and has the potential to detect sentiments from textual data in an effective manner.
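The matrix representation described in this abstract can be sketched as follows. The abstract does not say how the per-word probabilities are estimated, so everything here is an assumption made for illustration: the probabilities come from smoothed word counts in a labelled corpus, unknown words and padding rows receive 0.5/0.5, and the names `word_class_probabilities` and `message_matrix` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_class_probabilities(
    corpus: List[Tuple[str, str]], alpha: float = 1.0
) -> Dict[str, Tuple[float, float]]:
    """Estimate (P(positive | word), P(negative | word)) from labelled texts.

    Assumption: additive smoothing with strength `alpha`; the paper may use
    a different estimator.
    """
    pos_counts: Counter = Counter()
    neg_counts: Counter = Counter()
    for text, label in corpus:
        target = pos_counts if label == "pos" else neg_counts
        target.update(text.lower().split())
    probs = {}
    for word in set(pos_counts) | set(neg_counts):
        p, n = pos_counts[word], neg_counts[word]
        total = p + n + 2 * alpha
        probs[word] = ((p + alpha) / total, (n + alpha) / total)
    return probs


def message_matrix(
    message: str, probs: Dict[str, Tuple[float, float]], max_len: int
) -> List[List[float]]:
    """Build a max_len x 2 matrix with one [P(pos), P(neg)] row per word."""
    words = message.lower().split()[:max_len]
    rows = [list(probs.get(w, (0.5, 0.5))) for w in words]
    rows += [[0.5, 0.5] for _ in range(max_len - len(rows))]  # neutral padding
    return rows


# Hypothetical labelled corpus.
corpus = [("great phone", "pos"), ("great battery", "pos"), ("bad battery", "neg")]
probs = word_class_probabilities(corpus)
matrix = message_matrix("great bad screen", probs, max_len=4)
print(matrix)
```

A fixed `max_len` keeps every message the same shape, which is what a CNN input layer typically expects; the two columns play the role that embedding dimensions play in a conventional text CNN.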


2020 ◽  
Vol 12 (18) ◽  
pp. 3020
Author(s):  
Piotr Szymak ◽  
Paweł Piskur ◽  
Krzysztof Naus

Video image processing and object classification using a Deep Learning Neural Network (DLNN) can significantly increase the autonomy of underwater vehicles. This paper describes the results of a project focused on using DLNN for Object Classification in Underwater Video (OCUV) implemented in a Biomimetic Underwater Vehicle (BUV). The BUV is intended to be used to detect underwater mines, explore shipwrecks or observe the process of corrosion of munitions abandoned on the seabed after World War II. Here, pretrained DLNNs were used for classification of the following types of objects: fish, underwater vehicles, divers and obstacles. The results of our research enabled us to estimate the effectiveness of using pretrained DLNNs for classification of different objects in the complex Baltic Sea environment. A Genetic Algorithm (GA) was used to establish the tuning parameters of the DLNNs. Three different training methods were compared for AlexNet, then one training method was chosen for fifteen networks, and the tests are presented with a description of the final results. The DLNNs were trained on servers with six medium-class Graphics Processing Units (GPUs). Finally, the trained DLNN was implemented on the Nvidia JetsonTX2 platform installed on board the BUV, and one of the networks was verified in a real environment.


2010 ◽  
Vol 18 (1) ◽  
pp. 1-33 ◽  
Author(s):  
Andre R. Brodtkorb ◽  
Christopher Dyken ◽  
Trond R. Hagen ◽  
Jon M. Hjelmervik ◽  
Olaf O. Storaasli

Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.


2018 ◽  
Author(s):  
John-William Sidhom ◽  
Alexander S. Baras

Abstract Deep learning is an area of artificial intelligence that has received much attention in the past few years due to both an increase in computational power, with the increased use of graphics processing units (GPUs) for computational analyses, and the performance of this class of algorithms on visual recognition tasks. These algorithms have found utility in applications ranging from image search to facial recognition for security and social media purposes. Their continued success has propelled their use across many new domains including the medical field, in areas of radiology and pathology in particular, as these fields are thought to be driven by visual recognition tasks. In this paper, we present an application of deep learning, termed ‘transfer learning’, using ResNet50, a pre-trained convolutional neural network (CNN), to act as a ‘feature-detector’ at various magnifications to identify low and high level features in digital pathology images of various breast lesions for the purpose of classifying them correctly into the labels of normal, benign, in-situ, or invasive carcinoma as provided in the ICIAR 2018 Breast Cancer Histology Challenge (BACH).


2021 ◽  
Author(s):  
Ranit Karmakar ◽  
Saeid Nooshabadi

Abstract Colon polyps, small clumps of cells on the lining of the colon, can lead to colorectal cancer (CRC), one of the leading types of cancer globally. Hence, early detection of these polyps is crucial in the prevention of CRC. This paper proposes a lightweight deep learning model for colorectal polyp segmentation that achieves state-of-the-art accuracy while significantly reducing the model size and complexity. The proposed deep learning autoencoder model employs a set of state-of-the-art architectural blocks and optimization objective functions to achieve the desired efficiency. The model is trained and tested on five publicly available colorectal polyp segmentation datasets (CVC-ClinicDB, CVC-ColonDB, EndoScene, Kvasir, and ETIS). We also performed ablation testing on the model to test various aspects of the autoencoder architecture. We performed the model evaluation using most of the common image segmentation metrics. The backbone model achieved a dice score of 0.935 on the Kvasir dataset and 0.945 on the CVC-ClinicDB dataset, improving the accuracy by 4.12% and 5.12% respectively over the current state-of-the-art network, while using 88 times fewer parameters, 40 times less storage space, and being computationally 17 times more efficient. Our ablation study showed that the addition of ConvSkip in the autoencoder slightly improves the model’s performance, but the improvement was not significant (p-value=0.815).
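The dice score reported in this abstract is the standard overlap measure for segmentation, 2|P ∩ T| / (|P| + |T|) for a predicted mask P and a ground-truth mask T. A minimal sketch on binary masks follows; the function name, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions, not details from the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Dice coefficient between two binary masks (values 0 or 1) of equal shape."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = pred_area = truth_area = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += p & t  # pixel counted only if set in both masks
            pred_area += p
            truth_area += t
    if pred_area + truth_area == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / (pred_area + truth_area)


# Toy 2 x 3 masks (hypothetical): 1 marks a polyp pixel.
predicted = [[1, 1, 0], [0, 1, 0]]
ground_truth = [[1, 0, 0], [0, 1, 1]]
print(dice_score(predicted, ground_truth))
```

Because the score divides by the combined mask area rather than the image area, it stays informative when polyps occupy only a small part of the frame.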


Author(s):  
George Kolokolnikov ◽  
Anna Borde ◽  
Victor Skuratov ◽  
Roman Gaponov ◽  
Anastasiya Rumyantseva

The paper is devoted to the development of a QRS segmentation system based on a deep learning approach. The segmentation problem considered plays an important role in the automatic analysis of heart rhythms, which makes it possible to identify life-threatening pathologies. The main goal of the research is to choose the best segmentation pipeline in terms of accuracy and time-efficiency. The process of ECG-signal analysis is described, and the problem of QRS segmentation is discussed. State-of-the-art algorithms are analyzed in the literature review section and the most prominent are chosen for further research. In the course of the research, four hypotheses about an appropriate deep learning model are checked: an LSTM-based model, a 2-input 1-dimensional CNN model, a “signal-to-picture” approach based on a 2-dimensional CNN, and the simplest 1-dimensional CNN model. All the architectures are tested, and their advantages and disadvantages are discussed. The proposed ECG segmentation pipeline is developed for Holter monitor software.

