Deep Learning-Based Sentimental Analysis for Large-Scale Imbalanced Twitter Data

Jamal;  Xianqiao;  Aldabbas

doi:10.3390/fi11090190

Future Internet ◽

10.3390/fi11090190 ◽

2019 ◽

Vol 11 (9) ◽

pp. 190 ◽

Cited By ~ 3

Author(s):

Jamal ◽

Xianqiao ◽

Aldabbas

Keyword(s):

Deep Learning ◽

Large Scale ◽

State Of The Art ◽

Hybrid Approach ◽

Principal Component ◽

Specific Topic ◽

Weighting Method ◽

Psychological Conditions ◽

Twitter Data ◽

Wide Range

Emotion detection in social media is an effective way to measure people’s mood about a specific topic, news item, or product. It has a wide range of applications, including identifying psychological conditions such as anxiety or depression in users. However, distinguishing useful emotion features in a large text corpus is challenging because emotions are subjective, with limited, fuzzy boundaries, and may be expressed with different terminology and perceptions. To tackle this issue, this paper presents a hybrid deep learning approach based on TensorFlow with Keras for emotion detection on a large-scale, imbalanced tweet dataset. First, preprocessing steps extract useful features from raw tweets and remove noisy data. Second, the entropy weighting method is used to compute the importance of each feature. Third, a class balancer is applied to balance each class. Fourth, Principal Component Analysis (PCA) is applied to transform highly correlated features into normalized forms. Finally, a TensorFlow-based deep learning algorithm with Keras is proposed to predict high-quality features for emotion classification. The proposed methodology is evaluated on a dataset of 1,600,000 tweets collected from the website Kaggle. The proposed approach is compared with other state-of-the-art techniques at different training ratios, and the results show that it outperforms them.
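
The entropy weighting step mentioned in this abstract can be illustrated with the textbook entropy weight method. The abstract does not give the exact formulation used in the paper, so the sketch below is an assumption: it expects a non-negative feature matrix (for example term counts) with more than one row, and the function name `entropy_weights` is hypothetical.

```python
import numpy as np

def entropy_weights(X, eps=1e-12):
    """Textbook entropy weight method (an assumed formulation, not the paper's code).

    X is a non-negative (n_samples, n_features) matrix with n_samples > 1.
    Features whose mass is spread evenly over samples get weights near zero.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    col_sums = X.sum(axis=0)
    # All-zero columns carry no information, so treat them as uniform.
    safe_sums = np.where(col_sums == 0, 1.0, col_sums)
    p = np.where(col_sums == 0, 1.0 / n, X / safe_sums)
    # Normalized Shannon entropy per feature, in [0, 1].
    entropy = -(p * np.log(p + eps)).sum(axis=0) / np.log(n)
    # Low entropy means the feature discriminates between samples.
    divergence = np.clip(1.0 - entropy, 0.0, None)
    total = divergence.sum()
    if total == 0:
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return divergence / total

# Column 0 is constant (uninformative); column 1 is concentrated in one sample.
X = np.array([[1.0, 5.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
weights = entropy_weights(X)
```

In this toy matrix almost all of the weight goes to the second column, because the first column is identical in every row.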

Uncertainty-Aware Deep Learning-Based Cardiac Arrhythmias Classification Model of Electrocardiogram Signals

Computers ◽

10.3390/computers10060082 ◽

2021 ◽

Vol 10 (6) ◽

pp. 82

Author(s):

Ahmad O. Aseeri

Keyword(s):

Deep Learning ◽

Cardiac Arrhythmias ◽

Large Scale ◽

Clinical Decision Making ◽

Probabilistic Approach ◽

Classification Model ◽

Gating Mechanism ◽

Uncertainty Estimates ◽

Wide Range

Deep learning-based methods have emerged as one of the most effective and practical solutions to a wide range of medical problems, including the diagnosis of cardiac arrhythmias. A critical step toward early diagnosis of many heart dysfunctions is the accurate detection and classification of cardiac arrhythmias, which can be achieved via electrocardiograms (ECGs). Motivated by the desire to enhance conventional clinical methods for diagnosing cardiac arrhythmias, we introduce an uncertainty-aware deep learning-based predictive model for accurate large-scale classification of cardiac arrhythmias, successfully trained and evaluated on three benchmark medical datasets. In addition, because quantifying uncertainty is vital for clinical decision-making, our method incorporates a probabilistic approach that captures the model’s uncertainty using a Bayesian-based approximation method without introducing additional parameters or significant changes to the network’s architecture. Although many arrhythmia classification solutions with various ECG feature engineering techniques have been reported in the literature, the AI-based probabilistic method introduced in this paper outperforms existing methods in multiclass classification, with F1 scores of 98.62% and 96.73% on the MIT-BIH dataset (20 annotations), 99.23% and 96.94% on the INCART dataset (eight annotations), and 97.25% and 96.73% on the BIDMC dataset (six annotations), for the deep ensemble and probabilistic modes, respectively. We demonstrate our method’s high performance and statistical reliability in numerical experiments on language modeling using the gating mechanism of Recurrent Neural Networks.
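
The abstract does not name the Bayesian approximation it uses. One common family of such methods runs several stochastic forward passes (Monte Carlo dropout is an example) and summarizes how much they disagree. The sketch below assumes the per-pass class probabilities are already available; `predictive_summary` is a hypothetical helper, not the paper's code.

```python
import numpy as np

def predictive_summary(prob_samples, eps=1e-12):
    """Summarize T stochastic forward passes for one input.

    prob_samples has shape (T, num_classes); each row is a softmax output.
    Returns the predicted class, the mean probabilities, and the predictive
    entropy of the mean (higher means the model is less certain).
    """
    probs = np.asarray(prob_samples, dtype=float)
    mean = probs.mean(axis=0)
    entropy = float(-(mean * np.log(mean + eps)).sum())
    return int(mean.argmax()), mean, entropy

# Three passes that agree, and two passes that contradict each other.
confident = [[0.90, 0.10], [0.95, 0.05], [0.85, 0.15]]
conflicted = [[0.90, 0.10], [0.10, 0.90]]
label_c, mean_c, entropy_c = predictive_summary(confident)
label_u, mean_u, entropy_u = predictive_summary(conflicted)
```

A clinical pipeline could route inputs with high predictive entropy to a human reviewer; the threshold for doing so would have to be chosen on validation data.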

SHEDR: An End-to-End Deep Neural Event Detection and Recommendation Framework for Hyperlocal News Using Social Media

INFORMS Journal on Computing ◽

10.1287/ijoc.2021.1112 ◽

2021 ◽

Author(s):

Yuheng Hu ◽

Yili Hong

Keyword(s):

Neural Network ◽

Social Media ◽

Deep Learning ◽

Event Detection ◽

Large Scale ◽

Short Term Memory ◽

State Of The Art ◽

Neural Network Models ◽

Neural Event ◽

End To End

Residents often rely on newspapers and television to gather hyperlocal news for community awareness and engagement. More recently, social media have emerged as an increasingly important source of hyperlocal news. Thus far, the literature on using social media to create desirable societal benefits, such as civic awareness and engagement, is still in its infancy. One key challenge in this research stream is distilling information from noisy social media data streams for community members in a timely and accurate manner. In this work, we develop SHEDR (social media–based hyperlocal event detection and recommendation), an end-to-end neural event detection and recommendation framework with a particular use case for Twitter to facilitate residents’ information seeking of hyperlocal events. The key model innovation in SHEDR lies in the design of the hyperlocal event detector and the event recommender. First, we harness the power of two popular deep neural network models, the convolutional neural network (CNN) and long short-term memory (LSTM), in a novel joint CNN-LSTM model to characterize spatiotemporal dependencies for capturing unusualness in a region of interest, which is classified as a hyperlocal event. Next, we develop a neural pairwise ranking algorithm for recommending detected hyperlocal events to residents based on their interests. To alleviate the sparsity issue and improve personalization, our algorithm incorporates several types of contextual information covering topic, social, and geographical proximities. We perform comprehensive evaluations based on two large-scale data sets comprising geotagged tweets covering Seattle and Chicago. We demonstrate the effectiveness of our framework in comparison with several state-of-the-art approaches. We show that our hyperlocal event detection and recommendation models consistently and significantly outperform other approaches in terms of precision, recall, and F1 scores.
Summary of Contribution: In this paper, we focus on a novel and important, yet largely underexplored application of computing: how to improve civic engagement in local neighborhoods via local news sharing and consumption based on social media feeds. To address this question, we propose two new computational and data-driven methods: (1) a deep learning–based hyperlocal event detection algorithm that scans spatially and temporally to detect hyperlocal events from geotagged Twitter feeds; and (2) a personalized deep learning–based hyperlocal event recommender system that systematically integrates several contextual cues such as topical, geographical, and social proximity to recommend the detected hyperlocal events to potential users. We conduct a series of experiments to examine our proposed models. The outcomes demonstrate that our algorithms are significantly better than the state-of-the-art models and can provide users with more relevant information about the local neighborhoods that they live in, which in turn may boost their community engagement.
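
Pairwise ranking objectives of the kind mentioned here are often written as a logistic loss on the score gap between an item the user engaged with and one they did not. The abstract does not state SHEDR's exact loss, so the following is a generic sketch under that assumption; the function name and the example scores are hypothetical.

```python
import numpy as np

def pairwise_ranking_loss(pos_scores, neg_scores):
    """Mean logistic pairwise loss, -log(sigmoid(s_pos - s_neg)).

    Each position pairs the model score of an event a user engaged with
    (pos) against the score of an event they ignored (neg).
    """
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # log(1 + exp(-diff)) computed stably through logaddexp.
    return float(np.logaddexp(0.0, -diff).mean())

# Scores that rank the engaged events higher give a lower loss.
good_order = pairwise_ranking_loss([2.0, 1.5], [0.0, -0.5])
bad_order = pairwise_ranking_loss([0.0, -0.5], [2.0, 1.5])
tie = pairwise_ranking_loss([1.0], [1.0])
```

Contextual signals such as topical, social, or geographical proximity would enter through whatever model produces the scores; this sketch covers only the loss.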

DeepMAsED: evaluating the quality of metagenomic assemblies

Bioinformatics ◽

10.1093/bioinformatics/btaa124 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3011-3017 ◽

Cited By ~ 5

Author(s):

Olga Mineeva ◽

Mateo Rojas-Carulla ◽

Ruth E Ley ◽

Bernhard Schölkopf ◽

Nicholas D Youngblut

Keyword(s):

Large Scale ◽

State Of The Art ◽

Ground Truth ◽

Supplementary Information ◽

Learning Approach ◽

Wide Range ◽

Metagenome Assembly ◽

Model Training ◽

Reference Genomes

Motivation: Methodological advances in metagenome assembly are rapidly increasing the number of published metagenome assemblies. However, identifying misassemblies is challenging due to a lack of closely related reference genomes that can act as pseudo ground truth. Existing reference-free methods are no longer maintained, can make strong assumptions that may not hold across a diversity of research projects, and have not been validated on large-scale metagenome assemblies. Results: We present DeepMAsED, a deep learning approach for identifying misassembled contigs without the need for reference genomes. Moreover, we provide an in silico pipeline for generating large-scale, realistic metagenome assemblies for comprehensive model training and testing. DeepMAsED accuracy substantially exceeds the state of the art when applied to large and complex metagenome assemblies. Our model estimates a 1% contig misassembly rate in two recent large-scale metagenome assembly publications. Conclusions: DeepMAsED accurately identifies misassemblies in metagenome-assembled contigs from a broad diversity of bacteria and archaea without the need for reference genomes or strong modeling assumptions. Running DeepMAsED is straightforward, as is model re-training with our dataset generation pipeline. Therefore, DeepMAsED is a flexible misassembly classifier that can be applied to a wide range of metagenome assembly projects. Availability and implementation: DeepMAsED is available from GitHub at https://github.com/leylabmpi/DeepMAsED. Supplementary information: Supplementary data are available at Bioinformatics online.

Self-Supervised Pre-Training of Transformers for Satellite Image Time Series Classification

10.36227/techrxiv.13025039.v1 ◽

2020 ◽

Author(s):

Yuan Yuan ◽

Lei Lin

Keyword(s):

Time Series ◽

Deep Learning ◽

Large Scale ◽

Temporal Structure ◽

Satellite Image ◽

Fine Tuning ◽

Small Scale ◽

Model Parameters ◽

Learning Approaches ◽

Wide Range

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to classification accuracy gains of 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
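
The pretext task described in this abstract (predicting randomly contaminated observations of a pixel's time series) can be sketched as a data-corruption step plus a loss that is scored only at the corrupted time steps. The 15% corruption rate, the Gaussian noise, and the squared-error loss below are assumptions for illustration, and the Transformer that would produce the prediction is omitted.

```python
import numpy as np

def contaminate(series, rate=0.15, noise_scale=0.5, rng=None):
    """Corrupt a random subset of time steps of a (T, bands) pixel series.

    Returns the noisy series and a boolean mask marking corrupted steps.
    rate and noise_scale are illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    series = np.asarray(series, dtype=float)
    mask = rng.random(series.shape[0]) < rate
    noisy = series.copy()
    noise = rng.normal(0.0, noise_scale, size=(int(mask.sum()), series.shape[1]))
    noisy[mask] += noise
    return noisy, mask

def masked_mse(prediction, target, mask):
    """Mean squared error restricted to the contaminated time steps."""
    if not mask.any():
        return 0.0
    return float(((prediction[mask] - target[mask]) ** 2).mean())

rng = np.random.default_rng(0)
series = rng.random((20, 4))            # 20 acquisition dates, 4 spectral bands
noisy, mask = contaminate(series, rate=0.5, rng=rng)
perfect = masked_mse(series, series, mask)   # a perfect reconstruction scores 0
untouched = masked_mse(noisy, series, mask)  # returning the input unchanged does not
```

During pre-training a network would receive `noisy` and be trained to minimize `masked_mse` against `series`; fine-tuning on labeled data would then reuse the learned weights.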

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6174 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6917-6924 ◽

Cited By ~ 1

Author(s):

Ya Zhao ◽

Rui Xu ◽

Xinchao Wang ◽

Peng Hou ◽

Haihong Tang ◽

...

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Error Rate ◽

Large Scale ◽

State Of The Art ◽

Lip Reading ◽

Speech Recognizers ◽

Lip Movement ◽

Knowledge Distillation ◽

The One

Lip reading has witnessed unparalleled development in recent years thanks to deep learning and the availability of large-scale datasets. Despite the encouraging results achieved, the performance of lip reading unfortunately remains inferior to that of its counterpart, speech recognition, because the ambiguous nature of lip actuations makes it challenging to extract discriminant features from lip movement videos. In this paper, we propose a new method, termed Lip by Speech (LIBS), whose goal is to strengthen lip reading by learning from speech recognizers. The rationale behind our approach is that the features extracted from speech recognizers may provide complementary and discriminant clues that are difficult to obtain from the subtle movements of the lips, and consequently facilitate the training of lip readers. This is achieved, specifically, by distilling multi-granularity knowledge from speech recognizers to lip readers. To conduct this cross-modal knowledge distillation, we utilize an efficacious alignment scheme to handle the inconsistent lengths of the audio and video sequences, as well as an innovative filtering strategy to refine the speech recognizer's predictions. The proposed method achieves new state-of-the-art performance on the CMLR and LRS2 datasets, outperforming the baseline by margins of 7.66% and 2.75% in character error rate, respectively.
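
Character error rate, the metric quoted in this abstract, is conventionally the Levenshtein edit distance between the predicted and reference transcripts divided by the reference length. The sketch below implements that standard definition; the function names are hypothetical and nothing here is taken from the LIBS code.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (0 if ref_char == hyp_char else 1)
            current.append(min(previous[j] + 1,      # delete from reference
                               current[j - 1] + 1,   # insert into reference
                               substitution))
        previous = current
    return previous[-1]

def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

cer = character_error_rate("abcd", "abcf")
```

Because insertions count as errors, the rate can exceed 1.0 when the hypothesis is much longer than the reference.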

A New Hybrid Approach to Forecast Wind Power for Large Scale Wind Turbine Data Using Deep Learning with TensorFlow Framework and Principal Component Analysis

Energies ◽

10.3390/en12122229 ◽

2019 ◽

Vol 12 (12) ◽

pp. 2229 ◽

Cited By ~ 3

Author(s):

Mansoor Khan ◽

Tianqi Liu ◽

Farhan Ullah

Keyword(s):

Principal Component Analysis ◽

Renewable Energy ◽

Deep Learning ◽

Wind Power ◽

Learning Algorithm ◽

Hybrid Approach ◽

Principal Component ◽

Component Analysis ◽

Wind Data ◽

Deep Learning Algorithm

Wind power forecasting plays a vital role in renewable energy production. Accurately forecasting wind energy is a significant challenge due to the uncertain and complex behavior of wind signals, so accurate prediction methods are required. This paper presents a new hybrid approach that combines principal component analysis (PCA) and deep learning to uncover hidden patterns in wind data and to forecast wind power accurately. PCA is applied to the wind data to extract hidden features, identify meaningful information, and remove high correlation among the values. Further, an optimized deep learning algorithm built with the TensorFlow framework is used to forecast wind power from the significant features. Finally, the deep learning algorithm is fine-tuned with respect to learning error rate, optimizer function, dropout layer, and activation and loss functions. The algorithm uses a neural network and intelligent algorithm to predict the wind signals. The proposed idea is applied to three different datasets (hourly, monthly, yearly) gathered from the National Renewable Energy Laboratory (NREL) transforming energy database. The forecasting results show that the proposed research can accurately predict wind power over spans ranging from hours to years. A comparison with popular state-of-the-art algorithms demonstrates that the proposed research yields better prediction results.
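
The role PCA plays here (removing correlation among input values before the network sees them) can be shown with a minimal NumPy implementation based on the singular value decomposition. This is a generic sketch on synthetic data, not the paper's pipeline; the two "wind" columns are invented for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered data matrix.

    Returns the projected scores, the principal axes (one per row), and the
    fraction of total variance explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    _, singular_values, axes = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2
    explained_ratio = variance / variance.sum()
    components = axes[:n_components]
    return centered @ components.T, components, explained_ratio[:n_components]

# Two strongly correlated synthetic measurements (hypothetical data).
rng = np.random.default_rng(0)
speed = rng.normal(size=200)
X = np.column_stack([speed, 2.0 * speed + 0.01 * rng.normal(size=200)])
scores, components, ratio = pca(X, n_components=2)
```

The projected scores are uncorrelated by construction, and here the first component carries nearly all of the variance, so a downstream model could keep only that column.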

Enhancing Multi-tissue and Multi-scale Cell Nuclei Segmentation with Deep Metric Learning

Applied Sciences ◽

10.3390/app10020615 ◽

2020 ◽

Vol 10 (2) ◽

pp. 615 ◽

Cited By ~ 2

Author(s):

Tomas Iesmantas ◽

Agne Paulauskaite-Taraseviciene ◽

Kristina Sutiene

Keyword(s):

Deep Learning ◽

Large Scale ◽

Metric Learning ◽

Cell Nuclei ◽

Similarity Coefficients ◽

Clinical Practices ◽

Nuclei Segmentation ◽

Wide Range ◽

Triplet Loss ◽

Deep Metric Learning

(1) Background: The segmentation of cell nuclei is an essential task in a wide range of biomedical studies and clinical practices. The full automation of this process remains a challenge due to intra- and internuclear variations across a wide range of tissue morphologies and differences in staining protocols and imaging procedures. (2) Methods: A deep learning model with metric embeddings such as contrastive loss and triplet loss with semi-hard negative mining is proposed in order to accurately segment cell nuclei in a diverse set of microscopy images. The effectiveness of the proposed model was tested on a large-scale multi-tissue collection of microscopy image sets. (3) Results: The use of deep metric learning improved overall segmentation by 3.12% in the average Dice similarity coefficient compared with no metric learning. In particular, the largest gain was observed for segmenting cell nuclei in H&E-stained images when a deep learning network and triplet loss with semi-hard negative mining were used for the task. (4) Conclusion: We conclude that deep metric learning gives an additional boost to the overall learning process and consequently improves the segmentation performance. Notably, the improvement ranges approximately between 0.13% and 22.31% for different types of images in terms of Dice coefficients when compared with deep learning without metric learning.
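
Two quantities named in this abstract have standard definitions that are easy to state in code: the Dice similarity coefficient between two binary masks, and the triplet loss with the usual semi-hard negative condition. The sketch below uses squared Euclidean distances and a margin of 1.0, both of which are assumptions; the abstract does not specify them.

```python
import numpy as np

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps guards empty masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + true.sum() + eps))

def squared_distance(a, b):
    return float(np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(a, p) - d(a, n) + margin, 0) with squared Euclidean distance."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)

def is_semi_hard(anchor, positive, negative, margin=1.0):
    """Semi-hard negative: farther than the positive, but still inside the margin."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return d_pos < d_neg < d_pos + margin

dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
easy = triplet_loss([0, 0], [1, 0], [3, 0])      # negative already far away
hard = triplet_loss([0, 0], [1, 0], [1.2, 0])    # negative inside the margin
```

Semi-hard mining keeps only triplets like the last one, which still produce a non-zero loss without being dominated by mislabeled or extreme examples.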

Scale-Covariant and Scale-Invariant Gaussian Derivative Networks

Journal of Mathematical Imaging and Vision ◽

10.1007/s10851-021-01057-9 ◽

2021 ◽

Author(s):

Tony Lindeberg

Keyword(s):

Deep Learning ◽

Network Architecture ◽

Large Scale ◽

Hybrid Approach ◽

Multiple Scale ◽

Scale Space ◽

Training Data ◽

Scale Invariant ◽

Gaussian Derivative ◽

Space Operations

This paper presents a hybrid approach between scale-space theory and deep learning, where a deep learning architecture is constructed by coupling parameterized scale-space operations in cascade. By sharing the learnt parameters between multiple scale channels, and by using the transformation properties of the scale-space primitives under scaling transformations, the resulting network becomes provably scale covariant. By in addition performing max pooling over the multiple scale channels, or other permutation-invariant pooling over scales, a resulting network architecture for image classification also becomes provably scale invariant. We investigate the performance of such networks on the MNIST Large Scale dataset, which contains rescaled images from the original MNIST dataset over a factor of 4 concerning training data and over a factor of 16 concerning testing data. It is demonstrated that the resulting approach allows for scale generalization, enabling good performance for classifying patterns at scales not spanned by the training data.

Advanced Multifocus Image Fusion algorithm using FPDCT with Modified PCA

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a5312.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 175-184

Keyword(s):

Image Fusion ◽

Wavelet Transforms ◽

Quality Evaluation ◽

State Of The Art ◽

Principal Component ◽

Fusion Algorithm ◽

Pixel Intensity ◽

Wide Range ◽

Value Decomposition ◽

Processing Techniques

This paper reports image fusion of multi-focus images using the Frequency Partition Discrete Cosine Transform (FP-DCT) with a Modified Principal Component Analysis (MPCA) technique. In earlier image processing techniques, image fusion with decomposition at fixed levels may be treated as a very critical rule. The frequency partitioning approach was used in this study to select the decomposition levels based on pixel intensity and clarity. This paper also presents the modified PCA technique, which provides dimensionality reduction. A wide range of quality evaluation metrics was computed to compare fusion performance on five images. Several techniques, namely PCA, wavelet transforms with PCA, Multiresolution Singular Value Decomposition (MSVD) with PCA, Multiresolution DCT (MRDCT) with PCA, and Frequency Partitioning DCT (FP-DCT) with PCA, were computed for comparison with the proposed FP-DCT with MPCA technique. Images fused by the proposed method show enhanced visual quality and negligible information loss and discontinuities compared with other state-of-the-art methods.
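
For orientation, the plain PCA baseline listed in this abstract is usually implemented as a weighted average whose weights come from the principal eigenvector of the covariance between the two source images. The sketch below shows only that classic baseline under the assumption of two registered grayscale images of equal size; it is not the FP-DCT or Modified PCA method, whose details the abstract does not give.

```python
import numpy as np

def pca_fusion(image_a, image_b):
    """Classic PCA-weighted fusion of two registered grayscale images.

    The weights are the components of the principal eigenvector of the 2x2
    covariance matrix of the flattened images, rescaled to sum to 1.
    """
    a = np.asarray(image_a, dtype=float)
    b = np.asarray(image_b, dtype=float)
    covariance = np.cov(np.stack([a.ravel(), b.ravel()]))
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # abs() removes the arbitrary sign of the eigenvector; this assumes the
    # two images are positively correlated, as multi-focus pairs usually are.
    principal = np.abs(eigenvectors[:, np.argmax(eigenvalues)])
    weights = principal / principal.sum()
    return weights[0] * a + weights[1] * b, weights

rng = np.random.default_rng(0)
sharp = rng.random((8, 8))
flat = 0.5 * sharp                 # same content at half the contrast
fused, weights = pca_fusion(sharp, flat)
```

The image with more variance (here `sharp`) receives the larger weight, which is why this baseline tends to favor the better-focused source.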

Crop Rotation Modeling for Deep Learning-Based Parcel Classification from Satellite Time Series

Remote Sensing ◽

10.3390/rs13224599 ◽

2021 ◽

Vol 13 (22) ◽

pp. 4599

Author(s):

Félix Quinton ◽

Loic Landrieu

Keyword(s):

Time Series ◽

Deep Learning ◽

Crop Rotation ◽

Large Scale ◽

State Of The Art ◽

Crop Rotations ◽

Learning Approach ◽

Type Mapping ◽

Current State ◽

Crop Type

While annual crop rotations play a crucial role for agricultural optimization, they have been largely ignored for automated crop type mapping. In this paper, we take advantage of the increasing quantity of annotated satellite data to propose to model simultaneously the inter- and intra-annual agricultural dynamics of yearly parcel classification with a deep learning approach. Along with simple training adjustments, our model provides an improvement of over 6.3% mIoU over the current state-of-the-art of crop classification, and a reduction of over 21% of the error rate. Furthermore, we release the first large-scale multi-year agricultural dataset with over 300,000 annotated parcels.
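
The mIoU figure quoted in this abstract refers to mean intersection over union, which is normally computed per class from a confusion matrix and then averaged. The sketch below follows that standard definition for integer class labels; averaging only over classes that appear in the labels or predictions is one common convention and is an assumption here.

```python
import numpy as np

def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection over union across classes that occur at least once."""
    y_true = np.asarray(y_true, dtype=np.int64)
    y_pred = np.asarray(y_pred, dtype=np.int64)
    confusion = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Rows index the true class, columns the predicted class.
    np.add.at(confusion, (y_true, y_pred), 1)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0
    return float((intersection[present] / union[present]).mean())

# Four parcels, two crop types: one parcel of class 0 is mislabeled as class 1.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```

In this example class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.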
