Impact of Video Compression and Multimodal Embedding on Scene Description

Electronics, 2019, Vol 8 (9), pp. 963. doi:10.3390/electronics8090963

Author(s): Jin Young Lee

Keyword(s): Deep Learning, Video Compression, Automatic Generation, Image Motion, Label Information, Scene Description, The Impact, Potential Issue, Simple Network, Compression Artifacts

Scene description refers to the automatic generation of natural language descriptions from videos. In general, deep learning-based scene description networks use multiple modalities, such as image, motion, audio, and label information, to improve description quality, and image information plays a particularly important role. However, the input images may contain severe compression artifacts, which is a potential issue for scene description. Hence, this paper analyzes the impact of video compression on scene description and then proposes a simple network that is robust to compression artifacts. In addition, a network that cascades more encoding layers for efficient multimodal embedding is proposed. Experimental results show that the proposed network is more efficient than conventional networks.


Impact of Scene Content on High Resolution Video Quality

Sensors, 2021, Vol 21 (8), pp. 2872. doi:10.3390/s21082872

Author(s): Miroslav Uhrina, Anna Holesova, Juraj Bienik, Lukas Sevcik

Keyword(s): Video Compression, Spatial Information, Video Quality, Video Sequences, Compression Efficiency, Opinion Score, Scene Description, Two Parameters, The University, The Impact

This paper deals with the impact of content on perceived video quality, evaluated using the subjective Absolute Category Rating (ACR) method. The assessment was conducted on eight types of video sequences with diverse content obtained from the SJTU dataset. The sequences were encoded at five constant bitrates with two widely used video compression standards, H.264/AVC and H.265/HEVC, at Full HD and Ultra HD resolutions, which yields 160 annotated video sequences (8 × 5 × 2 × 2). The Group of Pictures (GOP) length was set to half the framerate value, as is typical for video intended for transmission over a noisy communication channel. The evaluation was performed in two laboratories: one at the University of Zilina and the other at the VSB – Technical University in Ostrava. The results acquired in the two laboratories showed a high correlation. Although the sequences with low Spatial Information (SI) and Temporal Information (TI) values reached a better Mean Opinion Score (MOS) than the sequences with higher SI and TI values, these two parameters are not sufficient for scene description, and this domain should be the subject of further research. The evaluation results led us to conclude that it is unnecessary to use the H.265/HEVC codec for compression of Full HD sequences, and that the compression efficiency of the H.265 codec at Ultra HD resolution reaches the compression efficiency of both codecs at Full HD resolution. The paper also includes recommendations for the minimum bitrate thresholds at which video sequences at both resolutions retain good and fair subjectively perceived quality.


Yet Another Automated Gleason Grading System (YAAGGS) by weakly supervised deep learning

npj Digital Medicine, 2021, Vol 4 (1). doi:10.1038/s41746-021-00469-6

Author(s): Yechan Mun, Inyoung Paik, Su-Jin Shin, Tae-Yeong Kwak, Hyeyoon Chang

Keyword(s): Deep Learning, Automatic Generation, Grading System, Case Volume, Grade Group, Kappa Score, Method Performance, Gleason Grading, Extensive Region, The Impact

The Gleason score contributes significantly to predicting prostate cancer outcomes and selecting the appropriate treatment option, but it is affected by well-known inter-observer variation. We present a novel deep learning-based automated Gleason grading system that does not require extensive region-level manual annotations by experts and/or complex algorithms for the automatic generation of region-level annotations. A total of 6664 and 936 prostate needle biopsy single-core slides (689 and 99 cases) from two institutions were used for system discovery and validation, respectively. Pathological diagnoses were converted into grade groups and used as the reference standard. The grade group prediction accuracy of the system was 77.5% (95% confidence interval (CI): 72.3–82.7%), the Cohen’s kappa score (κ) was 0.650 (95% CI: 0.570–0.730), and the quadratic-weighted kappa score (κquad) was 0.897 (95% CI: 0.815–0.979). When trained on 621 cases from one institution and validated on 167 cases from the other institution, the system reached an accuracy of 67.4% (95% CI: 63.2–71.6%), a κ of 0.553 (95% CI: 0.495–0.610), and a κquad of 0.880 (95% CI: 0.822–0.938). To evaluate the impact of the proposed method, a performance comparison with several baseline methods was also performed. While limited by case volume and a few other factors, the results of this study can contribute to the potential development of an artificial intelligence system to diagnose other cancers without extensive region-level annotations.
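The abstract reports both Cohen's kappa and quadratic-weighted kappa without defining them. The sketch below shows the standard definitions in plain Python; the function name `cohen_kappa` and the toy labels are illustrative assumptions, not the authors' code or data. The quadratic weighting penalizes a disagreement by the squared distance between the two grades, which is why κquad can be much higher than κ when most errors are off by one grade.

```python
def cohen_kappa(y_true, y_pred, n_classes, weighted=False):
    """Cohen's kappa for integer labels in range(n_classes).

    With weighted=True, quadratic disagreement weights are used.
    Assumes n_classes >= 2 and that the expected disagreement is non-zero
    (i.e. the labels are not all in a single class).
    """
    n = len(y_true)
    # Observed joint distribution of (reference, prediction) pairs.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0 / n
    # Marginal distributions of reference and predicted labels.
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    def weight(i, j):
        if weighted:
            return (i - j) ** 2 / (n_classes - 1) ** 2
        return 0.0 if i == j else 1.0

    pairs = [(i, j) for i in range(n_classes) for j in range(n_classes)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in pairs)
    disagree_exp = sum(weight(i, j) * row[i] * col[j] for i, j in pairs)
    return 1.0 - disagree_obs / disagree_exp


# Hypothetical toy labels: one case is predicted one grade too high.
reference = [0, 1, 2, 3, 4]
predicted = [1, 1, 2, 3, 4]
kappa = cohen_kappa(reference, predicted, n_classes=5)
kappa_quad = cohen_kappa(reference, predicted, n_classes=5, weighted=True)
```

In this toy case the single off-by-one error lowers the unweighted kappa more than the quadratic-weighted one, mirroring the gap between κ and κquad in the reported results.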


Predictive Analysis of the Impact of Corporate R&D Support Using Deep Learning

Journal of Korea Technology Innovation Society, 2020, Vol 23 (1), pp. 20-41. doi:10.35978/jktis.2020.2.23.1.20

Author(s): Pilseong Jang, Jaeyoun You, Seung Hwan Oh

Keyword(s): Deep Learning, Predictive Analysis, The Impact


An Ensemble Deep Learning Approach to Explore the Impact of Enticement, Engagement and Experience in Reward Based Crowdfunding

SSRN Electronic Journal, 2020. doi:10.2139/ssrn.3615176

Author(s): Arvind Srinivasan, Akilandeswari P

Keyword(s): Deep Learning, Learning Approach, The Impact


The impact of deep learning on document classification using semantically rich representations

Information Processing & Management, 2019, Vol 56 (5), pp. 1618-1632. doi:10.1016/j.ipm.2019.05.003. Cited by ~19

Author(s): Zenun Kastrati, Ali Shariq Imran, Sule Yildirim Yayilgan

Keyword(s): Deep Learning, Document Classification, The Impact


Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media

Journal of Big Data, 2021, Vol 8 (1). doi:10.1186/s40537-021-00488-w

Author(s): Yahya Albalawi, Jim Buckley, Nikola S. Nikolov

Keyword(s): Social Media, Deep Learning, Comprehensive Evaluation, Classification Problem, Data Sets, Word Embeddings, Data Set, Lower Accuracy, Health Related, The Impact

This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processing techniques applied to Arabic tweets while training a classifier to identify health-related tweets. For this task we use the traditional machine learning classifiers KNN, SVM, Multinomial NB, and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another only for testing the generality of our models. Our results point to the conclusion that only four of the 26 pre-processing techniques improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier, with an F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.
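The abstract reports F1 score and accuracy side by side, and the two can diverge considerably. The following sketch shows how both are computed from binary confusion counts; the function name and the counts are hypothetical illustrations, not values from the paper.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) for a binary classifier from confusion counts.

    Assumes at least one true positive so that precision and recall are defined.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts with few positives: many true negatives lift accuracy,
# while F1 ignores true negatives entirely.
acc, f1 = accuracy_and_f1(tp=3, fp=1, fn=1, tn=15)
```

Because F1 does not count true negatives, a data set dominated by the negative class can yield high accuracy together with a noticeably lower F1 score.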


Data augmentation for computed tomography angiography via synthetic image generation and neural domain adaptation

Current Directions in Biomedical Engineering, 2020, Vol 6 (1). doi:10.1515/cdbme-2020-0015

Author(s): Malte Seemann, Lennart Bargsten, Alexander Schlaefer

Keyword(s): Computed Tomography, Neural Networks, Deep Learning, Medical Imaging, Computed Tomography Angiography, Data Augmentation, Domain Adaptation, Synthetic Image, Wide Range, The Impact

Deep learning methods produce promising results when applied to a wide range of medical imaging tasks, including segmentation of artery lumen in computed tomography angiography (CTA) data. However, to perform adequately, neural networks have to be trained on large amounts of high-quality annotated data. In medical imaging, annotations are not only scarce but also often not entirely reliable. To tackle both challenges, we developed a two-step approach for generating realistic synthetic CTA data for the purpose of data augmentation. In the first step, moderately realistic images are generated in a purely numerical fashion. In the second step, these images are improved by applying neural domain adaptation. We evaluated the impact of synthetic data on lumen segmentation via convolutional neural networks (CNNs) by comparing the resulting performances. Improvements of up to 5% in terms of Dice coefficient and 20% for Hausdorff distance represent a proof of concept that the proposed augmentation procedure can be used to enhance deep learning-based segmentation of artery lumen in CTA images.
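The Dice coefficient used above measures the overlap between a predicted and a reference segmentation mask. A minimal sketch of the standard definition follows, assuming binary masks flattened to 0/1 sequences; the function name, the empty-mask convention, and the toy masks are illustrative assumptions rather than the authors' implementation.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 4-pixel masks: one pixel overlaps, each mask marks two pixels.
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

A value of 1 means the masks coincide exactly and 0 means they do not overlap at all.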


Are e-learning Webinars the future of medical education? An exploratory study of a disruptive innovation in the COVID-19 era

Cardiology in the Young, 2020, pp. 1-10. doi:10.1017/s1047951120004503

Author(s): Colin J. McMahon, Justin T. Tretter, Theresa Faulkner, R. Krishna Kumar, Andrew N. Redington, ...

Keyword(s): Deep Learning, Carbon Footprint, Quantitative Research, Survey Design, Hybrid Approach, Disruptive Innovation, Cross Sectional Survey, Cross Sectional, E Learning, The Impact

Objective: This study investigated the impact of the Webinar on deep human learning of CHD. Materials and methods: This cross-sectional survey design study used an open and closed-ended questionnaire to assess the impact of the Webinar on deep learning of topical areas within the management of post-operative tetralogy of Fallot patients. This was a quantitative research methodology using descriptive statistical analyses with a sequential explanatory design. Results: One thousand three hundred and seventy-four participants from 100 countries on 6 continents joined the Webinar, 557 (40%) of whom completed the questionnaire. Over 70% of participants reported that they “agreed” or “strongly agreed” that the Webinar format promoted deep learning for each of the topics compared to other standard learning methods (textbook and journal learning). Two-thirds expressed a preference for attending a Webinar rather than an international conference. Over 80% of participants highlighted significant barriers to attending conferences, including cost (79%), distance to travel (49%), time commitment (51%), and family commitments (35%). Strengths of the Webinar included expertise, concise high-quality presentations often discussing contentious issues, and the platform quality. The main weakness was the limited time for questions. Just over 53% expressed concern about the carbon footprint involved in attending conferences and preferred to attend a Webinar. Conclusion: E-learning Webinars represent a disruptive innovation, which promotes deep learning, greater multidisciplinary participation, and greater attendee satisfaction with fewer barriers to participation. Although Webinars will never fully replace conferences, a hybrid approach may reduce the need for conferencing, reduce carbon footprint, and promote a “sustainable academia”.


The Impact of Arabic Part of Speech Tagging on Sentiment Analysis: A New Corpus and Deep Learning Approach

Procedia Computer Science, 2021, Vol 184, pp. 148-155. doi:10.1016/j.procs.2021.03.026

Author(s): Abdul Munem Nerabie, Manar AlKhatib, Sujith Samuel Mathew, May El Barachi, Farhad Oroumchian

Keyword(s): Deep Learning, Sentiment Analysis, Learning Approach, Part Of Speech Tagging, Part Of Speech, The Impact, Speech Tagging


Development of deep learning‐based equipment heat load detection for energy demand estimation and investigation of the impact of illumination

International Journal of Energy Research, 2020. doi:10.1002/er.6306. Cited by ~1

Author(s): Shuangyu Wei, John Calautit

Keyword(s): Deep Learning, Energy Demand, Heat Load, Demand Estimation, Load Detection, The Impact
