CNN Variants for Computer Vision: History, Architecture, Application, Challenges and Future Scope

Electronics ◽  
2021 ◽  
Vol 10 (20) ◽  
pp. 2470
Author(s):  
Dulari Bhatt ◽  
Chirag Patel ◽  
Hardik Talsania ◽  
Jigar Patel ◽  
Rasmika Vaghela ◽  
...  

Computer vision is an increasingly prominent area within image processing. With the emergence of computer vision applications, there is significant demand for automatic object recognition. Deep CNNs (convolutional neural networks) have benefited the computer vision community by producing excellent results in video processing, object recognition, image classification and segmentation, natural language processing, speech recognition, and many other fields. Furthermore, the availability of large amounts of data and of readily accessible hardware has opened new avenues for CNN research. Several inspirational concepts for the progress of CNNs have been investigated, including alternative activation functions, regularization, parameter optimization, and architectural advances. Architectural innovations in particular have yielded tremendous enhancements in the capacity of deep CNNs, with significant emphasis placed on leveraging channel and spatial information, depth of architecture, and multi-path information processing. This survey paper focuses mainly on the primary taxonomy and newly released deep CNN architectures, dividing numerous recent developments in CNN architectures into eight groups: spatial exploitation, multi-path, depth, breadth, dimension, channel boosting, feature-map exploitation, and attention-based CNNs. The main contribution of this manuscript is a comparison of the various architectural evolutions of CNNs in terms of their architectural changes, strengths, and weaknesses. It also includes an explanation of CNN components, the strengths and weaknesses of various CNN variants, research gaps and open challenges, CNN applications, and future research directions.
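The spatial exploitation that these architectures share rests on the convolution operation itself. As an illustrative aside (not drawn from the surveyed paper), here is a minimal NumPy sketch of a valid 2D cross-correlation, the per-channel operation a convolutional layer applies; the image and kernel values are made up for demonstration:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation: slide the kernel over the image and
    sum the elementwise products at each position (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])  # tiny horizontal-gradient filter
print(conv2d(img, edge))       # each entry is img[i,j] - img[i,j+1]
```

Real CNN layers stack many such filters per channel and learn their weights by backpropagation; this loop form only shows the spatial mechanics.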

2020 ◽  
Author(s):  
Anwaar Ulhaq ◽  
Asim Khan ◽  
Douglas Pinto Sampaio Gomes ◽  
Manoranjan Paul

The COVID-19 pandemic has triggered an urgent need to contribute to the fight against an immense threat to the human population. Computer vision, as a subfield of artificial intelligence, has enjoyed recent success in solving various complex problems in health care and has the potential to contribute to the fight to control COVID-19. In response to this call, computer vision researchers are putting their knowledge base to work to devise effective ways to counter the COVID-19 challenge and serve the global community. New contributions are being shared with every passing day. This motivated us to review the recent work, collect information about available research resources, and indicate future research directions, and to make this material available to computer vision researchers to save them precious time. This survey paper is intended to provide a preliminary review of the available literature on computer vision efforts against the COVID-19 pandemic.


Image captioning is the process of assigning a meaningful title to a given image with the help of Natural Language Processing (NLP) and computer vision techniques. Captioning an image first requires identifying the objects, their attributes, and the relationships among them in the image, and second requires generating a relevant description of the image, so the task demands both NLP and computer vision techniques. The complexity of finding the relationship between an object's attributes and its features makes this a challenging task, and it is difficult for a machine to emulate the human brain; nevertheless, research has shown prominent achievements in this field and made such problems tractable. The foremost aim of this survey paper is to describe several methods for image captioning; its core contribution is to categorize the different existing approaches, discuss and classify their subcategories, and examine some of their strengths and limitations. This survey paper gives a theoretical analysis of image captioning methods, covering both earlier and newer approaches, and serves as a source of information for researchers seeking an overview of the approaches developed so far in the field of image captioning. Keywords: Computer Vision, Deep Learning, Neural Network, NLP, Image Captioning, Multimodal Learning.
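The two-step pipeline the abstract describes (detect objects and attributes, then generate a description) can be caricatured with a toy template-based generator; the detector output and phrasing below are entirely hypothetical stand-ins, since real systems use learned detectors and neural language models:

```python
# Stand-in for the output of a computer vision detector:
# (object, attribute) pairs found in the image.
detections = [("dog", "brown"), ("ball", "red")]

def caption(detections):
    """Template-based natural language generation: turn
    (object, attribute) pairs into a single descriptive sentence."""
    phrases = [f"a {attr} {obj}" for obj, attr in detections]
    return "An image of " + " and ".join(phrases) + "."

print(caption(detections))  # An image of a brown dog and a red ball.
```

Modern approaches replace both stages with learned models (e.g., a CNN encoder feeding a recurrent or transformer decoder), but the division of labor between recognition and generation is the same.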


2021 ◽  
Vol 3 (1) ◽  
pp. 13-26
Author(s):  
Mathias-Felipe de-Lima-Santos ◽  
Wilson Ceron

In recent years, news media has been greatly disrupted by the potential of technologically driven approaches in the creation, production, and distribution of news products and services. Artificial intelligence (AI) has emerged from the realm of science fiction and has become a very real tool that can aid society in addressing many issues, including the challenges faced by the news industry. The ubiquity of computing has become apparent and has demonstrated the different approaches that can be achieved using AI. We analyzed the news industry’s AI adoption based on the seven subfields of AI: (i) machine learning; (ii) computer vision (CV); (iii) speech recognition; (iv) natural language processing (NLP); (v) planning, scheduling, and optimization; (vi) expert systems; and (vii) robotics. Our findings suggest that three subfields are being developed more in the news media: machine learning, computer vision, and planning, scheduling, and optimization. Other areas have not been fully deployed in the journalistic field. Most AI news projects rely on funds from tech companies such as Google. This limits AI’s potential to a small number of players in the news industry. We conclude by providing examples of how these subfields are being developed in journalism and present an agenda for future research.


2020 ◽  
Vol 20 (04) ◽  
pp. 2050028
Author(s):  
Ajoy Mondal

Moving object detection and tracking have various applications, including surveillance, anomaly detection, and vehicle navigation. The literature on object detection and tracking is rich, and several essential survey papers exist. However, research on camouflaged object detection and tracking is limited due to the complexity of the problem. Existing work on this problem is based either on the biological characteristics of camouflaged objects or on computer vision techniques. In this paper, we review the existing camouflaged object detection and tracking techniques that use computer vision algorithms from a theoretical point of view. The paper also addresses several issues of interest as well as future research directions in this area. We hope it will help readers learn about recent advances in camouflaged object detection and tracking.


2021 ◽  
Vol 7 ◽  
pp. e598
Author(s):  
Wenjie Yin ◽  
Arkaitz Zubiaga

Hate speech is one type of harmful online content which directly attacks or promotes hate towards a group or an individual member based on their actual or perceived aspects of identity, such as ethnicity, religion, and sexual orientation. With online hate speech on the rise, its automatic detection as a natural language processing task is gaining increasing interest. However, it is only recently that it has been shown that existing models generalise poorly to unseen data. This survey paper attempts to summarise how generalisable existing hate speech detection models are and the reasons why hate speech models struggle to generalise, sums up existing attempts at addressing the main obstacles, and then proposes directions of future research to improve generalisation in hate speech detection.


2021 ◽  
Author(s):  
Nicholas J. Tustison ◽  
Talissa A. Altes ◽  
Kun Qing ◽  
Mu He ◽  
G. Wilson Miller ◽  
...  

Magnetic resonance imaging (MRI) using hyperpolarized gases has made possible the novel visualization of airspaces in the human lung, which has advanced research into the growth, development, and pathologies of the pulmonary system. In conjunction with the innovations associated with image acquisition, multiple image analysis strategies have been proposed and refined for the quantification of such lung imaging with much research effort devoted to semantic segmentation, or voxelwise classification, into clinically oriented categories based on ventilation levels. Given the functional nature of these images and the consequent sophistication of the segmentation task, many of these algorithmic approaches reduce the complex spatial image information to intensity-only considerations, which can be contextualized in terms of the intensity histogram. Although facilitating computational processing, this simplifying transformation results in the loss of important spatial cues for identifying salient image features, such as ventilation defects (a well-studied correlate of lung pathophysiology), as spatial objects. In this work, we discuss the interrelatedness of the most common approaches for histogram-based optimization of hyperpolarized gas lung imaging segmentation and demonstrate how certain assumptions lead to suboptimal performance, particularly in terms of measurement precision. In contrast, we illustrate how a convolutional neural network is optimized (i.e., trained) directly within the image domain to leverage spatial information. This image-based optimization mitigates the problematic issues associated with histogram-based approaches and suggests a preferred future research direction. Importantly, we provide the entire processing and evaluation framework, including the newly reported deep learning functionality, as open-source through the well-known Advanced Normalization Tools ecosystem.
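The intensity-only reduction criticized above can be sketched in a few lines; this toy example (synthetic data, not the authors' pipeline) classifies every voxel of a fake "ventilation image" purely by its intensity quantile, which is exactly the step that discards spatial context:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy stand-in for a hyperpolarized-gas ventilation image: a 16x16
# array of intensities in [0, 1). Real data has spatial structure.
image = rng.random((16, 16))

# Histogram-based segmentation: assign each voxel to one of four
# ventilation classes by intensity quartile alone. Note that two
# voxels with equal intensity get the same label no matter where
# they sit in the lung -- the loss of spatial cues discussed above.
edges = np.quantile(image, [0.25, 0.5, 0.75])
labels = np.digitize(image, edges)  # classes 0..3, low to high ventilation
print(np.bincount(labels.ravel())) # voxels per ventilation class
```

A CNN-based segmenter, by contrast, is trained on image patches, so the label of a voxel can depend on its neighborhood rather than on its intensity alone.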


2022 ◽  
Author(s):  
Ms. Aayushi Bansal ◽  
Dr. Rewa Sharma ◽  
Dr. Mamta Kathuria

Recent advancements in deep learning architectures have increased their utility in real-life applications. Deep learning models require a large amount of data for training. In many application domains, such as marketing, computer vision, and medical science, only a limited set of data is available for training neural networks, because collecting new data is either infeasible or resource-intensive; without sufficient data, these models are prone to overfitting. One data-space solution to the problem of limited data is data augmentation. This study focuses on various data augmentation techniques that can be used to further improve the accuracy of a neural network. Augmenting available data saves the cost and time required to collect new data for training deep neural networks, and it also regularizes the model and improves its capability for generalization. The need for large datasets in different fields such as computer vision, natural language processing, security, and healthcare is also covered in this survey paper. The goal of this paper is to provide a comprehensive survey of recent advancements in data augmentation techniques and their application in various domains.
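As a minimal sketch of the idea (not a technique from the surveyed paper), the snippet below generates three label-preserving variants of a single image array; the specific transforms and noise scale are illustrative choices:

```python
import numpy as np

def augment(image, rng):
    """Return simple label-preserving variants of a 2D image array:
    each transform yields a 'new' training sample from the same image."""
    flipped = image[:, ::-1]                           # horizontal flip
    noisy = image + rng.normal(0, 0.05, image.shape)   # additive Gaussian noise
    shifted = np.roll(image, shift=1, axis=0)          # 1-pixel vertical shift
    return [flipped, noisy, shifted]

rng = np.random.default_rng(0)
img = rng.random((8, 8))        # toy 8x8 "image"
variants = augment(img, rng)
print(len(variants))            # 3 augmented samples from one original
```

In practice such transforms are applied on the fly during training (and composed randomly), so the effective dataset size grows without collecting any new data.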


Electronics ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 1216
Author(s):  
Sung-Wook Park ◽  
Jae-Sub Ko ◽  
Jun-Ho Huh ◽  
Jong-Chan Kim

The emergence of the deep learning model GAN (Generative Adversarial Networks) is an important turning point in generative modeling. GANs are more powerful in feature and expression learning than machine learning-based generative model algorithms. Nowadays, generative models are also used to produce non-image data such as voice and natural language; typical technologies include BERT (Bidirectional Encoder Representations from Transformers), GPT-3 (Generative Pre-trained Transformer 3), and MuseNet. A GAN differs from machine learning-based generative models in its objective function. Training is conducted by two networks: a generator and a discriminator. The generator converts random noise into a true-to-life image, whereas the discriminator distinguishes whether an input image is real or synthetic. As training continues, the generator learns more sophisticated synthesis techniques, and the discriminator grows into a more accurate differentiator. GANs have problems such as mode collapse, training instability, and the lack of reliable evaluation metrics, and many researchers have tried to solve them; solutions such as one-sided label smoothing, instance normalization, and minibatch discrimination have been proposed. The field of application has also expanded. This paper provides an overview of GANs and application solutions for researchers in computer vision and artificial intelligence for healthcare. The structure and operating principle of GANs, the core GAN models proposed to date, and the theory of GANs are analyzed. Application examples of GANs such as image classification and regression, image synthesis and inpainting, image-to-image translation, super-resolution, and point registration are then presented. The discussion tackles GANs’ problems and solutions, and a future research direction is finally proposed.
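The two-network objective and the one-sided label smoothing fix mentioned above can be illustrated numerically; the discriminator scores below are made-up values standing in for network outputs, and the losses are the standard binary cross-entropy formulation rather than any specific paper's variant:

```python
import numpy as np

def bce(preds, labels):
    """Binary cross-entropy, the loss both GAN networks optimize."""
    eps = 1e-12  # guard against log(0)
    return -np.mean(labels * np.log(preds + eps)
                    + (1 - labels) * np.log(1 - preds + eps))

# Discriminator: wants real samples scored near 1, fakes near 0.
d_real = np.array([0.9, 0.8])   # hypothetical scores on real images
d_fake = np.array([0.2, 0.1])   # hypothetical scores on generated images
d_loss = bce(d_real, np.ones(2)) + bce(d_fake, np.zeros(2))

# Generator: wants the discriminator to score its fakes as real.
g_loss = bce(d_fake, np.ones(2))

# One-sided label smoothing (mentioned in the abstract): soften the
# discriminator's real-sample targets from 1.0 to e.g. 0.9, which
# keeps it from becoming overconfident and destabilizing training.
d_loss_smooth = bce(d_real, np.full(2, 0.9)) + bce(d_fake, np.zeros(2))
print(d_loss, g_loss, d_loss_smooth)
```

Training alternates gradient steps on these two losses, which is the adversarial game the abstract describes.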

