EnsemV3X: a novel ensembled deep learning architecture for multi-label scene classification

2021 ◽  
Vol 7 ◽  
pp. e557
Author(s):  
Priyal Sobti ◽  
Anand Nayyar ◽  
Niharika ◽  
Preeti Nagrath

Convolutional neural networks are widely used for image classification, typically via pretraining on ImageNet followed by fine-tuning, in which the learned features are adapted to the target task. ImageNet is a large database of roughly 15 million images in about 22,000 categories; the images, collected from the Web, are labeled by human annotators through the Amazon Mechanical Turk crowd-sourcing tool. ImageNet is useful for transfer learning because of the sheer volume of its dataset and the number of object classes available. Transfer learning with pretrained models helps build computer vision models accurately and inexpensively: models pretrained on substantial datasets are repurposed for the task at hand. Scene recognition is a widely used computer vision application in many communities and industries, such as tourism. This study demonstrates multilabel scene classification using five architectures, namely VGG16, VGG19, ResNet50, InceptionV3, and Xception, with the ImageNet weights available in the Keras library, and comprehensively compares their performance. Finally, EnsemV3X is presented. With a reduced number of parameters, the proposed model is superior to the state-of-the-art Inception and Xception models, demonstrating an accuracy of 91%.
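The ensembling step behind models like EnsemV3X is commonly soft voting: averaging the per-class probabilities predicted by several backbones and taking the argmax. A minimal sketch of that idea; the backbone names and probability values below are illustrative, not taken from the paper.

```python
def ensemble_average(prob_lists):
    """Soft voting: average class-probability vectors from several models."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]

# Illustrative softmax outputs from three backbones for one image
inception = [0.70, 0.20, 0.10]
xception  = [0.60, 0.30, 0.10]
vgg16     = [0.50, 0.25, 0.25]

avg = ensemble_average([inception, xception, vgg16])
predicted = max(range(len(avg)), key=avg.__getitem__)  # index of the top class
```

In a real pipeline each list would come from `model.predict` on the same input; the averaging step itself is unchanged.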

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7950
Author(s):  
Radhakrishnan Gopalapillai ◽  
Deepa Gupta ◽  
Mohammed Zakariah ◽  
Yousef Ajami Alotaibi

Classification of indoor environments is a challenging problem. The availability of low-cost depth sensors has opened up a new research area of using depth information in addition to color image (RGB) data for scene understanding. Transfer learning of deep convolutional networks with pairs of RGB and depth (RGB-D) images has to deal with integrating these two modalities. Single-channel depth images are often converted to three-channel images by extracting horizontal disparity, height above ground, and the angle of the pixel’s local surface normal (HHA) to apply transfer learning using networks trained on the Places365 dataset. The high computational cost of HHA encoding can be a major disadvantage for the real-time prediction of scenes, although this may be less important during the training phase. We propose a new, computationally efficient encoding method that can be integrated with any convolutional neural network. We show that our encoding approach performs equally well or better in a multimodal transfer learning setup for scene classification. Our encoding is implemented in a customized and pretrained VGG16 Net. We address the class imbalance problem seen in the image dataset using a method based on the synthetic minority oversampling technique (SMOTE) at the feature level. With appropriate image augmentation and fine-tuning, our network achieves scene classification accuracy comparable to that of other state-of-the-art architectures.
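The class-balancing step above applies SMOTE to deep-feature vectors rather than raw images. The core of SMOTE is interpolation: a synthetic minority sample is placed at a random point on the segment between a minority sample and one of its minority-class neighbors. A dependency-free sketch on toy 2-D features (the vectors and seed are illustrative):

```python
import random

def smote_sample(x, neighbor, rng):
    """Synthesize one minority-class sample: x + gap * (neighbor - x)."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)
minority = [[1.0, 1.0], [2.0, 2.0]]
synthetic = smote_sample(minority[0], minority[1], rng)
# synthetic lies on the segment between the two minority points
```

In practice one would use a library implementation (e.g. imbalanced-learn's `SMOTE`) on the extracted feature matrix; the sketch shows only the interpolation rule.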


2021 ◽  
Vol 5 (3) ◽  
pp. 325
Author(s):  
Hendra Bunyamin

Inductive transfer learning has made a huge impact on the computer vision field. In particular, computer vision applications such as object detection, classification, and segmentation are rarely trained from scratch; instead, they are fine-tuned from models pretrained on huge datasets. In contrast, state-of-the-art natural language processing models are still generally trained from the ground up. Accordingly, this research investigates the adoption of transfer learning for natural language processing. Specifically, we apply a transfer learning technique called Universal Language Model Fine-tuning (ULMFiT) to an Indonesian news text classification task. The dataset for constructing the language model was collected from several news providers from January to December 2017, whereas the dataset for the text classification task comes from news articles provided by the Agency for the Assessment and Application of Technology (BPPT). To examine the impact of ULMFiT, we provide a baseline: a vanilla neural network with two hidden layers. Although ULMFiT's performance on the validation set is lower than that of our baseline, we find that ULMFiT significantly reduces overfitting on the classification task, shrinking the gap between training and validation accuracies from 4% to nearly zero.
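One of ULMFiT's fine-tuning ingredients is discriminative learning rates: each lower layer trains with the rate of the layer above divided by 2.6, the factor recommended by Howard and Ruder. A small sketch of that schedule; the base rate and layer count are chosen for illustration only.

```python
def discriminative_lrs(base_lr, n_layers, factor=2.6):
    """Per-layer learning rates, top layer first: lr, lr/2.6, lr/2.6**2, ..."""
    return [base_lr / (factor ** depth) for depth in range(n_layers)]

lrs = discriminative_lrs(0.01, 4)
# the top layer adapts fastest; the deepest layer gets 0.01 / 2.6**3
```

These per-group rates would be passed to the optimizer's parameter groups; ULMFiT combines them with gradual unfreezing and slanted triangular schedules, which are omitted here.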


2020 ◽  
Vol 9 (2) ◽  
pp. 100-110
Author(s):  
Ahmad Mustafid ◽  
Muhammad Murah Pamuji ◽  
Siti Helmiyah

Deep learning is an essential technique for classification problems in machine learning, based on artificial neural networks. A general issue in deep learning is that models are data-hungry, requiring a plethora of data for training. Wayang is a shadow-puppet theater art from Indonesia, especially in Javanese culture, with several nearly indistinguishable characters. In this paper, we propose steps and techniques for classifying the characters and for handling the small size of a wayang dataset, using model selection, transfer learning, and fine-tuning to obtain efficient and precise accuracy on our classification problem. The research used 50 images for each of 24 wayang character classes. We collected and implemented various architectures, from the earliest deep learning models to the latest state-of-the-art proposals. Transfer learning and fine-tuning showed a significant increase in both training and validation accuracy. With transfer learning, it was possible to build a deep learning model with good classifiers in a short time on a small dataset: both EfficientNetB0 and MobileNetV3-small reached 100% training accuracy, with validation accuracies of 98.33% and 98.75%, respectively.
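Transfer learning on a small dataset, as above, typically freezes the pretrained backbone and trains only a new classification head, which drastically cuts the number of trainable parameters. A toy sketch of that bookkeeping; the layer names and parameter counts are invented for illustration, not taken from the paper.

```python
def split_params(layers, n_frozen):
    """Return (frozen, trainable) parameter totals when the first
    n_frozen layers of a pretrained model are kept fixed."""
    frozen = sum(count for _, count in layers[:n_frozen])
    trainable = sum(count for _, count in layers[n_frozen:])
    return frozen, trainable

# (layer name, parameter count) -- illustrative numbers only
layers = [("conv_base_1", 40_000), ("conv_base_2", 160_000), ("new_head", 24_024)]
frozen, trainable = split_params(layers, n_frozen=2)  # train only the new head
```

In Keras this corresponds to setting `layer.trainable = False` on the base layers before compiling; fine-tuning later unfreezes some of them with a small learning rate.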


2020 ◽  
Vol 10 (21) ◽  
pp. 7641
Author(s):  
Jaeho Oh ◽  
Mincheol Kim ◽  
Sang-Woo Ban

In this paper, we propose a deep convolutional neural network model with transfer learning that reflects personal preferences from inter-domain databases of images having atypical visual characteristics. The proposed model utilized three public image databases (Fashion-MNIST, Labeled Faces in the Wild [LFW], and Indoor Scene Recognition) that include images with atypical visual characteristics in order to train and infer personal visual preferences. The effectiveness of transfer learning for incremental preference learning was verified by experiments using inter-domain visual datasets with different visual characteristics. Moreover, a gradient-weighted class activation mapping (Grad-CAM) approach was applied to the proposed model, providing explanations about personal visual preference possibilities. Experiments showed that the proposed preference-learning model using transfer learning outperformed a preference model not using transfer learning. In terms of the accuracy of preference recognition, the proposed model showed a maximum improvement of about 7.6% for the LFW database and about 9.4% for the Indoor Scene Recognition database, compared to the model without transfer learning.
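Grad-CAM, used above for explanations, weights each convolutional feature map by the spatial average of the class-score gradient flowing into it, sums the weighted maps, and applies a ReLU. A dependency-free sketch on tiny hand-made 2x2 maps (all values are illustrative):

```python
def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: lists of K maps, each an HxW list of lists."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # alpha_k: global-average-pooled gradient for channel k
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    cam = [[0.0] * w for _ in range(h)]
    for a, fmap in zip(alphas, feature_maps):
        for i in range(h):
            for j in range(w):
                cam[i][j] += a * fmap[i][j]
    return [[max(0.0, v) for v in row] for row in cam]  # ReLU keeps positive evidence

maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(maps, grads)
```

In a real network the maps and gradients come from the last convolutional layer via automatic differentiation; the heatmap is then upsampled onto the input image.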


2021 ◽  
Vol 12 ◽  
Author(s):  
Yeong Hyeon Gu ◽  
Helin Yin ◽  
Dong Jin ◽  
Jong-Han Park ◽  
Seong Joon Yoo

Past studies of plant disease and pest recognition used classification methods that presented a single recognition result to the user. Unfortunately, incorrect recognition results may be output, which may lead to further crop damage. To address this issue, there is a need for a system that suggests several candidate results and allows the user to make the final decision. In this study, we propose a method for diagnosing plant diseases and identifying pests using deep features based on transfer learning. To extract deep features, we employ VGG and ResNet 50 architectures pre-trained on the ImageNet dataset, and retrieve disease and pest images similar to a query image via a k-nearest-neighbor algorithm. We use a total of 23,868 images of 19 types of hot-pepper diseases and pests, on which the proposed model achieves accuracies of 96.02% and 99.61%, respectively. We also measure the effects of fine-tuning and of the distance metric. The results show that fine-tuning-based deep features increase accuracy by approximately 0.7–7.38%, and that the Bray–Curtis distance yields an accuracy approximately 0.65–1.51% higher than the Euclidean distance.
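The retrieval step above ranks database images by the distance between their deep-feature vectors and the query's, with Bray–Curtis distance (sum of |u-v| divided by sum of |u+v|) found slightly better than Euclidean. A minimal k-nearest-neighbor sketch; the toy feature vectors are illustrative stand-ins for CNN features.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative feature vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0

def knn(query, database, k):
    """Indices of the k database vectors closest to the query."""
    order = sorted(range(len(database)),
                   key=lambda i: bray_curtis(query, database[i]))
    return order[:k]

db = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
nearest = knn([1.0, 0.05, 0.0], db, k=2)  # candidate results shown to the user
```

Returning the k nearest images, rather than a single argmax label, is what lets the user make the final call among several candidates.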


Author(s):  
Anusha B

With the rapid development of Internet technology and the increasing use of mobile devices, it is very easy for users to capture, communicate, and share images over networks. The spectacular achievements of convolutional neural networks in computer vision help us match highly similar features between images in order to detect duplicate versions of an image. This project uses a model trained on ImageNet, a large database containing many images of different categories. The Flask framework, which includes many libraries and modules that help web developers write web applications, is used to build the interface. The user uploads an image, whose features are extracted and fed to the CNN model. The model then computes the similarity distance to the images already present in the database and returns the top four images that are duplicate versions of the uploaded image.
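The top-four retrieval described above can be sketched as cosine-similarity ranking over feature vectors. The 2-D vectors below stand in for the CNN features the project would actually extract; all values are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_duplicates(query_feat, db_feats, k=4):
    """Indices of the k database images most similar to the query."""
    order = sorted(range(len(db_feats)),
                   key=lambda i: cosine(query_feat, db_feats[i]),
                   reverse=True)
    return order[:k]

db = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.9, 0.1], [0.7, 0.7], [-1.0, 0.0]]
hits = top_duplicates([1.0, 0.1], db, k=4)
```

In the web application, `db` would be a precomputed matrix of features for every stored image, refreshed whenever new images are added.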


2019 ◽  
Vol 3 (Special Issue on First SACEE'19) ◽  
pp. 165-172
Author(s):  
Vincenzo Bianco ◽  
Giorgio Monti ◽  
Nicola Pio Belfiore

The use of friction pendulum devices has recently attracted the attention of both academic and professional engineers for the protection of structures in seismic areas. Although the effectiveness of these devices has been shown by experimental testing carried out worldwide, many aspects still need to be investigated for further improvement and optimisation. This paper presents a thermo-mechanical model of a double friction pendulum device, based on the most recent modelling techniques adopted in multibody dynamics. The proposed model is based on the observation that sliding may not take place as ideally as indicated in the literature. On the contrary, the fulfilment of geometrical compatibility between the constitutive bodies during an earthquake suggests a very peculiar dynamic behaviour composed of a continuous alternation of sticking and slipping phases. The process of fine-tuning the selected modelling strategy, among those available to date, is also described.
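The alternation of sticking and slipping phases mentioned above follows the Coulomb friction criterion: the sliding interface sticks while the tangential force stays within mu times the normal force, and slips once it exceeds that bound. A minimal phase-classification sketch; the force history, friction coefficient, and normal load are illustrative, not taken from the paper.

```python
def contact_phase(tangential_force, normal_force, mu):
    """Coulomb criterion: 'stick' while |F_t| <= mu * N, else 'slip'."""
    return "stick" if abs(tangential_force) <= mu * normal_force else "slip"

# Illustrative tangential-force history on the slider (kN), mu = 0.06, N = 100 kN
history = [2.0, 5.0, 6.5, 4.0]
phases = [contact_phase(ft, 100.0, 0.06) for ft in history]
# the phase alternates as the force crosses the friction threshold of 6 kN
```

A full device model would update velocities and frictional heating differently in each phase; the sketch only shows the switching condition.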


Sensors ◽  
2019 ◽  
Vol 19 (22) ◽  
pp. 4850
Author(s):  
Carlos S. Pereira ◽  
Raul Morais ◽  
Manuel J. C. S. Reis

Frequently, the vineyards in the Douro Region present multiple grape varieties per parcel and even per row. An automatic algorithm for grape variety identification, as an integrated software component, was proposed that can be applied, for example, to a robotic harvesting system. However, several issues and constraints in its development were highlighted, namely images captured in a natural environment, a low volume of images, high similarity of images among different grape varieties, leaf senescence, and significant changes in grapevine leaf and bunch images across harvest seasons, mainly due to adverse climatic conditions, diseases, and the presence of pesticides. In this paper, the performance of transfer learning and fine-tuning techniques based on the AlexNet architecture was evaluated when applied to the identification of grape varieties. Two natural vineyard image datasets were captured in different geographical locations and harvest seasons. To generate different datasets for training and classification, several image processing methods, including a proposed four-corners-in-one image warping algorithm, were used. The experimental results, obtained with an AlexNet-based transfer learning scheme trained on the image dataset pre-processed through the four-corners-in-one method, achieved a test accuracy of 77.30%. Applying this classifier model, an accuracy of 89.75% was reached on the popular Flavia leaf dataset. The results obtained by the proposed approach are promising and encouraging in helping Douro wine growers with the automatic task of identifying grape varieties.


Author(s):  
Santosh Kumar Mishra ◽  
Rijul Dhir ◽  
Sriparna Saha ◽  
Pushpak Bhattacharyya

Image captioning is the process of generating a textual description of an image that aims to describe its salient parts. It is an important problem, as it involves both computer vision and natural language processing: computer vision for understanding images, and natural language processing for language modeling. A great deal of work has been done on image captioning for the English language. In this article, we develop a model for image captioning in the Hindi language. Hindi is the official language of India and the fourth most-spoken language in the world, spoken in India and South Asia. To the best of our knowledge, this is the first attempt to generate image captions in Hindi. A dataset was manually created by translating the well-known MSCOCO dataset from English to Hindi. Finally, different types of attention-based architectures were developed for image captioning in Hindi; these attention mechanisms have never before been used for the Hindi language. The results of the proposed model are compared with several baselines in terms of BLEU scores, and they show that our model performs better than the others. Manual evaluation of the obtained captions in terms of adequacy and fluency also reveals the effectiveness of our proposed approach. Availability of resources: the code is available at https://github.com/santosh1821cs03/Image_Captioning_Hindi_Language ; the dataset will be made available at http://www.iitp.ac.in/∼ai-nlp-ml/resources.html .
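BLEU, the automatic metric used above, scores a candidate caption by clipped n-gram precision against reference captions. A simplified unigram-only sketch (real BLEU also combines higher-order n-grams with a brevity penalty); the transliterated Hindi tokens are illustrative, not taken from the dataset.

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision, the 1-gram core of BLEU."""
    cand, ref = Counter(candidate), Counter(reference)
    clipped = sum(min(n, ref[tok]) for tok, n in cand.items())
    return clipped / max(1, sum(cand.values()))

cand = "ek aadmi ghode par sawaar hai".split()
ref = "ek aadmi ghode par baitha hai".split()
p1 = unigram_precision(cand, ref)  # 5 of 6 candidate tokens appear in the reference
```

Clipping (the `min(n, ref[tok])` term) prevents a candidate from scoring highly by repeating one matching word many times.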

