convolution structure
Recently Published Documents


TOTAL DOCUMENTS

68
(FIVE YEARS 9)

H-INDEX

11
(FIVE YEARS 0)

Author(s):  
Kaixuan Cui ◽  
Shuchai Su ◽  
Jiawei Cai ◽  
Fengjun Chen

To realize rapid and accurate walnut ripeness detection on mobile terminals such as mobile phones, we propose a method based on coupling information and a lightweight YOLOv4. First, we collected 50 walnuts at each ripeness level (Unripe, Mid-ripe, Ripe, Over-ripe) to determine the kernel oil content. Pearson correlation analysis and one-way analysis of variance (ANOVA) show that the division of walnut ripeness reflects the change in kernel oil content, so it is feasible to estimate the kernel oil content by detecting the ripeness of walnuts. Next, we achieve ripeness detection based on the lightweight YOLOv4. We adopt MobileNetV3 as the backbone feature extractor and replace the traditional convolution with depthwise separable convolution. We design a parallel convolution structure with depthwise convolution stacking (PCSDCS) to reduce parameters and improve feature extraction ability. To enhance the model’s detection ability for walnuts in densely growing areas, we design a Gaussian Soft DIoU non-maximum suppression (GSDIoU-NMS) algorithm. The dataset used for model optimization contains 3600 images, of which 2880 are in the training set, 320 in the validation set, and 400 in the test set. We adopt a multi-training strategy based on dynamic learning rate and transfer learning to obtain the training weights. The lightweight YOLOv4 model achieves 94.05%, 90.72%, 88.30%, 76.92 FPS, and 38.14 MB in mean average precision, precision, recall, average detection speed, and weight capacity, respectively. Compared with the Faster R-CNN, EfficientDet-D1, YOLOv3, and YOLOv4 models, the lightweight YOLOv4 model improves mean average precision by 8.77%, 4.84%, 5.43%, and 0.06% and detection speed by 74.60 FPS, 55.60 FPS, 38.83 FPS, and 46.63 FPS, respectively. The lightweight YOLOv4 is also 84.4% smaller than the original YOLOv4 model in terms of weight capacity. This paper provides a theoretical reference for rapid walnut ripeness detection and an exploration of model lightweighting.
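The abstract names a Gaussian Soft DIoU non-maximum suppression step but does not give its formula. The following is a minimal sketch of one plausible reading, assuming the usual Gaussian Soft-NMS score decay with DIoU substituted for IoU; the function names, the `sigma` default, and the score threshold are illustrative assumptions, not the authors' implementation.

```python
import math

def diou(a, b):
    """DIoU of two boxes (x1, y1, x2, y2) with positive area:
    IoU minus the normalized squared distance between box centers."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centers.
    center_dist = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
                + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    diagonal = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou - center_dist / diagonal

def gaussian_soft_diou_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Assumed variant: decay neighbor scores by exp(-DIoU^2 / sigma)
    instead of discarding overlapping boxes outright."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        box, score = remaining.pop(best)
        kept.append((box, score))
        decayed = []
        for other, s in remaining:
            overlap = max(0.0, diou(box, other))  # negative DIoU: no decay
            s *= math.exp(-(overlap ** 2) / sigma)
            if s >= score_thresh:
                decayed.append((other, s))
        remaining = decayed
    return kept
```

Because scores are decayed rather than zeroed, a heavily overlapped walnut can still survive with a reduced score, which is the behavior one would want in dense clusters.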


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Jie Gan

With the advancement of multimedia and digital technologies, music resources on the Internet are increasing rapidly, which has shifted listeners from hard drives to online music platforms. This has allowed researchers to use classification technologies for efficient storage, organization, retrieval, and recommendation of music resources. Traditional music classification methods use many hand-designed acoustic features, which require knowledge of the music field, and the features of different classification tasks are often not universal. This paper addresses this problem by proposing a novel recurrent neural network method with a channel attention mechanism for music feature classification. Music classification methods based on a convolutional neural network ignore the timing characteristics of the audio itself. Therefore, this paper combines a convolution structure with a bidirectional recurrent neural network and uses the attention mechanism to assign different attention weights to the output of the recurrent neural network at different times; the weights are assigned to obtain a better representation of the overall characteristics of the music. The classification accuracy of the model on the GTZAN data set has increased to 93.1%. The AUC on the multilabel data set MagnaTagATune has reached 92.3%, surpassing the other comparison methods. The labeling of different music labels has also been analyzed. The method labels most music genre labels well and also performs well on some labels in the musical instrument, singing, and emotion categories.
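The idea of assigning attention weights to the recurrent outputs at different times can be sketched as softmax-weighted pooling over time. This is a generic illustration, not the paper's exact mechanism: the dot-product scoring and the `query` vector (which would be learned in a real network) are assumptions.

```python
import math

def attention_pool(hidden_states, query):
    """Pool per-time-step RNN outputs into one vector.

    hidden_states: list of equal-length vectors, one per time step.
    query: hypothetical scoring vector of the same length.
    Returns (weights, context): softmax weights over time and the
    weighted sum of the hidden states.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context
```

Time steps whose hidden state aligns with the query receive larger weights, so they contribute more to the clip-level representation.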


Author(s):  
Bogdan Alexandru Radulescu ◽  
Victorita Radulescu

Action recognition infrastructure can be applied wherever behavior analysis is required and is currently a highly topical area in security and surveillance. The model based on 3D convolutions is a middle ground between simple key-frame approaches based on 2D convolutions and more complex approaches based on recurrent neural networks. Behavior analysis is a domain greatly improved by action recognition. By placing human actions in different categories, it is possible to extract statistics regarding a person’s behavior, characteristics, abilities, and preferences, which can later be processed by specialized personnel, depending on the selected domain. The proposed model follows a simple 3D convolution architecture. Hidden layers are composed of a convolution operation, an activation function and, sometimes, a pooling layer. Leaky ReLU was used as the activation function to alleviate the problem of vanishing gradients. Batch normalization, a technique for scaling and adjusting the output of an activation layer, was used to reduce over-fitting and decrease the training time. The 3D convolution structure has the advantage of learning spatio-temporal features, because the convolution is applied over a sequence of frames. The proposed 3D convolution model achieves average results, with an accuracy of approximately 55% on the NTU RGB+D dataset.
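To illustrate why a 3D convolution captures spatio-temporal features, here is a minimal single-channel sketch in plain Python: the kernel slides over time as well as height and width, and a leaky ReLU follows. The valid padding, unit stride, and the 0.01 slope are assumptions made for illustration; the abstract does not state the model's hyperparameters.

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLU: pass positives through, scale negatives by `slope`."""
    return x if x > 0 else slope * x

def conv3d_valid(clip, kernel):
    """Single-channel 3D convolution (valid padding, stride 1) + leaky ReLU.

    clip[t][y][x] is a stack of frames; kernel[dt][dy][dx] spans several
    consecutive frames, so each output value mixes space and time.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(leaky_relu(acc))
            plane.append(row)
        out.append(plane)
    return out
```

A clip of 4 frames of size 3x3 convolved with a 2x2x2 kernel yields an output of shape 3x2x2, one temporal position per pair of consecutive frames.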


2021 ◽  
Vol 15 ◽  
Author(s):  
Yanxian He ◽  
Jun Wu ◽  
Li Zhou ◽  
Yi Chen ◽  
Fang Li ◽  
...  

Alzheimer’s disease (AD) is mainly manifested as insidious onset, chronic progressive cognitive decline, and non-cognitive neuropsychiatric symptoms, which seriously affect the quality of life of the elderly and place a very large burden on society and families. This paper uses graph theory to analyze the constructed brain network and extracts the node degree, node efficiency, and node betweenness centrality parameters of the two modal brain networks. The t-test is used to analyze the difference in graph theory parameters between normal people and AD patients, and brain regions with significant differences in graph theory parameters are selected as brain network features. The calculation principles of the conventional convolutional layer and the depth separable convolution unit are analyzed, and their computational complexity is compared. The depth separable convolution unit decomposes the traditional convolution process into spatial convolution for feature extraction and point convolution for feature combination, which greatly reduces the number of multiplication and addition operations in the convolution process while still obtaining comparable results. Aiming at the special convolution structure of the depth separable convolution unit, this paper proposes a channel pruning method based on the convolution structure and explains its pruning process. Multimodal neuroimaging can provide complete information for the quantification of Alzheimer’s disease. This paper proposes a cascaded three-dimensional neural network framework based on single-modal and multi-modal images, using MRI and PET images to distinguish AD and MCI from normal samples. Multiple three-dimensional CNN networks are used to extract recognizable information in local image blocks. The high-level two-dimensional CNN network fuses multi-modal features and selects the features of discriminative regions to perform quantitative predictions on samples. The algorithm proposed in this paper can automatically extract and fuse the features of multiple modalities and multiple regions layer by layer, and the visual analysis results show that the abnormally changed regions affected by Alzheimer’s disease provide important information for clinical quantification.
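The complexity comparison between a conventional convolutional layer and a depth separable unit can be made concrete by counting multiply-accumulate operations. The sketch below assumes a 2D feature map, stride 1, and an output of the same spatial size; the function and parameter names are illustrative and not taken from the paper.

```python
def standard_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates for a k x k standard convolution producing
    an h x w output with c_out channels from c_in input channels."""
    return k * k * c_in * c_out * h * w

def separable_conv_macs(k, c_in, c_out, h, w):
    """Multiply-accumulates for the depth separable equivalent:
    one k x k spatial filter per input channel, then a 1 x 1 point
    convolution that combines channels."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

def cost_ratio(k, c_in, c_out, h, w):
    """Separable cost divided by standard cost; equals 1/c_out + 1/k**2."""
    return separable_conv_macs(k, c_in, c_out, h, w) / standard_conv_macs(k, c_in, c_out, h, w)
```

For a 3 x 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is the source of the large reduction in multiplications and additions.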


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248303
Author(s):  
ShuFen Liang ◽  
Huilin Liu ◽  
Chen Chen ◽  
Chuanbo Qin ◽  
FangChen Yang ◽  
...  

Accurate and robust segmentation of anatomical structures from magnetic resonance images is valuable in many computer-aided clinical tasks. Traditional codec networks are not satisfactory because of their low accuracy of edge segmentation, the low recognition rate of the target, and loss of detailed information. To address these problems, this study proposes a series of improved models for semantic segmentation and progressively optimizes them from the three aspects of convolution module, codec unit, and feature fusion. Instead of the standard convolution structure, we apply a new type of convolution module for the feature extraction. The networks integrate a multi-path method to obtain richer-detail edge information. Finally, a dense network is utilized to strengthen the ability of the feature fusion and integrate more different-level information. The evaluation of the Accuracy, Dice coefficient, and Jaccard index led to values of 0.9855, 0.9185, and 0.8507, respectively. These metrics of the best network increased by 1.0%, 4.0%, and 6.1%, respectively. Boundary F1-Score reached 0.9124 indicating that the proposed networks can segment smaller targets to obtain smoother edges. Our methods obtain more key information than traditional methods and achieve superiority in segmentation performance.
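For reference, the Dice coefficient and Jaccard index reported above can be computed from binary masks as in this minimal sketch. It assumes flattened masks of equal length and treats two empty masks as a perfect match; the paper's exact evaluation protocol (for example, per-image averaging) is not stated in the abstract.

```python
def dice_and_jaccard(pred, target):
    """Return (dice, jaccard) for two flattened binary masks.

    dice    = 2 * |P ∩ T| / (|P| + |T|)
    jaccard = |P ∩ T| / |P ∪ T|
    """
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    pred_pos = sum(1 for p in pred if p)
    target_pos = sum(1 for t in target if t)
    union = pred_pos + target_pos - overlap
    if union == 0:
        # Both masks empty: treated here as perfect agreement (assumption).
        return 1.0, 1.0
    return 2 * overlap / (pred_pos + target_pos), overlap / union
```

On a single pair of masks the two metrics are tied by J = D / (2 - D), so Dice is always at least as large as Jaccard.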


2021 ◽  
Vol 13 (4) ◽  
pp. 676
Author(s):  
Li Yan ◽  
Xingfen Tang ◽  
Yi Zhang

Digital elevation model (DEM) interpolation aims to predict the elevation values of unobserved locations, given a series of collected points. Traditional interpolation methods have been widely used over the years but can easily lead to accuracy degradation. In recent years, generative adversarial networks (GANs) have been shown to be more efficient than the traditional methods; however, their interpolation accuracy is not guaranteed. In this paper, we propose a GAN-based network named gated and symmetric-dilated U-net GAN (GSUGAN) for improved DEM interpolation, which performs visibly and quantitatively better than the traditional methods and the conditional encoder-decoder GAN (CEDGAN). We also discuss combinations of new techniques in the generator, and the comparison shows that the gated convolution and symmetric dilated convolution structure perform slightly better. Furthermore, based on the performance of the different methods, we conclude that the convolutional neural network (CNN)-based method has an advantage in quantitative accuracy, whereas the GAN-based method obtains better visual quality, especially in complex terrain. In summary, we propose a GAN-based network for improved DEM interpolation and further illustrate the GAN-based method’s performance compared with that of the CNN-based method.
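The abstract does not describe the gated or symmetric dilated convolution in detail. As a rough 1D illustration of the two ingredients, the sketch below combines a dilated convolution with a sigmoid gate computed from a second kernel; the tanh feature activation, the 1D setting, and all names are assumptions, not the GSUGAN design.

```python
import math

def dilated_conv1d(signal, kernel, dilation):
    """Valid 1D convolution whose taps are spaced `dilation` samples apart,
    widening the receptive field without adding weights."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

def gated_dilated_conv1d(signal, feature_kernel, gate_kernel, dilation):
    """Gated convolution: a feature branch is scaled element-wise by a
    sigmoid gate in (0, 1) computed from the same input.
    Both kernels are assumed to have the same length."""
    features = dilated_conv1d(signal, feature_kernel, dilation)
    gates = dilated_conv1d(signal, gate_kernel, dilation)
    return [
        math.tanh(f) * (1.0 / (1.0 + math.exp(-g)))
        for f, g in zip(features, gates)
    ]
```

The gate lets the layer down-weight positions computed mostly from missing samples, which is one common motivation for gated convolutions in interpolation and inpainting tasks.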


2021 ◽  
Vol 64 (1) ◽  
pp. 87-98
Author(s):  
Manoj Kumar ◽  
N. Shravan Kumar

The aim of this paper is to present some results about the space $L^{\varPhi }(\nu ),$ where $\nu$ is a vector measure on a compact (not necessarily abelian) group and $\varPhi$ is a Young function. We show that under natural conditions, the space $L^{\varPhi }(\nu )$ becomes an $L^{1}(G)$-module with respect to the usual convolution of functions. We also define one more convolution structure on $L^{\varPhi }(\nu ).$
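For orientation, the "usual convolution of functions" on a compact group $G$ is normally understood as follows, assuming normalized Haar measure $dy$ on $G$, with $f \in L^{1}(G)$ and $g \in L^{\varPhi }(\nu )$; the paper's precise setting and conventions may differ:

$$(f * g)(x) = \int_{G} f(y)\, g(y^{-1}x)\, dy, \qquad x \in G.$$

Since $G$ is not assumed abelian, $f * g$ and $g * f$ need not coincide, so the side on which $L^{1}(G)$ acts matters.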


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Jie Cao ◽  
Zhidong He ◽  
Jinhua Wang ◽  
Ping Yu

In practice, the bearing state signal collected by the vibration sensor contains a large amount of environmental noise, which reduces the accuracy of the convolutional network in identifying bearing faults. To solve this problem, a one-dimensional convolutional neural network with a multiscale kernel (MSK-1DCNN) is proposed to enhance the classification information of the input. A two-layer multiscale convolution structure (MSK) is used at the front of the network. MSK has five convolutional kernels of different sizes, which are used to extract features at varying resolutions from the original signal. In the multiscale convolution structure, the ELU activation function is used instead of the ReLU function to improve the antinoise ability of MSK-1DCNN. In addition, pepper noise is added to the training set to corrupt the input data, forcing the network to learn more representative features and improving its robustness. Experimental results show that the proposed improvements effectively enhance the diagnostic performance of MSK-1DCNN under intense noise, and the diagnostic accuracy is higher than that of the other comparison algorithms.
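The three ingredients described above (parallel kernels of different sizes, ELU activation, and pepper-noise corruption of training inputs) can be sketched in plain Python. The kernel sizes, the averaging kernels standing in for learned weights, and the reading of pepper noise as randomly zeroing samples are all assumptions for illustration; the abstract does not specify them.

```python
import math
import random

def elu(x, alpha=1.0):
    """ELU: identity for positives, alpha * (exp(x) - 1) for negatives."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def add_pepper_noise(signal, drop_prob, rng):
    """Corrupt a signal by zeroing each sample with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in signal]

def conv1d_same(signal, kernel):
    """Zero-padded 1D convolution keeping the input length (odd kernels)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]

def multiscale_features(signal, kernel_sizes=(3, 5, 7, 9, 11)):
    """Run five parallel branches with different kernel sizes and ELU.
    Averaging kernels are placeholders for weights a network would learn."""
    branches = []
    for size in kernel_sizes:
        kernel = [1.0 / size] * size
        branches.append([elu(v) for v in conv1d_same(signal, kernel)])
    return branches
```

A typical use would corrupt each training signal with `add_pepper_noise(signal, 0.1, random.Random(0))` before passing it to `multiscale_features`, so that the downstream layers see the same signal at several resolutions despite the missing samples.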

