FDCNet: Frontend-Backend Fusion Dilated Network Through Channel-Attention Mechanism

2019 ◽  
Vol 9 (17) ◽  
pp. 3466
Author(s):  
Yuqian Zhang ◽  
Guohui Li ◽  
Jun Lei ◽  
Jiayu He

Crowd counting has attracted much attention in computer vision owing to its fundamental contribution to public security. However, occlusions, perspective distortions, scale variations, and background interference make it highly challenging. In this paper we propose a novel crowd-counting model named FDCNet: frontend-backend fusion dilated network through channel-attention mechanism. It merges the frontend feature map with the backend feature map, fusing features of various scales without additional branches or extra subtasks. The fused features are fed into a channel-attention block, which optimizes the procedure and recalibrates the features so that global and spatial information is used. Furthermore, we utilize dilated layers to obtain a high-quality density map, and an SSIM-based loss function is added to compare the local correlation between the estimated density map and the ground truth. FDCNet is evaluated on four common datasets and achieves strong estimation results.
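
The abstract does not specify how the channel-attention block is built, so the following is only a minimal sketch of one common form of channel attention (squeeze-and-excitation style gating), written in plain Python. The function name `channel_attention` and the `weights` and `biases` arguments are hypothetical stand-ins for a learned gating layer; they are not taken from the paper.

```python
import math

def channel_attention(feature_map, weights, biases):
    """Rescale each channel of a C x H x W feature map (nested lists).

    weights is a hypothetical C x C matrix and biases a length-C vector
    standing in for the learned gating layer.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excite: a linear layer followed by a sigmoid gives a gate in (0, 1).
    gates = []
    for w_row, b in zip(weights, biases):
        z = sum(w * s for w, s in zip(w_row, squeezed)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Recalibrate: multiply every value in a channel by that channel's gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates
```

With all-zero weights and biases every gate is 0.5, so each channel is simply halved; in a trained network the gates would differ per channel and emphasize the more informative ones.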

Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1737 ◽  
Author(s):  
Tae-young Ko ◽  
Seung-ho Lee

This paper proposes a novel method of semantic segmentation, consisting of a modified dilated residual network, an atrous pyramid pooling module, and backpropagation, that is applicable to augmented reality (AR). In the proposed method, the modified dilated residual network extracts a feature map from the original images and maintains spatial information. The atrous pyramid pooling module places convolutions in parallel and layers feature maps in a pyramid shape to extract objects occupying small areas in the image; these are converted into one channel using a 1 × 1 convolution. Backpropagation compares the semantic segmentation obtained through convolution from the final feature map with the ground truth provided by a database. Losses can be reduced by applying backpropagation to the modified dilated residual network to change the weighting. The proposed method was compared with other methods on the Cityscapes and PASCAL VOC 2012 databases. The proposed method achieved accuracies of 82.8 and 89.8 mean intersection over union (mIOU) and frame rates of 61 and 64.3 frames per second (fps) for the Cityscapes and PASCAL VOC 2012 databases, respectively. These results prove the applicability of the proposed method for implementing natural AR applications at actual speeds because the frame rate is greater than 60 fps.
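
For readers unfamiliar with the mIOU metric reported above, here is a minimal sketch of how it is usually computed from a predicted label map and a ground-truth label map. The function name `mean_iou` and the convention of skipping classes absent from both maps are assumptions for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
def mean_iou(prediction, target, num_classes):
    """Mean intersection over union for two equally sized label grids."""
    ious = []
    for cls in range(num_classes):
        intersection = 0
        union = 0
        for pred_row, true_row in zip(prediction, target):
            for p, t in zip(pred_row, true_row):
                in_pred = p == cls
                in_true = t == cls
                intersection += in_pred and in_true
                union += in_pred or in_true
        # Skip classes absent from both maps (a common convention, assumed here).
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For a 2 × 2 prediction `[[0, 0], [1, 1]]` against ground truth `[[0, 1], [1, 1]]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.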


2021 ◽  
Vol 13 (3) ◽  
pp. 72
Author(s):  
Shengbo Chen ◽  
Hongchang Zhang ◽  
Zhou Lei

Person re-identification (ReID) plays a significant role in video surveillance analysis. In the real world, illumination, occlusion, and deformation make pedestrian feature extraction the key to person ReID. Considering the shortcomings of existing methods in pedestrian feature extraction, a method based on an attention mechanism and context information fusion is proposed. A lightweight attention module with a small number of parameters is introduced into the ResNet50 backbone network, enhancing the salient characteristics of the person and suppressing irrelevant information. To address the loss of person context information caused by excessive network depth, a context information fusion module is designed to sample the shallow feature map of pedestrians and cascade it with the high-level feature map. To improve robustness, the model is trained by combining the margin sample mining loss with the cross-entropy loss. Experiments are carried out on the Market1501 and DukeMTMC-reID datasets. Our method achieves rank-1 accuracy of 95.9% on Market1501 and 90.1% on DukeMTMC-reID, outperforming current mainstream methods that use only global features.


2010 ◽  
Vol 40-41 ◽  
pp. 453-456
Author(s):  
Xin Hui Wu ◽  
Jing Li ◽  
Chang Hai Qin ◽  
Zhong Hai Zhang

This paper proposes a method of the coupling modal, which is able to miniaturize the tunable cavity filter while keeping its bandwidth balanced. The filter consists of a tunable cavity dual-bandpass filter and a triangular twin-loop as its inter-cavity coupling structure. We analyzed and calculated how the bandwidth of the filter changes with the size and position of the triangular twin-loop. To validate the design, a tunable coaxial cavity dual-bandpass filter operating at 230 MHz and 409 MHz was fabricated and measured. Its size is less than half that of a conventional tunable filter with the same specifications. The insertion loss is lower than 1.2 dB at the operating frequencies. The bandwidths in the low band and the high band are both more than 2.5 MHz with insertion loss less than 3 dB. Experimental results and theoretical analysis agree well. This novel model can contribute to the miniaturization of high-quality RF and microwave systems.


2022 ◽  
Vol 41 (1) ◽  
pp. 1-17
Author(s):  
Xin Chen ◽  
Anqi Pang ◽  
Wei Yang ◽  
Peihao Wang ◽  
Lan Xu ◽  
...  

In this article, we present TightCap, a data-driven scheme to capture both the human shape and dressed garments accurately with only a single three-dimensional (3D) human scan, which enables numerous applications such as virtual try-on, biometrics, and body evaluation. To handle the severe variations of human poses and garments, we propose to model the clothing tightness field (the displacements from the garments to the human shape) implicitly in the global UV texturing domain. To this end, we utilize an enhanced statistical human template and an effective multi-stage alignment scheme to map the 3D scan into a hybrid 2D geometry image. Based on this 2D representation, we propose a novel framework to predict the clothing tightness field via a novel tightness formulation, as well as an effective optimization scheme to further reconstruct the multi-layer human shape and garments under various clothing categories and human postures. We further propose a new clothing tightness dataset of human scans with a large variety of clothing styles, poses, and corresponding ground-truth human shapes to stimulate further research. Extensive experiments demonstrate the effectiveness of TightCap in achieving high-quality reconstruction of human shape and dressed garments, as well as further applications in clothing segmentation, retargeting, and animation.


Agronomy ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1951
Author(s):  
Brianna B. Posadas ◽  
Mamatha Hanumappa ◽  
Kim Niewolny ◽  
Juan E. Gilbert

Precision agriculture is highly dependent on the collection of high-quality ground truth data to validate the algorithms used in prescription maps. However, the process of collecting ground truth data is labor-intensive and costly. One way to increase the collection of ground truth data is to recruit citizen scientists through a crowdsourcing platform. In this study, a crowdsourcing platform application was built using a human-centered design process. The primary goals were to gauge users’ perceptions of the platform, evaluate how well the system satisfies their needs, and observe whether the classification rate of lambsquarters by the users would match that of an expert. Previous work demonstrated a need for ground truth data on lambsquarters in the D.C., Maryland, Virginia (DMV) area. Previous social interviews revealed users who would want a citizen science platform to expand their skills and give them access to educational resources. Using a human-centered design protocol, design iterations of a mobile application were created in Kinvey Studio. The application, Mission LQ, taught people how to classify certain characteristics of lambsquarters in the DMV and allowed them to submit ground truth data. The final design of Mission LQ received a median system usability scale (SUS) score of 80.13, which indicates a good design. The classification rate of lambsquarters was 72%, which is comparable to expert classification. This demonstrates that a crowdsourcing mobile application can be used to collect high-quality ground truth data for use in precision agriculture.
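
As background for the SUS figure above, the standard scoring rule for the 10-item System Usability Scale can be sketched as follows. The function name `sus_score` and the sample responses are hypothetical; the study's raw questionnaire data are not given in the abstract.

```python
from statistics import median

def sus_score(responses):
    """Score one 10-item SUS questionnaire (each response is 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS uses exactly 10 items")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute 5 - answer.
            total += 5 - answer
    return total * 2.5

# Hypothetical responses from three participants, for illustration only.
participants = [[5, 1] * 5, [4, 2] * 5, [3, 3] * 5]
median_score = median(sus_score(r) for r in participants)
```

Each questionnaire yields a score between 0 and 100, and a study-level figure such as a median is then taken across participants.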


Symmetry ◽  
2020 ◽  
Vol 12 (3) ◽  
pp. 449 ◽  
Author(s):  
Can Li ◽  
Liejun Wang ◽  
Shuli Cheng ◽  
Naixiang Ao

In recent years, the common algorithms for image super-resolution based on deep learning have been increasingly successful, but there is still a large gap between the results generated by each algorithm and the ground-truth. Even some algorithms that are dedicated to image perception produce more textures that do not exist in the original image, and these artefacts also affect the visual perceptual quality of the image. We believe that in the existing perceptual-based image super-resolution algorithm, it is necessary to consider Super-Resolution (SR) image quality, which can restore the important structural parts of the original picture. This paper mainly improves the Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) algorithm in the following aspects: adding a shallow network structure, adding the dual attention mechanism in the generator and the discriminator, including the second-order channel mechanism and spatial attention mechanism and optimizing perceptual loss by adding second-order covariance normalization at the end of feature extractor. The results of this paper ensure image perceptual quality while reducing image distortion and artefacts, improving the perceived similarity of images and making the images more in line with human visual perception.


2019 ◽  
Vol 50 (4) ◽  
pp. 1073-1085
Author(s):  
Zheyi Fan ◽  
Yixuan Zhu ◽  
Yu Song ◽  
Zhiwen Liu

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Qingren Wang ◽  
Min Zhang ◽  
Tao Tao ◽  
Victor S. Sheng

Supervised learning-based recommendation models, which depend on sufficient high-quality training samples, have been widely applied in many domains. In the era of big data, with its explosive growth in data volume, training samples should be labelled promptly and accurately to guarantee the excellent recommendation performance of supervised learning-based models. Machine annotation cannot label training samples with high quality because of limited machine intelligence. Although expert annotation can achieve high accuracy, it requires a long time as well as more resources. As a new way for human intelligence to participate in machine computing, crowdsourcing annotation makes up for the shortcomings of machine annotation and expert annotation. Therefore, in this paper, we utilize crowdsourcing annotation to label training samples. First, a suitable crowdsourcing mechanism is designed to create crowdsourcing annotation-based tasks for training sample labelling, and then two entropy-based ground truth inference algorithms (i.e., HILED and HILI) are proposed to improve the quality of the noisy labels provided by the crowd. In addition, the descending and random order manners in crowdsourcing annotation-based tasks are also explored. The experimental results demonstrate that crowdsourcing annotation significantly improves the performance of machine annotation. Among the ground truth inference algorithms, both HILED and HILI improve the performance of baselines; meanwhile, HILED performs better than HILI.
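
The abstract does not describe how HILED and HILI work, so the sketch below is not an implementation of either. It only illustrates the general idea of pairing a simple ground truth inference baseline (majority voting) with the Shannon entropy of each sample's crowd labels as a measure of disagreement. The function names and the input layout are assumptions.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (in bits) of the labels the crowd gave one sample."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def infer_labels(crowd_labels):
    """Return {sample_id: (majority_label, entropy)} for each sample."""
    result = {}
    for sample_id, labels in crowd_labels.items():
        majority = Counter(labels).most_common(1)[0][0]
        result[sample_id] = (majority, label_entropy(labels))
    return result
```

A sample whose workers all agree has entropy 0, while an even two-way split has entropy 1 bit; high-entropy samples are natural candidates for extra annotation or closer inspection.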


Information ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 583
Author(s):  
Mingtao Guo ◽  
Donghui Xue ◽  
Peng Li ◽  
He Xu

Object detection for vehicles and pedestrians is extremely difficult to achieve in autopilot applications for the Internet of vehicles, and it is a task that requires the ability to locate and identify smaller targets even in complex environments. This paper proposes a single-stage object detection network (YOLOv3-promote) for the detection of vehicles and pedestrians in complex environments in cities, which improves on the traditional You Only Look Once version 3 (YOLOv3). First, spatial pyramid pooling is used to fuse local and global features in an image to better enrich the expression ability of the feature map and to more effectively detect targets with large size differences in the image; second, an attention mechanism is added to the feature map to weight each channel, thereby enhancing key features and removing redundant features, which allows for strengthening the ability of the feature network to discriminate between target objects and backgrounds; lastly, the anchor box derived from the K-means clustering algorithm is fitted to the final prediction box to complete the positioning and identification of target vehicles and pedestrians. The experimental results show that the proposed method achieved 91.4 mAP (mean average precision), 83.2 F1 score, and 43.7 frames per second (FPS) on the KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset, and the detection performance was superior to the conventional YOLOv3 algorithm in terms of both accuracy and speed.
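
The abstract mentions anchor boxes derived from K-means clustering but gives no details. A minimal sketch of the approach commonly used with YOLO-family detectors is shown below: ground-truth box sizes are clustered with 1 − IoU as the distance, so that assignment picks the centroid with the highest IoU. The function names, the deterministic initialization, and the sample boxes are assumptions for illustration.

```python
def iou_wh(box, centroid):
    """IoU of two (width, height) boxes aligned at a common corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iterations=50):
    """Cluster (width, height) pairs into k anchors using 1 - IoU as distance."""
    # Deterministic initialisation for the sketch: the first k boxes.
    centroids = [tuple(b) for b in boxes[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Smallest 1 - IoU distance is the same as largest IoU.
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        updated = []
        for members, old in zip(clusters, centroids):
            if members:
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)
```

For example, `kmeans_anchors([(10, 10), (100, 100), (12, 12), (98, 102)], 2)` groups the two small and the two large boxes and returns one anchor per group. Practical implementations typically use random or k-means++ initialization rather than the first k boxes.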


Sensors ◽  
2020 ◽  
Vol 20 (22) ◽  
pp. 6493
Author(s):  
Song-Kyu Park ◽  
Joon-Hyuk Chang

In this paper, we propose a multi-channel cross-tower with attention mechanisms in latent domain network (Multi-TALK) that suppresses both the acoustic echo and background noise. The proposed approach consists of the cross-tower network, a parallel encoder with an auxiliary encoder, and a decoder. For the multi-channel processing, a parallel encoder is used to extract latent features of each microphone, and the latent features including the spatial information are compressed by a 1D convolution operation. In addition, the latent features of the far-end are extracted by the auxiliary encoder, and they are effectively provided to the cross-tower network by using the attention mechanism. The cross tower network iteratively estimates the latent features of acoustic echo and background noise in each tower. To improve the performance at each iteration, the outputs of each tower are transmitted as the input for the next iteration of the neighboring tower. Before passing through the decoder, to estimate the near-end speech, attention mechanisms are further applied to remove the estimated acoustic echo and background noise from the compressed mixture to prevent speech distortion by over-suppression. Compared to the conventional algorithms, the proposed algorithm effectively suppresses the acoustic echo and background noise and significantly lowers the speech distortion.

