Real-Time Fruit Recognition and Grasping Estimation for Robotic Apple Harvesting

Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5670
Author(s):  
Hanwen Kang ◽  
Hongyu Zhou ◽  
Xing Wang ◽  
Chao Chen

Robotic harvesting is a promising direction for the future development of the agricultural industry. However, many challenges remain in developing a fully functional robotic harvesting system, and vision is one of the most important. Traditional vision methods often fall short in accuracy, robustness, and efficiency in real deployment environments. In this work, a fully deep learning-based vision method for autonomous apple harvesting is developed and evaluated. The method comprises a lightweight one-stage detection and segmentation network for fruit recognition and a PointNet that processes the point clouds and estimates a suitable approach pose for each fruit before grasping. The fruit recognition network takes raw input from an RGB-D camera and performs fruit detection and instance segmentation on the RGB images. The PointNet grasping network takes the depth information together with the fruit recognition results as input and outputs the approach pose of each fruit for execution by the robotic arm. The vision method is evaluated on RGB-D image data collected in both laboratory and orchard environments. Robotic harvesting experiments under both indoor and outdoor conditions are also included to validate the performance of the harvesting system. Experimental results show that the vision method is efficient and accurate enough to guide robotic harvesting. Overall, the robotic harvesting system achieves a harvesting success rate of 0.8 and a cycle time of 6.5 s.
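
The step in which depth information is combined with the recognition results can be illustrated with a minimal sketch. It assumes a pinhole camera model and back-projects the depth pixels covered by one fruit's instance mask into a camera-frame point cloud, which is the kind of input a PointNet-style network consumes. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and are not taken from the paper; the pose estimation network itself is not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mask_to_points(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Back-project masked depth pixels to camera-frame 3D points.

    Assumes a pinhole model; fx, fy, cx, cy are illustrative intrinsics.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            if not keep or z <= 0.0:  # skip background and invalid depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def centroid(points: Sequence[Point]) -> Point:
    """Mean position of the points, a rough estimate of the fruit centre."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

With a real sensor, the intrinsics would come from the camera calibration, and the resulting points would typically be subsampled to a fixed size before being passed to the network.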

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Kyoungmin Ko ◽  
Hyunmin Gwak ◽  
Nalinh Thoummala ◽  
Hyun Kwon ◽  
SungHwan Kim

In this paper, we propose a robust and reliable face recognition model that incorporates depth information, such as data from point clouds and depth maps, into RGB image data to avoid false facial verification caused by face spoofing attacks while improving the model’s performance. The proposed model is driven by the spatially adaptive convolution (SAC) block of SqueezeSegv3; this attention block enables the model to weight features according to the importance of their spatial location. We also use a large-margin loss instead of the softmax loss as the supervision signal, to enforce high discriminative power. In the experiments, the proposed model, which incorporates depth information, achieved 99.88% accuracy and an F1 score of 93.45%, outperforming the baseline models, which used RGB data alone.
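
The abstract does not say which large-margin loss is used. As an illustration only, the sketch below shows one common variant, an additive cosine margin, for a single sample: the margin is subtracted from the target-class cosine before scaling and applying cross-entropy. The function name and the `margin` and `scale` defaults are assumptions, not values from the paper.

```python
import math
from typing import Sequence


def margin_softmax_loss(
    cosines: Sequence[float],
    target: int,
    margin: float = 0.35,
    scale: float = 30.0,
) -> float:
    """Cross-entropy over scaled cosines with an additive margin on the target.

    Illustrative only: margin and scale are assumed hyperparameters.
    With margin=0 this reduces to ordinary softmax cross-entropy.
    """
    logits = [
        scale * (c - margin if i == target else c)
        for i, c in enumerate(cosines)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target]
```

Because the target logit is penalised by the margin, a sample must be separated from the other classes by more than the margin before its loss becomes small, which is the intuition behind the stronger discriminative power.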


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dominik Jens Elias Waibel ◽  
Sayedali Shetab Boushehri ◽  
Carsten Marr

Background: Deep learning contributes to uncovering molecular and cellular processes with highly performant algorithms. Convolutional neural networks have become the state-of-the-art tool to provide accurate and fast image data processing. However, published algorithms mostly solve only one specific problem and they typically require a considerable coding effort and machine learning background for their application. Results: We have thus developed InstantDL, a deep learning pipeline for four common image processing tasks: semantic segmentation, instance segmentation, pixel-wise regression and classification. InstantDL enables researchers with a basic computational background to apply debugged and benchmarked state-of-the-art deep learning algorithms to their own data with minimal effort. To make the pipeline robust, we have automated and standardized workflows and extensively tested it in different scenarios. Moreover, it allows assessing the uncertainty of predictions. We have benchmarked InstantDL on seven publicly available datasets achieving competitive performance without any parameter tuning. For customization of the pipeline to specific tasks, all code is easily accessible and well documented. Conclusions: With InstantDL, we hope to empower biomedical researchers to conduct reproducible image processing with a convenient and easy-to-use pipeline.


Author(s):  
Zhiyong Gao ◽  
Jianhong Xiang

Background: When objects are detected directly from a 3D point cloud, the natural 3D patterns and invariances of the data are often obscured. Objective: This work studies 3D object detection from discrete, unordered, and sparse 3D point clouds. Methods: The CNN is composed of a frustum sequence module, a 3D instance segmentation module (S-NET), a 3D point cloud transformation module (T-NET), and a 3D bounding box estimation module (E-NET). The frustum sequence module determines the search space of the object. The 3D instance segmentation module segments the point cloud into instances. The transformation module and the 3D bounding box estimation module determine the 3D coordinates of the object. Results: Evaluated on the KITTI benchmark dataset, our method outperforms the state of the art by remarkable margins while retaining real-time capability. Conclusion: We achieve real-time 3D object detection with an improved convolutional neural network (CNN) based on image-driven point clouds.
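
How a frustum can narrow the search space may be easier to see in code. The sketch below assumes that the frustum is defined by a 2D bounding box in the image and a pinhole camera, and it keeps only the points whose projection falls inside that box. This is a generic illustration of image-driven point selection, not the authors' frustum sequence module; all names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Box = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max)


def points_in_frustum(
    points: Sequence[Point],
    box: Box,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Keep camera-frame points whose pinhole projection lies inside box."""
    u_min, v_min, u_max, v_max = box
    selected: List[Point] = []
    for x, y, z in points:
        if z <= 0.0:  # behind the camera, cannot be inside the frustum
            continue
        u = fx * x / z + cx
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            selected.append((x, y, z))
    return selected
```

Only the selected points would then be passed on to the segmentation and box estimation stages, which keeps the per-object workload small.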


2021 ◽  
Vol 65 (1) ◽  
pp. 10501-1-10501-9
Author(s):  
Jiayong Yu ◽  
Longchen Ma ◽  
Maoyi Tian ◽  
Xiushan Lu

The unmanned aerial vehicle (UAV)-mounted mobile LiDAR system (ULS) is widely used for geomatics owing to its efficient data acquisition and convenient operation. However, because of the limited carrying capacity of a UAV, the sensors integrated in the ULS must be small and lightweight, which reduces the density of the collected scanning points. This affects registration between image data and point cloud data. To address this issue, the authors propose a method for registering and fusing ULS sequence images and laser point clouds, in which the problem of registering point cloud data and image data is converted into a problem of matching feature points between two images. First, a point cloud is selected to produce an intensity image. Subsequently, the corresponding feature points of the intensity image and the optical image are matched, and the exterior orientation parameters are solved using the collinearity equation based on image position and orientation. Finally, the sequence images are fused with the laser point cloud, based on the Global Navigation Satellite System (GNSS) time index of the optical image, to generate a true color point cloud. The experimental results show that the proposed method achieves higher registration accuracy and fusion speed, demonstrating its accuracy and effectiveness.
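
The final fusion step, assigning image colors to laser points once the exterior orientation is known, can be sketched as follows. The example assumes a simplified pinhole form of the projection (world point rotated into the camera frame, then divided by depth) with a single focal length; sign and axis conventions differ between photogrammetric formulations, so this is an assumption rather than the authors' equations. The function and parameter names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Color = Tuple[int, int, int]


def colorize(
    points: Sequence[Point],
    image: Sequence[Sequence[Color]],
    rotation: Sequence[Sequence[float]],
    camera_pos: Point,
    focal: float,
    cx: float,
    cy: float,
) -> List[Tuple[Point, Color]]:
    """Attach the color of the pixel each point projects onto.

    rotation (world to camera) and camera_pos stand in for the exterior
    orientation parameters; points projecting outside the image are dropped.
    """
    height, width = len(image), len(image[0])
    colored: List[Tuple[Point, Color]] = []
    for point in points:
        offset = [point[i] - camera_pos[i] for i in range(3)]
        xc, yc, zc = (
            sum(rotation[r][i] * offset[i] for i in range(3)) for r in range(3)
        )
        if zc <= 0.0:  # behind the camera
            continue
        u = int(round(focal * xc / zc + cx))
        v = int(round(focal * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height:
            colored.append((point, image[v][u]))
    return colored
```

In a sequence of images, each point would be colored from the frame selected by its time index; that selection is outside the scope of this sketch.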


Author(s):  
Jesús Benito-Picazo ◽  
Ezequiel López-Rubio ◽  
Enrique Domínguez

Although recent improvements in both physical storage technologies and image handling techniques have eased image management, the large amount of information handled nowadays constantly demands more efficient ways to store and transmit image data streams. One alternative for this purpose is color quantization, which consists of color indexing for image compression with minimal perceptual distortion. In this context, artificial intelligence-based algorithms, and more specifically artificial neural networks, have been consolidated as a powerful tool for unsupervised tasks and therefore for color quantization. In this work, a novel approach to color quantization is presented based on the Growing Neural Forest (GNF), a variation of the Growing Neural Gas (GNG) in which a set of trees is learned instead of a general graph. Experimental results support the use of GNF for image quantization tasks, where it outperforms other self-organized models including SOM, GHSOM and GNG. Future work will include more datasets and additional competitive models for comparison.
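
The color indexing idea behind quantization can be shown with a small sketch. It assumes that a palette has already been obtained (in the paper it would come from the trained GNF units; here it is simply given) and replaces each pixel with the index of its nearest palette color under squared Euclidean distance in RGB. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]


def quantize(pixels: Sequence[Color], palette: Sequence[Color]) -> List[int]:
    """Map each pixel to the index of its nearest palette color."""

    def nearest(color: Color) -> int:
        return min(
            range(len(palette)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(color, palette[k])),
        )

    return [nearest(pixel) for pixel in pixels]


def reconstruct(indices: Sequence[int], palette: Sequence[Color]) -> List[Color]:
    """Rebuild an approximate image from the indices and the palette."""
    return [palette[i] for i in indices]
```

Storing one small index per pixel plus the palette, instead of three color channels per pixel, is where the compression comes from; the quality of the result depends on how well the palette fits the image.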


2020 ◽  
Vol 12 (10) ◽  
pp. 1544
Author(s):  
Fabien H. Wagner ◽  
Ricardo Dalagnol ◽  
Yuliya Tarabalka ◽  
Tassiana Y. F. Segantine ◽  
Rogério Thomé ◽  
...  

There is currently a growing demand for individual building mapping in regions of rapid urban growth in less-developed countries. Most existing methods can segment buildings but cannot discriminate adjacent buildings. Here, we present a new convolutional neural network (CNN) architecture called U-net-id that performs building instance segmentation. The proposed network is trained with WorldView-3 satellite RGB images (0.3 m) and three different labeled masks. The first is the building mask; the second is the border mask, which is the border of the building segment with 4 pixels added outside and 3 pixels inside; and the third is the inner segment mask, which is the segment of the building diminished by 2 pixels. The architecture consists of three parallel paths, one for each mask, all starting with a U-net model. To accurately capture the overlap between the masks, all activation layers of the U-nets are copied and concatenated on each path and sent to two additional convolutional layers before the output activation layers. The method was tested with a dataset of 7563 manually delineated individual buildings of the city of Joanópolis-SP, Brazil. On this dataset, the semantic segmentation showed an overall accuracy of 97.67% and an F1-Score of 0.937, and the individual building instance segmentation showed good performance with a mean intersection over union (IoU) of 0.582 (median IoU = 0.694).
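
For readers unfamiliar with the reported metrics, the sketch below computes the intersection over union and the pixel-wise F1 score for a pair of binary masks. It uses the standard definitions and makes no claim about how the authors matched predicted and reference buildings; the function names are illustrative.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


def f1_score(pred: Mask, truth: Mask) -> float:
    """Pixel-wise F1 = 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For the same pair of masks, F1 is never lower than IoU, which is one reason the two figures in an evaluation are not directly comparable.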


2013 ◽  
Vol 760-762 ◽  
pp. 1556-1561
Author(s):  
Ting Wei Du ◽  
Bo Liu

Indoor scene understanding based on depth image data is a cutting-edge topic in three-dimensional computer vision. Taking into account the layout characteristics of indoor scenes and the abundance of planar features in them, this paper presents a depth image segmentation method based on Gaussian Mixture Model clustering. First, the Kinect depth image data are transformed into a point cloud of discrete three-dimensional points, which is then denoised and down-sampled; second, the normal of every point in the point cloud is calculated and the normals are clustered using a Gaussian Mixture Model; finally, the whole point cloud is segmented with the RANSAC algorithm. Experimental results show that the segmented regions have clear boundaries and that the segmentation quality is above normal, laying a good foundation for object recognition.
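
The last stage of the pipeline, plane extraction with RANSAC, can be sketched in a few lines. This is a generic single-plane RANSAC under assumed settings (distance `threshold`, number of `iterations`, fixed `seed`), not the authors' implementation, and it omits the normal estimation and Gaussian Mixture Model clustering that precede it in the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Plane = Tuple[Tuple[float, float, float], float]  # unit normal n and offset d


def plane_from_points(p: Point, q: Point, r: Point) -> Optional[Plane]:
    """Plane n.x + d = 0 through three points, or None if they are collinear."""
    ux, uy, uz = (q[i] - p[i] for i in range(3))
    vx, vy, vz = (r[i] - p[i] for i in range(3))
    normal = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    norm = sum(c * c for c in normal) ** 0.5
    if norm < 1e-12:
        return None
    unit = tuple(c / norm for c in normal)
    return unit, -sum(unit[i] * p[i] for i in range(3))


def ransac_plane(
    points: Sequence[Point],
    threshold: float = 0.01,
    iterations: int = 200,
    seed: int = 0,
) -> List[Point]:
    """Return the largest inlier set found for a single plane."""
    rng = random.Random(seed)
    best: List[Point] = []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(list(points), 3))
        if model is None:
            continue
        normal, d = model
        inliers = [
            pt for pt in points
            if abs(sum(normal[i] * pt[i] for i in range(3)) + d) <= threshold
        ]
        if len(inliers) > len(best):
            best = inliers
    return best
```

To segment a whole scene, the same routine would be applied repeatedly, removing the inliers of each detected plane before searching for the next one.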

