CNN Training with Twenty Samples for Crack Detection via Data Augmentation

Zirui Wang; Jingjing Yang; Haonan Jiang; Xueling Fan

doi:10.3390/s20174849

CNN Training with Twenty Samples for Crack Detection via Data Augmentation

Sensors ◽

10.3390/s20174849 ◽

2020 ◽

Vol 20 (17) ◽

pp. 4849

Author(s):

Zirui Wang ◽

Jingjing Yang ◽

Haonan Jiang ◽

Xueling Fan

Keyword(s):

Crack Detection ◽

Data Augmentation ◽

Training Data ◽

Limited Resources ◽

Inference Process ◽

Two Stage ◽

Rotation Method ◽

Model Training ◽

Crack Detector ◽

The Greedy Algorithm

The excellent generalization ability of deep learning methods, e.g., convolutional neural networks (CNNs), depends on a large amount of training data, which is difficult to obtain in industrial practices. Data augmentation is regarded commonly as an effective strategy to address this problem. In this paper, we attempt to construct a crack detector based on CNN with twenty images via a two-stage data augmentation method. In detail, nine data augmentation methods are compared for crack detection in the model training, respectively. As a result, the rotation method outperforms these methods for augmentation, and by an in-depth exploration of the rotation method, the performance of the detector is further improved. Furthermore, data augmentation is also applied in the inference process to improve the recall of trained models. The identical object has more chances to be detected in the series of augmented images. This trick is essentially a performance–resource trade-off. For more improvement with limited resources, the greedy algorithm is adopted for searching a better combination of data augmentation. The results show that the crack detectors trained on the small dataset are significantly improved via the proposed two-stage data augmentation. Specifically, using 20 images for training, recall in detecting the cracks achieves 96% and Fext(0.8), which is a variant of F-score for crack detection, achieves 91.18%.

Download Full-text

The Evaluation of Acute Myeloid Leukaemia (AML) Blood Cell Detection Models Using Different YOLO Approaches

10.1101/2021.08.04.455113 ◽

2021 ◽

Author(s):

Kaung Myat Naing ◽

Veerayuth Kittichai ◽

Teerawat Tongloy ◽

Santhad Chuwongin Chuwongin ◽

Siridech Boonsang

Keyword(s):

Acute Myeloid Leukaemia ◽

Object Detection ◽

Myeloid Leukaemia ◽

Data Augmentation ◽

Training Data ◽

Training Dataset ◽

Cell Detection ◽

Augmentation Techniques ◽

Model Training ◽

Acute Myeloid

This study proposes to evaluate the performance of Acute Myeloid Leukaemia (AML) blast cell detection models in microscopic examination images for faster diagnosis and disease monitoring. One of the popular deep learning algorithms such as You Only Look Once (YOLO) developed for object detection is the successful state-of-the-art algorithms in real-time object detection systems. We employ four versions of the YOLO algorithm: YOLOv3, YOLOv3-Tiny, YOLOv2 and YOLOv2-Tiny for detection of 15-class of AML blood cells in examination images. We also acquired the publicly available dataset from The Cancer Imaging Archive (TCIA), which consists of 18,365 expert-labelled single-cell images. Data augmentation techniques are additionally applied to enhance and balance the training images in the dataset. The overall results indicated that four types of YOLO approach have outstanding performances of more than 92% in precision and sensitivity. In comparison, YOLOv3 has more reliable performance than the other three approaches. Consistently, the AUC values for the four YOLO models are 0.969 (YOLOv3), 0.967 (YOLOv3-Tiny), 0.963 (YOLOv2), and 0.948 (YOLOv2-Tiny). Furthermore, we compare the best model's performance between approaches that use the entire training dataset without using data augmentation techniques and image division with data augmentation techniques. Remarkably, by using 33.51 percent of the training data in model training, the prediction outcomes from the model that used image partitioning with data augmentation were similar to those obtained using the complete training dataset. This work potentially provides a beneficial digital rapid tool in the screening and evaluation of numerous haematological disorders.

Download Full-text

Deep Learning with Open Data for Desert Road Mapping

Remote Sensing ◽

10.3390/rs12142274 ◽

2020 ◽

Vol 12 (14) ◽

pp. 2274

Author(s):

Christopher Stewart ◽

Michele Lazzarini ◽

Adrian Luna ◽

Sergio Albani

Keyword(s):

Deep Learning ◽

Large Scale ◽

Data Augmentation ◽

Low Cost ◽

Open Data ◽

Cost Effective ◽

Training Data ◽

Desert Sand ◽

Model Training ◽

Shifting Sand

The availability of free and open data from Earth observation programmes such as Copernicus, and from collaborative projects such as Open Street Map (OSM), enables low cost artificial intelligence (AI) based monitoring applications. This creates opportunities, particularly in developing countries with scarce economic resources, for large–scale monitoring in remote regions. A significant portion of Earth’s surface comprises desert dune fields, where shifting sand affects infrastructure and hinders movement. A robust, cost–effective and scalable methodology is proposed for road detection and monitoring in regions covered by desert sand. The technique uses Copernicus Sentinel–1 synthetic aperture radar (SAR) satellite data as an input to a deep learning model based on the U–Net architecture for image segmentation. OSM data is used for model training. The method comprises two steps: The first involves processing time series of Sentinel–1 SAR interferometric wide swath (IW) acquisitions in the same geometry to produce multitemporal backscatter and coherence averages. These are divided into patches and matched with masks of OSM roads to form the training data, the quantity of which is increased through data augmentation. The second step includes the U–Net deep learning workflow. The methodology has been applied to three different dune fields in Africa and Asia. A performance evaluation through the calculation of the Jaccard similarity coefficient was carried out for each area, and ranges from 84% to 89% for the best available input. The rank distance, calculated from the completeness and correctness percentages, was also calculated and ranged from 75% to 80%. Over all areas there are more missed detections than false positives. In some cases, this was due to mixed infrastructure in the same resolution cell of the input SAR data. Drift sand and dune migration covering infrastructure is a concern in many desert regions, and broken segments in the resulting road detections are sometimes due to sand burial. The results also show that, in most cases, the Sentinel–1 vertical transmit–vertical receive (VV) backscatter averages alone constitute the best input to the U–Net model. The detection and monitoring of roads in desert areas are key concerns, particularly given a growing population increasingly on the move.

Download Full-text

Building Damage Detection from Post-Event Aerial Imagery Using Single Shot Multibox Detector

Applied Sciences ◽

10.3390/app9061128 ◽

2019 ◽

Vol 9 (6) ◽

pp. 1128 ◽

Cited By ~ 12

Author(s):

Yundong Li ◽

Wei Hu ◽

Han Dong ◽

Xueyan Zhang

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Hurricane Sandy ◽

Training Data ◽

Aerial Images ◽

Detection Methods ◽

Single Shot ◽

Data Set ◽

Augmentation Strategies ◽

Post Disaster

Using aerial cameras, satellite remote sensing or unmanned aerial vehicles (UAV) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods results from the numerous labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study and highlights the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolution auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were maximized to validate the proposed method’s effectiveness. Experiments show that the pretraining strategy can improve of 10% in terms of overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment is further verified by another dataset of Hurricane Irma, and it is concluded that the paper method is feasible.

Download Full-text

Automated Ventricular System Segmentation in Paediatric Patients Treated for Hydrocephalus Using Deep Learning Methods

BioMed Research International ◽

10.1155/2019/3059170 ◽

2019 ◽

Vol 2019 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

Michał Klimont ◽

Mateusz Flieger ◽

Jacek Rzeszutek ◽

Joanna Stachera ◽

Aleksandra Zakrzewska ◽

...

Keyword(s):

Neural Network ◽

Data Augmentation ◽

Ct Images ◽

Policy Transfer ◽

Training Data ◽

Intraobserver Variability ◽

Practical Applications ◽

Brain Scans ◽

Rate Policy ◽

Ct Brain

Hydrocephalus is a common neurological condition that can have traumatic ramifications and can be lethal without treatment. Nowadays, during therapy radiologists have to spend a vast amount of time assessing the volume of cerebrospinal fluid (CSF) by manual segmentation on Computed Tomography (CT) images. Further, some of the segmentations are prone to radiologist bias and high intraobserver variability. To improve this, researchers are exploring methods to automate the process, which would enable faster and more unbiased results. In this study, we propose the application of U-Net convolutional neural network in order to automatically segment CT brain scans for location of CSF. U-Net is a neural network that has proven to be successful for various interdisciplinary segmentation tasks. We optimised training using state of the art methods, including “1cycle” learning rate policy, transfer learning, generalized dice loss function, mixed float precision, self-attention, and data augmentation. Even though the study was performed using a limited amount of data (80 CT images), our experiment has shown near human-level performance. We managed to achieve a 0.917 mean dice score with 0.0352 standard deviation on cross validation across the training data and a 0.9506 mean dice score on a separate test set. To our knowledge, these results are better than any known method for CSF segmentation in hydrocephalic patients, and thus, it is promising for potential practical applications.

Download Full-text

Deep Learning-Based Differentiation between Mucinous Cystic Neoplasm and Serous Cystic Neoplasm in the Pancreas Using Endoscopic Ultrasonography

Diagnostics ◽

10.3390/diagnostics11061052 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1052

Author(s):

Leang Sim Nguon ◽

Kangwon Seo ◽

Jung-Hyun Lim ◽

Tae-Jun Song ◽

Sung-Hyun Cho ◽

...

Keyword(s):

Decision Making ◽

Deep Learning ◽

Network Model ◽

Endoscopic Ultrasonography ◽

Data Augmentation ◽

Clinical Information ◽

Training Data ◽

Fine Tuning ◽

Cystic Neoplasm ◽

Cystic Neoplasms

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients from two different hospitals. Data augmentation was used to enhance the size and quality of training datasets. Fine-tuning training approaches were utilized by adopting the pre-trained model from transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network performance for differentiation. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under curve (AUC) score. The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.

Download Full-text

Concrete Crack Detection Based on Well-Known Feature Extractor Model and the YOLO_v2 Network

Applied Sciences ◽

10.3390/app11020813 ◽

2021 ◽

Vol 11 (2) ◽

pp. 813

Author(s):

Shuai Teng ◽

Zongchao Liu ◽

Gongfa Chen ◽

Li Cheng

Keyword(s):

Feature Extraction ◽

Crack Detection ◽

Computational Cost ◽

Concrete Structures ◽

Detection Algorithm ◽

Computational Time ◽

Image Size ◽

Important Indicator ◽

Feature Extractor ◽

Model Training

This paper compares the crack detection performance (in terms of precision and computational cost) of the YOLO_v2 using 11 feature extractors, which provides a base for realizing fast and accurate crack detection on concrete structures. Cracks on concrete structures are an important indicator for assessing their durability and safety, and real-time crack detection is an essential task in structural maintenance. The object detection algorithm, especially the YOLO series network, has significant potential in crack detection, while the feature extractor is the most important component of the YOLO_v2. Hence, this paper employs 11 well-known CNN models as the feature extractor of the YOLO_v2 for crack detection. The results confirm that a different feature extractor model of the YOLO_v2 network leads to a different detection result, among which the AP value is 0.89, 0, and 0 for ‘resnet18’, ‘alexnet’, and ‘vgg16’, respectively meanwhile, the ‘googlenet’ (AP = 0.84) and ‘mobilenetv2’ (AP = 0.87) also demonstrate comparable AP values. In terms of computing speed, the ‘alexnet’ takes the least computational time, the ‘squeezenet’ and ‘resnet18’ are ranked second and third respectively; therefore, the ‘resnet18’ is the best feature extractor model in terms of precision and computational cost. Additionally, through the parametric study (influence on detection results of the training epoch, feature extraction layer, and testing image size), the associated parameters indeed have an impact on the detection results. It is demonstrated that: excellent crack detection results can be achieved by the YOLO_v2 detector, in which an appropriate feature extractor model, training epoch, feature extraction layer, and testing image size play an important role.

Download Full-text

Fully automated contrast and non-contrast cardiac view detection in echocardiography a multi-centre, multi-vendor study

European Heart Journal ◽

10.1093/ehjci/ehaa946.0078 ◽

2020 ◽

Vol 41 (Supplement_2) ◽

Author(s):

S Gao ◽

D Stojanovski ◽

A Parker ◽

P Marques ◽

S Heitner ◽

...

Keyword(s):

Neural Network ◽

Training Data ◽

Classification Model ◽

Validation Dataset ◽

Funding Source ◽

Private Company ◽

Validation Data ◽

Independent Test ◽

Model Training ◽

Confusion Matrices

Abstract Background Correctly identifying views acquired in a 2D echocardiographic examination is paramount to post-processing and quantification steps often performed as part of most clinical workflows. In many exams, particularly in stress echocardiography, microbubble contrast is used which greatly affects the appearance of the cardiac views. Here we present a bespoke, fully automated convolutional neural network (CNN) which identifies apical 2, 3, and 4 chamber, and short axis (SAX) views acquired with and without contrast. The CNN was tested in a completely independent, external dataset with the data acquired in a different country than that used to train the neural network. Methods Training data comprised of 2D echocardiograms was taken from 1014 subjects from a prospective multisite, multi-vendor, UK trial with the number of frames in each view greater than 17,500. Prior to view classification model training, images were processed using standard techniques to ensure homogenous and normalised image inputs to the training pipeline. A bespoke CNN was built using the minimum number of convolutional layers required with batch normalisation, and including dropout for reducing overfitting. Before processing, the data was split into 90% for model training (211,958 frames), and 10% used as a validation dataset (23,946 frames). Image frames from different subjects were separated out entirely amongst the training and validation datasets. Further, a separate trial dataset of 240 studies acquired in the USA was used as an independent test dataset (39,401 frames). Results Figure 1 shows the confusion matrices for both validation data (left) and independent test data (right), with an overall accuracy of 96% and 95% for the validation and test datasets respectively. The accuracy for the non-contrast cardiac views of >99% exceeds that seen in other works. The combined datasets included images acquired across ultrasound manufacturers and models from 12 clinical sites. Conclusion We have developed a CNN capable of automatically accurately identifying all relevant cardiac views used in “real world” echo exams, including views acquired with contrast. Use of the CNN in a routine clinical workflow could improve efficiency of quantification steps performed after image acquisition. This was tested on an independent dataset acquired in a different country to that used to train the model and was found to perform similarly thus indicating the generalisability of the model. Figure 1. Confusion matrices Funding Acknowledgement Type of funding source: Private company. Main funding source(s): Ultromics Ltd.

Download Full-text

Robust Approach to Supervised Deep Neural Network Training for Real-Time Object Classification in Cluttered Indoor Environment

Applied Sciences ◽

10.3390/app11157148 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7148

Author(s):

Bedada Endale ◽

Abera Tullu ◽

Hayoung Shi ◽

Beom-Soo Kang

Keyword(s):

Neural Network ◽

Deep Learning ◽

Real Time ◽

Network Architecture ◽

Input Data ◽

Deep Neural Network ◽

Data Augmentation ◽

Object Classification ◽

Training Data ◽

Gradient Descent Algorithm

Unmanned aerial vehicles (UAVs) are being widely utilized for various missions: in both civilian and military sectors. Many of these missions demand UAVs to acquire artificial intelligence about the environments they are navigating in. This perception can be realized by training a computing machine to classify objects in the environment. One of the well known machine training approaches is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes with huge sacrifice in terms of time and computational resources. Collecting big input data, pre-training processes, such as labeling training data, and the need for a high performance computer for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission specific input data augmentation techniques and the design of light-weight deep neural network architecture that is capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images in each class were used as input data where 80% were for training the network and the remaining 20% were used for network validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented. This algorithm has the advantage of handling redundancy in the data more efficiently than other algorithms.

Download Full-text

Rethinking the Random Cropping Data Augmentation Method Used in the Training of CNN-Based SAR Image Ship Detector

Remote Sensing ◽

10.3390/rs13010034 ◽

2020 ◽

Vol 13 (1) ◽

pp. 34

Author(s):

Rong Yang ◽

Robert Wang ◽

Yunkai Deng ◽

Xiaoxue Jia ◽

Heng Zhang

Keyword(s):

Neural Network ◽

Data Augmentation ◽

Back Propagation ◽

Detection Performance ◽

Training Data ◽

Sar Image ◽

Optical Images ◽

The Neural Network ◽

Effective Training ◽

Standard Configuration

The random cropping data augmentation method is widely used to train convolutional neural network (CNN)-based target detectors to detect targets in optical images (e.g., COCO datasets). It can expand the scale of the dataset dozens of times while consuming only a small amount of calculations when training the neural network detector. In addition, random cropping can also greatly enhance the spatial robustness of the model, because it can make the same target appear in different positions of the sample image. Nowadays, random cropping and random flipping have become the standard configuration for those tasks with limited training data, which makes it natural to introduce them into the training of CNN-based synthetic aperture radar (SAR) image ship detectors. However, in this paper, we show that the introduction of traditional random cropping methods directly in the training of the CNN-based SAR image ship detector may generate a lot of noise in the gradient during back propagation, which hurts the detection performance. In order to eliminate the noise in the training gradient, a simple and effective training method based on feature map mask is proposed. Experiments prove that the proposed method can effectively eliminate the gradient noise introduced by random cropping and significantly improve the detection performance under a variety of evaluation indicators without increasing inference cost.

Download Full-text

Gravity Control-Based Data Augmentation Technique for Improving VR User Activity Recognition

Symmetry ◽

10.3390/sym13050845 ◽

2021 ◽

Vol 13 (5) ◽

pp. 845

Author(s):

Dongheun Han ◽

Chulwoo Lee ◽

Hyeongyeop Kang

Keyword(s):

Activity Recognition ◽

Large Scale ◽

Data Augmentation ◽

Training Data ◽

Measurement Unit ◽

Gravitational Acceleration ◽

The Neural Network ◽

Typical Data ◽

Robust Recognition ◽

Gravity Acceleration

The neural-network-based human activity recognition (HAR) technique is being increasingly used for activity recognition in virtual reality (VR) users. The major issue of a such technique is the collection large-scale training datasets which are key for deriving a robust recognition model. However, collecting large-scale data is a costly and time-consuming process. Furthermore, increasing the number of activities to be classified will require a much larger number of training datasets. Since training the model with a sparse dataset can only provide limited features to recognition models, it can cause problems such as overfitting and suboptimal results. In this paper, we present a data augmentation technique named gravity control-based augmentation (GCDA) to alleviate the sparse data problem by generating new training data based on the existing data. The benefits of the symmetrical structure of the data are that it increased the number of data while preserving the properties of the data. The core concept of GCDA is two-fold: (1) decomposing the acceleration data obtained from the inertial measurement unit (IMU) into zero-gravity acceleration and gravitational acceleration, and augmenting them separately, and (2) exploiting gravity as a directional feature and controlling it to augment training datasets. Through the comparative evaluations, we validated that the application of GCDA to training datasets showed a larger improvement in classification accuracy (96.39%) compared to the typical data augmentation methods (92.29%) applied and those that did not apply the augmentation method (85.21%).

Download Full-text