Data Balancing Based on Pre-Training Strategy for Liver Segmentation from CT Scans

Yong Zhang; Yi Wang; Yizhu Wang; Bin Fang; Wei Yu; Hongyu Long; Hancheng Lei

doi:10.3390/app9091825

Applied Sciences ◽

10.3390/app9091825 ◽

2019 ◽

Vol 9 (9) ◽

pp. 1825 ◽

Cited By ~ 1

Author(s):

Yong Zhang ◽

Yi Wang ◽

Yizhu Wang ◽

Bin Fang ◽

Wei Yu ◽

...

Keyword(s):

Ct Scans ◽

Training Dataset ◽

Liver Segmentation ◽

Prediction Ability ◽

Training Strategy ◽

Data Imbalance ◽

Model Training ◽

Segmentation Task ◽

Spleen Segmentation ◽

Hard Samples

Data imbalance is often encountered in deep learning and is harmful to model training. An imbalance between hard and easy samples in the training dataset often occurs in segmentation tasks on computed tomography (CT) scans. However, because adjacent slices in a volume are strongly similar and because difficulty depends on the task (the same slice may be a hard sample in the liver segmentation task but an easy sample in the kidney or spleen segmentation task), this imbalance is hard to resolve with traditional methods. In this work, we use a pre-training strategy to distinguish hard and easy samples, and then increase the proportion of hard slices in the training dataset, which mitigates the imbalance between hard and easy samples and enhances the contribution of hard samples to the training process. Our experiments on liver, kidney and spleen segmentation show that increasing the ratio of hard samples in the training dataset can enhance the prediction ability of the model by improving its ability to deal with hard samples. The main contribution of this work is the application of the pre-training strategy, which enables us to select training samples online according to the task and to ease data imbalance in the training dataset.
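The idea in this abstract, scoring each slice with a pre-trained model and then raising the share of hard slices, can be illustrated with a minimal sketch. The abstract does not say how hardness is measured or how the proportion is increased, so the Dice-score criterion, the `threshold` and `hard_repeat` values, and the function names below are all assumptions made for illustration, not the authors' method.

```python
import random

def dice(pred, truth):
    """Dice overlap between two binary masks given as flat 0/1 lists."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * inter / total

def rebalance(slices, predict, threshold=0.8, hard_repeat=3, seed=0):
    """Return a training list in which hard slices appear hard_repeat times.

    slices: list of (image, mask) pairs.
    predict: inference function of the pre-trained model (hypothetical).
    threshold, hard_repeat: illustrative values, not taken from the paper.
    """
    balanced = []
    for image, mask in slices:
        score = dice(predict(image), mask)
        # A slice the pre-trained model segments poorly counts as hard.
        copies = hard_repeat if score < threshold else 1
        balanced.extend([(image, mask)] * copies)
    random.Random(seed).shuffle(balanced)
    return balanced
```

Because hardness is computed from the model trained on the task at hand, the same slice can be duplicated for liver segmentation and left alone for spleen segmentation, which is the task dependence the abstract emphasizes.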

CNN BASED DETECTION OF BUILDING ROOFS FROM HIGH RESOLUTION SATELLITE IMAGES

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-3-w10-187-2020 ◽

2020 ◽

Vol XLII-3/W10 ◽

pp. 187-192

Author(s):

L. Hang ◽

G. Y. Cai

Keyword(s):

High Resolution ◽

Residential Building ◽

Training Image ◽

Training Dataset ◽

Classification Approach ◽

Object Based ◽

High Resolution Satellite Images ◽

Different Types ◽

Model Training ◽

Very High

Abstract. The detection and reconstruction of buildings have attracted increasing attention in the remote sensing and computer vision communities. Light detection and ranging (LiDAR) has proved to be a good way to extract building roofs, but such data are often in short supply. In this paper, we extract building roofs from very high resolution (VHR) images of the Chinese satellite Gaofen-2 using a convolutional neural network (CNN). CNNs have been shown to have a higher capability for recognizing detailed features that an object-based classification approach may fail to classify. The study involves several major steps: generation of the training dataset, model training, image segmentation and building roof recognition. First, urban objects such as trees, roads, squares and buildings were classified with a random forest algorithm in an object-oriented classification approach, and the building regions were separated from the other classes with the aid of visual interpretation and correction. Next, different types of building roofs, categorized mainly by color and size information, were trained using the trained CNN. Finally, industrial and residential building roofs were recognized and validated separately. The assessment results demonstrate the effectiveness of the proposed method, with quality rates of approximately 91% and 88% in detecting industrial and residential building roofs, respectively. This suggests that the CNN approach is promising for detecting buildings with high accuracy.

Liver Segmentation from CT Scans: A Survey

Applications of Fuzzy Sets Theory - Lecture Notes in Computer Science ◽

10.1007/978-3-540-73400-0_66 ◽

2007 ◽

pp. 520-528 ◽

Cited By ~ 17

Author(s):

Paola Campadelli ◽

Elena Casiraghi

Keyword(s):

Ct Scans ◽

Liver Segmentation

Enhanced Early Warning Diagnostic Rules for Gas Turbines Leveraging on Bayesian Networks

Volume 5: Controls, Diagnostics, and Instrumentation; Cycle Innovations; Cycle Innovations: Energy Storage ◽

10.1115/gt2020-16082 ◽

2020 ◽

Author(s):

Ernesto Escobedo ◽

Liliana Arguello ◽

Marzia Sepe ◽

Ilaria Parrella ◽

Stefano Cioncolini ◽

...

Keyword(s):

Gas Turbines ◽

Probabilistic Approach ◽

Training Dataset ◽

Auxiliary Equipment ◽

Data Set ◽

Deep Dive ◽

Industrial Systems ◽

Business Application ◽

Model Training ◽

Selection Of

Abstract. The monitoring and diagnostics of industrial systems are increasing in complexity, with larger volumes of data collected and many methods and analytics able to correlate data and events. The setup and training of these methods and analytics are among the factors that most affect the selection of the most appropriate solution for an efficient and effective service, because they require selecting the most suitable data set for model training, which in turn demands time and knowledge. This paper describes a methodology for tracking features, detecting outliers and deriving, in a probabilistic way, diagnostic thresholds applied by means of hierarchical models that simplify or remove the selection of the proper training dataset by a subject matter expert at each deployment. The method applies to industrial systems employing a large number of similar machines connected to a remote data center, with the purpose of alerting one or more operators when a feature exceeds the healthy distribution. Some relevant use cases are presented for an aeroderivative gas turbine, also covering its auxiliary equipment, with a deep dive on the hydraulic starting system. The results, in terms of early anomaly detection and reduced model training effort, are compared with traditional monitoring approaches such as fixed thresholds. Moreover, this study explains the advantages of this probabilistic approach in a business application such as fleet monitoring and advanced diagnostic services.
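As a rough illustration of alerting when a feature exceeds the healthy distribution, the sketch below derives a threshold from readings pooled across similar machines instead of fixing it by hand. It is a deliberate simplification: the paper uses hierarchical models, whereas this example assumes a plain mean-plus-k-standard-deviations rule, and the function names and the value of `k` are hypothetical.

```python
from statistics import mean, stdev

def fleet_threshold(healthy_values, k=3.0):
    """Upper alert threshold from healthy readings pooled across the fleet.

    Assumes a roughly symmetric healthy distribution; k is illustrative.
    """
    return mean(healthy_values) + k * stdev(healthy_values)

def alerts(readings, threshold):
    """Indices of readings that exceed the data-derived threshold."""
    return [i for i, value in enumerate(readings) if value > threshold]
```

The point of the comparison with a fixed threshold is that the limit here follows the data collected from the fleet, so no expert has to choose a training set or a constant for each deployment.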

Automatic liver segmentation from CT scans based on a statistical shape model

2010 Annual International Conference of the IEEE Engineering in Medicine and Biology ◽

10.1109/iembs.2010.5626470 ◽

2010 ◽

Cited By ~ 1

Author(s):

Xing Zhang ◽

Jie Tian ◽

Kexin Deng ◽

Yongfang Wu ◽

Xiuli Li

Keyword(s):

Statistical Shape Model ◽

Ct Scans ◽

Liver Segmentation ◽

Shape Model ◽

Statistical Shape

Cooperative Hybrid Semi-Supervised Learning for Text Sentiment Classification

Symmetry ◽

10.3390/sym11020133 ◽

2019 ◽

Vol 11 (2) ◽

pp. 133 ◽

Cited By ~ 2

Author(s):

Yang Li ◽

Ying Lv ◽

Suge Wang ◽

Jiye Liang ◽

Juanzi Li ◽

...

Keyword(s):

Supervised Learning ◽

Large Scale ◽

Ensemble Classifier ◽

Sentiment Classification ◽

Training Dataset ◽

Support Vector ◽

Seed Selection ◽

Training Strategy ◽

Whole Process ◽

Self Learning

A large-scale, high-quality training dataset is important for learning an ideal classifier for text sentiment classification. However, manually constructing such a training dataset with sentiment labels is a labor-intensive and time-consuming task. Therefore, based on the idea of effectively utilizing unlabeled samples, this paper proposes a comprehensive framework for text sentiment classification that covers the whole process of semi-supervised learning, from seed selection and iterative modification of the training text set to the co-training strategy of the classifier. To provide a basis for selecting the seed texts and modifying the training text set, three measures are defined: the cluster similarity degree of an unlabeled text, the cluster uncertainty degree of a pseudo-label text to a learner, and the reliability degree of a pseudo-label text to a learner. With these measures, a seed selection method based on Random Swap clustering, a hybrid modification method of the training text set based on active learning and self-learning, and an alternating co-training strategy for an ensemble classifier of Maximum Entropy and Support Vector Machine are proposed and combined into our framework. Experimental results on three real-world Chinese datasets (COAE2014, COAE2015, and a hotel review dataset) and five real-world English datasets (Books, DVD, Electronics, Kitchen, and MR) verify the effectiveness of the proposed framework.

ENHANCED TRAINING FOR THE LOCALLY RECURRENT PROBABILISTIC NEURAL NETWORKS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213009000433 ◽

2009 ◽

Vol 18 (06) ◽

pp. 853-881 ◽

Cited By ~ 4

Author(s):

TODOR GANCHEV

Keyword(s):

Neural Networks ◽

Fitness Function ◽

Training Data ◽

Training Dataset ◽

Automatic Process ◽

Training Procedure ◽

Posterior Probabilities ◽

Probabilistic Neural Networks ◽

Training Strategy ◽

Locally Recurrent

In the present contribution we propose an integral training procedure for the Locally Recurrent Probabilistic Neural Networks (LR PNNs). Specifically, the adjustment of the smoothing factor "sigma" in the pattern layer of the LR PNN and the training of the recurrent layer weights are integrated in an automatic process that iteratively estimates all adjustable parameters of the LR PNN from the available training data. Furthermore, in contrast to the original LR PNN, whose recurrent layer was trained to provide optimum separation among the classes on the training dataset while striving to keep a balance between the learning rates for all classes, here the training strategy is oriented directly towards optimizing the overall classification accuracy. More precisely, the new training strategy aims at maximizing the posterior probabilities for the target class and minimizing the posterior probabilities estimated for the non-target classes. The new fitness function requires fewer computations for each evaluation, and therefore the overall computational demands for training the recurrent layer weights are reduced. The performance of the integrated training procedure is illustrated on three different speech processing tasks: emotion recognition, speaker identification and speaker verification.

A new iterative method for liver segmentation from perfusion CT scans

10.1117/12.2043576 ◽

2014 ◽

Author(s):

Ahmed Draoua ◽

Adélaïde Albouy-Kissi ◽

Antoine Vacavant ◽

Vincent Sauvage

Keyword(s):

Iterative Method ◽

Perfusion Ct ◽

Ct Scans ◽

Liver Segmentation ◽

New Iterative Method

Methodology for Collecting a Training Dataset for an Intrusion Detection Model

Proceedings of the Institute for System Programming of RAS ◽

10.15514/ispras-2021-33(5)-5 ◽

2021 ◽

Vol 33 (5) ◽

pp. 83-104

Author(s):

Aleksandr Igorevich Getman ◽

Maxim Nikolaevich Goryunov ◽

Andrey Georgievich Matskevich ◽

Dmitry Aleksandrovich Rybolovlev

Keyword(s):

Attack Detection ◽

Training Data ◽

Training Dataset ◽

Training Models ◽

The Public ◽

Detection Model ◽

Computer Attacks ◽

Model Training ◽

Public Datasets

The paper discusses the training of models for detecting computer attacks using machine learning methods. It presents, in turn, an analysis of publicly available training datasets and of tools for analyzing network traffic and extracting features of network sessions. The drawbacks of existing tools and possible errors in the datasets produced with them are noted. The paper concludes that one needs to collect one's own training data, because the reliability of public datasets is not guaranteed and pre-trained models are of limited use in networks whose characteristics differ from those of the network in which the training traffic was collected. A practical approach to generating training data for computer attack detection models is proposed. The proposed solutions have been tested to evaluate the quality of model training on the collected data and the quality of attack detection in a real network infrastructure.

A Five-Gene Prognostic Nomogram Predicting Disease-Free Survival of Differentiated Thyroid Cancer

Disease Markers ◽

10.1155/2021/5510780 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Pan Ruchong ◽

Tang Haiping ◽

Wang Xiang

Keyword(s):

Gene Expression ◽

Thyroid Cancer ◽

Differentiated Thyroid Cancer ◽

Disease Free Survival ◽

Gene Signature ◽

Training Dataset ◽

Prediction Ability ◽

Free Survival ◽

Testing Dataset ◽

Disease Free

Background. Differentiated thyroid cancer (DTC) is the most common type of thyroid tumor and has a high recurrence rate. Here, we developed a nomogram to effectively predict postoperative disease-free survival (DFS) in DTC patients. Methods. The mRNA expression and clinical data of DTC patients were downloaded from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) database. Seventy percent of patients were randomly selected as the training dataset, and the remaining thirty percent formed the testing dataset. Multivariate Cox regression analysis was adopted to establish a nomogram to predict the 1-year, 3-year, and 5-year DFS rates of DTC patients. Results. A five-gene signature comprising the TENM1, FN1, APOD, F12, and BTNL8 genes was established to predict the DFS rate of DTC patients. Results from the concordance index (C-index), area under the curve (AUC), and calibration curve showed good prediction ability on both the training dataset and the testing dataset, superior to that of other traditional models. The risk score of the five-gene signature and distant metastasis (M) were independent risk factors that affected DTC recurrence. A nomogram that could predict the 1-year, 3-year, and 5-year DFS rates of DTC patients was established with a C-index of 0.801 (95% CI: 0.736, 0.866). Conclusion. Our study developed a prediction model based on gene expression and clinical characteristics to predict the DFS rate of DTC patients, which may be applied to assess patient prognosis more accurately and to individualize treatment.
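Two of the steps named in this abstract, the random 70/30 patient split and the concordance index, are simple enough to sketch. The code below is only an illustration under stated assumptions: the function names are hypothetical, the seed is arbitrary, and the C-index is the standard Harrell pairwise definition, which may differ in tie handling from whatever software the authors used.

```python
import random

def split_patients(ids, train_frac=0.7, seed=0):
    """Randomly split patient ids into training and testing lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs that the risk score orders correctly.

    times: follow-up times; events: 1 if recurrence was observed, 0 if censored;
    risks: model risk scores, where higher means earlier expected recurrence.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        for j in range(len(times)):
            # A pair is comparable when i had the event before j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported C-index of 0.801.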

Systematic comparison of incomplete-supervision approaches for biomedical imaging classification

10.21203/rs.3.rs-798207/v1 ◽

2021 ◽

Author(s):

Sayedali Shetab Boushehri ◽

Ahmad Qasim ◽

Dominik Waibel ◽

Fabian Schmich ◽

Carsten Marr

Keyword(s):

Active Learning ◽

Supervised Learning ◽

Learning Algorithms ◽

Image Data ◽

Classification Performance ◽

Natural Image ◽

Training Methods ◽

Training Strategy ◽

Biomedical Image ◽

Model Training

Abstract. Deep learning-based classification of biomedical images requires manual annotation by experts, which is time-consuming and expensive. Incomplete-supervision approaches, including active learning, pre-training and semi-supervised learning, address this issue and aim to increase classification performance with a limited number of annotated images. Up to now, these approaches have mostly been benchmarked on natural image datasets, where image complexity and class balance typically differ considerably from biomedical classification tasks. In addition, it is not clear how to combine them to improve classification performance on biomedical image data. We thus performed an extensive grid search combining seven active learning algorithms, three pre-training methods and two training strategies as well as respective baselines (random sampling, random initialization, and supervised learning). For four biomedical datasets, we started training with 1% of labeled data and increased it by 5% iteratively, using 4-fold cross-validation in each cycle. We found that the contribution of pre-training and semi-supervised learning can reach up to 20% macro F1-score in each cycle. In contrast, the state-of-the-art active learning algorithms contribute less than 5% to macro F1-score in each cycle. Based on performance, implementation ease and computation requirements, we recommend the combination of BADGE active learning, ImageNet-weights pre-training, and pseudo-labeling as the training strategy, which reached over 90% of fully supervised results with only 25% of annotated data for three out of four datasets. We believe that our study is an important step towards annotation- and resource-efficient model training for biomedical classification challenges.
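Pseudo-labeling, the semi-supervised strategy recommended in this abstract, is commonly implemented by keeping only confident predictions on unlabeled images and treating them as labels in the next training round. The sketch below shows that selection step alone; the confidence `threshold` and the function name are assumptions for illustration and are not taken from the study.

```python
def pseudo_label(probabilities, threshold=0.9):
    """Select confident predictions on unlabeled samples as pseudo-labels.

    probabilities: one list of class probabilities per unlabeled sample.
    Returns (sample_index, predicted_class) pairs; threshold is illustrative.
    """
    selected = []
    for index, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            selected.append((index, best))
    return selected
```

In a full training loop the selected pairs would be added to the labeled pool before the next cycle, alongside the images chosen by the active learning algorithm.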
