Modelling current parna distribution in a local area

Soil Research ◽  
2000 ◽  
Vol 38 (4) ◽  
pp. 867 ◽  
Author(s):  
G. K. Summerell ◽  
T. I. Dowling ◽  
D. P. Richardson ◽  
J. Walker ◽  
B. Lees

Parna is a wind-blown clay, mobilised from inland Australia as the result of a series of intermittent high wind events during the Quaternary. Parna can be recognised on the basis of colour, texture, distributional patterns, and pedology. Parna deposits have been recorded across a wide area of south-eastern Australia and have influenced the local pedology and hydrology. In some cases parna has increased soil sodicity and the potential for dryland salinisation. Predicting its spatial distribution is useful when considering agricultural potential and in assessing the risk and spatial spread of dryland salinity. Here we present the results of modelling to predict its local distribution in an area covering 291 km² in the Young district of NSW. Two conceptual models of parna deposition and subsequent redistribution were used to develop a current parna distribution map: (a) deposition = f(topography, aspect), assuming that interactions of rainfall, vegetation, and wind speed were relatively uniform at the local scale; (b) removal or retention = f(slope angle, catchment size, slope length), as a representation of the erosive energy of gravity. Five landscape variables (elevation, aspect, slope, flow accumulation, and flow length) were derived from a 20 m digital elevation model (DEM). A training set of parna deposits was established using air photos and field survey of limited exposures in the Young district. These areas were digitised and converted to a grid of parna and no-parna areas. This training set and the 5 landscape variable grids were processed in the IDRISI for WINDOWS Geographic Information System (GIS). Spatial relationships between the parna and no-parna deposits and the 5 landscape variables were extracted from this training set. This information was imported into an inductive learning program called KnowledgeSEEKER. A decision tree was built by recursive partitioning of the data set, using Chi-square tests for categorical variables and an F test for continuous variables, to best replicate the training data classification of ‘parna’ and ‘no-parna’. The rules derived from this process were applied to the study area to predict the occurrence of parna in the broader landscape. Predictions were field checked and the rules adjusted until they best represented the occurrence of parna in the field. The final model showed the following predictions of parna deposits: (i) higher elevations in the Young landscape were the dominant sites of parna deposits; (ii) thicker deposits of parna occurred on the windward south-west and north-west aspects; (iii) thinner deposits occurred on the leeward side of a central ridge feature; (iv) because the training set was concentrated around the major central ridge feature, poorer predictions were obtained on gently undulating country.
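A minimal sketch of the rule-induction step, assuming the five landscape grids and the digitised parna/no-parna training grid have been exported from the GIS as equally shaped NumPy arrays (file names are placeholders). scikit-learn's decision tree splits on Gini impurity rather than the Chi-square/F-test criteria KnowledgeSEEKER uses, so this only approximates the paper's recursive partitioning:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical inputs: one value per 20 m DEM grid cell, all arrays same shape.
elevation = np.load("elevation.npy")
aspect = np.load("aspect.npy")
slope = np.load("slope.npy")
flow_acc = np.load("flow_accumulation.npy")
flow_len = np.load("flow_length.npy")
training = np.load("parna_training.npy")   # 1 = parna, 0 = no-parna, -1 = unmapped

features = np.column_stack([g.ravel() for g in
                            (elevation, aspect, slope, flow_acc, flow_len)])
labelled = training.ravel() >= 0           # restrict to digitised training areas

tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=100)
tree.fit(features[labelled], training.ravel()[labelled])

# Apply the induced rules across the whole study area, then reshape to a map.
parna_map = tree.predict(features).reshape(elevation.shape)
print(export_text(tree, feature_names=[
    "elevation", "aspect", "slope", "flow_acc", "flow_len"]))
```

Printing the tree as text mirrors the paper's workflow of inspecting and field-adjusting the derived rules before reapplying them.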

2014 ◽  
Vol 539 ◽  
pp. 181-184
Author(s):  
Wan Li Zuo ◽  
Zhi Yan Wang ◽  
Ning Ma ◽  
Hong Liang

Accurate text classification is a prerequisite for efficiently extracting information from the Web and making proper use of network resources. In this paper, a new text classification method is proposed. Consistency analysis is an iterative algorithm that trains several weak classifiers on the same training set and then combines them to test how consistently the different classification methods label the same text, thereby exposing the knowledge captured by each type of classifier. At each iteration, the weight of each sample is updated according to whether that sample was classified correctly in the previous round and according to the accuracy of the previous overall classification; the reweighted data set is then passed to the next classifier for training. In the end, the classifiers obtained during training are integrated into the final decision classifier. A classifier built with consistency analysis can eliminate unnecessary training-data characteristics and concentrate on the key training data. According to the experimental results, the average accuracy of this method is 91.0%, and the average recall rate is 88.1%.
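The reweighting loop the abstract describes reads much like boosting. A sketch under that assumption (AdaBoost-style weight updates over a naive Bayes weak learner; the paper's exact update rule and weak classifier are not specified):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def train_consistency_ensemble(X, y, rounds=10):
    """X: term-count matrix; y: class labels. Returns weak classifiers + votes."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)               # uniform initial sample weights
    classifiers, alphas = [], []
    for _ in range(rounds):
        clf = MultinomialNB().fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        err = np.dot(w, pred != y)        # weighted error of this round
        if err == 0 or err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)
        # Increase weights of misclassified samples, decrease the rest.
        w *= np.exp(alpha * np.where(pred != y, 1.0, -1.0))
        w /= w.sum()
        classifiers.append(clf)
        alphas.append(alpha)
    return classifiers, alphas

def predict_ensemble(classifiers, alphas, X, classes):
    # Final decision classifier: weighted vote over the weak classifiers.
    votes = sum(a * (clf.predict(X)[:, None] == classes)
                for clf, a in zip(classifiers, alphas))
    return classes[np.argmax(votes, axis=1)]
```

Misclassified documents gain weight before the next round, which is one concrete way the "key training data" end up dominating later classifiers.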


2016 ◽  
Vol 2016 (4) ◽  
pp. 21-36 ◽  
Author(s):  
Tao Wang ◽  
Ian Goldberg

Abstract. Website fingerprinting allows a local, passive observer monitoring a web-browsing client’s encrypted channel to determine her web activity. Previous attacks have shown that website fingerprinting could be a threat to anonymity networks such as Tor under laboratory conditions. However, there are significant differences between laboratory conditions and realistic conditions. First, in laboratory tests we collect the training data set together with the testing data set, so the training data set is fresh, but an attacker may not be able to maintain a fresh data set. Second, laboratory packet sequences correspond to a single page each, but for realistic packet sequences the split between pages is not obvious. Third, packet sequences may include background noise from other types of web traffic. These differences adversely affect website fingerprinting under realistic conditions. In this paper, we tackle these three problems to bridge the gap between laboratory and realistic conditions for website fingerprinting. We show that we can maintain a fresh training set with minimal resources. We demonstrate several classification-based techniques that allow us to split full packet sequences effectively into sequences corresponding to a single page each. We describe several new algorithms for tackling background noise. With our techniques, we are able to build the first website fingerprinting system that can operate directly on packet sequences collected in the wild.
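A simplified sketch of classification-based splitting: for each sufficiently large inter-packet time gap, a classifier decides whether the gap is a boundary between two page loads. The features here (gap length and packet counts around the gap) and the thresholds are illustrative assumptions, not the paper's actual feature set:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def gap_features(times, i):
    """times: NumPy array of packet timestamps; i: index of a candidate gap."""
    gap = times[i + 1] - times[i]
    before = np.sum((times >= times[i] - 2.0) & (times <= times[i]))
    after = np.sum((times >= times[i + 1]) & (times <= times[i + 1] + 2.0))
    return [gap, before, after]

def train_splitter(traces, boundary_sets):
    # traces: list of timestamp arrays; boundary_sets: set of true split indices per trace.
    X, y = [], []
    for times, boundaries in zip(traces, boundary_sets):
        for i in range(len(times) - 1):
            if times[i + 1] - times[i] > 1.0:      # only consider candidate gaps
                X.append(gap_features(times, i))
                y.append(i in boundaries)
    return RandomForestClassifier(n_estimators=100).fit(X, y)

def split_trace(model, times):
    cuts = [i for i in range(len(times) - 1)
            if times[i + 1] - times[i] > 1.0
            and model.predict([gap_features(times, i)])[0]]
    return np.split(np.asarray(times), [c + 1 for c in cuts])
```

Each returned segment can then be fed to an ordinary single-page fingerprinting classifier.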


Dose-Response ◽  
2019 ◽  
Vol 17 (4) ◽  
pp. 155932581989417 ◽  
Author(s):  
Zhi Huang ◽  
Jie Liu ◽  
Liang Luo ◽  
Pan Sheng ◽  
Biao Wang ◽  
...  

Background: Plenty of evidence has suggested that autophagy plays a crucial role in the biological processes of cancers. This study aimed to screen autophagy-related genes (ARGs) and establish a novel scoring system for colorectal cancer (CRC). Methods: Autophagy-related gene sequencing data and the corresponding clinical data for CRC in The Cancer Genome Atlas were used as the training data set. The GSE39582 data set from the Gene Expression Omnibus was used as the validation set. An autophagy-related signature was developed in the training set using univariate Cox analysis followed by stepwise multivariate Cox analysis, and assessed in the validation set. We then analyzed the functions and pathways of the ARGs using the Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. Finally, a prognostic nomogram combining the autophagy-related risk score and clinicopathological characteristics was developed according to the multivariate Cox analysis. Results: After univariate and multivariate analysis, 3 ARGs were used to construct the autophagy-related signature. The KEGG pathway analyses showed several significantly enriched oncological signatures, such as the p53 signaling pathway, apoptosis, human cytomegalovirus infection, platinum drug resistance, necroptosis, and the ErbB signaling pathway. Patients were divided into high- and low-risk groups, and high-risk patients had significantly shorter overall survival (OS) than low-risk patients in both the training and validation sets. Furthermore, a nomogram for predicting 3- and 5-year OS was established based on the autophagy-based risk score and clinicopathologic factors. The area under the curve and calibration curves indicated that the nomogram showed good predictive accuracy. Conclusions: Our proposed autophagy-based signature has important prognostic value and may provide a promising tool for the development of personalized therapy.
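A sketch of the signature-building step, assuming a data frame with one expression column per ARG plus survival time/event columns (file and column names are hypothetical). lifelines' CoxPHFitter stands in for both the univariate screen and the multivariate model; the simple p-value filter below is an illustrative stand-in for the stepwise selection actually used:

```python
import pandas as pd
from lifelines import CoxPHFitter

def univariate_screen(df, genes, alpha=0.05):
    """Keep genes whose univariate Cox p-value is below alpha."""
    keep = []
    for g in genes:
        cph = CoxPHFitter()
        cph.fit(df[[g, "time", "event"]], duration_col="time", event_col="event")
        if cph.summary.loc[g, "p"] < alpha:
            keep.append(g)
    return keep

df = pd.read_csv("tcga_crc_args.csv")           # hypothetical TCGA export
genes = [c for c in df.columns if c not in ("time", "event")]
candidates = univariate_screen(df, genes)

multi = CoxPHFitter()
multi.fit(df[candidates + ["time", "event"]],
          duration_col="time", event_col="event")

# Risk score = partial hazard; split patients at the median into risk groups.
df["risk"] = multi.predict_partial_hazard(df)
df["group"] = (df["risk"] > df["risk"].median()).map({True: "high", False: "low"})
```

The same fitted model can then score the GSE39582 validation cohort to check that the high/low split separates survival there as well.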


2019 ◽  
Vol 12 (1) ◽  
pp. 106 ◽  
Author(s):  
Romulus Costache ◽  
Quoc Bao Pham ◽  
Ehsan Sharifi ◽  
Nguyen Thi Thuy Linh ◽  
S.I. Abba ◽  
...  

Given the significant increase in the negative effects of flash-floods worldwide, the main goal of this research is to evaluate the power of the Analytical Hierarchy Process (AHP), k-Nearest Neighbors (kNN), and K-Star (KS) algorithms and their ensembles in flash-flood susceptibility mapping. To train the two stand-alone models and their ensembles, in the first stage the areas affected in the past by torrential phenomena are identified using remote sensing techniques. Approximately 70% of these areas are used as a training data set along with 10 flash-flood predictors. It should be remarked that remote sensing techniques play a crucial role in obtaining eight of the 10 flash-flood conditioning factors. The predictive capability of the predictors is evaluated through the Information Gain Ratio (IGR) method. As expected, the slope angle proves to be the factor with the highest predictive capability. The application of the AHP model involves the construction of ten pair-wise comparison matrices for calculating the normalized weights of each flash-flood predictor. The computed weights are used as input data in the kNN–AHP and KS–AHP ensemble models for calculating the Flash-Flood Potential Index (FFPI). The FFPI is also determined through the kNN and KS stand-alone models. The performance of the models is evaluated using statistical metrics (i.e., sensitivity, specificity, and accuracy), while the results are validated by constructing Receiver Operating Characteristic (ROC) curves, computing Area Under the Curve (AUC) values, and calculating the density of torrential pixels within the FFPI classes. Overall, the best performance is obtained by the kNN–AHP ensemble model.
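A sketch of the AHP weighting step and one way to combine it with kNN. The pairwise comparison values below are placeholders (the paper builds ten such matrices), and scaling predictor columns by their AHP weights before kNN is an illustrative reading of the kNN–AHP ensemble, not the paper's exact formulation. The weights come from the principal eigenvector of the comparison matrix, the standard AHP recipe:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def ahp_weights(pairwise):
    """Normalised principal eigenvector of an AHP pairwise comparison matrix."""
    vals, vecs = np.linalg.eig(np.asarray(pairwise, dtype=float))
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

# Hypothetical 3-predictor comparison matrix (e.g. slope, land use, rainfall).
pairwise = [[1,   3,   5],
            [1/3, 1,   2],
            [1/5, 1/2, 1]]
w = ahp_weights(pairwise)

# Scale each predictor column by its AHP weight so the Euclidean distance
# used by kNN reflects predictor importance.
X_train = np.load("predictors_train.npy")   # shape (n_samples, n_predictors)
y_train = np.load("torrential_train.npy")   # 1 = affected, 0 = not affected
knn = KNeighborsClassifier(n_neighbors=7).fit(X_train * w, y_train)

X_all = np.load("predictors_all.npy")
ffpi = knn.predict_proba(X_all * w)[:, 1]   # FFPI proxy per pixel
```

The resulting per-pixel probabilities can be classified into FFPI bands and checked against the held-out ~30% of torrential areas, as the paper does with ROC/AUC.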


2019 ◽  
Vol 1 ◽  
pp. 1-1
Author(s):  
Tee-Ann Teo

Abstract. Deep Learning is a kind of Machine Learning technology which utilizes deep neural networks to learn a promising model from a large training data set. Convolutional Neural Networks (CNNs) have been successfully applied in image segmentation and classification with highly accurate results. The CNN applies multiple kernels (also called filters) to extract image features via image convolution, and it can determine multiscale features through its multiple layers of convolution and pooling. The variety of the training data plays an important role in determining a reliable CNN model. Benchmark training data for road mark extraction, such as the KITTI Vision Benchmark Suite, focus mainly on close-range imagery because a close-range image is easier to obtain than an airborne image. This study aims to transfer road mark training data from a mobile lidar system to aerial orthoimages in Fully Convolutional Networks (FCNs). Transferring the training data from a ground-based system to an airborne system may reduce the effort of producing a large training data set.

This study uses FCN technology and aerial orthoimages to localize road marks on road regions. The road regions are first extracted from a 2-D large-scale vector map. The input aerial orthoimage has a 10 cm spatial resolution, and the non-road regions are masked out before road mark localization. The training data are road mark polygons, originally digitized from ground-based mobile lidar and prepared for road mark extraction using a mobile mapping system. This study reuses these training data and applies them to road mark extraction from aerial orthoimages. The digitized training road marks are transformed to road polygons based on mapping coordinates. As the detail of ground-based lidar is much better than that of the airborne system, parking lots partially occluded in the aerial orthoimage can also be obtained from the ground-based system. The labels (also called annotations) for the FCN include road regions, non-road regions, and road marks. The size of a training batch is 500 pixels by 500 pixels (50 m by 50 m on the ground), and 75 batches were used for training. After the FCN training stage, an independent aerial orthoimage (Figure 1a) is used to predict the road marks. The FCN results provide initial regions for road marks (Figure 1b). Road marks usually show higher reflectance than road asphalt, so this study uses this characteristic to refine the road marks (Figure 1c) by a binary classification inside each initial road mark region.

Comparing the automatically extracted road marks (Figure 1c) with the manually digitized road marks (Figure 1d) shows that most road marks can be extracted using the training set from the ground-based system. This study also selected an area of 600 m × 200 m for quantitative analysis. Of the 371 reference road marks, 332 were extracted by the proposed scheme, a completeness of 89%. The preliminary experiment demonstrated that most road marks can be successfully extracted by the proposed scheme; therefore, training data from a ground-based mapping system can be utilized with airborne orthoimages of similar spatial resolution.
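A sketch of the reflectance-based refinement step only: pixels inside each FCN-predicted road-mark region are re-labelled by a binary intensity threshold, exploiting the fact that paint is brighter than asphalt. Otsu's method stands in for whatever binary classifier the study used, and the file names are placeholders:

```python
import numpy as np
from skimage import io, color, measure
from skimage.filters import threshold_otsu

ortho = color.rgb2gray(io.imread("orthoimage.tif"))    # 10 cm aerial orthoimage
fcn_mask = io.imread("fcn_road_marks.png") > 0         # initial FCN prediction

refined = np.zeros_like(fcn_mask)
for region in measure.regionprops(measure.label(fcn_mask)):
    rr, cc = region.coords[:, 0], region.coords[:, 1]
    pixels = ortho[rr, cc]
    if np.ptp(pixels) == 0:                # flat region: keep FCN's decision
        refined[rr, cc] = True
        continue
    t = threshold_otsu(pixels)             # split bright paint from asphalt
    refined[rr, cc] = pixels > t

io.imsave("refined_road_marks.png", refined.astype(np.uint8) * 255)
```

Thresholding per region rather than globally keeps the split robust to illumination differences across the orthoimage.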


2004 ◽  
Vol 10 (8) ◽  
pp. 1137-1150 ◽  
Author(s):  
V. Crupi ◽  
E. Guglielmino ◽  
G. Milazzo

The purpose of this research is the realization of a method for machine health monitoring. The rotating machinery of the Refinery of Milazzo (Italy) was analyzed. A new procedure incorporating neural networks was designed and implemented to evaluate the vibration signatures and recognize the presence of faults. Neural networks have replaced the traditional expert systems used in the past for fault diagnosis because they are dynamic systems and thus adaptable to continuously varying data. The disadvantage of common neural networks is that they need to be trained on real examples of the different fault typologies. The innovative aspect of the new procedure is that it allows us to diagnose faults that are not represented in the training set. This ability was demonstrated by our analysis: the net was able to detect the presence of imbalance and bearing wear even though these fault typologies were not present in the training data set.
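The paper does not specify its network architecture, but one generic way a network can flag fault types absent from its training set is reconstruction-error novelty detection: train an autoencoder on healthy-condition vibration spectra and treat a large reconstruction error as an unknown fault. A sketch under that assumption (file names and thresholds are illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

healthy = np.load("healthy_spectra.npy")     # rows: FFT magnitude spectra
ae = MLPRegressor(hidden_layer_sizes=(32, 8, 32), max_iter=2000)
ae.fit(healthy, healthy)                     # autoencoder: reconstruct inputs

def anomaly_score(x):
    # Mean squared reconstruction error of one vibration signature.
    return np.mean((ae.predict(x.reshape(1, -1)) - x) ** 2)

# Flag anything reconstructed worse than 99% of the healthy signatures.
threshold = np.percentile([anomaly_score(row) for row in healthy], 99)

new_signature = np.load("current_spectrum.npy")
if anomaly_score(new_signature) > threshold:
    print("Possible fault (e.g. imbalance or bearing wear) not seen in training")
```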


Geophysics ◽  
2021 ◽  
pp. 1-103
Author(s):  
Jiho Park ◽  
Jihun Choi ◽  
Soon Jee Seol ◽  
Joongmoo Byun ◽  
Young Kim

Deep learning (DL) methods have recently been introduced for seismic signal processing, and many researchers have adopted these novel techniques to construct DL models for seismic data reconstruction. The performance of DL-based methods depends heavily on what is learned from the training data, so we focus on constructing a DL model that well reflects the features of the target data sets. The main goal is to integrate DL with an intuitive data analysis approach that compares similar patterns prior to the DL training stage. We have developed a sequential method consisting of two stages: (i) analyzing the training and target data sets simultaneously to determine a target-informed training set and (ii) training the DL model with this training data set to effectively interpolate the seismic data. Here, we introduce the convolutional autoencoder t-distributed stochastic neighbor embedding (CAE t-SNE) analysis, which can provide insight into the interpolation results through analysis of both the training and target data sets prior to DL model training. The proposed method was tested with synthetic and field data. Dense seismic gathers (e.g. common-shot gathers, CSGs) were used as the labeled training data set, and relatively sparse seismic gathers (e.g. common-receiver gathers, CRGs) were reconstructed in both cases. The reconstructed results and signal-to-noise ratios (SNRs) demonstrated that the training data can be efficiently selected using CAE t-SNE analysis and that the spatial aliasing of the CRGs was successfully alleviated by the DL model trained with this training data, which contains the target features. These results imply that data analysis for selecting a target-informed training set is very important for successful DL interpolation. The proposed analysis method can also be applied to investigate the similarities between training and target data sets for other DL-based seismic data reconstruction tasks.
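A sketch of the CAE t-SNE idea: encode patches from both the training gathers and the target gathers with a convolutional autoencoder, then embed the latent codes with t-SNE to see which training patches fall near the target distribution. Patch shapes, file names, and the tiny network are assumptions; the paper's architecture is not reproduced:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.manifold import TSNE

class CAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1))
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

# Hypothetical (n, 1, 64, 64) windows cut from CSGs (training) and CRGs (target).
train_patches = torch.from_numpy(np.load("csg_patches.npy")).float()
target_patches = torch.from_numpy(np.load("crg_patches.npy")).float()

model = CAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):                          # reconstruction training loop
    out, _ = model(train_patches)
    loss = nn.functional.mse_loss(out, train_patches)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    z_all = torch.cat([model(p)[1].flatten(1)
                       for p in (train_patches, target_patches)])
embedded = TSNE(n_components=2).fit_transform(z_all.numpy())
# Training patches whose embeddings overlap the target patches share the
# target features and are candidates for the target-informed training set.
```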


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Bo Huang ◽  
Wei Tan ◽  
Zhou Li ◽  
Lei Jin

Abstract. Background: The association between time-lapse technology (TLT) and embryo ploidy status has not yet been fully understood. TLT produces large amounts of data and is non-invasive, so artificial intelligence (AI) technology is a good choice for accurately predicting embryo ploidy status from TLT data; however, current AI work in this field needs to be strengthened. Methods: A total of 469 preimplantation genetic testing (PGT) cycles and 1803 blastocysts from April 2018 to November 2019 were included in the study. All embryo images were captured by a time-lapse microscope system 5 or 6 days after fertilization, before biopsy. All euploid and aneuploid embryos were used as the data set, which was divided into training, validation, and test sets. The training set was used for model training, the validation set to tune the model’s hyperparameters and for preliminary evaluation, and the test set to evaluate the model’s generalization ability. For better verification, we used data outside the training data for external validation: a total of 155 PGT cycles from December 2019 to December 2020 and 523 blastocysts were included in the verification process. Results: The euploid prediction algorithm (EPA) was able to predict euploidy on the test set with an area under the curve (AUC) of 0.80. Conclusions: The TLT incubator has gradually become the choice of reproductive centers. Our AI model, EPA, predicts embryo ploidy well from TLT data. We hope that this system can serve all in vitro fertilization and embryo transfer (IVF-ET) patients in the future, allowing embryologists to have more non-invasive aids when selecting the best embryo to transfer.
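A sketch of the splitting and evaluation protocol the abstract describes, assuming the blastocyst images have already been reduced to feature vectors by some CNN backbone. The split ratios and the classifier are illustrative; only the reported test AUC of 0.80 comes from the paper:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X = np.load("blastocyst_features.npy")   # one row per embryo (2018-2019 cycles)
y = np.load("ploidy_labels.npy")         # 1 = euploid, 0 = aneuploid

# Hold out a test set, then carve a validation set out of the remainder.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, stratify=y)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation AUC:", roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))
print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))

# External verification on later cycles (2019-2020), never seen in training.
X_ext, y_ext = np.load("external_features.npy"), np.load("external_labels.npy")
print("external AUC:", roc_auc_score(y_ext, clf.predict_proba(X_ext)[:, 1]))
```

Drawing the external set from a later time period, as the paper does, tests generalization beyond what a random hold-out can show.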


Author(s):  
D. N. Javier ◽  
L. Kumar

Abstract. In a high-rainfall, landslide-prone tropical mountain region, a landslide database was constructed from high-resolution satellite imagery (HRSI), local reports, and field observations. The landslide data were divided into training (80%) and validation (20%) sets. From the digital elevation model (DEM), scanned maps, and HRSI, twelve landslide conditioning factors were derived and analysed in a GIS environment: elevation, slope angle, slope aspect, plan curvature, profile curvature, distance to drainage, soil type, lithology, distance to fault/lineament, land use/land cover, distance to road, and normalized difference vegetation index (NDVI). Landslide susceptibility was then estimated using the frequency ratio method as applied to the training data; the detailed procedure is explained herein. The landslide model generated was then evaluated using the validation data set. Results demonstrate that the very high, high, moderate, low, and very low susceptibility classes included an average of 86%, 7%, 4%, 3%, and 1% of the training cells, and 84%, 7%, 5%, 3%, and 1% of the validation cells, respectively. Success and prediction rates of 90% and 89%, respectively, were obtained. The output discriminated the landslide-prone areas well and may thus be used in landslide hazard mitigation for local planning.
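A sketch of the frequency ratio (FR) computation: the FR of a factor class is the share of landslide cells falling in that class divided by the share of all cells in that class, and summing each cell's FR values over the factors gives the susceptibility index. This is the standard FR recipe; the rasters below are hypothetical GIS exports aligned to one grid:

```python
import numpy as np

def frequency_ratio(factor, landslides):
    """factor: integer class raster; landslides: boolean training raster."""
    fr = {}
    total_cells = factor.size
    total_slides = landslides.sum()
    for cls in np.unique(factor):
        in_cls = factor == cls
        pct_slides = landslides[in_cls].sum() / total_slides
        pct_area = in_cls.sum() / total_cells
        fr[cls] = pct_slides / pct_area if pct_area > 0 else 0.0
    return fr

factors = [np.load(f) for f in ("slope_classes.npy", "lithology.npy",
                                "landuse.npy")]     # ... up to twelve rasters
train = np.load("landslide_training.npy").astype(bool)

susceptibility = np.zeros(factors[0].shape)
for factor in factors:
    fr = frequency_ratio(factor, train)
    susceptibility += np.vectorize(fr.get)(factor)   # look up FR per cell

# Classify `susceptibility` into very low ... very high (e.g. by quantiles)
# and validate against the held-out 20% of landslide cells.
```

An FR above 1 marks a class where landslides are over-represented relative to its area, which is why high summed scores concentrate in the landslide-prone cells.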


2021 ◽  
Vol 14 (2) ◽  
pp. 120-128
Author(s):  
Mohammed Ehsan Safi ◽  
Eyad I. Abbas

In personal image recognition algorithms, two factors govern the system’s evaluation: the recognition rate and the size of the database. Unfortunately, the recognition rate is proportional to the size of the training set, and larger training sets increase processing time and aggravate memory limitations. This paper’s main goal is to present a robust algorithm with a minimal data set and a high recognition rate. Images of ten persons were chosen as a database: nine images per individual as the full version of the training data set, and one image per person outside the training set as a test pattern for each database reduction step. The proposed algorithm integrates Principal Component Analysis (PCA) as a feature extraction technique with minimum cluster means and Euclidean distance to achieve personal recognition. After indexing the training set for each person, the differences are clustered, and the person is recognised by the index of the minimum cluster mean; this process is repeated after each reduction. The experimental results show that the recognition rate is 100% despite reducing the training set to 44%, while the recognition rate decreases to 70% when the reduction reaches 89%. Overall, the results of the proposed system support reducing the training set while maintaining a high recognition rate in line with the application requirements.
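A sketch of the recognition pipeline as plainly read from the abstract: project all images with PCA, represent each person by the mean of their training projections, and assign a test image to the person whose mean is nearest in Euclidean distance. Array shapes and file names are hypothetical:

```python
import numpy as np
from sklearn.decomposition import PCA

def train(images, labels, n_components=40):
    X = images.reshape(len(images), -1)          # flatten images to vectors
    pca = PCA(n_components=n_components).fit(X)
    Z = pca.transform(X)
    # One cluster mean per person in the reduced feature space.
    means = {p: Z[labels == p].mean(axis=0) for p in np.unique(labels)}
    return pca, means

def recognise(pca, means, image):
    z = pca.transform(image.reshape(1, -1))[0]
    return min(means, key=lambda p: np.linalg.norm(z - means[p]))

# The paper's setup: ten persons, nine training images each, one held-out test
# image per person; the training set is then pruned step by step to test how
# far it can shrink before the recognition rate drops.
images = np.load("train_images.npy")             # shape (90, h, w)
labels = np.load("train_labels.npy")             # person index per image
pca, means = train(images, labels)
```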

