Risk Assessment of Modern Pipelines

Author(s):  
James N. Mihell ◽  
Cameron Rout

Proponents of new pipeline projects are often asked by regulators to provide estimates of risk and reliability for their proposed pipeline. On existing pipelines, the availability of operating and assessment data is generally considered essential to performing an accurate and defensible risk or reliability assessment. For proposed or new pipelines, the absence of these data presents a significant challenge to those performing the analysis. Reliance on industry incident data is problematic, since the vast majority of loss-of-containment incidents relate to older pipelines whose design, routing criteria, material properties, material manufacturing processes, and early operating practices differ significantly from those characteristic of modern pipelines. As a consequence, much of the available failure incident data does not accurately reflect the threats, or the magnitudes of the threats, associated with modern pipelines. To address this problem, 'adjustment factors' are often applied to incident data to account for threat differences between the source data and the intended application. The selection of these adjustment factors, however, can be quite subjective and open to judgment, and therefore difficult to justify. With the rapidly growing practice of regular in-line inspection (ILI) on transmission pipelines, an extensive repository of ILI data has accumulated — much of it relating to modern pipelines. Through judicious selection of source data, ILI data sets can be mined to create an analogue data set that reasonably represents the reliability attributes of a specific new pipeline of interest. Key reliability properties, such as the tool error distribution, feature incidence rate, feature size distribution, and apparent feature growth rate distribution, can be derived from such analogue data. By applying these reliability properties in an analysis along with known pipeline design and material properties and their associated distributions, and by taking planned inspection intervals into consideration, a reliability basis can be derived for estimating pipeline risk and reliability. Estimates of risk and reliability derived in this manner employ methodologies that are repeatable, defensible, transparent, and free of subjectivity. This paper outlines an approach for completing risk and reliability estimates on new pipelines, and presents the results of some sample calculations. The reliability estimates illustrated are based on an approach whereby corrosion feature size and growth rates are obtained from analogue ILI data sets and treated as random variables. In that regard, they represent the probability of exceeding a limit state that approximates the condition for failure.
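
As a rough illustration of the final step, the sketch below estimates a probability of failure by Monte Carlo simulation, treating corrosion feature depth, growth rate, and ILI tool sizing error as random variables and testing against a simplified limit state (depth exceeding 80% of wall thickness). All distributions, parameter values, and the limit-state criterion are hypothetical placeholders, not the paper's calibrated inputs.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1_000_000  # Monte Carlo trials

# Hypothetical analogue-ILI-derived inputs (all values illustrative only):
wall = 7.1                                    # nominal wall thickness, mm
depth0 = rng.weibull(1.5, N) * 1.2            # reported feature depth, mm
tool_err = rng.normal(0.0, 0.4, N)            # ILI depth-sizing error, mm
growth = rng.lognormal(np.log(0.1), 0.5, N)   # apparent growth rate, mm/yr

years = 10  # evaluation horizon, e.g. the planned re-inspection interval
depth_t = depth0 + tool_err + growth * years

# Simplified limit state: failure when depth exceeds 80% of wall thickness.
pof = np.mean(depth_t > 0.8 * wall)
print(f"Estimated probability of exceeding the limit state: {pof:.2e}")
```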

2021 ◽  
Vol 4 (1) ◽  
pp. 251524592092800
Author(s):  
Erin M. Buchanan ◽  
Sarah E. Crain ◽  
Ari L. Cunningham ◽  
Hannah R. Johnson ◽  
Hannah Stash ◽  
...  

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.
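
By way of illustration, a data-dictionary entry can be as simple as one machine-readable record per variable. The sketch below writes such a file; the variable names and attribute fields are hypothetical, loosely modelled on common codebook fields rather than on any specific standard discussed in the Tutorial.

```python
import json

# Hypothetical two-variable data dictionary for a survey data set.
data_dictionary = {
    "rt_ms": {
        "description": "Response time per trial",
        "type": "numeric",
        "units": "milliseconds",
        "missing_code": "NA",
    },
    "condition": {
        "description": "Experimental condition assignment",
        "type": "categorical",
        "levels": {"1": "control", "2": "treatment"},
    },
}

# Save alongside the data set so repositories and search engines can index it.
with open("codebook.json", "w") as f:
    json.dump(data_dictionary, f, indent=2)
```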


Author(s):  
Antonia J. Jones ◽  
Dafydd Evans ◽  
Steve Margetts ◽  
Peter J. Durrant

The Gamma Test is a non-linear modelling analysis tool that allows us to quantify the extent to which a numerical input/output data set can be expressed as a smooth relationship. In essence, it allows us to efficiently calculate that part of the variance of the output that cannot be accounted for by the existence of any smooth model based on the inputs, even though this model is unknown. A key aspect of this tool is its speed: the Gamma Test has time complexity O(M log M), where M is the number of data points. For data sets consisting of a few thousand points and a reasonable number of attributes, a single run of the Gamma Test typically takes a few seconds. In this chapter we will show how the Gamma Test can be used in the construction of predictive models and classifiers for numerical data. In doing so, we will demonstrate the use of this technique for feature selection, and for the selection of the embedding dimension when dealing with a time series.
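
A minimal sketch of the Gamma Test, following the published algorithm: for each of the p nearest neighbours, compute the mean squared input distance δ_k and half the mean squared output difference γ_k, then extrapolate the regression line through the (δ_k, γ_k) pairs back to δ = 0; the intercept estimates the noise variance. The kd-tree query gives the O(M log M) behaviour noted above. The neighbour count p = 10 and the synthetic data are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def gamma_test(X, y, p=10):
    """Estimate the noise variance of y = f(X) + r for smooth unknown f."""
    tree = cKDTree(X)
    # Each point is its own nearest neighbour, so ask for p + 1 neighbours.
    dist, idx = tree.query(X, k=p + 1)
    deltas = np.mean(dist[:, 1:] ** 2, axis=0)                       # delta_k
    gammas = np.mean((y[idx[:, 1:]] - y[:, None]) ** 2, axis=0) / 2  # gamma_k
    slope, intercept = np.polyfit(deltas, gammas, 1)
    return intercept  # the Gamma statistic

# Synthetic check: smooth function plus noise of known variance 0.01.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(3000, 2))
y = np.sin(3 * X[:, 0]) * X[:, 1] + rng.normal(0, 0.1, 3000)
print(gamma_test(X, y))  # should come out close to 0.01
```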


1988 ◽  
Vol 254 (1) ◽  
pp. E104-E112
Author(s):  
B. Candas ◽  
J. Lalonde ◽  
M. Normand

The aim of this study is the selection of the number of compartments required for a model to represent the distribution and metabolism of corticotropin-releasing factor (CRF) in rats. The dynamics of labeled rat CRF were measured in plasma for seven rats after a rapid injection. The sampling schedule resulted from the combination of the two D-optimal sampling sets of times corresponding to the two rival models. This protocol improved the numerical identifiability of the parameters and consequently facilitated the selection of the relevant model. A three-compartment model fits the seven individual dynamics adequately and represents four of them better than the lower-order model. It was demonstrated, using simulations in which the measurement errors and the interindividual variability of the parameters are included, that this four-to-seven ratio of data sets is consistent with the relevance of the three-compartment model for every individual kinetic data set. Kinetic and metabolic parameters were then derived for each individual rat, their values being consistent with the prolonged effects of CRF on pituitary-adrenocortical secretion.
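
For readers unfamiliar with the mechanics: a compartmental model sampled in plasma appears as a sum of exponentials, so rival two- and three-compartment models can be fitted and compared with an information criterion. The sketch below does this with hypothetical time-activity values; it is not the study's data or its D-optimal design.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_exp(t, a1, l1, a2, l2):
    return a1 * np.exp(-l1 * t) + a2 * np.exp(-l2 * t)

def three_exp(t, a1, l1, a2, l2, a3, l3):
    return two_exp(t, a1, l1, a2, l2) + a3 * np.exp(-l3 * t)

def aic(y, yhat, k):
    # Least-squares AIC: n * log(RSS / n) + 2k
    n = len(y)
    return n * np.log(np.sum((y - yhat) ** 2) / n) + 2 * k

# Hypothetical plasma disappearance curve after a rapid injection.
t = np.array([1, 2, 4, 8, 15, 30, 60, 120], dtype=float)   # minutes
c = np.array([95, 70, 45, 25, 14, 7, 3, 1.2])              # relative activity

p2, _ = curve_fit(two_exp, t, c, p0=[60, 0.5, 30, 0.05], maxfev=10000)
p3, _ = curve_fit(three_exp, t, c, p0=[50, 1.0, 30, 0.1, 10, 0.01], maxfev=10000)
print("AIC, two compartments: ", aic(c, two_exp(t, *p2), 4))
print("AIC, three compartments:", aic(c, three_exp(t, *p3), 6))
```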


Author(s):  
B. Dukai ◽  
R. Peters ◽  
S. Vitalis ◽  
J. van Liempt ◽  
J. Stoter

Abstract. Fully automated reconstruction of high-detail building models on a national scale is challenging. It raises a set of problems that are seldom found when processing smaller areas, such as single cities. Often there is no reference (ground truth) available to evaluate the quality of the reconstructed models. Therefore, only relative quality metrics are computed, comparing the models to the source data sets. In this paper we present a set of relative quality metrics that we use for assessing the quality of 3D building models that were reconstructed in a fully automated process, at Levels of Detail 1.2, 1.3 and 2.2, for the whole of the Netherlands. The source data sets for the reconstruction are the Dutch Building and Address Register (BAG) and the National Height Model (AHN). The quality assessment is done by comparing the building models to these two data sources. The work presented in this paper lays the foundation for future research on the quality control and management of automated building reconstruction. Additionally, it serves as an important step in our ongoing effort towards a fully automated building reconstruction method for high-detail, high-quality models.
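
One concrete example of such a relative metric (hypothetical, and not necessarily one of the paper's own metrics) is the RMSE between the source height points falling within a building footprint and the roof elevation of the reconstructed block model:

```python
import numpy as np

def height_rmse(point_heights, roof_height):
    """RMSE between source height points (e.g., AHN points inside a
    footprint) and the roof elevation of the reconstructed LoD1.2 block."""
    return float(np.sqrt(np.mean((point_heights - roof_height) ** 2)))

# Hypothetical AHN heights over one footprint vs. its flat-roof model.
pts = np.array([10.1, 10.3, 9.9, 10.2, 10.0])   # metres above datum
print(height_rmse(pts, roof_height=10.15))
```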


2021 ◽  
Vol 79 (1) ◽  
Author(s):  
Romana Haneef ◽  
Sofiane Kab ◽  
Rok Hrzic ◽  
Sonsoles Fuentes ◽  
Sandrine Fosse-Edorh ◽  
...  

Abstract Background The use of machine learning techniques is increasing in healthcare, allowing health outcomes to be estimated and predicted from large administrative data sets more efficiently. The main objective of this study was to develop a generic machine learning (ML) algorithm to estimate the incidence of diabetes based on the number of reimbursements over the last 2 years. Methods We selected a final data set from a population-based epidemiological cohort (i.e., CONSTANCES) linked with the French National Health Database (i.e., SNDS). To develop this algorithm, we adopted a supervised ML approach. The following steps were performed: (i) selection of the final data set, (ii) target definition, (iii) coding of variables for a given window of time, (iv) splitting of the final data into training and test data sets, (v) variable selection, (vi) model training, (vii) validation of the model with the test data set, and (viii) selection of the model. We used the area under the receiver operating characteristic curve (AUC) to select the best algorithm. Results The final data set used to develop the algorithm included 44,659 participants from CONSTANCES. Of the 3468 variables from the SNDS linked to the CONSTANCES cohort that were coded, 23 were selected to train the different algorithms. The final algorithm to estimate the incidence of diabetes was a Linear Discriminant Analysis model based on the number of reimbursements of selected variables related to biological tests, drugs, medical acts and hospitalization without a procedure over the last 2 years. This algorithm has a sensitivity of 62%, a specificity of 67% and an accuracy of 67% [95% CI: 0.66–0.68]. Conclusions Supervised ML is an innovative tool for the development of new methods to exploit large health administrative databases. In the context of the InfAct project, we have developed and applied, for the first time, a generic ML algorithm to estimate the incidence of diabetes for public health surveillance. The ML algorithm we have developed has moderate performance. The next step is to apply this algorithm to the SNDS to estimate the incidence of type 2 diabetes cases. More research is needed to apply various ML techniques to estimate the incidence of various health conditions.
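
A minimal sketch of the final modelling step, assuming scikit-learn and synthetic stand-in data: train a Linear Discriminant Analysis classifier on reimbursement counts and score candidate models by AUC on a held-out test set. The feature construction here is invented for illustration; it is not the SNDS/CONSTANCES data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: reimbursement counts for 23 selected variables
# over the last 2 years, and an incident-diabetes label.
rng = np.random.default_rng(1)
X = rng.poisson(2.0, size=(5000, 23)).astype(float)
y = (X[:, :4].sum(axis=1) + rng.normal(0, 2, 5000) > 10).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
auc = roc_auc_score(y_test, lda.predict_proba(X_test)[:, 1])
print(f"Test AUC: {auc:.3f}")  # criterion used to select the best algorithm
```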


2019 ◽  
Vol 5 (10) ◽  
pp. 2120-2130 ◽  
Author(s):  
Suraj Kumar ◽  
Thendiyath Roshni ◽  
Dar Himayoun

A reliable method of rainfall-runoff modelling is a prerequisite for proper management and mitigation of extreme events such as floods. The objective of this paper is to contrast the hydrological performance of the Emotional Neural Network (ENN) and the Artificial Neural Network (ANN) for modelling rainfall-runoff in the Sone Command, Bihar, as this area experiences floods due to heavy rainfall. The ENN is a modified version of the ANN, as it includes neural parameters that enhance the network learning process. Selection of inputs is a crucial task for a rainfall-runoff model. This paper utilizes cross-correlation analysis for the selection of potential predictors. Three sets of input data (Set 1, Set 2 and Set 3) were prepared using weather and discharge data from 2 rain-gauge stations and 1 discharge station located in the command for the period 1986-2014. Principal Component Analysis (PCA) was then performed on the selected data sets to identify those showing the principal tendencies. The data sets obtained after PCA were then used in the development of the ENN and ANN models. Performance indices were computed for the developed models on the three data sets. The results obtained from Set 2 showed that the ENN, with R = 0.933, R² = 0.870, Nash–Sutcliffe efficiency = 0.8689, RMSE = 276.1359 and Relative Peak Error = 0.00879, outperforms the ANN in simulating the discharge. Therefore, the ENN model is suggested as the better model for rainfall-runoff discharge in the Sone Command, Bihar.
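
A brief sketch of the input-selection chain described above, under stated assumptions: pick each gauge's most informative lag by cross-correlation, assemble the lagged predictors, and reduce them with PCA. The rainfall and discharge series are synthetic stand-ins for the Sone Command data.

```python
import numpy as np
from sklearn.decomposition import PCA

def best_lag(x, q, max_lag=10):
    """Lag k (days) that maximizes |corr(x[t], q[t + k])|."""
    n = x.size
    corrs = [np.corrcoef(x[:n - k], q[k:])[0, 1] for k in range(max_lag + 1)]
    return int(np.argmax(np.abs(corrs)))

# Synthetic daily rainfall at two gauges and the resulting discharge.
rng = np.random.default_rng(2)
rain1, rain2 = rng.gamma(2, 5, 2000), rng.gamma(2, 4, 2000)
q = 0.6 * np.roll(rain1, 3) + 0.3 * np.roll(rain2, 2) + rng.normal(0, 1, 2000)

lags = best_lag(rain1, q), best_lag(rain2, q)      # expect (3, 2)
inputs = np.column_stack([np.roll(rain1, lags[0]),
                          np.roll(rain2, lags[1])])[10:]  # drop wrap-around

pca = PCA(n_components=0.95)   # keep components explaining 95% of variance
pcs = pca.fit_transform(inputs)
print(lags, pca.explained_variance_ratio_)  # pcs would then feed the ENN/ANN
```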


2012 ◽  
Vol 52 (No. 4) ◽  
pp. 188-196 ◽  
Author(s):  
Y. Lei ◽  
S. Y Zhang

Forest modellers have long faced the problem of selecting an appropriate mathematical model to describe tree ontogenetic or size-shape empirical relationships for tree species. A common practice is to develop many models (or a model pool) that include different functional forms, and then to select the most appropriate one for a given data set. However, this process may impose subjective restrictions on the functional form. In this process, little attention is paid to the features of the different functional forms (e.g. whether a form has an asymptote or an inflection point), or to the intrinsic curve shape of a given data set. In order to find a better way of comparing and selecting growth models, this paper describes and analyses the characteristics of the Schnute model. This model has a flexibility and versatility that have not been exploited in forestry. In this study, the Schnute model was applied to different data sets of selected forest species to determine their functional forms. The results indicate that the model shows some desirable properties for the examined data sets, and allows the different intrinsic curve shapes, such as sigmoid, concave and others, to be discerned. Since the suitable functional form for a given data set is usually not known prior to the comparison of candidate models, it is recommended that the Schnute model be used as a first step to determine an appropriate functional form for the data set under investigation, in order to avoid adopting a functional form a priori.
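
For reference, the Schnute model in its common case (a ≠ 0, b ≠ 0) is Y(t) = [y1^b + (y2^b − y1^b)(1 − e^{−a(t−τ1)})/(1 − e^{−a(τ2−τ1)})]^{1/b}, where y1 and y2 are the sizes at the reference ages τ1 and τ2, and the signs of a and b govern the curve shape. A short fitting sketch follows; the age-height data, reference ages and bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def schnute(t, a, b, y1, y2, t1=5.0, t2=50.0):
    """Schnute growth model, case a != 0, b != 0."""
    r = (1 - np.exp(-a * (t - t1))) / (1 - np.exp(-a * (t2 - t1)))
    return (y1**b + (y2**b - y1**b) * r) ** (1.0 / b)

# Hypothetical age (years) and height (m) pairs for one species.
age = np.array([5, 10, 20, 30, 40, 50], dtype=float)
ht = np.array([2.1, 6.5, 14.8, 20.9, 24.6, 26.8])

popt, _ = curve_fit(schnute, age, ht, p0=[0.05, 1.0, 2.0, 27.0],
                    bounds=([1e-4, 0.1, 0.1, 1.0], [1.0, 3.0, 10.0, 60.0]))
a, b = popt[:2]
print(f"a = {a:.3f}, b = {b:.3f}")  # signs of a, b indicate the curve shape
```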


2001 ◽  
Vol 57 (4) ◽  
pp. 497-506 ◽  
Author(s):  
A. T. H. Lenstra ◽  
O. N. Kataeva

The crystal structures of the title compounds were determined with net intensities I derived via the background–peak–background procedure. Least-squares optimizations reveal differences between the low-order (0 < s < 0.7 Å⁻¹) and high-order (0.7 < s < 1.0 Å⁻¹) structure models. The scale factors indicate discrepancies of up to 10% between the low-order and high-order reflection intensities. This observation is compound independent. It reflects the scan-angle-induced truncation error, because the applied scan angle (0.8 + 2.0 tan θ)° underestimates the wavelength dispersion in the monochromated X-ray beam. The observed crystal structures show pseudo-I-centred sublattices for three of the non-H atoms in the asymmetric unit. Our selection of observed intensities (I > 3σ) stresses that pseudo-symmetry. Model refinements on individual data sets with (h + k + l) = 2n and (h + k + l) = 2n + 1 illustrate the lack of model robustness caused by that pseudo-symmetry. To obtain a better balanced data set, and thus a more robust structure, we decided to exploit background modelling. We described the background intensities B(H) with an 11th-degree polynomial in θ. This function predicts the local background b at each reflection position H and defines the counting-statistical distribution P(B), in which b serves as both average and variance. The observation R defines P(R). This leads to P(I) = P(R)/P(B), and thus I = R − b and σ²(I) = I, so that the error σ(I) is background independent. Within this framework we reanalysed the structure of the copper(II) derivative. Background modelling resulted in a structure model with improved internal consistency. At the same time, the unweighted R value based on all observations decreased from 10.6 to 8.4%. A redetermination of the structure at 120 K concluded the analysis.
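
A schematic of the background-modelling step, assuming synthetic counts: fit an 11th-degree polynomial in θ to the measured local backgrounds, take the predicted value b as the background at each reflection, and form I = R − b with σ²(I) = I as described above.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = np.sort(rng.uniform(2.0, 30.0, 4000))        # degrees (synthetic)
true_bg = 50 + 2.0 * theta - 0.03 * theta**2         # smooth background trend
bg = rng.poisson(true_bg).astype(float)              # measured backgrounds

# Fit the 11th-degree polynomial background model B(theta).
coeffs = np.polynomial.polynomial.polyfit(theta, bg, deg=11)
b = np.polynomial.polynomial.polyval(theta, coeffs)  # modelled background b

R = bg + 120.0            # stand-in for the raw peak-scan observations
I = R - b                 # net intensity from the modelled background
sigma_I = np.sqrt(np.abs(I))  # sigma^2(I) = I, background independent
print(I[:5], sigma_I[:5])
```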


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
J. Zyprych-Walczak ◽  
A. Szabelska ◽  
L. Handschuh ◽  
K. Górczak ◽  
K. Klamecka ◽  
...  

High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. The massive and complex data sets produced by the sequencers create a need for the development of statistical and computational methods that can tackle the analysis and management of these data. Data normalization is one of the most crucial steps of data processing, and it must be carefully considered, as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, the sensitivity and specificity of the methods, and classification errors, as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably.
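
To make the depth-normalization step concrete, the sketch below implements one widely used method of this family, the DESeq median-of-ratios size factors; it is offered as an example of the kind of method compared, not as the paper's specific pipeline, and the counts are synthetic.

```python
import numpy as np

def size_factors(counts):
    """DESeq-style median-of-ratios size factors.
    counts: genes x samples matrix of raw read counts."""
    usable = np.all(counts > 0, axis=1)       # genes present in every sample
    logc = np.log(counts[usable])
    log_geo_mean = logc.mean(axis=1, keepdims=True)   # per-gene reference
    return np.exp(np.median(logc - log_geo_mean, axis=0))

# Synthetic counts: 2000 genes, 4 samples with different sequencing depths.
rng = np.random.default_rng(4)
base = rng.lognormal(5, 1, size=(2000, 1))
depth = np.array([1.0, 1.5, 0.7, 2.0])
counts = rng.poisson(base * depth)
print(size_factors(counts))  # recovers the depths up to a common scale
```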


Author(s):  
N. Ram Mohan ◽  
N. Praveen Kumar

Analyzing cyber incident data sets is an important method for deepening our understanding of the evolution of the threat situation. This is a relatively new research topic, and many studies remain to be done. In this paper, we report a statistical analysis of a breach incident data set covering 12 years (2005–2017) of cyber hacking activities, including malware attacks. We show that, in contrast to the findings reported in the literature, both hacking breach incident inter-arrival times and breach sizes should be modeled by stochastic processes rather than by distributions, because they exhibit autocorrelations. We then propose particular stochastic process models to fit, respectively, the inter-arrival times and the breach sizes, and show that these models can predict the inter-arrival times and the breach sizes. In order to gain deeper insights into the evolution of hacking breach incidents, we conduct both qualitative and quantitative trend analyses on the data set. We draw a set of cyber security insights, including that the threat of cyber hacks is indeed getting worse in terms of their frequency, but not in terms of the magnitude of their damage.
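
The autocorrelation claim is the kind of property one can check directly. As a sketch, assuming statsmodels and synthetic autocorrelated inter-arrival times, a Ljung–Box test rejects the no-autocorrelation hypothesis that a plain i.i.d.-distribution model would require:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# Synthetic AR(1)-style inter-arrival times (positive and autocorrelated).
rng = np.random.default_rng(5)
eps = rng.exponential(1.0, 500)
interarrival = np.empty(500)
interarrival[0] = eps[0]
for t in range(1, 500):
    interarrival[t] = 0.6 * interarrival[t - 1] + eps[t]

# Small p-values reject "no autocorrelation", favouring a stochastic
# process model over fitting a distribution to supposedly i.i.d. data.
print(acorr_ljungbox(interarrival, lags=[10]))
```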

