Statistical Analysis of Maximally Similar Sets in Ecological Research

Mathematics ◽  
2018 ◽  
Vol 6 (12) ◽  
pp. 317
Author(s):  
David W. Roberts

Maximally similar sets (MSSs) are sets of elements that share a neighborhood in a high-dimensional space defined by a symmetric, reflexive similarity relation. Each element of the universe is employed as the kernel of a neighborhood of a given size (number of members), and elements are added to the neighborhood in order of similarity to the current members of the set until the desired neighborhood size is achieved. The set of neighborhoods is then reduced to the set of unique, maximally similar sets by eliminating all sets that are permutations of an existing set. Subsequently, the within-MSS variability of candidate explanatory variables associated with the elements is compared to that of random sets of the same size to estimate the probability of obtaining variability as low as observed. Explanatory variables can be compared for effect size by the ratio of within-MSS variability to random set variability, correcting for statistical power as necessary. The analyses identify constraints, as opposed to determinants, in the triangular distribution of pair-wise element similarity. In the example given here, the variability in spring temperature, summer temperature, and growing degree days of forest vegetation sample units shows the greatest constraint on forest composition among a large set of candidate environmental variables.
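
The neighborhood-growing and permutation-test procedure described above can be sketched in a few lines. This is a minimal illustration with a made-up similarity matrix, not the author's implementation; the "similarity to current members" criterion is taken here as mean similarity, which the abstract does not pin down:

```python
import random

import numpy as np

def maximally_similar_sets(S, size):
    """Grow one neighborhood per kernel element, then drop duplicate sets.

    S: symmetric (n x n) similarity matrix; size: target neighborhood size.
    """
    n = S.shape[0]
    sets = set()
    for kernel in range(n):
        members = [kernel]
        while len(members) < size:
            # add the outside element most similar, on average, to current members
            outside = [j for j in range(n) if j not in members]
            best = max(outside, key=lambda j: S[j, members].mean())
            members.append(best)
        sets.add(frozenset(members))  # permutations of a set collapse to one
    return [sorted(s) for s in sets]

def within_set_p_value(sets, attribute, n_random=999, seed=0):
    """Probability of random sets showing variability as low as observed."""
    rng = random.Random(seed)
    n, size = len(attribute), len(sets[0])
    observed = np.mean([np.var([attribute[i] for i in s]) for s in sets])
    null = [np.mean([np.var([attribute[i] for i in rng.sample(range(n), size)])
                     for _ in sets]) for _ in range(n_random)]
    return (1 + sum(v <= observed for v in null)) / (n_random + 1)

# toy example: 8 elements whose similarity follows a 1-D gradient
x = np.array([0.0, 0.1, 0.2, 0.9, 1.0, 1.1, 2.0, 2.1])
S = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))
mss = maximally_similar_sets(S, size=3)
p = within_set_p_value(mss, x)
```

A low `p` would indicate that the attribute is strongly constrained within maximally similar sets, in the sense used by the paper.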


2018 ◽  
Vol 60 (2) ◽  
pp. 681-700 ◽  
Author(s):  
Androniki Katarachia ◽  
Electra Pitoska ◽  
Grigoris Giannarakis ◽  
Elpida Poutoglidou

Purpose: Based on agency theory, the purpose of this paper is to investigate the determinants of the dissemination level of corporate governance disclosure (CGD).

Design/methodology/approach: The sample of the study comprises companies listed in the Nifty 500 Index for the period 2009-2014. The Governance Disclosure Score calculated by Bloomberg is used as a proxy for the dissemination level of corporate governance information. In total, eight explanatory variables are used, namely board size, number of board meetings, CEO duality, presence of women on the board, company size, financial performance, Tobin's Q ratio and financial leverage.

Findings: The results of the study suggest a need for improvement in CGD by Indian companies, as they fail to comply with the majority of the proposed disclosure items. Furthermore, the number of board directors, the value of the company, financial leverage and the presence of women on the board negatively affect the dissemination level of corporate governance information, while company size is the only determinant that positively affects the extent of CGD.

Practical implications: The results are valuable because they reveal the attributes that determine which companies need less or extra monitoring by shareholders and investors regarding the applied corporate governance practices. In addition, the study can be valuable to policy makers responsible for regulating companies' accountability in relation to corporate governance practices.

Originality/value: The study extends previous studies by incorporating, for the first time in the Indian context, Bloomberg's rating approach to the dissemination level of CGD.


1989 ◽  
Vol 46 (12) ◽  
pp. 2157-2165 ◽  
Author(s):  
Steven P. Ferraro ◽  
Faith A. Cole ◽  
Waldemar A. DeBen ◽  
Richard C. Swartz

Power-cost efficiency (PCE_i = (n × c)_min/(n_i × c_i), where i = sampling scheme, n = minimum number of replicate samples needed to detect a difference between locations with an acceptable probability of Type I (α) and Type II (β) error (e.g. α = β = 0.05), c = mean "cost," in time or money, per replicate sample, and (n × c)_min = minimum value of (n × c) among the i sampling schemes) is the appropriate expression for comparing the cost efficiency of alternative sampling schemes having equivalent statistical rigor when the statistical model is a redistribution for comparisons of two means. PCEs were determined for eight macrobenthic sampling schemes (four sample unit sizes and two sieve mesh sizes) in a comparison of a reference site versus a putative polluted site in Puget Sound, Washington. Laboratory processing times were, on average, about 2.5 times greater for the [Formula: see text]- than the [Formula: see text] samples. The 0.06-m2, 0- to 8-cm-deep sample unit size and 1.0-mm sieve mesh size was the overall optimum sampling scheme in this study; it ranked first in PCE on 8 and second on 3 of 11 measures of community structure. Rank order by statistical power of the 11 measures for this scheme was Infaunal Index > log10 (mollusc biomass + 1) > number of species > log10 (numerical abundance) > log10 (polychaete biomass + 1) > log10 (total biomass + 1) > log10 (crustacean biomass + 1) > McIntosh's index > 1 – Simpson's Index > Shannon's Index > Dominance Index.
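
The PCE expression can be computed directly from the definition above. The replicate counts and per-sample costs below are made up for illustration and are not the values reported in the study:

```python
def power_cost_efficiency(schemes):
    """schemes: {name: (n, c)} where n = minimum replicates needed for the
    chosen alpha and beta, and c = mean cost per replicate sample.
    Returns {name: PCE_i = (n*c)_min / (n_i * c_i)}; the optimum scheme scores 1.
    """
    nc_min = min(n * c for n, c in schemes.values())
    return {name: nc_min / (n * c) for name, (n, c) in schemes.items()}

# hypothetical schemes: (replicates needed, minutes of processing per replicate)
schemes = {"0.06-m2, 1.0-mm": (5, 30.0),
           "0.06-m2, 0.5-mm": (4, 75.0),
           "0.01-m2, 1.0-mm": (12, 20.0)}
pce = power_cost_efficiency(schemes)
best = max(pce, key=pce.get)  # scheme with the lowest total cost n*c
```

With these made-up numbers the 1.0-mm mesh, 0.06-m2 scheme has the lowest total cost (5 × 30 = 150 minutes) and therefore PCE = 1, mirroring how the study ranks schemes of equivalent statistical rigor.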


2020 ◽  
Author(s):  
Jessica Dafflon ◽  
Pedro F. Da Costa ◽  
František Váša ◽  
Ricardo Pio Monti ◽  
Danilo Bzdok ◽  
...  

Abstract For most neuroimaging questions, the huge range of possible analytic choices means that conclusions from any single analytic approach may be misleading. Examples of possible choices include the motion-regression approach used and the smoothing and threshold factors applied during the processing pipeline. Although it is possible to perform a multiverse analysis that evaluates all possible analytic choices, this can be computationally challenging, and repeated sequential analyses on the same data can compromise inferential and predictive power. Here, we establish how active learning on a low-dimensional space that captures the inter-relationships between analysis approaches can be used to efficiently approximate the whole multiverse of analyses. This approach balances the benefits of a multiverse analysis without the accompanying cost to statistical power, computational power and the integrity of inferences. We illustrate this approach with a functional MRI dataset of functional connectivity across adolescence, demonstrating how a multiverse of graph-theoretic and simple pre-processing steps can be efficiently navigated using active learning. Our study shows how this approach can identify the subset of analysis techniques (i.e., pipelines) that best predict participants' ages, as well as allowing the performance of different approaches to be quantified.
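
The core idea, evaluating only the most informative points of a low-dimensional pipeline embedding, can be sketched with a simple uncertainty proxy. This is not the authors' method: the embedding, the score surface, and the distance-based uncertainty (a stand-in for a proper surrogate model such as a Gaussian process) are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up 2-D embedding of 200 candidate pipelines and a hidden score surface
emb = rng.uniform(-1, 1, size=(200, 2))
true_score = np.exp(-((emb[:, 0] - 0.3) ** 2 + (emb[:, 1] + 0.2) ** 2))

evaluated = [0]                # start from one arbitrary pipeline
for _ in range(19):            # 19 more evaluations instead of all 200
    d = np.linalg.norm(emb[:, None, :] - emb[None, evaluated, :], axis=-1)
    uncertainty = d.min(axis=1)            # distance to nearest evaluated pipeline
    evaluated.append(int(uncertainty.argmax()))  # query where we know least

# predict every pipeline's score from its nearest evaluated neighbour
d = np.linalg.norm(emb[:, None, :] - emb[None, evaluated, :], axis=-1)
pred = true_score[np.array(evaluated)][d.argmin(axis=1)]
err = np.abs(pred - true_score).mean()
```

The loop approximates the whole "multiverse" of 200 pipelines from 20 evaluations, which is the trade-off the abstract describes between coverage and computational cost.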


Author(s):  
Mehdi Tale Masouleh ◽  
Manfred Husty ◽  
Clément Gosselin

In this paper, a general methodology is introduced to formulate the forward kinematic problem (FKP) of symmetrical parallel mechanisms in a 7-dimensional projective space by means of the so-called Study parameters. The main objective is to treat rigid-body displacements, and consequently the FKP, using algebraic geometry, rather than relying on classical recipes, such as Euler angles, to assist in problem solving. The approach presented in this paper is general and can be extended to other types of symmetrical mechanisms. In this paper, we limit the concept of kinematic mapping to topologically symmetrical mechanisms, i.e., mechanisms with limbs having identical kinematic arrangements. Exploring the FKP in a higher-dimensional space is more challenging, since it requires the use of a larger number of coordinates. There are, however, advantages in adopting a large set of coordinates, since this approach leads to expressions of lower degree that do not involve trigonometric functions.
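
Study's kinematic mapping sends a rigid displacement to a point (x0 : x1 : x2 : x3 : y0 : y1 : y2 : y3) of 7-dimensional projective space that lies on the Study quadric x0·y0 + x1·y1 + x2·y2 + x3·y3 = 0. A minimal numeric check, with an arbitrary rotation and translation (the dual part is (1/2)·t·x in quaternion algebra):

```python
import math

def quat_mul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def study_parameters(axis, angle, t):
    """(x0..x3, y0..y3) for a rotation (unit axis, angle) followed by translation t."""
    s = math.sin(angle / 2)
    x = (math.cos(angle / 2), s * axis[0], s * axis[1], s * axis[2])
    # dual part: one half of the (pure) translation quaternion times x
    y = tuple(0.5 * c for c in quat_mul((0.0, *t), x))
    return x, y

x, y = study_parameters(axis=(0.0, 0.0, 1.0), angle=1.2, t=(3.0, -1.0, 2.0))
quadric = sum(xi * yi for xi, yi in zip(x, y))  # vanishes on the Study quadric
```

Because the translation quaternion is pure, x0·y0 + ... + x3·y3 vanishes identically, which is why every rigid displacement maps onto the quadric.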


Author(s):  
Min Joong Jeong ◽  
Brian H. Dennis ◽  
Shinobu Yoshimura

Data clustering methods can be a useful tool for engineering design that is based on numerical optimization. The clustering method is an effective way of producing representative designs, or clusters, from a large set of potential designs. These methods have recently been applied to the clustering of Pareto-optimal solutions from multi-objective optimization. The results presented here focus on the application of clustering to single objective optimization results. In the case of single objective optimization, the method is used to determine the clusters in a set of quasi-optimal feasible solutions generated by an optimizer. A data clustering procedure based on an evolutionary method is briefly described. The number of clusters is determined automatically and need not be known a priori. The method is demonstrated by application to the results of a turbine blade coolant passage shape optimization problem. The solutions are transformed to a lower-dimensional space for better understanding of their variance and character. Engineering information, such as the shapes and locations of the internal passages, is supported by the visualization of clustered solutions. The clustering, transformation, and visualization methods presented in this study might be applicable to the increasing interpretation demands of design optimization.
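
The pipeline the authors describe, clustering a set of quasi-optimal designs and then projecting them to a lower-dimensional space for inspection, can be sketched with ordinary k-means and PCA. These are generic stand-ins for the paper's evolutionary clustering method, and the design vectors are made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# made-up stand-in for quasi-optimal designs: 5-D design vectors in 3 groups
designs = np.vstack([rng.normal(loc=m, scale=0.1, size=(30, 5))
                     for m in (0.0, 1.0, 2.0)])

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns cluster labels and centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centres[None], axis=-1).argmin(axis=1)
        centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return labels, centres

labels, centres = kmeans(designs, k=3)

# project to 2-D with PCA (SVD of the centred data) for visual inspection
centred = designs - designs.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
projected = centred @ vt[:2].T  # (90, 2) coordinates suitable for plotting
```

Note that in the paper the number of clusters is found automatically; here `k` is fixed purely to keep the sketch short.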


2020 ◽  
Author(s):  
Rama Raghunandan ◽  
Bryan T Mayer ◽  
Yevel Flores-Garcia ◽  
Monica W Gerber ◽  
Raphael Gottardo ◽  
...  

Abstract Background: New strategies are needed to reduce the incidence of malaria, and promising approaches include the development of vaccines and monoclonal antibodies (mAbs) that target the circumsporozoite protein (CSP). To select the best candidates and speed development, it is essential to standardize preclinical assays to measure the potency of such interventions in animal models.

Methods: Two assay configurations were studied using transgenic Plasmodium berghei expressing Plasmodium falciparum full-length circumsporozoite protein. The assays measured (1) reduction in parasite infection of the liver (liver burden) following an intravenous (i.v.) administration of sporozoites and (2) protection from parasitaemia following mosquito-bite challenge. Two human CSP mAbs, AB311 and AB317, were compared for their ability to inhibit infection. Multiple independent experiments were conducted to define assay variability and the resultant impact on the ability to discriminate differences in mAb functional activity.

Results: Overall, the assays produced highly consistent results: all individual experiments showed greater functional activity for AB317 than for AB311, as measured by the dose required for 50% inhibition (ID50) and by the serum concentration required for 50% inhibition (IC50). The data were then used to model experimental designs with adequate statistical power to rigorously screen, compare, and rank order novel anti-CSP mAbs.

Conclusion: The results indicate that the in vivo assays described here can provide reliable information for comparing the functional activity of mAbs. The results also provide guidance regarding selection of the appropriate experimental design, dose selection, and group sizes.
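
An ID50 of the kind used to rank the two mAbs can be estimated from a dose-inhibition curve. The sketch below uses simple interpolation on log dose rather than the authors' statistical model, and the dose-response values for the two hypothetical antibodies are invented:

```python
import math

def id50(doses, inhibition):
    """Dose giving 50% inhibition, by linear interpolation on log10(dose).

    doses must be increasing; inhibition (%) is assumed monotone increasing.
    """
    for (d0, y0), (d1, y1) in zip(zip(doses, inhibition),
                                  zip(doses[1:], inhibition[1:])):
        if y0 <= 50.0 <= y1:
            f = (50.0 - y0) / (y1 - y0)
            return 10 ** (math.log10(d0) + f * (math.log10(d1) - math.log10(d0)))
    raise ValueError("50% inhibition not bracketed by the data")

# made-up liver-burden inhibition curves for two hypothetical mAbs
doses = [10, 30, 100, 300, 1000]   # dose units are illustrative only
mab_a = [5, 20, 45, 70, 90]        # weaker: crosses 50% between 100 and 300
mab_b = [15, 40, 65, 85, 95]       # stronger: crosses 50% between 30 and 100
```

A lower ID50 means less antibody is needed for the same effect, which is the sense in which AB317 outperformed AB311 in the study.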


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Tiberiu Tesileanu ◽  
Mary M Conte ◽  
John J Briguglio ◽  
Ann M Hermundstad ◽  
Jonathan D Victor ◽  
...  

Previously, in Hermundstad et al., 2014, we showed that when sampling is limiting, the efficient coding principle leads to a ‘variance is salience’ hypothesis, and that this hypothesis accounts for visual sensitivity to binary image statistics. Here, using extensive new psychophysical data and image analysis, we show that this hypothesis accounts for visual sensitivity to a large set of grayscale image statistics at a striking level of detail, and also identify the limits of the prediction. We define a 66-dimensional space of local grayscale light-intensity correlations, and measure the relevance of each direction to natural scenes. The ‘variance is salience’ hypothesis predicts that two-point correlations are most salient, and predicts their relative salience. We tested these predictions in a texture-segregation task using unnatural, synthetic textures. As predicted, correlations beyond second order are not salient, and predicted thresholds for over 300 second-order correlations match psychophysical thresholds closely (median fractional error <0.13).
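
The ‘variance is salience’ logic can be illustrated on toy data: measure each image statistic across many patches, and rank statistics by their variance, the prediction being that higher-variance directions are more perceptually salient. The patches below are crude synthetic stand-ins for natural scenes, and only three binary two-point statistics are shown, not the paper's 66-dimensional grayscale space:

```python
import numpy as np

rng = np.random.default_rng(2)

def binary_patches(n=500, size=16):
    """Crude stand-in for natural binary patches: smoothed noise, thresholded."""
    noise = rng.normal(size=(n, size, size))
    smooth = noise
    for shift in (1, -1):  # cheap isotropic smoothing via shifted copies
        smooth = smooth + np.roll(noise, shift, axis=1) + np.roll(noise, shift, axis=2)
    return np.where(smooth > 0, 1.0, -1.0)  # pixels coded as +/-1

def two_point(patches, dy, dx):
    """Mean two-point correlation at offset (dy, dx), one value per patch."""
    shifted = np.roll(patches, (dy, dx), axis=(1, 2))
    return (patches * shifted).mean(axis=(1, 2))

patches = binary_patches()
stats = {(0, 1): two_point(patches, 0, 1),   # horizontal neighbours
         (1, 0): two_point(patches, 1, 0),   # vertical neighbours
         (1, 1): two_point(patches, 1, 1)}   # diagonal neighbours

# higher variance across patches -> predicted to be more salient
salience_rank = sorted(stats, key=lambda k: np.var(stats[k]), reverse=True)
```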


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2744
Author(s):  
Beata Palczynska ◽  
Romuald Masnicki ◽  
Janusz Mindykowski

The contribution of this paper is to show the opportunities for using the compressive sensing (CS) technique for detecting harmonics in a frequency sparse signal. The signal in a ship’s electrical network, polluted by harmonic distortions, can be modeled as a superposition of a small number of sinusoids and the discrete Fourier transform (DFT) basis forms its sparse domain. According to the theory of CS, a signal may be reconstructed from under-sampled incoherent linear measurements. This paper highlights the use of the discrete Radon transform (DRT) techniques in the CS scheme. In the reconstruction algorithm section, a fast algorithm based on the inverse DRT is presented, in which a few randomly sampled projections of the input signal are used to correctly reconstruct the original signal. However, DRT requires a very large set of measurements that can defeat the purpose of compressive data acquisition. To acquire the wideband data below the Nyquist frequency, the K-rank-order filter is applied in the sparse transform domain to extract the most significant components and accelerate the convergence of the solution. While most CS research efforts focus on random Gaussian measurements, the Bernoulli matrix with different values of the probability of ones is applied in the presented algorithm. Preliminary results of numerical simulation confirm the effectiveness of the algorithm used, but also indicate its limitations. A significant advantage of the proposed approach is the speed of analysis, which uses fast Fourier transform (FFT) and inverse FFT (IFFT) algorithms widely available in programming environments. Moreover, the data processing algorithm is quite simple, and therefore memory usage and burden of the data processing load are relatively low.
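
The K-rank-order step, keeping only the largest components in the sparse (DFT) domain, can be shown in isolation. This sketch skips the DRT projections and compressive sampling entirely and uses a made-up distorted waveform; it illustrates only the rank-order filtering and the FFT/IFFT machinery the paper relies on:

```python
import numpy as np

fs, n = 1000.0, 1000                       # 1 kHz sampling, 1 s record
t = np.arange(n) / fs

# fundamental plus two harmonics, as in a distorted ship's-network waveform
signal = (1.0 * np.sin(2 * np.pi * 50 * t)
          + 0.3 * np.sin(2 * np.pi * 150 * t)
          + 0.1 * np.sin(2 * np.pi * 250 * t))
noisy = signal + np.random.default_rng(3).normal(scale=0.05, size=n)

def k_rank_order_filter(x, k):
    """Keep only the k largest-magnitude sinusoids (2 FFT bins each)."""
    spectrum = np.fft.fft(x)
    keep = np.argsort(np.abs(spectrum))[-2 * k:]  # each real sinusoid = 2 bins
    filtered = np.zeros_like(spectrum)
    filtered[keep] = spectrum[keep]
    return np.fft.ifft(filtered).real

recovered = k_rank_order_filter(noisy, k=3)
# harmonic frequencies detected from the positive-frequency half of the spectrum
detected = np.argsort(np.abs(np.fft.fft(noisy))[: n // 2])[-3:] * fs / n
```

With a 1-second record the harmonics fall exactly on FFT bins, so the three retained components land on 50, 150 and 250 Hz; off-bin frequencies would leak across bins and need more care.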


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Márcio Freire Cruz ◽  
Naoaki Ono ◽  
Ming Huang ◽  
Md. Altaf-Ul-Amin ◽  
Shigehiko Kanaya ◽  
...  

Abstract Background: Sepsis is a severe illness that affects millions of people worldwide, and its early detection is critical for effective treatment outcomes. In recent years, researchers have used models to classify positive patients or to estimate the probability of sepsis using vital signs and other time-series variables as input.

Methods: In our study, we analyzed patients' conditions through their kinematics, position, velocity, and acceleration, in a six-dimensional space defined by six vital signs. The patient is considered affected by the disease after a period if the position gets "near" a calculated sepsis position in this space. We supplied these kinematic features as explanatory variables to long short-term memory (LSTM), convolutional neural network (CNN), and linear neural network (LNN) models and compared the prediction accuracies with those obtained using only the vital signs as input. The dataset used contained information on approximately 4800 patients, each with 48 hourly records.

Results: We demonstrated that the kinematic-feature models outperformed the vital-sign models. The kinematic-feature LSTM model achieved the best accuracy, 0.803, nine percentage points higher than the corresponding vital-sign model. Although less accurate, the kinematic-feature CNN and LNN models also performed better than their vital-sign counterparts.

Conclusion: Applying our approach for early detection of sepsis using neural networks proved more accurate than considering only simple vital signs as input variables. We expect that other researchers with similar objectives can use the model presented in this approach to improve their results.
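
The kinematic features can be derived from the hourly vital-sign records by finite differences. The vital-sign trajectory, baseline values, and "sepsis position" below are all invented for illustration; the paper computes the sepsis position from its own data:

```python
import numpy as np

rng = np.random.default_rng(4)

# made-up hourly records: 48 hours x 6 vital signs for one patient
# (HR, RR, temperature, SBP, DBP, SpO2 as an illustrative ordering)
baseline = np.array([80.0, 16.0, 37.0, 120.0, 70.0, 97.0])
vitals = np.cumsum(rng.normal(scale=0.1, size=(48, 6)), axis=0) + baseline

position = vitals                        # point in the 6-D vital-sign space
velocity = np.diff(position, axis=0)     # hourly change
acceleration = np.diff(velocity, axis=0) # change of the change

# pad so all three have 48 rows, then stack into an 18-feature input matrix
velocity = np.vstack([np.zeros((1, 6)), velocity])
acceleration = np.vstack([np.zeros((2, 6)), acceleration])
features = np.hstack([position, velocity, acceleration])  # (48, 18)

# distance to a hypothetical 'sepsis position' in the same space
sepsis_position = np.array([120.0, 28.0, 39.0, 90.0, 50.0, 90.0])
distance = np.linalg.norm(position - sepsis_position, axis=1)
```

A matrix like `features` is the kind of input the abstract describes feeding to the LSTM, CNN, and LNN models in place of the raw vital signs alone.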

