The use of MSR (Minimum Sample Richness) for sample assemblage comparisons

Paleobiology ◽  
2011 ◽  
Vol 37 (4) ◽  
pp. 696-709 ◽  
Author(s):  
Kenny J. Travouillon ◽  
Gilles Escarguel ◽  
Serge Legendre ◽  
Michael Archer ◽  
Suzanne J. Hand

Minimum Sample Richness (MSR) is defined as the smallest number of taxa that must be recorded in a sample to achieve a given level of inter-assemblage classification accuracy. MSR is calculated from known or estimated richness and taxonomic similarity. Here we test MSR for strengths and weaknesses by using 167 published mammalian local faunas from the Paleogene and early Neogene of the Quercy and Limagne area (Massif Central, southwestern France), and then apply MSR to 84 Oligo-Miocene faunas from Riversleigh, northwestern Queensland, Australia. In many cases, MSR is able to detect the assemblages in the data set that are potentially too incomplete to be used in a similarity-based comparative taxonomic analysis. The results show that the use of MSR significantly improves the quality of the clustering of fossil assemblages. We conclude that this method can screen sample assemblages that are not representative of their underlying original living communities. Ultimately, it can be used to identify which assemblages require further sampling before being included in a comparative analysis.
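As a rough illustration of the screening step described above, the sketch below drops assemblages whose recorded richness falls under a given MSR value and then compares the remaining ones by taxonomic similarity. The abstract does not give the MSR formula, so the threshold `msr` is taken as an input here; the Jaccard index, the function names, and the toy faunas are all assumptions made for the example, not the authors' method.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of taxa (assumed similarity index)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def screen_and_compare(assemblages, msr):
    """Keep assemblages with at least `msr` taxa, then compare them pairwise.

    `msr` is supplied by the caller; computing it is outside this sketch.
    Returns (pairwise similarities, names of assemblages flagged as too poor).
    """
    kept = {name: taxa for name, taxa in assemblages.items() if len(taxa) >= msr}
    flagged = sorted(set(assemblages) - set(kept))
    sims = {
        (x, y): jaccard(kept[x], kept[y])
        for x, y in combinations(sorted(kept), 2)
    }
    return sims, flagged


# Hypothetical toy data: taxon labels are placeholders, not real faunas.
faunas = {
    "site_A": {"t1", "t2", "t3", "t4"},
    "site_B": {"t2", "t3", "t4", "t5"},
    "site_C": {"t1"},
}
similarities, needs_sampling = screen_and_compare(faunas, msr=3)
```

Assemblages returned in `needs_sampling` correspond to those the abstract describes as requiring further sampling before inclusion in a comparative analysis.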

2021 ◽  
Vol 10 (7) ◽  
pp. 436
Author(s):  
Amerah Alghanim ◽  
Musfira Jilani ◽  
Michela Bertolotto ◽  
Gavin McArdle

Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data. There has been much effort to understand and quantify the quality of VGI. Extrinsic measures which compare VGI to authoritative data sources such as National Mapping Agencies are common but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures which compare the data to heuristics or models built from the VGI data are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality where they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Specifically, using our proposed novel approach we obtained an average classification accuracy of 84.12%. This result outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is important. To address this issue we have also developed a new measure for this using direct and indirect characteristics of OSM data such as its edit history along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our developed model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. 
Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.


T-Comm ◽  
2020 ◽  
Vol 14 (10) ◽  
pp. 53-60
Author(s):  
Oleg I. Sheluhin ◽  
Valentina P. Ivannikova ◽  

A comparative analysis of statistical and model-based methods for selecting the number and composition of informative features was performed using the UNSW-NB15 database to train machine learning models for attack detection. Feature selection is one of the most important steps in data preparation for machine learning tasks. It improves the quality of machine learning models by reducing the size of the fitted models, the training time, and the probability of overfitting. The research was conducted using Python libraries: scikit-learn, which includes various machine learning models and functions for data preparation and model evaluation, and FeatureSelector, which contains functions for statistical data analysis. Numerical results are provided from experiments applying both statistical feature selection methods and methods based on machine learning models. As a result, a reduced set of features is obtained, which improves the quality of classification by removing noise features that have little effect on the final result and reduces the number of informative features in the data set from 41 to 17. The most effective of the analyzed feature selection methods is shown to be the statistical method SelectKBest with the chi2 function, which yields a reduced set of features providing a classification accuracy as high as 90%, compared with 74% for the full set.
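To make the chi-squared ranking idea concrete, here is a minimal pure-Python sketch of scoring non-negative features against class labels and keeping the top k. It is an illustration of the statistic only: the function names `chi2_scores` and `select_k_best` and the toy data are hypothetical, and the code does not call scikit-learn or reproduce the authors' pipeline.

```python
def chi2_scores(X, y):
    """Chi-squared score of each non-negative feature against the class labels.

    For every feature, the observed per-class sum is compared with the sum
    expected if the feature were independent of the class.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    feature_totals = [sum(row[j] for row in X) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = feature_totals[j] * y.count(c) / n_samples
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Return the (sorted) column indices of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (assumed): feature 1 is constant, so it carries no class information.
X = [[5, 1, 0], [4, 1, 1], [0, 1, 5], [1, 1, 4]]
y = [1, 1, 0, 0]
scores = chi2_scores(X, y)
selected = select_k_best(X, y, k=2)
```

In this toy case the constant feature scores zero and is dropped, which mirrors the removal of noise features described above on a much smaller scale.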


2014 ◽  
Vol 64 (4) ◽  
pp. 367-392 ◽  
Author(s):  
Karolína Lajblová ◽  
Petr Kraft

Abstract The earliest ostracods from the Bohemian Massif (Central European Variscides) have been recorded from the Middle Ordovician of the Prague Basin (Barrandian area), in the upper Klabava Formation, and became an abundant component of fossil assemblages in the overlying Šarka Formation. Both early ostracod associations consist of eight species in total, representing mainly eridostracans, palaeocopids, and binodicopids. The revision, description, or redescription of all species and their distribution in the basin is provided. Their diversification patterns and palaeogeographical relationships to ostracod assemblages from other regions are discussed.


Author(s):  
Luigi Leonardo Palese

In 2019, an outbreak occurred which resulted in a global pandemic. The causative agent of this serious global health threat was a coronavirus similar to the agent of SARS, referred to as SARS-CoV-2. In this work, an analysis of the available structures of the SARS-CoV-2 main protease has been performed. From a data set of crystallographic structures, the dynamics of the protease have been obtained. Furthermore, a comparative analysis of the structures of the SARS-CoV-2 main protease with those of the main protease of the coronavirus responsible for SARS (SARS-CoV) was carried out. The results of these studies suggest that, although the main proteases of SARS-CoV and SARS-CoV-2 are similar at the backbone level, some plasticity at the substrate binding site can be observed. The consequences of these structural aspects for the search for effective inhibitors of these enzymes are discussed, with a focus on already known compounds. The results obtained show that compounds containing an oxirane ring could be considered as inhibitors of the main protease of SARS-CoV-2.


Author(s):  
B. F. Tarasenko ◽  
S. Y. Orlenko ◽  
V. V. Kuzmin

The article presents a comparative analysis, based on field tests, of the quality of loosening of soil structures of the upper horizon with technical means developed at KubSAU and an improved design of a universal tillage unit.


2012 ◽  
Vol 9 (2) ◽  
pp. 53-57 ◽  
Author(s):  
O.V. Darintsev ◽  
A.B. Migranov

The main stages of solving the problem of motion planning for mobile robots in a non-stationary working environment, based on neural networks, genetic algorithms, and fuzzy logic, are considered. The features common to the considered intelligent algorithms are identified, and a comparative analysis of the algorithms is carried out. Recommendations are given on which method to use depending on the type of problem being solved and the requirements for the speed of the algorithm, the quality of the trajectory, the availability (volume) of sensory information, etc.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that bridges business and data science. With data science, the business goal shifts toward extracting valuable insights from the available data. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, using machine learning techniques (classification and prediction). Building a classifier or prediction model begins with a learning stage, in which a training data set is used to train the model with a chosen technique or algorithm; the rules generated in this stage form the model, which can then predict future trends in different types of organizations. Methods: Classification and prediction techniques, including Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN, are applied to find efficient and effective results. All of these can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model with a chosen technique or algorithm; the rules generated in this stage form the model, which can then predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets; the relationships discovered along the way help solve business problems and predict forthcoming trends. This prediction can help production houses with advertisement propaganda and cost planning, and by securing these factors they can make a movie more profitable.
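The two evaluation parameters named in the conclusion, accuracy and the confusion matrix, can be sketched in a few lines of plain Python. The five class labels come from the abstract; the helper names and the sample predictions are invented for illustration and do not represent the paper's data or results.

```python
from collections import Counter

# Class labels taken from the abstract.
LABELS = ["Blockbuster", "Superhit", "Hit", "Average", "Flop"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical outcomes and predictions for five movies.
y_true = ["Hit", "Flop", "Hit", "Average", "Blockbuster"]
y_pred = ["Hit", "Flop", "Average", "Average", "Superhit"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
acc = accuracy(y_true, y_pred)
```

Comparing `acc` and `matrix` across candidate classifiers is one simple way to carry out the kind of comparative analysis the abstract describes; the matrix additionally shows which success categories a model tends to confuse.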


Trials ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Zhuoran Kuang ◽  
Xiaoyan Li ◽  
Jianxiong Cai ◽  
Yaolong Chen ◽  
...  

Abstract Objective To assess the registration quality of traditional Chinese medicine (TCM) clinical trials for COVID-19, H1N1, and SARS. Method We searched for clinical trial registrations of TCM in the WHO International Clinical Trials Registry Platform (ICTRP) and Chinese Clinical Trial Registry (ChiCTR) on April 30, 2020. The registration quality assessment is based on the WHO Trial Registration Data Set (Version 1.3.1) and extra items for TCM information, including TCM background, theoretical origin, specific diagnosis criteria, description of intervention, and outcomes. Results A total of 136 records were examined, including 129 severe acute respiratory syndrome coronavirus 2 (COVID-19) and 7 H1N1 influenza (H1N1) patients. The deficiencies in the registration of TCM clinical trials (CTs) mainly focus on a low percentage reporting detailed information about interventions (46.6%), primary outcome(s) (37.7%), and key secondary outcome(s) (18.4%) and a lack of summary result (0%). For the TCM items, none of the clinical trial registrations reported the TCM background and rationale; only 6.6% provided the TCM diagnosis criteria or a description of the TCM intervention; and 27.9% provided TCM outcome(s). Conclusion Overall, although the number of registrations of TCM CTs increased, the registration quality was low. The registration quality of TCM CTs should be improved by more detailed reporting of interventions and outcomes, TCM-specific information, and sharing of the result data.

