PEPCONF, a diverse data set of peptide conformational energies

Viki Kumar Prasad; Alberto Otero-de-la-Roza; Gino A. DiLabio

doi:10.1038/sdata.2018.310

Towards a framework for measuring creative economy: evidence from Balkan countries

Measuring Business Excellence ◽

10.1108/mbe-03-2018-0013 ◽

2019 ◽

Vol 23 (1) ◽

pp. 41-62 ◽

Cited By ~ 1

Author(s):

Valentina Ndou ◽

Giovanni Schiuma ◽

Giuseppina Passiante

Keyword(s):

Business Environment ◽

Creative Economy ◽

Data Set ◽

Content Type ◽

Balkan Countries ◽

World Economic ◽

And Performance ◽

Diverse Data ◽

The Relationship ◽

Performance Dimension

Purpose: The creative process through which territorial resources, knowledge and culture are used, exploited and configured to match needs and to achieve congruence with the changing business environment has become crucial for competitiveness. This is even more relevant for the economies of developing countries, which continuously struggle to reap the benefits of globalisation and to grasp new opportunities for competitiveness. This paper therefore concentrates on the dynamic perspectives of the creative economy of countries by distinguishing between potentialities and performance. It examines the influence that creative capacities might have on the performance of countries.

Design/methodology/approach: The methodology consists of identifying creative economy indicators from a diverse data set of the World Economic Forum and distinguishing between potential and performance indicators.

Findings: The data reveal that good progress is being made and that emphasis is being devoted to increasing the level of creativity; however, the Balkan countries still lag in their capacity to boost innovation.

Practical implications: The paper provides a new focus for research on creativity measurement that is significant for understanding what creative capacities territories possess and their ability to use them proficiently for growth and innovation.

Originality/value: This paper proposes a new operational framework for measuring and interpreting creative economy indicators by identifying not only indicators that gauge the potentialities of a country, but also indicators that are linked with the performance dimension, as well as the relationship amongst them.

Estimating relatedness between malaria parasites

10.1101/575985 ◽

2019 ◽

Cited By ~ 5

Author(s):

Aimee R. Taylor ◽

Pierre E. Jacob ◽

Daniel E. Neafsey ◽

Caroline O. Buckee

Keyword(s):

Genetic Epidemiology ◽

Ad Hoc ◽

Pathogen Transmission ◽

Identity By Descent ◽

Malaria Parasites ◽

Data Set ◽

Epidemiology Studies ◽

Diverse Data ◽

Prospective Study Design ◽

Identity By State

Understanding the relatedness of individuals within or between populations is a common goal in biology. Increasingly, relatedness features in genetic epidemiology studies of pathogens. These studies are relatively new compared to those in humans and other organisms, but are important for designing interventions and understanding pathogen transmission. Only recently have researchers begun to routinely apply relatedness to apicomplexan eukaryotic malaria parasites, and to date have used a range of different approaches on an ad hoc basis. It remains unclear how to compare different studies, therefore, and which measures to use. Here, we systematically compare measures based on identity-by-state and identity-by-descent using a globally diverse data set of malaria parasites, Plasmodium falciparum and Plasmodium vivax, and provide marker requirements for estimates based on identity-by-descent. We formally show that the informativeness of polyallelic markers for relatedness inference is maximised when alleles are equifrequent. Estimates based on identity-by-state are sensitive to allele frequencies, which vary across populations and by experimental design. For portability across studies, we thus recommend estimates based on identity-by-descent. To generate reliable estimates, we recommend approximately 200 biallelic or 100 polyallelic markers. Confidence intervals illuminate inference across studies based on different sets of markers. These marker requirements, unlike many thus far reported, are immediately applicable to haploid malaria parasites and other haploid eukaryotes. This is the first attempt to provide rigorous analysis of the reliability of, and requirements for, relatedness inference in malaria genetic epidemiology, and will provide a basis for statistically informed prospective study design and surveillance strategies.
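For readers unfamiliar with the term, identity-by-state between two haploid samples is commonly summarised as the fraction of typed markers at which both samples carry the same allele. The sketch below illustrates that idea only; it is not the authors' estimator, and the function name, the use of `None` for missing calls, and the sample genotypes are all hypothetical.

```python
from typing import Optional, Sequence


def ibs_fraction(a: Sequence[Optional[str]], b: Sequence[Optional[str]]) -> float:
    """Fraction of markers, typed in both haploid samples, with identical alleles.

    Illustrative assumption: each sample is a sequence of allele labels,
    with None marking a missing call at that marker.
    """
    if len(a) != len(b):
        raise ValueError("samples must be typed at the same set of markers")
    # Keep only markers where both samples have a call.
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("no markers typed in both samples")
    matches = sum(1 for x, y in compared if x == y)
    return matches / len(compared)


# Hypothetical genotypes at five markers; the fourth is missing in sample_1.
sample_1 = ["A", "T", "G", None, "C"]
sample_2 = ["A", "C", "G", "T", "C"]
share = ibs_fraction(sample_1, sample_2)  # 3 of 4 comparable markers match
```

Because the measure simply counts matching alleles, two unrelated samples can match by chance more often where one allele is common, which is consistent with the abstract's remark that identity-by-state estimates are sensitive to allele frequencies.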

The Relationship Between Gratitude and Religious Identification of NCAA Athletes: A Replication Study

Journal of Clinical Sport Psychology ◽

10.1123/jcsp.2020-0035 ◽

2021 ◽

pp. 1-17

Author(s):

Nicole T. Gabana ◽

Jeffrey B. Ruser ◽

Mariya A. Yukhymenko-Lescroart ◽

Jenelle N. Gilbert

Keyword(s):

National Collegiate Athletic Association ◽

Past Research ◽

Well Being ◽

Spiritual Identity ◽

General State ◽

Data Set ◽

Religious Identification ◽

Dispositional Trait ◽

And Performance ◽

Diverse Data

A holistic, multicultural approach to student-athlete mental health, well-being, and performance promotes the consideration of spiritual and religious identities in counseling and consultation. Preliminary research supports the interconnectedness of spirituality, religiosity, and gratitude in athletes; thus, this study sought to replicate the study by Gabana, D’Addario, Luzzeri, and Soendergaard (2020) and extend the literature by examining a larger, independently sampled, more diverse data set and multiple types of gratitude. National Collegiate Athletic Association Division I–III student-athletes (N = 596) were surveyed to better understand how religious and spiritual identity related to trait, general-state, and sport-state gratitude. Results supported past research; athletes who self-identified as being both spiritual and religious reported greater dispositional (trait) gratitude than those who self-identified as spiritual/nonreligious or nonspiritual/nonreligious. Between-group differences were not found when comparing general-state and sport-state gratitude. Findings strengthen and extend the understanding of spirituality, religion, and gratitude in sport. Limitations, practical implications, and future directions are discussed.

Video Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch223 ◽

2011 ◽

pp. 1185-1189 ◽

Cited By ~ 2

Author(s):

Jung Hwan Oh ◽

Jeong Kyu Lee ◽

Sae Hwang

Keyword(s):

Data Mining ◽

Research Area ◽

Multimedia Databases ◽

Video Data ◽

Multimedia Data ◽

Data Sets ◽

Data Set ◽

Useful Knowledge ◽

Active Research ◽

Diverse Data

Data mining, which is defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are available nowadays. However, most of these studies have focused on corporate data, typically in an alphanumeric database, and relatively less work has been pursued on the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing text, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). One facet of these diverse data, in terms of underlying models and formats, is that they are synchronized and integrated and hence can be treated as integrated data records. The collection of such integrated data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining. This is a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data is required in many applications, such as financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) applications (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread and multimedia data sets are extremely large. There are tools for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.

ACAT1 Benchmark of RANS-Informed Analytical Methods for Fan Broadband Noise Prediction—Part I—Influence of the RANS Simulation

Acoustics ◽

10.3390/acoustics2030029 ◽

2020 ◽

Vol 2 (3) ◽

pp. 539-578

Author(s):

Carolin Kissner ◽

Sébastien Guérin ◽

Pascal Seeler ◽

Mattias Billson ◽

Paruchuri Chaitanya ◽

...

Keyword(s):

Turbulence Model ◽

Analytical Methods ◽

Companion Paper ◽

Broadband Noise ◽

Operating Conditions ◽

Navier Stokes ◽

Data Set ◽

The Past ◽

Flow Turbulence ◽

Diverse Data

A benchmark of Reynolds-Averaged Navier-Stokes (RANS)-informed analytical methods, which are attractive for predicting fan broadband noise, was conducted within the framework of the European project TurboNoiseBB. This paper discusses the first part of the benchmark, which investigates the influence of the RANS inputs. Its companion paper focuses on the influence of the applied acoustic models on predicted fan broadband noise levels. While similar benchmarking activities were conducted in the past, this benchmark is unique due to its large and diverse data set involving members from more than ten institutions. In this work, the authors analyze RANS solutions performed at approach conditions for the ACAT1 fan. The RANS solutions were obtained using different CFD codes, mesh resolutions, and computational settings. The flow, turbulence, and resulting fan broadband noise predictions are analyzed to pinpoint critical influencing parameters related to the RANS inputs. Experimental data are used for comparison. It is shown that when turbomachinery experts perform RANS simulations using the same geometry and the same operating conditions, the most crucial choices in terms of predicted fan broadband noise are the type of turbulence model and applied turbulence model extensions. Chosen mesh resolutions, CFD solvers, and other computational settings are less critical.

Estimating Intersection Control Delay Using Large Data Sets of Travel Time from a Global Positioning System

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198105191700103 ◽

2005 ◽

Vol 1917 (1) ◽

pp. 18-27

Author(s):

Brian Hoeschen ◽

Darcy Bullock ◽

Mark Schlappi

Keyword(s):

Travel Time ◽

Traffic Engineering ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Data Set ◽

Control Delay ◽

Diverse Data ◽

Intersection Control ◽

Better Than

Historically, stopped delay was used to characterize the operation of intersection movements because it was relatively easy to measure. During the past decade, the traffic engineering community has moved away from using stopped delay and now uses control delay. That measurement is more precise but quite difficult to extract from large data sets if strict definitions are used to derive the data. This paper evaluates two procedures for estimating control delay. The first is based on a historical approximation that control delay is 30% larger than stopped delay. The second is new and based on segment delay. The procedures are applied to a diverse data set collected in Phoenix, Arizona, and compared with control delay calculated by using the formal definition. The new approximation was observed to be better than the historical stopped delay procedure; it provided an accurate prediction of control delay. Because it is an approximation, this methodology would be most appropriately applied to large data sets collected from travel time studies for ranking and prioritizing intersections for further analysis.
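The historical approximation mentioned in this abstract, that control delay is 30% larger than stopped delay, amounts to a simple scaling. The following is a minimal sketch of that rule only, assuming stopped delay is given in seconds per vehicle; the function name and sample values are hypothetical and do not come from the paper. The segment-delay procedure is not sketched because the abstract does not specify it.

```python
def control_delay_from_stopped(stopped_delay_s: float) -> float:
    """Historical approximation: control delay is 30% larger than stopped delay."""
    if stopped_delay_s < 0:
        raise ValueError("stopped delay must be non-negative")
    return 1.3 * stopped_delay_s


# Hypothetical stopped delays (seconds per vehicle) for three movements.
sample_stopped = [10.0, 20.0, 35.5]
estimates = [control_delay_from_stopped(d) for d in sample_stopped]
```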

Predicting bacterial virulence factors – evaluation of machine learning and negative data strategies

Briefings in Bioinformatics ◽

10.1093/bib/bbz076 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1596-1608 ◽

Cited By ~ 3

Author(s):

Robert Rentzsch ◽

Carlus Deneke ◽

Andreas Nitsche ◽

Bernhard Y Renard

Keyword(s):

Machine Learning ◽

Virulence Factors ◽

Experimental Testing ◽

Sequence Similarity ◽

Support Vector ◽

Negative Data ◽

Data Set ◽

Bacterial Proteins ◽

Hybrid Classifiers ◽

Diverse Data

Bacterial proteins dubbed virulence factors (VFs) are a highly diverse group of sequences, whose only obvious commonality is the very property of being, more or less directly, involved in virulence. It is therefore tempting to speculate whether their prediction, based on direct sequence similarity (seqsim) to known VFs, could be enhanced or even replaced by using machine-learning methods. Specifically, when trained on a large and diverse set of VFs, such methods may be able to detect putative, non-trivial characteristics shared by otherwise unrelated VF families and therefore better predict novel VFs with insignificant similarity to each individual family. We therefore first reassess the performance of dimer-based Support Vector Machines, as used in the widely used MP3 method, in light of seqsim-only and seqsim/dimer-hybrid classifiers. We then repeat the analysis with a novel, considerably more diverse data set, also addressing the important problem of negative data selection. Finally, we move on to the real-world use case of proteome-wide VF prediction, outlining different approaches to estimating specificity in this scenario. We find that direct seqsim is of unparalleled importance and therefore should always be exploited. Further, we observe strikingly low correlations between different feature and classifier types when ranking proteins by VF likeness. We therefore propose a ‘best of each world’ approach to prioritize proteins for experimental testing, focussing on the top predictions of each classifier. Further, classifiers for individual VF families should be developed.

A decade of detailed observations (2008–2018) in steep bedrock permafrost at Matterhorn Hörnligrat (Zermatt, CH)

10.5194/essd-2019-14 ◽

2019 ◽

Author(s):

Samuel Weber ◽

Jan Beutel ◽

Reto Da Forno ◽

Alain Geiger ◽

Stephan Gruber ◽

...

Keyword(s):

Past Research ◽

Process Models ◽

Sensor Technology ◽

Future Research ◽

Data Set ◽

Technological Advances ◽

Data Record ◽

History Of ◽

Diverse Data ◽

And Control

The PermaSense project is an ongoing interdisciplinary effort between geoscience and engineering disciplines, started in 2006 with the goal of making observations that were not previously possible. Specifically, the aim is to obtain measurement data in unprecedented quantity and quality based on technological advances. This paper describes a unique ten+ year data record obtained from in-situ measurements in steep bedrock permafrost in an Alpine environment on the Matterhorn Hörnligrat, Zermatt, Switzerland, at 3500 m a.s.l. Through the use of state-of-the-art wireless sensor technology it was possible to obtain more data of higher quality, make these data available in near real time, and tightly monitor and control the running experiments. This data set (DOI: https://doi.org/10.1594/PANGAEA.897640, Weber et al., 2019a) constitutes the longest, densest and most diverse data record in the history of mountain permafrost research worldwide, with 17 different sensor types used at 29 distinct sensor locations and over 114.5 million data points captured over a period of ten+ years. By documenting and sharing the data in this form, we contribute to making our past research reproducible and facilitate future research based on these data, e.g. in the areas of analysis methodology, comparative studies, assessment of change in the environment, natural hazard warning and the development of process models.

Scalable Micro-planned Generation of Discourse from Structured Data

Computational Linguistics ◽

10.1162/coli_a_00363 ◽

2020 ◽

Vol 45 (4) ◽

pp. 737-763 ◽

Cited By ~ 1

Author(s):

Anirban Laha ◽

Parag Jain ◽

Abhijit Mishra ◽

Karthik Sankaranarayanan

Keyword(s):

Natural Language ◽

Structured Data ◽

Data Sets ◽

Data Types ◽

Data Set ◽

Language Generation ◽

Parallel Data ◽

Simple Sentences ◽

Diverse Data ◽

Existing Data

We present a framework for generating natural language descriptions from structured data such as tables; the problem comes under the category of data-to-text natural language generation (NLG). Modern data-to-text NLG systems typically use end-to-end statistical and neural architectures that learn from a limited amount of task-specific labeled data, and therefore exhibit limited scalability, domain-adaptability, and interpretability. Unlike these systems, ours is a modular, pipeline-based approach, and does not require task-specific parallel data. Rather, it relies on monolingual corpora and basic off-the-shelf NLP tools. This makes our system more scalable and easily adaptable to newer domains. Our system utilizes a three-staged pipeline that: (i) converts entries in the structured data to canonical form, (ii) generates simple sentences for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain data set curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other popular data sets covering diverse data types such as knowledge graphs and key-value maps.

Various aspects of retention index usage for GC-MS library search: A statistical investigation using a diverse data set

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2020.104042 ◽

2020 ◽

Vol 202 ◽

pp. 104042 ◽

Cited By ~ 2

Author(s):

Dmitriy D. Matyushin ◽

Anastasia Yu. Sholokhova ◽

Anastasia E. Karnaeva ◽

Aleksey K. Buryak

Keyword(s):

Retention Index ◽

Statistical Investigation ◽

Data Set ◽

Library Search ◽

Diverse Data
