A Thousand Fly Genomes: An Expanded Drosophila Genome Nexus

Mapping Intimacies

10.1101/063537

2016

Cited By ~ 2

Author(s):

Justin B. Lack

Jeremy D. Lange

Alison D. Tang

Russell B. Corbett-Detig

John E. Pool

Keyword(s):

Current Data

Published Data

Data Sets

Drosophila Genome

Population Admixture

Multiple Sources

Common Reference

Data Object

Population Genomic

Genomic Resource

ABSTRACT The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline that involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,122 wild-derived genomes. New additions include 306 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations.
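
As a minimal sketch of the kind of filtering these annotations enable, the helper below masks annotated intervals (for example, regions of residual heterozygosity or identity-by-descent) in one genome by replacing them with N. It assumes the genome is held as a plain Python string and that annotations are half-open, 0-based (start, end) pairs; the function name and interval format are hypothetical and are not the resource's actual file formats or tools.

```python
def mask_regions(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with every base inside the given
    half-open, 0-based (start, end) intervals replaced by `mask_char`.

    Hypothetical helper: the interval format is an assumption, not the
    Drosophila Genome Nexus annotation format.
    """
    bases = list(sequence)
    for start, end in intervals:
        # Clamp to the sequence bounds so out-of-range annotations are harmless
        start, end = max(0, start), min(len(bases), end)
        for i in range(start, end):
            bases[i] = mask_char
    return "".join(bases)
```

For example, mask_regions("ACGTACGTAC", [(2, 4), (8, 20)]) returns "ACNNACGTNN"; the second interval is clamped at the end of the sequence. Masked sites can then be skipped when computing allele frequencies.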

The Drosophila Genome Nexus: a population genomic resource of 605 Drosophila melanogaster genomes, including 197 genomes from a single ancestral range population

10.1101/009886

2014

Cited By ~ 2

Author(s):

Justin Lack

Charis Cardeno

Marc Crepeau

William Taylor

Russ Corbett-Detig

...

Keyword(s):

Population Genomics

Demographic History

Genetic Research

Genomic Analysis

Data Sets

Large Sample Size

Drosophila Genome

High Genetic Diversity

Population Genomic

Variant Detection

Hundreds of wild-derived D. melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach, and settled on an assembly strategy that utilizes two alignment programs and incorporates both SNPs and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets (previous DPGP releases and the DGRP freeze 2.0), and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 605 consistently aligned genomes, and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.
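
The abstract describes building an updated reference from first-round SNPs and short indels before a second round of mapping, without giving implementation details. The sketch below covers only that reference-update step, assuming non-overlapping variants given as VCF-style (pos, ref, alt) tuples with 0-based positions on the original reference; apply_variants is a hypothetical helper, not part of the published pipeline. A real pipeline would typically also need to translate coordinates from the updated reference back to the original one after remapping.

```python
def apply_variants(reference, variants):
    """Build an updated reference by applying SNPs and short indels.

    `variants` holds (pos, ref, alt) tuples with 0-based positions on the
    ORIGINAL reference, VCF-style: `ref` is replaced by `alt`, so a deletion
    looks like (4, "AC", "A") and an insertion like (7, "T", "TTT").
    Variants are assumed not to overlap.
    """
    seq = reference
    # Apply right-to-left so length changes never shift positions still to be applied
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        if seq[pos:pos + len(ref)] != ref:
            raise ValueError(f"reference mismatch at {pos}: expected {ref!r}")
        seq = seq[:pos] + alt + seq[pos + len(ref):]
    return seq
```

For example, applying a SNP, a 1-bp deletion and a 2-bp insertion, apply_variants("ACGTACGTAC", [(1, "C", "G"), (4, "AC", "A"), (7, "T", "TTT")]) returns "AGGTAGTTTAC". Processing variants from right to left is what keeps the original coordinates valid without any bookkeeping.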

To Dereplicate or Not To Dereplicate?

mSphere

10.1128/msphere.00971-19

2020

Vol 5 (3)

Author(s):

Jacob T. Evans

Vincent J. Denef

Keyword(s):

Quality Assessment

Large Fraction

Assessment Tools

Published Data

Data Sets

Independent Data

Average Nucleotide Identity

Software Packages

Population Genomic

Genomic Analyses

ABSTRACT Metagenome-assembled genomes (MAGs) expand our understanding of microbial diversity, evolution, and ecology. Concerns have been raised about how sequencing, assembly, binning, and quality assessment tools may result in MAGs that do not reflect single populations in nature. Here, we reflect on another issue, i.e., how to handle highly similar MAGs assembled from independent data sets. Obtaining multiple genomic representatives for a species is highly valuable, as it allows for population genomic analyses; however, retaining genomes of closely related populations complicates MAG quality assessment and abundance inferences. We show that (i) published data sets contain a large fraction of MAGs sharing >99% average nucleotide identity, (ii) different software packages and parameters used to resolve this redundancy remove very different numbers of MAGs, and (iii) the removal of closely related genomes leads to losses of population-specific auxiliary genes. Finally, we highlight some approaches that can infer strain-specific dynamics across a sample series without dereplication.
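
To make point (ii) concrete, here is a minimal sketch of a generic greedy dereplication rule: genomes are visited from highest to lowest quality score, and a genome is kept only if its average nucleotide identity (ANI) to every genome already kept is below a threshold. This is an illustration, not the algorithm of any particular software package; the MAG names, quality scores and ANI values in the usage note are hypothetical.

```python
def dereplicate(genomes, ani, threshold=0.99):
    """Greedy dereplication by ANI.

    genomes: list of (name, quality_score) tuples.
    ani: dict mapping frozenset({a, b}) -> ANI as a fraction (0-1).
    Pairs missing from `ani` are treated as unrelated (ANI 0).
    Returns the names of the retained genomes, best quality first.
    """
    kept = []
    for name, _score in sorted(genomes, key=lambda g: g[1], reverse=True):
        # Keep this genome only if it is not redundant with any genome kept so far
        if all(ani.get(frozenset((name, k)), 0.0) < threshold for k in kept):
            kept.append(name)
    return kept
```

With hypothetical genomes MAG_A (quality 90), MAG_B (85) and MAG_C (70) and pairwise ANI values of 0.995 (A, B), 0.992 (B, C) and 0.985 (A, C), a 0.99 threshold keeps MAG_A and MAG_C, a 0.98 threshold keeps only MAG_A, and a 0.999 threshold keeps all three. Any auxiliary genes found only in a discarded MAG disappear from downstream analyses.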

Asian Americans and Pacific Islanders: Employment Issues in the United States

10.36650/nexus9.1-2_58-69_kim

2011

Vol 9 (1-2)

pp. 58-69

Author(s):

Marlene Kim

Keyword(s):

United States

Asian Americans

The United States

Current Data

Pacific Islanders

Native Hawaiians

Data Sets

High Poverty

Unemployment Rates

Asian Americans and Pacific Islanders (AAPIs) in the United States face discrimination, the glass ceiling, and very high long-term unemployment rates. AAPIs are a diverse population: although some Asian Americans are more successful than average, others, such as those from Southeast Asia and Native Hawaiians and Pacific Islanders (NHPIs), work in low-paying jobs and suffer from high poverty rates, high unemployment rates, and low earnings. Collecting more detailed and additional data from employers, oversampling AAPIs in current data sets, making administrative data available to researchers, providing more resources for research on AAPIs, and enforcing nondiscrimination laws and affirmative action mandates would assist this population.

DV-DVFS: merging data variety and DVFS technique to manage the energy consumption of big data processing

Journal Of Big Data

10.1186/s40537-021-00437-7

2021

Vol 8 (1)

Author(s):

Hossein Ahmadvand

Fouzhan Foroutan

Mahmood Fathy

Keyword(s):

Big Data

Energy Consumption

Processing Time

Experimental Results

The Other

Data Sets

Multiple Sources

Evaluation Phase

Dynamic Voltage

Processing Resources

Abstract Data variety is one of the most important features of Big Data. Data variety results from aggregating data from multiple sources and from the uneven distribution of data. This feature of Big Data causes high variation in the consumption of processing resources such as CPU. This issue has been overlooked in previous work. To overcome this problem, in the present work we used Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this end, we consider two types of deadlines as our constraints. Before applying the DVFS technique to compute nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we used a set of data sets and applications. The experimental results show that our proposed approach outperforms the other scenarios in processing real data sets. Based on the experimental results in this paper, DV-DVFS can achieve up to 15% improvement in energy consumption.
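
The step of estimating a processing time and then choosing the frequency needed to meet a deadline can be sketched as follows. This is a simplified illustration under stated assumptions, not DV-DVFS itself: the job is assumed to be CPU-bound so that its run time scales inversely with clock frequency, its run time at the maximum frequency is assumed to be known already, and the node is assumed to expose a discrete list of frequencies. Choosing the lowest frequency that still meets the deadline is the usual DVFS rationale, since dynamic power grows with both supply voltage and frequency.

```python
def pick_frequency(time_at_max_s, deadline_s, freqs_ghz):
    """Pick the lowest available CPU frequency expected to meet the deadline.

    Assumes a purely CPU-bound job whose run time scales inversely with
    frequency: t(f) = time_at_max * f_max / f.
    Returns None if even the maximum frequency misses the deadline.
    """
    f_max = max(freqs_ghz)
    for f in sorted(freqs_ghz):
        estimated_time = time_at_max_s * f_max / f
        if estimated_time <= deadline_s:
            return f
    return None
```

For a job estimated at 60 s at 3.0 GHz with a 120 s deadline, pick_frequency(60, 120, [1.2, 1.8, 2.4, 3.0]) returns 1.8, because running at 1.2 GHz would take an estimated 150 s. Memory- or I/O-bound jobs scale less than linearly with frequency, so this estimate is pessimistic for them.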

The Sampling Distribution of Disease-Associated Alleles

Genetics

10.1093/genetics/147.4.1855

1997

Vol 147 (4)

pp. 1855-1861

Cited By ~ 1

Author(s):

Montgomery Slatkin

Bruce Rannala

Keyword(s):

Low Frequency

Null Model

Sampling Distribution

Death Process

Published Data

Data Sets

Likelihood Functions

Alternative Hypotheses

Size Standard

Birth Death

Abstract A theory is developed that provides the sampling distribution of low-frequency alleles at a single locus under the assumption that each allele is the result of a unique mutation. The number of copies of each allele is assumed to follow a linear birth-death process with sampling. If the population is of constant size, standard results from the theory of birth-death processes show that the distribution of the number of copies of each allele is logarithmic and that the joint distribution of the numbers of copies of k alleles found in a sample of size n follows the Ewens sampling distribution. If the population from which the sample was obtained was increasing in size, if there are different selective classes of alleles, or if there are differences in penetrance among alleles, the Ewens distribution no longer applies. Likelihood functions for a given set of observations are obtained under different alternative hypotheses. These results are applied to published data from the BRCA1 locus (associated with early-onset breast cancer) and the factor VIII locus (associated with hemophilia A) in humans. In both cases, the sampling distribution of alleles allows rejection of the null hypothesis, but relatively small deviations from the null model can account for the data. In particular, roughly the same population growth rate appears consistent with both data sets.
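
For reference, the Ewens sampling distribution gives the probability of an allelic configuration (a_1, ..., a_n), where a_j is the number of distinct alleles seen exactly j times in a sample of n genes, as P = n! / [θ(θ+1)...(θ+n-1)] × Π_j θ^(a_j) / (j^(a_j) a_j!). The sketch below evaluates this standard formula from a list of allele copy numbers. Here θ is treated as a free parameter; how it maps onto the birth-death rates of the model is not spelled out in this abstract. The partitions helper exists only to check that the probabilities over all configurations sum to 1.

```python
from collections import Counter
from math import factorial, prod


def ewens_probability(allele_copy_numbers, theta):
    """Ewens sampling probability of observing alleles with the given copy
    numbers, e.g. [3, 1, 1] = one allele seen 3 times and two singletons."""
    n = sum(allele_copy_numbers)
    # a[j] = number of distinct alleles represented exactly j times
    a = Counter(allele_copy_numbers)
    rising = prod(theta + i for i in range(n))  # theta (theta+1) ... (theta+n-1)
    weight = prod(theta**a_j / (j**a_j * factorial(a_j)) for j, a_j in a.items())
    return factorial(n) / rising * weight


def partitions(n, max_part=None):
    """Yield every integer partition of n as a list of parts (largest first)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest
```

With θ = 2 and two sampled genes, ewens_probability([1, 1], 2.0) equals θ/(θ+1) = 2/3 (two different alleles) and ewens_probability([2], 2.0) equals 1/(θ+1) = 1/3.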

Protest Event Analysis: Developing a Semiautomated NLP Approach

American Behavioral Scientist

10.1177/00027642211021650

2021

pp. 000276422110216

Author(s):

Jasmine Lorenzini

Hanspeter Kriesi

Peter Makarov

Bruno Wüest

Keyword(s):

Large Data

Large Data Sets

Data Sets

Event Analysis

Validity And Reliability

Multiple Sources

Social Scientists

Extensive Discussion

Computational Tools

News Sources

Protest event analysis (PEA) is a key method for studying social movements, allowing researchers to systematically analyze protest events over time and space. However, the manual coding of protest events is time-consuming and resource-intensive. Recent advances in automated approaches offer opportunities to code multiple sources and create large data sets that span many countries and years. However, too often the procedures used are not discussed in detail and, therefore, researchers have a limited capacity to assess the validity and reliability of the data. In addition, many researchers have highlighted biases associated with the study of protest events that are reported in the news. In this study, we ask how social scientists can build on electronic news databases and computational tools to create reliable PEA data that cover a large number of countries over a long period of time. We provide a detailed description of our semiautomated approach and offer an extensive discussion of potential biases associated with the study of protest events identified in international news sources.

Relative testis size and mating systems in anurans: large testis in multiple-male mating in foam-nesting frogs

Animal Biology

10.1163/157075511x570312

2011

Vol 61 (2)

pp. 225-238

Cited By ~ 15

Author(s):

Wen Bo Liao

Zhi Ping Mi

Cai Quan Zhou

Ling Jin

Xian Han

...

Keyword(s):

Sperm Competition

Published Data

Male Mating

Data Sets

Testis Size

Data Set

Monogamous Species

Large Testis

Testes Size

Testis Mass

Abstract Comparative studies of relative testis size in animals show that promiscuous species have relatively larger testes than monogamous species. Sperm competition favours the evolution of larger ejaculates in many animals, and producing larger ejaculates requires bigger testes. Here, we present data on relative testis mass for 17 Chinese anuran species, including 3 polyandrous species. We analyzed relative testis mass within the Chinese data set and in combination with published data sets on Japanese and African frogs. We found that polyandrous foam-nesting species have relatively large testes, suggesting that sperm competition was an important factor in the evolution of relative testis size. For 4 polyandrous species, testis mass is positively correlated with the intensity (males per mating) but not with the risk (frequency of polyandrous matings) of sperm competition.
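
The abstract does not say how relative testis mass was computed. One common convention, assumed here rather than taken from the paper, is to regress log testis mass on log body mass across species and use each species' residual as its relative testis mass. The sketch below does this with ordinary least squares in pure Python; a phylogenetically informed comparative analysis would differ, and the function name is hypothetical.

```python
import math


def relative_testis_mass(body_mass, testis_mass):
    """Residuals from an OLS regression of log10(testis mass) on log10(body mass).

    A positive residual means larger testes than expected for that body size.
    Both inputs are sequences of positive masses in the same units per species.
    """
    x = [math.log10(b) for b in body_mass]
    y = [math.log10(t) for t in testis_mass]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Observed minus predicted log testis mass for each species
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

Under this convention, a polyandrous species with relatively large testes would show a positive residual, and residuals from a least-squares fit with an intercept always sum to zero across the species analyzed.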

A Support Based Initialization Algorithm for Categorical Data Clustering

Journal of Information Technology Research

10.4018/jitr.2018040104

2018

Vol 11 (2)

pp. 53-67

Author(s):

Ajay Kumar

Shishir Kumar

Keyword(s):

Categorical Data

Selection Process

Numerical Data

Real Data

Data Sets

Data Set

Data Object

Data Points

Wu Method

Selection Algorithms

Several initial center selection algorithms have been proposed in the literature for numerical data, but because the values of categorical data are unordered, these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of each unique value of an attribute using its support and then aggregates these weights along each row to obtain the support of every row. The data object with the largest support is chosen as the first initial center, and the remaining centers are the objects at the greatest distance from this initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
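
The selection procedure described above can be sketched as follows. Several details are assumptions, not taken from the article: the support of a value is its relative frequency within its attribute, row support is the plain sum of its values' supports, distance is simple matching (the number of differing attributes), ties go to the lower row index, and "greatest distance from the initially selected center" is read literally, so the remaining k - 1 centers are the objects farthest from the first center. A farthest-first variant that also measures distance to centers already chosen would be a reasonable alternative.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes on which two objects differ."""
    return sum(x != y for x, y in zip(a, b))


def support_based_centers(data, k):
    """Return the row indices of k initial centers for categorical clustering.

    data: list of equal-length tuples of categorical values (one tuple per object).
    """
    n = len(data)
    # Support of each value = its relative frequency within its attribute (column)
    value_counts = [Counter(column) for column in zip(*data)]
    row_support = [
        sum(value_counts[j][v] / n for j, v in enumerate(row)) for row in data
    ]
    # First center: the object with the largest row support (lowest index wins ties)
    first = max(range(n), key=lambda i: row_support[i])
    # Remaining centers: objects farthest from the first center (stable sort keeps ties in index order)
    others = sorted(
        (i for i in range(n) if i != first),
        key=lambda i: hamming(data[i], data[first]),
        reverse=True,
    )
    return [first] + others[: k - 1]
```

On the toy data [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")], rows 0 and 1 share the largest support (0.75 + 0.5), so row 0 becomes the first center, and row 3, which differs on both attributes, is the second.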

Children with 5′-end NF1 gene mutations are more likely to have glioma

Neurology Genetics

10.1212/nxg.0000000000000192

2017

Vol 3 (5)

pp. e192

Cited By ~ 12

Author(s):

Corina Anastasaki

Stephanie M. Morris

Feng Gao

David H. Gutmann

Keyword(s):

Gene Mutation

Statistical Significance

Gene Mutations

Neurofibromatosis Type

Published Data

Data Sets

Nonsense Mutations

Data Set

Nf1 Gene

The Relationship

Objective: To ascertain the relationship between the germline NF1 gene mutation and glioma development in patients with neurofibromatosis type 1 (NF1).

Methods: The relationship between the type and location of the germline NF1 mutation and the presence of a glioma was analyzed in 37 participants with a clinical diagnosis of NF1 from one institution (Washington University School of Medicine [WUSM]). Odds ratios (ORs) were calculated using both unadjusted and weighted analyses of this data set in combination with 4 previously published data sets.

Results: While no statistically significant association was observed between the location or type of the NF1 mutation and glioma in the WUSM cohort, power calculations revealed that a sample size of 307 participants would be required to determine the predictive value of the position or type of the NF1 gene mutation. When our data set was combined with 4 previously published data sets (n = 310), children with glioma were found to be more likely to harbor 5′-end gene mutations (OR = 2; p = 0.006). Moreover, while not clinically predictive due to insufficient sensitivity and specificity, this association with glioma was stronger for participants with 5′-end truncating (OR = 2.32; p = 0.005) or 5′-end nonsense (OR = 3.93; p = 0.005) mutations relative to those without glioma.

Conclusions: Individuals with NF1 and glioma are more likely to harbor nonsense mutations in the 5′ end of the NF1 gene, suggesting that the NF1 mutation may be one predictive factor for glioma in this at-risk population.
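
The odds ratios quoted above compare the odds of carrying a 5′-end mutation in children with and without glioma. A minimal sketch of the unadjusted calculation from a 2×2 table, with a Woolf (log-scale) 95% confidence interval, follows. It does not reproduce the study's weighted analysis, and the counts in the usage note are hypothetical rather than taken from the study.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf (log-scale) confidence interval.

    2x2 table layout (all cells must be non-zero):
                       5'-end mutation   other mutation
        glioma                a                 b
        no glioma             c                 d
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's method
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, (lower, upper)
```

For hypothetical counts a = 10, b = 20, c = 5 and d = 20, the odds ratio is (10 × 20) / (20 × 5) = 2.0. With cells this small, the interval is wide, which illustrates why the combined data set was needed.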

Structure in parasite component communities in wild rodents: predictability, stability, associations and interactions .... or pure randomness?

Parasitology

10.1017/s0031182008000334

2008

Vol 135 (7)

pp. 751-766

Cited By ~ 42

Author(s):

J. M. BEHNKE

Keyword(s):

Species Richness

Published Data

Data Sets

Wild Rodents

Heligmosomoides Polygyrus

Wood Mice

Core Species

Helminth Communities

Immune Mediated

Concurrent Infections

SUMMARY Experimental data establish that interactions exist between species of intestinal helminths during concurrent infections in rodents, the strongest effects being mediated through the host's immune responses. Detecting immune-mediated relationships in wild rodent populations has been fraught with problems, and published data do not support a major role for interactions in structuring helminth communities. Helminths in wild rodents show predictable patterns of seasonal, host age-dependent and spatial variation in species richness and in the abundance of core species. When these are controlled for, patterns of co-infection compatible with synergistic interactions can be demonstrated. At least one of these, the positive relationship between Heligmosomoides polygyrus and the species richness of other helminths, has been demonstrated in three totally independent data sets. Collectively, these associations explain only a small percentage of the variance/deviance in abundance data and at this level are unlikely to play a major role in structuring helminth communities, although they may be important in the more heavily infected wood mice. Current worm burdens underestimate the possibility that earlier interactions through the immune system have taken place, and therefore interactions may have a greater role to play than is immediately evident from current worm burdens. Longitudinal studies are proposed to resolve this issue.
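
As a minimal sketch of the kind of association described here, the function below compares the mean species richness of other helminths in hosts with and without a focal species (H. polygyrus in the summary). It assumes each host is represented as a dictionary of worm counts keyed by species name; the key names are hypothetical. It deliberately omits the controls for season, host age and site that the summary stresses. A real analysis would add those, for example as covariates in a regression model.

```python
def _mean(values):
    """Arithmetic mean, or NaN for an empty group."""
    return sum(values) / len(values) if values else float("nan")


def richness_of_others(hosts, focal="H_polygyrus"):
    """Mean richness of non-focal helminth species in hosts with and without the focal species.

    hosts: list of dicts mapping species name -> worm count for one host.
    Returns (mean richness in infected hosts, mean richness in uninfected hosts).
    """
    infected, uninfected = [], []
    for worms in hosts:
        # Count the other species actually present (count > 0) in this host
        richness = sum(1 for species, count in worms.items() if species != focal and count > 0)
        if worms.get(focal, 0) > 0:
            infected.append(richness)
        else:
            uninfected.append(richness)
    return _mean(infected), _mean(uninfected)
```

A higher mean in the first group would be consistent with the positive relationship described above, but on its own it cannot separate an interaction from shared exposure driven by season, age or site.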
