scholarly journals A Thousand Fly Genomes: An Expanded Drosophila Genome Nexus

2016 ◽  
Author(s):  
Justin B. Lack ◽  
Jeremy D. Lange ◽  
Alison D. Tang ◽  
Russell B. Corbett-Detig ◽  
John E. Pool

ABSTRACTThe Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline which involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user’s needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,122 wild-derived genomes. New additions include 306 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently-published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations.

2014 ◽  
Author(s):  
Justin Lack ◽  
Charis Cardeno ◽  
Marc Crepeau ◽  
William Taylor ◽  
Russ Corbett-Detig ◽  
...  

Hundreds of wild-derived D. melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach, and settled on an assembly strategy that utilizes two alignment programs and incorporates both SNPs and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets (previous DPGP releases and the DGRP freeze 2.0), and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 605 consistently aligned genomes, and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets.


mSphere ◽  
2020 ◽  
Vol 5 (3) ◽  
Author(s):  
Jacob T. Evans ◽  
Vincent J. Denef

ABSTRACT Metagenome-assembled genomes (MAGs) expand our understanding of microbial diversity, evolution, and ecology. Concerns have been raised on how sequencing, assembly, binning, and quality assessment tools may result in MAGs that do not reflect single populations in nature. Here, we reflect on another issue, i.e., how to handle highly similar MAGs assembled from independent data sets. Obtaining multiple genomic representatives for a species is highly valuable, as it allows for population genomic analyses; however, when retaining genomes of closely related populations, it complicates MAG quality assessment and abundance inferences. We show that (i) published data sets contain a large fraction of MAGs sharing >99% average nucleotide identity, (ii) different software packages and parameters used to resolve this redundancy remove very different numbers of MAGs, and (iii) the removal of closely related genomes leads to losses of population-specific auxiliary genes. Finally, we highlight some approaches that can infer strain-specific dynamics across a sample series without dereplication.


2011 ◽  
Vol 9 (1-2) ◽  
pp. 58-69
Author(s):  
Marlene Kim

Asian Americans and Pacific Islanders (AAPIs) in the United States face problems of discrimination, the glass ceiling, and very high long-term unemployment rates. As a diverse population, although some Asian Americans are more successful than average, others, like those from Southeast Asia and Native Hawaiians and Pacific Islanders (NHPIs), work in low-paying jobs and suffer from high poverty rates, high unemployment rates, and low earnings. Collecting more detailed and additional data from employers, oversampling AAPIs in current data sets, making administrative data available to researchers, providing more resources for research on AAPIs, and enforcing nondiscrimination laws and affirmative action mandates would assist this population.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Hossein Ahmadvand ◽  
Fouzhan Foroutan ◽  
Mahmood Fathy

AbstractData variety is one of the most important features of Big Data. Data variety is the result of aggregating data from multiple sources and uneven distribution of data. This feature of Big Data causes high variation in the consumption of processing resources such as CPU consumption. This issue has been overlooked in previous works. To overcome the mentioned problem, in the present work, we used Dynamic Voltage and Frequency Scaling (DVFS) to reduce the energy consumption of computation. To this goal, we consider two types of deadlines as our constraint. Before applying the DVFS technique to computer nodes, we estimate the processing time and the frequency needed to meet the deadline. In the evaluation phase, we have used a set of data sets and applications. The experimental results show that our proposed approach surpasses the other scenarios in processing real datasets. Based on the experimental results in this paper, DV-DVFS can achieve up to 15% improvement in energy consumption.


Genetics ◽  
1997 ◽  
Vol 147 (4) ◽  
pp. 1855-1861 ◽  
Author(s):  
Montgomery Slatkin ◽  
Bruce Rannala

Abstract A theory is developed that provides the sampling distribution of low frequency alleles at a single locus under the assumption that each allele is the result of a unique mutation. The numbers of copies of each allele is assumed to follow a linear birth-death process with sampling. If the population is of constant size, standard results from theory of birth-death processes show that the distribution of numbers of copies of each allele is logarithmic and that the joint distribution of numbers of copies of k alleles found in a sample of size n follows the Ewens sampling distribution. If the population from which the sample was obtained was increasing in size, if there are different selective classes of alleles, or if there are differences in penetrance among alleles, the Ewens distribution no longer applies. Likelihood functions for a given set of observations are obtained under different alternative hypotheses. These results are applied to published data from the BRCA1 locus (associated with early onset breast cancer) and the factor VIII locus (associated with hemophilia A) in humans. In both cases, the sampling distribution of alleles allows rejection of the null hypothesis, but relatively small deviations from the null model can account for the data. In particular, roughly the same population growth rate appears consistent with both data sets.


2021 ◽  
pp. 000276422110216
Author(s):  
Jasmine Lorenzini ◽  
Hanspeter Kriesi ◽  
Peter Makarov ◽  
Bruno Wüest

Protest event analysis is a key method to study social movements, allowing to systematically analyze protest events over time and space. However, the manual coding of protest events is time-consuming and resource intensive. Recently, advances in automated approaches offer opportunities to code multiple sources and create large data sets that span many countries and years. However, too often the procedures used are not discussed in details and, therefore, researchers have a limited capacity to assess the validity and reliability of the data. In addition, many researchers highlighted biases associated with the study of protest events that are reported in the news. In this study, we ask how social scientists can build on electronic news databases and computational tools to create reliable PEA data that cover a large number of countries over a long period of time. We provide a detailed description our semiautomated approach and we offer an extensive discussion of potential biases associated with the study of protest events identified in international news sources.


2011 ◽  
Vol 61 (2) ◽  
pp. 225-238 ◽  
Author(s):  
Wen Bo Liao ◽  
Zhi Ping Mi ◽  
Cai Quan Zhou ◽  
Ling Jin ◽  
Xian Han ◽  
...  

AbstractComparative studies of the relative testes size in animals show that promiscuous species have relatively larger testes than monogamous species. Sperm competition favours the evolution of larger ejaculates in many animals – they give bigger testes. In the view, we presented data on relative testis mass for 17 Chinese species including 3 polyandrous species. We analyzed relative testis mass within the Chinese data set and combining those data with published data sets on Japanese and African frogs. We found that polyandrous foam nesting species have relatively large testes, suggesting that sperm competition was an important factor affecting the evolution of relative testes size. For 4 polyandrous species testes mass is positively correlated with intensity (males/mating) but not with risk (frequency of polyandrous matings) of sperm competition.


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms are proposed in the literature for numerical data, but the values of the categorical data are unordered so, these methods are not applicable to a categorical data set. This article investigates the initial center selection process for the categorical data and after that present a new support based initial center selection algorithm. The proposed algorithm measures the weight of unique data points of an attribute with the help of support and then integrates these weights along the rows, to get the support of every row. Further, a data object having the largest support is chosen as an initial center followed by finding other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu method and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.


2017 ◽  
Vol 3 (5) ◽  
pp. e192 ◽  
Author(s):  
Corina Anastasaki ◽  
Stephanie M. Morris ◽  
Feng Gao ◽  
David H. Gutmann

Objective:To ascertain the relationship between the germline NF1 gene mutation and glioma development in patients with neurofibromatosis type 1 (NF1).Methods:The relationship between the type and location of the germline NF1 mutation and the presence of a glioma was analyzed in 37 participants with NF1 from one institution (Washington University School of Medicine [WUSM]) with a clinical diagnosis of NF1. Odds ratios (ORs) were calculated using both unadjusted and weighted analyses of this data set in combination with 4 previously published data sets.Results:While no statistical significance was observed between the location and type of the NF1 mutation and glioma in the WUSM cohort, power calculations revealed that a sample size of 307 participants would be required to determine the predictive value of the position or type of the NF1 gene mutation. Combining our data set with 4 previously published data sets (n = 310), children with glioma were found to be more likely to harbor 5′-end gene mutations (OR = 2; p = 0.006). Moreover, while not clinically predictive due to insufficient sensitivity and specificity, this association with glioma was stronger for participants with 5′-end truncating (OR = 2.32; p = 0.005) or 5′-end nonsense (OR = 3.93; p = 0.005) mutations relative to those without glioma.Conclusions:Individuals with NF1 and glioma are more likely to harbor nonsense mutations in the 5′ end of the NF1 gene, suggesting that the NF1 mutation may be one predictive factor for glioma in this at-risk population.


Parasitology ◽  
2008 ◽  
Vol 135 (7) ◽  
pp. 751-766 ◽  
Author(s):  
J. M. BEHNKE

SUMMARYExperimental data establish that interactions exist between species of intestinal helminths during concurrent infections in rodents, the strongest effects being mediated through the host's immune responses. Detecting immune-mediated relationships in wild rodent populations has been fraught with problems and published data do not support a major role for interactions in structuring helminth communities. Helminths in wild rodents show predictable patterns of seasonal, host age-dependent and spatial variation in species richness and in abundance of core species. When these are controlled for, patterns of co-infection compatible with synergistic interactions can be demonstrated. At least one of these, the positive relationship betweenHeligmosomoides polygyrusand species richness of other helminths has been demonstrated in three totally independent data-sets. Collectively, they explain only a small percentage of the variance/deviance in abundance data and at this level are unlikely to play a major role in structuring helminth communities, although they may be important in the more heavily infected wood mice. Current worm burdens underestimate the possibility that earlier interactions through the immune system have taken place, and therefore interactions may have a greater role to play than is immediately evident from current worm burdens. Longitudinal studies are proposed to resolve this issue.


Sign in / Sign up

Export Citation Format

Share Document