Benchmarking software tools for detecting and quantifying selection in Evolve and Resequencing studies

2019 ◽  
Author(s):  
Christos Vlachos ◽  
Claire Burny ◽  
Marta Pelizzola ◽  
Rui Borges ◽  
Andreas Futschik ◽  
...  

Abstract The combination of experimental evolution with whole-genome re-sequencing of pooled individuals, also called Evolve and Resequence (E&R), is a powerful approach to study selection processes and to infer the architecture of adaptive variation. Given the large potential of this method, a range of software tools has been developed to identify selected SNPs and to measure their selection coefficients. In this benchmarking study, we compare 15 test statistics implemented in 10 software tools using three different scenarios. We demonstrate that the power of the methods differs among the scenarios, but some consistently outperform others. LRT-1, which takes advantage of time series data, consistently performed best for all three scenarios. Nevertheless, the CMH test, which requires only two time points, had almost the same performance. This benchmark study will not only facilitate the analysis of already existing data, but also affect the design of future data collections.
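The CMH (Cochran-Mantel-Haenszel) test mentioned above combines one 2x2 allele-count table per replicate population, comparing two time points. A minimal NumPy sketch of the classical CMH statistic with continuity correction is shown below; the replicate counts are invented for illustration and are not from the benchmark:

```python
import numpy as np
from scipy.stats import chi2

def cmh_test(tables):
    """Cochran-Mantel-Haenszel test over k 2x2 tables.

    Each table: [[a, b], [c, d]] with rows = time points and
    columns = derived / ancestral allele counts.
    Returns (statistic, p_value) with continuity correction.
    """
    tables = np.asarray(tables, dtype=float)
    a = tables[:, 0, 0]
    r1 = tables[:, 0, :].sum(axis=1)   # row totals (time point 1)
    r2 = tables[:, 1, :].sum(axis=1)   # row totals (time point 2)
    c1 = tables[:, :, 0].sum(axis=1)   # column totals (derived)
    c2 = tables[:, :, 1].sum(axis=1)   # column totals (ancestral)
    n = r1 + r2                        # per-table totals
    expect = r1 * c1 / n               # E[a] under no association
    var = r1 * r2 * c1 * c2 / (n ** 2 * (n - 1))
    stat = (abs((a - expect).sum()) - 0.5) ** 2 / var.sum()
    return stat, chi2.sf(stat, df=1)

# Three hypothetical replicates: the derived allele rises from 20/100 to 60/100.
tables = [[[20, 80], [60, 40]]] * 3
stat, p = cmh_test(tables)
print(f"CMH statistic = {stat:.1f}, p = {p:.3g}")
```

The statistic is referred to a chi-square distribution with one degree of freedom; a consistent frequency change across replicates yields a large statistic even when each single table is only modestly significant.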

2000 ◽  
Vol 16 (6) ◽  
pp. 927-997 ◽  
Author(s):  
Hyungsik R. Moon ◽  
Peter C.B. Phillips

Time series data are often well modeled by using the device of an autoregressive root that is local to unity. Unfortunately, the localizing parameter (c) is not consistently estimable using existing time series econometric techniques and the lack of a consistent estimator complicates inference. This paper develops procedures for the estimation of a common localizing parameter using panel data. Pooling information across individuals in a panel aids the identification and estimation of the localizing parameter and leads to consistent estimation in simple panel models. However, in the important case of models with concomitant deterministic trends, it is shown that pooled panel estimators of the localizing parameter are asymptotically biased. Some techniques are developed to overcome this difficulty, and consistent estimators of c in the region c < 0 are developed for panel models with deterministic and stochastic trends. A limit distribution theory is also established, and test statistics are constructed for exploring interesting hypotheses, such as the equivalence of local to unity parameters across subgroups of the population. The methods are applied to the empirically important problem of the efficient extraction of deterministic trends. They are also shown to deliver consistent estimates of distancing parameters in nonstationary panel models where the initial conditions are in the distant past. In the development of the asymptotic theory this paper makes use of both sequential and joint limit approaches. An important limitation in the operation of the joint asymptotics that is sometimes needed in our development is the rate condition n/T → 0. So the results in the paper are likely to be most relevant in panels where T is large and n is moderately large.
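Concretely, the local-to-unity device referred to above replaces the autoregressive root with one that shrinks toward unity at rate 1/T (standard notation, not necessarily the paper's exact symbols):

```latex
y_{it} = \rho_T \, y_{i,t-1} + \varepsilon_{it},
\qquad \rho_T = 1 + \frac{c}{T},
\qquad i = 1, \dots, n, \quad t = 1, \dots, T,
```

so that c < 0 corresponds to near-stationary behavior and c = 0 to an exact unit root. A single time series carries too little information to pin down c, which is why pooling the n cross-sectional units of the panel is what delivers consistent estimation.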


2020 ◽  
Vol 12 (6) ◽  
pp. 890-904 ◽  
Author(s):  
Neda Barghi ◽  
Christian Schlötterer

Abstract In molecular population genetics, adaptation is typically thought to occur via selective sweeps, where targets of selection have independent effects on the phenotype and rise to fixation, whereas in quantitative genetics, many loci contribute to the phenotype and subtle frequency changes occur at many loci during polygenic adaptation. The sweep model makes specific predictions about frequency changes of beneficial alleles, and many test statistics have been developed to detect such selection signatures. Although polygenic adaptation is probably the prevalent mode of adaptation, because of the traditional focus on the phenotype we lack a solid understanding of the similarities and differences of selection signatures under the two models. Recent theoretical and empirical studies have shown that both selective sweep and polygenic adaptation models can result in a sweep-like genomic signature; therefore, additional criteria are needed to distinguish the two models. With replicated populations and time series data, experimental evolution studies have the potential to identify the underlying model of adaptation. Using the framework of experimental evolution, we performed computer simulations to study the pattern of selected alleles for two models: 1) adaptation of a trait via independent beneficial mutations that are conditioned on fixation, that is, the selective sweep model, and 2) the trait optimum model (polygenic adaptation), that is, adaptation of a quantitative trait under stabilizing selection after a sudden shift in trait optimum. We identify several distinct patterns of the selective sweep and trait optimum models in populations of different sizes. These features could provide the foundation for the development of quantitative approaches to differentiate the two models.
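For reference, the trait optimum model is usually formalized with Gaussian stabilizing selection on the trait (a standard, illustrative parameterization, not necessarily the authors' exact one):

```latex
w(z) = \exp\!\left( -\frac{(z - z_{\mathrm{opt}})^{2}}{2\,\sigma_{s}^{2}} \right),
```

where z is an individual's trait value, z_opt is the optimum after the sudden shift, and smaller sigma_s means stronger stabilizing selection. Contributing alleles rise in frequency only while the population mean lags behind the new optimum; once it is reached, their effective selection coefficients decay, which is one key behavioral difference from the independent-sweeps model.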


2020 ◽  
Author(s):  
Paolo Oliveri ◽  
Simona Simoncelli ◽  
Pierluigi Di Pietro ◽  
Sara Durante

One of the main challenges for the present and future of ocean observations is to find best practices for data management: infrastructures like Copernicus and SeaDataCloud already take responsibility for assembling, archiving, updating and publishing data. Here we present the strengths and weaknesses of a SeaDataCloud Temperature and Salinity time series data collection, in particular a tool able to recognize the different devices and platforms and to merge them with processed Copernicus platforms.

While Copernicus has the main target of quickly acquiring and publishing data, SeaDataNet aims to publish data with the best quality available. These two data repositories should be considered together, since the originator can ingest the data in both infrastructures, in only one, or partially in both. This sometimes results in data that are only partially available in Copernicus or SeaDataCloud, with great impact on researchers who want to access as much data as possible. The data reprocessing should not be loaded onto researchers' shoulders, since only users skilled in all aspects of the data management plan know how to merge the data.

The SeaDataCloud time series data collection is a Global Ocean soon-to-be-published dataset that will represent a reference for ocean researchers, released in the binary, user-friendly Ocean Data View format. The database management plan was originally designed for profiles, but has been adapted for time series, resolving several issues like the uniqueness of the identifiers (ID).

Here we present an extension of the SOURCE (Sea Observations Utility for Reprocessing, Calibration and Evaluation) Python package, able to enhance the data quality with redundant sophisticated methods and to simplify their usage.

SOURCE improves quality control (Q/C) performance on observations using statistical quality check procedures that follow the ocean best practices guidelines, addressing the following issues:

1. Find and aggregate all broken time series using likeness in ID parameter strings;
2. Find and organize in a dictionary all different metadata variables;
3. Correct time series timestamps to match simpler measure units;
4. Filter out devices that lie outside a selected horizontal rectangle;
5. Give some information on the original Q/C scheme of the SeaDataCloud infrastructure;
6. Give information tables on platforms and on the merged ID string duplicates, together with an error log file (missing time, depth, data, wrong Q/C variables, etc.).

In particular, the duplicates table and the log file may help SeaDataCloud partners to update the data collection and make it finally available to users.

The reconstructed SeaDataCloud time series data, divided by parameter and stored in a more flexible dataset, can be ingested into the main part of the software, allowing comparison with Copernicus time series, matching of the same platform using horizontal and vertical surroundings (without looking at the ID), detection and cleanup of duplicated data, and merging of the two databases to extend the data coverage.

This allows researchers to have the widest and best quality data possible for the final users' release, and to use these data to calibrate and validate models, in order to get a picture of the sea conditions of a whole area.


India, which has the largest rice cultivation area in the world, is one of the massive cultivators of this crop, and rice is the main staple food of many Indians. The main purpose of this study is to develop a predictive model of Indian rice production. We have used different types of soft computing models, such as Fuzzy Logic, statistical equations, Artificial Neural Networks (ANN) and Genetic Algorithms (GA), and developed a hybrid model to obtain the optimum result. The vital aspect of this predictive model is the accuracy of future data prediction on the basis of past time series data. The prediction performance has been assessed using error metrics such as Mean Squared Error (MSE), Root Mean Square Error (RMSE) and Average Error.
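The three error metrics named above have standard definitions; a minimal sketch follows (the production figures are invented, purely for illustration):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error."""
    e = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(e ** 2))

def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return mse(y_true, y_pred) ** 0.5

def average_error(y_true, y_pred):
    """Mean of the signed errors, i.e. the forecast bias."""
    e = np.asarray(y_true, float) - np.asarray(y_pred, float)
    return float(np.mean(e))

# Hypothetical annual rice production (million tonnes) vs. model output.
actual    = [104.4, 106.7, 109.7, 112.8]
predicted = [103.9, 107.5, 110.2, 112.1]
print(mse(actual, predicted), rmse(actual, predicted), average_error(actual, predicted))
```

RMSE is in the same units as the data, which makes it the easiest of the three to interpret, while the average (signed) error reveals systematic over- or under-prediction that MSE and RMSE hide.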


2020 ◽  
Author(s):  
Iain Mathieson

Abstract Time series data of allele frequencies are a powerful resource for detecting and classifying natural and artificial selection. Ancient DNA now allows us to observe these trajectories in natural populations of long-lived species such as humans. Here, we develop a hidden Markov model to infer selection coefficients that vary over time. We show through simulations that our approach can accurately estimate both selection coefficients and the timing of changes in selection. Finally, we analyze some of the strongest signals of selection in the human genome using ancient DNA. We show that the European lactase persistence mutation was selected over the past 5,000 years with a selection coefficient of 2-2.5% in Britain, Central Europe and Iberia, but not Italy. In northern East Asia, selection at the ADH1B locus associated with alcohol metabolism intensified around 4,000 years ago, approximately coinciding with the introduction of rice-based agriculture. Finally, a derived allele at the FADS locus was selected in parallel in both Europe and East Asia, as previously hypothesized. Our approach is broadly applicable to both natural and experimental evolution data and shows how time series data can be used to resolve fine-scale details of selection.
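For intuition about how a constant selection coefficient is read off an allele-frequency trajectory (the paper's hidden Markov model generalizes this to time-varying s and noisy ancient-DNA sampling, which this sketch does not attempt): under haploid selection with fitnesses 1+s vs. 1, the log-odds of the allele frequency grow by exactly ln(1+s) per generation, so a noise-free trajectory yields s from the slope of the logit:

```python
import numpy as np

def simulate(p0, s, generations):
    """Deterministic haploid selection: p' = p(1+s) / (1 + p*s)."""
    p = [p0]
    for _ in range(generations):
        p.append(p[-1] * (1 + s) / (1 + p[-1] * s))
    return np.array(p)

def estimate_s(freqs):
    """In this model logit(p_t) is exactly linear in t with slope ln(1+s)."""
    t = np.arange(len(freqs))
    logit = np.log(freqs / (1 - freqs))
    slope = np.polyfit(t, logit, 1)[0]
    return np.expm1(slope)  # exp(slope) - 1

traj = simulate(p0=0.1, s=0.02, generations=100)
print(round(estimate_s(traj), 4))  # recovers s = 0.02
```

Real trajectories add genetic drift and sampling noise on top of this deterministic signal, which is precisely what motivates the probabilistic (HMM) treatment in the paper.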


1996 ◽  
Vol 40 (1) ◽  
pp. 40-45 ◽  
Author(s):  
Yu Hsing ◽  
Hui S. Chang

This paper re-examines the demand for higher education at private institutions and tests whether in recent years enrollment has become more sensitive to rising tuition and other related costs. Time series data between FY 1964–65 and FY 1990–91 are used as the sample. The major findings are as follows. The general functional form yields coefficients with smaller standard errors and larger values of the test statistics. The logarithmic form can be rejected at the 5% level. Tuition elasticities rose from −0.261 to −0.557 and income elasticities also increased from 0.493 to 1.093 during the sample period. Thus, enrollment has become more sensitive to changes in tuition and other costs. However, part of the loss of enrollment due to tuition increases can be recovered through the rising income elasticities.


2020 ◽  
Vol 17 (36) ◽  
pp. 1186-1198
Author(s):  
Mustofa USMAN ◽  
N INDRYANI ◽  
WARSONO A. ◽  
AMANTO WAMILIANA

The Vector Autoregressive Moving Average (VARMA) model is one of the models often used for modeling multivariate time series data. Economic time series, especially return data, usually show high fluctuations in some periods, so the return volatility is unstable. In modeling the data return of the share prices ADRO and ITMG, this high-volatility behavior is taken into account. This study aims to find the best model that fits the data return of the share prices of the energy companies PT Adaro Energy Tbk (ADRO) and PT Indo Tambangraya Megah Tbk (ITMG), to analyze the behavior of the impulse responses of the variables data return ADRO and ITMG, to perform Granger causality tests, and to forecast the next 12 periods. Based on model selection using the AICC, HQC, AIC, and SBC criteria, the VARMA(2,2)-GARCH(1,1) model was found to be the best one for the data in this study. The VARMA(2,2)-GARCH(1,1) model is then written as a univariate model. For the univariate ADRO model, the test statistic is F = 4.73 with P-value = 0.0084, which indicates the model is highly significant; for the univariate ITMG model, the test statistic is F = 5.82 with P-value = 0.0001, which indicates the model is significant. Based on the selected best model, the impulse responses, Granger causality tests, and forecasts for the next 12 periods are discussed.
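The Granger causality test used in the study asks whether lagged values of one return series improve a regression of the other beyond its own lags; at lag 1 this reduces to an F-test comparing restricted and unrestricted OLS fits. A self-contained sketch on synthetic data (not the ADRO/ITMG series):

```python
import numpy as np
from scipy.stats import f as f_dist

def granger_f_test(y, x, lag=1):
    """F-test of 'x Granger-causes y' with a single lag."""
    y_t   = y[lag:]
    y_lag = y[:-lag]
    x_lag = x[:-lag]
    ones = np.ones_like(y_t)

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, y_t, rcond=None)
        resid = y_t - design @ beta
        return resid @ resid

    rss_r = rss(np.column_stack([ones, y_lag]))          # own lag only
    rss_u = rss(np.column_stack([ones, y_lag, x_lag]))   # plus x's lag
    n, k, q = len(y_t), 3, 1   # observations, unrestricted params, restrictions
    f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k))
    return f_stat, f_dist.sf(f_stat, q, n - k)

# Synthetic pair where y depends strongly on the previous value of x.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.empty(500)
y[0] = 0.0
for t in range(1, 500):
    y[t] = 0.8 * x[t - 1] + 0.2 * rng.standard_normal()

f_stat, p = granger_f_test(y, x)
print(f"F = {f_stat:.1f}, p = {p:.2g}")
```

In practice one would test several lag orders and both directions; rejecting the null here means only that x helps predict y, not that x causes y in any structural sense.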


2021 ◽  
Vol 2 (1) ◽  
pp. 32-60
Author(s):  
V. Sakthivel Samy ◽  
Koyel Pramanick ◽  
Veena Thenkanidiyoor ◽  
Jeni Victor

The aim of this study is to analyze meteorological data obtained from the various expeditions made to the Indian stations in Antarctica over recent years and to determine how significantly the weather has changed over the years. Any time series data analysis has two main goals: (a) to identify the nature of the phenomenon from the sequence of observations, and (b) to predict future data. For these goals, the pattern in the time series data and its variability must be accurately identified. This paper then interprets and integrates the established pattern with the associated meteorological datasets collected in Antarctica. Using data analytics to validate the interpretation of the given datasets, a pattern has been identified that can be extrapolated towards prediction. To ease the time series data analysis, the authors developed an online meteorological data analytics portal at NCPOR, Goa: http://data.ncaor.gov.in/.
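Identifying the pattern and its variability in such meteorological series typically starts by separating a long-term trend from the seasonal cycle. A minimal NumPy sketch on synthetic monthly temperatures (the numbers are invented, not the Antarctic station data):

```python
import numpy as np

rng = np.random.default_rng(1)
years = 30
months = np.arange(years * 12)

# Synthetic monthly temperature: warming trend + seasonal cycle + noise.
true_trend = 0.002  # degrees per month, i.e. 0.24 C per decade
series = (true_trend * months
          + 5.0 * np.sin(2 * np.pi * months / 12)
          + 0.5 * rng.standard_normal(months.size))

# Seasonal component: the mean of each calendar month (climatology).
climatology = np.array([series[m::12].mean() for m in range(12)])
anomaly = series - climatology[months % 12]

# Trend: least-squares slope of the deseasonalized anomalies.
slope, intercept = np.polyfit(months, anomaly, 1)
print(f"estimated trend: {slope * 120:.3f} C per decade")
```

The climatology-plus-anomaly split is the usual first step; the residual anomalies are then the natural input for significance testing or for the predictive modelling the abstract describes.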


Author(s):  
James Morrison ◽  
David Christie ◽  
Charles Greenwood ◽  
Ruairi Maciver ◽  
Arne Vogler

This paper presents a set of software tools for interrogating and processing time series data. The functionality of this toolset is demonstrated using data from a specific deployment involving multiple sensors over a specific time period. The approach was developed initially for Datawell Waverider MKII/MKII buoys [1] and expanded to include data from acoustic devices, in this case Nortek AWACs. Tools of this nature are important to address a specific lack of features in the sensor manufacturers' own tools. They also help to develop standard approaches for dealing with anomalous data from sensors. These software tools build upon an effective modern interpreted programming language, in this case Python, which has access to high-performance low-level libraries. This paper demonstrates the use of these tools applied to a sensor network based on the north-west coast of Scotland as described in [2,3]. Examples show computationally complex quantities, such as monthly averages, being calculated easily. Analysis down to a wave-by-wave basis is also demonstrated from the same source dataset. The tools make use of a flexible data structure called a DataFrame, which supports mixed data types, hierarchical and time indexing, and is integrated with modern plotting libraries. This allows sub-second querying and dynamic plotting of large datasets. By using modern compression techniques and file formats it is possible to process larger-than-memory datasets without the need for a traditional relational database. The software library should be of use to a wide variety of industries involved in offshore engineering, along with any scientists interested in the coastal environment.
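The DataFrame-based workflow described here can be illustrated with pandas; the variable names and the synthetic wave-height series below are invented, and the real toolset's API will differ:

```python
import numpy as np
import pandas as pd

# Synthetic half-hourly significant wave height (Hs) for one year.
rng = np.random.default_rng(7)
index = pd.date_range("2016-01-01", "2016-12-31 23:30", freq="30min")
hs = (2.0 + 1.5 * np.sin(2 * np.pi * index.dayofyear / 365)
      + rng.gamma(shape=2.0, scale=0.25, size=len(index)))
buoy = pd.DataFrame({"Hs": hs}, index=index)

# Monthly statistics computed directly from the full-resolution record.
monthly = buoy["Hs"].resample("MS").agg(["mean", "max", "count"])
print(monthly.head(3))
```

A time-indexed DataFrame gives the resampling, slicing, and plotting operations the abstract mentions essentially for free; the `count` column is a cheap sanity check for gaps left by anomalous or missing sensor records.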


2019 ◽  
Vol 4 (1) ◽  
pp. 25
Author(s):  
Rizki Herdatullah ◽  
Syaiful Bukhori ◽  
Windi Eka Yulia Retnani

Optimization comes from the base word optimal, which means the best, highest, or most beneficial; to optimize is to make something as good as possible. Forecasting is an attempt to predict the future. Prediction can be done by studying the pattern of historical data to find a model that can represent future data. This method is called time series forecasting. One of the many algorithms that can build a model from historical data is the Artificial Neural Network (ANN). The algorithm mimics the human neuron system so that it can solve non-linear problems, such as forecasting transformer demand. In the modeling process, the ANN repeatedly updates the connection weights to find the optimum weights. In this final project the ANN is trained by Ant Colony Optimization (ACO). Based on the results, it can be seen that an ANN with ACO as the learning method can predict transformer demand with good results.
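A highly simplified sketch of the idea follows: a tiny feed-forward network is fitted to a toy series with an ACO-style continuous optimizer (archive-based Gaussian sampling in the spirit of ACO_R). The network size, toy series, and all parameters are invented for illustration and are not taken from the project:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy series: forecast x[t] from the lag window (x[t-3], x[t-2], x[t-1]).
series = np.sin(np.arange(200) * 0.3)
X = np.array([series[t - 3:t] for t in range(3, len(series))])
y = series[3:]

def forecast(weights, X):
    """One hidden layer of 4 tanh units, linear output."""
    W1 = weights[:12].reshape(3, 4)
    b1 = weights[12:16]
    W2 = weights[16:20]
    b2 = weights[20]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def loss(weights):
    return np.mean((forecast(weights, X) - y) ** 2)

# ACO_R-like search: keep an archive of good weight vectors and let each
# "ant" sample a new candidate from Gaussians centred on archive members.
dim, archive_size, ants = 21, 10, 30
archive = rng.standard_normal((archive_size, dim)) * 0.5
scores = np.array([loss(w) for w in archive])
initial_best = scores.min()

for iteration in range(60):
    sigma = archive.std(axis=0) + 1e-3       # pheromone-like spread
    for _ in range(ants):
        centre = archive[rng.integers(archive_size)]
        candidate = centre + sigma * rng.standard_normal(dim)
        c_loss = loss(candidate)
        worst = scores.argmax()
        if c_loss < scores[worst]:           # replace the worst solution
            archive[worst], scores[worst] = candidate, c_loss

print(f"MSE: {initial_best:.4f} -> {scores.min():.4f}")
```

Unlike backpropagation, this search needs no gradients, which is the usual appeal of metaheuristics such as ACO or GA for training small networks; the price is far more loss evaluations per unit of improvement.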

