gapsplit: Efficient random sampling for non-convex constraint-based models

2019 ◽  
Author(s):  
Thomas C. Keaty ◽  
Paul A. Jensen

Abstract. Summary: Gapsplit generates random samples from convex and non-convex constraint-based models. Gapsplit targets under-sampled regions of the solution space for uniform coverage. Availability and implementation: Python and Matlab source code are freely available at http://jensenlab.net/[email protected]

2020 ◽  
Vol 36 (8) ◽  
pp. 2623-2625 ◽  
Author(s):  
Thomas C Keaty ◽  
Paul A Jensen

Abstract. Summary: Gapsplit generates random samples from convex and non-convex constraint-based models by targeting under-sampled regions of the solution space. Gapsplit provides uniform coverage of linear, mixed-integer and general non-linear models. Availability and implementation: Python and Matlab source code are freely available at http://jensenlab.net/tools. Supplementary information: Supplementary data are available at Bioinformatics online.
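The idea of "targeting under-sampled regions" can be illustrated with a toy one-dimensional sketch: given the values already sampled for one variable, find the widest gap between neighbouring values and aim the next sample at its midpoint. This is only an illustration under simplifying assumptions and is not the gapsplit implementation; the real tool works on constraint-based models and needs an optimization solver to find a feasible point near the target. The function name `largest_gap_midpoint` and its parameters are hypothetical.

```python
def largest_gap_midpoint(samples, lower, upper):
    """Return (midpoint, width) of the widest gap in one variable's samples.

    Hypothetical helper for illustration only: `lower` and `upper` are the
    assumed bounds of the variable, and `samples` are values seen so far.
    """
    # Include the bounds so gaps at the edges of the range are considered.
    points = sorted([lower, *samples, upper])
    best_width, best_mid = -1.0, None
    for a, b in zip(points, points[1:]):
        width = b - a
        if width > best_width:
            best_width, best_mid = width, (a + b) / 2
    return best_mid, best_width


if __name__ == "__main__":
    # Samples cluster near the lower end, so the widest gap lies between 2 and 9.
    target, width = largest_gap_midpoint([1, 2, 9], lower=0, upper=10)
    print(target, width)
```

In a full sampler, the returned midpoint would serve as a target value for that variable, and a solver would then search for a feasible solution as close to the target as the model's constraints allow.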


Author(s):  
Frank Ecker ◽  
Jennifer Francis ◽  
Per Olsson ◽  
Katherine Schipper

Abstract. This paper investigates how data requirements often encountered in archival accounting research can produce a data-restricted sample that is a non-random selection of observations from the reference sample to which the researcher wishes to generalize results. We illustrate the effects of non-random sampling on results of association tests in a setting with data on one variable of interest for all observations and frequently-missing data on another variable of interest. We develop and validate a resampling approach that uses only observations from the data-restricted sample to construct distribution-matched samples that approximate randomly-drawn samples from the reference sample. Our simulation tests provide evidence that distribution-matched samples yield generalizable results. We demonstrate the effects of non-random sampling in tests of the association between realized returns and five implied cost of equity metrics. In this setting, the reference sample has full information on realized returns, while on average only 16% of reference sample observations have data on cost of equity metrics. Consistent with prior research (e.g., Easton and Monahan, The Accounting Review 80, 501–538, 2005), analysis using the unadjusted (non-random) cost of equity sample reveals weak or negative associations between realized returns and cost of equity metrics. In contrast, using distribution-matched samples, we find reliable evidence of the theoretically-predicted positive association. We also conceptually and empirically compare distribution-matching with multiple imputation and selection models, two other approaches to dealing with non-random samples.


1963 ◽  
Vol 9 (2) ◽  
pp. 237-249 ◽  
Author(s):  
R. N. Wensley ◽  
C. D. McKeen

The relation of soil populations of the muskmelon wilt fungus, Fusarium oxysporum f. melonis, to the wilt potentials of a yellow Fox sandy loam soil (Fsl) and a dark Colwood loam (Cl) was investigated. In either soil a direct relationship existed between the size of the population of the fungus and wilt incidence. Notwithstanding this relationship, with the same population the greater incidence of wilt in Fsl than in Cl showed that a factor or factors other than population affect the wilt potential. Whereas mean populations of field soils obtained at the site of wilted plants ranged upward to 3300 per gram, they declined steadily during the 9-month interval between crops. During this interval random samples of field soils yielded mean populations of 228 and 268 per gram of Fsl and Cl, respectively. Of the F. oxysporum colonies isolated at the end of harvest, about 70% from plant sites and approximately 21% from intersites were pathogenic. Two to eight months later only 12 to 15% of F. oxysporum isolates obtained by random sampling of infested field soils were pathogenic.


2020 ◽  
Author(s):  
Xun Zhu ◽  
Ti-Cheng Chang ◽  
Richard Webby ◽  
Gang Wu

Abstract. idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV, working directly from user-uploaded FastQ files, makes calls equivalent to those annotated by Nextstrain.org on all three common clade systems. Web and equivalent command-line interfaces are available. It can be deployed on any Linux environment, including personal computers, HPC systems and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.


2020 ◽  
Author(s):  
N Goonasekera ◽  
A Mahmoud ◽  
J Chilton ◽  
E Afgan

Abstract. Summary: The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules, and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/. Contact: Enis Afgan ([email protected]). Supplementary information: None.


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Mikko Rautiainen ◽  
Tobias Marschall

Abstract. Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner


1987 ◽  
Vol 63 (6) ◽  
pp. 422-425 ◽  
Author(s):  
P. L. Marshall

An interactive microcomputer program was developed to aid the design of simple random sampling with or without replacement. The program determines: (1) sample size for a set of given conditions for up to 20 variables; (2) combinations of conditions that will yield a given sample size; and (3) the relationship between percentage error and sample size for a given set of conditions. Potential applications are illustrated with three simple examples. Key Words: sample size, simple random sampling
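The kind of calculation described above can be sketched with the standard textbook formulas for simple random sampling. This is not the program from the paper; it assumes the usual relation n0 = (t * CV / E)^2 between the coefficient of variation (CV, in percent), the allowable percentage error (E), and a t-value, with a finite population correction when sampling without replacement. The function and parameter names are hypothetical, and a fixed t-value of 2 is assumed for simplicity.

```python
import math


def srs_sample_size(cv_percent, error_percent, t_value=2.0, population_size=None):
    """Sample size for simple random sampling (textbook formula, illustrative).

    With replacement (or an effectively infinite population), the size is
    n0 = (t * CV / E)^2. Without replacement, pass `population_size` to
    apply the finite population correction n = n0 / (1 + n0 / N).
    """
    n0 = (t_value * cv_percent / error_percent) ** 2
    if population_size is not None:
        n0 = n0 / (1.0 + n0 / population_size)
    return math.ceil(n0)


def percent_error(cv_percent, sample_size, t_value=2.0):
    """Percentage error achieved by a given sample size (with replacement)."""
    return t_value * cv_percent / math.sqrt(sample_size)


if __name__ == "__main__":
    print(srs_sample_size(50, 10))                       # with replacement
    print(srs_sample_size(50, 10, population_size=400))  # without replacement
    print(percent_error(50, 100))                        # error for n = 100
```

Because the t-value itself depends on the degrees of freedom, a more careful version would iterate on the sample size; the fixed value here keeps the sketch short.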


2018 ◽  
Author(s):  
Roc Reguant ◽  
Yevgeniy Antipin ◽  
Rob Sheridan ◽  
Augustin Luna ◽  
Chris Sander

Abstract. Summary: AlignmentViewer is a multiple sequence alignment viewer for protein families with flexible visualization, analysis tools and links to protein family databases. It is directly accessible in web browsers without the need for software installation, as it is implemented in JavaScript, and does not require an internet connection to function. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, facilitating the computation of protein 3D structures and the detection of functionally constrained interactions. Availability and implementation: AlignmentViewer is open source software under the MIT license. The viewer is at http://alignmentviewer.org and the source code, documentation and issue tracking, for co-development, are at https://github.com/dfci/[email protected], reaches all authors


2017 ◽  
Author(s):  
Bérénice Batut ◽  
Kévin Gravouil ◽  
Clémence Defois ◽  
Saskia Hiltemann ◽  
Jean-François Brugère ◽  
...  

Abstract. Background: New generations of sequencing platforms, coupled with numerous bioinformatics tools, have led to rapid technological progress in metagenomics and metatranscriptomics for investigating complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings: We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides a curated collection of tools to explore and visualize taxonomic and functional information from raw amplicon, metagenomic or metatranscriptomic sequences. To guide different analyses, several customizable workflows are included. All workflows are supported by tutorials and Galaxy interactive tours to guide the users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to many thousands of datasets, but can also be used on a normal PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io/). Conclusions: Based on the Galaxy framework, ASaiM offers sophisticated analyses to scientists without command-line knowledge. ASaiM provides a powerful framework to easily and quickly explore microbiota data in a reproducible and transparent environment.


2019 ◽  
Author(s):  
Elizaveta V. Starikova ◽  
Polina O. Tikhonova ◽  
Nikita A. Prianichnikov ◽  
Chris M. Rands ◽  
Evgeny M. Zdobnov ◽  
...  

Abstract. Summary: Phigaro is a standalone command-line application that is able to detect prophage regions taking raw genome and metagenome assemblies as an input. It also produces dynamic annotated “prophage genome maps” and marks possible transposon insertion spots inside prophages. It provides putative taxonomic annotations that can distinguish tailed from non-tailed phages. It is applicable for mining prophage regions from large metagenomic datasets. Availability: Source code for Phigaro is freely available for download at https://github.com/bobeobibo/phigaro along with test data. The code is written in Python.

