Gapsplit: efficient random sampling for non-convex constraint-based models

Bioinformatics ◽

10.1093/bioinformatics/btz971 ◽

2020 ◽

Vol 36 (8) ◽

pp. 2623-2625 ◽

Cited By ~ 1

Author(s):

Thomas C Keaty ◽

Paul A Jensen

Keyword(s):

Random Sampling ◽

Linear Models ◽

Source Code ◽

Solution Space ◽

Supplementary Information ◽

Mixed Integer ◽

Supplementary Data ◽

Convex Constraint ◽

Random Samples ◽

Constraint Based Models

Abstract. Summary: Gapsplit generates random samples from convex and non-convex constraint-based models by targeting under-sampled regions of the solution space. Gapsplit provides uniform coverage of linear, mixed-integer and general non-linear models. Availability and implementation: Python and Matlab source code are freely available at http://jensenlab.net/tools. Supplementary information: Supplementary data are available at Bioinformatics online.
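The idea of "targeting under-sampled regions" can be illustrated with a minimal one-variable sketch. This is not the authors' code: the function names `largest_gap_target` and `fill_gaps` are hypothetical, and the sketch assumes every target value is directly attainable. A real constraint-based model would need a solver to find a feasible point near each target, which is omitted here.

```python
def largest_gap_target(samples, lower, upper):
    """Return (gap_width, midpoint) of the widest unsampled interval in [lower, upper]."""
    # Include the bounds so gaps at either end of the range are considered.
    points = sorted([lower, upper] + [s for s in samples if lower <= s <= upper])
    # Pair each gap width with its left endpoint and keep the widest gap.
    width, left = max((b - a, a) for a, b in zip(points, points[1:]))
    return width, left + width / 2.0


def fill_gaps(samples, lower, upper, n_new):
    """Add n_new samples, each placed in the middle of the current widest gap."""
    samples = list(samples)
    for _ in range(n_new):
        # Assumption: the target is feasible, so it is accepted as the new sample.
        _, target = largest_gap_target(samples, lower, upper)
        samples.append(target)
    return samples


if __name__ == "__main__":
    print(largest_gap_target([2.0], 0.0, 10.0))
    print(fill_gaps([2.0], 0.0, 10.0, 3))
```

Repeatedly splitting the widest gap spreads samples across the range of the variable, which is the coverage property the abstract describes. How the published method handles many variables or infeasible targets is not stated in the abstract and is not modelled here.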

Non-random sampling and association tests on realized returns and risk proxies

Review of Accounting Studies ◽

10.1007/s11142-021-09581-0 ◽

2021 ◽

Author(s):

Frank Ecker ◽

Jennifer Francis ◽

Per Olsson ◽

Katherine Schipper

Keyword(s):

Random Sampling ◽

Reference Sample ◽

Positive Association ◽

Cost Of Equity ◽

Association Tests ◽

Random Samples ◽

Distribution Matching ◽

Matched Samples ◽

Data Requirements ◽

Selection Of

Abstract: This paper investigates how data requirements often encountered in archival accounting research can produce a data-restricted sample that is a non-random selection of observations from the reference sample to which the researcher wishes to generalize results. We illustrate the effects of non-random sampling on results of association tests in a setting with data on one variable of interest for all observations and frequently-missing data on another variable of interest. We develop and validate a resampling approach that uses only observations from the data-restricted sample to construct distribution-matched samples that approximate randomly-drawn samples from the reference sample. Our simulation tests provide evidence that distribution-matched samples yield generalizable results. We demonstrate the effects of non-random sampling in tests of the association between realized returns and five implied cost of equity metrics. In this setting, the reference sample has full information on realized returns, while on average only 16% of reference sample observations have data on cost of equity metrics. Consistent with prior research (e.g., Easton and Monahan The Accounting Review 80, 501–538, 2005), analysis using the unadjusted (non-random) cost of equity sample reveals weak or negative associations between realized returns and cost of equity metrics. In contrast, using distribution-matched samples, we find reliable evidence of the theoretically-predicted positive association. We also conceptually and empirically compare distribution-matching with multiple imputation and selection models, two other approaches to dealing with non-random samples.

POPULATIONS OF FUSARIUM OXYSPORUM F. MELONIS AND THEIR RELATION TO THE WILT POTENTIAL OF TWO SOILS

Canadian Journal of Microbiology ◽

10.1139/m63-030 ◽

1963 ◽

Vol 9 (2) ◽

pp. 237-249 ◽

Cited By ~ 14

Author(s):

R. N. Wensley ◽

C. D. McKeen

Keyword(s):

Fusarium Oxysporum ◽

Random Sampling ◽

Direct Relationship ◽

Sandy Loam ◽

Sandy Loam Soil ◽

Random Samples ◽

Field Soils ◽

Infested Field ◽

Loam Soil ◽

Wilt Incidence

The relation of soil populations of the muskmelon wilt fungus, Fusarium oxysporum f. melonis, to the wilt potentials of a yellow Fox sandy loam soil (Fsl) and a dark Colwood loam (Cl) was investigated. In either soil a direct relationship existed between the size of the population of the fungus and wilt incidence. Notwithstanding this relationship, with the same population the greater incidence of wilt in Fsl than in Cl showed that a factor or factors other than population affect the wilt potential. Whereas mean populations of field soils obtained at the site of wilted plants ranged upward to 3300 per gram, they declined steadily during the 9-month interval between crops. During this interval random samples of field soils yielded mean populations of 228 and 268 per gram of Fsl and Cl, respectively. Of the F. oxysporum colonies isolated at the end of harvest, about 70% from plant sites and approximately 21% from intersites were pathogenic. Two to eight months later only 12 to 15% of F. oxysporum isolates obtained by random sampling of infested field soils were pathogenic.

idCOV: a pipeline for quick clade identification of SARS-CoV-2 isolates

10.1101/2020.10.08.330456 ◽

2020 ◽

Author(s):

Xun Zhu ◽

Ti-Cheng Chang ◽

Richard Webby ◽

Gang Wu

Keyword(s):

Personal Computer ◽

Source Code ◽

Command Line ◽

Sequencing Data ◽

Link Type ◽

Public Dataset ◽

Virus Isolates

Abstract: idCOV is a phylogenetic pipeline for quickly identifying the clades of SARS-CoV-2 virus isolates from raw sequencing data based on a selected clade-defining marker list. Using a public dataset, we show that idCOV makes calls equivalent to those annotated by Nextstrain.org on all three common clade systems, working directly from user-uploaded FastQ files. Web and equivalent command-line interfaces are available. It can be deployed in any Linux environment, including a personal computer, HPC and the cloud. The source code is available at https://github.com/xz-stjude/idcov. Installation documentation can be found at https://github.com/xz-stjude/idcov/blob/master/README.md.

GalaxyCloudRunner: enhancing scalable computing for Galaxy

10.1101/2020.05.28.121772 ◽

2020 ◽

Author(s):

N Goonasekera ◽

A Mahmoud ◽

J Chilton ◽

E Afgan

Keyword(s):

Source Code ◽

Supplementary Information ◽

Scalable Computing ◽

Link Type ◽

Cloud Providers ◽

Galaxy Server ◽

Cloud Resources

Abstract. Summary: The existence of more than 100 public Galaxy servers with service quotas is indicative of the need for an increased availability of compute resources for Galaxy to use. The GalaxyCloudRunner enables a Galaxy server to easily expand its available compute capacity by sending user jobs to cloud resources. User jobs are routed to the acquired resources based on a set of configurable rules and the resources can be dynamically acquired from any of 4 popular cloud providers (AWS, Azure, GCP, or OpenStack) in an automated fashion. Availability and implementation: GalaxyCloudRunner is implemented in Python and leverages Docker containers. The source code is MIT licensed and available at https://github.com/cloudve/galaxycloudrunner. The documentation is available at http://gcr.cloudve.org/. Contact: Enis Afgan ([email protected]). Supplementary information: None.

GraphAligner: rapid and versatile sequence-to-graph alignment

Genome Biology ◽

10.1186/s13059-020-02157-2 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 1

Author(s):

Mikko Rautiainen ◽

Tobias Marschall

Keyword(s):

Genetic Variation ◽

Error Correction ◽

Genome Assembly ◽

State Of The Art ◽

Source Code ◽

The State ◽

Graph Alignment ◽

Link Type ◽

Long Reads

Abstract: Genome graphs can represent genetic variation and sequence uncertainty. Aligning sequences to genome graphs is key to many applications, including error correction, genome assembly, and genotyping of variants in a pangenome graph. Yet, so far, this step is often prohibitively slow. We present GraphAligner, a tool for aligning long reads to genome graphs. Compared to the state-of-the-art tools, GraphAligner is 13x faster and uses 3x less memory. When employing GraphAligner for error correction, we find it to be more than twice as accurate and over 12x faster than extant tools. Availability: Package manager: https://anaconda.org/bioconda/graphaligner and source code: https://github.com/maickrau/GraphAligner

A Microcomputer Program for Assisting in the Design of Simple Random Samples

The Forestry Chronicle ◽

10.5558/tfc63422-6 ◽

1987 ◽

Vol 63 (6) ◽

pp. 422-425 ◽

Cited By ~ 1

Author(s):

P. L. Marshall

Keyword(s):

Sample Size ◽

Key Words ◽

Random Sampling ◽

Simple Random Sampling ◽

Percentage Error ◽

Microcomputer Program ◽

Random Samples ◽

Potential Applications ◽

The Relationship

An interactive microcomputer program was developed to aid the design of simple random sampling with or without replacement. The program determines: (1) sample size for a set of given conditions for up to 20 variables; (2) combinations of conditions that will yield a given sample size; and (3) the relationship between percentage error and sample size for a given set of conditions. Potential applications are illustrated with three simple examples. Key Words: sample size, simple random sampling
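The relationship between sample size and percentage error can be sketched with the standard textbook formulas for simple random sampling. The abstract does not give the program's own formulas, so everything below is an assumption: the function names are hypothetical, the variability is expressed as a coefficient of variation in percent, and `t = 2.0` is a fixed approximation for roughly 95% confidence (in practice the t-value depends on the sample size and is found iteratively).

```python
import math


def sample_size(cv_percent, error_percent, t=2.0, population=None):
    """Sample size needed to reach an allowable percentage error.

    With replacement (or an effectively infinite population):
        n0 = (t * CV / E) ** 2
    Without replacement from a finite population of size N:
        n = n0 / (1 + n0 / N)
    """
    n0 = (t * cv_percent / error_percent) ** 2
    if population is not None:
        # Finite population correction for sampling without replacement.
        n0 = n0 / (1.0 + n0 / population)
    return math.ceil(n0)


def percent_error(cv_percent, n, t=2.0):
    """Percentage error achieved by n samples drawn with replacement."""
    return t * cv_percent / math.sqrt(n)


if __name__ == "__main__":
    print(sample_size(50.0, 10.0))                  # with replacement
    print(sample_size(50.0, 10.0, population=400))  # without replacement
    print(percent_error(50.0, 100))
```

Sampling without replacement from a small population needs fewer observations for the same error, because each draw removes uncertainty about the remaining units.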

AlignmentViewer: Sequence Analysis of Large Protein Families

10.1101/269720 ◽

2018 ◽

Cited By ~ 1

Author(s):

Roc Reguant ◽

Yevgeniy Antipin ◽

Rob Sheridan ◽

Augustin Luna ◽

Chris Sander

Keyword(s):

Open Source Software ◽

Source Code ◽

Web Browsers ◽

Protein Families ◽

Large Protein ◽

Multiple Sequence ◽

Internet Connection ◽

Visualization Analysis ◽

Link Type ◽

Evolutionary Coupling

Abstract. Summary: AlignmentViewer is a multiple sequence alignment viewer for protein families with flexible visualization, analysis tools and links to protein family databases. It is directly accessible in web browsers without the need for software installation, as it is implemented in JavaScript, and does not require an internet connection to function. It can handle protein families with tens of thousands of sequences and is particularly suitable for evolutionary coupling analysis, facilitating the computation of protein 3D structures and the detection of functionally constrained interactions. Availability and implementation: AlignmentViewer is open source software under the MIT license. The viewer is at http://alignmentviewer.org and the source code, documentation and issue tracking, for co-development, are at https://github.com/dfci/[email protected], reaches all authors

ASaiM: a Galaxy-based framework to analyze raw shotgun data from microbiota

10.1101/183970 ◽

2017 ◽

Cited By ~ 2

Author(s):

Bérénice Batut ◽

Kévin Gravouil ◽

Clémence Defois ◽

Saskia Hiltemann ◽

Jean-François Brugère ◽

...

Keyword(s):

Technological Progress ◽

Source Code ◽

Command Line ◽

Bioinformatic Tools ◽

Link Type ◽

Data Analyses ◽

The Galaxy ◽

Sequencing Platforms ◽

User Friendly ◽

New Generation

Abstract. Background: A new generation of sequencing platforms, coupled with numerous bioinformatics tools, has led to rapid technological progress in metagenomics and metatranscriptomics to investigate complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings: We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides a curated collection of tools to explore and visualize taxonomic and functional information from raw amplicon, metagenomic or metatranscriptomic sequences. To guide different analyses, several customizable workflows are included. All workflows are supported by tutorials and Galaxy interactive tours to guide the users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to many thousands of datasets, but can also be used on a normal PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io/). Conclusions: Based on the Galaxy framework, ASaiM offers sophisticated analyses to scientists without command-line knowledge. ASaiM provides a powerful framework to easily and quickly explore microbiota data in a reproducible and transparent environment.

Phigaro: high throughput prophage sequence annotation

10.1101/598243 ◽

2019 ◽

Cited By ~ 6

Author(s):

Elizaveta V. Starikova ◽

Polina O. Tikhonova ◽

Nikita A. Prianichnikov ◽

Chris M. Rands ◽

Evgeny M. Zdobnov ◽

...

Keyword(s):

Test Data ◽

High Throughput ◽

Source Code ◽

Sequence Annotation ◽

Command Line ◽

Link Type ◽

Genome Maps ◽

Transposon Insertion ◽

Prophage Sequence

Abstract. Summary: Phigaro is a standalone command-line application that is able to detect prophage regions taking raw genome and metagenome assemblies as an input. It also produces dynamic annotated “prophage genome maps” and marks possible transposon insertion spots inside prophages. It provides putative taxonomic annotations that can distinguish tailed from non-tailed phages. It is applicable for mining prophage regions from large metagenomic datasets. Availability: Source code for Phigaro is freely available for download at https://github.com/bobeobibo/phigaro along with test data. The code is written in Python.
