A massively parallel algorithm for finding non-existing sequences in genomes

Mapping Intimacies ◽

10.1101/709949 ◽

2019 ◽

Author(s):

Marco Falda

Keyword(s):

Parallel Algorithm ◽

Reference Genome ◽

Hamming Distance ◽

Massively Parallel ◽

Additional Information ◽

Link Type ◽

Absent Words

Abstract: We discuss a method for producing a set of absent words in a reference genome with a guaranteed Hamming distance along all positions, together with additional information about the number of mismatches, their location, and the position of the best match. We implemented it by exploiting the massive parallelism of modern GPU hardware; the code is available at https://bitbucket.org/mfalda/cuda_keeseek/.
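To make the quantities named in the abstract concrete, the following is a minimal CPU sketch in Python, not the author's CUDA implementation. It scans every window of a reference string and reports the position of the best match, the number of mismatches, and where they fall. The names `BestMatch`, `best_match`, and `is_absent` are hypothetical and chosen only for illustration; a GPU version would presumably evaluate the windows in parallel rather than in a loop.

```python
from typing import List, NamedTuple


class BestMatch(NamedTuple):
    position: int              # start of the closest window in the reference
    mismatches: int            # Hamming distance to that window
    mismatch_sites: List[int]  # offsets inside the word that differ


def best_match(word: str, reference: str) -> BestMatch:
    """Brute-force scan: compare the word against every window of equal length."""
    k = len(word)
    if k == 0 or k > len(reference):
        raise ValueError("word must be non-empty and no longer than the reference")
    best = None
    for pos in range(len(reference) - k + 1):
        window = reference[pos:pos + k]
        sites = [i for i, (a, b) in enumerate(zip(word, window)) if a != b]
        # Keep the first window with the smallest number of mismatches.
        if best is None or len(sites) < best.mismatches:
            best = BestMatch(pos, len(sites), sites)
    return best


def is_absent(word: str, reference: str, min_distance: int) -> bool:
    """True if every window differs from the word in at least min_distance sites."""
    return best_match(word, reference).mismatches >= min_distance
```

For example, `best_match("ACGA", "ACGTACGT")` finds the closest window at position 0 with a single mismatch at offset 3, so the word would not count as absent for any required distance above 1.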

A massively parallel strategy for STR marker development, capture, and genotyping

10.1101/063727 ◽

2016 ◽

Author(s):

Logan Kistler ◽

Stephen M. Johnson ◽

Mitchell T. Irwin ◽

Edward E. Louis ◽

Aakrosh Ratan ◽

...

Keyword(s):

Reference Genome ◽

Massively Parallel Sequencing ◽

Genetic Research ◽

Test Group ◽

Massively Parallel ◽

Fecal Dna ◽

Marker Development ◽

Sequencing Data ◽

Str Loci ◽

Link Type

Abstract: Short tandem repeat (STR, or microsatellite) variants are highly polymorphic markers that facilitate powerful, high-precision population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic flux. However, STR marker development and analysis by conventional PCR-based methods impose a workflow bottleneck and are suboptimal for noninvasive sampling strategies such as fecal DNA recovery. While massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery, here we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without requiring a reference genome assembly, and a methodological approach for highly parallel recovery of enriched STR loci. We first employed our approach to design and capture a panel of 5,000 STR loci from a test group of diademed sifakas (Propithecus diadema, n=3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci: 97.3-99.6% of STRs were characterized with ≥10x non-redundant coverage. Second, we tested our STR capture strategy on a P. diadema fecal DNA preparation, and we report robust initial results and methodological suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from regions flanking the STR loci. Our method provides a cost-effective and highly scalable solution for rapid recovery of large STR and SNP datasets in any species without the need for a reference genome, and it can be used even with suboptimal DNA, which is more easily acquired in conservation and ecological genetic studies.

Data Deposition: Raw sequencing data are available from the NCBI Sequence Read Archive under Study Accession numbers SRP073167 (genomic shotgun data for Oberon and Tatiana) and SRP076225 (targeted re-sequencing data). BaitSTR software is available on GitHub (core BaitSTR programs: https://github.com/aakrosh/BaitSTR; BaitSTR_type.pl companion script for genotyping and block manipulation: https://github.com/lkistler/BaitSTR_type).
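As a rough illustration of what finding STRs directly in shotgun reads can look like, the sketch below scans a single read for tandem repeats using a regular-expression backreference. It is not BaitSTR and makes no claim about how that software works; the function name `find_strs` and the thresholds (units of 2 to 6 bases, at least 4 copies) are assumptions chosen for the example.

```python
import re
from typing import List, Tuple


def find_strs(read: str, min_unit: int = 2, max_unit: int = 6,
              min_copies: int = 4) -> List[Tuple[int, str, int]]:
    """Return (start, repeat_unit, copies) for each tandem repeat in a read.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so "CACACACA" is reported with unit "CA" rather than "CACA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(read):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip homopolymer runs such as "AAAAAAAA"
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits
```

Calling `find_strs("GGCACACACACATT")` reports one repeat starting at offset 2 with unit `CA` and 5 copies. A real pipeline would also need to handle sequencing errors, flanking sequence, and reads that end inside a repeat, none of which is attempted here.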

The Development of ASHS HortBase—A Global Information System

HortScience ◽

10.21273/hortsci.33.3.552e ◽

1998 ◽

Vol 33 (3) ◽

pp. 552e-552

Author(s):

James L. Green

Keyword(s):

Information System ◽

Information Needs ◽

Standing Committee ◽

Task Forces ◽

Additional Information ◽

Link Type ◽

Dispersed System ◽

Global Information System ◽

Information File ◽

International Standing

In 1997, the ASHS Board of Directors established ASHS HortBase as a Standing Committee of the Society. The ASHS HortBase Committee, a six-member Standing Committee and Chair, is charged with implementing and maintaining ASHS HortBase. The members of the ASHS HortBase Committee will be the chair and chair-elect of each of the three HortBase Task Forces: 1) Finance and Marketing; 2) Standards (authoring, reviewing, and publishing); and 3) Technology. ASHS HortBase is a dispersed, dynamic horticultural information system (network) on the WWW composed of peer-reviewed, concise, interlinked information modules that meet the information needs of instructors and students, gardeners and growers. A strong advantage and distinguishing characteristic of ASHS HortBase is our dynamic pool of potential authors, reviewers, and users (ASHS Extension, Industry, and Teaching membership), who continually evolve and update the peer-reviewed information in HortBase. We have the scholastic international standing to provide peer review and validation of the information and to give recognition to the authors, coupled with the marketing to stimulate wide use of their information modules. ASHS HortBase is a dispersed system (dispersed development and server costs): information files are developed, updated, and delivered on the respective authors' own servers, which spreads the major costs of the HortBase information system. Additional information on ASHS HortBase and the papers presented at the 4-h Colloquium on HortBase at ASHS-97 can be found at http://[email protected] or contact me ([email protected], phone 541.737.5452, fax 541.737.3479).

Enhanced Ant Colony-Inspired Parallel Algorithm to Improve Cryptographic PRNGs

Journal of Cyber Security and Mobility ◽

10.13052/2245-1439.623 ◽

2017 ◽

Author(s):

Jorg Keller ◽

Gabriele Spenger ◽

Steffen Wendzel

Keyword(s):

State Space ◽

Parallel Algorithm ◽

Random Number ◽

Random Number Generator ◽

Ant Colony ◽

Massively Parallel ◽

Power Devices ◽

Promising Candidate ◽

Massively Parallel Systems ◽

Pseudo Random Number Generator

We present and motivate a parallel algorithm to compute promising candidate states for modifying the state space of a pseudo-random number generator in order to increase its cycle length. This is important for generators in low-power devices, where increasing the state space to achieve longer cycles is not an option. The runtime of the parallel algorithm is improved by an analogy to ant colony behavior: if two paths meet, the resulting path is followed at accelerated speed, just as ants tend to reinforce paths that have been used by other ants. We evaluate our algorithm with simulations and demonstrate high parallel efficiency that makes the algorithm well-suited even for massively parallel systems like GPUs. Furthermore, the accelerated path variant of the algorithm achieves a runtime improvement of up to 4% over the straightforward implementation.
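The notion of cycle length used in this abstract can be illustrated with a small sequential sketch. The code below is not the authors' parallel algorithm and does not implement the ant-colony acceleration; it only measures the length of the cycle that a state-transition function eventually enters from a given seed, using Brent's cycle-detection method. The function name `cycle_length` and the toy generators in the usage note are assumptions made for illustration.

```python
from typing import Callable


def cycle_length(step: Callable[[int], int], seed: int) -> int:
    """Length of the cycle eventually reached from seed (Brent's method).

    `step` is the generator's state-transition function on a finite state space.
    """
    power = lam = 1
    tortoise = seed
    hare = step(seed)
    while tortoise != hare:
        if power == lam:
            # Start a new search window twice as long as the previous one.
            tortoise = hare
            power *= 2
            lam = 0
        hare = step(hare)
        lam += 1
    return lam
```

For a toy linear congruential generator with 16 states, `cycle_length(lambda x: (5 * x + 3) % 16, 0)` visits every state before repeating, whereas `cycle_length(lambda x: (x * x) % 16, 3)` falls into a fixed point. Short cycles of the second kind are what a state-space modification would aim to avoid.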

Identity and compatibility of reference genome resources

10.1101/2021.03.15.435425 ◽

2021 ◽

Author(s):

Michał Stolarczyk ◽

Bingjie Xue ◽

Nathan C. Sheffield

Keyword(s):

Coordinate System ◽

Genome Analysis ◽

Reference Genome ◽

Reference Data ◽

Coordinate Systems ◽

Link Type ◽

Novel Approach ◽

Many Sources ◽

Parent Child Relationships ◽

Parent Child

Genome analysis relies on reference data like sequences, feature annotations, and aligner indexes. These data can be found in many versions from many sources, making it challenging to identify and assess compatibility among them. For example, how can you determine which indexes are derived from identical raw sequence files, or which annotations share a compatible coordinate system? Here, we describe a novel approach to establish identity and compatibility of reference genome resources. We approach this with three advances: First, we derive unique identifiers for each resource; second, we record parent-child relationships among resources; and third, we describe recursive identifiers that determine identity as well as compatibility of coordinate systems and sequence names. These advances facilitate portability, reproducibility, and re-use of genome reference data.

Availability: https://refgenie.databio.org
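The idea of recursive identifiers can be sketched as content digests computed at two levels: one digest per attribute (names, lengths, sequences) and one top-level digest over those. The example below is a simplified illustration under stated assumptions, not the scheme used by refgenie: the choice of SHA-256, the truncation to 16 hex characters, the sorting by sequence name, and the function names are all hypothetical.

```python
import hashlib
from typing import Dict


def digest(text: str) -> str:
    """Short content digest (assumed here: SHA-256 truncated to 16 hex chars)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def collection_digests(seqs: Dict[str, str]) -> Dict[str, str]:
    """Digest a named sequence collection at two levels.

    The attribute digests allow partial comparison: two collections can share
    a 'sequences' digest (same content) while differing in 'names'.
    """
    names = sorted(seqs)  # simplification: sort so that input order is ignored
    level1 = {
        "names": digest(",".join(names)),
        "lengths": digest(",".join(str(len(seqs[n])) for n in names)),
        "sequences": digest(",".join(digest(seqs[n].upper()) for n in names)),
    }
    top = digest(",".join(f"{key}:{value}" for key, value in sorted(level1.items())))
    return {"top": top, **level1}
```

With this layout, a collection named `chr1`/`chr2` and the same sequences named `1`/`2` would get different `top` and `names` digests but identical `lengths` and `sequences` digests, which is the kind of partial compatibility the abstract describes.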

A Massively Parallel Algorithm for Fuzzy Vector Quantization

The KIPS Transactions PartA ◽

10.3745/kipsta.2009.16a.6.411 ◽

2009 ◽

Vol 16A (6) ◽

pp. 411-418 ◽

Cited By ~ 1

Author(s):

Luong Van Huynh ◽

Cheol-Hong Kim ◽

Jong-Myon Kim

Keyword(s):

Parallel Algorithm ◽

Vector Quantization ◽

Massively Parallel ◽

Fuzzy Vector Quantization ◽

Fuzzy Vector

A MASSIVELY PARALLEL ALGORITHM FOR VECTOR QUANTIZATION

Proceedings DCC '95 Data Compression Conference ◽

10.1109/dcc.1995.515604 ◽

1995 ◽

Cited By ~ 6

Author(s):

K.S. Prashant ◽

V.J. Mathews

Keyword(s):

Parallel Algorithm ◽

Vector Quantization ◽

Massively Parallel

A fast and accurate parallel algorithm for genome mapping assembly aimed at massively parallel sequencers

Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics - BCB '15 ◽

10.1145/2808719.2812220 ◽

2015 ◽

Cited By ~ 1

Author(s):

Wilfredo Lugo ◽

Jaime Seguel

Keyword(s):

Parallel Algorithm ◽

Genome Mapping ◽

Massively Parallel

MPI/OpenMP Hybrid Parallel Algorithm of Resolution of Identity Second-Order Møller–Plesset Perturbation Calculation for Massively Parallel Multicore Supercomputers

Journal of Chemical Theory and Computation ◽

10.1021/ct400795v ◽

2013 ◽

Vol 9 (12) ◽

pp. 5373-5380 ◽

Cited By ~ 31

Author(s):

Michio Katouda ◽

Takahito Nakajima

Keyword(s):

Parallel Algorithm ◽

Second Order ◽

Massively Parallel ◽

Perturbation Calculation ◽

Moller Plesset

MPI/OpenMP hybrid parallel algorithm for resolution of identity second-order Møller-Plesset perturbation calculation of analytical energy gradient for massively parallel multicore supercomputers

Journal of Computational Chemistry ◽

10.1002/jcc.24701 ◽

2017 ◽

Vol 38 (8) ◽

pp. 489-507

Author(s):

Michio Katouda ◽

Takahito Nakajima

Keyword(s):

Parallel Algorithm ◽

Second Order ◽

Massively Parallel ◽

Perturbation Calculation ◽

Energy Gradient ◽

Moller Plesset

RefKA: A fast and efficient long-read genome assembly approach for large and complex genomes

10.1101/2020.04.17.035287 ◽

2020 ◽

Author(s):

Yuxuan Yuan ◽

Philipp E. Bayer ◽

Robyn Anderson ◽

HueyTyng Lee ◽

Chon-Kit Kenneth Chan ◽

...

Keyword(s):

Genome Assembly ◽

Chinese Spring ◽

Complete Genome ◽

Reference Genome ◽

Computing Time ◽

Link Type ◽

Recent Advances ◽

Long Read ◽

Genome Assemblies

Abstract: Recent advances in long-read sequencing have the potential to produce more complete genome assemblies using sequence reads that can span repetitive regions. However, the overlap-based assembly methods routinely used for these data require significant computing time and resources. Here, we have developed RefKA, a reference-based approach for long-read genome assembly. This approach relies on breaking up a closely related reference genome into bins, aligning k-mers unique to each bin with PacBio reads, and then assembling each bin in parallel, followed by a final bin-stitching step. During benchmarking, we assembled the wheat Chinese Spring (CS) genome using publicly available PacBio reads in parallel in 168 wall hours on a 250 CPU system. The maximum RAM used was 300 Gb and the computing time was 42,000 CPU hours. The approach opens applications for the assembly of other large and complex genomes with much-reduced computing requirements. The RefKA pipeline is available at https://github.com/AppliedBioinformatics/RefKA
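The binning step described in this abstract can be sketched in a few lines. The code below is a toy illustration, not the RefKA pipeline: it splits a reference string into fixed-size bins, keeps the k-mers that occur in exactly one bin, and assigns a read to the bin whose unique k-mers it hits most often. The function names, the exact-match lookup (RefKA aligns k-mers to reads), and the decision to ignore k-mers that span bin boundaries are all simplifying assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Set


def bin_unique_kmers(reference: str, bin_size: int, k: int) -> Dict[int, Set[str]]:
    """Map each bin index to the k-mers found in that bin and in no other bin."""
    owners = defaultdict(set)  # k-mer -> set of bins containing it
    for start in range(0, len(reference), bin_size):
        chunk = reference[start:start + bin_size]
        for i in range(len(chunk) - k + 1):
            owners[chunk[i:i + k]].add(start // bin_size)
    unique = defaultdict(set)
    for kmer, bins in owners.items():
        if len(bins) == 1:
            unique[next(iter(bins))].add(kmer)
    return dict(unique)


def assign_read(read: str, unique: Dict[int, Set[str]], k: int) -> Optional[int]:
    """Assign a read to the bin whose unique k-mers it contains most often."""
    lookup = {kmer: b for b, kmers in unique.items() for kmer in kmers}
    votes = Counter()
    for i in range(len(read) - k + 1):
        bin_id = lookup.get(read[i:i + k])
        if bin_id is not None:
            votes[bin_id] += 1
    return votes.most_common(1)[0][0] if votes else None
```

Once reads are grouped this way, each bin can be assembled independently, which is where the parallelism mentioned in the abstract comes from; the final stitching of bins is outside the scope of this sketch.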
