Stacks 2: Analytical Methods for Paired-end Sequencing Improve RADseq-based Population Genomics

2019 ◽  
Author(s):  
Nicolas C. Rochette ◽  
Angel G. Rivera-Colón ◽  
Julian M. Catchen

Abstract. For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively-parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here we describe the first software capable of using paired-end sequencing to derive short contigs from de novo RAD data natively. Stacks version 2 employs a de Bruijn graph assembler to build contigs from paired-end reads and overlap those contigs with the corresponding single-end loci. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes – generating RAD loci that are 400-800 bp in length. To prove its recall and precision, we test the software with simulated data and compare reference-aligned and de novo analyses of three empirical datasets. We show that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo datasets.
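
The de Bruijn graph idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not the Stacks implementation: the function names, the choice of k, and the three short reads are all hypothetical, and it ignores sequencing errors, coverage thresholds, and reverse complements. It only shows how overlapping reads are broken into k-mers and how a contig is recovered by following an unambiguous path.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk from `start` for as long as each node has exactly one successor."""
    contig = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping reads from one locus.
reads = ["ACGTTGCA", "GTTGCATC", "TGCATCGG"]
graph = build_de_bruijn(reads, k=4)
contig = extend_contig(graph, start="ACG")
print(contig)
```

A production assembler would also have to choose the starting node itself and resolve branches caused by errors or polymorphism; here the start is given and the walk simply stops at the first branch or dead end.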

2009 ◽  
Vol 20 (2) ◽  
pp. 265-272 ◽  
Author(s):  
R. Li ◽  
H. Zhu ◽  
J. Ruan ◽  
W. Qian ◽  
X. Fang ◽  
...  

Viruses ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 66
Author(s):  
Zoltán László ◽  
Péter Pankovics ◽  
Gábor Reuter ◽  
Attila Cságola ◽  
Ádám Bálint ◽  
...  

Most picornaviruses of the family Picornaviridae are relatively well known, but there are certain “neglected” genera like Bopivirus, containing a single uncharacterised sequence (bopivirus A1, KM589358) with very limited background information. In this study, three novel picornaviruses provisionally called ovipi-, gopi- and bopivirus/Hun (MW298057-MW298059) from enteric samples of asymptomatic ovine, caprine and bovine respectively, were determined using RT-PCR and dye-terminator sequencing techniques. These monophyletic viruses share the same type II-like IRES, NPGP-type 2A, similar genome layout (4-3-4) and cre-localisations. Culture attempts of the study viruses, using six different cell lines, yielded no evidence of viral growth in vitro. Genomic and phylogenetic analyses show that bopivirus/Hun of bovine belongs to the species Bopivirus A, while the closely related ovine-origin ovipi- and caprine-origin gopivirus could belong to a novel species “Bopivirus B” in the genus Bopivirus. Epidemiological investigation of N = 269 faecal samples of livestock (ovine, caprine, bovine, swine and rabbit) from different farms in Hungary showed that bopiviruses were most prevalent among <12-month-old ovine, caprine and bovine, but undetectable in swine and rabbit. VP1 capsid-based phylogenetic analyses revealed the presence of multiple lineages/genotypes, including closely related ovine/caprine strains, suggesting the possibility of ovine–caprine interspecies transmission of certain bopiviruses.


Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 222
Author(s):  
Bartosz Ulaszewski ◽  
Joanna Meger ◽  
Jaroslaw Burczyk

Next-generation sequencing of reduced representation genomic libraries (RRL) can provide large numbers of genetic markers for population genetic studies at relatively low cost. However, one major concern with these types of markers is the precision of genotyping, which is related to the common problem of missing data and appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, namely F. sylvatica and Q. robur. We found that the ddRAD method coupled with SNP calling based on reference genomes provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively), given standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered reliable markers in GBS and ddRAD, but not in RADseq data. According to the reference genomes’ annotations, more than 30% of the identified ddRAD loci appeared to be related to genes. Our findings provide solid support for using ddRAD-based SNPs for future population genomics studies in beech and oak.


Agronomy ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1342
Author(s):  
Shaghayegh Mehravi ◽  
Gholam Ali Ranjbar ◽  
Ghader Mirzaghaderi ◽  
Anita Alice Severn-Ellis ◽  
Armin Scheben ◽  
...  

The species of Pimpinella, one of the largest genera of the family Apiaceae, are traditionally cultivated for medicinal purposes. In this study, high-throughput double digest restriction-site associated DNA sequencing technology (ddRAD-seq) was used to identify single nucleotide polymorphisms (SNPs) in eight Pimpinella species from Iran. After double-digestion with the enzymes HpyCH4IV and HinfI, a total of 334,702,966 paired-end reads were de novo assembled into 1,270,791 loci with an average of 28.8 reads per locus. After stringent filtering, 2440 high-quality SNPs were identified for downstream analysis. Analysis of genetic relationships and population structure, based on these retained SNPs, indicated the presence of three major groups. Gene ontology and pathway analyses were performed by comparing SNP-associated flanking sequences with a public non-redundant database. Due to the lack of genomic resources in this genus, our present study is the first report to provide high-quality SNPs in Pimpinella based on a de novo analysis pipeline using ddRAD-seq. These data will enhance the molecular knowledge of the genus Pimpinella and will provide an important source of information for breeders and the research community to enhance breeding programs and support the management of Pimpinella genomic resources.


Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1835
Author(s):  
Antonio Barrera ◽  
Patricia Román-Román ◽  
Francisco Torres-Ruiz

A joint and unified vision of stochastic diffusion models associated with the family of hyperbolastic curves is presented. The motivation behind this approach stems from the fact that all hyperbolastic curves verify a linear differential equation of the Malthusian type. By virtue of this, and by adding a multiplicative noise to said ordinary differential equation, a diffusion process may be associated with each curve whose mean function is said curve. The inference in the resulting processes is presented jointly, as well as the strategies developed to obtain the initial solutions necessary for the numerical resolution of the system of equations resulting from the application of the maximum likelihood method. The common perspective presented is especially useful for the implementation of the necessary procedures for fitting the models to real data. Some examples based on simulated data support the suitability of the development described in the present paper.
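
The construction described above can be written out in generic notation. The symbols below are assumptions made for illustration and are not taken from the paper: x(t) denotes a hyperbolastic curve, h(t) its time-dependent growth rate, σ > 0 a constant noise intensity, and W(t) a standard Wiener process, with the stochastic equation read in the Itô sense. Under those assumptions, the Malthusian-type equation, its noisy counterpart, and the resulting mean function would read:

```latex
% Assumed notation (not the paper's): x(t) curve, h(t) growth rate,
% \sigma noise intensity, W(t) standard Wiener process, Ito interpretation.
\begin{align}
  \frac{dx(t)}{dt} &= h(t)\,x(t), & x(t_0) &= x_0, \\
  dX(t) &= h(t)\,X(t)\,dt + \sigma\,X(t)\,dW(t), & X(t_0) &= x_0, \\
  \mathbb{E}[X(t)] &= x_0 \exp\!\left(\int_{t_0}^{t} h(s)\,ds\right) = x(t).
\end{align}
```

The last line holds because taking expectations of the stochastic equation removes the noise term, leaving the same linear equation that the curve itself satisfies.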


Toxins ◽  
2018 ◽  
Vol 10 (9) ◽  
pp. 359 ◽  
Author(s):  
Maria Romero-Gutiérrez ◽  
Carlos Santibáñez-López ◽  
Juana Jiménez-Vargas ◽  
Cesar Batista ◽  
Ernesto Ortiz ◽  
...  

To understand the diversity of scorpion venom, RNA from venomous glands from a sawfinger scorpion, Serradigitus gertschi, of the family Vaejovidae, was extracted and used for transcriptomic analysis. A total of 84,835 transcripts were assembled after Illumina sequencing. From those, 119 transcripts were annotated and found to putatively code for peptides or proteins that share sequence similarities with the previously reported venom components of other species. In accordance with sequence similarity, the transcripts were classified as potentially coding for 37 ion channel toxins; 17 host defense peptides; 28 enzymes, including phospholipases, hyaluronidases, metalloproteases, and serine proteases; nine protease inhibitor-like peptides; 10 peptides of the cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 protein superfamily; seven La1-like peptides; and 11 sequences classified as “other venom components”. A mass fingerprint performed by mass spectrometry identified 204 components with molecular masses varying from 444.26 Da to 12,432.80 Da, plus several higher molecular weight proteins whose precise masses were not determined. The LC-MS/MS analysis of a tryptic digestion of the soluble venom resulted in the de novo determination of 16,840 peptide sequences, 24 of which matched sequences predicted from the translated transcriptome. The database presented here increases our general knowledge of the biodiversity of venom components from neglected non-buthid scorpions.


Author(s):  
Russell Lewis McLaughlin

Abstract. Motivation: Repeat expansions are an important class of genetic variation in neurological diseases. However, the identification of novel repeat expansions using conventional sequencing methods is a challenge due to their typical lengths relative to short sequence reads and the difficulty of producing accurate and unique alignments for repetitive sequence. This latter property can nevertheless be harnessed in paired-end sequencing data to infer the possible locations of repeat expansions and other structural variation. Results: This article presents REscan, a command-line utility that infers repeat expansion loci from paired-end short read sequencing data by reporting the proportion of reads orientated towards a locus that do not have an adequately mapped mate. A high REscan statistic relative to a population of data suggests a repeat expansion locus for experimental follow-up. This approach is validated using genome sequence data for 259 cases of amyotrophic lateral sclerosis, of which 24 are positive for a large repeat expansion in C9orf72, showing that REscan statistics readily discriminate repeat expansion carriers from non-carriers. Availability and implementation: C source code at https://github.com/rlmcl/rescan (GNU General Public Licence v3).
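
The statistic described in the abstract, a proportion of locus-facing reads without an adequately mapped mate, can be sketched in a few lines. This is not REscan's code (which is written in C). The `Read` record, the `mate_ok` flag, the fixed `window`, and the rule used to decide that a read points towards the locus are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Read:
    pos: int        # leftmost mapped position of the read
    forward: bool   # True if the read maps to the forward strand
    mate_ok: bool   # hypothetical flag: mate counts as adequately mapped


def unmapped_mate_proportion(reads, locus, window):
    """Fraction of reads pointing towards `locus` whose mate is not adequately mapped.

    A forward read upstream of the locus, or a reverse read downstream of it,
    is treated as pointing towards the locus. Returns None if no read qualifies.
    """
    towards = [
        r for r in reads
        if (r.forward and locus - window <= r.pos < locus)
        or (not r.forward and locus < r.pos <= locus + window)
    ]
    if not towards:
        return None
    return sum(not r.mate_ok for r in towards) / len(towards)


# Hypothetical reads around a candidate locus at position 1000.
reads = [
    Read(900, True, False),    # upstream, forward, mate missing
    Read(950, True, True),     # upstream, forward, mate fine
    Read(1050, False, False),  # downstream, reverse, mate missing
    Read(1100, False, True),   # downstream, reverse, mate fine
    Read(980, False, False),   # upstream but pointing away: ignored
    Read(5000, True, False),   # outside the window: ignored
]
stat = unmapped_mate_proportion(reads, locus=1000, window=200)
print(stat)
```

As the abstract notes, a single value means little on its own; it becomes informative when compared with the same statistic across a population of samples.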

