Defining Loci in Restriction-Based Reduced Representation Genomic Data from Nonmodel Species: Sources of Bias and Diagnostics for Optimal Clustering

2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Daniel C. Ilut ◽  
Marie L. Nydam ◽  
Matthew P. Hare

Next generation sequencing holds great promise for applications of phylogeography, landscape genetics, and population genomics in wild populations of nonmodel species, but the robustness of inferences hinges on careful experimental design and effective bioinformatic removal of predictable artifacts. Addressing this issue, we use published genomes from a tunicate, stickleback, and soybean to illustrate the potential for bioinformatic artifacts and introduce a protocol to minimize two sources of error expected from similarity-based de novo clustering of stacked reads: the splitting of alleles into different clusters, which creates false homozygosity, and the grouping of paralogs into the same cluster, which creates false heterozygosity. We present an empirical application focused on Ciona savignyi, a tunicate with very high SNP heterozygosity (~0.05), because high diversity challenges the computational efficiency of most existing nonmodel pipelines while also potentially exacerbating paralog artifacts. The simulated and empirical data illustrate the advantages of using higher sequence difference clustering thresholds than is typical and demonstrate the utility of our protocol for efficiently identifying an optimum threshold from data without prior knowledge of heterozygosity. The empirical Ciona savignyi data also highlight null alleles as a potentially large source of false homozygosity in restriction-based reduced representation genomic data.
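
The two clustering errors named in this abstract can be illustrated with a toy example. The sketch below is not the authors' protocol: it assumes a simple greedy, seed-based clustering on Hamming distance, and the sequences, the function names `hamming` and `cluster`, and the thresholds are all hypothetical. It only shows how a threshold that is too strict splits two alleles of one locus, while one that is too permissive pulls a paralog into the same cluster.

```python
def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def cluster(seqs, max_diff):
    """Greedy clustering: each sequence joins the first cluster whose
    seed (first member) is within max_diff mismatches, else starts a new one."""
    clusters = []
    for s in seqs:
        for c in clusters:
            if hamming(s, c[0]) <= max_diff:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


# Hypothetical 20 bp stacks (illustrative only, not real data).
allele_a = "ACGTACGTACGTACGTACGT"
allele_b = "ACGAACGTACCTACGTACGT"  # same locus, 2 mismatches from allele_a
paralog = "ACGTTCGAACGTAGGTACTT"   # different locus, 4 mismatches from allele_a

reads = [allele_a, allele_b, paralog]
for threshold in (1, 2, 4):
    print(threshold, len(cluster(reads, threshold)))
```

With these made-up sequences, a threshold of 1 keeps the two alleles apart (false homozygosity), a threshold of 2 joins the alleles and excludes the paralog, and a threshold of 4 merges all three (false heterozygosity). Real data would need a diagnostic over many loci rather than a single hand-picked case.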

Forests ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 222
Author(s):  
Bartosz Ulaszewski ◽  
Joanna Meger ◽  
Jaroslaw Burczyk

Next-generation sequencing of reduced representation genomic libraries (RRL) is capable of providing large numbers of genetic markers for population genetic studies at relatively low costs. However, one major concern with these types of markers is the precision of genotyping, which is related to the common problem of missing data and appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, namely F. sylvatica and Q. robur. We found that the use of the ddRAD method coupled with SNP calling based on reference genomes provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively), given standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered reliable markers in GBS and ddRAD, but not in RADseq data. According to the reference genomes’ annotations, more than 30% of the identified ddRAD loci appeared to be related to genes. Our findings provide solid support for using ddRAD-based SNPs for future population genomics studies in beech and oak.
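
The abstract does not say how marker reliability was scored from the technical replicates. As a minimal sketch of one common approach, assuming reliability means identical genotype calls at loci genotyped in both replicates, the following computes a concordance rate. The function name `replicate_concordance`, the dictionary layout, and the genotypes are hypothetical.

```python
def replicate_concordance(rep1, rep2):
    """Fraction of loci called in both replicates whose genotypes agree.

    rep1, rep2: dicts mapping locus ID -> genotype string, or None if missing.
    """
    shared = [
        locus for locus in rep1
        if rep1[locus] is not None and rep2.get(locus) is not None
    ]
    if not shared:
        return 0.0
    matches = sum(rep1[locus] == rep2[locus] for locus in shared)
    return matches / len(shared)


# Hypothetical genotype calls for two technical replicates of one sample.
rep_a = {"locus1": "A/A", "locus2": "A/G", "locus3": "G/G",
         "locus4": None, "locus5": "C/T"}
rep_b = {"locus1": "A/A", "locus2": "A/G", "locus3": "A/G",
         "locus4": "T/T", "locus5": "C/T"}

print(replicate_concordance(rep_a, rep_b))
```

Here `locus4` is skipped because it is missing in one replicate, so missing data and genotyping error are counted separately rather than conflated.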


2020 ◽  
Vol 11 (11) ◽  
Author(s):  
Jing-dong Zhou ◽  
Ting-juan Zhang ◽  
Zi-jun Xu ◽  
Zhao-qun Deng ◽  
Yu Gu ◽  
...  

Abstract The potential mechanism by which myelodysplastic syndromes (MDS) progress to acute myeloid leukemia (AML) remains poorly elucidated. Epigenetic alterations have been shown to play crucial roles in cancer pathogenesis and progression, including in MDS. However, few studies have explored whole-genome methylation alterations during MDS progression. Reduced representation bisulfite sequencing was conducted in four paired MDS/secondary AML (MDS/sAML) patients to explore the underlying methylation-associated epigenetic drivers of MDS progression. In these four paired patients, samples at the sAML stage exhibited significantly increased methylation levels compared with the matched MDS stage. A total of 1090 differentially methylated fragments (DMFs) (441 hypermethylated and 649 hypomethylated) were identified as involved in MDS pathogenesis, whereas 103 DMFs (96 hypermethylated and 7 hypomethylated) were involved in MDS progression. Targeted bisulfite sequencing further identified aberrant GFRA1, IRX1, NPY, and ZNF300 methylation as frequent events in an additional group of de novo MDS and AML patients, of which only ZNF300 methylation was associated with ZNF300 expression. Subsequently, ZNF300 hypermethylation in larger cohorts of de novo MDS and AML patients was confirmed by real-time quantitative methylation-specific PCR. ZNF300 methylation could act as a potential biomarker for diagnosis and prognosis in MDS and AML patients. Functional experiments demonstrated the anti-proliferative and pro-apoptotic role of ZNF300 overexpression in the MDS-derived AML cell line SKM-1. Collectively, genome-wide DNA hypermethylation was a frequent event during MDS progression. Among these changes, ZNF300 methylation, a regulator of ZNF300 expression, acted as an epigenetic driver in MDS progression. These findings provide a theoretical basis for the use of demethylation drugs in MDS patients to counter disease progression.


Author(s):  
Takuya Shimizu ◽  
Tadakazu Kondo ◽  
Yasuhito Nannya ◽  
Mizuki Watanabe ◽  
Toshio Kitawaki ◽  
...  

2020 ◽  
Vol 94 ◽  
Author(s):  
D. Babaran ◽  
M.T. Arts ◽  
R.J. Botelho ◽  
S.A. Locke ◽  
J. Koprivnikar

Abstract The free-living infectious stages of macroparasites, specifically, the cercariae of trematodes (flatworms), are likely to be significant (albeit underappreciated) vectors of nutritionally important polyunsaturated fatty acids (PUFA) to consumers within aquatic food webs, and other macroparasites could serve similar roles. In the context of de novo omega-3 (n-3) PUFA biosynthesis, it was thought that most animals lack the fatty acid (FA) desaturase enzymes that convert stearic acid (18:0) into α-linolenic acid (ALA; 18:3n-3), the main FA precursor for n-3 long-chain PUFA. Recently, novel sequences of these enzymes were recovered from 80 species from six invertebrate phyla, with experimental confirmation of gene function in five phyla. Given this wide distribution, and the unusual attributes of flatworm genomes, we conducted an additional search for genes for de novo n-3 PUFA in the phylum Platyhelminthes. Searches with experimentally confirmed sequences from Rotifera recovered nine relevant FA desaturase sequences from eight species in four genera in the two exclusively endoparasite classes (Trematoda and Cestoda). These results could indicate adaptations of these particular parasite species, or may reflect the uneven taxonomic coverage of sequence databases. Although additional genomic data and, particularly, experimental study of gene functionality are important future validation steps, our results indicate endoparasitic platyhelminths may have enzymes for de novo n-3 PUFA biosynthesis, thereby contributing to global PUFA production, but also representing a potential target for clinical antihelmintic applications.


2014 ◽  
Vol 12 (S1) ◽  
pp. S83-S86 ◽  
Author(s):  
Yul-Kyun Ahn ◽  
Swati Tripathi ◽  
Young-Il Cho ◽  
Jeong-Ho Kim ◽  
Hye-Eun Lee ◽  
...  

Next-generation sequencing is known to be a useful tool for de novo transcriptome assembly, functional annotation of genes and identification of molecular markers. This study was carried out to mine molecular markers from de novo assembled transcriptomes of four chilli pepper varieties: the highly pungent ‘Saengryeg 211’, the non-pungent ‘Saengryeg 213’, and the variably pigmented ‘Mandarin’ and ‘Blackcluster’. Pyrosequencing of the complementary DNA library resulted in 361,671, 274,269, 279,221, and 316,357 raw reads, which were assembled into 23,607, 19,894, 18,340 and 20,357 contigs, for the four varieties, respectively. Detailed sequence variant analysis identified numerous potential single-nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) for all the varieties, for which primers were designed. The transcriptome information and SNP/SSR markers generated in this study provide valuable resources for high-density molecular genetic mapping in chilli pepper and for quantitative trait loci analysis related to fruit qualities. These markers for pepper will be highly valuable for marker-assisted breeding and other genetic studies.


Genes ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 124
Author(s):  
Alessio Iannucci ◽  
Alexey I. Makunin ◽  
Artem P. Lisachov ◽  
Claudio Ciofi ◽  
Roscoe Stanyon ◽  
...  

The study of vertebrate genome evolution is currently facing a revolution, brought about by next generation sequencing technologies that allow researchers to produce nearly complete and error-free genome assemblies. Novel approaches however do not always provide a direct link with information on vertebrate genome evolution gained from cytogenetic approaches. It is useful to preserve and link cytogenetic data with novel genomic discoveries. Sequencing of DNA from single isolated chromosomes (ChromSeq) is an elegant approach to determine the chromosome content and assign genome assemblies to chromosomes, thus bridging the gap between cytogenetics and genomics. The aim of this paper is to describe how ChromSeq can support the study of vertebrate genome evolution and how it can help link cytogenetic and genomic data. We show key examples of ChromSeq application in the refinement of vertebrate genome assemblies and in the study of vertebrate chromosome and karyotype evolution. We also provide a general overview of the approach and a concrete example of genome refinement using this method in the species Anolis carolinensis.


BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 1138 ◽  
Author(s):  
Holly B Bratcher ◽  
Craig Corton ◽  
Keith A Jolley ◽  
Julian Parkhill ◽  
Martin CJ Maiden

2014 ◽  
Author(s):  
Jonathan Puritz ◽  
Christopher M. Hollenbeck ◽  
John R. Gold

Restriction-site associated DNA sequencing (RADseq) has become a powerful and useful approach for population genomics. Currently, no software exists that utilizes both paired-end reads from RADseq data to efficiently produce population-informative variant calls, especially for organisms with large effective population sizes and high levels of genetic polymorphism but for which no genomic resources exist. dDocent is an analysis pipeline with a user-friendly, command-line interface designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs/Indels for population-level analyses. The pipeline, written in BASH, uses data reduction techniques and other stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs shared across greater numbers of individuals and with higher levels of coverage. This is most likely due to the fact that dDocent quality trims instead of filtering and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, thus enabling use of reads with Indel polymorphisms. The pipeline and a comprehensive user guide can be found at (http://dDocent.wordpress.com).

