AnnotationBustR: an R package to extract subsequences from GenBank annotations

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5179 ◽  
Author(s):  
Samuel R. Borstein ◽  
Brian C. O’Meara

Background. DNA sequences are pivotal for a wide array of research in biology. Large sequence databases, like GenBank, provide an amazing resource to utilize DNA sequences for large scale analyses. However, many sequence records on GenBank contain more than one gene or are portions of genomes. Inconsistencies in the way genes are annotated and the numerous synonyms a single gene may be listed under provide major challenges for extracting large numbers of subsequences for comparative analysis across taxa. At present, there is no easy way to extract portions from many GenBank accessions based on annotations where gene names may vary extensively. Results. The R package AnnotationBustR allows users to extract sequences based on GenBank annotations through the ACNUC retrieval system given search terms of gene synonyms and accession numbers. AnnotationBustR extracts subsequences of interest and then writes them to a FASTA file for users to employ in their research endeavors. Conclusion. FASTA files of extracted subsequences and accession tables generated by AnnotationBustR allow users to quickly find and extract subsequences from GenBank accessions. These sequences can then be incorporated in various analyses, like the construction of phylogenies to test a wide range of ecological and evolutionary hypotheses.
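The core idea of the abstract, matching annotated gene names against a table of synonyms and writing the matching subsequences to FASTA, can be illustrated with a minimal Python sketch. This is not the AnnotationBustR API and does not query GenBank or ACNUC; the `Feature` and `Record` classes, the `SYNONYMS` table, the helper names, and the accession IDs and sequences are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str   # gene name exactly as written in the annotation
    start: int  # 1-based inclusive start coordinate
    end: int    # 1-based inclusive end coordinate


@dataclass
class Record:
    accession: str
    sequence: str
    features: list = field(default_factory=list)


# Hypothetical synonym table: canonical locus -> lower-cased names it may appear under.
SYNONYMS = {
    "COI": {"coi", "co1", "cox1", "cytochrome c oxidase subunit i"},
    "CYTB": {"cytb", "cob", "cytochrome b"},
}


def extract_subsequences(records, synonyms):
    """Return {locus: [(accession, subsequence), ...]} for every matching annotation."""
    found = {locus: [] for locus in synonyms}
    for rec in records:
        for feat in rec.features:
            key = feat.name.strip().lower()
            for locus, names in synonyms.items():
                if key in names:
                    # Convert 1-based inclusive coordinates to a Python slice.
                    subseq = rec.sequence[feat.start - 1:feat.end]
                    found[locus].append((rec.accession, subseq))
    return found


def to_fasta(entries):
    """Format (accession, sequence) pairs as FASTA text."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in entries)


# Made-up records standing in for multi-gene GenBank accessions.
records = [
    Record("ACC0001", "AAACCCGGGTTT", [Feature("COX1", 1, 6), Feature("cob", 7, 12)]),
    Record("ACC0002", "TTTTGGGGCC", [Feature("CO1", 3, 8)]),
]
result = extract_subsequences(records, SYNONYMS)
coi_fasta = to_fasta(result["COI"])
```

Here two records annotate the same gene under different names (`COX1` and `CO1`), and both end up in the same per-locus FASTA output. A real workflow would also need to handle features on the reverse strand, multi-part locations, and missing annotations, none of which this sketch attempts.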

Author(s):  
Samuel R. Borstein ◽  
Brian C. O'Meara

Background. DNA sequences are pivotal for a wide array of research in biology. Large sequence databases, like GenBank, provide an amazing resource to utilize DNA sequences for large scale analyses. However, many sequences on GenBank contain more than one gene or are portions of genomes, and inconsistencies in the way genes are annotated and the numerous synonyms a single gene may be listed under provide major challenges for extracting large numbers of subsequences for comparative analysis across taxa. At present, there is no easy way to extract portions from multiple GenBank accessions based on annotations where gene names may vary extensively. Results. The R package AnnotationBustR allows users to extract sequences based on GenBank annotations through the ACNUC retrieval system given search terms of gene synonyms and accession numbers. AnnotationBustR extracts portions of interest and then writes them to a FASTA file for users to employ in their research endeavors. Conclusion. FASTA files of extracted subsequences and accession tables generated by AnnotationBustR allow users to quickly find and extract subsequences from GenBank accessions. These sequences can then be incorporated in various analyses, like the construction of phylogenies to test a wide range of ecological and evolutionary hypotheses.


Author(s):  
Alexandra Sanmark

Chapter 5 shifts the focus to the rituals and activities of the wider community in Scandinavia. At thing sites a wide range of community activities and rituals were enacted, which most likely created collective memories and strengthened social cohesion. Many of these activities may have been designed by the elite, but equally the idea of assemblies as communal spaces may have been collectively driven. The archaeological signature of meeting-places and assembly-sites suggests associations with feasting and eating on a large scale, and architectural layouts that emphasised the collective over the individual and facilitated group interaction and cohesion. The construction, enlargement and maintenance of monuments and other features required the participation of large numbers of people. By joining in this work the population gained shared ownership of the sites. This was further enhanced by communal activities during the meetings, which also involved games and sports, as well as trade. Assemblies therefore formed arenas of interplay between the top-elite and the wider population; kings were elected and ruled through the assembly, while at the same time continuously dependent on the endorsement of the people.


2016 ◽  
Author(s):  
Alan Medlar ◽  
Laura Laakso ◽  
Andreia Miraldo ◽  
Ari Löytynoja

Abstract High-throughput RNA-seq data has become ubiquitous in the study of non-model organisms, but its use in comparative analysis remains a challenge. Without a reference genome for mapping, sequence data has to be de novo assembled, producing large numbers of short, highly redundant contigs. Preparing these assemblies for comparative analyses requires the removal of redundant isoforms, assignment of orthologs and converting fragmented transcripts into gene alignments. In this article we present Glutton, a novel tool to process transcriptome assemblies for downstream evolutionary analyses. Glutton takes as input a set of fragmented, possibly erroneous transcriptome assemblies. Utilising phylogeny-aware alignment and reference data from a closely related species, it reconstructs one transcript per gene, finds orthologous sequences and produces accurate multiple alignments of coding sequences. We present a comprehensive analysis of Glutton’s performance across a wide range of divergence times between study and reference species. We demonstrate the impact choice of assembler has on both the number of alignments and the correctness of ortholog assignment and show substantial improvements over heuristic methods, without sacrificing correctness. Finally, using inference of Darwinian selection as an example of downstream analysis, we show that Glutton-processed RNA-seq data give results comparable to those obtained from full length gene sequences even with distantly related reference species. Glutton is available from http://wasabiapp.org/software/glutton/ and is licensed under the GPLv3.


2015 ◽  
Author(s):  
Oriol Canela-Xandri ◽  
Andy Law ◽  
Alan Gray ◽  
John A. Woolliams ◽  
Albert Tenesa

Computational tools are quickly becoming the main bottleneck to analyze large-scale genomic and genetic data. This big-data problem, affecting a wide range of fields, is becoming more acute with the fast increase of data available. To address it, we developed DISSECT, a new, easy to use, and freely available software able to exploit the parallel computer architectures of supercomputers to perform a wide range of genomic and epidemiologic analyses which currently can only be carried out on reduced sample sizes or in restricted conditions. We showcased our new tool by addressing the challenge of predicting phenotypes from genotype data in human populations using Mixed Linear Model analysis. We analyzed simulated traits from half a million individuals genotyped for 590,004 SNPs using the combined computational power of 8,400 processor cores. We found that prediction accuracies in excess of 80% of the theoretical maximum could be achieved with large numbers of training individuals.


2020 ◽  
Vol 37 (12) ◽  
pp. 3684-3698 ◽  
Author(s):  
Ruidong Li ◽  
Han Qu ◽  
Jinfeng Chen ◽  
Shibo Wang ◽  
John M Chater ◽  
...  

Abstract Compared with genomic data of individual markers, haplotype data provide higher resolution for DNA variants, advancing our knowledge in genetics and evolution. Although many computational and experimental phasing methods have been developed for analyzing diploid genomes, it remains challenging to reconstruct chromosome-scale haplotypes at low cost, which constrains the utility of this valuable genetic resource. Gamete cells, the natural packaging of haploid complements, are ideal materials for phasing entire chromosomes because the majority of the haplotypic allele combinations has been preserved. Therefore, compared with the current diploid-based phasing methods, using haploid genomic data of single gametes may substantially reduce the complexity in inferring the donor’s chromosomal haplotypes. In this study, we developed the first easy-to-use R package, Hapi, for inferring chromosome-length haplotypes of individual diploid genomes with only a few gametes. Hapi outperformed other phasing methods when analyzing both simulated and real single gamete cell sequencing data sets. The results also suggested that chromosome-scale haplotypes may be inferred by using as few as three gametes, which has pushed the boundary to its possible limit. The single gamete cell sequencing technology allied with the cost-effective Hapi method will make large-scale haplotype-based genetic studies feasible and affordable, promoting the use of haplotype data in a wide range of research.


2020 ◽  
Author(s):  
Marshall W. Ritchie ◽  
Jeff W. Dawson ◽  
Heath A. MacMillan

Abstract The body temperature of ectothermic animals is heavily dependent on environmental temperature, impacting fitness. Laboratory exposure to favorable and unfavorable temperatures is used to understand these effects, as well as the physiological, biochemical, and molecular underpinnings of variation in thermal performance. Although small ectotherms, like insects, can often be easily reared in large numbers, it can be challenging and expensive to simultaneously create and manipulate several thermal environments in a laboratory setting. Here, we describe the creation and use of a thermal gradient device that can produce a wide range of constant or varying temperatures concurrently. This device is composed of a solid aluminum plate and copper piping, combined with a pair of programmable refrigerated circulators. As a simple proof-of-concept, we completed single experimental runs to produce a low-temperature survival curve for flies (Drosophila melanogaster) and explore the effects of daily thermal cycles of varying amplitude on growth rates of crickets (Gryllodes sigillatus). This approach avoids the use of multiple heating/cooling water or glycol baths or incubators for large-scale assessments of organismal thermal performance. It makes static or dynamic thermal experiments (e.g., creating thermal performance or survival curves, quantifying responses to fluctuating thermal environments, or monitoring animal behaviour across a range of temperatures) easier, faster, and less costly.


2020 ◽  
Vol 219 (11) ◽  
Author(s):  
Claudia Baumann ◽  
Xiangyu Zhang ◽  
Rabindranath De La Fuente

The polycomb group protein CBX2 is an important epigenetic reader involved in cell proliferation and differentiation. While CBX2 overexpression occurs in a wide range of human tumors, targeted deletion results in homeotic transformation, proliferative defects, and premature senescence. However, its cellular function(s) and whether it plays a role in maintenance of genome stability remain to be determined. Here, we demonstrate that loss of CBX2 in mouse fibroblasts induces abnormal large-scale chromatin structure and chromosome instability. Integrative transcriptome analysis and ATAC-seq revealed a significant dysregulation of transcripts involved in DNA repair, chromocenter formation, and tumorigenesis in addition to changes in chromatin accessibility of genes involved in lateral sclerosis, basal transcription factors, and folate metabolism. Notably, Cbx2−/− cells exhibit prominent decondensation of satellite DNA sequences at metaphase and increased sister chromatid recombination events leading to rampant chromosome instability. The presence of extensive centromere and telomere defects suggests a prominent role for CBX2 in heterochromatin homeostasis and the regulation of nuclear architecture.


2015 ◽  
pp. 161-175
Author(s):  
Marek Maziarz ◽  
Maciej Piasecki ◽  
Stan Szpakowicz

The System of Register Labels in plWordNet

Stylistic registers influence word usage. Both traditional dictionaries and wordnets assign lexical units to registers, and there is a wide range of solutions. A system of register labels can be flat or hierarchical, with few labels or many, homogeneous or decomposed into sets of elementary features. We review the register label systems in lexicography, and then discuss our model, designed for plWordNet, a large wordnet for Polish. There follows a detailed comparative analysis of several register systems in Polish lexical resources. We also present the practical effect of the adoption of our flat, small and homogeneous system: a relatively high consistency of register assignment in plWordNet, as measured by inter-annotator agreement on a manageable sample. Large-scale conclusions for the whole plWordNet remain to be made once the annotation has been completed, but the experience half-way through this labour-intensive exercise is very encouraging.


2021 ◽  
pp. 1-18
Author(s):  
Andrew Cardow ◽  
Jean-Sebastien Imbeau ◽  
Bill Willie Apiata ◽  
Jenny Martin

Abstract Transition from the military environment into a civilian environment is a topic that has seen increasing attention within the last two decades. There is, in the literature, a clearly articulated issue that transition from the military to the civilian world is somewhat different to transitioning from school to work, or from career to career, or from work to retirement. Many, but not all, of the extant examples regarding military transition are case studies, focus groups or small-scale qualitative surveys. The following article details a large-scale survey that took place in New Zealand in 2019. From just over 1400 responses, a wide range of information was gathered. The aim of the survey was to uncover the experiences of military who had undergone transition within New Zealand. In this respect, the survey was exploratory. We report here the qualitative results that expand the existing body of knowledge of military transition. Our results are in line with international results and demonstrate that a large majority of respondents had a less than desirable transition experience. The contribution made therefore is a reinforcement that current practice in this area is needing a great deal of attention. The following outlines the experiences our New Zealand-based respondents had and how this mirrors the extant international literature. As this was the first survey of its kind to attract large numbers of respondents within New Zealand, the results and discussion that follow present aspects of transition that the Ministry of Defence and the New Zealand Defence Force may wish to consider when planning future transition programmes.

