RecBlast: Cloud-Based Large Scale Orthology Detection

2017
Author(s):
Efrat Rapoport
Moran Neuhof

Abstract
Background: The effective detection and comparison of orthologues is crucial for answering many questions in comparative genomics, phylogenetics and evolutionary biology. One of the most common methods for discovering orthologues is widely known as ‘Reciprocal Blast’. While this method is simple when comparing only two genomes, performing a large-scale comparison of multiple genes across multiple taxa becomes a labor-intensive and inefficient task. The low efficiency of this complicated process limits the scope and breadth of questions that would otherwise benefit from this powerful method.
Findings: Here we present RecBlast, an intuitive and easy-to-use pipeline that enables fast and easy discovery of orthologues along and across the evolutionary tree. RecBlast is capable of running heavy, large-scale and complex Reciprocal Blast comparisons across multiple genes and multiple taxa, in a completely automatic way. RecBlast is available as a cloud-based web server, which includes an easy-to-use user interface, implemented using cloud computing and an elastic and scalable server architecture. RecBlast is also available as powerful standalone software supporting multi-processing for large datasets, and as a cloud image which can be easily deployed on the Amazon Web Services cloud. We also include sample results spanning 448 human genes, which illustrate the potential of RecBlast in detecting orthologues and in highlighting patterns and trends across multiple taxa.
Conclusions: RecBlast provides fast, inexpensive and valuable insight into trends and phenomena across distant phyla, and provides data, visualizations and directions for downstream analysis. RecBlast's fully automatic pipeline provides a new and intuitive discovery platform for researchers from any domain in biology who are interested in evolution, comparative genomics and phylogenetics, regardless of their computational skills.

2019
Author(s):
Yatish Turakhia
Heidi I. Chen
Amir Marcovitz
Gill Bejerano

Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools and protein databases focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (deletion and non-synonymous substitution) as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence protein-coding gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using the hg38 human assembly as a reference, we discovered over 500 unique human genes affected by such high-confidence erosion events in different clades across 58 mammals. While most of these events likely have benign consequences, we also found dozens of clade-specific gene losses that result in early lethality in outgroup mammals or are associated with severe congenital diseases in humans. Our discoveries yield intriguing potential for translational medical genetics and for evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.


2017
Author(s):
Dang Liu
Martin Hunt
Isheng J. Tsai

Abstract
Identification of synteny between genomes of closely related species is an important aspect of comparative genomics. However, it is unknown to what extent draft assemblies lead to errors in such analysis. To investigate this, we fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated by up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency and further demonstration of erroneous gene ontology enrichment tests throw into question the robustness of previous synteny analysis when gold standard genome sequences remain limited. In addition, determining the true evolutionary relationship is compromised by assembly improvement using a reference-guided approach with a closely related species. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous. Our results highlight the need for gold standard genome assemblies for synteny identification and accurate downstream analysis.
Author summary
Genome assemblies across all domains of life are currently produced routinely. Initial analysis of any new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. Here, we systematically evaluate this interplay by inferring synteny in genome assemblies with different degrees of contiguity. As expected, our investigation reveals that assembly quality can drastically affect synteny analysis, from the initial synteny identification to downstream analysis. Importantly, we found that improving a fragmented assembly using synteny with the genome of a related species can be dangerous, as this assumes a priori a potentially false evolutionary relationship between the species. The results presented here re-emphasize the importance of gold standard genomes to the science community, which should be achievable given the current progress in sequencing technology.


2020
Author(s):
Nicole S. Torosin
Joshua Ward
Adrian V. Bell
Leslie A. Knapp

Abstract
Kin recognition is essential to the evolution of human cooperation, social organization, and altruistic behavior. However, the genetic underpinnings of kin recognition have been largely understudied. Facial resemblance is an important relatedness cue for humans and more closely related individuals are generally thought to share greater facial similarity. To evaluate the relationship between perceived self-resemblance and genetic similarity among biologically related and unrelated females, we administered facial self-recognition surveys to twenty-three sets of related females and genotyped three different genetic systems, human leukocyte antigens (HLA), neutral nuclear microsatellites and mitochondrial haplogroups, for each individual. Using these data, we examined the relationship between visual kin recognition and genetic similarity. We found that pairs of individuals identified as visually more similar had greater HLA allelic sharing when compared to less facially similar participants. We did not find the same relationship for microsatellite and mitochondrial similarity, suggesting that HLA allelic similarity increases the probability of perceived self-resemblance in humans while other genetic markers do not. Our results demonstrate that some genetic markers, such as HLA-DRB, may have significant influence on phenotype and that large scale surveys of HLA and facial feature morphology will yield valuable insight into the evolutionary biology of genotype-phenotype relationships and kin recognition.


Author(s):  
M. E. J. Newman
R. G. Palmer

Developed after a meeting at the Santa Fe Institute on extinction modeling, this book comments critically on the various modeling approaches. In the last decade or so, scientists have started to examine a new approach to the patterns of evolution and extinction in the fossil record. This approach may be called "statistical paleontology," since it looks at large-scale patterns in the record and attempts to understand and model their average statistical features, rather than their detailed structure. Examples of the patterns these studies examine are the distribution of the sizes of mass extinction events over time, the distribution of species lifetimes, or the apparent increase in the number of species alive over the last half a billion years. In attempting to model these patterns, researchers have drawn on ideas not only from paleontology, but from evolutionary biology, ecology, physics, and applied mathematics, including fitness landscapes, competitive exclusion, interaction matrices, and self-organized criticality. A self-contained review of work in this field.


Author(s):  
Anna Lavecchia
Matteo Chiara
Caterina De Virgilio
Caterina Manzari
Carlo Pazzani
...  

Abstract
Staphylococcus cohnii (SC), a coagulase-negative bacterium, was first isolated in 1975 from human skin. Early phenotypic analyses led to the delineation of two subspecies (subsp.), Staphylococcus cohnii subsp. cohnii (SCC) and Staphylococcus cohnii subsp. urealyticus (SCU). SCC was considered to be specific to humans whereas SCU apparently demonstrated a wider host range, from lower primates to humans. The type strains ATCC 29974 and ATCC 49330 have been designated for SCC and SCU, respectively. Comparative analysis of 66 complete genome sequences, including a novel SC isolate, revealed unexpected patterns within the SC complex, both in terms of genomic sequence identity and gene content, highlighting the presence of 3 phylogenetically distinct groups. Based on our observations, and on the current guidelines for taxonomic classification for bacterial species, we propose a revision of the SC species complex. We suggest that SCC and SCU should be regarded as two distinct species: SC and SU (Staphylococcus urealyticus), and that two distinct subspecies, SCC and SCB (SC subsp. barensis, represented by the novel strain isolated in Bari) should be recognized within SC. Furthermore, since large scale comparative genomics studies recurrently suggest inconsistencies or conflicts in taxonomic assignments of bacterial species, we believe that the approach proposed here might be considered for more general application.


2017
Vol 139 (5)
Author(s):
Sara Benyakhlef
Ahmed Al Mers
Ossama Merroun
Abdelfattah Bouatem
Hamid Ajdad
...  

Reducing the levelized electricity costs of concentrated solar power (CSP) plants could greatly accelerate the market penetration of these sustainable technologies. Linear Fresnel reflectors (LFRs) are one of the CSP technologies that may contribute to such cost reduction. However, because little previous research has been devoted to them, LFRs are considered a low-efficiency technology. A variety of design approaches exists for optimizing this type of solar collector. The present paper tackles a new research axis based on a variability study of heliostat curvature as an approach for optimizing small and large-scale LFRs. Numerical investigations based on a ray tracing model have demonstrated that LFR constructors should adopt a uniform curvature for small-scale LFRs and a variable curvature per row for large-scale LFRs. LFRs using these curvature types achieved better optical performance. An optimization approach based on the use of uniform heliostat curvature for small-scale LFRs has led to a system cost reduction by means of reducing its receiver surface and height.


2007
Vol 283 (3)
pp. 1229-1233
Author(s):
Claudia Ben-Dov
Britta Hartmann
Josefin Lundgren
Juan Valcárcel

Alternative splicing of mRNA precursors allows the synthesis of multiple mRNAs from a single primary transcript, significantly expanding the information content and regulatory possibilities of higher eukaryotic genomes. High-throughput enabling technologies, particularly large-scale sequencing and splicing-sensitive microarrays, are providing unprecedented opportunities to address key questions in this field. The picture emerging from these pioneering studies is that alternative splicing affects most human genes and a significant fraction of the genes in other multicellular organisms, with the potential to greatly influence the evolution of complex genomes. A combinatorial code of regulatory signals and factors can deploy physiologically coherent programs of alternative splicing that are distinct from those regulated at other steps of gene expression. Pre-mRNA splicing and its regulation play important roles in human pathologies, and genome-wide analyses in this area are paving the way for improved diagnostic tools and for the identification of novel and more specific pharmaceutical targets.


2001
Vol 2 (4)
pp. 243-251
Author(s):  
Jo Wixon

We bring you a report from the CSHL Genome Sequencing and Biology Meeting, which has a long and prestigious history. This year there were sessions on large-scale sequencing and analysis, polymorphisms (covering discovery and technologies and mapping and analysis), comparative genomics of mammalian and model organism genomes, functional genomics and bioinformatics.


2015
Vol 282 (1815)
pp. 20151421
Author(s):
Göran Arnqvist
Ahmed Sayadi
Elina Immonen
Cosima Hotzy
Daniel Rankin
...  

The ultimate cause of genome size (GS) evolution in eukaryotes remains a major and unresolved puzzle in evolutionary biology. Large-scale comparative studies have failed to find consistent correlations between GS and organismal properties, resulting in the ‘C-value paradox’. Current hypotheses for the evolution of GS are based either on the balance between mutational events and drift or on natural selection acting upon standing genetic variation in GS. It is, however, currently very difficult to evaluate the role of selection because within-species studies that relate variation in life-history traits to variation in GS are very rare. Here, we report phylogenetic comparative analyses of GS evolution in seed beetles at two distinct taxonomic scales, which combine replicated estimation of GS with experimental assays of life-history traits and reproductive fitness. GS showed rapid and bidirectional evolution across species, but did not show correlated evolution with any of several indices of the relative importance of genetic drift. Within a single species, GS varied by 4–5% across populations and showed positive correlated evolution with independent estimates of male and female reproductive fitness. Collectively, the phylogenetic pattern of GS diversification across and within species in conjunction with the pattern of correlated evolution between GS and fitness provide novel support for the tenet that natural selection plays a key role in shaping GS evolution.


Author(s):  
Khe Foon Hew
Chen Qiao
Ying Tang

Although massive open online courses (MOOCs) have attracted much worldwide attention, scholars still understand little about the specific elements that students find engaging in these large open courses. This study offers an original contribution by using a machine learning classifier to analyze 24,612 reflective sentences posted by 5,884 students who participated in one or more of 18 highly rated MOOCs. Highly rated MOOCs were sampled because they exemplify good practices or teaching strategies. We selected highly rated MOOCs from Coursetalk, an open user-driven aggregator and discovery website that allows students to search and review various MOOCs. We defined a highly rated MOOC as a free online course that received an overall five-star course quality rating and at least 50 reviews from different learners within a specific subject area. We described six specific themes found across the entire data corpus: (a) structure and pace, (b) video, (c) instructor, (d) content and resources, (e) interaction and support, and (f) assignment and assessment. The findings of this study provide valuable insight into factors that students find engaging in large-scale open online courses.

