Open Reading Frame Phylogenetic Analysis on the Cloud

2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Che-Lun Hung ◽  
Chun-Yuan Lin

Phylogenetic analysis has become essential in researching the evolutionary relationships between viruses. These relationships are depicted on phylogenetic trees, in which viruses are grouped based on sequence similarity. Viral evolutionary relationships are identified from open reading frames rather than from complete sequences. Recently, cloud computing has become popular for developing internet-based bioinformatics tools. Biocloud is an efficient, scalable, and robust bioinformatics computing service. In this paper, we propose a cloud-based open reading frame phylogenetic analysis service. The proposed service integrates the Hadoop framework, virtualization technology, and phylogenetic analysis methods to provide a high-availability, large-scale bioservice. In a case study, we analyze the phylogenetic relationships among Norovirus. Evolutionary relationships are elucidated by aligning different open reading frame sequences. The proposed platform correctly identifies the evolutionary relationships between members of Norovirus.
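To make "grouped based on sequence similarity" concrete, the sketch below builds a rooted tree from already-aligned ORF sequences using p-distance (the fraction of differing sites) and UPGMA clustering. This is an assumed, generic distance-based method chosen for illustration; it is not the authors' Hadoop-based pipeline, and the function names `p_distance` and `upgma` are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]) -> str:
    """Build a rooted UPGMA tree (Newick, topology only) from aligned sequences.

    Assumes sequence names do not start with "_c", which is used for internal nodes.
    """
    # Each cluster maps a label to (newick_string, number_of_leaves).
    clusters = {name: (name, 1) for name in seqs}
    dist = {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}
    counter = 0
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = sorted(min(dist, key=dist.get))
        (na, sa), (nb, sb) = clusters.pop(a), clusters.pop(b)
        new = f"_c{counter}"
        counter += 1
        # UPGMA: size-weighted average distance from the merged cluster to the rest.
        for other in clusters:
            dist[frozenset((new, other))] = (
                dist[frozenset((a, other))] * sa + dist[frozenset((b, other))] * sb
            ) / (sa + sb)
        # Drop every distance that still refers to the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        clusters[new] = (f"({na},{nb})", sa + sb)
    newick, _ = next(iter(clusters.values()))
    return newick + ";"


toy = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "TCGTTCGTAA", "D": "TCGATCGAAA"}
print(upgma(toy))  # ((A,B),(C,D));
```

On the toy alignment, A and B differ at one site out of ten and C and D at two, so they pair up before the two groups are joined. Real analyses would start from a proper multiple alignment and usually prefer model-corrected distances or likelihood methods over raw p-distance.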

2002 ◽  
Vol 83 (9) ◽  
pp. 2303-2307 ◽  
Author(s):  
Masanori Terai ◽  
Robert D. Burk

We have characterized the complete genome (8300 bp) of an isolate of Felis domesticus papillomavirus (FdPV) from a domestic cat with cutaneous papillomatosis. A BLAST homology search using the nucleotide sequence of the L1 open reading frame demonstrated that the FdPV genome was most closely related to canine oral papillomavirus (COPV). A 384 bp non-coding region (NCR) was found between the end of L1 and the beginning of E6, and a 1·3 kbp NCR was located between the end of E2 and the beginning of L2. Phylogenetic analysis placed FdPV in the E3 clade with COPV. Both viruses contain the atypical second NCR, which has no homology with sequences in existing databases.


Genome ◽  
2018 ◽  
Vol 61 (4) ◽  
pp. 254-265 ◽  
Author(s):  
Joshua B. Gross ◽  
James Weagley ◽  
Bethany A. Stahl ◽  
Li Ma ◽  
Luis Espinasa ◽  
...  

In this study, we report evidence of a novel duplication of Melanocortin receptor 1 (Mc1r) in the cavefish genome. This locus was discovered following the observation of excessive allelic diversity in a ∼820 bp fragment of Mc1r amplified via degenerate PCR from a natural population of Astyanax aeneus fish from Guerrero, Mexico. The cavefish genome reveals the presence of two closely related Mc1r open reading frames separated by a 1.46 kb intergenic region. One open reading frame corresponds to the previously reported Mc1r receptor, and the other open reading frame (duplicate copy) is 975 bp in length, encoding a receptor of 325 amino acids. Sequence similarity analyses position both copies in the syntenic region of the single Mc1r locus in 16 representative craniate genomes spanning bony fish (including Astyanax) to mammals, suggesting we discovered tandem duplicates of this important gene. The two Mc1r copies share ∼89% sequence similarity and, within Astyanax, are more similar to one another compared to other melanocortin family members. Future studies will inform the precise functional significance of the duplicated Mc1r locus and if this novel copy number variant may have adaptive significance for the Astyanax lineage.


Genes ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 490 ◽  
Author(s):  
Sharma ◽  
Gupta

The class Hematozoa encompasses several clinically important genera, including Plasmodium, whose members cause the major life-threatening disease malaria. Hence, a good understanding of the interrelationships of organisms in this class, and reliable means of distinguishing them, are of considerable importance. This study reports comprehensive phylogenetic and comparative analyses of protein sequences from the genomes of 28 hematozoa species to understand their interrelationships. In addition to phylogenetic trees based on two large datasets of protein sequences, detailed comparative analyses were carried out on the genomes of hematozoa species to identify novel molecular synapomorphies consisting of conserved signature indels (CSIs) in protein sequences. These studies identified 79 CSIs that are exclusively present in specific groups of Hematozoa/Plasmodium species that are also supported by phylogenetic analysis, providing reliable means for identifying these species groups and understanding their interrelationships. Of these CSIs, six are specifically shared by all hematozoa species, two serve to distinguish members of the order Piroplasmida, five are uniquely found in all Piroplasmida species except B. microti, and two are specific to the genus Theileria. We also describe 23 CSIs that are exclusively present in all genome-sequenced Plasmodium species, and two, nine, ten and eight CSIs that are specific to members of the Plasmodium subgenera Haemamoeba, Laverania, Vinckeia and Plasmodium (excluding P. ovale and P. malariae), respectively. Additionally, our work has identified several CSIs that support species relationships which are not evident from phylogenetic analysis.
Of these CSIs, one CSI supports the ancestral nature of the avian-Plasmodium species in comparison to the mammalian-infecting groups of Plasmodium species, four CSIs strongly support a specific relationship of species between the subgenera Plasmodium and Vinckeia and three CSIs each that reliably group P. malariae with members of the subgenus Plasmodium and P. ovale within the subgenus Vinckeia, respectively. These results provide a reliable framework for understanding the evolutionary relationships among the Plasmodium/Piroplasmida species. Further, in view of the exclusivity of the described molecular markers for the indicated groups of hematozoa species, particularly large numbers of unique characteristics that are specific for all Plasmodium species, they provide important molecular tools for biochemical/genetic studies and for developing novel diagnostics and therapeutics for these organisms.


2009 ◽  
Vol 77 (4) ◽  
pp. 1389-1396 ◽  
Author(s):  
Carolyn Marion ◽  
Dominique H. Limoli ◽  
Gregory S. Bobulsky ◽  
Jessica L. Abraham ◽  
Amanda M. Burnaugh ◽  
...  

ABSTRACT Colonization of the airway by Streptococcus pneumoniae is typically asymptomatic; however, progression of bacteria beyond the oronasopharynx can cause diseases including otitis media and pneumonia. The mechanisms by which S. pneumoniae establishes and maintains colonization remain poorly understood. Both N-linked and O-linked glycans are abundant in the airway. Our previous research demonstrated that S. pneumoniae can sequentially deglycosylate N-linked glycans and suggested that this modification of sugar structures may aid in colonization. There is published evidence that S. pneumoniae expresses a secreted O-glycosidase that cleaves galactose β1-3 N-acetylgalactosamine (Galβ1-3GalNAc) from core-1 O-linked glycans; however, the biological function of this enzyme has not previously been determined. We established that the activity is not secreted but is instead surface associated in a sortase-dependent manner. Genome analysis revealed an open reading frame predicted to encode a sortase-dependent surface protein with sequence similarity to the O-glycosidase of Bifidobacterium longum. Deletion of this pneumococcal open reading frame confirmed that this gene encodes an O-glycosidase. Experiments using a model glycoconjugate demonstrated that this O-glycosidase, together with the neuraminidase NanA, is required for S. pneumoniae to cleave sialylated core-1 O-linked glycans. The ability of the O-glycosidase mutant to cleave this glycan structure was restored by both genetic complementation and the addition of O-glycosidase. The mutant showed a reduction in adherence to human airway epithelial cells and a significantly decreased ability to colonize the upper respiratory tract, suggesting that cleavage of core-1 O-linked glycans enhances the ability of S. pneumoniae to colonize the human airway.


1999 ◽  
Vol 10 (04) ◽  
pp. 635-643 ◽  
Author(s):  
AGNIESZKA GIERLIK ◽  
PAWEŁ MACKIEWICZ ◽  
MARIA KOWALCZUK ◽  
STANISŁAW CEBRAT ◽  
MIROSŁAW R. DUDEK

Coding sequences of DNA generate Open Reading Frames (ORFs) within them at a much higher frequency than random DNA sequences do, especially in the antisense strand. This is a specific feature of the genetic code. Since coding sequences are selected for their length, the generated ORFs are an indirect result of this selection, and their length is also influenced by it. That is why the ORFs found in any genome, even those much longer than the ORFs spontaneously generated in random DNA sequences, should be considered as two different sets: the first coding for proteins, the second generated by the coding ORFs. Even intergenic sequences have a greater capacity for generating ORFs than random DNA sequences of the same nucleotide composition, which may indicate that intergenic sequences were generated from coding sequences by recombinational mechanisms.
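The comparison above rests on counting ORFs in all six reading frames (three per strand, including the antisense strand). The minimal scanner below assumes an ORF runs from an ATG start codon to the first in-frame stop (TAA, TAG, TGA) of the standard genetic code; the paper may use a different definition (for example, stop-to-stop stretches) and length thresholds, so this is an illustrative sketch rather than the authors' procedure. The names `find_orfs` and `reverse_complement` are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[str, int, int]]:
    """Return (strand, frame, length_in_codons) for every ATG...stop ORF.

    Length counts codons from the ATG up to, but not including, the stop codon.
    Internal ATGs inside an open frame are treated as ordinary codons.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    length = (i - start) // 3
                    if length >= min_codons:
                        orfs.append((strand, frame, length))
                    start = None
    return orfs


print(find_orfs("ATGAAATAA"))  # [('+', 0, 2)]
print(find_orfs("TTATTTCAT"))  # [('-', 0, 2)]
```

Running the same scanner over real coding sequences and over shuffled sequences of identical base composition, and tallying ORFs by strand, is one way to reproduce the kind of comparison the abstract describes.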


2004 ◽  
Vol 78 (21) ◽  
pp. 11544-11550 ◽  
Author(s):  
Paul Kraft ◽  
Andrea Oeckinghaus ◽  
Daniel Kümmel ◽  
George H. Gauss ◽  
John Gilmore ◽  
...  

ABSTRACT Sulfolobus spindle-shaped viruses (SSVs), or Fuselloviridae, are ubiquitous crenarchaeal viruses found in high-temperature acidic hot springs around the world (pH ≤4.0; temperature of ≥70°C). Because they are relatively easy to isolate, they represent the best studied of the crenarchaeal viruses. This is particularly true for the type virus, SSV1, which contains a double-stranded DNA genome of 15.5 kilobases, encoding 34 putative open reading frames. Interestingly, the genome shows little sequence similarity to organisms other than its SSV homologues. Together, sequence similarity and biochemical analyses have suggested functions for only 6 of the 34 open reading frames. Thus, even though SSV1 is the best-studied crenarchaeal virus, functions for most (28) of its open reading frames remain unknown. We have undertaken biochemical and structural studies for the gene product of open reading frame F-93. We find that F-93 exists as a homodimer in solution and that a tight dimer is also present in the 2.7-Å crystal structure. Further, the crystal structure reveals a fold that is homologous to the SlyA and MarR subfamilies of winged-helix DNA binding proteins. This strongly suggests that F-93 functions as a transcription factor that recognizes a (pseudo-)palindromic DNA target sequence.


2008 ◽  
Vol 82 (17) ◽  
pp. 8917-8921 ◽  
Author(s):  
Christopher J. McCormick ◽  
Omar Salim ◽  
Paul R. Lambden ◽  
Ian N. Clarke

ABSTRACT A generally accepted view of norovirus replication is that capsid expression requires production of a subgenomic transcript, the presence of capsid often being used as a surrogate marker to indicate the occurrence of viral replication. Using a polymerase II-based baculovirus delivery system, we observed capsid expression following introduction of a full-length genogroup 3 norovirus genome into HepG2 cells. However, capsid expression occurred as a result of a novel translation termination/reinitiation event between the nonstructural-protein and capsid open reading frames, a feature that may be unique to genogroup 3 noroviruses.


2001 ◽  
Vol 11 (10) ◽  
pp. 1632-1640 ◽
Author(s):  
Hedi Hegyi ◽  
Mark Gerstein

Annotation transfer is a principal process in genome annotation. It involves “transferring” structural and functional annotation to uncharacterized open reading frames (ORFs) in a newly completed genome from experimentally characterized proteins similar in sequence. To prevent errors in genome annotation, it is important that this process be robust and statistically well-characterized, especially with regard to how it depends on the degree of sequence similarity. Previously, we and others have analyzed annotation transfer in single-domain proteins. Multi-domain proteins, which make up the bulk of the ORFs in eukaryotic genomes, present more complex issues in functional conservation. Here we present a large-scale survey of annotation transfer in these proteins, using SCOP superfamilies to define domain folds and a thesaurus based on SWISS-PROT keywords to define functional categories. Our survey reveals that multi-domain proteins have significantly less functional conservation than single-domain ones, except when they share the exact same combination of domain folds. In particular, we find that for multi-domain proteins, approximate function can be accurately transferred with only 35% certainty for pairs of proteins sharing one structural superfamily. In contrast, this value is 67% for pairs of single-domain proteins sharing the same structural superfamily. On the other hand, if two multi-domain proteins contain the same combination of two structural superfamilies, the probability of their sharing the same function increases to 80%; with complete coverage along the full length of both proteins, this value increases further to >90%. Moreover, we found that only 70 of the current total of 455 structural superfamilies are found in both single- and multi-domain proteins, and only 14 of these were associated with the same function in both categories of proteins.
We also investigated the degree to which function could be transferred between pairs of multi-domain proteins with respect to the degree of sequence similarity between them, finding that functional divergence at a given amount of sequence similarity is always about two-fold greater for pairs of multi-domain proteins (sharing similarity over a single domain) in comparison to pairs of single-domain ones, though the overall shape of the relationship is quite similar. Further information is available at http://partslist.org/func or http://bioinfo.mbb.yale.edu/partslist/func.


1998 ◽  
Vol 180 (23) ◽  
pp. 6332-6337 ◽  
Author(s):  
Steven H. Schwartz ◽  
Todd A. Black ◽  
Karin Jäger ◽  
Jean-Michel Panoff ◽  
C. Peter Wolk

ABSTRACT Salt-induced genes in the cyanobacterium Anabaena sp. strain PCC 7120 were identified by use of a Tn5-based transposon bearing luxAB as a reporter. The genomic sequence adjacent to one site of insertion of the transposon was identical in part to the sequence of the lti2 gene, which was previously identified in a differential screen for cold-induced transcripts in Anabaena variabilis. The lti2-like gene was induced by sucrose and other osmotica and by low temperature, in addition to salt. Regulatory components necessary for the induction of this gene by osmotica were sought by a further round of transposon mutagenesis. One mutant that displayed reduced transcriptional activity of the lti2-like gene in response to exposure to osmotica had an insertion in an open reading frame, which was denoted orrA, whose predicted product showed sequence similarity to response regulators from two-component regulatory systems. The corresponding mutation was reconstructed and was shown, like the second-site transposon mutation, to result in reduced response to osmotic stress. Induction of the lux reporter gene by osmotica was restored by complementation with a genomic fragment containing the entire open reading frame for the presumptive response regulator, whereas a fragment containing a truncated copy of the open reading frame for the response regulator did not complement the mutation.

