A Hybrid Technique for the Periodicity Characterization of Genomic Sequence Data

2009 ◽  
Vol 2009 (1) ◽  
pp. 924601 ◽  
Author(s):  
Julien Epps
Data in Brief ◽  
2021 ◽  
Vol 34 ◽  
pp. 106577
Author(s):  
Eri Ogiso-Tanaka ◽  
Nobuhiko Oki ◽  
Tsuyoshi Tanaka ◽  
Takehiko Shimizu ◽  
Masao Ishimoto ◽  
...  

1992 ◽  
Vol 287 (1) ◽  
pp. 291-297 ◽  
Author(s):  
K F Nolan ◽  
S Kaluz ◽  
J M G Higgins ◽  
D Goundis ◽  
K B M Reid

A cosmid clone containing the complete coding sequence of the human properdin gene has been characterized. The gene is located at one end of the approximately 40 kb cosmid insert and approximately 8.2 kb of the sequence data have been obtained from this region. Two discrepancies with the published cDNA sequence [Nolan, Schwaeble, Kaluz, Dierich & Reid (1991) Eur. J. Immunol. 21, 771-776] have been resolved. Properdin has previously been described as a modular protein, with the majority of its sequence composed of six tandem repeats of a sequence motif of approximately 60 amino acids which is related to the type-I repeat sequence (TSR), initially described in thrombospondin [Lawler & Hynes (1986) J. Cell Biol. 103, 1635-1648; Goundis & Reid (1988), Nature (London) 335, 82-85]. Analysis of the genomic sequence data indicates that the human properdin gene is organized into ten exons which span approximately 6 kb of the genome. TSRs 2-5 are coded for by discrete, symmetrical exons (phase 1-1), which supports the hypothesis that modular proteins evolved by a process involving exon shuffling. TSR1 is also coded for by a discrete exon, but the boundaries are asymmetrical (phase 2-1). The sequence coding for the sixth TSR is split across the final two exons of the gene with the first 38 amino acids of the repeat coded for by an asymmetric exon (phase 1-2). This split at the genomic level has been shown, by alignment analysis, to be reflected at the protein level with the division of repeat 6 into TSR-like and TSR-unlike sequences.
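The exon phase classes mentioned in this abstract (1-1, 2-1, 1-2) follow from simple modular arithmetic, sketched below. An intron phase is the number of codon nucleotides already used when the intron interrupts the reading frame, so the phase at the end of an exon is the starting phase plus the exon length, modulo 3. The function names and the exon lengths in the usage note are hypothetical illustrations, not values taken from the properdin gene.

```python
def end_phase(start_phase: int, exon_length: int) -> int:
    """Phase of the intron following an exon.

    start_phase: phase (0, 1 or 2) of the intron preceding the exon.
    exon_length: exon length in nucleotides (hypothetical input).
    """
    if start_phase not in (0, 1, 2):
        raise ValueError("intron phase must be 0, 1 or 2")
    return (start_phase + exon_length) % 3


def is_symmetrical(start_phase: int, exon_length: int) -> bool:
    """A symmetrical exon has the same phase at both boundaries,
    which is only possible when its length is a multiple of 3."""
    return end_phase(start_phase, exon_length) == start_phase


def exon_class(start_phase: int, exon_length: int) -> str:
    """Format the phase class in the 'a-b' notation used in the abstract."""
    return f"{start_phase}-{end_phase(start_phase, exon_length)}"
```

For instance, a hypothetical 180-nucleotide exon (about 60 codons) entered in phase 1 is class 1-1, so it can be inserted or deleted without shifting the reading frame, which is the property that the exon-shuffling argument relies on.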


2014 ◽  
Vol 2014 ◽  
pp. 1-11
Author(s):  
Yu Bai ◽  
Yuki Iwasaki ◽  
Shigehiko Kanaya ◽  
Yue Zhao ◽  
Toshimichi Ikemura

With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on one map. By modifying the conventional SOM, we have previously developed the Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, depending solely on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a “genome signature,” as well as regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
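As a rough illustration of the input such a map works on, the sketch below computes pentanucleotide frequency vectors for consecutive fixed-size windows of a sequence. It covers only the composition step, not the BLSOM itself; the function names, the handling of ambiguous bases, and the decision to drop a trailing partial window are assumptions made for this example rather than details from the study.

```python
from collections import Counter
from itertools import product


def kmer_composition(seq: str, k: int = 5) -> list[float]:
    """Relative frequencies of all 4**k oligonucleotides in seq.

    The vector is ordered lexicographically over the alphabet ACGT.
    k-mers containing other symbols (e.g. N) are ignored (an assumption).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def windowed_composition(seq: str, window: int = 100_000, k: int = 5) -> list[list[float]]:
    """One composition vector per non-overlapping window.

    A trailing fragment shorter than `window` is dropped (an assumption).
    """
    return [
        kmer_composition(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, window)
    ]
```

With the defaults, each 100 kb window becomes a 1024-dimensional vector (4^5 pentanucleotides), which is the kind of high-dimensional data the abstract describes clustering on a single map.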


2020 ◽  
Vol 15 ◽  
Author(s):  
Affan Alim ◽  
Abdul Rafay ◽  
Imran Naseem

Background: Proteins contribute significantly to every task of cellular life. Their functions include building and repairing tissues in humans and other organisms; hence they are the building blocks of bones, muscles, cartilage, skin, and blood. Antifreeze proteins (AFPs) are of prime significance for organisms that live in very cold environments. With the help of these proteins, cold-water organisms can survive at sub-zero temperatures and resist the water crystallization process, which may rupture internal cells and tissues. AFPs have attracted attention and interest in the food industry and in cryopreservation. Objective: With the increasing availability of genomic sequence data for proteins, an automated and sophisticated tool for AFP recognition and identification is urgently needed. The sequences and structures of AFPs are highly distinct; therefore, most of the proposed methods fail to show promising results on different structures. A consolidated method is proposed to deliver competitive performance on highly distinct AFP structures. Methods: In this study, we propose to use the machine learning algorithms Principal Component Analysis (PCA) followed by Gradient Boosting (GB) for antifreeze protein identification. To analyze and validate the performance of the proposed model, various combinations of two-segment amino acid and dipeptide compositions are used. PCA is used for dimension reduction while retaining the high-variance components of the data, and is followed by an ensemble method, gradient boosting, for modelling and classification. Results: The proposed method obtained superior performance on the PDB, Pfam and UniProt datasets compared with the RAFP-Pred method. In experiment-3, using only 150 PCA components, a high accuracy of 89.63 was achieved, which is superior to the 87.41 reported for the RAFP-Pred method using 300 significant features. Experiment-2 was conducted using two different datasets: non-AFPs from the PISCES server and AFPs from the Protein Data Bank. In experiment-2, our proposed method attained a high sensitivity of 79.16, which is 12.50 better than the state-of-the-art RAFP-Pred method. Conclusion: AFPs have a common function but distinct structures. Therefore, a single model developed for different sequences often fails to identify AFPs. Our proposed model has shown robust results across diverse training and testing datasets, and it outperformed the previous AFP prediction method, RAFP-Pred. Our model consists of PCA for dimension reduction followed by gradient boosting for classification. Owing to its simplicity, scalability and high performance, our model can be easily extended to the analysis of proteomic and genomic datasets.
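The feature extraction step described in the Methods can be sketched roughly as follows. This is one possible reading of "two-segment composition", assuming the sequence is split at its midpoint and that amino acid (20 values) and dipeptide (400 values) frequencies are computed per segment; the split rule, function names, and feature ordering are assumptions, not the authors' exact procedure.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def composition(segment: str, k: int, vocabulary: list[str]) -> list[float]:
    """Relative frequency of each length-k term of `vocabulary` in `segment`.

    Terms containing non-standard residues are skipped (an assumption).
    """
    counts = dict.fromkeys(vocabulary, 0)
    n = 0
    for i in range(len(segment) - k + 1):
        term = segment[i:i + k]
        if term in counts:
            counts[term] += 1
            n += 1
    return [counts[t] / n if n else 0.0 for t in vocabulary]


def two_segment_features(seq: str) -> list[float]:
    """Amino acid + dipeptide composition of each half of the sequence.

    Returns 2 * (20 + 400) = 840 values; the midpoint split is hypothetical.
    """
    seq = seq.upper()
    mid = len(seq) // 2
    features: list[float] = []
    for segment in (seq[:mid], seq[mid:]):
        features += composition(segment, 1, list(AMINO_ACIDS))
        features += composition(segment, 2, DIPEPTIDES)
    return features
```

Vectors of this kind could then be reduced with PCA and classified with gradient boosting, for example via scikit-learn's `PCA` and `GradientBoostingClassifier`; the abstract does not say which implementation was used.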


Genes ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 1212
Author(s):  
J. Spencer Johnston ◽  
Carl E. Hjelmen

Next-generation sequencing provides a nearly complete genomic sequence for model and non-model species alike; however, this wealth of sequence data includes no road map [...]


2014 ◽  
Vol 64 (Pt_2) ◽  
pp. 316-324 ◽  
Author(s):  
Jongsik Chun ◽  
Fred A. Rainey

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.

