A Hybrid Technique for the Periodicity Characterization of Genomic Sequence Data

Julien Epps

doi:10.1155/2009/924601

Molecular cloning and characterization of four scorpion K+-toxin-like peptides: A new subfamily of venom peptides (α-KTx14) and genomic analysis of a member

Note: The nucleotide sequence data reported in this paper have been submitted to the EMBL Nucleotide Sequence Database under the accession numbers: AJ277726 (BmKK1); AJ277727 (BmKK2); AJ277728 (BmKK3); AJ277729 (BmKK4); and AJ277730 (genomic sequence of BmKK2).

Biochimie ◽

10.1016/s0300-9084(01)01326-8 ◽

2001 ◽

Vol 83 (9) ◽

pp. 883-889 ◽

Cited By ~ 31

Author(s):

Xian-Chun Zeng ◽

Fang Peng ◽

Feng Luo ◽

Shun-Yi Zhu ◽

Hui Liu ◽

...

Keyword(s):

Nucleotide Sequence ◽

Molecular Cloning ◽

Genomic Sequence ◽

Sequence Data ◽

Genomic Analysis ◽

Sequence Database ◽

Nucleotide Sequence Data ◽

EMBL Nucleotide Sequence Database ◽

Nucleotide Sequence Database


Characterization of the genomic sequence data around common cutworm resistance genes in soybean (Glycine max) using short- and long-read sequencing methods

Data in Brief ◽

10.1016/j.dib.2020.106577 ◽

2021 ◽

Vol 34 ◽

pp. 106577

Author(s):

Eri Ogiso-Tanaka ◽

Nobuhiko Oki ◽

Tsuyoshi Tanaka ◽

Takehiko Shimizu ◽

Masao Ishimoto ◽

...

Keyword(s):

Glycine Max ◽

Resistance Genes ◽

Genomic Sequence ◽

Sequence Data ◽

Common Cutworm ◽

Long Read


Characterization of the human properdin gene

Biochemical Journal ◽

10.1042/bj2870291 ◽

1992 ◽

Vol 287 (1) ◽

pp. 291-297 ◽

Cited By ~ 36

Author(s):

K F Nolan ◽

S Kaluz ◽

J M G Higgins ◽

D Goundis ◽

K B M Reid

Keyword(s):

Amino Acids ◽

Tandem Repeats ◽

Genomic Sequence ◽

Sequence Data ◽

Phase 1 ◽

Repeat Sequence ◽

Sequence Motif ◽

Type I ◽

Alignment Analysis

A cosmid clone containing the complete coding sequence of the human properdin gene has been characterized. The gene is located at one end of the approximately 40 kb cosmid insert and approximately 8.2 kb of the sequence data have been obtained from this region. Two discrepancies with the published cDNA sequence [Nolan, Schwaeble, Kaluz, Dierich & Reid (1991) Eur. J. Immunol. 21, 771-776] have been resolved. Properdin has previously been described as a modular protein, with the majority of its sequence composed of six tandem repeats of a sequence motif of approximately 60 amino acids which is related to the type-I repeat sequence (TSR), initially described in thrombospondin [Lawler & Hynes (1986) J. Cell Biol. 103, 1635-1648; Goundis & Reid (1988), Nature (London) 335, 82-85]. Analysis of the genomic sequence data indicates that the human properdin gene is organized into ten exons which span approximately 6 kb of the genome. TSRs 2-5 are coded for by discrete, symmetrical exons (phase 1-1), which supports the hypothesis that modular proteins evolved by a process involving exon shuffling. TSR1 is also coded for by a discrete exon, but the boundaries are asymmetrical (phase 2-1). The sequence coding for the sixth TSR is split across the final two exons of the gene with the first 38 amino acids of the repeat coded for by an asymmetric exon (phase 1-2). This split at the genomic level has been shown, by alignment analysis, to be reflected at the protein level with the division of repeat 6 into TSR-like and TSR-unlike sequences.
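The exon phase notation in the abstract above (1-1, 2-1, 1-2) can be illustrated with a short sketch. It assumes the standard intron phase convention, in which a phase-p intron interrupts a codon after p nucleotides; the function names are hypothetical and are not taken from the paper.

```python
def is_symmetric(phase_5prime, phase_3prime):
    """An exon is symmetric when both flanking introns share the same phase."""
    return phase_5prime == phase_3prime


def exon_length_mod3(phase_5prime, phase_3prime):
    """Remainder of the exon length modulo 3 implied by its flanking phases.

    The exon first completes the codon interrupted by the upstream intron
    (3 - phase_5prime nucleotides, modulo 3), then carries whole codons,
    then the first phase_3prime nucleotides of the next codon.
    """
    return (phase_3prime - phase_5prime) % 3


# Phases as stated in the abstract.
exons = {
    "TSR2-5 exons": (1, 1),
    "TSR1 exon": (2, 1),
    "TSR6, first part": (1, 2),
}
for name, (p5, p3) in exons.items():
    kind = "symmetric" if is_symmetric(p5, p3) else "asymmetric"
    print(f"{name}: phase {p5}-{p3}, {kind}, length mod 3 = {exon_length_mod3(p5, p3)}")
```

A symmetric exon has a length divisible by three, so it can be inserted into or removed from an intron of the same phase without shifting the reading frame, which is why symmetric exons are read as consistent with exon shuffling.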


A Novel Bioinformatics Method for Efficient Knowledge Discovery by BLSOM from Big Genomic Sequence Data

BioMed Research International ◽

10.1155/2014/765648 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11

Author(s):

Yu Bai ◽

Yuki Iwasaki ◽

Shigehiko Kanaya ◽

Yue Zhao ◽

Toshimichi Ikemura

Keyword(s):

Genomic Sequence ◽

Sequence Data ◽

Self Organizing Map ◽

Genome Signature ◽

A Genome ◽

Wide Range ◽

Oligonucleotide Composition ◽

Species Specific

With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on one map. By modifying the conventional SOM, we have previously developed the Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, depending solely on oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes, in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a “genome signature,” as well as the regions specifically enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
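The input to the map described above is an oligonucleotide composition vector per sequence fragment. The following is a minimal sketch of that feature step only, assuming non-overlapping windows and plain relative frequencies; the function names are hypothetical, no strand symmetrization is applied, and the BLSOM training itself is not shown.

```python
from collections import Counter
from itertools import product


def oligo_composition(seq, k=5):
    """Relative frequency of each of the 4**k oligonucleotides in seq."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing N or any other non-ACGT symbol.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    total = sum(counts.values())
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return {kmer: 0.0 for kmer in alphabet}
    return {kmer: counts[kmer] / total for kmer in alphabet}


def window_compositions(seq, window=100_000, k=5):
    """Composition of each full, non-overlapping window (assumed layout)."""
    return [
        oligo_composition(seq[start:start + window], k)
        for start in range(0, len(seq) - window + 1, window)
    ]
```

With the defaults (k=5, 100 kb windows) each fragment becomes a 1024-dimensional vector, which is the kind of high-dimensional input a self-organizing map is meant to cluster.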


Faculty Opinions recommendation of A likelihood ratio test of speciation with gene flow using genomic sequence data.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.3540959.3240060 ◽

2010 ◽

Author(s):

Nicolas Galtier ◽

Julien Dutheil

Keyword(s):

Gene Flow ◽

Likelihood Ratio ◽

Likelihood Ratio Test ◽

Genomic Sequence ◽

Sequence Data ◽

Ratio Test


PoGB-Pred: Prediction of Antifreeze Proteins Sequences using Amino Acid Composition with Feature Selection followed by a Sequential based Ensemble Approach

Current Bioinformatics ◽

10.2174/1574893615999200707141926 ◽

2020 ◽

Vol 15 ◽

Author(s):

Affan Alim ◽

Abdul Rafay ◽

Imran Naseem

Keyword(s):

Amino Acid ◽

Dimension Reduction ◽

Protein Identification ◽

Cold Water ◽

Genomic Sequence ◽

Sequence Data ◽

Antifreeze Proteins ◽

Building Blocks ◽

Gradient Boosting ◽

Proposed Model

Background: Proteins contribute to nearly every task of cellular life. Their functions include building and repairing tissues in humans and other organisms; they are the building blocks of bones, muscles, cartilage, skin, and blood. Antifreeze proteins (AFPs) are of prime importance for organisms that live in very cold environments. With the help of these proteins, cold-water organisms can survive at sub-zero temperatures and resist the water crystallization process that could rupture their cells and tissues. AFPs have attracted interest in the food industry and in cryopreservation. Objective: With the growing availability of genomic sequence data for proteins, an automated and sophisticated tool for AFP recognition and identification is badly needed. The sequences and structures of AFPs are highly distinct, so most proposed methods fail to show promising results across different structures. A consolidated method is proposed that delivers competitive performance on highly distinct AFP structures. Methods: In this study, we propose to use the machine learning algorithms Principal Component Analysis (PCA) followed by Gradient Boosting (GB) for antifreeze protein identification. To analyze and validate the performance of the proposed model, various combinations of two-segment amino acid and dipeptide compositions are used. PCA is used for dimension reduction while retaining high variance in the data, followed by the ensemble method gradient boosting for modelling and classification. Results: The proposed method achieved superior performance on the PDB, Pfam and UniProt datasets compared with the RAFP-Pred method. In experiment 3, using only 150 PCA components, an accuracy of 89.63 was achieved, which is superior to the 87.41 reported for the RAFP-Pred method using 300 significant features. Experiment 2 was conducted using two different datasets, with non-AFPs from the PISCES server and AFPs from the Protein Data Bank. In this experiment, the proposed method attained a sensitivity of 79.16, which is 12.50 better than the state-of-the-art RAFP-Pred method. Conclusion: AFPs share a common function but have distinct structures. Therefore, a single model developed for different sequences often fails on AFPs. The proposed model has shown robust results across diverse training and testing datasets and outperformed the previous AFP prediction method RAFP-Pred. The model consists of PCA for dimension reduction followed by gradient boosting for classification. Owing to its simplicity, scalability and high performance, the model can easily be extended to the analysis of proteomic and genomic datasets.
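The abstract names amino acid composition and dipeptide composition as the input features. A minimal sketch of those two feature types follows, assuming the 20 standard residues and whole-sequence frequencies; the function names are hypothetical, the segment-wise variant mentioned in the abstract is not reproduced, and the PCA and gradient boosting stages are left to whichever library the reader prefers.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each standard residue; non-standard symbols are ignored."""
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair among the 400 possible pairs."""
    seq = seq.upper()
    counts = {d: 0 for d in DIPEPTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Pairs containing a non-standard symbol are skipped, not bridged.
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def feature_vector(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

A vector of this size is a natural candidate for dimension reduction before classification, which is consistent with the PCA step the abstract describes.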


Special Issue: Genetic Basis of Phenotypic Variation in Drosophila and Other Insects

Genes ◽

10.3390/genes12081212 ◽

2021 ◽

Vol 12 (8) ◽

pp. 1212

Author(s):

J. Spencer Johnston ◽

Carl E. Hjelmen

Keyword(s):

Next Generation Sequencing ◽

Genetic Basis ◽

Genomic Sequence ◽

Sequence Data ◽

Complete Genomic Sequence ◽

Special Issue ◽

Model Species ◽

Road Map ◽

Generation Sequencing ◽

Complete Genomic

Next-generation sequencing provides a nearly complete genomic sequence for model and non-model species alike; however, this wealth of sequence data includes no road map [...]


Characterization of an acetylcholine receptor gene of Haemonchus contortus in relation to levamisole resistance

Note: Nucleotide sequence data reported in this paper are available in the GenBank™ database under accession No. U72490.1

Molecular and Biochemical Parasitology ◽

10.1016/s0166-6851(96)02793-4 ◽

1997 ◽

Vol 84 (2) ◽

pp. 179-187 ◽

Cited By ~ 39

Author(s):

Ruurdtje Hoekstra ◽

Allerdien Visser ◽

Lisa J Wiley ◽

Anthony S Weiss ◽

Nicholas C Sangster ◽

...

Keyword(s):

Nucleotide Sequence ◽

Acetylcholine Receptor ◽

Haemonchus Contortus ◽

Sequence Data ◽

Receptor Gene ◽

Nucleotide Sequence Data ◽

Acetylcholine Receptor Gene


Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm

BMC Bioinformatics ◽

10.1186/1471-2105-9-235 ◽

2008 ◽

Vol 9 (1) ◽

pp. 235 ◽

Cited By ~ 22

Author(s):

Jeremy D DeBarry ◽

Renyi Liu ◽

Jeffrey L Bennetzen

Keyword(s):

Genomic Sequence ◽

Sequence Data ◽

Repeat Family


Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea

International Journal of Systematic and Evolutionary Microbiology ◽

10.1099/ijs.0.054171-0 ◽

2014 ◽

Vol 64 (Pt_2) ◽

pp. 316-324 ◽

Cited By ~ 258

Author(s):

Jongsik Chun ◽

Fred A. Rainey

Keyword(s):

Genomic Sequence ◽

Sequence Data ◽

Original Research ◽

rRNA Gene ◽

New Taxon ◽

Genome Sequences ◽

Microbial World ◽

Content Type ◽

Link Type ◽

Type Strains

The polyphasic approach used today in the taxonomy and systematics of the Bacteria and Archaea includes the use of phenotypic, chemotaxonomic and genotypic data. The use of 16S rRNA gene sequence data has revolutionized our understanding of the microbial world and led to a rapid increase in the number of descriptions of novel taxa, especially at the species level. It has allowed in many cases for the demarcation of taxa into distinct species, but its limitations in a number of groups have resulted in the continued use of DNA–DNA hybridization. As technology has improved, next-generation sequencing (NGS) has provided a rapid and cost-effective approach to obtaining whole-genome sequences of microbial strains. Although some 12 000 bacterial or archaeal genome sequences are available for comparison, only 1725 of these are of actual type strains, limiting the use of genomic data in comparative taxonomic studies when there are nearly 11 000 type strains. Efforts to obtain complete genome sequences of all type strains are critical to the future of microbial systematics. The incorporation of genomics into the taxonomy and systematics of the Bacteria and Archaea coupled with computational advances will boost the credibility of taxonomy in the genomic era. This special issue of International Journal of Systematic and Evolutionary Microbiology contains both original research and review articles covering the use of genomic sequence data in microbial taxonomy and systematics. It includes contributions on specific taxa as well as outlines of approaches for incorporating genomics into new strain isolation to new taxon description workflows.
