Genome-Wide Mining, Characterization and Development of miRNA-SSRs in Arabidopsis thaliana

Mapping Intimacies ◽

10.1101/203851 ◽

2017 ◽

Cited By ~ 4

Author(s):

Anuj Kumar ◽

Aditi Chauhan ◽

Mansi Sharma ◽

Sai Kumar Kompelli ◽

Vijay Gahlaut ◽

...

Keyword(s):

Arabidopsis Thaliana ◽

Dna Sequences ◽

Tandem Repeats ◽

Full Length ◽

Coding Region ◽

Protein Coding ◽

Coding Regions ◽

Mirna Genes ◽

Genome Wide ◽

Varying Length

Abstract Simple sequence repeats (SSRs), also known as microsatellites, are short tandem repeats of DNA sequences that are 1-6 bp long. In plants, SSRs serve as an important class of molecular markers because of their hypervariable and co-dominant nature, making them useful both for genetic studies and for marker-assisted breeding. SSRs are widespread throughout the genome of an organism, so a large number of SSR datasets are available, most of them from either protein-coding regions or untranslated regions. Only recently has their occurrence within microRNA (miRNA) genes received attention. miRNAs are a class of non-coding RNAs (ncRNAs) with lengths varying from 19 to 22 nucleotides (nts), which play an important role in regulating gene expression in plants under different biotic and abiotic stresses. In this communication, we describe the results of a study in which miRNA-SSRs in full-length pre-miRNA sequences of Arabidopsis thaliana were mined. The sequences were retrieved using annotations available at EnsemblPlants and processed with the BatchPrimer3 server; the miRNA-SSR flanking primers were found to be well distributed. Our analysis shows that miRNA-SSRs are relatively rare in protein-coding regions but abundant in non-coding regions. All 147 observed di-, tri-, tetra-, penta- and hexanucleotide SSRs were located in non-coding regions across all 5 chromosomes of A. thaliana. While we confirm that miRNA-SSRs were commonly spread across the full-length pre-miRNAs, we envisage that such studies would allow us to identify new markers for breeding studies.
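
As a rough illustration of what mining di- to hexanucleotide SSRs from a sequence can look like, here is a minimal Python sketch based on regular expressions. It is not the authors' pipeline, which relied on the BatchPrimer3 server. The minimum repeat counts in `MIN_REPEATS`, the helper names and the toy sequence are all assumptions made for the example, since the abstract does not state the thresholds used.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
# The abstract does not give the values used in the study.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(motif):
    """Return False if the motif is itself a repeat of a shorter unit (e.g. ATAT)."""
    n = len(motif)
    return not any(
        motif == motif[:d] * (n // d) for d in range(1, n) if n % d == 0
    )


def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A captured motif of `unit` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(hits)


# Toy sequence (not real pre-miRNA data): an (AT)7 run and a (GAA)5 run.
toy = "GGC" + "AT" * 7 + "CCGTC" + "GAA" * 5 + "TTC"
print(find_ssrs(toy))
```

The primitive-motif check keeps a run such as ATATATAT from being reported a second time as a tetranucleotide repeat of ATAT. Start positions are 0-based.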

Structure and evolution of genes encoding polyubiquitin and ubiquitin-like proteins in Arabidopsis thaliana ecotype Columbia.

Genetics ◽

10.1093/genetics/139.2.921 ◽

1995 ◽

Vol 139 (2) ◽

pp. 921-939 ◽

Cited By ~ 5

Author(s):

J Callis ◽

T Carpenter ◽

C W Sun ◽

R D Vierstra

Keyword(s):

Arabidopsis Thaliana ◽

Amino Acid ◽

Tandem Repeats ◽

Synonymous Substitution ◽

Coding Region ◽

Coding Regions ◽

Isolation And Characterization ◽

Genes Encoding ◽

Polyubiquitin Gene ◽

Polyubiquitin Genes

Abstract The Arabidopsis thaliana ecotype Columbia ubiquitin gene family consists of 14 members that can be divided into three types of ubiquitin genes: polyubiquitin genes, ubiquitin-like genes and ubiquitin extension genes. The isolation and characterization of eight ubiquitin sequences, consisting of four polyubiquitin genes and four ubiquitin-like genes, are described here, and their relationships to each other and to previously identified Arabidopsis ubiquitin genes were analyzed. The polyubiquitin genes, UBQ3, UBQ10, UBQ11 and UBQ14, contain tandem repeats of the 228-bp ubiquitin coding region. Together with a previously described polyubiquitin gene, UBQ4, they differ in synonymous substitutions, number of ubiquitin coding regions, number and nature of nonubiquitin C-terminal amino acid(s) and chromosomal location, dividing into two subtypes: the UBQ3/UBQ4 and UBQ10/UBQ11/UBQ14 subtypes. Ubiquitin-like genes, UBQ7, UBQ8, UBQ9 and UBQ12, also contain tandem repeats of the ubiquitin coding region, but at least one repeat per gene encodes a protein with amino acid substitutions. Nucleotide comparisons, Ks value determinations and neighbor-joining analyses were employed to determine intra- and intergenic relationships. In general, the rate of synonymous substitution is too high to discern related repeats. Specific exceptions provide insight into gene relationships. The observed nucleotide relationships are consistent with previously described models involving gene duplications followed by both unequal crossing-over and gene conversion events.

USING DIT-FFT ALGORITHM FOR IDENTIFICATION OF PROTEIN CODING REGION IN EUKARYOTIC GENE

Biomedical Engineering Applications Basis and Communications ◽

10.4015/s1016237219500029 ◽

2019 ◽

Vol 31 (01) ◽

pp. 1950002

Author(s):

Subhajit Kar ◽

Madhabi Ganguly ◽

Saptarshi Das

Keyword(s):

Signal Processing ◽

Digital Signal Processing ◽

Dna Sequences ◽

Digital Signal ◽

Biological Properties ◽

Frequency Noise ◽

Numerical Representation ◽

Coding Region ◽

Protein Coding ◽

Coding Regions

Digital signal processing (DSP) has become a vital research platform in biomedical engineering for predicting protein-coding regions (exons) from genomic sequences with great accuracy. Protein-coding regions in DNA sequences can be located with the help of the period-3 property. The DFT algorithm is mostly used to detect the period-3 property, but in this paper we have tested the FFT algorithm instead. DSP is basically concerned with processing numerical sequences, so applying it to DNA sequence analysis requires converting the sequence of base characters into a numerical version. The numerical representation of DNA sequences strongly affects the biological properties reflected in the numerical signal. In this work, the proposed technique based on the DIT-FFT algorithm has been used to identify exonic regions, with an integer value representation used to transform the DNA sequences. Digital filters are used to read out period-3 components from the output spectrum and to eliminate unwanted high-frequency noise from DNA sequences. Overcoming background noise here means suppressing the non-coding regions, i.e., introns. The proposed algorithm is tested on four nucleotide sequences having single or multiple exons.
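
To make the period-3 idea concrete, the following is a minimal Python sketch of a radix-2 decimation-in-time (DIT) FFT applied to an integer-mapped DNA window. It is an illustration under stated assumptions and not the paper's implementation: the mapping in `BASE_TO_INT` is a hypothetical choice (the abstract does not list the integer values used), the window length is assumed to be a power of two, the period-3 component is read from the bin nearest N/3, and the digital filtering stage is omitted.

```python
import cmath

# Hypothetical integer mapping; the abstract does not specify the values used.
BASE_TO_INT = {"T": 0, "C": 1, "A": 2, "G": 3}


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or (n > 1 and n % 2):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_dit(x[0::2])  # transform of the even-indexed samples
    odd = fft_dit(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle            # butterfly, upper half
        out[k + n // 2] = even[k] - twiddle   # butterfly, lower half
    return out


def power_spectrum(seq):
    """Map bases to integers and return |X[k]|^2 for every frequency bin."""
    signal = [BASE_TO_INT[base] for base in seq.upper()]
    return [abs(c) ** 2 for c in fft_dit(signal)]


def period3_bin(n):
    """Index of the bin closest to frequency 1/3 for a window of n samples."""
    return round(n / 3)


# Toy window with a perfect 3-base periodicity (not a real gene).
window = ("ATG" * 22)[:64]
spectrum = power_spectrum(window)
print(period3_bin(64), spectrum[period3_bin(64)])
```

Because N/3 is not an integer when N is a power of two, the period-3 energy leaks across neighbouring bins. Reading the nearest bin is a simplification adopted here for brevity.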

The open targets post-GWAS analysis pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa020 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2936-2937 ◽

Cited By ~ 4

Author(s):

Gareth Peat ◽

William Jones ◽

Michael Nuhn ◽

José Carlos Marugán ◽

William Newell ◽

...

Keyword(s):

Drug Targets ◽

Gene Expression Regulation ◽

Association Studies ◽

Genome Wide Association Studies ◽

Protein Coding ◽

Data Resource ◽

Coding Regions ◽

Genome Wide ◽

Causal Genes ◽

Interactive Data

Abstract Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

Human DNA sequences homologous to a protein coding region conserved between homeotic genes of Drosophila

Cell ◽

10.1016/0092-8674(84)90261-7 ◽

1984 ◽

Vol 38 (3) ◽

pp. 667-673 ◽

Cited By ~ 121

Author(s):

Michael Levine ◽

Gerald M. Rubin ◽

Robert Tjian

Keyword(s):

Dna Sequences ◽

Homeotic Genes ◽

Coding Region ◽

Protein Coding ◽

Human Dna

Structure and expression of canary myc family genes

Molecular and Cellular Biology ◽

10.1128/mcb.11.3.1770-1776.1991 ◽

1991 ◽

Vol 11 (3) ◽

pp. 1770-1776

Author(s):

R G Collum ◽

D F Clayton ◽

F W Alt

Keyword(s):

Untranslated Region ◽

Untranslated Regions ◽

Coding Region ◽

Protein Coding ◽

Coding Regions ◽

Neuronal Precursors ◽

Myc Gene ◽

Mature Neurons

We found that the canary N-myc gene is highly related to mammalian N-myc genes in both the protein-coding region and the long 3' untranslated region. Examined coding regions of the canary c-myc gene were also highly related to their mammalian counterparts, but in contrast to N-myc, the canary and mammalian c-myc genes were quite divergent in their 3' untranslated regions. We readily detected N-myc and c-myc expression in the adult canary brain and found N-myc expression both at sites of proliferating neuronal precursors and in mature neurons.

Identification of the principal promoter sequence of the c-H-ras transforming oncogene: deletion analysis of the 5'-flanking region by focus formation assay

Molecular and Cellular Biology ◽

10.1128/mcb.7.8.2933-2940.1987 ◽

1987 ◽

Vol 7 (8) ◽

pp. 2933-2940

Author(s):

H Honkawa ◽

W Masahashi ◽

S Hashimoto ◽

T Hashimoto-Gotoh

Keyword(s):

Dna Sequences ◽

Promoter Sequence ◽

Ras Oncogene ◽

Coding Region ◽

Focus Formation ◽

S1 Nuclease ◽

Protein Coding ◽

Consensus Sequences ◽

Flanking Region ◽

Virus C

A number of deletion mutants were isolated, including 5', 3', and internal deletions in the 5'-flanking region of the human cellular oncogene related to the Harvey sarcoma virus (c-H-ras), and their transforming activities were examined in NIH 3T3 cells. DNA sequences which could not be deleted without losing transforming activity were localized to a relatively short stretch upstream of the region which showed homology to the 5'-flanking region of the v-H-ras oncogene. S1 nuclease analysis indicated that there were two clusters of mRNA start sites at positions that were about 1,371 and 1,298 base pairs upstream of the first coding ATG. The minimum region required for promoter function was estimated to be a 51-base-pair-long (or less) DNA segment. The promoter was GC rich (78%) and did not contain the consensus sequences that are usually observed in PolII-directed promoters but contained a GC box within which one of the mRNA start sites was included. In addition, two sets of positive and negative elements seemed to be located between the promoter and the protein-coding region, which appeared to influence positively and negatively, respectively, the efficiency of transformation with the c-H-ras oncogene.

Genome Size Estimation and Full-Length Transcriptome of Sphingonotus tsinlingensis: Genetic Background of a Drought-Adapted Grasshopper

Frontiers in Genetics ◽

10.3389/fgene.2021.678625 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lu Zhao ◽

Hang Wang ◽

Ping Li ◽

Kuo Sun ◽

De-Long Guan ◽

...

Keyword(s):

Genome Size ◽

Genetic Background ◽

Tandem Repeats ◽

Full Length ◽

Arid Environments ◽

Size Estimation ◽

Sequencing Data ◽

Protein Coding ◽

Grasshopper Species ◽

Hsp Genes

Sphingonotus Fieber, 1852 (Orthoptera: Acrididae), is a grasshopper genus comprising approximately 170 species, all of which prefer dry environments such as deserts, steppes, and stony benchlands. In this study, we aimed to examine the adaptation of grasshopper species to arid environments. The genome size of Sphingonotus tsinlingensis was estimated using flow cytometry, and the first high-quality full-length transcriptome of this species was produced. The genome size of S. tsinlingensis is approximately 12.8 Gb. Based on 146.98 Gb of PacBio sequencing data, 221.47 Mb of full-length transcripts were assembled. Among these, 88,693 non-redundant isoforms were identified with an N50 value of 2,726 bp, which was markedly longer than in previous grasshopper transcriptome assemblies. In total, 48,502 protein-coding sequences were identified, and 37,569 were annotated using public gene function databases. Moreover, 36,488 simple tandem repeats, 12,765 long non-coding RNAs, and 414 transcription factors were identified. According to gene functions, 61 cytochrome P450 (CYP450) and 66 heat shock protein (HSP) genes, which may be associated with drought adaptation of S. tsinlingensis, were identified. We compared the transcriptome of S. tsinlingensis with those of two other grasshopper species that are less tolerant to drought, namely Mongolotettix japonicus and Gomphocerus licenti. We observed that the expression of CYP450 and HSP genes was higher in S. tsinlingensis. We produced the first full-length transcriptome of a Sphingonotus species that has an ultra-large genome. The assembly characteristics were better than those of all known grasshopper transcriptomes. This full-length transcriptome may thus be used to understand the genetic background and evolution of grasshoppers.
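
For readers unfamiliar with the N50 statistic quoted in the abstract, here is a small Python sketch of how it is commonly computed from a list of sequence lengths. The function name and the toy lengths are illustrative assumptions. The study's own tooling is not described in the abstract.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy isoform lengths in bp (not data from the study).
print(n50([2, 3, 4, 5, 6]))
```

A higher N50 means that more of the assembled bases sit in longer sequences, which is why the abstract uses it to compare assemblies.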

A porcine brain-wide RNA editing landscape

10.21203/rs.3.rs-110949/v1 ◽

2020 ◽

Author(s):

Jinrong Huang ◽

Lin Lin ◽

Zhanying Dong ◽

Ling Yang ◽

Tianyu Zheng ◽

...

Keyword(s):

Rna Editing ◽

Repetitive Sequences ◽

Brain Regions ◽

Mammalian Brain ◽

Protein Coding ◽

Porcine Brain ◽

Coding Regions ◽

Pig Brain ◽

Genome Wide ◽

A Genome

Abstract Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by ADAR enzymes, is an essential post-transcriptional modification. Although hundreds of thousands of RNA editing sites have been reported in mammals, brain-wide analysis of RNA editing in the mammalian brain remains rare. Here, a genome-wide RNA editing investigation is performed in 119 samples, representing 30 anatomically defined subregions in the pig brain. We identify a total of 682,037 A-to-I RNA editing sites, of which 97% have not been identified before. Within the pig brain, the cerebellum and olfactory bulb are the regions with the most edited transcripts. The editing level of sites residing in protein-coding regions is similar across brain regions, whereas region-distinct editing is observed in repetitive sequences. Highly edited conserved recoding events in pig and human brain are found in neurotransmitter receptors, demonstrating the evolutionary importance of RNA editing in neurotransmission functions. The porcine brain-wide RNA landscape provides a rich resource to better understand the evolutionary importance of post-transcriptional RNA editing.

Coding and functional defect region prediction of placental protein in an embryo cell of first trimester using ANN approach

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.9.9756 ◽

2018 ◽

Vol 7 (1.9) ◽

pp. 167

Author(s):

Bipin Nair B J ◽

Rahul Reghunath

Keyword(s):

Dna Sequences ◽

Energy Levels ◽

Threshold Energy ◽

First Trimester ◽

Embryo Cell ◽

Cell Protein ◽

Coding Region ◽

Protein Coding ◽

Functional Region ◽

Functional Regions

Identifying the protein coding and functional regions in DNA sequences has become an exciting task in bioinformatics. In particular, the coding region has a 3-base periodicity, which helps in exon identification. Many signal processing tools and techniques have been successfully applied to this identification task, but there is still room for improvement. In our work, we employ an ANN classifier to predict the coding and functional regions of human embryo cell protein in the first trimester, and evaluate its performance by comparing the energy levels of the coding region. The results obtained from the threshold energy level are shown in a box plot, which is finally used to predict the mutation.

Multilocus Characterization, Gene Expression Analysis of Putative Immunodominant Protein Coding Regions, and Development of Recombinase Polymerase Amplification Assay for Detection of ‘Candidatus Phytoplasma Pruni’ in Prunus avium

Phytopathology ◽

10.1094/phyto-09-18-0326-r ◽

2019 ◽

Vol 109 (6) ◽

pp. 983-992 ◽

Cited By ~ 4

Author(s):

Dan Edward V. Villamor ◽

Kenneth C. Eastwell

Keyword(s):

High Throughput Sequencing ◽

Sweet Cherry ◽

Prunus Avium ◽

Protein A ◽

Recombinase Polymerase Amplification ◽

Sequencing Data ◽

Coding Region ◽

Protein Coding ◽

Coding Regions ◽

Reverse Transcription Pcr

Western X (WX) disease, caused by ‘Candidatus Phytoplasma pruni’, is a devastating disease of sweet cherry resulting in the production of small, bitter-flavored fruits that are unmarketable. Escalation of WX disease in Washington State prompted the development of a rapid detection assay based on recombinase polymerase amplification (RPA) to facilitate timely removal and replacement of diseased trees. Here, we report on a reliable RPA assay targeting putative immunodominant protein coding regions that showed comparable sensitivity to polymerase chain reaction (PCR) in detecting ‘Ca. Phytoplasma pruni’ from crude sap of sweet cherry tissues. Apart from the predominant strain of ‘Ca. Phytoplasma pruni’, the RPA assay also detected a novel strain of phytoplasma from several WX-affected trees. Multilocus sequence analyses using the immunodominant protein A (idpA), imp, rpoE, secY, and 16S ribosomal RNA regions from several ‘Ca. Phytoplasma pruni’ isolates from WX-affected trees showed that this novel phytoplasma strain represents a new subgroup within the 16SrIII group. Examination of high-throughput sequencing data from total RNA of WX-affected trees revealed that the imp coding region is highly expressed, and as supported by quantitative reverse transcription PCR data, it showed higher RNA transcript levels than the previously proposed idpA coding region of ‘Ca. Phytoplasma pruni’.
