Formation of human long intergenic non-coding RNA genes and pseudogenes: ancestral sequences are key players

bioRxiv ◽

10.1101/826784 ◽

2019 ◽

Author(s):

Nicholas Delihas

Keyword(s):

Protein Gene ◽

Repeat Sequence ◽

Chromosome 22 ◽

Segmental Duplications ◽

Repeat Sequences ◽

Ancestral Sequences ◽

Non Coding Rna ◽

Low Copy Repeats ◽

Genomic Element ◽

Rna Genes

Abstract Pathways leading to formation of non-coding RNA and protein genes are varied and complex. We report finding a highly conserved repeat sequence present in both human and chimpanzee genomes that appears to have originated from a common primate ancestor. This sequence is repeatedly copied in human chromosome 22 (chr22) low copy repeats (LCR22) or segmental duplications and forms twenty-one different genes, which include human long intergenic non-coding RNA (lincRNA) gene and pseudogene families, as well as the gamma-glutamyltransferase (GGT) protein gene family and the RNA pseudogenes that originate from GGT sequences. In sharp contrast, only predicted protein genes stem from the homologous repeat sequence present in chr22 of chimpanzee. The data point to an ancestral DNA sequence, highly conserved through evolution and duplicated in humans by chromosomal repeat sequences, that serves as a functional genomic element in the development of new and diverse genes in humans and chimpanzees.

Formation of human long intergenic non-coding RNA genes, pseudogenes, and protein genes: Ancestral sequences are key players

PLoS ONE ◽

10.1371/journal.pone.0230236 ◽

2020 ◽

Vol 15 (3) ◽

pp. e0230236 ◽

Cited By ~ 2

Author(s):

Nicholas Delihas

Keyword(s):

Ancestral Sequences ◽

Non Coding Rna ◽

Key Players ◽

Rna Genes

Formation of a Family of Long Intergenic Noncoding RNA Genes with an Embedded Translocation Breakpoint Motif in Human Chromosomal Low Copy Repeats of 22q11.2—Some Surprises and Questions

Non-Coding RNA ◽

10.3390/ncrna4030016 ◽

2018 ◽

Vol 4 (3) ◽

pp. 16 ◽

Cited By ~ 3

Author(s):

Nicholas Delihas

Keyword(s):

Noncoding Rna ◽

Gene Sequence ◽

Translocation Breakpoint ◽

Segmental Duplications ◽

New Genes ◽

Genetic Abnormalities ◽

Low Copy Repeats ◽

Long Intergenic Noncoding Rna ◽

And Function ◽

Rna Genes

A family of long intergenic noncoding RNA (lincRNA) genes, FAM230, is formed via gene sequence duplication, specifically in human chromosomal low copy repeats (LCR) or segmental duplications. This is the first group of lincRNA genes known to be formed by segmental duplications and is consistent with current views of evolution and the creation of new genes via DNA low copy repeats. It appears to be an efficient way to form multiple lincRNA genes. However, because these genes lie in the 22q11.2 region, a critical chromosomal region with respect to the incidence of abnormal translocations and resulting genetic abnormalities, and also carry a translocation breakpoint motif, several intriguing questions arise concerning the presence and function of the translocation breakpoint sequence in RNA genes situated in LCR22s.

Genesis of Non-Coding RNA Genes in Human Chromosome 22—A Sequence Connection with Protein Genes Separated by Evolutionary Time

Non-Coding RNA ◽

10.3390/ncrna6030036 ◽

2020 ◽

Vol 6 (3) ◽

pp. 36

Author(s):

Nicholas Delihas

Keyword(s):

Human Chromosome ◽

Noncoding Rna ◽

Nucleation Site ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Chromosome 22 ◽

Segmental Duplications ◽

Evolutionary Time ◽

Conserved Sequence ◽

Rna Genes

A small phylogenetically conserved sequence of 11,231 bp, termed FAM247, is repeated in human chromosome 22 by segmental duplications. This sequence forms part of diverse genes that span evolutionary time, the protein genes being the earliest as they are present in zebrafish and/or mice genomes, and the long noncoding RNA genes and pseudogenes the most recent as they appear to be present only in the human genome. We propose that the conserved sequence provides a nucleation site for new gene development at evolutionarily conserved chromosomal loci where the FAM247 sequences reside. The FAM247 sequence also carries information in its open reading frames that provides protein exon amino acid sequences; one exon plays an integral role in immune system regulation, specifically, the function of ubiquitin-specific protease (USP18) in the regulation of interferon. An analysis of this multifaceted sequence and the genesis of genes that contain it is presented.

Genesis of Non-coding RNA Genes- A Sequence Connection with Protein Genes Separated by Evolutionary Time

10.20944/preprints202007.0454.v1 ◽

2020 ◽

Author(s):

Nicholas Delihas

Keyword(s):

Nucleation Site ◽

Amino Acid Sequences ◽

Open Reading Frames ◽

Segmental Duplications ◽

Evolutionary Time ◽

Conserved Sequence ◽

Non Coding Rna ◽

Integral Role ◽

Specific Protease ◽

Rna Genes

A small phylogenetically conserved sequence of 11,231 bp, termed FAM247, is repeated in human chromosome 22 by segmental duplications. This sequence forms part of diverse genes that span evolutionary time, the protein genes being the earliest as they are present in zebrafish and/or mice genomes, and the long non-coding RNA genes and pseudogenes the most recent as they appear to be present only in the human genome. We propose that the conserved sequence provides a nucleation site for new gene development at evolutionarily conserved chromosomal loci where the FAM247 sequences reside. The FAM247 sequence also carries information in its open reading frames that provides protein exon amino acid sequences; one exon plays an integral role in immune system regulation, specifically, the function of ubiquitin-specific protease (USP18) in the regulation of interferon. An analysis of this multifaceted sequence and the genesis of genes that contain it is presented.

Formation of Large Palindromic DNA by Homologous Recombination of Short Inverted Repeat Sequences in Saccharomyces cerevisiae

Genetics ◽

10.1093/genetics/161.3.1065 ◽

2002 ◽

Vol 161 (3) ◽

pp. 1065-1075

Author(s):

David K Butler ◽

David Gillespie ◽

Brandi Steele

Keyword(s):

Homologous Recombination ◽

Inverted Repeat ◽

Excision Repair ◽

Strand Break ◽

Double Strand Break ◽

Repeat Sequence ◽

Inverted Repeat Sequence ◽

Repeat Sequences ◽

Dna Double Strand Break ◽

Intermolecular Reaction

Abstract Large DNA palindromes form sporadically in many eukaryotic and prokaryotic genomes and are often associated with amplified genes. The presence of a short inverted repeat sequence near a DNA double-strand break has been implicated in the formation of large palindromes in a variety of organisms. Previously we have established that in Saccharomyces cerevisiae a linear DNA palindrome is efficiently formed from a single-copy circular plasmid when a DNA double-strand break is introduced next to a short inverted repeat sequence. In this study we address whether the linear palindromes form by an intermolecular reaction (that is, a reaction between two identical fragments in a head-to-head arrangement) or by an unusual intramolecular reaction, as they apparently do in other examples of palindrome formation. Our evidence supports a model in which palindromes are primarily formed by an intermolecular reaction involving homologous recombination of short inverted repeat sequences. We have also extended our investigation into the requirement for DNA double-strand break repair genes in palindrome formation. We have found that a deletion of the RAD52 gene significantly reduces palindrome formation by intermolecular recombination and that deletions of two other genes in the RAD52-epistasis group (RAD51 and MRE11) have little or no effect on palindrome formation. In addition, palindrome formation is dramatically reduced by a deletion of the nucleotide excision repair gene RAD1.

Screening and survival analysis of melanoma immunodrug response-related genes and the function of magnetic nanoparticles in gene extraction

Materials Express ◽

10.1166/mex.2021.2037 ◽

2021 ◽

Vol 11 (8) ◽

pp. 1306-1312

Author(s):

Li Song ◽

Ningchao Du ◽

Haitao Luo ◽

Furong Li

Keyword(s):

Survival Analysis ◽

Magnetic Nanoparticles ◽

Drug Response ◽

High Throughput Sequencing ◽

Cox Proportional Hazards ◽

Sequencing Data ◽

Protein Coding ◽

Non Coding Rna ◽

Long Non Coding Rna ◽

Rna Genes

This study aimed to identify the association of protein-coding and long non-coding RNA genes with immunotherapy response in melanoma. Based on RNA sequencing data of melanoma specimens, the expression levels of protein-coding and long non-coding RNA genes were calculated using the Kallisto RNA-seq quantification method, and differentially expressed genes were detected using the DESeq2 method. Cox proportional hazards regression was used to evaluate the effects of gene expression on survival. According to the clinical data of 14 patients with drug response and 11 patients without drug response, 18 protein-coding genes and 14 long non-coding RNAs showed differential expression (fold change > 2 and P < 0.01 after correction), among which the differentially expressed coding genes were significantly enriched in the process of cell adhesion (P < 0.01). The results of survival analysis showed that 18 coding genes and 14 long non-coding RNA genes had significant effects on patient survival (P < 0.01). In this study, magnetic nanoparticles were used to extract genomic DNA and total RNA owing to their paramagnetism and biocompatibility, and transcriptome high-throughput sequencing was then performed. The method has the advantages of removing dangerous reagents such as phenol and chloroform, replacing inorganic coating such as silica with organic oil, and shortening reaction time. Protein-coding and long non-coding RNA genes as well as magnetic nanoparticles may serve as potential cancer immune biomarker targets for developing future oncological treatments.
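
The differential-expression cutoff quoted in the abstract above (a more than twofold difference and a corrected P below 0.01) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes the twofold criterion applies in either direction, i.e. an absolute log2 fold change above 1, and it uses a hypothetical function name, hypothetical field names (`gene`, `log2fc`, `padj`) and made-up gene labels.

```python
def select_differential(rows, min_abs_log2fc=1.0, max_padj=0.01):
    """Return names of genes passing both cutoffs.

    rows: iterable of dicts with hypothetical keys
      'gene'   - gene label
      'log2fc' - log2 fold change between the two patient groups
      'padj'   - multiple-testing-corrected P value, or None if unavailable
    Assumption: 'more than twofold' is read as |log2fc| > 1 (up or down).
    """
    selected = []
    for row in rows:
        if row["padj"] is None:
            continue  # no corrected P value, so the gene cannot be called
        if abs(row["log2fc"]) > min_abs_log2fc and row["padj"] < max_padj:
            selected.append(row["gene"])
    return selected


# Illustrative, made-up records (not data from the study).
example_rows = [
    {"gene": "geneA", "log2fc": 1.5, "padj": 0.001},
    {"gene": "geneB", "log2fc": -2.0, "padj": 0.005},
    {"gene": "geneC", "log2fc": 0.4, "padj": 0.0001},
    {"gene": "geneD", "log2fc": 3.0, "padj": 0.2},
    {"gene": "geneE", "log2fc": 2.0, "padj": None},
]
print(select_differential(example_rows))
```

Only records that clear both thresholds are kept; a large fold change with a weak corrected P value, or the reverse, is excluded.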

GENT-49. SYSTEMATIC IDENTIFICATION OF ESSENTIAL LONG NON-CODING RNA GENES IN GLIOBLASTOMA

Neuro-Oncology ◽

10.1093/neuonc/now212.354 ◽

2016 ◽

Vol 18 (suppl_6) ◽

pp. vi84-vi85

Author(s):

Siyuan Liu ◽

Max Horlbeck ◽

Seung Woo Cho ◽

Harjus Birk ◽

Martina Malatesta ◽

...

Keyword(s):

Non Coding Rna ◽

Systematic Identification ◽

Long Non Coding Rna ◽

Rna Genes

PSoL: a positive sample only learning algorithm for finding non-coding RNA genes

Bioinformatics ◽

10.1093/bioinformatics/btl441 ◽

2006 ◽

Vol 22 (21) ◽

pp. 2590-2596 ◽

Cited By ~ 56

Author(s):

C. Wang ◽

C. Ding ◽

R. F. Meraz ◽

S. R. Holbrook

Keyword(s):

Learning Algorithm ◽

Positive Sample ◽

Non Coding Rna ◽

Rna Genes

Landscape of Long Noncoding RNA Genes, Pseudogenes, and Protein Genes in Segmental Duplications in the Critical Human Chromosomal Region 22q11.2

RNA Technologies - The Chemical Biology of Long Noncoding RNAs ◽

10.1007/978-3-030-44743-4_6 ◽

2020 ◽

pp. 149-166

Author(s):

Nicholas Delihas

Keyword(s):

Long Noncoding Rna ◽

Noncoding Rna ◽

Chromosomal Region ◽

Segmental Duplications ◽

Rna Genes ◽

Human Chromosomal Region

Cross-Specificities Between cII-like Proteins and pRE-like Promoters of Lambdoid Bacteriophages

Genetics ◽

10.1093/genetics/115.4.597 ◽

1987 ◽

Vol 115 (4) ◽

pp. 597-604

Author(s):

Daniel L Wulff ◽

Michael E Mahoney

Keyword(s):

Base Change ◽

Repeat Sequence ◽

Sequence Information ◽

Multicopy Plasmid ◽

Recognition Sequence ◽

Repeat Sequences ◽

Activation Of Transcription ◽

Single Base Change ◽

Acid Alteration

ABSTRACT We have investigated the activation of transcription from the pRE promoters of phages λ, 21 and P22 by the λ and 21 cII proteins and the P22 c1 (cII-like) protein, using an in vivo system in which cII protein from a derepressed prophage activates transcription from a pRE DNA fragment on a multicopy plasmid. We find that each protein is highly specific for its own cognate pRE promoter, although measurable cross-reactions are observed. The primary recognition sequence for cII protein on λ pRE is a pair of TTGC repeat sequences in the sequence 5′-TTGCN6TTGC-3′ at the -35 region of the promoter. This same sequence is found in 21 pRE, while P22 pRE has the sequence 5′-TTGCN6TTGT-3′, which is the same as that of λctr1, a pRE+ variant of λ. λctr1 pRE is half as active as λ+ pRE when assayed with either the λ cII or the P22 c1 proteins. Therefore, the single base change in the P22 repeat sequence cannot explain why the P22 c1 protein is much more active with P22 pRE than λ pRE. The dya5 mutation, a G→A change at position -43 of pRE, makes pRE a stronger promoter when assayed with either the λ or 21 cII proteins or the P22 c1 protein. We conclude that efficient activation of a cII-dependent promoter by a cII protein requires sequence information in addition to the TTGC repeat sequences. We do not know the characteristics of the proteins which are responsible for the specificity of each protein for its own cognate promoter. However, λdya8, which has a Glu27→Lys alteration in the λ cII protein and a cII+ phenotype, results in a mutant cII protein that is much more highly specific than wild-type cII protein for its own cognate λ pRE promoter. This is especially remarkable because the dya8 amino acid alteration makes the helix-2 region (the region of the protein predicted to make contact with the phosphodiester backbone of the DNA) of λ cII protein conform exactly with the helix-2 region of the P22 c1 protein in both charge and charge distribution.
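
The recognition sequences named in the abstract above, 5′-TTGCN6TTGC-3′ (λ and 21) and 5′-TTGCN6TTGT-3′ (P22), can be located in a DNA string with a short regular expression. The Python sketch below is only an illustration under stated assumptions: N is taken to mean any of A, C, G or T, only the given strand is scanned, and the function name and test strings are hypothetical rather than real phage sequence.

```python
import re

# TTGC, six arbitrary bases, then TTGC or TTGT.
# The lookahead lets overlapping sites be reported as well.
PRE_MOTIF = re.compile(r"(?=(TTGC[ACGT]{6}TTG[CT]))")


def find_cii_sites(seq):
    """Return (0-based start, matched text) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in PRE_MOTIF.finditer(seq.upper())]


# Made-up test strings, not real promoter sequences.
print(find_cii_sites("AATTGCAAAAAATTGCGG"))  # lambda/21-style repeat pair
print(find_cii_sites("ttgcacgtacttgt"))      # P22-style repeat pair, lower case
```

A match shows only that the repeat pair is present. The abstract itself concludes that efficient activation requires sequence information beyond the TTGC repeats, so a hit should not be read as a prediction of promoter activity.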
