SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

Mapping Intimacies ◽

10.1101/420307 ◽

2018 ◽

Author(s):

Safa Jammali ◽

Jean-David Aguilar ◽

Esaie Kuitche ◽

Aïda Ouangraoua

Keyword(s):

Gene Family ◽

Dna Sequences ◽

Cdna Sequence ◽

Genomic Sequence ◽

Sequence Similarity ◽

Basic Step ◽

Alignment Algorithms ◽

Spliced Alignment ◽

Gene Structures ◽

Family Based

Abstract Motivation: The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, constitutes a promising, yet unexplored, approach for the identification of splicing orthology relationships. Existing spliced alignment algorithms do not exploit the information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequences. Yet this information is often available for coding DNA sequences (CDS) and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) for computing the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then inferring transcript splicing orthology groups in a gene family based on spliced alignments. Results: The experimental results show that SFA outperforms existing spliced alignment methods in terms of accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high for various levels of sequence similarity between input sequences, thanks to accounting for the splicing structure of the input sequences. It is important to note that, unlike all current spliced alignment methods, which are meant for cDNA-to-genome alignments and can be used for CDS-to-gene alignments, SFA is the first method specifically designed for CDS-to-gene alignments. We show its usefulness for the comparison of genes and transcripts within a gene family for the purpose of analyzing splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses. Availability: SplicedFamAlign was implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/[email protected]

Sequence and phylogenetic analysis of the SNF4/AMPK gamma subunit gene from Drosophila melanogaster

Genome ◽

10.1139/g99-059 ◽

1999 ◽

Vol 42 (6) ◽

pp. 1077-1087 ◽

Cited By ~ 1

Author(s):

Erin N Yoshida ◽

Bernhard F Benkel ◽

Ying Fong ◽

Donal A Hickey

Keyword(s):

Phylogenetic Analysis ◽

Drosophila Melanogaster ◽

Gene Family ◽

Genomic Sequence ◽

Sequence Similarity ◽

Energy Levels ◽

Metabolic Stress ◽

Gene Copy ◽

Gamma Subunit ◽

Snf1 Kinase

To optimize gene expression under different environmental conditions, many organisms have evolved systems which can quickly up- and down-regulate the activity of other genes. Recently, the SNF1 kinase complex from yeast and the AMP-activated protein kinase complex from mammals have been shown to represent homologous metabolic sensors that are key to regulating energy levels under times of metabolic stress. Using heterologous probing, we have cloned the Drosophila melanogaster homologue of SNF4, the noncatalytic effector subunit from this kinase complex. A sequence corresponding to the partial genomic sequence as well as the full-length cDNA was obtained, and shows that the D. melanogaster SNF4 is encoded in a 1944-bp cDNA representing a protein of 648 amino acids (aa). Southern analysis of Drosophila genomic DNA in concert with a survey of mammalian SNF4 ESTs indicates that in metazoans, SNF4 is a duplicated gene, and possibly even a larger gene family. We propose that one gene copy codes for a short (330 aa) protein, whereas the second locus codes for a longer version (<410 aa) that is extended at the carboxy terminus, as typified by the Drosophila homologue presented here. Phylogenetic analysis of yeast, invertebrate, and multiple mammalian isoforms of SNF4 shows that the gene duplication likely occurred early in the metazoan lineage, as the protein products of the different loci are relatively divergent. When the phylogeny was extended beyond the SNF4 gene family, SNF4 shares sequence similarity with other cystathionine-β-synthase domain-containing proteins, including IMP dehydrogenase and a variety of uncharacterized Methanococcus proteins. Key words: SNF4, AMPK gamma subunit, derepression, gene family, phylogeny.

Research note: Large gene family of phosphoenolpyruvate carboxylase in the crassulacean acid metabolism plant Kalanchoe pinnata (Crassulaceae) characterised by partial cDNA sequence analysis

Functional Plant Biology ◽

10.1071/fp05079 ◽

2005 ◽

Vol 32 (5) ◽

pp. 467 ◽

Cited By ~ 12

Author(s):

Hans H. Gehrig ◽

Joshua A. Wood ◽

Mary Ann Cushman ◽

Aurelio Virgo ◽

John C. Cushman ◽

...

Keyword(s):

Gene Family ◽

Phosphoenolpyruvate Carboxylase ◽

Crassulacean Acid Metabolism ◽

Cdna Sequence ◽

Acid Metabolism ◽

Sequence Similarity ◽

Vascular Plant ◽

Polymorphism Analysis ◽

Kalanchoe Pinnata ◽

Cam Plant

Clones coding for a 1100-bp cDNA sequence of phosphoenolpyruvate carboxylase (PEPC) of the constitutive crassulacean acid metabolism (CAM) plant Kalanchoe pinnata (Lam.) Pers. were isolated by reverse transcription-polymerase chain reaction (RT–PCR) and characterised by restriction fragment length polymorphism analysis and DNA sequencing. Seven distinct PEPC isogenes were recovered, four in leaves and three in roots (EMBL accession numbers: AJ344052–AJ344058). Sequence similarity comparisons and distance neighbour-joining calculations separate the seven PEPC isoforms into two clades, one of which contains the three PEPCs found in roots. The second clade contains the four isoforms found in leaves and is divided into two branches, one of which contains two PEPCs most similar to previously described CAM isoforms. Of these two isoforms, however, only one exhibited abundant expression in CAM-performing leaves, but not in very young leaves, which do not exhibit CAM, suggesting this isoform encodes a CAM-specific PEPC. Protein sequence calculations suggest that all isogenes are likely derived from a common ancestor gene, presumably by serial gene duplication events. To our knowledge, this is the most comprehensive identification of a PEPC gene family from a CAM plant, and the greatest number of PEPC isogenes reported for any vascular plant to date.

Genome-Wide Analysis of the β-Amylase Gene Family in Brassica L. Crops and Expression Profiles of BnaBAM Genes in Response to Abiotic Stresses

Agronomy ◽

10.3390/agronomy10121855 ◽

2020 ◽

Vol 10 (12) ◽

pp. 1855

Author(s):

Dan Luo ◽

Ziqi Jia ◽

Yong Cheng ◽

Xiling Zou ◽

Yan Lv

Keyword(s):

Gene Family ◽

Abiotic Stresses ◽

Sequence Similarity ◽

Expression Profiles ◽

Expression Patterns ◽

Chromosomal Distribution ◽

Brassica Napus L ◽

Oil Crops ◽

Genome Wide ◽

Gene Structures

The β-amylase (BAM) gene family, known for its catalytic ability to hydrolyze starch to maltose units, has been recognized to play critical roles in metabolism and gene regulation. To date, BAM genes have not been characterized in oil crops. In this study, a genome-wide survey identified 30 BnaBAM genes in Brassica napus L. (B. napus L.), 11 BraBAM genes in Brassica rapa L. (B. rapa L.), and 20 BoBAM genes in Brassica oleracea L. (B. oleracea L.), which were divided into four subfamilies according to sequence similarity and phylogenetic relationships. All the BAM genes identified in the allotetraploid genome of B. napus, as well as in its two parental-related species (B. rapa and B. oleracea), were analyzed for gene structure, chromosomal distribution and collinearity. Sequence alignment of the core glucosyl-hydrolase domains was further applied, indicating six candidate β-amylases (BnaBAM1, BnaBAM3.1-3.4 and BnaBAM5) and 25 β-amylase-like proteins. The current results also showed that 30 BnaBAMs, 11 BraBAMs and 17 BoBAMs exhibited uneven distribution on chromosomes of Brassica L. crops. The similar structural compositions of BAM genes in the same subfamily suggested that they were relatively conserved. Abiotic stresses pose one of the significant constraints to plant growth and productivity worldwide. Thus, the responsiveness of BnaBAM genes under abiotic stresses was analyzed in B. napus. The expression patterns revealed a stress-responsive behaviour of all members, of which BnaBAM3s were more prominent. These differential expression patterns suggested an intricate regulation of BnaBAMs elicited by environmental stimuli. Altogether, the present study provides first insights into the BAM gene family of Brassica crops, which lays the foundation for investigating the roles of stress-responsive BnaBAM candidates in B. napus.

Pl-Bh, an anthocyanin regulatory gene of maize that leads to variegated pigmentation.

Genetics ◽

10.1093/genetics/135.2.575 ◽

1993 ◽

Vol 135 (2) ◽

pp. 575-588 ◽

Cited By ~ 4

Author(s):

S M Cocciolone ◽

K C Cone

Keyword(s):

Gene Family ◽

Dna Sequences ◽

Sequence Similarity ◽

Regulatory Gene ◽

Regulatory Genes ◽

Anthocyanin Synthesis ◽

Mrna Levels ◽

Wild Type ◽

Specific Expression ◽

Organ Specific

Abstract Anthocyanins are purple pigments that can be produced in virtually all parts of the maize plant. The spatial distribution of anthocyanin synthesis is dictated by the organ-specific expression of a few regulatory genes that control the transcription of the structural genes. The regulatory genes are grouped into families based on functional identity and DNA sequence similarity. The C1/Pl gene family consists of C1, which controls pigmentation of the kernel, and Pl, which controls pigmentation of the vegetative and floral organs. We have determined the relationship of another gene, Blotched (Bh), to the C1 gene family. Bh was originally described as a gene that conditions blotches of pigmentation in kernels homozygous for recessive c1, suggesting that Bh could functionally replace C1 in the kernel. Our genetic and molecular analyses indicate that Bh is an allele of Pl, which we designate Pl-Bh. Pl-Bh differs from wild-type Pl alleles in two respects. In contrast to the uniform pigmentation observed in plants carrying Pl, the pattern of pigmentation in plants carrying Pl-Bh is variegated. Pl-Bh leads to variegated pigmentation in virtually all tissues of the plant, including the kernel, an organ not pigmented by other Pl alleles. To address the molecular basis for the unusual pattern of expression of Pl-Bh, we cloned and sequenced the gene. The nucleotide sequence of Pl-Bh showed only a single base-pair difference from that of Pl. However, genomic DNA sequences associated with Pl-Bh were found to be hypermethylated relative to the same sequences around the wild-type Pl allele. The methylation was inversely correlated with Pl mRNA levels in variegated plant tissues. Thus, we conclude that DNA methylation may play a role in regulating Pl-Bh expression.

The β-Amylase Gene Family in Brassica Napus: Genome Wide Analysis and Expression Profiles in Response to Abiotic Stresses

10.21203/rs.3.rs-70066/v1 ◽

2020 ◽

Author(s):

Yan Lv ◽

Dan Luo ◽

Ziqi Jia ◽

Yong Cheng ◽

Xiling Zou

Keyword(s):

Brassica Napus ◽

Gene Family ◽

Abiotic Stresses ◽

Sequence Similarity ◽

Expression Profiles ◽

Expression Patterns ◽

Chromosomal Distribution ◽

Brassica Crops ◽

Genome Wide ◽

Gene Structures

Abstract Background: The β-amylase (BAM) gene family, known for its catalytic ability to hydrolyze starch to maltose units, has been recognized to play critical roles in metabolism and gene regulation. To date, BAM genes have not been characterized in oil crops. Results: In this study, a genome-wide survey identified 30 BnaBAM genes in Brassica napus (B. napus), 11 BraBAM genes in Brassica rapa (B. rapa), and 20 BoBAM genes in Brassica oleracea (B. oleracea), which were divided into 4 subfamilies according to sequence similarity and phylogenetic relationships. All the BAM genes identified in the allotetraploid genome of B. napus, as well as in its two parental-related species (B. rapa and B. oleracea), were analyzed for gene structure, chromosomal distribution and collinearity; sequence alignment of the core glucosyl hydrolase domains was further applied. 30 BnaBAMs, 11 BraBAMs and 17 BoBAMs exhibited uneven distribution on chromosomes of Brassica crops. The similar structural compositions of BAM genes in the same subfamily suggested that they were relatively conserved. Abiotic stresses pose one of the major constraints to plant growth and productivity worldwide. Thus, the responsiveness of BnaBAM genes under abiotic stresses was analyzed in B. napus. The expression patterns revealed a stress-responsive behavior of all members, of which BnaBAM3s were more prominent. These differential expression patterns suggested an intricate regulation of BnaBAMs elicited by environmental stimuli. Conclusion: Altogether, the present study provides first insights into the BAM gene family of Brassica crops, which lays the foundation for investigating the roles of stress-responsive BnaBAM candidates in B. napus.

Meiotic Recombination Between Paralogous RBCSB Genes on Sister Chromatids of Arabidopsis thaliana

Genetics ◽

10.1093/genetics/166.2.947 ◽

2004 ◽

Vol 166 (2) ◽

pp. 947-957 ◽

Cited By ~ 1

Author(s):

John G Jelesko ◽

Kristy Carter ◽

Whitney Thompson ◽

Yuki Kinoshita ◽

Wilhelm Gruissem

Keyword(s):

Gene Cluster ◽

Dna Sequences ◽

Meiotic Recombination ◽

Sequence Similarity ◽

Specific Gene ◽

High Sequence Similarity ◽

Paralogous Genes ◽

Chimeric Genes ◽

Unequal Recombination ◽

Sister Chromatids

Abstract Paralogous genes organized as a gene cluster can rapidly evolve by recombination between misaligned paralogs during meiosis, leading to duplications, deletions, and novel chimeric genes. To model unequal recombination within a specific gene cluster, we utilized a synthetic RBCSB gene cluster to isolate recombinant chimeric genes resulting from meiotic recombination between paralogous genes on sister chromatids. Several F1 populations hemizygous for the synthRBCSB1 gene cluster gave rise to Luc+ F2 plants at frequencies ranging from 1 to 3 × 10⁻⁶. A nonuniform distribution of recombination resolution sites resulted in the biased formation of recombinant RBCS3B/1B::LUC genes with nonchimeric exons. The positioning of approximately half of the mapped resolution sites was effectively modeled by the fractional length of identical DNA sequences. In contrast, the other mapped resolution sites fit an alternative model in which recombination resolution was stimulated by an abrupt transition from a region of relatively high sequence similarity to a region of low sequence similarity. Thus, unequal recombination between paralogous RBCSB genes on sister chromatids created an allelic series of novel chimeric genes that effectively resulted in the diversification rather than the homogenization of the synthRBCSB1 gene cluster.

Characterization of a gene family abundantly expressed in Oenothera organensis pollen that shows sequence similarity to polygalacturonase.

The Plant Cell ◽

10.1105/tpc.2.3.263 ◽

1990 ◽

Vol 2 (3) ◽

pp. 263-274 ◽

Cited By ~ 93

Author(s):

S M Brown ◽

M L Crouch

Keyword(s):

Gene Family ◽

Sequence Similarity

Characterization of Salt Overly Sensitive 1 (SOS1) gene homoeologs in quinoa (Chenopodium quinoa Willd.)

Genome ◽

10.1139/g09-041 ◽

2009 ◽

Vol 52 (7) ◽

pp. 647-657 ◽

Cited By ~ 60

Author(s):

P. J. Maughan ◽

T. B. Turner ◽

C. E. Coleman ◽

D. B. Elzinga ◽

E. N. Jellen ◽

...

Keyword(s):

Salt Tolerance ◽

Dna Sequences ◽

Genomic Sequence ◽

Chenopodium Quinoa ◽

Fish Analysis ◽

Cdna Sequences ◽

Grain Crop ◽

High Level ◽

Salt Overly Sensitive

Salt tolerance is an agronomically important trait that affects plant species around the globe. The Salt Overly Sensitive 1 (SOS1) gene encodes a plasma membrane Na+/H+ antiporter that plays an important role in germination and growth of plants in saline environments. Quinoa (Chenopodium quinoa Willd.) is a halophytic, allotetraploid grain crop of the family Amaranthaceae with impressive nutritional content and an increasing worldwide market. Many quinoa varieties have considerable salt tolerance, and research suggests quinoa may utilize novel mechanisms to confer salt tolerance. Here we report the cloning and characterization of two homoeologous SOS1 loci (cqSOS1A and cqSOS1B) from C. quinoa, including full-length cDNA sequences, genomic sequences, relative expression levels, fluorescent in situ hybridization (FISH) analysis, and a phylogenetic analysis of SOS1 genes from 13 plant taxa. The cqSOS1A and cqSOS1B genes each span 23 exons spread over 3477 bp and 3486 bp of coding sequence, respectively. These sequences share a high level of similarity with SOS1 homologs of other species and contain two conserved domains, a Nhap cation-antiporter domain and a cyclic-nucleotide binding domain. Genomic sequence analysis of two BAC clones (98 357 bp and 132 770 bp) containing the homoeologous SOS1 genes suggests possible conservation of synteny across the C. quinoa sub-genomes. This report represents the first molecular characterization of salt-tolerance genes in a halophytic species in the Amaranthaceae as well as the first comparative analysis of coding and non-coding DNA sequences of the two homoeologous genomes of C. quinoa.

Leveraging X-Ray Structural Information in Gene Family-Based Drug Discovery

Spectral Techniques In Proteomics ◽

10.1201/9781420017090.ch18 ◽

2007 ◽

pp. 373-390

Author(s):

Alex Aronov ◽

Al Pierce ◽

Guy Bemis ◽

Marc Jacobs ◽

Harmon Zuccola ◽

...

Keyword(s):

Drug Discovery ◽

Gene Family ◽

Structural Information ◽

X Ray ◽

Family Based

A spontaneous chromosomal amplification of the ADH2 gene in Saccharomyces cerevisiae.

Genetics ◽

10.1093/genetics/130.2.263 ◽

1992 ◽

Vol 130 (2) ◽

pp. 263-271

Author(s):

C E Paquin ◽

M Dorsey ◽

S Crable ◽

K Sprinkel ◽

M Sondej ◽

...

Keyword(s):

Dna Sequences ◽

Sequence Similarity ◽

Resistant Mutant ◽

Yeast Genome ◽

Crossing Over ◽

Antimycin A ◽

Extra Copy ◽

Multiple Copies ◽

Or Gene ◽

Functional Copy

Abstract A spontaneous antimycin A-resistant mutant carrying approximately four extra copies of ADH2 on chromosome XII was isolated from yeast strain 315-1D, which lacks a functional copy of ADH1 and thus is antimycin A-sensitive. The additional copies of the normally glucose-repressed ADH2 are expressed during growth on glucose, accounting for the antimycin A resistance. These extra copies are inserted into nonadjacent ribosomal DNA sequences (rDNA) near the recombination-stimulating sequence HOT1. Each extra copy of the ADH2 gene (1548 bp) replaces most of the 37S transcript (approximately 7400 bp) in one of the approximately 200 copies of the rDNA present in the yeast genome. All four extra copies of ADH2 are lost at a rate of approximately 1 × 10⁻⁵ deletions per cell per generation. One of the joints between the rDNA and ADH2 DNA is located 7 nucleotides downstream from 20 adenine residues in the normal copy of ADH2. This joint occurs at the end of a stretch of 16-29 thymidines in the rDNA which has been expanded to 57-59 thymidines. The other novel joint is located in a short region of sequence similarity between ADH2 and the rDNA. These observations suggest that amplification of ADH2 was a two-step process: first the ADH2 gene was inserted into the rDNA, then multiple copies were generated by unequal crossing over or gene conversion within the rDNA.
