Identification and qualification of 500 nuclear, single-copy, orthologous genes for the Eupulmonata (Gastropoda) using transcriptome sequencing and exon-capture

Mapping Intimacies ◽

10.1101/035543 ◽

2016 ◽

Author(s):

Luisa C. Teasdale ◽

Frank Köehler ◽

Kevin D. Murray ◽

Tim O’Hara ◽

Adnan Moussalli

Keyword(s):

Gene Tree ◽

Single Copy ◽

Land Snails ◽

Data Matrix ◽

Probe Design ◽

Protein Coding ◽

Gene Set ◽

Exon Capture ◽

Gene Data ◽

Automated Methods

ABSTRACT The qualification of orthology is a significant challenge when developing large, multilocus phylogenetic datasets from assembled transcripts. Transcriptome assemblies have various attributes, such as fragmentation, frameshifts, and mis-indexing, which pose problems for automated methods of orthology assessment. Here, we identify a set of orthologous single-copy genes from transcriptome assemblies for the land snails and slugs (Eupulmonata) using a thorough approach to orthology determination involving manual alignment curation, gene tree assessment and sequencing from genomic DNA. We qualified the orthology of 500 nuclear, protein-coding genes from the transcriptome assemblies of 21 eupulmonate species to produce the most complete gene data matrix for a major molluscan lineage to date, both in terms of taxon and character completeness. Exon-capture targeting 490 of the 500 genes (those with at least one exon > 120 bp) from 22 species of Australian Camaenidae successfully captured sequences of 2,825 exons (representing all targeted genes), with only a 3.7% reduction in the data matrix due to the presence of putative paralogs or pseudogenes. The automated pipeline Agalma retrieved the majority of the manually qualified 500 single-copy gene set and identified a further 375 putative single-copy genes, although it failed to account for fragmented transcripts, resulting in lower data matrix completeness. This could potentially explain the minor inconsistencies we observed in the supported topologies for the 21 eupulmonate species between the manually curated and Agalma-equivalent datasets (sharing 458 genes). Overall, our study confirms the utility of the 500 gene set to resolve phylogenetic relationships at a broad range of evolutionary depths, and highlights the importance of addressing fragmentation at the homolog alignment stage for probe design.

Complete Plastome of Three Korean Asarum (Aristolochiaceae): Confirmation Tripartite Structure within Korean Asarum and Comparative Analyses

Plants ◽

10.3390/plants10102056 ◽

2021 ◽

Vol 10 (10) ◽

pp. 2056

Author(s):

Mi-Jeong Yoo ◽

Dong-Pil Jin ◽

Hyun-Oh Lee ◽

Chae Eun Lim

Keyword(s):

Ornamental Plants ◽

Illumina Miseq ◽

Single Copy ◽

Data Set ◽

Protein Coding ◽

Morphological Variations ◽

Oxford Nanopore ◽

Plastid Protein ◽

Gene Data ◽

Small Single Copy

The genus Asarum (Aristolochiaceae) is a well-known resource of medicinal and ornamental plants. However, the taxonomy of Korean Asarum is ambiguous due to their considerable morphological variations. Previously, a unique plastome structure has been reported from this genus. Therefore, we investigated the structural change in the plastomes within three Korean Asarum species and inferred their phylogenetic relationships. The plastome sizes of Asarum species assembled here range from 190,168 to 193,356 bp, which is longer than a typical plastome (160 kb). This is due to the incorporation and duplication of the small single copy into the inverted repeat, which resulted in a unique tripartite structure. We first verified this unique structure using the Illumina MiSeq and Oxford Nanopore MinION platforms. We also investigated the phylogeny of 26 Aristolochiaceae species based on 79 plastid protein-coding genes, which supports the monophyly of Korean Asarum species. Although the 79 plastid protein-coding gene data set showed some limitations in supporting the previous classification, it was effective in delineating some sections and species. Thus, it can serve as an effective tool for resolving species-level phylogeny in Aristolochiaceae. Last, we evaluated variable sites and simple sequence repeats in the plastome as potential molecular markers for species delimitation.

Gene markers for exon capture and phylogenomics in ray-finned fishes

10.1101/180786 ◽

2017 ◽

Cited By ~ 2

Author(s):

Jiamei Jiang ◽

Hao Yuan ◽

Xin Zheng ◽

Qian Wang ◽

Ting Kuang ◽

...

Keyword(s):

Target Gene ◽

Single Copy ◽

A Posteriori ◽

Protein Coding ◽

Gene Markers ◽

Taxonomic Level ◽

Exon Capture ◽

Phylogenomic Study ◽

Generation Sequencing ◽

Selection Of

Abstract Gene capture coupled with next-generation sequencing has become one of the favored methods for subsampling genomes in phylogenomic studies. Many target gene markers have been developed in plants, sharks, frogs, reptiles and others, but few have been reported in the ray-finned fishes. Here, we identified a suite of “single-copy” protein coding sequence (CDS) markers by comparing eight fish genomes, and tested them empirically in 83 species (33 families and 11 orders) of ray-finned fishes. Sorting the markers according to their completeness and phylogenetic decisiveness in the taxa tested resulted in a selection of 4,434 markers, which proved useful in reconstructing phylogenies of the ray-finned fishes at different taxonomic levels. We also propose a strategy for refining bait (probe) design a posteriori based on empirical data. The markers that we have developed may fill a gap in the tool kit of phylogenomic study in vertebrates.

Development and evaluation of a custom bait design based on 469 single-copy protein-coding genes for exon capture of isopods (Philosciidae: Haloniscus)

PLoS ONE ◽

10.1371/journal.pone.0256861 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0256861

Author(s):

Danielle N. Stringer ◽

Terry Bertozzi ◽

Karen Meusemann ◽

Steven Delean ◽

Michelle T. Guzik ◽

...

Keyword(s):

Sequence Data ◽

Methodological Approach ◽

Cost Effective ◽

Single Copy ◽

Model Organisms ◽

Protein Coding ◽

Genetics Research ◽

Exon Capture ◽

High Uniformity ◽

Multiple Samples

Transcriptome-based exon capture approaches, along with next-generation sequencing, are allowing for the rapid and cost-effective production of extensive and informative phylogenomic datasets from non-model organisms for phylogenetics and population genetics research. These approaches generally employ a reference genome to infer the intron-exon structure of targeted loci and preferentially select longer exons. However, in the absence of an existing and well-annotated genome, we applied this exon capture method directly, without initially identifying intron-exon boundaries for bait design, to a group of highly diverse Haloniscus (Philosciidae), paraplatyarthrid and armadillid isopods, and examined the performance of our methods and bait design for phylogenetic inference. Here, we identified an isopod-specific set of single-copy protein-coding loci, and a custom bait design to capture targeted regions from 469 genes, and analysed the resulting sequence data with a mapping approach and newly-created post-processing scripts. We effectively recovered a large and informative dataset comprising both short (<100 bp) and longer (>300 bp) exons, with high uniformity in sequencing depth. We were also able to successfully capture exon data from up to 16-year-old museum specimens along with more distantly related outgroup taxa, and efficiently pool multiple samples prior to capture. Our well-resolved phylogenies highlight the overall utility of this methodological approach and custom bait design, which offer enormous potential for application to future isopod, as well as broader crustacean, molecular studies.

Sequencing of Organellar Genomes of Nowellia curvifolia (Cephaloziaceae Jungermanniales) Revealed the Smallest Plastome with Complete Gene Set and High Intraspecific Variation Suggesting Cryptic Speciation

Diversity ◽

10.3390/d13020081 ◽

2021 ◽

Vol 13 (2) ◽

pp. 81

Author(s):

Jakub Sawicki ◽

Katarzyna Krawczyk ◽

Monika Ślipiko ◽

Monika Szczecińska

Keyword(s):

Cryptic Speciation ◽

Editing Event ◽

Protein Coding ◽

Gene Set ◽

Protein Coding Genes ◽

Organellar Genomes ◽

The Family ◽

Leafy Liverwort ◽

Aneura Mirabilis ◽

First Time

The leafy liverwort Nowellia curvifolia is a widespread Holarctic species belonging to the family Cephaloziaceae. Newly sequenced, assembled and annotated organellar genomes of two European specimens revealed the structure typical for liverworts, and also provided new insights into its microevolution. The plastome of N. curvifolia is the second smallest among photosynthetic liverworts, with the shortest known inverted repeats. Moreover, it is the smallest liverwort plastome with a complete gene set, since the two smaller genomes of Aneura mirabilis and Cololejeunea lanciloba are missing six and four protein-coding genes, respectively. The reduction of plastome size in leafy liverworts seems to be driven mainly by deletions within a specific region between the psbA and psbD genes. The comparative intraspecific analysis revealed single SNP differences among European individuals and only 35 mutations differentiating European and North American specimens. However, data from an Asian specimen enabled the identification of 1335 SNPs in plastid protein-coding genes, suggesting advanced cryptic speciation within N. curvifolia or the presence of undescribed morphospecies in Asia. Newly sequenced mitogenomes from European specimens revealed gene content and structure identical to those previously published, and low intercontinental differentiation limited to one substitution and three indels. The RNA-seq based RNA editing analysis revealed 17 and 127 edited sites in the plastome and mitogenome, respectively, including one non-canonical editing event in the plastid chiL gene. U-to-C editing is common in non-seed plants, but this is the first report of it in a liverwort plastome.

The Chloroplast Phylogenomics and Systematics of Zoysia (Poaceae)

Plants ◽

10.3390/plants10081517 ◽

2021 ◽

Vol 10 (8) ◽

pp. 1517

Author(s):

Se-Hwan Cheon ◽

Min-Ah Woo ◽

Sangjin Jo ◽

Young-Kee Kim ◽

Ki-Joong Kim

Keyword(s):

Northeast Asia ◽

Single Copy ◽

Rrna Genes ◽

Bootstrap Support ◽

Trna Genes ◽

Protein Coding ◽

Tropical Regions ◽

Relationship Of ◽

The Relationship ◽

Simple Sequence

The genus Zoysia Willd. (Chloridoideae) is widely distributed from the temperate regions of Northeast Asia (including China, Japan, and Korea) to the tropical regions of Southeast Asia. Among these, four species (Zoysia japonica Steud., Zoysia sinica Hance, Zoysia tenuifolia Thiele, and Zoysia macrostachya Franch. & Sav.) are naturally distributed in the Korean Peninsula. In this study, we report the complete plastome sequences of these Korean Zoysia species (NCBI acc. nos. MF953592, MF967579~MF967581). The length of Zoysia plastomes ranges from 135,854 to 135,904 bp, and the plastomes have a typical quadripartite structure, which consists of a pair of inverted repeat regions (20,962~20,966 bp) separated by a large (81,348~81,392 bp) and a small (12,582~12,586 bp) single-copy region. In terms of gene order and structure, Zoysia plastomes are similar to the typical plastomes of Poaceae. The plastomes encode 110 genes, of which 76 are protein-coding genes, 30 are tRNA genes, and four are rRNA genes. Fourteen genes contain single introns and one gene has two introns. Three evolutionary hotspot spacer regions (atpB~rbcL, rps16~rps3, and rpl32~trnL-UAG) were recognized among six analyzed Zoysia species. The high divergences in the atpB~rbcL spacer and rpl16~rpl3 region are primarily due to differences in base substitutions and indels. In contrast, the high divergence between rpl32~trnL-UAG spacers is due to a small inversion with a pair of 22 bp stems and an 11 bp loop. Simple sequence repeats (SSRs) were identified in 59 different locations in Z. japonica, 63 in Z. sinica, 62 in Z. macrostachya, and 63 in Z. tenuifolia plastomes. Phylogenetic analysis showed that Zoysia (Zoysiinae) forms a monophyletic group, which is sister to Sporobolus (Sporobolinae), with 100% bootstrap support. Within the Zoysia clade, the relationship of (Z. sinica, Z. japonica), (Z. tenuifolia, Z. matrella), (Z. macrostachya, Z. macrantha) was suggested.

Intergeneric Relationships within the Early-Diverging Angiosperm Family Nymphaeaceae Based on Chloroplast Phylogenomics

International Journal of Molecular Sciences ◽

10.3390/ijms19123780 ◽

2018 ◽

Vol 19 (12) ◽

pp. 3780 ◽

Cited By ~ 3

Author(s):

Dingxuan He ◽

Andrew Gichira ◽

Zhizhong Li ◽

John Nzei ◽

Youhao Guo ◽

...

Keyword(s):

Genome Structure ◽

Morphological Data ◽

Basal Angiosperm ◽

Phylogenetic Position ◽

Data Set ◽

Protein Coding ◽

Plastid Genomes ◽

Chloroplast Genomes ◽

Gene Data ◽

High Degree

The order Nymphaeales, consisting of three families with a record of eight genera, has gained significant interest from botanists, probably due to its position as a basal angiosperm. The phylogenetic relationships within the order have been well studied; however, a few controversial nodes still remain in the Nymphaeaceae. The position of the Nuphar genus and the monophyly of the Nymphaeaceae family remain uncertain. This study adds to the increasing number of the completely sequenced plastid genomes of the Nymphaeales and applies a large chloroplast gene data set in reconstructing the intergeneric relationships within the Nymphaeaceae. Five complete chloroplast genomes were newly generated, including a first for the monotypic Euryale genus. Using a set of 66 protein-coding genes from the chloroplast genomes of 17 taxa, the phylogenetic position of Nuphar was determined and a monophyletic Nymphaeaceae family was obtained with convincing statistical support from both partitioned and unpartitioned data schemes. Although genomic comparative analyses revealed a high degree of synteny among the chloroplast genomes of the ancient angiosperms, key minor variations were evident, particularly in the contraction/expansion of the inverted-repeat regions and in RNA-editing events. Genome structure, and gene content and arrangement were highly conserved among the chloroplast genomes. The intergeneric relationships defined in this study are congruent with those inferred using morphological data.

Genomic Analysis of Sarcomyxa edulis Reveals the Basis of Its Medicinal Properties and Evolutionary Relationships

Frontiers in Microbiology ◽

10.3389/fmicb.2021.652324 ◽

2021 ◽

Vol 12 ◽

Author(s):

Fenghua Tian ◽

Changtian Li ◽

Yu Li

Keyword(s):

Single Molecule ◽

De Novo ◽

Genomic Analysis ◽

Single Copy ◽

Whole Genome Sequence ◽

Type I ◽

Whole Genome ◽

Uridine Diphosphate ◽

Protein Coding ◽

Medicinal Value

Yuanmo [Sarcomyxa edulis (Y.C. Dai, Niemelä & G.F. Qin) T. Saito, Tonouchi & T. Harada] is an important edible and medicinal mushroom endemic to Northeastern China. Here we report the de novo sequencing and assembly of the S. edulis genome using single-molecule real-time sequencing technology. The whole genome was approximately 35.65 Mb, with a G + C content of 48.31%. Genome assembly generated 41 contigs with an N50 length of 1,772,559 bp. The genome comprised 9,364 annotated protein-coding genes, many of which encoded enzymes involved in the modification, biosynthesis, and degradation of glycoconjugates and carbohydrates or enzymes predicted to be involved in the biosynthesis of secondary metabolites such as terpene, type I polyketide, siderophore, and fatty acids, which are responsible for the pharmacodynamic activities of S. edulis. We also identified genes encoding 1,3-β-glucan synthase and endo-1,3(4)-β-glucanase, which are involved in polysaccharide and uridine diphosphate glucose biosynthesis. Phylogenetic and comparative analyses of Basidiomycota fungi based on a single-copy orthologous protein indicated that the Sarcomyxa genus is an independent group that evolved from the Pleurotaceae family. The annotated whole-genome sequence of S. edulis can serve as a reference for investigations of bioactive compounds with medicinal value and the development and commercial production of superior S. edulis varieties.

Prenatal diagnosis of a 4.5-Mb deletion at chromosome 4q35.1q35.2: Case report and literature review

Molecular Cytogenetics ◽

10.1186/s13039-021-00573-y ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Gefei Xiao ◽

Xianrong Qiu ◽

Yuqiu Zhou ◽

Gongjun Tan ◽

Yao Shen

Keyword(s):

Single Copy ◽

Peripheral Blood Lymphocytes ◽

Chromosomal Microarray ◽

Congenital Defects ◽

Banding Technique ◽

Chromosomal Microarray Analysis ◽

Phenotype Correlation ◽

Protein Coding ◽

Genotype Phenotype Correlation ◽

Number Variation

Abstract

Objective: We present a genetic analysis of an asymptomatic family with a 4q terminal deletion; we also review other similar published studies and discuss the genotype–phenotype correlation.

Methods: A karyotype analysis was performed on the amniotic fluid cells of a woman at 24 weeks of pregnancy and peripheral blood lymphocytes from both parents and their older son with the conventional G-banding technique. Chromosomal microarray analysis (CMA) testing was carried out for both parents and the fetus to analyze copy number variation (CNV) in the whole genome.

Results: The results showed no abnormalities in the karyotypes of the father and older son, and the karyotypes of the mother and fetus were 46,XX,del(4)(q35.1) and 46,XY,del(4)(q35.1), respectively. CMA results showed a partial deletion at the 4q terminus in both the fetus and mother. The deletion region of the fetus was arr[GRCh37] 4q35.1q35.2(186,431,008_190,957,460) × 1; the loss size of the CNV was approximately 4.5 Mb and involved 14 protein-coding genes, namely, CYP4V2, F11, FAM149A, FAT1, FRG1, FRG2, KLKB1, MTNR1A, PDLIM3, SORBS2, TLR3, TRIML1, TRIML2, and ZFP42. No variation on chromosome 4 was detected in the father’s CMA results.

Conclusion: Deletion of the 4q subtelomeric region is a familial variation. The arr[GRCh37] 4q35.1q35.2(186,431,008_190,957,460) region single-copy deletion did not cause obvious congenital defects or mental retardation. The application of high-resolution genetic testing technology combined with the analysis of public genetic database information can more clearly elucidate the genotype–phenotype correlation of the disease and provide support for both prenatal and postnatal genetic counseling.
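The reported deletion size can be checked directly from the microarray coordinates in the abstract. The sketch below is illustrative only: the helper name `cnv_span_bp` is hypothetical, and treating both coordinates as inclusive is an assumption, since the abstract does not state the convention. Either way the span rounds to roughly 4.5 Mb.

```python
# Coordinates copied from the abstract: arr[GRCh37] 4q35.1q35.2(186,431,008_190,957,460) x 1
START = 186_431_008
END = 190_957_460

# Protein-coding genes listed in the abstract as lying within the deleted region.
GENES = [
    "CYP4V2", "F11", "FAM149A", "FAT1", "FRG1", "FRG2", "KLKB1",
    "MTNR1A", "PDLIM3", "SORBS2", "TLR3", "TRIML1", "TRIML2", "ZFP42",
]


def cnv_span_bp(start: int, end: int, inclusive: bool = True) -> int:
    """Length in bp of a genomic interval (hypothetical helper).

    Assumption: both endpoints are counted when inclusive=True.
    """
    return end - start + (1 if inclusive else 0)


size_bp = cnv_span_bp(START, END)
size_mb = size_bp / 1_000_000

print(f"{size_bp:,} bp = {size_mb:.2f} Mb")  # 4,526,453 bp = 4.53 Mb
print(len(GENES), "protein-coding genes")    # 14 protein-coding genes
```

The one-base difference between the inclusive and exclusive conventions has no effect at the precision quoted in the abstract.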

Comparative genome analysis revealed gene inversions, boundary expansion and contraction, and gene loss in Stemona sessilifolia (Miq.) Miq. chloroplast genome

10.1101/2021.02.15.431246 ◽

2021 ◽

Author(s):

Jingting Liu ◽

Mei Jiang ◽

Haimei Chen ◽

Yu Liu ◽

Chang Liu ◽

...

Keyword(s):

Chloroplast Genome ◽

Gene Loss ◽

Herbal Medicines ◽

Single Copy ◽

Asparagus Officinalis ◽

Accurate Identification ◽

Protein Coding ◽

Variable Regions ◽

Complete Chloroplast Genome ◽

Next Generation Sequencing Technology

Abstract Stemona sessilifolia (Miq.) Miq., commonly known as Baibu, is one of the most popular herbal medicines in Asia. In the Chinese Pharmacopoeia, Baibu has multiple authentic sources, and many homonymous herbs are sold as Baibu in the herbal medicine market. The existence of counterfeits of Baibu makes its identification challenging. To assist the accurate identification of Baibu, we sequenced and analyzed the complete chloroplast genome of Stemona sessilifolia using next-generation sequencing technology. The genome was 154,039 bp in length, possessing a typical quadripartite structure consisting of a pair of inverted repeats (IRs: 27,094 bp) separated by a large single copy (LSC: 81,950 bp) and a small single copy (SSC: 17,901 bp). A total of 112 unique genes were identified, including 80 protein-coding, 28 transfer RNA, and four ribosomal RNA genes. In addition, 45 tandem, 27 forward, 23 palindromic, and 72 simple sequence repeats were detected in the genome by repeat analysis. Compared with its counterfeits (Asparagus officinalis and Carludovica palmata), we found that IR expansion and SSC contraction events in Stemona sessilifolia resulted in two copies of the rpl22 gene in the IR regions and partial duplication of the ndhF gene in the SSC region. Second, an approximately 3-kb-long inversion was identified in the LSC region, placing the petA and cemA genes on the complementary strand of the chloroplast DNA molecule. Comparative analysis revealed some highly variable regions, including trnF-GAA_ndhJ, atpB_rbcL, rps15_ycf1, trnG-UCC_trnR-UCU, and ndhF_rpl32. Finally, gene loss events were investigated in the context of phylogenetic relationships. In summary, the complete plastome of Stemona sessilifolia will provide valuable information for the molecular identification of Baibu and assist in elucidating the evolution of Stemona sessilifolia.
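In a quadripartite plastome, the total length should equal the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. As a small consistency check on the figures quoted in the abstract (the function name `quadripartite_length` is hypothetical and is not taken from the authors' pipeline):

```python
# Region lengths in bp, as reported in the abstract.
LSC = 81_950  # large single-copy region
SSC = 17_901  # small single-copy region
IR = 27_094   # one inverted repeat; the genome carries two copies

# Unique gene counts by category, as reported in the abstract.
GENE_COUNTS = {"protein-coding": 80, "tRNA": 28, "rRNA": 4}


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a plastome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total_bp = quadripartite_length(LSC, SSC, IR)
total_genes = sum(GENE_COUNTS.values())

print(total_bp)     # 154039
print(total_genes)  # 112
```

Both totals agree with the reported genome length of 154,039 bp and the 112 unique genes.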

Draft Genome Sequence of Bacillus sp. Strain IGA-FME-2, Isolated from the Bulk Soil of Soybean (Glycine max L.) in Northeast China

Microbiology Resource Announcements ◽

10.1128/mra.00004-21 ◽

2021 ◽

Vol 10 (16) ◽

Author(s):

Zhenhua Yu ◽

Sergio de los Santos-Villalobos ◽

Yansheng Li ◽

Jian Jin ◽

Fannie Isela Parra Cota ◽

...

Keyword(s):

Glycine Max ◽

Draft Genome ◽

Gc Content ◽

Single Copy ◽

Bulk Soil ◽

23S Rrna ◽

Protein Coding ◽

Content Type ◽

Protein Coding Genes ◽

Glycine Max L

ABSTRACT Here, we present the draft genome of Bacillus sp. strain IGA-FME-2. This strain was isolated from the bulk soil of soybean (Glycine max L.). Its genome consists of 3,810 protein-coding genes, 44 tRNAs, two 16S rRNAs, and a single copy of 23S rRNA, with a GC content of 46.4%.
