The sequence and de novo assembly of Oxygymnocypris stewartii genome

Hai-Ping Liu; Shi-Jun Xiao; Nan Wu; Di Wang; Yan-Chao Liu; Chao-Wei Zhou; Qi-Yong Liu; Rui-Bin Yang; Wen-Kai Jiang; Qi-Qi Liang; Wangjiu; Chi Zhang; Jun-Hua Gong; Xiao-Hui Yuan; Zhen-Bo Mou

doi:10.1038/sdata.2019.9

Scientific Data ◽

10.1038/sdata.2019.9 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 11

Author(s):

Hai-Ping Liu ◽

Shi-Jun Xiao ◽

Nan Wu ◽

Di Wang ◽

Yan-Chao Liu ◽

...

Keyword(s):

Tibetan Plateau ◽

De Novo ◽

The Tibetan Plateau ◽

Sequencing Data ◽

Protein Coding ◽

Sequencing Platform ◽

A Genome ◽

Altitude Adaptation ◽

Oxygymnocypris Stewartii ◽

Local Ecology

Abstract Animal genomes from the Qinghai-Tibetan Plateau provide valuable resources for understanding the molecular mechanisms of environmental adaptation. Tibetan fish species play essential roles in the local ecology; however, genomic information for native fishes remains insufficient. Oxygymnocypris stewartii, a member of the genus Oxygymnocypris in the subfamily Schizothoracinae, is a fish native to the Tibetan Plateau that lives at elevations from roughly 3,000 m to 4,200 m. In this report, the PacBio and Illumina sequencing platforms were used to generate ~385.3 Gb of genomic sequencing data. A genome of about 1,849.2 Mb was obtained with a contig N50 length of 257.1 kb. More than 44.5% of the genome was identified as repetitive elements, and 46,400 protein-coding genes were annotated. The assembled genome can be used as a reference for future population genetic studies of O. stewartii and will improve our understanding of high-altitude adaptation of fishes on the Qinghai-Tibetan Plateau.
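
The contig N50 quoted in the abstract is a length-weighted median: the length of the contig at which, walking from the longest contig downward, at least half of the total assembly length has been covered. The sketch below illustrates that definition on a hypothetical list of contig lengths (the function name `n50` and the toy lengths are assumptions for illustration, not data from the paper). It also divides the two figures that the abstract does give, ~385.3 Gb of reads and a 1,849.2 Mb assembly, to get a rough raw sequencing depth of about 208-fold. That ratio uses the assembly size as a stand-in for the true genome size and pools PacBio and Illumina data, so it is only an approximation.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (lengths in kb); NOT data from the paper.
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)

# Rough raw depth from the abstract's figures: total bases / assembly size.
raw_depth = 385.3e9 / 1849.2e6

print(toy_n50, round(raw_depth, 1))
```

In the toy example the two longest contigs (80 + 70 = 150) already exceed half of the 290 kb total, so the N50 is the length of the second contig.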

Next generation sequencing allows deeper analysis and understanding of genomes and transcriptomes including aspects to fertility

Reproduction Fertility and Development ◽

10.1071/rd10247 ◽

2011 ◽

Vol 23 (1) ◽

pp. 75 ◽

Cited By ~ 7

Author(s):

Thomas Werner

Keyword(s):

Next Generation Sequencing ◽

Transcriptional Control ◽

Target Genes ◽

De Novo ◽

Alternative Promoters ◽

Next Generation ◽

Sequencing Data ◽

Genome Wide ◽

A Genome ◽

Generation Sequencing

Reproduction and fertility are controlled by specific events naturally linked to oocytes, testes and early embryonal tissues. A significant part of these events involves gene expression, especially transcriptional control and alternative transcription (alternative promoters and alternative splicing). While methods to analyse such events for carefully predetermined target genes are well established, until recently no methodology existed to extend such analyses into a genome-wide de novo discovery process. With the arrival of next generation sequencing (NGS) it has become possible to attempt genome-wide discovery in genomic sequences as well as whole transcriptomes at single-nucleotide resolution. This not only allows identification of the primary changes (e.g. alternative transcripts) but also helps to elucidate the regulatory context that leads to the induction of transcriptional changes. This review discusses the basics of the new technological and scientific concepts arising from NGS, prominent differences from microarray-based approaches and several aspects of its application to reproduction and fertility research. These concepts are then illustrated with an example of NGS data analysis involving postimplantation endometrium tissue from cows.

De Novo Whole-Genome Sequencing of the Wood Rot Fungus Polyporus brumalis, Which Exhibits Potential Terpenoid Metabolism

Genome Announcements ◽

10.1128/genomea.00586-17 ◽

2017 ◽

Vol 5 (28) ◽

Author(s):

Su-Yeon Lee ◽

Ji-eun An ◽

Sun-Hwa Ryu ◽

Myungkil Kim

Keyword(s):

Single Molecule ◽

De Novo ◽

Gene Annotation ◽

Draft Genome ◽

Fungal Growth ◽

Protein Coding ◽

Sequencing Platform ◽

Protein Coding Genes ◽

Polyporus Brumalis ◽

Terpenoid Metabolism

ABSTRACT Polyporus brumalis is able to synthesize several sesquiterpenes during fungal growth. Using a single-molecule real-time sequencing platform, we present the 53-Mb draft genome of P. brumalis, which contains 6,231 protein-coding genes. Gene annotation and isolation support genetic information, which can increase the understanding of sesquiterpene metabolism in P. brumalis.

The human gene damage index as a gene-level approach to prioritizing exome variants

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1518646112 ◽

2015 ◽

Vol 112 (44) ◽

pp. 13615-13620 ◽

Cited By ~ 131

Author(s):

Yuval Itan ◽

Lei Shang ◽

Bertrand Boisson ◽

Etienne Patin ◽

Alexandre Bolze ◽

...

Keyword(s):

General Population ◽

De Novo ◽

Damage Index ◽

Monogenic Disease ◽

Sequence Length ◽

De Novo Mutations ◽

Protein Coding ◽

A Genome ◽

Gene Level ◽

Gene Damage

The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches, genic intolerance, and de novo excess, and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.

Genetic signatures of high-altitude adaptation in Tibetans

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1617042114 ◽

2017 ◽

Vol 114 (16) ◽

pp. 4189-4194 ◽

Cited By ~ 59

Author(s):

Jian Yang ◽

Zi-Bing Jin ◽

Jie Chen ◽

Xiu-Feng Huang ◽

Xiao-Man Li ◽

...

Keyword(s):

High Altitude ◽

Methylenetetrahydrofolate Reductase ◽

The Tibetan Plateau ◽

High Altitude Adaptation ◽

Drift Model ◽

Genome Wide ◽

A Genome ◽

Altitude Adaptation ◽

Genetic Signatures ◽

Genome Wide Study

Indigenous Tibetan people have lived on the Tibetan Plateau for millennia. There is a long-standing question about the genetic basis of high-altitude adaptation in Tibetans. We conduct a genome-wide study of 7.3 million genotyped and imputed SNPs of 3,008 Tibetans and 7,287 non-Tibetan individuals of Eastern Asian ancestry. Using this large dataset, we detect signals of high-altitude adaptation at nine genomic loci, of which seven are unique. The alleles under natural selection at two of these loci [methylenetetrahydrofolate reductase (MTHFR) and EPAS1] are strongly associated with blood-related phenotypes, such as hemoglobin, homocysteine, and folate in Tibetans. The folate-increasing allele of rs1801133 at the MTHFR locus has an increased frequency in Tibetans more than expected under a drift model, which is probably a consequence of adaptation to high UV radiation. These findings provide important insights into understanding the genomic consequences of high-altitude adaptation in Tibetans.

High altitude adaptation mitigates anemia risk associated with diabetes among the Mosuo of Southwest China

10.1101/406579 ◽

2018 ◽

Cited By ~ 1

Author(s):

M Su ◽

K Wander ◽

MK Shenk ◽

T Blumenfield ◽

H Li ◽

...

Keyword(s):

Tibetan Plateau ◽

High Altitude ◽

Market Integration ◽

Hypoxia Inducible Factor ◽

Human Populations ◽

The Tibetan Plateau ◽

Low Oxygen ◽

High Altitude Adaptation ◽

Altitude Adaptation ◽

Hb Concentration

Abstract Human populations native to high altitude regions (≥2500 m) exhibit numerous adaptations to hypoxic stress. On the Tibetan Plateau, these include modifications of the hypoxia inducible factor (HIF) pathway to essentially uncouple erythropoiesis (red blood cell production) and blood hemoglobin (Hb) concentration, which normally increase in response to low oxygen, from hypoxia. Uncoupling of erythropoiesis and hypoxia is also observed among people with diabetes due to damage to kidney tissues. This is hypothesized to result in elevated risk for anemia among diabetics, which increases risk for cardiovascular disease and death. We tested the hypothesis that the independence of erythropoiesis from HIF among high-altitude adapted populations of the Tibetan Plateau may protect against diabetes-associated anemia. We investigated this hypothesis among the Mosuo, a population living in Yunnan Province, China (at ~2800 m altitude) that is undergoing rapid market integration and lifestyle change, with concomitant increase in risk for type 2 diabetes. We found that, although diabetes (glycated hemoglobin, HbA1c ≥ 6.5%) is associated with anemia (females: Hb < 12 g/dl; males: Hb < 13 g/dl) among the Chinese population as a whole (N: 5,606; OR: 1.48; p: 0.008), this is not the case among the Mosuo (N: 316; OR: 1.36; p: 0.532). Both pathways uncoupling hypoxia from erythropoiesis (diabetic disease and high altitude adaptation) are incompletely understood; their intersection in protecting Mosuo with diabetes from anemia may provide insight into the mechanisms underlying each. Further, these findings point to the importance of understanding how high-altitude adaptations interact with chronic disease processes, as populations like the Mosuo experience rapid market integration.
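
For readers unfamiliar with the odds ratio (OR) reported above, the following minimal sketch computes an unadjusted OR with a Wald 95% confidence interval from a 2×2 table. The counts are hypothetical and chosen only to make the arithmetic easy to follow. The abstract does not give the underlying counts or say how its ORs were estimated (they may come from an adjusted regression model), so this illustrates the concept and does not reproduce the study's analysis. The function name `odds_ratio` is likewise an assumption.

```python
import math


def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Unadjusted odds ratio and Wald 95% CI from a 2x2 table (all cells > 0)."""
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts, NOT from the study:
# 20 of 100 people with diabetes are anemic, 10 of 100 without diabetes are anemic.
estimate, lower, upper = odds_ratio(20, 80, 10, 90)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that spans 1 corresponds to an association that is not statistically distinguishable from no association at the 5% level, which is how a result such as the Mosuo estimate (p: 0.532) would typically be read.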

De Novo Assembly and Species-Specific Marker Development as a Useful Tool for the Identification of Scutellaria L. Species

Current Issues in Molecular Biology ◽

10.3390/cimb43030152 ◽

2021 ◽

Vol 43 (3) ◽

pp. 2177-2188

Author(s):

Hakjoon Choi ◽

Wan Seok Kang ◽

Jin Seok Kim ◽

Chang-Su Na ◽

Sunoh Kim

Keyword(s):

De Novo ◽

Dna Barcode ◽

Core Gene ◽

Nucleotide Sequences ◽

Specific Marker ◽

Sequencing Platform ◽

Pcr Product ◽

A Genome ◽

Folk Remedy ◽

Species Specific

Scutellaria L. (family Lamiaceae) includes approximately 470 species found in most parts of the world and is commonly known as skullcaps. Scutellaria L. is a medicinal herb used as a folk remedy in Korea and East Asia, but it is difficult to identify and classify various subspecies by morphological methods. Because Scutellaria L. has not been studied genetically, de novo whole-genome assembly was performed in Scutellaria indica var. tsusimensis (H. Hara) Ohwi using the Illumina sequencing platform to expand the knowledge of species in the genus. We aimed to develop a molecular method that could be used to classify S. indica var. tsusimensis (H. Hara) Ohwi, S. indica L. and three other Scutellaria L. species. The assembly results for S. indica var. tsusimensis (H. Hara) Ohwi revealed a genome size of 318,741,328 bp and a scaffold N50 of 78,430. The assembly contained 92.08% of the conserved BUSCO core gene set and was estimated to cover 94.65% of the genome. The obtained genes were compared with previously registered Scutellaria nucleotide sequences and similar regions using the NCBI BLAST service, and a total of 279 similar nucleotide sequences were detected. From these 279 similar nucleotide sequences and nine chloroplast DNA barcode genes, primers were designed so that the size of the PCR product was 100 to 1000 bp. As a result, a species-specific primer set capable of distinguishing five species of Scutellaria L. was developed.

Draft Genome Assembly and Annotation of Red Raspberry Rubus Idaeus

10.1101/546135 ◽

2019 ◽

Cited By ~ 4

Author(s):

Haley Wight ◽

Junhui Zhou ◽

Muzi Li ◽

Sridhar Hannenhalli ◽

Stephen M. Mount ◽

...

Keyword(s):

De Novo ◽

Draft Genome ◽

Rubus Idaeus ◽

Slow Process ◽

Red Raspberry ◽

Protein Coding ◽

Draft Genome Assembly ◽

Protein Coding Genes ◽

A Genome ◽

Exceptional Value

Abstract The red raspberry, Rubus idaeus, is widely distributed in all temperate regions of Europe, Asia, and North America and is a major commercial fruit valued for its taste and its high antioxidant and vitamin content. However, Rubus breeding is a long and slow process hampered by limited genomic and molecular resources. Genomic resources such as a complete genome sequence and transcriptome will be of exceptional value to improve research and breeding of this high-value crop. Using a hybrid sequence assembly approach including data from both long and short sequence reads, we present the first assembly of the Rubus idaeus genome (Joan J. variety). The de novo assembled genome consists of 2,145 scaffolds with a genome completeness of 95.3% and an N50 of 638 kb. Leveraging a linkage map, we anchored 80.1% of the genome onto seven chromosomes. Using over 1 billion paired-end RNAseq reads, we annotated 35,566 protein-coding genes with a transcriptome completeness score of 97.2%. The Rubus idaeus genome provides an important new resource for researchers and breeders.

Germline mosaicism of a missense variant in KCNC2 in a multiplex family with autism and epilepsy

10.1101/2021.12.06.21264306 ◽

2021 ◽

Author(s):

Elvisa Mehinovic ◽

Teddi Gray ◽

Meghan Campbell ◽

Jenny Ekholm ◽

Aaron Wenger ◽

...

Keyword(s):

De Novo ◽

Copy Number Variants ◽

Missense Variant ◽

Missense Mutations ◽

Sequencing Data ◽

Multiplex Family ◽

Protein Coding ◽

Germline Mosaicism ◽

Current Decay ◽

Long Read

ABSTRACT Currently, protein-coding de novo variants and large copy number variants have been identified as important for ∼30% of individuals with autism. One approach to identifying relevant variation in individuals who lack these types of events is to use newer genomic technologies. In this study, highly accurate PacBio HiFi long-read sequencing was applied to a family with autism, treatment-refractory epilepsy, cognitive impairment, and mild dysmorphic features (two affected female full siblings, parents, and one unaffected sibling) with no known clinical variant. From our long-read sequencing data, a de novo missense variant in the KCNC2 gene (which encodes the Kv3.2 protein) was identified in both affected children. This variant was phased to the paternal chromosome of origin and is likely a germline mosaic. In silico assessment of the variant revealed it was in the top 0.05% of all conserved bases in the genome, and it was predicted to be damaging by Polyphen2, MutationTaster, and SIFT. It was not present in any controls from public genome databases nor in a joint-call set we generated across 49 individuals with publicly available PacBio HiFi data. This specific missense mutation (Val473Ala) has been shown in both an ortholog and paralog of Kv3.2 to accelerate current decay, shift the voltage dependence of activation, and prevent the channel from entering a long-lasting open state. Seven additional missense mutations have been identified in other individuals with neurodevelopmental disorders (p = 1.03 × 10⁻⁵). KCNC2 is most highly expressed in the brain, particularly in the thalamus, and is enriched in GABAergic neurons. Long-read sequencing was useful in discovering the relevant variant in this family with autism that had remained a mystery for several years and will potentially have great benefits in the clinic once it is widely available.

A chromosomal-level genome assembly for the giant African snail Achatina fulica

GigaScience ◽

10.1093/gigascience/giz124 ◽

2019 ◽

Vol 8 (10) ◽

Cited By ~ 7

Author(s):

Yunhai Guo ◽

Yi Zhang ◽

Qin Liu ◽

Yun Huang ◽

Guangyao Mao ◽

...

Keyword(s):

Pacific Islands ◽

High Growth ◽

Reproductive Capacity ◽

Genome Comparison ◽

Achatina Fulica ◽

Sequencing Data ◽

Protein Coding ◽

Sequencing Platform ◽

Protein Coding Genes ◽

Terrestrial Mollusk

Abstract Background: Achatina fulica, the giant African snail, is the largest terrestrial mollusk species. Owing to its voracious appetite, wide environmental adaptability, high growth rate, and reproductive capacity, it has become an invasive species across the world, mainly in Southeast Asia, Japan, the western Pacific islands, and China. This pest can damage agricultural crops and is an intermediate host of many parasites that can threaten human health. However, genomic information of A. fulica remains limited, hindering genetic and genomic studies for invasion control and management of the species. Findings: Using a k-mer–based method, we estimated the A. fulica genome size to be 2.12 Gb, with a high repeat content of up to 71%. Roughly 101.6 Gb of genomic long-read data of A. fulica were generated from the Pacific Biosciences sequencing platform and assembled to produce a first A. fulica genome of 1.85 Gb with a contig N50 length of 726 kb. Using contact information from the Hi-C sequencing data, we successfully anchored 99.32% of contig sequences into 31 chromosomes, leading to the final contig and scaffold N50 length of 721 kb and 59.6 Mb, respectively. The continuity, completeness, and accuracy were evaluated by genome comparison with other mollusk genomes, BUSCO assessment, and genomic read mapping. A total of 23,726 protein-coding genes were predicted from the assembled genome, among which 96.34% of the genes were functionally annotated. The phylogenetic analysis using whole-genome protein-coding genes revealed that A. fulica separated from a common ancestor with Biomphalaria glabrata ∼182 million years ago. Conclusion: To our knowledge, the A. fulica genome is the first terrestrial mollusk genome published to date. The chromosome sequence of A. fulica will provide the research community with a valuable resource for population genetics and environmental adaptation studies for the species, as well as investigations of chromosome-level evolution within mollusks.
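
The abstract does not say which k-mer-based method was used to arrive at 2.12 Gb. One common textbook estimator, shown here purely as an illustration and not as the authors' procedure, divides the total number of k-mer occurrences by the multiplicity at the main peak of the k-mer histogram, after discarding low-multiplicity k-mers that mostly arise from sequencing errors. The histogram values, the function name `estimate_genome_size`, and the `min_multiplicity` cutoff below are all hypothetical.

```python
def estimate_genome_size(histogram, min_multiplicity=3):
    """Estimate genome size from a k-mer histogram.

    histogram maps multiplicity -> number of distinct k-mers seen that many times.
    K-mers below min_multiplicity are treated as likely sequencing errors.
    """
    solid = {m: n for m, n in histogram.items() if m >= min_multiplicity}
    if not solid:
        raise ValueError("no k-mers at or above the multiplicity cutoff")
    # Multiplicity with the most distinct k-mers approximates the k-mer depth.
    peak = max(solid, key=solid.get)
    total_occurrences = sum(m * n for m, n in solid.items())
    return total_occurrences / peak


# Hypothetical toy histogram, NOT data from the paper.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
toy_size = estimate_genome_size(toy_histogram)
print(toy_size)
```

Real genomes with high repeat content or heterozygosity, as reported for A. fulica, produce histograms with additional peaks, so dedicated tools fit a model to the histogram instead of relying on a single-peak ratio.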

PHASIS: A computational suite for de novo discovery and characterization of phased, siRNA-generating loci and their miRNA triggers

10.1101/158832 ◽

2017 ◽

Cited By ~ 7

Author(s):

Atul Kakrana ◽

Pingchuan Li ◽

Parth Patel ◽

Reza Hammond ◽

Deepti Anand ◽

...

Keyword(s):

De Novo ◽

Sequencing Data ◽

Protein Coding ◽

Secondary Sirnas ◽

Integrated Methods ◽

Non Coding Rnas

Abstract Phased, secondary siRNAs (phasiRNAs) are found widely in plants, from protein-coding transcripts and long, non-coding RNAs; animal piRNAs are also phased. Integrated methods characterizing “PHAS” loci are unavailable, and existing methods are quite limited and inefficient in handling large volumes of sequencing data. The PHASIS suite described here provides complete tools for the computational characterization of PHAS loci, with an emphasis on plants, in which these loci are numerous. Benchmarked comparisons demonstrate that PHASIS is sensitive, highly scalable and fast. Importantly, PHASIS eliminates the requirement of a sequenced genome and PARE/degradome data for discovery of phasiRNAs and their miRNA triggers.
