The sequence and de novo assembly of Oxygymnocypris stewartii genome

2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Hai-Ping Liu ◽  
Shi-Jun Xiao ◽  
Nan Wu ◽  
Di Wang ◽  
Yan-Chao Liu ◽  
...  

Abstract Animal genomes in the Qinghai-Tibetan Plateau provide valuable resources for scientists to understand the molecular mechanisms of environmental adaptation. Tibetan fish species play essential roles in the local ecology; however, genomic information for native fishes remains insufficient. Oxygymnocypris stewartii, a member of the genus Oxygymnocypris (subfamily Schizothoracinae), is a fish native to the Tibetan Plateau that lives at elevations of roughly 3,000 to 4,200 m. In this report, the PacBio and Illumina sequencing platforms were used to generate ~385.3 Gb of genomic sequencing data. A genome of about 1,849.2 Mb was obtained, with a contig N50 length of 257.1 kb. More than 44.5% of the genome was identified as repetitive elements, and 46,400 protein-coding genes were annotated. The assembled genome can serve as a reference for future population genetic studies of O. stewartii and will improve our understanding of high-altitude adaptation in fishes of the Qinghai-Tibetan Plateau.
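Contig N50, quoted above as 257.1 kb, is the length L such that contigs of length L or longer together cover at least half of the total assembly length. The minimal sketch below implements that standard definition; the contig lengths in the example are made-up toy values, not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs whose combined length is at least half the total.
    """
    if not lengths:
        raise ValueError("no contigs supplied")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once the accumulated length reaches half the total.
        if 2 * running >= total:
            return length
    # Not reached for non-empty input of non-negative lengths.
    return 0


# Toy example: total = 100; 40 + 25 = 65 >= 50, so the N50 is 25.
print(n50([10, 25, 40, 5, 20]))  # 25
```

The same function applies to scaffold N50 values by passing scaffold lengths instead of contig lengths.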

2011 ◽  
Vol 23 (1) ◽  
pp. 75 ◽  
Author(s):  
Thomas Werner

Reproduction and fertility are controlled by specific events naturally linked to oocytes, testes and early embryonal tissues. A significant part of these events involves gene expression, especially transcriptional control and alternative transcription (alternative promoters and alternative splicing). While methods to analyse such events for carefully predetermined target genes are well established, until recently no methodology existed to extend such analyses into a genome-wide de novo discovery process. With the arrival of next generation sequencing (NGS), it has become possible to attempt genome-wide discovery in genomic sequences as well as whole transcriptomes at single-nucleotide resolution. This not only allows identification of the primary changes (e.g. alternative transcripts) but also helps to elucidate the regulatory context that leads to the induction of transcriptional changes. This review discusses the basics of the new technological and scientific concepts arising from NGS, prominent differences from microarray-based approaches and several aspects of its application to reproduction and fertility research. These concepts are then illustrated with an example of NGS data analysis involving postimplantation endometrium tissue from cows.


2017 ◽  
Vol 5 (28) ◽  
Author(s):  
Su-Yeon Lee ◽  
Ji-eun An ◽  
Sun-Hwa Ryu ◽  
Myungkil Kim

ABSTRACT Polyporus brumalis is able to synthesize several sesquiterpenes during fungal growth. Using a single-molecule real-time sequencing platform, we present the 53-Mb draft genome of P. brumalis, which contains 6,231 protein-coding genes. The gene annotation and isolation provide genetic information that can increase understanding of sesquiterpene metabolism in P. brumalis.


2015 ◽  
Vol 112 (44) ◽  
pp. 13615-13620 ◽  
Author(s):  
Yuval Itan ◽  
Lei Shang ◽  
Bertrand Boisson ◽  
Etienne Patin ◽  
Alexandre Bolze ◽  
...  

The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared the GDI with the leading gene-level approaches, genic intolerance and de novo excess, and demonstrated that the GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.


2017 ◽  
Vol 114 (16) ◽  
pp. 4189-4194 ◽  
Author(s):  
Jian Yang ◽  
Zi-Bing Jin ◽  
Jie Chen ◽  
Xiu-Feng Huang ◽  
Xiao-Man Li ◽  
...  

Indigenous Tibetan people have lived on the Tibetan Plateau for millennia. There is a long-standing question about the genetic basis of high-altitude adaptation in Tibetans. We conducted a genome-wide study of 7.3 million genotyped and imputed SNPs in 3,008 Tibetans and 7,287 non-Tibetan individuals of Eastern Asian ancestry. Using this large dataset, we detected signals of high-altitude adaptation at nine genomic loci, seven of which are unique. The alleles under natural selection at two of these loci [methylenetetrahydrofolate reductase (MTHFR) and EPAS1] are strongly associated with blood-related phenotypes, such as hemoglobin, homocysteine, and folate, in Tibetans. The folate-increasing allele of rs1801133 at the MTHFR locus has risen in frequency in Tibetans more than expected under a drift model, which is probably a consequence of adaptation to high UV radiation. These findings provide important insights into the genomic consequences of high-altitude adaptation in Tibetans.


2018 ◽  
Author(s):  
M Su ◽  
K Wander ◽  
MK Shenk ◽  
T Blumenfield ◽  
H Li ◽  
...  

Abstract Human populations native to high-altitude regions (≥2,500 m) exhibit numerous adaptations to hypoxic stress. On the Tibetan Plateau, these include modifications of the hypoxia inducible factor (HIF) pathway that essentially uncouple erythropoiesis (red blood cell production) and blood hemoglobin (Hb) concentration, which normally increase in response to low oxygen, from hypoxia. Uncoupling of erythropoiesis and hypoxia is also observed among people with diabetes, due to damage to kidney tissues. This is hypothesized to elevate the risk of anemia among diabetics, which in turn increases the risk of cardiovascular disease and death. We tested the hypothesis that the independence of erythropoiesis from HIF among high-altitude adapted populations of the Tibetan Plateau may protect against diabetes-associated anemia. We investigated this hypothesis among the Mosuo, a population living in Yunnan Province, China (at ~2,800 m altitude) that is undergoing rapid market integration and lifestyle change, with a concomitant increase in risk for type 2 diabetes. We found that, although diabetes (glycated hemoglobin, HbA1c ≥ 6.5%) is associated with anemia (females: Hb < 12 g/dL; males: Hb < 13 g/dL) among the Chinese population as a whole (N: 5,606; OR: 1.48; p: 0.008), this is not the case among the Mosuo (N: 316; OR: 1.36; p: 0.532). Both pathways uncoupling hypoxia from erythropoiesis (diabetic disease and high-altitude adaptation) are incompletely understood; their intersection in protecting Mosuo with diabetes from anemia may provide insight into the mechanisms underlying each. Further, these findings point to the importance of understanding how high-altitude adaptations interact with chronic disease processes as populations like the Mosuo experience rapid market integration.
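The odds ratios quoted above compare the odds of anemia among people with diabetes with the odds among people without diabetes. As an illustration only, the sketch below computes an unadjusted odds ratio and a Woolf (log-based) 95% confidence interval from a 2×2 table. The counts are hypothetical, and the study may have used a covariate-adjusted model such as logistic regression rather than this simple calculation.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval.

    2x2 layout (hypothetical counts):
                    anemic   not anemic
      diabetic         a         b
      non-diabetic     c         d
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    # Odds among diabetics (a/b) divided by odds among non-diabetics (c/d).
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's method.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts, not data from the study: OR = (20*90)/(80*10) = 2.25.
print(odds_ratio(20, 80, 10, 90))
```

A confidence interval that includes 1 corresponds to a non-significant association at the chosen level, which is how a point estimate above 1 (such as the Mosuo OR of 1.36) can still carry a large p-value in a small sample.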


2021 ◽  
Vol 43 (3) ◽  
pp. 2177-2188
Author(s):  
Hakjoon Choi ◽  
Wan Seok Kang ◽  
Jin Seok Kim ◽  
Chang-Su Na ◽  
Sunoh Kim

Scutellaria L. (family Lamiaceae), commonly known as skullcaps, includes approximately 470 species found in most parts of the world. Scutellaria L. is a medicinal herb used as a folk remedy in Korea and East Asia, but its various subspecies are difficult to identify and classify by morphological methods. Because Scutellaria L. has not been studied genetically, we performed de novo whole-genome assembly of Scutellaria indica var. tsusimensis (H. Hara) Ohwi using the Illumina sequencing platform to expand knowledge of species in the genus. We aimed to develop a molecular method that could be used to distinguish S. indica var. tsusimensis (H. Hara) Ohwi, S. indica L. and three other Scutellaria L. species. The assembly of S. indica var. tsusimensis (H. Hara) Ohwi revealed a genome size of 318,741,328 bp and a scaffold N50 of 78,430 bp. The assembly contained 92.08% of the conserved BUSCO core gene set and was estimated to cover 94.65% of the genome. The obtained genes were compared with previously registered Scutellaria nucleotide sequences and similar regions using the NCBI BLAST service, and a total of 279 similar nucleotide sequences were detected. Using these 279 similar nucleotide sequences and nine chloroplast DNA barcode genes, primers were designed to yield PCR products of 100 to 1,000 bp. As a result, a species-specific primer set capable of distinguishing five species of Scutellaria L. was developed.


2019 ◽  
Author(s):  
Haley Wight ◽  
Junhui Zhou ◽  
Muzi Li ◽  
Sridhar Hannenhalli ◽  
Stephen M. Mount ◽  
...  

Abstract The red raspberry, Rubus idaeus, is widely distributed in all temperate regions of Europe, Asia, and North America and is a major commercial fruit valued for its taste and its high antioxidant and vitamin content. However, Rubus breeding is a long and slow process hampered by limited genomic and molecular resources. Genomic resources such as a complete genome sequence and transcriptome will be of exceptional value for improving research on and breeding of this high-value crop. Using a hybrid sequence assembly approach that includes data from both long and short sequence reads, we present the first assembly of the Rubus idaeus genome (Joan J. variety). The de novo assembled genome consists of 2,145 scaffolds, with a genome completeness of 95.3% and an N50 of 638 kb. Leveraging a linkage map, we anchored 80.1% of the genome onto seven chromosomes. Using over 1 billion paired-end RNA-seq reads, we annotated 35,566 protein-coding genes with a transcriptome completeness score of 97.2%. The Rubus idaeus genome provides an important new resource for researchers and breeders.


2021 ◽  
Author(s):  
Elvisa Mehinovic ◽  
Teddi Gray ◽  
Meghan Campbell ◽  
Jenny Ekholm ◽  
Aaron Wenger ◽  
...  

ABSTRACT Currently, protein-coding de novo variants and large copy number variants have been identified as important for ∼30% of individuals with autism. One approach to identifying relevant variation in individuals who lack these types of events is to use newer genomic technologies. In this study, highly accurate PacBio HiFi long-read sequencing was applied to a family with autism, treatment-refractory epilepsy, cognitive impairment, and mild dysmorphic features (two affected female full siblings, the parents, and one unaffected sibling) with no known clinical variant. From our long-read sequencing data, a de novo missense variant in the KCNC2 gene (which encodes the Kv3.2 protein) was identified in both affected children. This variant was phased to the paternal chromosome of origin and is likely a germline mosaic. In silico assessment of the variant revealed that it was in the top 0.05% of all conserved bases in the genome, and it was predicted to be damaging by Polyphen2, MutationTaster, and SIFT. It was not present in any controls from public genome databases nor in a joint-call set we generated across 49 individuals with publicly available PacBio HiFi data. This specific missense mutation (Val473Ala) has been shown in both an ortholog and a paralog of Kv3.2 to accelerate current decay, shift the voltage dependence of activation, and prevent the channel from entering a long-lasting open state. Seven additional missense mutations have been identified in other individuals with neurodevelopmental disorders (p = 1.03 × 10⁻⁵). KCNC2 is most highly expressed in the brain, particularly in the thalamus, and is enriched in GABAergic neurons. Long-read sequencing was useful in discovering the relevant variant in this family with autism, which had remained a mystery for several years, and will potentially have great benefits in the clinic once it is widely available.


GigaScience ◽  
2019 ◽  
Vol 8 (10) ◽  
Author(s):  
Yunhai Guo ◽  
Yi Zhang ◽  
Qin Liu ◽  
Yun Huang ◽  
Guangyao Mao ◽  
...  

Abstract

Background: Achatina fulica, the giant African snail, is the largest terrestrial mollusk species. Owing to its voracious appetite, wide environmental adaptability, high growth rate, and reproductive capacity, it has become an invasive species across the world, mainly in Southeast Asia, Japan, the western Pacific islands, and China. This pest can damage agricultural crops and is an intermediate host of many parasites that can threaten human health. However, genomic information on A. fulica remains limited, hindering genetic and genomic studies for invasion control and management of the species.

Findings: Using a k-mer–based method, we estimated the A. fulica genome size to be 2.12 Gb, with a high repeat content of up to 71%. Roughly 101.6 Gb of genomic long-read data for A. fulica were generated on the Pacific Biosciences sequencing platform and assembled to produce the first A. fulica genome, of 1.85 Gb, with a contig N50 length of 726 kb. Using contact information from the Hi-C sequencing data, we anchored 99.32% of contig sequences into 31 chromosomes, giving final contig and scaffold N50 lengths of 721 kb and 59.6 Mb, respectively. The continuity, completeness, and accuracy were evaluated by genome comparison with other mollusk genomes, BUSCO assessment, and genomic read mapping. A total of 23,726 protein-coding genes were predicted from the assembled genome, of which 96.34% were functionally annotated. Phylogenetic analysis using whole-genome protein-coding genes revealed that A. fulica separated from a common ancestor with Biomphalaria glabrata ∼182 million years ago.

Conclusion: To our knowledge, the A. fulica genome is the first terrestrial mollusk genome published to date. The chromosome sequence of A. fulica will provide the research community with a valuable resource for population genetics and environmental adaptation studies of the species, as well as for investigations of chromosome-level evolution within mollusks.
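The abstract does not say which k-mer-based method produced the 2.12 Gb estimate. The sketch below assumes the common textbook relation, genome size ≈ total k-mers / depth of the main k-mer coverage peak, applied to a k-mer depth histogram after discarding the low-depth tail usually attributed to sequencing errors. The function name and the toy histogram are hypothetical; dedicated tools such as GenomeScope instead fit a statistical model that also accounts for heterozygosity and repeats.

```python
def estimate_genome_size(histogram):
    """Rough genome-size estimate from a k-mer depth histogram.

    histogram: dict mapping depth d -> number of distinct k-mers seen d times.
    Assumes G ~ (total genomic k-mers) / (depth of the main peak).
    """
    depths = sorted(histogram)
    # Locate the first local minimum: the trough assumed to separate
    # error k-mers (low depth) from genuine genomic k-mers.
    trough = depths[0]
    for prev, cur in zip(depths, depths[1:]):
        if histogram[cur] > histogram[prev]:
            trough = prev
            break
    genomic = [d for d in depths if d >= trough]
    # Main (homozygous) peak: the most populated depth past the trough.
    peak = max(genomic, key=lambda d: histogram[d])
    # Each k-mer seen d times contributes d k-mer occurrences.
    total_kmers = sum(d * histogram[d] for d in genomic)
    return total_kmers / peak


# Toy histogram (made-up values): error tail at depths 1-3, peak at depth 6.
toy = {1: 1000, 2: 200, 3: 50, 4: 100, 5: 300, 6: 500, 7: 300, 8: 100, 9: 50}
print(estimate_genome_size(toy))  # 1400.0
```

In this toy case the trough falls at depth 3, the k-mers from depth 3 onward sum to 8,400 occurrences, and dividing by the peak depth of 6 gives 1,400 bases; real histograms from tools such as Jellyfish are processed the same way, only at much larger scale.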


2017 ◽  
Author(s):  
Atul Kakrana ◽  
Pingchuan Li ◽  
Parth Patel ◽  
Reza Hammond ◽  
Deepti Anand ◽  
...  

Abstract Phased, secondary siRNAs (phasiRNAs) are found widely in plants, derived from protein-coding transcripts and long non-coding RNAs; animal piRNAs are also phased. Integrated methods for characterizing "PHAS" loci are unavailable, and existing methods are quite limited and inefficient in handling large volumes of sequencing data. The PHASIS suite described here provides complete tools for the computational characterization of PHAS loci, with an emphasis on plants, in which these loci are numerous. Benchmarked comparisons demonstrate that PHASIS is sensitive, highly scalable and fast. Importantly, PHASIS eliminates the requirement for a sequenced genome and PARE/degradome data for the discovery of phasiRNAs and their miRNA triggers.

