long fragment read
Recently Published Documents


Total documents: 12 (five years: 1)
H-index: 2 (five years: 0)

2021. Authors: Kenta Shirasawa, Nobuo Kobayashi, Akira Nakatsuka, Hideya Ohta, Sachiko Isobe

To enhance the genomics and genetics of azalea, the whole-genome sequences of two Rhododendron species were determined and analyzed in this study: Rhododendron ripense, the cytoplasmic donor of large-flowered azalea cultivars and the ancestral species of evergreen azalea cultivars; and Rhododendron kiyosumense, a species native to Chiba Prefecture (Japan) that is seldom bred or cultivated. A chromosome-level genome sequence assembly of R. ripense was constructed by single-molecule real-time (SMRT) sequencing and genetic mapping, while the genome sequence of R. kiyosumense was assembled using single-tube long fragment read (stLFR) sequencing technology. The R. ripense genome assembly contained 319 contigs (506.7 Mb; N50 length: 2.5 Mb), which were assigned to the genetic map to establish 13 pseudomolecule sequences. In contrast, the R. kiyosumense genome was assembled into 32,308 contigs (601.9 Mb; N50 length: 245.7 kb). A total of 34,606 genes were predicted in the R. ripense genome, while 35,785 flower and 48,041 leaf transcript isoforms were identified in R. kiyosumense through Iso-Seq analysis. Overall, the genome sequence information generated in this study enhances our understanding of genome evolution in the Ericales and reveals the phylogenetic relationships among closely related species. This information will also facilitate the development of phenotypically attractive azalea cultivars.
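The N50 lengths quoted above (2.5 Mb for R. ripense, 245.7 kb for R. kiyosumense) summarize assembly contiguity: N50 is the length L such that contigs of length L or longer together contain at least half of the total assembly length. A minimal Python sketch of that standard definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum covers half the total
        # (2 * running >= total avoids floating-point division).
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths in kb.
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```

In the toy example the total is 300 kb; the two longest contigs (80 + 70 = 150 kb) already reach half of it, so the N50 is 70 kb.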


2020. Vol 10 (1). Authors: Chuanfeng Huang, Libin Shao, Shoufang Qu, Junhua Rao, Tao Cheng, ...

Sequencing technologies have developed rapidly in recent years, leading to breakthroughs in sequencing-based clinical diagnosis, but accurate and complete genome variation benchmarks are required for further assessment of precision medicine applications. Although the human cell line NA12878 has been successfully developed as a variation benchmark, population-specific variation benchmarks are still lacking. Here, we established an Asian human variation benchmark by constructing and sequencing a stabilized cell line derived from a Chinese Han volunteer. Using seven different sequencing strategies, we obtained ~3.88 Tb of clean data from different laboratories, aiming for high sequencing depth and accurate variant detection. By combining the variants identified with different sequencing strategies and analysis pipelines, we identified 3.35 million SNVs and 348.65 thousand indels that were well supported by our sequencing data and passed our strict quality control, and should therefore form a high-confidence variation benchmark. In addition, we detected 5,913 high-quality SNVs, 969 of them novel, located in highly homologous regions and supported by long-range information in both the co-barcoding single tube Long Fragment Read (stLFR) data and the PacBio HiFi CCS data. Furthermore, using the long-range data (stLFR and HiFi CCS), we were able to phase more than 99% of heterozygous SNVs, raising the benchmark to the haplotype level. Our study provides comprehensive sequencing data as well as an integrated variation benchmark for an Asian-derived cell line, which should be valuable for future sequencing-based clinical development.
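Co-barcoded stLFR reads help phase heterozygous SNVs because reads sharing a barcode usually come from the same long DNA molecule, and therefore from the same haplotype. The sketch below illustrates that idea for a single pair of heterozygous sites: each barcode that observes both sites votes "cis" (reference with reference, alternate with alternate) or "trans" (reference with alternate), and the majority sets the relative phase. It is a simplified illustration that assumes each barcode samples one molecule at the two loci; it is not the phasing pipeline used in the study, and `relative_phase` and the toy observations are hypothetical. A HiFi read spanning both sites could be fed in the same way by treating the read ID as its barcode.

```python
from collections import Counter, defaultdict


def relative_phase(read_alleles, site1, site2, min_barcodes=2):
    """Vote on the relative phase of two heterozygous SNVs.

    read_alleles: iterable of (barcode, site, allele) tuples, one per read
    observation, with allele 0 = reference and 1 = alternate.
    Returns "cis", "trans", or None if too few barcodes span both sites
    or the vote is tied.
    """
    # Gather the alleles each barcode observed at the two sites.
    seen = defaultdict(lambda: defaultdict(set))
    for barcode, site, allele in read_alleles:
        if site in (site1, site2):
            seen[barcode][site].add(allele)

    votes = Counter()
    for by_site in seen.values():
        a1 = by_site.get(site1, set())
        a2 = by_site.get(site2, set())
        # Keep barcodes with exactly one allele per site; mixed alleles
        # suggest two molecules from this locus shared the barcode.
        if len(a1) == 1 and len(a2) == 1:
            votes["cis" if a1 == a2 else "trans"] += 1

    cis, trans = votes["cis"], votes["trans"]
    if cis + trans < min_barcodes or cis == trans:
        return None
    return "cis" if cis > trans else "trans"


# Hypothetical observations at positions 1000 and 5000.
reads = [
    ("BC1", 1000, 0), ("BC1", 5000, 0),
    ("BC2", 1000, 1), ("BC2", 5000, 1),
    ("BC3", 1000, 0), ("BC3", 5000, 1),  # one discordant barcode
    ("BC4", 1000, 1),                    # covers only one site: ignored
]
print(relative_phase(reads, 1000, 5000))  # cis
```

Chaining such pairwise decisions along a chromosome yields phase blocks; real phasers also weigh base qualities and handle sequencing errors, which this sketch does not.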


2020. Authors: Xiao Du, Xiaoning Hong, Guangyi Fan, Xiaoyun Huang, Shuai Sun, ...

The order Characiformes is one of the largest components of the freshwater teleost fauna, occurring exclusively in South America and Africa, and is of great ecological and economic significance. Yet quite limited genomic resources are available to study this group and its transatlantic vicariance. In this study, we present a chromosome-level genome assembly of the African pike (Hepsetus odoe), a representative member of the African Characiformes. To this end, we generated 119, 11, and 67 Gb of reads using single tube long fragment read (stLFR), Oxford Nanopore, and Hi-C sequencing technologies, respectively. We obtained an 862.1 Mb genome assembly with contig and scaffold N50 values of 347.4 kb and 25.8 Mb, respectively. Hi-C scaffolding anchored 742.5 Mb, representing 86.1% of the genome assembly, onto 29 chromosomes. A total of 24,314 protein-coding genes were predicted, of which 23,999 (98.7%) were functionally annotated. The chromosome-scale genome assembly will be useful for functional and evolutionary studies of the African pike and will promote the study of Characiformes speciation and evolution.
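As a rough check on the data volumes above, sequencing depth can be estimated as total bases sequenced divided by genome size. The sketch assumes the 862.1 Mb assembly size approximates the genome size (the abstract gives no separate genome-size estimate), so the fold-coverage values it prints are back-of-the-envelope figures rather than numbers reported by the authors. The same arithmetic reproduces the two percentages stated in the abstract. The helper `fold_coverage` is a hypothetical name.

```python
# Data volumes from the abstract, in gigabases (Gb).
data_gb = {"stLFR": 119, "Oxford Nanopore": 11, "Hi-C": 67}

# Assumption: the assembly size stands in for the true genome size.
assembly_mb = 862.1


def fold_coverage(bases_gb, genome_mb):
    """Approximate depth = sequenced bases / genome size (both in bases)."""
    return bases_gb * 1e9 / (genome_mb * 1e6)


for tech, gb in data_gb.items():
    # Hi-C reads serve scaffolding, so their "depth" is only nominal.
    print(f"{tech}: ~{fold_coverage(gb, assembly_mb):.0f}x")

# Percentages stated in the abstract, recomputed.
print(f"anchored to chromosomes: {742.5 / assembly_mb:.1%}")  # 86.1%
print(f"genes annotated: {23_999 / 24_314:.1%}")  # 98.7%
```

Under this assumption the stLFR data alone corresponds to roughly 138x coverage, which is typical of what linked-read assemblers are run with.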


PeerJ, 2020, Vol 8, pp. e8431. Authors: Jiancong Weng, Tian Chen, Yinlong Xie, Xun Xu, Gengyun Zhang, ...

Recent advances in long fragment read (LFR) technologies, also known as linked-read or read-cloud technologies, such as single tube long fragment reads (stLFR), 10X Genomics Chromium reads, and TruSeq synthetic long-reads, have enabled efficient haplotyping and genome assembly. However, with stLFR and 10X Genomics Chromium reads, the long fragments of a genome are covered sparsely by the reads in each barcode, and most barcodes contain multiple long fragments from different regions, which makes assembly using long-range information inefficient. Thus, methods that address these shortcomings are vital for capitalizing on the additional information these technologies provide. To this end, we designed IterCluster, a novel, alignment-free algorithm that clusters barcodes from the same target region of a genome using k-mer frequency-based features and a Markov Cluster (MCL) approach, identifying enough reads in the target region to ensure sufficient sequencing depth. The IterCluster method was validated using BGI stLFR and 10X Genomics Chromium read datasets. IterCluster achieved higher precision and recall on BGI stLFR data than on 10X Genomics Chromium data. In addition, we demonstrated that IterCluster improves de novo assembly results under a divide-and-conquer strategy on a human genome data set (scaffold/contig N50 = 13.2 kbp/7.1 kbp before and 17.1 kbp/11.9 kbp after IterCluster). IterCluster provides a new way of determining LFR barcode enrichment and a novel approach for de novo assembly using LFR data. IterCluster is open source and available at https://github.com/JianCong-WENG/IterCluster.
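The abstract names two ingredients of IterCluster: k-mer frequency features per barcode and Markov Cluster (MCL) clustering. The sketch below shows a generic version of that combination, not IterCluster's actual implementation (which lives in the linked repository and also iterates over target regions, omitted here). Each barcode's reads are reduced to a k-mer count profile, barcodes are compared by cosine similarity, and a textbook MCL loop (expansion by matrix squaring, inflation by elementwise powering plus column normalization) groups barcodes that likely derive from the same region. The values of `k`, `min_sim`, and `inflation`, the self-loop rule, all function names, and the toy reads are illustrative assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k=4):
    """Count every k-mer across all reads that share one barcode."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine(p, q):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(v * q[kmer] for kmer, v in p.items() if kmer in q)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def normalize_columns(m):
    """Scale each column to sum to 1 (all-zero columns are left as-is)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total:
            for i in range(n):
                m[i][j] /= total
    return m


def mcl(sim, inflation=2.0, max_iter=100, tol=1e-9):
    """Textbook Markov Clustering on a symmetric similarity matrix."""
    n = len(sim)
    if n == 0:
        return []
    # Self-loop weight = strongest edge in the column (1.0 for isolated nodes).
    m = [row[:] for row in sim]
    for j in range(n):
        m[j][j] = max((sim[i][j] for i in range(n)), default=0.0) or 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix so random-walk flow spreads.
        expanded = [[sum(m[i][t] * m[t][j] for t in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: powering entries boosts strong flow and damps weak flow.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Attractor rows keep diagonal mass; their nonzero columns form one cluster.
    clusters = []
    for i in range(n):
        if m[i][i] > 1e-6:
            members = sorted(j for j in range(n) if m[i][j] > 1e-6)
            if members not in clusters:
                clusters.append(members)
    return clusters


def cluster_barcodes(barcode_reads, k=4, min_sim=0.3, inflation=2.0):
    """Group barcodes whose k-mer profiles suggest the same genomic region."""
    names = sorted(barcode_reads)
    profiles = [kmer_profile(barcode_reads[b], k) for b in names]
    n = len(names)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = cosine(profiles[i], profiles[j])
            if s >= min_sim:  # drop weak edges to keep the graph sparse
                sim[i][j] = sim[j][i] = s
    return [[names[j] for j in cluster] for cluster in mcl(sim, inflation)]


# Hypothetical toy data: two "regions" built from disjoint alphabets.
toy = {
    "bc_A1": ["ACACCAACCA"],
    "bc_A2": ["ACACCAACCA", "CCAACC"],
    "bc_B1": ["GTGTTGGTTG"],
    "bc_B2": ["GTGTTGGTTG", "TTGGTT"],
}
print(cluster_barcodes(toy))  # [['bc_A1', 'bc_A2'], ['bc_B1', 'bc_B2']]
```

The pure-Python matrix loop is cubic in the number of barcodes and is meant only to make the expansion/inflation steps readable; a practical tool would use sparse matrices and prune small entries after each inflation.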

