Defining Blood Group Gene Reference Alleles by Long-Read Sequencing: Proof of Concept in the ACKR1 Gene Encoding the Duffy Antigens

Yann Fichou; Isabelle Berlivet; Gaëlle Richard; Christophe Tournamille; Lilian Castilho; Claude Férec

doi:10.1159/000504584

Transfusion Medicine and Hemotherapy ◽

10.1159/000504584 ◽

2019 ◽

Vol 47 (1) ◽

pp. 23-32 ◽

Cited By ~ 2

Author(s):

Yann Fichou ◽

Isabelle Berlivet ◽

Gaëlle Richard ◽

Christophe Tournamille ◽

Lilian Castilho ◽

...

Keyword(s):

Blood Group ◽

Single Molecule ◽

Pcr Amplification ◽

Null Alleles ◽

Sequencing Technology ◽

Gene Encoding ◽

Next Generation Sequencing Technology ◽

Sequencing Technologies ◽

Long Read ◽

Long Range Pcr

Background: In the new era of blood group genomics, (re-)defining reference gene/allele sequences of blood group genes has become an important goal for both diagnostic and research purposes. As powerful new sequencing technologies are now available, we sought to investigate the variability encountered in the three most common alleles of ACKR1, the gene encoding the clinically relevant Duffy antigens, at the haplotype level by a long-read sequencing approach. Materials and Methods: After long-range PCR amplification spanning the whole ACKR1 gene locus (∼2.5 kilobases), amplicons generated from 81 samples with known genotypes were sequenced in a single read by using the Pacific Biosciences (PacBio) single molecule, real-time (SMRT) sequencing technology. Results: High-quality sequencing reads were obtained for the 162 alleles (accuracy >0.999). Twenty-two nucleotide variations reported in databases were identified, defining 19 haplotypes: four, eight, and seven haplotypes in 46 ACKR1*01, 63 ACKR1*02, and 53 ACKR1*02N.01 alleles, respectively. Discussion: Overall, we have defined a subset of reference alleles by third-generation (long-read) sequencing. This technology, which provides a “longitudinal” overview of the loci of interest (several thousand base pairs) and is complementary to the second-generation (short-read) next-generation sequencing technology, is of critical interest for resolving novel, rare, and null alleles.
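The counts in the abstract above can be checked for internal consistency, and the notion of a haplotype as the set of variants carried together on one long read can be illustrated with a minimal Python sketch. The function names and the toy reads below are hypothetical, and the variant labels are placeholders rather than real ACKR1 positions; this is not the authors' analysis pipeline.

```python
from collections import defaultdict

# Figures quoted in the abstract above.
SAMPLES = 81
ALLELE_COUNTS = {"ACKR1*01": 46, "ACKR1*02": 63, "ACKR1*02N.01": 53}
HAPLOTYPE_COUNTS = {"ACKR1*01": 4, "ACKR1*02": 8, "ACKR1*02N.01": 7}


def check_totals():
    """Return (total alleles, total haplotypes) after a diploid sanity check."""
    total_alleles = sum(ALLELE_COUNTS.values())
    # Each diploid sample contributes two alleles.
    assert total_alleles == 2 * SAMPLES
    return total_alleles, sum(HAPLOTYPE_COUNTS.values())


def count_haplotypes(reads):
    """Count distinct variant combinations per allele.

    `reads` is an iterable of (allele_name, variants) pairs, where `variants`
    lists the variant labels observed together on one long read.
    """
    seen = defaultdict(set)
    for allele, variants in reads:
        # A frozenset makes the haplotype independent of variant order.
        seen[allele].add(frozenset(variants))
    return {allele: len(haps) for allele, haps in seen.items()}


# Hypothetical toy reads (placeholder variant labels, assumed for illustration).
toy_reads = [
    ("ACKR1*01", ["v1"]),
    ("ACKR1*01", ["v1", "v2"]),
    ("ACKR1*01", ["v2", "v1"]),  # same haplotype as the previous read
    ("ACKR1*02", []),            # reference-like read with no variants
]

print(check_totals())
print(count_haplotypes(toy_reads))
```

Because each read spans the whole amplicon, variants on the same read are phased by construction, which is why a simple set comparison is enough in this sketch.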

Review on the Development and Applications of Medicinal Plant Genomes

Frontiers in Plant Science ◽

10.3389/fpls.2021.791219 ◽

2021 ◽

Vol 12 ◽

Author(s):

Qi-Qing Cheng ◽

Yue Ouyang ◽

Zi-Yu Tang ◽

Chi-Chou Lao ◽

Yan-Yu Zhang ◽

...

Keyword(s):

Medicinal Plants ◽

Medicinal Plant ◽

Sequencing Technology ◽

Effective Utilization ◽

Next Generation Sequencing Technology ◽

Plant Genomes ◽

Sequencing Technologies ◽

Long Read ◽

Genetic Level ◽

Sustainable Protection

With the development of sequencing technology, research on medicinal plants is no longer limited to chemistry, pharmacology, and pharmacodynamics, but also examines them at the genetic level. As next-generation sequencing has become affordable and long-read sequencing technology has become established, large medicinal plant genomes can be sequenced and assembled more easily. Although plant genomes have been reviewed several times, no review has yet given a systematic and comprehensive introduction to the development and application of the medicinal plant genomes reported to date. Here, we provide a historical perspective on the current state of genomes in medicinal plant biology, highlight the use of rapidly developing sequencing technologies, and comprehensively summarize how genomes are applied to solve practical problems in medicinal plants, such as genomics-assisted herb breeding, reconstruction of evolutionary history, herbal synthetic biology, and geoherbal research, which are important for the effective utilization, rational use, and sustainable protection of medicinal plants.

Comparative Transcriptome Profiling of Disruptive Technology, Single-Molecule Direct RNA Sequencing

Current Bioinformatics ◽

10.2174/1574893614666191017154427 ◽

2020 ◽

Vol 15 (2) ◽

pp. 165-172

Author(s):

Chaithra Pradeep ◽

Dharam Nandan ◽

Arya A. Das ◽

Dinesh Velayutham

Keyword(s):

Rna Sequencing ◽

Single Molecule ◽

Transcriptome Assembly ◽

Transcriptome Profiling ◽

Read Length ◽

Complex Nature ◽

Disruptive Technology ◽

Sequencing Technology ◽

Sequencing Technologies ◽

Long Read

Background: The standard approach for transcriptomic profiling involves high-throughput short-read sequencing technology, mainly dominated by Illumina. However, short reads have limitations in transcriptome assembly and in obtaining full-length transcripts due to the complex nature of transcriptomes, with variable transcript lengths and multiple alternatively spliced isoforms. Recent advances in long-read sequencing by Oxford Nanopore Technologies (ONT) offer both cDNA and direct RNA sequencing and have brought a paradigm change in sequencing technology that greatly improves assembly and expression estimates. ONT enables molecules to be sequenced without fragmentation, resulting in ultra-long reads that allow entire genes and transcripts to be fully characterized. The direct RNA sequencing method, in addition, circumvents the reverse transcription and amplification steps. Objective: In this study, RNA sequencing methods were assessed by comparing data from Illumina (ILM), ONT cDNA (OCD) and ONT direct RNA (ODR). Methods: The sensitivity and specificity of isoform detection were determined from the data generated by Illumina, ONT cDNA and ONT direct RNA sequencing technologies using Saccharomyces cerevisiae as a model. Comparative studies were conducted with two pipelines to detect isoforms, novel genes and variable gene length. Results: Mapping metrics and qualitative profiles for the different pipelines are presented to understand these disruptive technologies. The variability in sequencing technology and in the analysis pipeline was studied.
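The abstract above does not state how sensitivity and specificity were computed. As a rough sketch only, one common convention in transcript-detection benchmarks is assumed here: sensitivity as the fraction of reference isoforms that are recovered, and precision (often reported as "specificity" in this setting) as the fraction of detected isoforms that match the reference. The function name and the isoform identifiers are hypothetical.

```python
def isoform_detection_metrics(detected, reference):
    """Return (sensitivity, precision) for a set of detected isoform IDs.

    sensitivity = matched / number of reference isoforms
    precision   = matched / number of detected isoforms
    Both definitions are assumptions; the source abstract gives no formula.
    """
    detected, reference = set(detected), set(reference)
    matched = len(detected & reference)
    sensitivity = matched / len(reference) if reference else 0.0
    precision = matched / len(detected) if detected else 0.0
    return sensitivity, precision


# Hypothetical identifiers, for illustration only.
reference_isoforms = {"iso_a", "iso_b", "iso_c", "iso_d", "iso_e"}
detected_isoforms = {"iso_a", "iso_b", "iso_c", "iso_x"}

sens, prec = isoform_detection_metrics(detected_isoforms, reference_isoforms)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

The same function could be applied once per technology (ILM, OCD, ODR) and per pipeline to build a comparison table, provided that isoform matching against the reference has already been done upstream.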

Survey of the Bradysia odoriphaga Transcriptome Using PacBio Single-Molecule Long-Read Sequencing

Genes ◽

10.3390/genes10060481 ◽

2019 ◽

Vol 10 (6) ◽

pp. 481 ◽

Cited By ~ 1

Author(s):

Chen ◽

Lin ◽

Xie ◽

Zhong ◽

Zhang ◽

...

Keyword(s):

Insecticide Resistance ◽

Single Molecule ◽

Functional Categories ◽

Genetic Studies ◽

Sequencing Technologies ◽

Clusters Of Orthologous Groups ◽

Long Read ◽

Main Gene ◽

First Time ◽

Main Factor

The damage caused by Bradysia odoriphaga is the main factor threatening the production of vegetables in the Liliaceae family. However, few genetic studies of B. odoriphaga have been conducted because of a lack of genomic resources. Many long-read sequencing technologies have been developed in the last decade; therefore, in this study, the transcriptome covering all developmental stages of B. odoriphaga was sequenced for the first time by Pacific Biosciences (PacBio) single-molecule long-read sequencing. Here, 39,129 isoforms were generated, and 35,645 were annotated when checked against sequences available in different databases. Overall, 18,473 isoforms were distributed across 25 Clusters of Orthologous Groups categories, and 11,880 isoforms were categorized into 60 functional groups that belonged to the three main Gene Ontology classifications. Moreover, 30,610 isoforms were assigned to 44 functional categories belonging to six main Kyoto Encyclopedia of Genes and Genomes functional categories. Coding DNA sequence (CDS) prediction showed that 36,419 out of 39,129 isoforms were predicted to have CDS, and 4,319 simple sequence repeats were detected in total. Finally, 266 insecticide resistance and metabolism-related isoforms were identified as candidate genes for further investigation of insecticide resistance and metabolism in B. odoriphaga.
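To put the isoform counts above on a common scale, each can be expressed as a share of the 39,129 isoforms. The short Python sketch below does only that arithmetic; the labels and the helper name are illustrative choices, not terms from the paper.

```python
# Counts quoted in the abstract above.
TOTAL_ISOFORMS = 39_129
SUBSETS = {
    "annotated in at least one database": 35_645,
    "assigned to COG categories": 18_473,
    "assigned to GO groups": 11_880,
    "assigned to KEGG categories": 30_610,
    "with a predicted CDS": 36_419,
}


def percent_of_total(count, total=TOTAL_ISOFORMS):
    """Return `count` as a percentage of `total`, rounded to one decimal."""
    return round(100.0 * count / total, 1)


percentages = {label: percent_of_total(n) for label, n in SUBSETS.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

On these figures, roughly nine in ten isoforms are annotated or carry a predicted CDS, while fewer than half fall into a COG category.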

Hapo-G, haplotype-aware polishing of genome assemblies with accurate reads

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab034 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Jean-Marc Aury ◽

Benjamin Istace

Keyword(s):

Single Molecule ◽

Direct Consequence ◽

High Quality ◽

Sequencing Errors ◽

Coding Regions ◽

Sequencing Technologies ◽

Long Reads ◽

Oxford Nanopore ◽

Long Read ◽

Genome Assemblies

Abstract: Single-molecule sequencing technologies have recently been commercialized by Pacific Biosciences and Oxford Nanopore with the promise of sequencing long DNA fragments (on the order of kilobases to megabases) and then, using efficient algorithms, providing high-quality assemblies in terms of contiguity and completeness of repetitive regions. However, the error rate of long-read technologies is higher than that of short-read technologies. This has a direct consequence on the base quality of genome assemblies, particularly in coding regions where sequencing errors can disrupt the coding frame of genes. In the case of diploid genomes, the consensus of a given gene can be a mixture of the two haplotypes and can lead to premature stop codons. Several methods have been developed to polish genome assemblies using short reads; generally, they inspect the nucleotides one by one and provide a correction for each nucleotide of the input assembly. As a result, these algorithms are not able to properly process diploid genomes and typically switch from one haplotype to another. Herein we propose Hapo-G (Haplotype-Aware Polishing Of Genomes), a new algorithm capable of incorporating phasing information from high-quality reads (short or long reads) to polish genome assemblies, in particular assemblies of diploid and heterozygous genomes.

Mapping and phasing of structural variation in patient genomes using nanopore sequencing

10.1101/129379 ◽

2017 ◽

Cited By ~ 4

Author(s):

Mircea Cretu Stancu ◽

Markus J. van Roosmalen ◽

Ivo Renkens ◽

Marleen Nieboer ◽

Sjors Middelkamp ◽

...

Keyword(s):

Single Molecule ◽

De Novo ◽

Structural Variants ◽

Human Genetic Disease ◽

Structural Genomic ◽

Short Read ◽

Sequencing Technologies ◽

Genome Wide ◽

Long Read ◽

Complex Structural

Abstract: Structural genomic variants form a common type of genetic alteration underlying human genetic disease and phenotypic variation. Despite major improvements in genome sequencing technology and data analysis, the detection of structural variants still poses challenges, particularly when variants are of high complexity. Emerging long-read single-molecule sequencing technologies provide new opportunities for detection of structural variants. Here, we demonstrate sequencing of the genomes of two patients with congenital abnormalities using the ONT MinION at 11x and 16x mean coverage, respectively. We developed a bioinformatic pipeline - NanoSV - to efficiently map genomic structural variants (SVs) from the long-read data. We demonstrate that the nanopore data are superior to corresponding short-read data with regard to detection of de novo rearrangements originating from complex chromothripsis events in the patients. Additionally, genome-wide surveillance of SVs revealed 3,253 (33%) novel variants that were missed in short-read data of the same sample, the majority of which are duplications < 200 bp in size. Long sequencing reads enabled efficient phasing of genetic variations, allowing the construction of genome-wide maps of phased SVs and SNVs. We employed read-based phasing to show that all de novo chromothripsis breakpoints occurred on paternal chromosomes and we resolved the long-range structure of the chromothripsis. This work demonstrates the value of long-read sequencing for screening whole genomes of patients for complex structural variants.

LAMBDR: Long-range amplification and Nanopore sequencing of the Mycobacterium bovis direct-repeat region. A novel method for in-silico spoligotyping of M. bovis directly from badger faeces

10.1101/791129 ◽

2019 ◽

Cited By ~ 1

Author(s):

R.S. James ◽

E.R. Travis ◽

A. D. Millard ◽

P.C. Hewlett ◽

L. Kravar-Garde ◽

...

Keyword(s):

Long Range ◽

Mycobacterium Bovis ◽

Pcr Amplification ◽

Direct Repeat ◽

Repeat Region ◽

Culture Independent ◽

Long Read ◽

Novel Method ◽

Long Range Pcr ◽

Template Dna

Abstract: The environment is an overlooked source of Mycobacterium bovis, the causative agent of bovine TB. Long-read, end-to-end sequencing of variable repeat regions across the M. bovis genome was evaluated as a method of acquiring rapid strain-level resolution directly from environmental samples. Eight samples of M. bovis, two BCG strains (Danish and Pasteur), and a single M. tuberculosis type culture (NCTC 13144) were used to generate data for this method. Long-range PCR amplification of the direct repeat region was used to synthesize ∼5 kb template DNA for onward sequence analysis. This has permitted culture-independent identification of M. bovis spoligotypes present in the environment. Sequence-level analysis of the direct repeat region showed that spoligotyping may underestimate strain diversity due to the inability to identify both SNPs and primer binding mutations using a biotinylated hybridisation approach.

ISOdb: A Comprehensive Database of Full-Length Isoforms Generated by Iso-Seq

International Journal of Genomics ◽

10.1155/2018/9207637 ◽

2018 ◽

Vol 2018 ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Shang-Qian Xie ◽

Yue Han ◽

Xiao-Zhou Chen ◽

Tai-Yu Cao ◽

Kai-Kai Ji ◽

...

Keyword(s):

Single Molecule ◽

Full Length ◽

Public Access ◽

Transcript Isoforms ◽

Sequencing Technologies ◽

Long Reads ◽

Depth Analysis ◽

Gene Level ◽

Long Read ◽

Full Length Transcript

The accurate landscape of transcript isoforms plays an important role in the understanding of gene function and gene regulation. However, building complete transcripts is very challenging for short reads generated using next-generation sequencing. Fortunately, isoform sequencing (Iso-Seq) using single-molecule sequencing technologies, such as PacBio SMRT, provides long reads spanning entire transcript isoforms which do not require assembly. Therefore, we have developed ISOdb, a comprehensive resource database for hosting and carrying out an in-depth analysis of Iso-Seq datasets and visualising the full-length transcript isoforms. The current version of ISOdb has collected 93 publicly available Iso-Seq samples from eight species and presents the samples in two levels: (1) sample level, including metainformation, long read distribution, isoform numbers, and alternative splicing (AS) events of each sample; (2) gene level, including the total isoforms, novel isoform number, novel AS number, and isoform visualisation of each gene. In addition, ISOdb provides a user interface in the website for uploading sample information to facilitate the collection and analysis of researchers’ datasets. Currently, ISOdb is the first repository that offers comprehensive resources and convenient public access for hosting, analysing, and visualising Iso-Seq data, which is freely available.

Genetic Adaptation of Porcine Circovirus Type 1 to Cultured Porcine Kidney Cells Revealed by Single-Molecule Long-Read Sequencing Technology

Genome Announcements ◽

10.1128/genomea.01539-16 ◽

2017 ◽

Vol 5 (5) ◽

Cited By ~ 1

Author(s):

Dóra Tombácz ◽

Norbert Moldován ◽

Zsolt Balázs ◽

Zsolt Csabai ◽

Michael Snyder ◽

...

Keyword(s):

Single Molecule ◽

Porcine Circovirus Type ◽

Porcine Circovirus ◽

Genetic Adaptation ◽

Kidney Cells ◽

Porcine Kidney ◽

Sequencing Technology ◽

Sequencing Platform ◽

Long Read

Abstract: Porcine circovirus type 1 (PCV1) is a nonpathogenic circovirus and a contaminant of the porcine kidney (PK-15) cell line. We present the complete and annotated genome sequence of strain Szeged of PCV1, determined using the Pacific Biosciences RSII long-read sequencing platform.

Quantifying the Benefit Offered by Transcript Assembly on Single-Molecule Long Reads

10.1101/632703 ◽

2019 ◽

Cited By ~ 1

Author(s):

Laura H. Tung ◽

Mingfu Shao ◽

Carl Kingsford

Keyword(s):

Single Molecule ◽

Error Rates ◽

Human Transcriptome ◽

Sequencing Technologies ◽

Third Generation Sequencing ◽

Long Reads ◽

Long Read ◽

Transcript Assembly ◽

Novel Isoforms ◽

Generation Sequencing

Abstract: Third-generation sequencing technologies benefit transcriptome analysis by generating longer sequencing reads. However, not all single-molecule long reads represent full transcripts due to incomplete cDNA synthesis and the sequencing length limit of the platform. This drives a need for long-read transcript assembly. We quantify the benefit that can be achieved by using a transcript assembler on long reads. Adding long-read-specific algorithms, we evolved Scallop to make Scallop-LR, a long-read transcript assembler, to handle the computational challenges arising from long read lengths and high error rates. Analyzing 26 SRA PacBio datasets using Scallop-LR, Iso-Seq Analysis, and StringTie, we quantified the amount by which assembly improved Iso-Seq results. Through combined evaluation methods, we found that Scallop-LR identifies 2100–4000 more (for 18 human datasets) or 1100–2200 more (for eight mouse datasets) known transcripts than Iso-Seq Analysis, which does not do assembly. Further, Scallop-LR finds 2.4–4.4 times more potentially novel isoforms than Iso-Seq Analysis for the human and mouse datasets. StringTie also identifies more transcripts than Iso-Seq Analysis. Adding long-read-specific optimizations in Scallop-LR increases the numbers of predicted known transcripts and potentially novel isoforms for the human transcriptome compared to several recent short-read assemblers (e.g. StringTie). Our findings indicate that transcript assembly by Scallop-LR can reveal a more complete human transcriptome.

Prospects for the use of third generation sequencers for quantitative profiling of transcriptome

Biomedical Chemistry Research and Methods ◽

10.18097/bmcrm00086 ◽

2018 ◽

Vol 1 (4) ◽

pp. e00086

Author(s):

S.P. Radko ◽

L.K. Kurbatov ◽

K.G. Ptitsyn ◽

Y.Y. Kiseleva ◽

E.A. Ponomarenko ◽

...

Keyword(s):

Single Molecule ◽

Transcriptome Profiling ◽

Third Generation ◽

Sequencing Technology ◽

Single Molecule Sequencing ◽

Sequencing Technologies ◽

Oxford Nanopore ◽

Biotechnology Companies ◽

Oxford Nanopore Technologies ◽

Quantitative Profiling

Transcriptome profiling is widely employed to analyze transcriptome dynamics when studying various biological processes at the cell and tissue levels. Unlike second-generation sequencers, which sequence relatively short fragments of nucleic acids, the third-generation DNA/RNA sequencers developed by the biotechnology companies “PacBio” and “Oxford Nanopore Technologies” allow one to sequence transcripts as single molecules and may be considered potential molecular counters capable of measuring the number of copies of each transcript with high throughput, sensitivity, and specificity. In the present review, the features of the single-molecule sequencing technologies offered by “PacBio” and “Oxford Nanopore Technologies” are considered alongside their utility for transcriptome analysis, including the analysis of transcript isoforms. The prospects and limitations of single-molecule sequencing technology as applied to quantitative transcriptome profiling are also discussed.
