Investigations of sequencing data and sample type on HLA class Ia typing with different computational tools

Author(s):  
Jian Yi ◽  
Longyun Chen ◽  
Yajie Xiao ◽  
Zhikun Zhao ◽  
Xiaofan Su

Abstract Human leukocyte antigen (HLA) genes encode the human major histocompatibility complex (MHC) proteins and play a key role in adaptive and innate immunity. Emerging clinical evidence suggests that the presentation of tumor neoantigens and the neoantigen-specific T cell response associated with MHC class I molecules are of key importance in activating the adaptive immune system in cancer immunotherapy. Therefore, accurate HLA typing is essential for the clinical application of immunotherapy. In this study, we evaluated the performance of 4 widely used HLA typing tools (OptiType, Phlat, Polysolver and seq2hla) for predicting HLA class Ia genes from WES and RNA-seq data of 28 cancer patients. HLA genotyping data obtained with the PCR-SBT method served as the gold standard and were compared with the HLA typing results obtained from the NGS data. For both WES data and RNA-seq data, OptiType showed higher accuracy for HLA-Ia typing than the other 3 programs at 2-digit and 4-digit resolution. Additionally, HLA typing accuracy from WES data was higher than from RNA-seq data (99.11% for WES data versus 96.42% for RNA-seq data). The accuracy of HLA-Ia typing by OptiType can reach 100% when the average depth of HLA gene regions is >20x. In addition, the accuracy of 2-digit and 4-digit HLA-Ia typing was higher for control samples than for tumor tissues. In conclusion, OptiType applied to WES data from control samples with a high average depth (>20x) of HLA gene regions probably offers superior performance for HLA-Ia typing, enabling its application in cancer immunotherapy.
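To make the 2-digit versus 4-digit comparison concrete, the following is a minimal sketch of how allele-level accuracy against a gold standard could be computed. The abstract does not specify the exact scoring rule, so the function names (`truncate`, `allele_accuracy`), the dictionary layout, and the order-insensitive matching per locus are all assumptions made for illustration, and the sample data are invented.

```python
from collections import Counter


def truncate(allele: str, digits: int) -> str:
    """Reduce an allele name to the requested resolution.

    'A*02:01:01' becomes 'A*02' at 2 digits and 'A*02:01' at 4 digits.
    """
    gene, fields = allele.split("*")
    n_fields = digits // 2  # each colon-separated field counts as two digits
    return gene + "*" + ":".join(fields.split(":")[:n_fields])


def allele_accuracy(predicted: dict, truth: dict, digits: int) -> float:
    """Fraction of gold-standard alleles recovered by the predictions.

    Both arguments map sample -> locus -> list of two allele names.
    Matching is order-insensitive within a locus (hypothetical scoring rule).
    """
    correct = total = 0
    for sample, loci in truth.items():
        for locus, ref_alleles in loci.items():
            ref = Counter(truncate(a, digits) for a in ref_alleles)
            calls = predicted.get(sample, {}).get(locus, [])
            pred = Counter(truncate(a, digits) for a in calls)
            correct += sum((ref & pred).values())  # multiset intersection
            total += sum(ref.values())
    return correct / total if total else 0.0


# Invented example: one patient, two loci, one 4-digit miscall at HLA-A.
truth = {"P1": {"A": ["A*02:01", "A*24:02"], "B": ["B*07:02", "B*15:01"]}}
predicted = {"P1": {"A": ["A*24:02", "A*02:06"], "B": ["B*15:01", "B*07:02"]}}

acc_2digit = allele_accuracy(predicted, truth, 2)
acc_4digit = allele_accuracy(predicted, truth, 4)
```

In this invented case the miscall `A*02:06` still matches `A*02:01` at 2-digit resolution but not at 4-digit resolution, which is why accuracy at the coarser resolution can never be lower than at the finer one under this scoring rule.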

BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Qian Liu ◽  
Yu Hu ◽  
Andres Stucky ◽  
Li Fang ◽  
Jiang F. Zhong ◽  
...  

Abstract Background Long-read RNA-Seq techniques can generate reads that encompass a large proportion of, or the entire, mRNA/cDNA molecule, so they are expected to address inherent limitations of short-read RNA-Seq techniques that typically generate < 150 bp reads. However, there is a general lack of software tools for gene fusion detection from long-read RNA-seq data that take into account the high basecalling error rates and the presence of alignment errors. Results In this study, we developed a fast computational tool, LongGF, to efficiently detect candidate gene fusions from long-read RNA-seq data, including cDNA sequencing data and direct mRNA sequencing data. We evaluated LongGF on tens of simulated long-read RNA-seq datasets, and demonstrated its superior performance in gene fusion detection. We also tested LongGF on a Nanopore direct mRNA sequencing dataset and a PacBio sequencing dataset generated on a mixture of 10 cancer cell lines, and found that LongGF detected known gene fusions better than existing computational tools. Furthermore, we tested LongGF on a Nanopore cDNA sequencing dataset on acute myeloid leukemia, and pinpointed the exact location of a translocation (previously known at cytogenetic resolution) at base resolution, which was further validated by Sanger sequencing. Conclusions In summary, LongGF will greatly facilitate the discovery of candidate gene fusion events from long-read RNA-Seq data, especially in cancer samples. LongGF is implemented in C++ and is available at https://github.com/WGLab/LongGF.


2021 ◽  
Author(s):  
Ram Ayyala ◽  
Junghyun Jung ◽  
Sergey Knyazev ◽  
SERGHEI MANGUL

Although precise identification of human leukocyte antigen (HLA) alleles is crucial for various clinical and research applications, HLA typing remains challenging due to the high polymorphism of the HLA loci. With Next-Generation Sequencing (NGS) data becoming widely accessible, many computational tools have been developed to predict HLA types from RNA sequencing (RNA-seq) data. However, there is a lack of comprehensive and systematic benchmarking of RNA-seq HLA callers using large-scale and realistic gold standards. To address this limitation, we rigorously compared the performance of 12 HLA callers over 50,000 HLA tasks, including searching 30 pairwise combinations of HLA callers and reference in over 1,500 samples. In each case, we measured accuracy as the percentage of correctly predicted alleles (at two- and four-digit resolution) based on six gold standard datasets spanning 650 RNA-seq samples. To determine the influence of read length over the HLA region on the prediction quality of each tool, we considered read lengths in the range 37-126 bp, which was available in our gold standard datasets. Moreover, using the Genotype-Tissue Expression (GTEx) v8 data, we calculated the concordance of the same HLA type across different tissues from the same individual to evaluate how well the HLA callers maintain consistent results across tissues. This study offers crucial information for researchers regarding appropriate choices of methods for an HLA analysis.


2018 ◽  
Author(s):  
Juan Xie ◽  
Anjun Ma ◽  
Yu Zhang ◽  
Bingqiang Liu ◽  
Changlin Wan ◽  
...  

Abstract The combination of biclustering and large-scale gene expression data holds promising potential for inferring condition-specific functional pathways/networks. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-sequencing (RNA-Seq) data, mainly due to the lack of (i) a consideration of the high sparsity of RNA-Seq data, e.g., the massive zeros or lowly expressed genes in the data, especially for single-cell RNA-Seq (scRNA-Seq) data, and (ii) an understanding of the underlying transcriptional regulation signals of the observed gene expression values. Here we present a novel biclustering algorithm, QUBIC2, for the analysis of large-scale bulk RNA-Seq and scRNA-Seq data. Key novelties of the algorithm include (i) a truncated model to handle the unreliable quantification of genes with low or moderate expression, (ii) a mixture Gaussian distribution and an information-divergency objective function to capture shared transcriptional regulation signals among a set of genes, (iii) a Core-Dual strategy to identify biclusters and optimize relevant parameters, and (iv) a size-based P-value framework to evaluate the statistical significance of all the identified biclusters. Our method validation on comprehensive data sets of bulk and single-cell RNA-seq data suggests that QUBIC2 had superior performance in functional module detection and cell type classification compared with five other widely used biclustering tools. In addition, applications to temporal and spatial data demonstrated that QUBIC2 can derive meaningful biological information from scRNA-Seq data. The source code for QUBIC2 can be freely accessed at https://github.com/maqin2001/qubic2.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Zeeshan Ahmed ◽  
Eduard Gibert Renart ◽  
Saman Zeeshan ◽  
XinQi Dong

Abstract Background Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and high and low expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. 
The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Yixin Kong ◽  
Ariangela Kozik ◽  
Cindy H. Nakatsu ◽  
Yava L. Jones-Hall ◽  
Hyonho Chun

Abstract A latent factor model for count data is popularly applied in deconvoluting mixed signals in biological data as exemplified by sequencing data for transcriptome or microbiome studies. Due to the availability of pure samples such as single-cell transcriptome data, the accuracy of the estimates could be much improved. However, the advantage quickly disappears in the presence of excessive zeros. To correctly account for this phenomenon in both mixed and pure samples, we propose a zero-inflated non-negative matrix factorization and derive an effective multiplicative parameter updating rule. In simulation studies, our method yielded the smallest bias. We applied our approach to brain gene expression as well as fecal microbiome datasets, illustrating the superior performance of the approach. Our method is implemented as a publicly available R-package, iNMF.
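For readers unfamiliar with multiplicative updating rules, the sketch below shows the classical Lee and Seung updates for plain non-negative matrix factorization under a squared-error objective. It is not the zero-inflated rule derived by the authors, which the abstract does not spell out; the function names, the pure-Python matrix helpers, and the toy count matrix are illustrative assumptions only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Plain NMF, V ~ WH, via Lee-Seung multiplicative updates.

    Standard (not zero-inflated) NMF; eps only guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive starting values so the updates can move every entry.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy count matrix with many zeros (rows: features, columns: samples).
V = [[2, 0, 1], [0, 8, 0], [6, 0, 3], [0, 4, 0]]
W0, H0 = nmf(V, rank=2, n_iter=0)   # random starting point
W, H = nmf(V, rank=2, n_iter=200)   # same start (same seed), after updating
initial_error = frobenius_error(V, W0, H0)
final_error = frobenius_error(V, W, H)
```

Because each update multiplies the current value by a ratio of non-negative quantities, the factors stay non-negative without any explicit projection step. A zero-inflated variant would additionally have to model the excess zeros in `V`, which this sketch treats as ordinary observations.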


Viruses ◽  
2021 ◽  
Vol 13 (2) ◽  
pp. 343
Author(s):  
Manjin Li ◽  
Dan Xing ◽  
Duo Su ◽  
Di Wang ◽  
Heting Gao ◽  
...  

Dengue virus (DENV), a member of the Flavivirus genus of the Flaviviridae family, can cause dengue fever (DF) and more serious diseases and thus imposes a heavy burden worldwide. As the main vector of DENV, mosquitoes are a serious hazard. After infection, they induce a complex host–pathogen interaction mechanism. Our goal is to further study the interaction mechanism of viruses in homologous, sensitive, and repeatable C6/36 cell vectors. Transcriptome sequencing (RNA-Seq) technology was applied to the host transcript profiles of C6/36 cells infected with DENV2. Then, bioinformatics analysis was used to identify significant differentially expressed genes and the associated biological processes. Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) was performed to verify the sequencing data. A total of 1239 DEGs were found by transcriptional analysis of Aedes albopictus C6/36 cells that were infected and uninfected with dengue virus, among which 1133 were upregulated and 106 were downregulated. Further bioinformatics analysis showed that the upregulated DEGs were significantly enriched in signaling pathways such as the MAPK, Hippo, FoxO, Wnt, mTOR, and Notch; metabolic pathways and cellular physiological processes such as autophagy, endocytosis, and apoptosis. Downregulated DEGs were mainly enriched in DNA replication, pyrimidine metabolism, and repair pathways, including BER, NER, and MMR. The qRT-PCR results showed that the concordance between the RNA-Seq and RT-qPCR data was very high (92.3%). The results of this study provide more information about DENV2 infection of C6/36 cells at the transcriptome level, laying a foundation for further research on mosquito vector–virus interactions. These data provide candidate antiviral genes that can be used for further functional verification in the future.


Nanomaterials ◽  
2021 ◽  
Vol 11 (7) ◽  
pp. 1700
Author(s):  
In-Cheol Sun ◽  
SeongHoon Jo ◽  
Diego Dumani ◽  
Wan Su Yun ◽  
Hong Yeol Yoon ◽  
...  

Lymph node mapping is important in cancer immunotherapy because the morphology of lymph nodes is one of the crucial evaluation criteria of immune responses. We developed new theragnostic glycol-chitosan-coated gold nanoparticles (GC-AuNPs), which highlighted lymph nodes in ultrasound-guided photoacoustic (US/PA) imaging. Moreover, the ovalbumin epitope was conjugated to GC-AuNPs (OVA-GC-AuNPs) to deliver tumor antigen to lymph node resident macrophages. In vitro studies demonstrated vigorous endocytosis activity of J774A.1 macrophages and consequently strong photoacoustic signals from them. The macrophages also presented a tumor antigen when OVA-GC-AuNPs were used for cellular uptake. After the lingual injection of GC-AuNPs into healthy mice, cervical lymph nodes were visible in a US/PA imaging system with high contrast. Three-dimensional analysis of lymph nodes revealed that the accumulation of GC-AuNPs in the lymph node increased with post-injection time. Histological analysis showed GC-AuNPs or OVA-GC-AuNPs located in subcapsular and medullary sinuses, where macrophages are abundant. Our new theragnostic GC-AuNPs present a superior performance in US/PA imaging of lymph nodes without targeting moieties or complex surface modification. Simultaneously, GC-AuNPs were able to deliver tumor antigens to cause macrophages to present the OVA epitope at targeted lymph nodes, which would be valuable for cancer immunotherapy.


Genes ◽  
2020 ◽  
Vol 11 (4) ◽  
pp. 397
Author(s):  
Dadong Deng ◽  
Xihong Tan ◽  
Kun Han ◽  
Ruimin Ren ◽  
Jianhua Cao ◽  
...  

The development of the placental fold, which increases the maternal–fetal interacting surface area, is of primary importance for the growth of the fetus throughout the whole pregnancy. However, the mechanisms involved remain to be fully elucidated. Increasing evidence has revealed that long non-coding RNAs (lncRNAs) are a new class of RNAs with regulatory functions and could be epigenetically regulated by histone modifications. In this study, 141 lncRNAs (including 73 up-regulated and 68 down-regulated lncRNAs) were identified to be differentially expressed in the placentas of pigs during the establishment and expanding stages of placental fold development. The differentially expressed lncRNAs and genes (DElncRNA-DEgene) co-expression network analysis revealed that these differentially expressed lncRNAs (DElncRNAs) were mainly enriched in pathways of cell adhesion, cytoskeleton organization, epithelial cell differentiation and angiogenesis, indicating that the DElncRNAs are related to the major events that occur during placental fold development. In addition, we integrated the RNA-seq (RNA sequencing) data with the ChIP-seq (chromatin immunoprecipitation sequencing) data of H3K4me3/H3K27ac produced from the placental samples of pigs from the two stages (gestational days 50 and 95). The analysis revealed that the changes in H3K4me3 and/or H3K27ac levels were significantly associated with the changes in the expression levels of 37 DElncRNAs. Furthermore, several H3K4me3/H3K27ac-lncRNAs were characterized to be significantly correlated with genes functionally related to placental development. Thus, this study provides new insights into understanding the mechanisms for the placental development of pigs.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Xueyi Dong ◽  
Luyi Tian ◽  
Quentin Gouil ◽  
Hasaru Kariyawasam ◽  
Shian Su ◽  
...  

Abstract Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequence error and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.

