Unique k-mer sequences for validating cancer-related substitution, insertion and deletion mutations

HoJoon Lee; Ahmed Shuaibi; John M Bell; Dmitri S Pavlichin; Hanlee P Ji

doi:10.1093/narcan/zcaa034

NAR Cancer ◽

10.1093/narcan/zcaa034 ◽

2020 ◽

Vol 2 (4) ◽

Author(s):

HoJoon Lee ◽

Ahmed Shuaibi ◽

John M Bell ◽

Dmitri S Pavlichin ◽

Hanlee P Ji

Keyword(s):

Genome Sequencing ◽

Somatic Mutations ◽

Variant Calling ◽

Cancer Genome ◽

Cancer Genes ◽

Sequencing Data ◽

Deletion Mutations ◽

Insertion And Deletion ◽

Cancer Genome Sequencing ◽

Significant Difference

Abstract Cancer genome sequencing has led to important discoveries such as the identification of cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, oftentimes there is no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate the given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on a statistically significant difference in the frequency of k-mers with and without the mutation from matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations, including insertions and deletions, across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.
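The idea in this abstract, comparing counts of k-mers with and without a mutation between matched tumor and normal reads, can be illustrated with a minimal sketch. This is not the KmerVC implementation: the function names, the toy inputs, and the choice of a one-sided Fisher's exact test are all assumptions made for illustration, since the abstract does not say which statistical test the package uses. The sketch also ignores reverse complements, sequencing errors, and k-mer uniqueness across the genome.

```python
from collections import Counter
from math import comb


def kmers(seq, k):
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def allele_specific_kmers(ref_seq, pos, ref_allele, alt_allele, k):
    """Return (ref_only, alt_only): k-mers seen only without / only with the mutation.

    pos is the 0-based start of ref_allele in ref_seq. Substitutions,
    insertions and deletions are all written as ref_allele -> alt_allele.
    """
    if ref_seq[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("ref_allele does not match ref_seq at pos")
    alt_seq = ref_seq[:pos] + alt_allele + ref_seq[pos + len(ref_allele):]
    ref_set, alt_set = set(kmers(ref_seq, k)), set(kmers(alt_seq, k))
    return ref_set - alt_set, alt_set - ref_set


def count_support(reads, ref_only, alt_only, k):
    """Total occurrences of reference-only and mutation-only k-mers in reads."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    ref_n = sum(counts[km] for km in ref_only)
    alt_n = sum(counts[km] for km in alt_only)
    return ref_n, alt_n


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value, P(X >= a), for the table [[a, b], [c, d]].

    Assumed here as a stand-in test: a = tumor mutant count, b = tumor
    reference count, c = normal mutant count, d = normal reference count.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(row2, col1 - x) for x in range(a, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Toy reference window and a G->T substitution at 0-based position 12.
    ref = "ACGTTGCATGCCGTAAGCTTAGGCTA"
    mutant = ref[:12] + "T" + ref[13:]
    ref_only, alt_only = allele_specific_kmers(ref, 12, "G", "T", k=5)
    tumor_ref, tumor_alt = count_support([mutant] * 10 + [ref] * 10, ref_only, alt_only, 5)
    normal_ref, normal_alt = count_support([ref] * 20, ref_only, alt_only, 5)
    print(fisher_one_sided(tumor_alt, tumor_ref, normal_alt, normal_ref))
```

Because overlapping k-mers from one read are not independent observations, a p-value computed this way is optimistic; a real validator would need to account for that, for example by counting supporting reads rather than k-mer occurrences.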


Unique K-mer sequences for validating cancer-related substitution, insertion and deletion mutations

10.1101/2020.06.20.163113 ◽

2020 ◽

Author(s):

HoJoon Lee ◽

Ahmed Shuaibi ◽

John M. Bell ◽

Dmitri S. Pavlichin ◽

Hanlee P. Ji

Keyword(s):

Genome Sequencing ◽

Somatic Mutations ◽

Variant Calling ◽

Cancer Genome ◽

Sequencing Data ◽

Deletion Mutations ◽

Insertion And Deletion ◽

Cancer Genome Sequencing ◽

Significant Difference ◽

Model Tuning

Abstract Cancer genome sequencing has led to important discoveries such as identifying cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, oftentimes there is no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate the given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on a statistically significant difference in the frequency of k-mers with and without the mutation from matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations, including insertions and deletions, across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.


Comprehensive fundamental somatic variant calling and quality management strategies for human cancer genomes

Briefings in Bioinformatics ◽

10.1093/bib/bbaa083 ◽

2020 ◽

Author(s):

Xiaoyu He ◽

Shanyu Chen ◽

Ruilin Li ◽

Xinyin Han ◽

Zhipeng He ◽

...

Keyword(s):

Genome Sequencing ◽

High Throughput Sequencing ◽

Cancer Genomics ◽

Sequence Data ◽

Human Cancer ◽

Management Strategies ◽

Variant Calling ◽

Cancer Genome ◽

Sequencing Data ◽

Cancer Genome Sequencing

Abstract Next-generation sequencing (NGS) technology has revolutionised human cancer research, particularly via detection of genomic variants with its ultra-high-throughput sequencing and increasing affordability. However, the inundation of rich cancer genomics data has resulted in significant challenges in its exploration and translation into biological insights. One of the difficulties in cancer genome sequencing is software selection. Currently, multiple tools are widely used to process NGS data in four stages: raw sequence data pre-processing and quality control (QC), sequence alignment, variant calling and annotation and visualisation. However, the differences between these NGS tools, including their installation, merits, drawbacks and application, have not been fully appreciated. Therefore, a systematic review of the functionality and performance of NGS tools is required to provide cancer researchers with guidance on software and strategy selection. Another challenge is the multidimensional QC of sequencing data because QC can not only report varied sequence data characteristics but also reveal deviations in diverse features and is essential for a meaningful and successful study. However, monitoring of QC metrics in specific steps including alignment and variant calling is neglected in certain pipelines such as the ‘Best Practices Workflows’ in GATK. In this review, we investigated the most widely used software for the fundamental analysis and QC of cancer genome sequencing data and provided instructions for selecting the most appropriate software and pipelines to ensure precise and efficient conclusions. We further discussed the prospects and new research directions for cancer genomics.


FaSD-somatic: a fast and accurate somatic SNV detection algorithm for cancer genome sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btu338 ◽

2014 ◽

Vol 30 (17) ◽

pp. 2498-2500 ◽

Cited By ~ 12

Author(s):

Weixin Wang ◽

Panwen Wang ◽

Feng Xu ◽

Ruibang Luo ◽

Maria Pik Wong ◽

...

Keyword(s):

Genome Sequencing ◽

Detection Algorithm ◽

Cancer Genome ◽

Sequencing Data ◽

Cancer Genome Sequencing


A heuristic platform for clinical interpretation of cancer genome sequencing data.

Journal of Clinical Oncology ◽

10.1200/jco.2012.30.15_suppl.10502 ◽

2012 ◽

Vol 30 (15_suppl) ◽

pp. 10502-10502

Author(s):

Eliezer Mendel Van Allen ◽

Nikhil Wagle ◽

Gregory Kryukov ◽

Alexis Ramos ◽

Gad Getz ◽

...

Keyword(s):

Prostate Cancer ◽

Genome Sequencing ◽

Heuristic Algorithm ◽

Genetic Alterations ◽

Cancer Genome ◽

Clinical Samples ◽

Whole Genome ◽

Cancer Genes ◽

Sequencing Data ◽

Clinical Interpretation

10502 Background: The ability to identify and effectively sort the full spectrum of biologically and therapeutically relevant genetic alterations identified by massively parallel sequencing may improve cancer care. A major challenge involves rapid and rational categorization of data-intensive output, including somatic mutations, insertions/deletions, copy number alterations, and rearrangements into ranked categories for clinician review. Methods: A database of clinically actionable alterations was created, consisting of over 100 annotated genes known to undergo somatic genomic alterations in cancer that may impact clinical decision-making. A heuristic algorithm was developed, which selectively identifies somatic alterations based on the clinically actionable alterations database. Remaining variants are sorted based on additional heuristics, including high priority alterations based on presence in the Cancer Gene Census, biologically significant cancer genes based on presence in COSMIC or MSigDB, and low priority alterations in the same gene family as biologically significant cancer genes. The heuristic algorithm was applied to whole exome sequencing data of clinical samples and whole genome sequencing data from a cohort of prostate cancer samples processed using established Broad Institute pipelines. Results: Application of the heuristic algorithm to the prostate cancer whole genome rearrangement data identified 172 (out of 5978) rearrangements involving actionable genes (averaging 2-3 events per tumor). Furthermore, two clinical samples processed prospectively were analyzed, yielding three potentially actionable alterations for clinical review. Conclusions: The heuristic model for clinical interpretation of next generation sequencing data may facilitate rapid analysis of tumor genomic information for clinician review by identifying and prioritizing alterations that can directly impact care. Our platform can also be applied to research data to prospectively explore clinically relevant findings from existing cohorts. Future analytical approaches using heuristic or probabilistic algorithms should underpin a robust prospective assessment of clinical cancer genome data.
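The tiered sorting described in the Methods of this abstract can be sketched as a simple ranking function. This is an illustrative reading of the abstract only, not the authors' platform: the `GeneSets` container, the tier numbering, and the placeholder gene names are hypothetical, and the real databases (the actionable list, Cancer Gene Census, COSMIC, MSigDB) are represented here by empty-handed stand-in sets that a user would have to populate.

```python
from dataclasses import dataclass

# Tier labels in priority order; the names are assumptions based on the abstract.
TIERS = (
    "actionable",                # in the clinically actionable database
    "high_priority",             # present in the Cancer Gene Census
    "biologically_significant",  # present in COSMIC or MSigDB
    "low_priority",              # same gene family as a biologically significant gene
    "unranked",                  # none of the above
)


@dataclass(frozen=True)
class GeneSets:
    """Hypothetical container for the gene lists the heuristic consults."""
    actionable: frozenset
    cancer_gene_census: frozenset
    cosmic_or_msigdb: frozenset
    gene_family: dict  # gene name -> family name


def tier_for(gene, sets):
    """Return the index into TIERS of the first heuristic the gene satisfies."""
    if gene in sets.actionable:
        return 0
    if gene in sets.cancer_gene_census:
        return 1
    if gene in sets.cosmic_or_msigdb:
        return 2
    family = sets.gene_family.get(gene)
    if family is not None and any(
        sets.gene_family.get(other) == family for other in sets.cosmic_or_msigdb
    ):
        return 3
    return 4


def rank_alterations(alterations, sets):
    """Stable sort of alteration records (dicts with a 'gene' key) by tier."""
    return sorted(alterations, key=lambda alt: tier_for(alt["gene"], sets))


if __name__ == "__main__":
    # Placeholder gene names; no claim is made about any real gene's membership.
    sets = GeneSets(
        actionable=frozenset({"GENE_A"}),
        cancer_gene_census=frozenset({"GENE_B"}),
        cosmic_or_msigdb=frozenset({"GENE_C"}),
        gene_family={"GENE_C": "FAM1", "GENE_D": "FAM1"},
    )
    calls = [{"gene": g} for g in ("GENE_E", "GENE_D", "GENE_C", "GENE_B", "GENE_A")]
    for alt in rank_alterations(calls, sets):
        print(alt["gene"], TIERS[tier_for(alt["gene"], sets)])
```

Using a stable sort keeps the original order of alterations within a tier, so any upstream ordering (for example by variant quality) is preserved for clinician review.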


Genomon ITDetector: a tool for somatic internal tandem duplication detection from cancer genome sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btu593 ◽

2014 ◽

Vol 31 (1) ◽

pp. 116-118 ◽

Cited By ~ 14

Author(s):

Kenichi Chiba ◽

Yuichi Shiraishi ◽

Yasunobu Nagata ◽

Kenichi Yoshida ◽

Seiya Imoto ◽

...

Keyword(s):

Genome Sequencing ◽

Tandem Duplication ◽

Cancer Genome ◽

Sequencing Data ◽

Internal Tandem Duplication ◽

Cancer Genome Sequencing ◽

Duplication Detection


Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers

Genome Medicine ◽

10.1186/gm495 ◽

2013 ◽

Vol 5 (10) ◽

pp. 91 ◽

Cited By ~ 113

Author(s):

Qingguo Wang ◽

Peilin Jia ◽

Fei Li ◽

Haiquan Chen ◽

Hongbin Ji ◽

...

Keyword(s):

Genome Sequencing ◽

Point Mutations ◽

Cancer Genome ◽

Sequencing Data ◽

Cancer Genome Sequencing


Looking beyond drivers and passengers in cancer genome sequencing data

Annals of Oncology ◽

10.1093/annonc/mdw677 ◽

2017 ◽

Vol 28 (5) ◽

pp. 938-945 ◽

Cited By ~ 10

Author(s):

S. De ◽

S. Ganesan

Keyword(s):

Genome Sequencing ◽

Cancer Genome ◽

Sequencing Data ◽

Cancer Genome Sequencing


SomatiCA: Identifying, Characterizing and Quantifying Somatic Copy Number Aberrations from Cancer Genome Sequencing Data

PLoS ONE ◽

10.1371/journal.pone.0078143 ◽

2013 ◽

Vol 8 (11) ◽

pp. e78143 ◽

Cited By ~ 19

Author(s):

Mengjie Chen ◽

Murat Gunel ◽

Hongyu Zhao

Keyword(s):

Genome Sequencing ◽

Copy Number ◽

Cancer Genome ◽

Sequencing Data ◽

Copy Number Aberrations ◽

Cancer Genome Sequencing


Statistically identifying tumor suppressors and oncogenes from pan-cancer genome-sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btv430 ◽

2015 ◽

pp. btv430 ◽

Cited By ~ 14

Author(s):

Runjun D. Kumar ◽

Adam C. Searleman ◽

S. Joshua Swamidass ◽

Obi L. Griffith ◽

Ron Bose

Keyword(s):

Genome Sequencing ◽

Tumor Suppressors ◽

Cancer Genome ◽

Sequencing Data ◽

Cancer Genome Sequencing ◽

Pan Cancer


Prioritizing Potentially Druggable Mutations with dGene: An Annotation Tool for Cancer Genome Sequencing Data

PLoS ONE ◽

10.1371/journal.pone.0067980 ◽

2013 ◽

Vol 8 (6) ◽

pp. e67980 ◽

Cited By ~ 14

Author(s):

Runjun D. Kumar ◽

Li-Wei Chang ◽

Matthew J. Ellis ◽

Ron Bose

Keyword(s):

Genome Sequencing ◽

Cancer Genome ◽

Annotation Tool ◽

Sequencing Data ◽

Cancer Genome Sequencing
