Unique k-mer sequences for validating cancer-related substitution, insertion and deletion mutations

NAR Cancer ◽  
2020 ◽  
Vol 2 (4) ◽  
Author(s):  
HoJoon Lee ◽  
Ahmed Shuaibi ◽  
John M Bell ◽  
Dmitri S Pavlichin ◽  
Hanlee P Ji

Abstract Cancer genome sequencing has led to important discoveries such as the identification of cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, there is often no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on a statistically significant difference in the frequency of k-mers with and without the mutation from matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations, including insertions and deletions, across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.
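
The idea in the abstract above, comparing counts of k-mers with and without a mutation between matched tumor and normal reads, can be illustrated with a minimal Python sketch. This is not KmerVC's implementation: the function names, the restriction to a single-base substitution, and the choice of a one-sided Fisher exact test are all assumptions made for illustration. The sketch also assumes that the k-mers spanning the site are unique in the genome, ignores reverse-complement reads, and treats k-mer counts as independent observations, which is a simplification.

```python
from math import comb


def kmers_spanning(seq, pos, k):
    """Return every k-mer of seq that covers the 0-based index pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return [seq[i:i + k] for i in range(start, end + 1)]


def count_kmers(reads, targets, k):
    """Count how many k-mer positions in the reads match any target k-mer."""
    targets = set(targets)
    total = 0
    for read in reads:
        for i in range(len(read) - k + 1):
            if read[i:i + k] in targets:
                total += 1
    return total


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Tests whether the first row (tumor) is enriched for the first
    column (mutant k-mers) relative to the second row (normal).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    hi = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, hi + 1))
    return tail / comb(n, col1)


def validate_substitution(ref, pos, alt, tumor_reads, normal_reads, k):
    """Hypothetical helper: p-value for a substitution at ref[pos] -> alt."""
    mut = ref[:pos] + alt + ref[pos + 1:]
    ref_kmers = kmers_spanning(ref, pos, k)   # k-mers without the mutation
    mut_kmers = kmers_spanning(mut, pos, k)   # k-mers with the mutation
    a = count_kmers(tumor_reads, mut_kmers, k)
    b = count_kmers(tumor_reads, ref_kmers, k)
    c = count_kmers(normal_reads, mut_kmers, k)
    d = count_kmers(normal_reads, ref_kmers, k)
    return fisher_greater(a, b, c, d)
```

A small p-value from `validate_substitution` would support the mutation being present in the tumor but not in the matched normal sample. Insertions and deletions would need a different construction of the mutant sequence, which this sketch does not attempt.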

2020 ◽  
Author(s):  
HoJoon Lee ◽  
Ahmed Shuaibi ◽  
John M. Bell ◽  
Dmitri S. Pavlichin ◽  
Hanlee P. Ji

ABSTRACT Cancer genome sequencing has led to important discoveries such as the identification of cancer genes. However, challenges remain in the analysis of cancer genome sequencing. One significant issue is that mutations identified by multiple variant callers are frequently discordant even when using the same genome sequencing data. For insertion and deletion mutations, there is often no agreement among different callers. Identifying somatic mutations involves read mapping and variant calling, a complicated process that uses many parameters and model tuning. To validate the identification of true mutations, we developed a method using k-mer sequences. First, we characterized the landscape of unique versus non-unique k-mers in the human genome. Second, we developed a software package, KmerVC, to validate given somatic mutations from sequencing data. Our program validates the occurrence of a mutation based on a statistically significant difference in the frequency of k-mers with and without the mutation from matched normal and tumor sequences. Third, we tested our method on both simulated and cancer genome sequencing data. Counting k-mers involving mutations effectively validated true positive mutations, including insertions and deletions, across different individual samples in a reproducible manner. Thus, we demonstrated a straightforward approach for rapidly validating mutations from cancer genome sequencing data.


Author(s):  
Xiaoyu He ◽  
Shanyu Chen ◽  
Ruilin Li ◽  
Xinyin Han ◽  
Zhipeng He ◽  
...  

Abstract Next-generation sequencing (NGS) technology has revolutionised human cancer research, particularly via detection of genomic variants with its ultra-high-throughput sequencing and increasing affordability. However, the inundation of rich cancer genomics data has resulted in significant challenges in its exploration and translation into biological insights. One of the difficulties in cancer genome sequencing is software selection. Currently, multiple tools are widely used to process NGS data in four stages: raw sequence data pre-processing and quality control (QC), sequence alignment, variant calling and annotation and visualisation. However, the differences between these NGS tools, including their installation, merits, drawbacks and application, have not been fully appreciated. Therefore, a systematic review of the functionality and performance of NGS tools is required to provide cancer researchers with guidance on software and strategy selection. Another challenge is the multidimensional QC of sequencing data because QC can not only report varied sequence data characteristics but also reveal deviations in diverse features and is essential for a meaningful and successful study. However, monitoring of QC metrics in specific steps including alignment and variant calling is neglected in certain pipelines such as the ‘Best Practices Workflows’ in GATK. In this review, we investigated the most widely used software for the fundamental analysis and QC of cancer genome sequencing data and provided instructions for selecting the most appropriate software and pipelines to ensure precise and efficient conclusions. We further discussed the prospects and new research directions for cancer genomics.


2014 ◽  
Vol 30 (17) ◽  
pp. 2498-2500 ◽  
Author(s):  
Weixin Wang ◽  
Panwen Wang ◽  
Feng Xu ◽  
Ruibang Luo ◽  
Maria Pik Wong ◽  
...  

2012 ◽  
Vol 30 (15_suppl) ◽  
pp. 10502-10502
Author(s):  
Eliezer Mendel Van Allen ◽  
Nikhil Wagle ◽  
Gregory Kryukov ◽  
Alexis Ramos ◽  
Gad Getz ◽  
...  

10502 Background: The ability to identify and effectively sort the full spectrum of biologically and therapeutically relevant genetic alterations identified by massively parallel sequencing may improve cancer care. A major challenge involves rapid and rational categorization of data-intensive output, including somatic mutations, insertions/deletions, copy number alterations, and rearrangements into ranked categories for clinician review. Methods: A database of clinically actionable alterations was created, consisting of over 100 annotated genes known to undergo somatic genomic alterations in cancer that may impact clinical decision-making. A heuristic algorithm was developed, which selectively identifies somatic alterations based on the clinically actionable alterations database. Remaining variants are sorted based on additional heuristics, including high priority alterations based on presence in the Cancer Gene Census, biologically significant cancer genes based on presence in COSMIC or MSigDB, and low priority alterations in the same gene family as biologically significant cancer genes. The heuristic algorithm was applied to whole exome sequencing data of clinical samples and whole genome sequencing data from a cohort of prostate cancer samples processed using established Broad Institute pipelines. Results: Application of the heuristic algorithm to the prostate cancer whole genome rearrangement data identified 172 (out of 5978) rearrangements involving actionable genes (averaging 2-3 events per tumor). Furthermore, two clinical samples processed prospectively were analyzed, yielding three potentially actionable alterations for clinical review. Conclusions: The heuristic model for clinical interpretation of next generation sequencing data may facilitate rapid analysis of tumor genomic information for clinician review by identifying and prioritizing alterations that can directly impact care. Our platform can also be applied to research data to prospectively explore clinically relevant findings from existing cohorts. Future analytical approaches using heuristic or probabilistic algorithms should underpin a robust prospective assessment of clinical cancer genome data.
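
The ranked categories described in the Methods of the abstract above can be sketched as a simple tiering function in Python. This is only an illustration of the ordering of the heuristics, not the authors' algorithm: the function name, the tier labels, and the gene-family lookup are hypothetical, and the caller is assumed to supply the gene sets (for example, drawn from an actionable-alterations database, the Cancer Gene Census, and COSMIC or MSigDB).

```python
def rank_alteration(gene, actionable, census, cosmic_or_msigdb, family_of):
    """Assign a gene to the first matching tier (hypothetical labels).

    actionable, census and cosmic_or_msigdb are sets of gene symbols;
    family_of maps a gene symbol to a gene-family name.
    """
    if gene in actionable:
        return "actionable"
    if gene in census:
        return "high priority"
    if gene in cosmic_or_msigdb:
        return "biologically significant"
    family = family_of.get(gene)
    # Low priority: shares a family with a biologically significant gene.
    if family is not None and any(
        family_of.get(other) == family for other in cosmic_or_msigdb
    ):
        return "low priority"
    return "unranked"


def sort_alterations(genes, actionable, census, cosmic_or_msigdb, family_of):
    """Order genes by tier, keeping input order within each tier."""
    order = ["actionable", "high priority", "biologically significant",
             "low priority", "unranked"]
    tiers = {g: rank_alteration(g, actionable, census, cosmic_or_msigdb, family_of)
             for g in genes}
    return sorted(genes, key=lambda g: order.index(tiers[g]))
```

Checking the tiers in a fixed order means a gene that appears in several sources is always reported under its highest-priority category.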


2014 ◽  
Vol 31 (1) ◽  
pp. 116-118 ◽  
Author(s):  
Kenichi Chiba ◽  
Yuichi Shiraishi ◽  
Yasunobu Nagata ◽  
Kenichi Yoshida ◽  
Seiya Imoto ◽  
...  

10.1186/gm495 ◽  
2013 ◽  
Vol 5 (10) ◽  
pp. 91 ◽  
Author(s):  
Qingguo Wang ◽  
Peilin Jia ◽  
Fei Li ◽  
Haiquan Chen ◽  
Hongbin Ji ◽  
...  

2015 ◽  
pp. btv430 ◽  
Author(s):  
Runjun D. Kumar ◽  
Adam C. Searleman ◽  
S. Joshua Swamidass ◽  
Obi L. Griffith ◽  
Ron Bose

PLoS ONE ◽  
2013 ◽  
Vol 8 (6) ◽  
pp. e67980 ◽  
Author(s):  
Runjun D. Kumar ◽  
Li-Wei Chang ◽  
Matthew J. Ellis ◽  
Ron Bose
