Helmsman: fast and efficient generation of input matrices for mutation signature analysis
Abstract

Motivation: The spectrum of somatic single-nucleotide variants in cancer genomes often reflects the signatures of multiple distinct mutational processes, which can provide clinically actionable insights into cancer etiology. Existing software tools for identifying and evaluating these mutational signatures do not scale to analyze large datasets containing thousands of individuals or millions of variants.

Results: We introduce Helmsman, a program designed to rapidly generate mutation spectra matrices from arbitrarily large datasets. Helmsman is up to 300 times faster than existing methods and can provide more than a 100-fold reduction in memory usage, making mutation signature analysis tractable for any collection of single nucleotide variants, no matter how large.

Availability: Helmsman is freely available for download at https://github.com/carjed/helmsman under the MIT license. Detailed documentation can be found at https://www.jedidiahcarlson.com/docs/helmsman/, and an interactive Jupyter notebook containing a guided tutorial can be accessed at https://mybinder.org/v2/gh/carjed/helmsman/[email protected]

Supplementary information: Supplementary information for this article is available.
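For readers unfamiliar with the term, a mutation spectra matrix can be pictured as a table with one row per sample and one column per mutation subtype, where each cell counts the variants of that subtype. The sketch below is a minimal illustration and is not Helmsman's implementation or interface. It assumes the conventional encoding of 96 trinucleotide subtypes with the reference base collapsed to a pyrimidine, which the abstract does not specify. The names `subtype`, `SUBTYPES`, and `spectra_matrix`, and the `(sample, context, alt)` input format, are hypothetical.

```python
from collections import Counter
from itertools import product

# Illustrative sketch only; not Helmsman's code or API.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def subtype(context, alt):
    """Label a variant by its trinucleotide context and alternate base.

    `context` is the 3-mer centered on the mutated reference base.
    Purine references (A, G) are reverse-complemented so that every
    label has a C or T reference (assumed convention).
    """
    if context[1] in "AG":
        context, alt = revcomp(context), revcomp(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# The 96 column labels: 2 pyrimidine refs x 3 alts x 16 flanking pairs.
SUBTYPES = sorted(
    f"{five}[{ref}>{alt}]{three}"
    for ref in "CT"
    for alt in "ACGT" if alt != ref
    for five, three in product("ACGT", repeat=2)
)


def spectra_matrix(variants):
    """Build a samples x 96 count matrix from (sample, context, alt) tuples."""
    counts = {}
    for sample, context, alt in variants:
        counts.setdefault(sample, Counter())[subtype(context, alt)] += 1
    samples = sorted(counts)
    matrix = [[counts[s][k] for k in SUBTYPES] for s in samples]
    return samples, matrix


# Toy input: two samples, three variants.
variants = [("s1", "ACG", "T"), ("s1", "CGT", "A"), ("s2", "TTA", "C")]
samples, matrix = spectra_matrix(variants)
```

In this toy input, the two variants from `s1` fall into the same column, because `CGT` with alternate `A` is the reverse-complement form of `ACG` with alternate `T`.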