Chromosomal scale length variation of germline DNA can predict individual cancer risk

Mapping Intimacies · 10.1101/303339 · 2018

Author(s): Chris Toh, James P. Brody

Keyword(s): Cancer Risk; Genetic Risk; Germ Line; The Cancer Genome Atlas; Risk Scores; Length Variation; UK Biobank; Genetic Risk Scores; The UK; Scale Length

Abstract: Inherited factors are thought to be responsible for a substantial fraction of many different forms of cancer. However, individual cancer risk cannot currently be well quantified by analyzing germline DNA. Most analyses of germline DNA focus on the additive effects of single nucleotide polymorphisms (SNPs). Here we show that chromosomal-scale length variation of germline DNA can be used to predict whether a person will develop cancer. In two independent datasets, The Cancer Genome Atlas (TCGA) project and the UK Biobank, we could classify whether or not a patient had a certain cancer based solely on chromosomal-scale length variation. In the TCGA data, we found that all 32 different types of cancer could be predicted better than chance using chromosomal-scale length variation data. We found a model that could predict ovarian cancer in women with an area under the receiver operating characteristic curve (AUC) of 0.89. In the UK Biobank data, we could predict breast cancer in women with an AUC of 0.83. This method complements genetic risk scores derived from SNPs and could be used to develop genetic risk scores for other conditions known to have a substantial genetic component.
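The AUC values above summarize how well a score separates people with and without a diagnosis. Equivalently, the AUC is the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counting one half (the Mann-Whitney formulation). A minimal Python sketch, assuming a classifier has already assigned one numeric score per person; the scores are hypothetical, not study data:

```python
def auc(case_scores, control_scores):
    """AUC as P(random case scores higher than random control), ties count 0.5.

    Mann-Whitney formulation: compares every case-control pair.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores, for illustration only (not study data).
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]
print(auc(cases, controls))  # 11 of 12 case-control pairs correctly ordered, so 11/12
```

The double loop is quadratic in cohort size; a rank-based formula gives the same value faster, but the pairwise version makes the definition explicit.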

Genetic risk score for ovarian cancer based on chromosomal-scale length variation

BioData Mining · 10.1186/s13040-021-00253-y · 2021 · Vol 14 (1)

Author(s): Christopher Toh, James P. Brody

Keyword(s): Ovarian Cancer; Confidence Interval; Risk Score; Genetic Risk; Germ Line; Genetic Risk Score; Risk Scores; Length Variation; Genetic Risk Scores; Scale Length

Abstract: Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher-risk women can be treated or screened for ovarian cancer, which should reduce ovarian cancer death rates. However, current ovarian cancer genetic risk scores do not perform well. We developed a genetic risk score based on variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. We synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, each representing the length of one autosome in her germ line DNA. We used a gradient boosting machine to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x–450x) compared to the lowest 20%. The genetic risk score we developed had an area under the receiver operating characteristic curve (AUC) of 0.88 (95% confidence interval 0.86–0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.
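The 160x figure compares the fraction of cases among the highest-scoring 20% of women with the fraction among the lowest-scoring 20%. The sketch below shows that comparison under simple assumptions (groups formed by sorting scores and slicing; 0/1 case labels). It is an illustration, not the authors' code, and the data are hypothetical:

```python
def top_vs_bottom_risk_ratio(scores, labels, fraction=0.2):
    """Case rate in the top `fraction` of scores divided by the rate in the bottom `fraction`.

    labels are 1 for a case and 0 for a control.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])  # lowest score first
    n = int(len(ranked) * fraction)
    if n == 0:
        raise ValueError("too few people for the requested fraction")
    bottom_rate = sum(label for _, label in ranked[:n]) / n
    top_rate = sum(label for _, label in ranked[-n:]) / n
    if bottom_rate == 0:
        return float("inf")  # no cases in the bottom group, so the ratio is unbounded
    return top_rate / bottom_rate


# Hypothetical scores and diagnoses for 10 women (1 = ovarian cancer case).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [1, 0, 0, 0, 1, 0, 0, 1, 1, 1]
print(top_vs_bottom_risk_ratio(scores, labels))  # top 2 are both cases, bottom 2 half: 2.0
```

When the lowest group contains only a handful of cases, small changes in that count move the ratio a lot, which is one reason such ratios tend to carry wide confidence intervals.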

Genetic Risk Score for Ovarian Cancer Based on Chromosomal-scale Length Variation

10.21203/rs.3.rs-48991/v1 · 2020

Author(s): Chris Toh, James Brody

Keyword(s): Ovarian Cancer; Confidence Interval; Risk Score; Genetic Risk; Germ Line; Genetic Risk Score; Risk Scores; Length Variation; Genetic Risk Scores; Scale Length

Abstract: Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher-risk women can be treated or screened for ovarian cancer, which should reduce overall death rates due to ovarian cancer. However, current ovarian cancer genetic risk scores, based on SNPs, do not perform well. We developed a genetic risk score based on structural variation, quantified by variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. From this dataset, we synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, each representing the length of one autosome in her germ line DNA. We used a gradient boosting machine, a machine learning algorithm, to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer in this dataset. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x–450x) compared to the lowest 20%. The genetic risk score we developed had an area under the receiver operating characteristic curve (AUC) of 0.88 (estimated 95% confidence interval 0.86–0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.

Genetic risk score for ovarian cancer based on chromosomal-scale length variation

10.1101/2020.07.18.20156976 · 2020

Author(s): Chris Toh, James P Brody

Keyword(s): Ovarian Cancer; Confidence Interval; Risk Score; Genetic Risk; Germ Line; Genetic Risk Score; Risk Scores; Length Variation; Genetic Risk Scores; Scale Length

Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher-risk women can be treated or screened for ovarian cancer, which should reduce overall death rates due to ovarian cancer. However, current ovarian cancer genetic risk scores, based on SNPs, do not perform well. We developed a genetic risk score based on structural variation, quantified by variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. From this dataset, we synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, each representing the length of one autosome in her germ line DNA. We used a gradient boosting machine, a machine learning algorithm, to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer in this dataset. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x–450x) compared to the lowest 20%. The genetic risk score we developed had an area under the receiver operating characteristic curve (AUC) of 0.88 (estimated 95% confidence interval 0.86–0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.

Genetic Risk Score for Predicting Schizophrenia Using Human Chromosomal-Scale Length Variation

10.21203/rs.3.rs-268559/v2 · 2021

Author(s): Christopher Toh, James P. Brody

Keyword(s): Risk Score; Genetic Risk; Germ Line; Genetic Risk Score; Single Gene; Characteristic Curve; Machine Learning Algorithms; Length Variation; UK Biobank; Scale Length

Abstract: Studies indicate that schizophrenia has a genetic component; however, it cannot be attributed to a single gene. We aimed to determine how well one could predict, from germ line genetics, whether a person will develop schizophrenia. We compared 1129 people from the UK Biobank dataset who had a diagnosis of schizophrenia to an equal number of age-matched people drawn from the general UK Biobank population. For each person, we constructed a profile consisting of a sequence of numbers, each characterizing the length of a segment of one of their chromosomes. We tested several machine learning algorithms to determine which was most effective in predicting schizophrenia, and whether breaking the chromosomes into smaller chunks improved prediction. We found that the stacked ensemble performed best, with an area under the receiver operating characteristic curve (AUC) of 0.545 (95% CI 0.539–0.550). The AUC increased when the chromosomes were broken into smaller chunks for analysis. Using SHAP values, we identified the X chromosome as the most important contributor to the predictive model. We conclude that germ line chromosomal-scale length variation data could provide an effective genetic risk score for schizophrenia that performs better than chance.
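The abstract reports the AUC with a 95% confidence interval but does not say how the interval was obtained. One common approach is a percentile bootstrap that resamples cases and controls with replacement and recomputes the AUC each time. A minimal sketch of that approach (an assumption, not necessarily the authors' method), repeating the pairwise AUC definition so the block stands alone:

```python
import random


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in case_scores
        for k in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


def bootstrap_auc_ci(case_scores, control_scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval.

    Cases and controls are resampled separately so each replicate keeps
    the original group sizes.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    stats = []
    for _ in range(n_boot):
        cases = rng.choices(case_scores, k=len(case_scores))
        controls = rng.choices(control_scores, k=len(control_scores))
        stats.append(auc(cases, controls))
    stats.sort()
    lo = stats[int(n_boot * alpha / 2)]
    hi = stats[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return auc(case_scores, control_scores), lo, hi


# Hypothetical scores, for illustration only.
print(bootstrap_auc_ci([0.9, 0.8, 0.4, 0.6], [0.7, 0.3, 0.2, 0.1, 0.5], n_boot=500))
```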

Genetic Risk Score for Predicting Schizophrenia Using Human Chromosomal-Scale Length Variation

10.21203/rs.3.rs-268559/v1 · 2021

Author(s): Christopher Toh, James P. Brody

Keyword(s): Machine Learning; X Chromosome; Risk Score; Genetic Risk; Germ Line; Genetic Risk Score; Machine Learning Algorithms; Length Variation; UK Biobank; Scale Length

Abstract: Introduction. Schizophrenia is a neurological disorder that often manifests as a combination of psychotic symptoms such as delusions, hallucinations, and disorganized cognitive function. Several lines of evidence indicate that schizophrenia has a genetic component; however, it cannot be attributed to a single gene. We set out to determine how well one could predict, from germ line DNA, whether a person will develop schizophrenia. Methods. We compared 1129 people from the UK Biobank dataset who had a diagnosis of schizophrenia to an equal number of age-matched people drawn from the general UK Biobank population. For each person, we constructed a profile consisting of a sequence of numbers. Each number characterized the length of a segment of one of their chromosomes. We tested several machine learning algorithms using the h2o.ai framework to determine which was most effective in predicting schizophrenia. We also tested whether breaking the chromosomes into smaller chunks improved prediction. We used SHAP values to better understand which features were important to the predictive model. Results. We found that the stacked ensemble, a combination of four different machine learning algorithms, performed best, with an area under the receiver operating characteristic curve (AUC) of 0.583 (95% CI 0.581–0.586). The AUC increased when the chromosomes were broken into smaller chunks for analysis. Using SHAP values, we identified the X chromosome as the most important contributor to the predictive model. Conclusion. We conclude that germ line chromosomal-scale length variation data can provide an effective genetic risk score for schizophrenia. Length variations of several regions of the X chromosome are the greatest contributing factor.
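Both schizophrenia abstracts describe splitting chromosomes into smaller chunks, each summarized by one number. How each segment's length is measured is not described here, so the sketch below covers only the bookkeeping: dividing a chromosome's coordinate range into equal-width segments (an assumption) and naming one feature per segment. The feature names are hypothetical. With four segments per autosome this reproduces the 88-number layout (one quarter of each of 22 autosomes) described in the COVID-19 study in this list.

```python
def segment_boundaries(chrom_length, n_chunks):
    """Split positions [0, chrom_length) into n_chunks contiguous, near-equal segments.

    Equal-width segments are an assumption made for illustration.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    edges = [chrom_length * i // n_chunks for i in range(n_chunks + 1)]
    return list(zip(edges[:-1], edges[1:]))


def feature_names(chromosomes, n_chunks):
    """Hypothetical feature labels: one per (chromosome, segment) pair."""
    return [f"{c}_seg{j + 1}" for c in chromosomes for j in range(n_chunks)]


autosomes = [f"chr{i}" for i in range(1, 23)]
print(len(feature_names(autosomes, 1)))  # 22: one number per whole autosome
print(len(feature_names(autosomes, 4)))  # 88: one number per quarter
print(segment_boundaries(100, 4))  # [(0, 25), (25, 50), (50, 75), (75, 100)]
```

Adding a sex chromosome such as chrX to the list simply appends its segments to the profile.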

Associations of coffee genetic risk scores with coffee, tea and other beverages in the UK Biobank

10.1101/096214 · 2016 · Cited By ~ 1

Author(s): Amy E. Taylor, Marcus R. Munafò

Keyword(s): Genetic Risk; Coffee Consumption; Risk Scores; Tea Consumption; UK Biobank; Beverage Consumption; Dietary Recall; Genetic Risk Scores; Decaffeinated Coffee; The UK

Abstract: Background. Genetic variants that determine the amount of coffee consumed have been identified in genome-wide association studies (GWAS) of coffee consumption; these may help further our understanding of the effects of coffee on health outcomes. However, there is limited information about how these variants relate to caffeinated beverage consumption more generally. Aims. To improve phenotype definition for coffee consumption-related genetic risk scores by testing their association with coffee, tea and other beverages. Methods. We tested the associations of genetic risk scores for coffee consumption with beverage consumption in 114,316 individuals of European ancestry from the UK Biobank. Drinks were self-reported in a baseline questionnaire and, in a subset, in detailed 24-hour dietary recall questionnaires. Results. Genetic risk scores including two and eight single nucleotide polymorphisms (SNPs) explained up to 0.39%, 0.19% and 0.77% of the variance in coffee, tea and combined coffee and tea consumption, respectively. A one standard deviation increase in the 8-SNP genetic risk score was associated with a 0.13 cup per day (95% CI: 0.12, 0.14), 0.12 cup per day (95% CI: 0.11, 0.14) and 0.25 cup per day (95% CI: 0.24, 0.27) increase in coffee, tea and combined tea and coffee consumption, respectively. Genetic risk scores also demonstrated positive associations with both caffeinated and decaffeinated coffee and tea consumption. In 48,692 individuals with dietary recall data, the genetic risk scores were positively associated with consumption of coffee and tea (apart from herbal teas), but did not show clear evidence of positive associations with other beverages. However, there was evidence that the genetic risk scores were associated with lower daily water consumption and lower overall drink consumption. Conclusions. Genetic risk scores created from variants identified in coffee consumption GWAS associate more broadly with caffeinated beverage consumption and also with decaffeinated coffee and tea consumption.
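An SNP-based genetic risk score of this kind is conventionally a weighted sum of effect-allele counts (0, 1 or 2 per SNP), with weights taken from GWAS effect sizes. A "per standard deviation" association is then the regression slope after standardizing the score, and "variance explained" is the R^2 of that regression. A minimal sketch under those conventions, using plain least squares without the covariate adjustment a real analysis would include; all weights, genotypes and intake values are hypothetical:

```python
from statistics import mean, pstdev


def grs(genotypes, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2 per SNP) for one person."""
    if len(genotypes) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(g * w for g, w in zip(genotypes, weights))


def standardize(values):
    """Convert scores to z-scores (mean 0, population SD 1)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("scores have zero variance")
    return [(v - m) / s for v in values]


def ols_slope_and_r2(x, y):
    """Least-squares slope of y on x and the share of variance in y explained (R^2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy * sxy / (sxx * syy)


# Hypothetical data: 2 SNPs, 4 people, self-reported cups of coffee per day.
weights = [0.5, 0.25]
genotypes = [[0, 1], [1, 1], [2, 0], [2, 2]]
cups = [1.0, 2.0, 2.0, 3.0]
scores = [grs(g, weights) for g in genotypes]
slope_per_sd, r2 = ols_slope_and_r2(standardize(scores), cups)
print(f"{slope_per_sd:.3f} cups/day per SD of the score, R^2 = {r2:.3f}")
```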

Polygenic prediction of breast cancer: comparison of genetic predictors and implications for screening

10.1101/448597 · 2018

Author(s): Kristi Läll, Maarja Lepamets, Marili Palover, Tõnu Esko, Andres Metspalu, ...

Keyword(s): Breast Cancer; Genetic Risk; Odds Ratio; Population Based; Risk Scores; Full Potential; Genome Wide Association Studies; UK Biobank; Genetic Risk Scores; The UK

Abstract: Background. Published genetic risk scores (GRSs) for breast cancer (BC) have so far been based on a relatively small number of markers and do not necessarily use the full potential of large-scale genome-wide association studies. This study aims to identify an efficient polygenic predictor for BC based on the best available evidence and to assess its potential for personalized risk prediction and screening strategies. Methods. Four different genetic risk scores (two already published and two newly developed) and their combinations (metaGRS) are compared in subsets of two population-based biobank cohorts: the UK Biobank (UKBB; 3157 BC cases, 43,827 controls) and the Estonian Biobank (EstBB; 317 prevalent and 308 incident BC cases among 32,557 women). In addition, correlations between the different genetic risk scores and their associations with BC risk factors are studied in both cohorts. Results. The metaGRS that combines two genetic risk scores (metaGRS2, based on 75 and 898 single nucleotide polymorphisms, respectively) has the strongest association with prevalent BC status in both cohorts. A one standard deviation difference in metaGRS2 corresponds to an odds ratio of 1.6 (95% CI 1.54 to 1.66, p = 9.7 × 10^-135) in the UK Biobank, and accounting for family history marginally attenuates the effect (odds ratio 1.58, 95% CI 1.53 to 1.64, p = 9.1 × 10^-129). In the EstBB cohort, the hazard ratio of incident BC for women in the top 5% of metaGRS2 compared to women in the lowest 50% is 4.2 (95% CI 2.8 to 6.2, p = 8.1 × 10^-13). The different GRSs are only moderately correlated with each other and are associated with different known predictors of BC. The classification of genetic risk for the same individual may vary considerably depending on the chosen GRS. Conclusions. We have shown that metaGRS2, which combines the effects of more than 900 SNPs, provides the best predictive ability for breast cancer in two different population-based cohorts. The strength of the effect of metaGRS2 indicates that the GRS could potentially be used to develop more efficient strategies for breast cancer screening in genotyped women.
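The abstract does not state how the two scores are combined into metaGRS2. A simple option is to standardize each score and add the z-scores; the sketch below assumes that unweighted sum (a real metaGRS may weight its components differently) and then labels each woman as in the top 5%, the lowest 50%, or neither, mirroring the groups in the hazard-ratio comparison. The function names and scores are hypothetical:

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list to mean 0 and population SD 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("scores have zero variance")
    return [(v - m) / s for v in values]


def combine_scores(*score_lists):
    """Assumed metaGRS: unweighted sum of each person's z-scores across GRSs."""
    standardized = [zscores(scores) for scores in score_lists]
    return [sum(parts) for parts in zip(*standardized)]


def risk_groups(combined, top=0.05, bottom=0.50):
    """Label each person 'top', 'bottom' or 'middle' by rank of the combined score."""
    n = len(combined)
    order = sorted(range(n), key=lambda i: combined[i])  # indices, lowest score first
    n_top, n_bottom = int(n * top), int(n * bottom)
    labels = ["middle"] * n
    for i in order[:n_bottom]:
        labels[i] = "bottom"
    for i in order[n - n_top:]:
        labels[i] = "top"
    return labels


# Hypothetical scores for 20 women from two different GRSs.
grs_a = [float(i) for i in range(20)]
grs_b = [float((7 * i) % 20) for i in range(20)]
labels = risk_groups(combine_scores(grs_a, grs_b))
print(labels.count("top"), labels.count("bottom"))  # 1 10
```

A hazard ratio would then be estimated by comparing incident cases between the "top" and "bottom" groups over follow-up, which needs survival data not modeled here.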

Human chromosomal-scale length variation and severity of COVID-19 infection using the UK Biobank dataset

10.1101/2020.07.06.20147637 · 2020

Author(s): Chris Toh, James P. Brody

Keyword(s): Machine Learning; Germ Line; Learning Algorithm; Genetic Risk Score; Severe Reaction; Length Variation; UK Biobank; Machine Learning Classification; The UK; Scale Length

Abstract: Introduction. The course of COVID-19 varies among patients from asymptomatic to severe (acute respiratory distress, cytokine storms, and death). The basis for this range in symptoms is unknown. One possibility is that genetic variation is responsible for the highly variable response to infection. We evaluated how well a genetic risk score based on chromosome-scale length variation and machine learning classification algorithms could predict the severity of response to SARS-CoV-2 infection. Methods. We compared 981 patients from the UK Biobank dataset who had a severe reaction to SARS-CoV-2 infection before 27 April 2020 to a similar number of age-matched patients drawn from the general UK Biobank population. For each patient, we built a profile of 88 numbers characterizing the chromosome-scale length variability of their germ line DNA. Each number represented one quarter of one of the 22 autosomes. We used the machine learning algorithm XGBoost to build a classifier that could predict whether a person would have a severe reaction to COVID-19 based only on their 88-number profile. Results. We found that the XGBoost classifier could differentiate between the two classes at a significant level: p = 2 · 10 as measured against a randomized control, and p = 3 · 10 as measured against the expected value of a random guessing algorithm (AUC = 0.5). However, the AUC of the classifier was only 0.51, too low for a clinically useful test. Conclusion
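The abstract reports a p-value "against a randomized control" without describing the procedure. One standard way to obtain such a value is a label-permutation test: shuffle the case/control labels, recompute the AUC, and count how often a shuffled AUC reaches the observed one. A minimal sketch of that technique, offered as an assumption rather than the authors' procedure, with hypothetical data:

```python
import random


def auc_from_labels(scores, labels):
    """Pairwise AUC: P(case score > control score), ties count 0.5; labels are 0/1."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls
    )
    return wins / (len(cases) * len(controls))


def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Observed AUC and a one-sided permutation p-value under shuffled labels."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = auc_from_labels(scores, labels)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between score and outcome
        if auc_from_labels(scores, shuffled) >= observed:
            at_least += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical classifier scores and outcomes (1 = severe reaction).
scores = [0.2, 0.4, 0.35, 0.8, 0.7, 0.1, 0.9, 0.3]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
print(permutation_p_value(scores, labels, n_perm=500))
```

With large samples, even an AUC barely above 0.5 can give a very small p-value, which is consistent with the abstract reporting statistical significance alongside an AUC too low for clinical use.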

Abstract PR-09: Genetic risk scores for breast cancer based on machine learning analysis of chromosomal-scale length variation

10.1158/1557-3265.adi21-pr-09 · 2021

Author(s): Charmeine Ko, Christopher Toh, James P. Brody

Keyword(s): Breast Cancer; Machine Learning; Genetic Risk; Risk Scores; Length Variation; Genetic Risk Scores; Learning Analysis; Scale Length

Associations of coffee genetic risk scores with consumption of coffee, tea and other beverages in the UK Biobank

Addiction · 10.1111/add.13975 · 2017 · Vol 113 (1) · pp. 148-157 · Cited By ~ 10

Author(s): Amy E. Taylor, George Davey Smith, Marcus R. Munafò

Keyword(s): Genetic Risk; Risk Scores; UK Biobank; Genetic Risk Scores; The UK
