Chromosomal scale length variation of germline DNA can predict individual cancer risk

2018 ◽  
Author(s):  
Chris Toh ◽  
James P. Brody

Abstract Inherited factors are thought to be responsible for a substantial fraction of many different forms of cancer. However, individual cancer risk cannot currently be well quantified by analyzing germline DNA. Most analyses of germline DNA focus on the additive effects of single nucleotide polymorphisms (SNPs). Here we show that chromosomal-scale length variation of germline DNA can be used to predict whether a person will develop cancer. In two independent datasets, The Cancer Genome Atlas (TCGA) project and the UK Biobank, we could classify whether or not a patient had a certain cancer based solely on chromosomal-scale length variation. In the TCGA data, we found that all 32 different types of cancer could be predicted better than chance using chromosomal-scale length variation data. We found a model that could predict ovarian cancer in women with an area under the receiver operating characteristic curve (AUC) of 0.89. In the UK Biobank data, we could predict breast cancer in women with an AUC of 0.83. This method could be used to develop genetic risk scores for other conditions known to have a substantial genetic component, and it complements genetic risk scores derived from SNPs.
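The AUC figures quoted in this abstract can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes that quantity directly by pair counting. It is illustrative only: the function name `auc` and the toy scores are assumptions, not the authors' code or data.

```python
def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    labels: 1 for a case, 0 for a control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Invented toy data: three cases and three controls.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the 0.89 and 0.83 values are reported.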

2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Christopher Toh ◽  
James P. Brody

Abstract Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher risk women can be treated or screened for ovarian cancer, which should reduce ovarian cancer death rates. However, current ovarian cancer genetic risk scores do not work that well. We developed a genetic risk score based on variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. We synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, representing the length of each chromosome in their germ line DNA. We used a gradient boosting machine to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x-450x) compared to the lowest 20%. The genetic risk score we developed had an area under the receiver operating characteristic curve of 0.88 (95% confidence interval 0.86–0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.
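A comparison between the highest and lowest 20% of a risk score can be illustrated with a short sketch. The function below ranks people by score, takes the bottom and top fifths, and divides the case rates. This is an assumption for illustration: the abstract does not state how the 160x figure or its confidence interval was computed, and the toy data are invented.

```python
def quintile_relative_risk(scores, labels):
    """Case rate in the top fifth of scores divided by the rate in the bottom fifth.

    labels: 1 for a diagnosed case, 0 otherwise.
    Returns float('inf') when the bottom fifth contains no cases.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0])
    k = len(ranked) // 5  # size of each fifth
    if k == 0:
        raise ValueError("need at least five people")
    bottom = ranked[:k]
    top = ranked[-k:]
    rate_bottom = sum(y for _, y in bottom) / k
    rate_top = sum(y for _, y in top) / k
    if rate_bottom == 0:
        return float("inf")
    return rate_top / rate_bottom


# Invented toy cohort of 20 people: 1 case in the lowest fifth, 3 in the highest.
toy_scores = list(range(1, 21))
toy_labels = [1, 0, 0, 0] + [0] * 12 + [0, 1, 1, 1]
print(quintile_relative_risk(toy_scores, toy_labels))
```

With few cases in the lowest fifth, the ratio becomes unstable, which is one plausible reason a reported interval such as 50x-450x is so wide.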


2020 ◽  
Author(s):  
Chris Toh ◽  
James Brody

Abstract Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher risk women can be treated or screened for ovarian cancer, which should reduce overall death rates due to ovarian cancer. However, current ovarian cancer genetic risk scores, based on SNPs, do not work that well. We developed a genetic risk score based on structural variation, quantified by variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. From this dataset, we synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, representing the length of each chromosome in their germ line DNA. We used a gradient boosting machine, a machine learning algorithm, to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer in this dataset. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x-450x) compared to the lowest 20%. The genetic risk score we developed had an area under the curve of the receiver operating characteristic curve of 0.88 (estimated 95% confidence interval 0.86-0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.


2020 ◽  
Author(s):  
Chris Toh ◽  
James P Brody

Introduction. Twin studies indicate that a substantial fraction of ovarian cancers should be predictable from genetic testing. Genetic risk scores can stratify women into different classes of risk. Higher risk women can be treated or screened for ovarian cancer, which should reduce overall death rates due to ovarian cancer. However, current ovarian cancer genetic risk scores, based on SNPs, do not work that well. We developed a genetic risk score based on structural variation, quantified by variations in the length of chromosomes. Methods. We evaluated this genetic risk score using data collected by The Cancer Genome Atlas. From this dataset, we synthesized a dataset of 414 women who had ovarian serous carcinoma and 4225 women who had no form of ovarian cancer. We characterized each woman by 22 numbers, representing the length of each chromosome in their germ line DNA. We used a gradient boosting machine, a machine learning algorithm, to build a classifier that can predict whether a woman had been diagnosed with ovarian cancer in this dataset. Results. The genetic risk score based on chromosomal-scale length variation could stratify women such that the highest 20% had a 160x risk (95% confidence interval 50x-450x) compared to the lowest 20%. The genetic risk score we developed had an area under the curve of the receiver operating characteristic curve of 0.88 (estimated 95% confidence interval 0.86-0.91). Conclusion. A genetic risk score based on chromosomal-scale length variation of germ line DNA provides an effective means of predicting whether or not a woman will develop ovarian cancer.


2021 ◽  
Author(s):  
Christopher Toh ◽  
James P. Brody

Abstract Studies indicate that schizophrenia has a genetic component; however, it cannot be isolated to a single gene. We aimed to determine how well one could predict that a person will develop schizophrenia based on their germ line genetics. We compared 1129 people from the UK Biobank dataset who had a diagnosis of schizophrenia to an equal number of age-matched people drawn from the general UK Biobank population. For each person, we constructed a profile consisting of numbers. Each number characterized the length of a segment of a chromosome. We tested several machine learning algorithms to determine which was most effective in predicting schizophrenia and whether breaking the chromosomes into smaller chunks improves prediction. We found that the stacked ensemble performed best, with an area under the receiver operating characteristic curve (AUC) of 0.545 (95% CI 0.539-0.550). We noted an increase in the AUC when the chromosomes were broken into smaller chunks for analysis. Using SHAP values, we identified the X chromosome as the most important contributor to the predictive model. We conclude that germ line chromosomal-scale length variation data could provide an effective genetic risk score for schizophrenia which performs better than chance.


2021 ◽  
Author(s):  
Christopher Toh ◽  
James P. Brody

Abstract Introduction. Schizophrenia is a neurological disorder that often manifests itself as a combination of psychotic symptoms such as delusions, hallucinations, and disorganized cognitive functions. Several lines of evidence indicate that schizophrenia has a genetic component; however, it cannot be isolated to a single gene. We set out to determine how well one could predict that a person will develop schizophrenia based on their germ line DNA. Methods. We compared 1129 people from the UK Biobank dataset who had a diagnosis of schizophrenia to an equal number of age-matched people drawn from the general UK Biobank population. For each person, we constructed a profile consisting of a sequence of numbers. Each number characterized the length of a segment of one of their chromosomes. We tested several machine learning algorithms using the h2o.ai framework to determine which was most effective in predicting schizophrenia. We also tested whether there was any improvement in prediction by breaking the chromosomes into smaller chunks. We used SHAP values to better understand features important to the predictive model. Results. We found that the stacked ensemble, a combination of four different machine learning algorithms, performed best with an area under the receiver operating characteristic curve (AUC) of 0.583 (95% CI 0.581-0.586). We noted an increase in the AUC by breaking the chromosomes into smaller chunks for analysis. Using SHAP values, we identified the X chromosome as the most important contributor to the predictive model. Conclusion. We conclude that germ line chromosomal-scale length variation data can provide an effective genetic risk score for schizophrenia. Length variations of several regions of the X chromosome are the greatest contributing factor.


2016 ◽  
Author(s):  
Amy E. Taylor ◽  
Marcus R. Munafò

Abstract Background. Genetic variants which determine the amount of coffee consumed have been identified in genome-wide association studies (GWAS) of coffee consumption; these may help to further understanding of the effects of coffee on health outcomes. However, there is limited information about how these variants relate to caffeinated beverage consumption more generally. Aims. To improve phenotype definition for coffee consumption related genetic risk scores by testing their association with coffee, tea and other beverages. Methods. We tested the associations of genetic risk scores for coffee consumption with beverage consumption in 114,316 individuals of European ancestry from the UK Biobank. Drinks were self-reported in a baseline questionnaire and in detailed 24-hour dietary recall questionnaires in a subset. Results. Genetic risk scores including two and eight single nucleotide polymorphisms (SNPs) explained up to 0.39%, 0.19% and 0.77% of the variance in coffee, tea and combined coffee and tea consumption, respectively. A one standard deviation increase in the 8 SNP genetic risk score was associated with a 0.13 cup per day (95% CI: 0.12, 0.14), 0.12 cup per day (95% CI: 0.11, 0.14) and 0.25 cup per day (95% CI: 0.24, 0.27) increase in coffee, tea and combined tea and coffee consumption, respectively. Genetic risk scores also demonstrated positive associations with both caffeinated and decaffeinated coffee and tea consumption. In 48,692 individuals with dietary recall data, the genetic risk scores were positively associated with coffee and tea (apart from herbal teas) consumption, but did not show clear evidence for positive associations with other beverages. However, there was evidence that the genetic risk scores were associated with lower daily water consumption and lower overall drink consumption. Conclusions. Genetic risk scores created from variants identified in coffee consumption GWAS associate more broadly with caffeinated beverage consumption and also with decaffeinated coffee and tea consumption.
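A SNP-based genetic risk score of the kind discussed in this abstract is commonly built as a sum of allele counts, optionally weighted by per-SNP effect sizes, and then standardized so that effects can be reported per standard deviation. The sketch below shows that construction. The SNP identifiers, weights and genotypes are hypothetical, and the abstract does not say whether its scores were weighted.

```python
from statistics import mean, pstdev


def risk_score(genotype, weights):
    """Weighted sum of effect-allele counts (0, 1 or 2 per SNP)."""
    return sum(weights[snp] * genotype[snp] for snp in weights)


def standardize(scores):
    """Convert raw scores to z-scores so effects can be stated per standard deviation."""
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


# Hypothetical two-SNP score; identifiers and weights are placeholders.
toy_weights = {"snp_a": 1.0, "snp_b": 0.5}
toy_genotypes = [
    {"snp_a": 0, "snp_b": 0},
    {"snp_a": 1, "snp_b": 1},
    {"snp_a": 2, "snp_b": 1},
    {"snp_a": 2, "snp_b": 2},
]
raw = [risk_score(g, toy_weights) for g in toy_genotypes]
print(standardize(raw))
```

Regressing cups per day on the standardized score would then give a slope in the same units as the per-standard-deviation estimates quoted in the results.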


2018 ◽  
Author(s):  
Kristi Läll ◽  
Maarja Lepamets ◽  
Marili Palover ◽  
Tõnu Esko ◽  
Andres Metspalu ◽  
...  

Abstract Background. Published genetic risk scores for breast cancer (BC) so far have been based on a relatively small number of markers and are not necessarily using the full potential of large-scale Genome-Wide Association Studies. This study aims to identify an efficient polygenic predictor for BC based on best available evidence and to assess its potential for personalized risk prediction and screening strategies. Methods. Four different genetic risk scores (two already published and two newly developed) and their combinations (metaGRS) are compared in the subsets of two population-based biobank cohorts: the UK Biobank (UKBB, 3157 BC cases, 43,827 controls) and Estonian Biobank (EstBB, 317 prevalent and 308 incident BC cases in 32,557 women). In addition, correlations between different genetic risk scores and their associations with BC risk factors are studied in both cohorts. Results. The metaGRS that combines two genetic risk scores (metaGRS2, based on 75 and 898 Single Nucleotide Polymorphisms, respectively) has the strongest association with prevalent BC status in both cohorts. One standard deviation difference in the metaGRS2 corresponds to an Odds Ratio = 1.6 (95% CI 1.54 to 1.66, p = 9.7 × 10^-135) in the UK Biobank, and accounting for family history marginally attenuates the effect (Odds Ratio = 1.58, 95% CI 1.53 to 1.64, p = 9.1 × 10^-129). In the EstBB cohort, the hazard ratio of incident BC for the women in the top 5% of the metaGRS2 compared to women in the lowest 50% is 4.2 (95% CI 2.8 to 6.2, p = 8.1 × 10^-13). The different GRSs are only moderately correlated with each other and are associated with different known predictors of BC. The classification of genetic risk for the same individual may vary considerably depending on the chosen GRS. Conclusions. We have shown that metaGRS2, which combines the effects of more than 900 SNPs, provides the best predictive ability for breast cancer in two different population-based cohorts. The strength of the effect of metaGRS2 indicates that the GRS could potentially be used to develop more efficient strategies for breast cancer screening for genotyped women.


2020 ◽  
Author(s):  
Chris Toh ◽  
James P. Brody

Abstract Introduction. The course of COVID-19 varies from asymptomatic to severe (acute respiratory distress, cytokine storms, and death) in patients. The basis for this range in symptoms is unknown. One possibility is that genetic variation is responsible for the highly variable response to infection. We evaluated how well a genetic risk score based on chromosome-scale length variation and machine learning classification algorithms could predict severity of response to SARS-CoV-2 infection. Methods. We compared 981 patients from the UK Biobank dataset who had a severe reaction to SARS-CoV-2 infection before 27 April 2020 to a similar number of age-matched patients drawn from the general UK Biobank population. For each patient, we built a profile of 88 numbers characterizing the chromosome-scale length variability of their germ line DNA. Each number represented one quarter of one of the 22 autosomes. We used the machine learning algorithm XGBoost to build a classifier that could predict whether a person would have a severe reaction to COVID-19 based only on their 88-number classification. Results. We found that the XGBoost classifier could differentiate between the two classes at a significant level p = 2 · 10 as measured against a randomized control and p = 3 · 10 measured against the expected value of a random guessing algorithm (AUC=0.5). However, we found that the AUC of the classifier was only 0.51, too low for a clinically useful test. Conclusion
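The 88-number profile described in the methods (22 autosomes, each divided into quarters) can be sketched as follows. The sketch assumes each chromosome is available as an ordered list of per-position measurements and that a segment is summarized by its mean. Both are assumptions for illustration, since the abstract does not specify how segment length is quantified. The names `segment_means` and `build_profile` are hypothetical.

```python
from statistics import mean


def segment_means(values, n_segments):
    """Split an ordered list into n_segments contiguous chunks and average each."""
    n = len(values)
    if n < n_segments:
        raise ValueError("fewer values than segments")
    # Integer boundaries guarantee every chunk is non-empty when n >= n_segments.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [mean(values[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def build_profile(per_chromosome_values, n_segments=4):
    """Concatenate segment summaries for each chromosome, in chromosome order."""
    profile = []
    for chrom in sorted(per_chromosome_values):
        profile.extend(segment_means(per_chromosome_values[chrom], n_segments))
    return profile


# Invented toy input: 22 autosomes with 8 measurements each.
toy = {chrom: [0.01 * chrom] * 8 for chrom in range(1, 23)}
profile = build_profile(toy)
print(len(profile))  # 88
```

Raising `n_segments` gives a finer profile, which is the kind of change the schizophrenia abstracts on this page describe as breaking the chromosomes into smaller chunks.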


Addiction ◽  
2017 ◽  
Vol 113 (1) ◽  
pp. 148-157 ◽  
Author(s):  
Amy E. Taylor ◽  
George Davey Smith ◽  
Marcus R. Munafò
