American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange: From Inception to First Data Release and Beyond—Lessons Learned and Member Institutions’ Perspectives

2018
pp. 1-14
Author(s):
Christine M. Micheel
Shawn M. Sweeney
Michele L. LeNoue-Newton
Fabrice André
Philippe L. Bedard
...  

The American Association for Cancer Research (AACR) Project Genomics Evidence Neoplasia Information Exchange (GENIE) is an international data-sharing consortium focused on enabling advances in precision oncology through the gathering and sharing of tumor genetic sequencing data linked with clinical data. The project’s history, operational structure, lessons learned, and institutional perspectives on participation in the data-sharing consortium are reviewed. Individuals involved with the inception and execution of AACR Project GENIE from each member institution described their experiences and lessons learned. The consortium was conceived in January 2014 and publicly released its first data set in January 2017, which consisted of 18,804 samples from 18,324 patients contributed by the eight founding institutions. Commitment and contributions from many individuals at AACR and the member institutions were crucial to the consortium’s success. These individuals filled leadership, project management, informatics, data curation, contracts, ethics, and security roles. Many lessons were learned during the first 3 years of the consortium, including how to gather, harmonize, and share data; how to make decisions and foster collaboration; and how to set the stage for continued participation and expansion of the consortium. We hope that the lessons shared here will assist new GENIE members as well as others who embark on the journey of forming a genomic data–sharing consortium.

2020
Vol 32 (6)
pp. 767-775
Author(s):
Beate M. Crossley
Jianfa Bai
Amy Glaser
Roger Maes
Elizabeth Porter
...  

Genetic sequencing, or DNA sequencing, using the Sanger technique has become widely used in the veterinary diagnostic community. This technology plays a role in the verification of PCR results and provides the genetic sequence data needed for phylogenetic analysis, epidemiologic studies, and forensic investigations. The Laboratory Technology Committee of the American Association of Veterinary Laboratory Diagnosticians has prepared guidelines for sample preparation, submission to sequencing facilities or instrumentation, quality assessment of nucleic acid sequence data, and generation of basic sequence data and phylogenetic analyses for diagnostic applications. This guidance is aimed at assisting laboratories in providing consistent, high-quality, and reliable sequence data when using Sanger-based genetic sequencing as a component of their laboratory services.
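As a concrete illustration of the quality-assessment step such guidelines address, the sketch below trims low-quality ends from a read given per-base Phred scores. The threshold and the end-trimming strategy are generic assumptions for illustration, not the committee's prescribed procedure.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases with Phred quality below min_q from both ends of a read.

    seq: base calls as a string; quals: per-base Phred scores (same length).
    """
    assert len(seq) == len(quals)
    start = 0
    while start < len(quals) and quals[start] < min_q:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end]

# Ambiguous, low-quality calls at the read ends are removed.
seq = "NNACGTACGTACGTNN"
quals = [5, 5] + [40] * 12 + [5, 5]
print(trim_low_quality_ends(seq, quals))  # ACGTACGTACGT
```

Real pipelines typically use sliding-window or error-probability-based trimming; this per-base version only demonstrates the principle.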


2021
pp. 256-265
Author(s):
Julien Guérin
Yec'han Laizet
Vincent Le Texier
Laetitia Chanas
Bastien Rance
...  

PURPOSE Many institutions throughout the world have launched precision medicine initiatives in oncology, and a large amount of clinical and genomic data is being produced. Although there have been attempts at data sharing with the community, initiatives are still limited. In this context, a French task force composed of Integrated Cancer Research Sites (SIRICs), comprehensive cancer centers from the Unicancer network (one of Europe's largest cancer research organizations), and university hospitals launched an initiative to improve and accelerate retrospective and prospective clinical and genomic data sharing in oncology. MATERIALS AND METHODS For 5 years, the OSIRIS group has worked on structuring these data and identifying technical solutions for collecting and sharing them. The group used a multidisciplinary approach that included weekly scientific and technical meetings over several months to foster a national consensus on a minimal data set. RESULTS The resulting OSIRIS set and its event-based data model, which is able to capture the disease course, were built with 67 clinical and 65 omics items. The group made the model compatible with the HL7 Fast Healthcare Interoperability Resources (FHIR) format to maximize interoperability. The OSIRIS set was reviewed, approved by a National Plan Strategic Committee, and freely released to the community. A proof-of-concept study was carried out to put the OSIRIS set and Common Data Model into practice using a cohort of 300 patients. CONCLUSION Using a national, bottom-up approach, the OSIRIS group has defined a model including a minimal set of clinical and genomic data items that can be used to accelerate the sharing of data produced in oncology. The model relies on clear and formally defined terminologies and, as such, may also benefit the larger international community.
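To illustrate what an event-based model of the disease course might look like, here is a deliberately simplified sketch. The class and field names are hypothetical and do not reproduce the actual 67 clinical and 65 omics OSIRIS items or their FHIR mapping.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """One dated clinical or omics event (hypothetical, simplified)."""
    event_type: str            # e.g. "diagnosis", "treatment", "biopsy"
    date: str                  # ISO 8601 date, so string sort == date sort
    attributes: dict = field(default_factory=dict)

@dataclass
class PatientRecord:
    """A patient's disease course as a collection of events."""
    patient_id: str
    events: list = field(default_factory=list)

    def add_event(self, event_type, date, **attributes):
        self.events.append(Event(event_type, date, attributes))

    def course(self):
        # The disease course: event types in chronological order.
        return [e.event_type for e in sorted(self.events, key=lambda e: e.date)]

record = PatientRecord("P001")
record.add_event("treatment", "2019-05-10", drug="hypothetical-drug")
record.add_event("diagnosis", "2019-03-02", icd10="C50.9")
print(record.course())  # ['diagnosis', 'treatment']
```

The appeal of an event-based layout is that any disease trajectory, however long, is captured by the same two structures, which also maps naturally onto FHIR resources.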


2020
Vol 26 (10)
pp. 1157-1162
Author(s):
Liesbet M Peeters
Tina Parciak
Clare Walton
Lotte Geys
Yves Moreau
...  

Background: We need high-quality data to assess the determinants of COVID-19 severity in people with MS (PwMS). Several studies have recently emerged, but there is great benefit in aligning data collection efforts at a global scale. Objectives: Our mission is to scale up COVID-19 data collection efforts and provide the MS community with data-driven insights as soon as possible. Methods: Numerous stakeholders were brought together. Small, dedicated interdisciplinary task forces were created to speed up the formulation of the study design and work plan. The first step was to agree upon a COVID-19 MS core data set. Second, we worked on providing a user-friendly and rapid pipeline to share COVID-19 data at a global scale. Results: The COVID-19 MS core data set was agreed within 48 hours. To date, 23 data collection partners are involved and the first data imports have been performed successfully. Data processing and analysis are ongoing. Conclusions: We reached a consensus on a core data set and established data-sharing processes with multiple partners to address an urgent need for information to guide clinical practice. First results show that partners are motivated to share data to attain the ultimate joint goal: to better understand the effect of COVID-19 in PwMS.


Author(s):  
Ashley Mossa
Rupert Weston

The U.S. Nuclear Regulatory Commission (NRC) has an ongoing Common Cause Failure (CCF) data analysis program that periodically collects and evaluates information on component failures at U.S. commercial Nuclear Power Plants (NPPs). The primary information sources are Licensee Event Reports (LERs) and records from the Equipment Performance Information Exchange (EPIX) program. Once the information is collected, the failure records are evaluated to identify potential CCF events. CCF events are then coded, reviewed, and loaded into the NRC’s database. Verification ensures that events entered into the CCF database are indeed CCF events and that the event coding is consistent and correct. To ensure the technical accuracy and correctness of the events loaded into the CCF database, the NRC requested the support of the Pressurized Water Reactor Owners Group (PWROG) in reviewing these events. Reviews of multiple data sets of CCF events were conducted on behalf of the PWROG. The data sets included CCF events that occurred at U.S. commercial nuclear power plants; the most recent data set reviewed covered events from 2006 through 2007. The level of information provided for reported CCF events varies from utility to utility. Without utility participation or input, this inconsistency and varying level of detail can lead to incorrect interpretation and classification of a CCF event with respect to its Probabilistic Risk Assessment (PRA) impact. This paper offers lessons learned from the reviews and summarizes insights for improving the consistency and level of detail of the PRA-related information. The leading causes of initial misclassification of CCF events and patterns observed in conducting the reviews are discussed. The resolution of misclassified CCF events is also discussed as part of the evaluation process to enhance the pedigree of the CCF database.


2021
Vol 10 (1)
Author(s):
Nancy Medley
Anna Cuthbert
Richard Crew
Lesley Stewart
Catrin Tudur Smith
...  

Abstract Background Building a dataset of individual participant data (IPD) for meta-analysis represents considerable research investment as well as collaboration across multiple institutions and researchers. Making arrangements to curate and share the dataset beyond the IPD meta-analysis project for which it was established, for reuse in future research projects, would maximise the value of this investment. Methods Our aim was to establish the Cochrane repository for individual patient data from clinical trials in pregnancy and childbirth (CRIB) as an example of how an IPD repository could become part of Cochrane infrastructure. We believed that establishing CRIB under Cochrane auspices would engender trust and encourage trial investigators to share data, and at the same time position Cochrane to expand the number of reviews with IPD synthesis. Results CRIB was designed as a web-based platform to receive, host, and facilitate onward sharing of de-identified data. Development was not straightforward, and we did not fully achieve our aim as intended. We describe the challenges encountered and suggest ways that future repositories might overcome them. In particular, securing the legal agreements required to facilitate data sharing proved to be the main barrier, being time-consuming and more complex than anticipated. Conclusions We recommend that researchers conducting IPD meta-analyses consider discussing the option of transferring the curated IPD datasets to a repository at the end of the initial meta-analysis, and that this option be recognised within the data-sharing agreements made with the original data contributors.


2020
Author(s):
Sara Akhavanfard
Lamis Yehia
Roshan Padmanabhan
Jordan P Reynolds
Ying Ni
...  

Abstract Adrenocortical carcinoma (ACC) is a rare endocrine tumor with poor overall prognosis and a 1.5-fold overrepresentation in females. In children, ACC is associated with inherited cancer syndromes, with 50–80% of childhood ACC associated with TP53 germline variants. ACC in adolescents and young adults (AYA) is rarely due to germline TP53, IGF2, PRKAR1A, and MEN1 variants. We analyzed exome sequencing data from 21 children (<15y), 32 AYA (15-39y), and 60 adults (>39y) with ACC, and retained all pathogenic, likely pathogenic, and highly prioritized variants of uncertain significance. We engineered a stable lentiviral-mutant ACC cell line harboring an EGFR variant (p.Asp1080Asn) from a 21-year-old female with aggressive ACC and no germline TP53 variant. We found that 4.8% of the children (P = 0.004) and 6.2% of AYA (P < 0.0001), all of them female, harbored germline EGFR variants, compared to only 0.3% of the control group. Expanding our analysis to the RTK-RAS-MAPK pathway, we found that the RTK genes carry the highest number of highly prioritized germline variants in these individuals among all three arms of this pathway. We showed that EGFR-mutant cells migrate faster and are characterized by a stem-like phenotype compared to wild-type cells. While EGFR inhibitors did not affect the stemness of mutant cells, sunitinib, a multireceptor tyrosine kinase inhibitor, significantly reduced their stem-like behavior. Our data suggest that EGFR could be a novel underlying germline predisposition factor for ACC, especially in the childhood-AYA (C-AYA) population. Further clinical validation can improve precision oncology management of this disease, which is known to have limited therapeutic options.


BMC Biology
2021
Vol 19 (1)
Author(s):
Daniele Raimondi
Antoine Passemiers
Piero Fariselli
Yves Moreau

Abstract Background Identifying variants that drive tumor progression (driver variants) and distinguishing these from variants that are a byproduct of the uncontrolled cell growth in cancer (passenger variants) is a crucial step for understanding tumorigenesis and precision oncology. Various bioinformatics methods have attempted to solve this complex task. Results In this study, we investigate the assumptions on which these methods are based, showing that the different definitions of driver and passenger variants influence the difficulty of the prediction task. More importantly, we prove that the data sets have a construction bias that prevents the machine learning (ML) methods from actually learning variant-level functional effects, despite their excellent performance. This effect results from the fact that in these data sets the driver variants map to a few driver genes, while the passenger variants spread across thousands of genes, so that merely learning to recognize driver genes provides almost perfect predictions. Conclusions To mitigate this issue, we propose a novel data set that minimizes this bias by ensuring that all genes covered by the data contain both driver and passenger variants. As a result, we show that the tested predictors experience a significant drop in performance, which should not be interpreted as poorer modeling but rather as a correction of unwarranted optimism. Finally, we propose a weighting procedure to completely eliminate the gene effects on such predictions, thus precisely evaluating the ability of predictors to model the functional effects of single variants, and we show that this task is indeed still open.
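The construction bias described here can be reproduced with a toy experiment: a "predictor" that never inspects the variant itself and merely memorizes per-gene labels scores near-perfectly on a data set built this way. The gene names and counts below are synthetic assumptions, not the authors' benchmark.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Synthetic data mimicking the bias: driver variants cluster in a few
# genes, passenger variants spread across many genes. Each example is
# (gene, label); the variant description is deliberately omitted because
# this predictor never needs it.
driver_genes = [f"DRV{i}" for i in range(5)]
passenger_genes = [f"PSG{i}" for i in range(1000)]
variants = [(g, 1) for g in driver_genes for _ in range(100)] \
         + [(random.choice(passenger_genes), 0) for _ in range(500)]
random.shuffle(variants)
train, test = variants[:700], variants[700:]

# Memorize the majority label seen for each gene in training.
seen = defaultdict(Counter)
for gene, label in train:
    seen[gene][label] += 1

def predict(gene):
    # Unseen genes default to "passenger", the majority class.
    return seen[gene].most_common(1)[0][0] if gene in seen else 0

accuracy = sum(predict(gene) == label for gene, label in test) / len(test)
print(f"gene-identity-only accuracy: {accuracy:.2f}")  # near-perfect
```

Since every driver gene recurs many times, gene identity alone separates the classes, which is exactly why a data set in which each gene carries both driver and passenger variants is needed to measure variant-level learning.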


2021
Vol 4 (1)
pp. 251524592092800
Author(s):
Erin M. Buchanan
Sarah E. Crain
Ari L. Cunningham
Hannah R. Johnson
Hannah Stash
...  

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.
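As a minimal sketch of the kind of codebook this Tutorial describes, the snippet below writes variable-level metadata to CSV. The variables and metadata fields are illustrative, not a specific published standard.

```python
import csv
import io

# Illustrative codebook: one entry per variable, with the kinds of
# metadata fields a data dictionary typically records.
codebook = [
    {"variable": "participant_id", "description": "Anonymized participant code",
     "type": "string", "allowed_values": ""},
    {"variable": "condition", "description": "Experimental condition",
     "type": "categorical", "allowed_values": "control; treatment"},
    {"variable": "rt_ms", "description": "Response time in milliseconds",
     "type": "integer", "allowed_values": ">= 0"},
]

# Write the codebook as plain CSV: a format that repositories, search
# engines, and other researchers' tools can all read.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["variable", "description", "type", "allowed_values"])
writer.writeheader()
writer.writerows(codebook)
csv_text = buffer.getvalue()
print(csv_text)
```

Depositing such a file alongside the data set gives both human readers and indexing services the variable-level metadata they need.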

