Biological insights from self-perceived facial aging data of the UKBB participants

Mapping Intimacies ◽

10.1101/758854 ◽

2019 ◽

Author(s):

Simona Vigodner ◽

Raya Khanin

Keyword(s):

Statistical Power ◽

Large Scale ◽

Skin Pigmentation ◽

Skin Aging ◽

Facial Aging ◽

Large Scale Data ◽

Genome Wide ◽

Reported Data ◽

The Uk ◽

Scale Data

Abstract: Genetic underpinnings of facial aging are still largely unknown. In this study, we leverage the statistical power of large-scale data from the UK Biobank and perform in silico analysis of genome-wide self-perceived facial aging. Functional analysis reveals significant over-representation of skin pigmentation and immune-related pathways that are correlated with facial aging. For males, hair loss is one of the top categories that is highly significantly over-represented in the genetics data associated with self-reported facial aging. Our analysis confirms that genes coding for the extracellular matrix play important roles in aging. Overall, our results provide evidence that, while somewhat biased, large-scale self-reported data on aging can be used to extract useful insights into underlying biology, provide candidate skin aging biomarkers, and advance anti-aging skincare.

Increasing large-scale data center capacity by statistical power control

Proceedings of the Eleventh European Conference on Computer Systems - EuroSys '16 ◽

10.1145/2901318.2901338 ◽

2016 ◽

Cited By ~ 11

Author(s):

Guosai Wang ◽

Shuhao Wang ◽

Bing Luo ◽

Weisong Shi ◽

Yinghang Zhu ◽

...

Keyword(s):

Power Control ◽

Data Center ◽

Statistical Power ◽

Large Scale ◽

Large Scale Data ◽

Scale Data

Developing ultra-efficient methods for genome-wide association analysis of large-scale data

10.14264/57ed4e6 ◽

2021 ◽

Author(s):

Longda Jiang

Keyword(s):

Association Analysis ◽

Large Scale ◽

Genome Wide Association ◽

Genome Wide Association Analysis ◽

Large Scale Data ◽

Genome Wide ◽

Scale Data

Optimal Genomic Control in Large-scale Genetic Associations for Binary Diseases

10.21203/rs.3.rs-318017/v2 ◽

2021 ◽

Author(s):

Runqing Yang ◽

Yuxin Song ◽

Li Jiang ◽

Zhiyu Hao ◽

Runqing Yang

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Large Scale ◽

Association Studies ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genomic Heritability ◽

Large Scale Data ◽

Genome Wide

Abstract: Complex computation and approximate solutions hinder the application of generalized linear mixed models (GLMM) to genome-wide association studies. We extended GRAMMAR to handle binary diseases by treating genomic breeding values (GBVs) estimated in advance as a known predictor in genomic logit regression, and then controlled polygenic effects by regulating genomic heritability downward. Using simulations and case analyses, we showed that, in optimizing GRAMMAR, polygenic effects and genomic controls could be evaluated using fewer sampled markers, which greatly simplified GLMM-based association analysis in large-scale data. In addition, joint analysis of quantitative trait nucleotide (QTN) candidates chosen by multiple testing offered significantly improved statistical power to detect QTNs over existing methods.

Architecting a distributed bioinformatics platform with iRODS and iPlant Agave API

10.1101/034488 ◽

2015 ◽

Author(s):

Liya Wang ◽

Peter Van Buren ◽

Doreen Ware

Keyword(s):

Large Scale ◽

Genome Wide Association Study ◽

Data Transfer ◽

Genomic Analysis ◽

The Past ◽

Large Scale Data ◽

Genome Wide ◽

A Genome ◽

Bioinformatics Workflow ◽

Scale Data

Over the past few years, cloud-based platforms have been proposed to address storage, management, and computation of large-scale data, especially in the field of genomics. However, for collaboration efforts involving multiple institutes, data transfer and management, interoperability and standardization among different platforms have imposed new challenges. This paper proposes a distributed bioinformatics platform that can leverage local clusters with remote computational clusters for genomic analysis using the unified bioinformatics workflow. The platform is built with a data server configured with iRODS, a computation cluster authenticated with iPlant Agave system, and web server to interact with the platform. A Genome-Wide Association Study workflow is integrated to validate the feasibility of the proposed approach.

A resource-efficient tool for mixed model association analysis of large-scale data

10.1101/598110 ◽

2019 ◽

Cited By ~ 6

Author(s):

Longda Jiang ◽

Zhili Zheng ◽

Ting Qi ◽

Kathryn E. Kemper ◽

Naomi R. Wray ◽

...

Keyword(s):

Population Stratification ◽

Large Scale ◽

Mixed Model ◽

Genome Wide Association Study ◽

Mixed Linear Model ◽

Genome Wide Association ◽

Relationship Matrix ◽

Genome Wide ◽

The Uk ◽

Scale Data

Abstract: The genome-wide association study (GWAS) has been widely used as an experimental design to detect associations between genetic variants and a phenotype. Two major confounding factors, population stratification and relatedness, could potentially lead to inflated GWAS test-statistics and thereby spurious associations. Mixed linear model (MLM)-based approaches can be used to account for sample structure. However, genome-wide association (GWA) analyses in biobank samples such as the UK Biobank (UKB) often exceed the capability of most existing MLM-based tools especially if the number of traits is large. Here, we developed an MLM-based tool (called fastGWA) that controls for population stratification by principal components and relatedness by a sparse genetic relationship matrix for GWA analyses of biobank-scale data. We demonstrated by extensive simulations that fastGWA is reliable, robust and highly resource-efficient. We then applied fastGWA to 2,173 traits on 456,422 array-genotyped and imputed individuals and 2,048 traits on 46,191 whole-exome-sequenced individuals in the UKB.

Training and performance in SMEs: Empirical evidence from large-scale data from the UK

Journal of Small Business Management ◽

10.1080/00472778.2020.1816431 ◽

2020 ◽

pp. 1-33

Author(s):

Bochra Idris ◽

George Saridakis ◽

Stewart Johnstone

Keyword(s):

Empirical Evidence ◽

Large Scale ◽

Large Scale Data ◽

And Performance ◽

The Uk ◽

Scale Data

Large-scale trans-ethnic replication and discovery of genetic associations for rare diseases with self-reported medical data

10.1101/2021.06.09.21258643 ◽

2021 ◽

Author(s):

Suyash S Shringarpure ◽

Wei Wang ◽

Yunxuan Jiang ◽

Alison Acevedo ◽

Devika Dhamija ◽

...

Keyword(s):

Rare Disease ◽

Rare Diseases ◽

Large Scale ◽

Mixed Model ◽

Association Studies ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genome Wide ◽

Reported Data ◽

The Uk

A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases. This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.

FAIRly big: A framework for computationally reproducible processing of large-scale data

10.1101/2021.10.12.464122 ◽

2021 ◽

Author(s):

Adina S. Wagner ◽

Laura K. Waite ◽

Małgorzata Wierzba ◽

Felix Hoffstaedter ◽

Alexander Q. Waite ◽

...

Keyword(s):

Large Scale ◽

Open Science ◽

Uk Biobank ◽

Data Usage ◽

Scientific Investigations ◽

Large Scale Data ◽

Research Outcomes ◽

And Performance ◽

The Uk ◽

Scale Data

Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework's performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset).

MetaCycle: an integrated R package to evaluate periodicity in large scale data

10.1101/040345 ◽

2016 ◽

Cited By ~ 6

Author(s):

Gang Wu ◽

Ron C Anafi ◽

Michael E Hughes ◽

Karl Kornacker ◽

John B Hogenesch

Keyword(s):

Statistical Power ◽

Large Scale ◽

Time Series Data ◽

R Package ◽

Ease Of Use ◽

Data Availability ◽

Supplementary Information ◽

Series Data ◽

Large Scale Data ◽

Scale Data

Summary: Detecting periodicity in large scale data remains a challenge. Different algorithms offer strengths and weaknesses in statistical power, sensitivity to outliers, ease of use, and sampling requirements. While efforts have been made to identify best of breed algorithms, relatively little research has gone into integrating these methods in a generalizable method. Here we present MetaCycle, an R package that incorporates ARSER, JTK_CYCLE, and Lomb-Scargle to conveniently evaluate periodicity in time-series data. Availability and implementation: MetaCycle package is available on the CRAN repository (https://cran.r-project.org/web/packages/MetaCycle/index.html) and GitHub (https://github.com/gangwug/MetaCycle). Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Gene Expression Maps in Plants: Current State and Prospects

Plants ◽

10.3390/plants8090309 ◽

2019 ◽

Vol 8 (9) ◽

pp. 309 ◽

Cited By ~ 2

Author(s):

Anna V. Klepikova ◽

Aleksey A. Penin

Keyword(s):

Regulatory Networks ◽

Large Scale ◽

Developmental Stages ◽

Large Scale Data ◽

Current State ◽

Genome Wide ◽

Plant Transcriptome ◽

Gene Regulatory ◽

Genome Wide Expression ◽

Scale Data

For many years, progress in the identification of gene functions has been based on classical genetic approaches. However, considerable recent omics developments have brought to the fore indirect but high-resolution methods of gene function identification such as transcriptomics, proteomics, and metabolomics. A transcriptome map is a powerful source of functional information and the result of the genome-wide expression analysis of a broad sampling of tissues and/or organs from different developmental stages and/or environmental conditions. In plant science, the application of transcriptome maps extends from the inference of gene regulatory networks to evolutionary studies. However, only some of these data have been integrated into databases, thus enabling analyses to be conducted without raw data; without this integration, extensive data preprocessing is required, which limits data usability. In this review, we summarize the state of plant transcriptome maps, analyze the problems associated with the combined analysis of large-scale data from various studies, and outline possible solutions to these problems.
