Consequences of PCA graphs, SNP codings, and PCA variants for elucidating population structure

Mapping Intimacies ◽

10.1101/393611 ◽

2018 ◽

Author(s):

Hugh G. Gauch ◽

Sheng Qian ◽

Hans-Peter Piepho ◽

Linda Zhou ◽

Rui Chen

Keyword(s):

Population Structure ◽

Rare Allele ◽

Literature Survey ◽

High Dimensional ◽

Common Allele ◽

Snp Data ◽

The Common ◽

Low Dimensional ◽

Main Effects

SNP datasets are high-dimensional, often with thousands to millions of SNPs and hundreds to thousands of samples or individuals. Accordingly, PCA graphs are frequently used to provide a low-dimensional visualization that displays and reveals patterns in SNP data from humans, animals, plants, and microbes, especially to elucidate population structure. Given the popularity of PCA, one might expect that PCA is well understood and effectively applied. However, our literature survey of 125 representative articles that apply PCA to SNP data shows that three choices have usually been made poorly: PCA graph, SNP coding, and PCA variant. Our three main recommendations are simple and easily implemented: use PCA biplots, SNP coding 1 for the rare allele and 0 for the common allele, and double-centered PCA (or AMMI1 if main effects are of interest). The ultimate benefit of informed and optimal choices of PCA graph, SNP coding, and PCA variant is expected to be the discovery of more biology, and thereby the acceleration of medical, agricultural, and other vital applications.
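The two preprocessing recommendations above (rare allele coded 1, common allele coded 0, then double-centering) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a small binary matrix with SNPs as rows and individuals as columns, and the function names and toy data are hypothetical. The PCA itself (a singular value decomposition of the centered matrix) is omitted.

```python
def recode_rare_as_one(matrix):
    """Recode each SNP row so the rarer value becomes 1 and the common value 0."""
    recoded = []
    for row in matrix:
        ones = sum(row)
        # If 1 is the majority value for this SNP, flip the coding.
        if ones * 2 > len(row):
            recoded.append([1 - v for v in row])
        else:
            recoded.append(list(row))
    return recoded


def double_center(matrix):
    """Subtract row and column means and add back the grand mean."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(r) / n_cols for r in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [[matrix[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n_cols)]
            for i in range(n_rows)]


# Toy data (assumed): 3 SNPs (rows) by 4 individuals (columns).
snps = [[1, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 1, 1]]
coded = recode_rare_as_one(snps)
centered = double_center(coded)
```

After double-centering, every row and every column of `centered` sums to zero (up to floating-point error), so the SNP and individual main effects are removed before any components are extracted.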

PSS Business Case Map: Supporting Idea Generation in PSS Design

Volume 3: 38th Design Automation Conference, Parts A and B ◽

10.1115/detc2012-70692 ◽

2012 ◽

Cited By ~ 2

Author(s):

Fumiya Akasaka ◽

Kazuki Fujita ◽

Yoshiki Shimomura

Keyword(s):

Idea Generation ◽

Business Case ◽

Literature Survey ◽

The Self ◽

High Dimensional ◽

Self Organizing Map ◽

Two Dimensional ◽

Service Type ◽

Business Cases ◽

Low Dimensional

This paper proposes the PSS Business Case Map as a tool to support designers' idea generation in PSS design. The map visualizes the similarities among PSS business cases in a two-dimensional diagram. To make the map, PSS business cases are first collected, for example through a literature survey. The collected cases are then classified according to multiple aspects that characterize each case, such as its product type, service type, and target customer. Based on the results of this classification, the similarities among the cases are calculated and visualized using the Self-Organizing Map (SOM) technique. A SOM is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional) view of high-dimensional data. The visualization result is offered to designers in the form of a two-dimensional map, which is called the PSS Business Case Map. Using the map, designers can locate the position of their current business and acquire ideas for the servitization of their business.
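The pipeline described above (classify cases by aspect, then place them on a two-dimensional SOM) can be sketched roughly as below. This is a bare-bones illustration under stated assumptions, not the paper's implementation: the cases, the aspect names, the one-hot encoding, the 3x3 grid, and the learning-rate and radius schedules are all hypothetical choices.

```python
import math
import random


def one_hot(cases, aspects):
    """Encode each case's categorical aspects as a 0/1 vector."""
    vocab = [sorted({case[a] for case in cases}) for a in aspects]
    return [[1.0 if case[a] == value else 0.0
             for a, values in zip(aspects, vocab) for value in values]
            for case in cases]


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda pos: sum((w - xi) ** 2
                                            for w, xi in zip(weights[pos], x)))


def train_som(data, rows=3, cols=3, epochs=50, seed=0):
    """Train a small rectangular SOM with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        rate = 0.5 * frac                           # learning rate decays to 0
        radius = 0.5 + max(rows, cols) / 2.0 * frac  # neighbourhood shrinks
        for x in data:
            br, bc = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += rate * h * (x[k] - w[k])
    return weights


# Hypothetical business cases, classified by three aspects.
aspects = ["product", "service", "customer"]
cases = [
    {"product": "car", "service": "sharing", "customer": "consumer"},
    {"product": "car", "service": "sharing", "customer": "consumer"},
    {"product": "printer", "service": "maintenance", "customer": "business"},
    {"product": "engine", "service": "pay-per-use", "customer": "business"},
]
vectors = one_hot(cases, aspects)
weights = train_som(vectors)
positions = [best_matching_unit(weights, v) for v in vectors]
```

Each entry of `positions` is a (row, column) cell on the map; cases with identical classifications land in the same cell, and similar cases tend to land in neighbouring cells.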

Identification of Venous Thromboembolism (VTE)-Associated Novel Variants in the ABO Gene Using Targeted Deep Sequencing

Blood ◽

10.1182/blood.v118.21.709.709 ◽

2011 ◽

Vol 118 (21) ◽

pp. 709-709

Author(s):

Yan W. Asmann ◽

Mariza de Andrade ◽

Sumit Middha ◽

Martha E. Matsumoto ◽

Sebastian M. Armasu ◽

...

Keyword(s):

Deep Sequencing ◽

Mayo Clinic ◽

Case Series ◽

Rare Allele ◽

Minor Allele ◽

Recent Analysis ◽

Common Allele ◽

Illumina Hiseq ◽

Functional Variants ◽

The Common

Abstract 709. Background: A recent analysis of merged genome-wide and candidate gene genotypes in VTE cases and controls identified multiple tag SNPs that were strongly associated with VTE. Objective: To identify rare and/or novel functional variants by sequencing the implicated genes. Methods: Cases (n=1488) were Mayo Clinic European-American patients of non-Hispanic ancestry with objectively diagnosed VTE in the absence of active cancer, venous catheter, or antiphospholipid antibodies. Controls (n=1439) were Mayo Clinic outpatients without VTE who were frequency-matched on case age, gender, race, MI/stroke status, and state of residence. For this analysis, we selected a subset of these cases and controls for sequencing to take advantage of the joint configuration of two ABO SNPs of primary interest, rs8176719 (ABO exon 6 deletion determining type O blood group) and rs2519093 (ABO intron 1 tag SNP), which were previously shown to be strongly associated with VTE (p=5.7E-12 and p=3.0E-16, respectively). We randomly sampled 82 cases and 14 controls within 3 of the 9 potential allele frequency cells (Figure). The rs8176719 genotypes are −/− (the deletion is the common allele), −/G, and G/G (G is the rare allele). The rs2519093 genotypes are GG (G is the common allele), AG, and AA (A is the rare allele). For each SNP, the genotypes are represented as 0, 1, or 2 copies of the minor allele. We represented the joint allelic configuration of the two SNPs with the number of copies for rs8176719 given first: 0/0 (both with 0 copies of the minor allele), 0/1, and 0/2 (0 copies of the rare rs8176719 allele); 1/0, 1/1, and 1/2 (1 copy of the rare rs8176719 allele); and 2/0 (2 copies of the rare rs8176719 allele and 0 copies of the rare rs2519093 allele), 2/1, and 2/2. From the Figure one observes discrepancies between cases and controls at the 0/0, 1/1, and 2/2 combinations. We randomly sampled from these three combinations, taking one third of the case series.
We had 28 cases with the 0/0 combination, 27 cases with the 1/1 combination, and 27 cases with the 2/2 combination. We compared these 82 cases with 14 controls that do not have any of these combinations. Sixteen genes were selected for deep sequencing, including 5 genes harboring SNPs significantly associated with VTE (F5, SLC19A2, ABO, NME7, ATP1B1), 10 genes with SNPs marginally associated with VTE (C1orf114, KLKB1, SELP, F11, SCUBE1, PRKCB1, CD44, ITPR1, GFRA1, BLZF1), and CYP4V2, which reportedly confounds F11 and KLKB1. Agilent SureSelect probes were designed to capture and enrich the ∼2 Mb genomic regions of these 16 genes. Samples were multiplexed (12-plex) and sequenced using Illumina HiSeq 2000. The sequence reads were aligned to human genome build 36 using Burrows-Wheeler Aligner, and the single nucleotide variants (SNVs) and small INDELs were called using SNVMix and GATK, respectively. For this analysis, novel ABO SNVs were tested for an association with VTE using age- and sex-adjusted logistic regression and Fisher's Exact Test. Results: 98% of the targeted regions were sequenced with >20X coverage. On average, ∼2500 SNVs and ∼200 INDELs were detected in each sample. Fifteen novel SNVs in intron 6 and 3' of the ABO gene were associated with VTE (p < 1E-06) and belonged to 3 distinct LD blocks; none were in LD with the coding or tag ABO SNPs (rs8176719; rs2519093). SNVs inside the middle LD block at the 3' of ABO are located within a region carrying enhancer and promoter histone marks with putative transcription factor binding sites. In addition, strong evidence from both ENCODE and dbEST supports the middle LD block as lying within a novel transcript, probably an extension of the 3' of ABO. We also discovered a novel, significant, protective, frame-shifting single-base (G) deletion at ABO chr9:135120877.
Conclusion: Novel ABO functional variants that are associated with VTE were identified by deep sequencing. Disclosures: Heit: Daiichi Sankyo: Consultancy, Honoraria.
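The genotype coding used in the Methods (0, 1, or 2 copies of the minor allele per SNP, with the rs8176719 count given first) can be illustrated with a short sketch. The function names and the tuple representation of genotypes are assumptions made for illustration; the minor alleles (G for rs8176719, A for rs2519093) are taken from the abstract.

```python
# Minor (rare) allele of each SNP, as stated in the abstract.
MINOR_ALLELE = {"rs8176719": "G", "rs2519093": "A"}


def minor_allele_copies(genotype, minor_allele):
    """Count copies (0, 1, or 2) of the minor allele in a diploid genotype."""
    if len(genotype) != 2:
        raise ValueError("a diploid genotype needs exactly two alleles")
    return sum(1 for allele in genotype if allele == minor_allele)


def joint_configuration(rs8176719_genotype, rs2519093_genotype):
    """Return the joint code 'a/b' with the rs8176719 count given first."""
    first = minor_allele_copies(rs8176719_genotype, MINOR_ALLELE["rs8176719"])
    second = minor_allele_copies(rs2519093_genotype, MINOR_ALLELE["rs2519093"])
    return f"{first}/{second}"


# "-" stands for the exon 6 deletion allele of rs8176719.
example_codes = [
    joint_configuration(("-", "-"), ("G", "G")),
    joint_configuration(("-", "G"), ("A", "G")),
    joint_configuration(("G", "G"), ("A", "A")),
]
```

The three examples correspond to the 0/0, 1/1, and 2/2 cells from which the sequenced cases were sampled.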

Classification of Brainwaves for Sleep Stages by High-Dimensional FFT Features from EEG Signals

Applied Sciences ◽

10.3390/app10051797 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1797 ◽

Cited By ~ 2

Author(s):

Mera Kartika Delimayanti ◽

Bedy Purnama ◽

Ngoc Giang Nguyen ◽

Mohammad Reza Faisal ◽

Kunti Robiatul Mahmudah ◽

...

Keyword(s):

Machine Learning ◽

Sleep Stage ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Sleep Stages ◽

Eeg Signals ◽

Stage Classification ◽

Sleep Stage Classification ◽

Low Dimensional

Manual classification of sleep stages is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous work has applied low-dimensional fast Fourier transform (FFT) features together with many machine learning algorithms. In this paper, we demonstrate the use of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features in combination with a simple feature selection are effective for improving automated sleep stage classification.
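As a rough illustration of how spectral features can be extracted from a signal epoch, the sketch below computes the magnitude of each frequency bin and treats every bin as one feature. It is not the authors' pipeline: the epoch is a synthetic sine wave, the function name is hypothetical, and a naive discrete Fourier transform is used for self-containment (it yields the same values as an FFT, only more slowly).

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return magnitudes of the non-negative frequency bins of a real signal."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes


# Synthetic epoch (assumed): 64 samples containing exactly 5 cycles of a sine.
epoch = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
features = dft_magnitudes(epoch)
peak_bin = max(range(len(features)), key=features.__getitem__)
```

For this synthetic epoch the energy concentrates in bin 5. With real EEG epochs of thousands of samples, the same procedure yields thousands of features per epoch, which is where a feature selection step becomes useful.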

Contributions to Management Strategies in the NE Atlantic Regarding the Life History and Population Structure of a Key Deep-Sea Fish (Mora moro)

Biology ◽

10.3390/biology10060522 ◽

2021 ◽

Vol 10 (6) ◽

pp. 522

Author(s):

Régis Santos ◽

Wendell Medeiros-Leal ◽

Osman Crespo ◽

Ana Novoa-Pabon ◽

Mário Pinho

Keyword(s):

Population Structure ◽

Life History ◽

Deep Sea ◽

Management Strategies ◽

Size Composition ◽

Large Size ◽

The Common ◽

Fishing Impacts ◽

Sea Fish

With the commercial fishery expansion to deeper waters, some vulnerable deep-sea species have been increasingly captured. To reduce the fishing impacts on these species, exploitation and management must be based on detailed and precise information about their biology. The common mora Mora moro has become the main deep-sea species caught by longliners in the Northeast Atlantic at depths between 600 and 1200 m. In the Azores, landings have more than doubled from the early 2000s to recent years. Despite its growing importance, its life history and population structure are poorly understood, and the current stock status has not been assessed. To better determine its distribution, biology, and long-term changes in abundance and size composition, this study analyzed a fishery-dependent and survey time series from the Azores. M. moro was found on mud and rock bottoms at depths below 300 m. A larger–deeper trend was observed, and females were larger and more abundant than males. The reproductive season took place from August to February. Abundance indices and mean sizes in the catch were marked by changes in fishing fleet operational behavior. M. moro is considered vulnerable to overfishing because it exhibits a long life span, a large size, slow growth, and a low natural mortality.

A Nonlinear Maximum Correntropy Information Filter for High-Dimensional Neural Decoding

Entropy ◽

10.3390/e23060743 ◽

2021 ◽

Vol 23 (6) ◽

pp. 743

Author(s):

Xi Liu ◽

Shuhang Chen ◽

Xiang Shen ◽

Xiang Zhang ◽

Yiwen Wang

Keyword(s):

State Estimation ◽

Measurement Model ◽

High Dimensional ◽

Neural Firing ◽

The Neural Network ◽

Information Filter ◽

Critical Technology ◽

Dimensional Measurements ◽

Non Gaussian ◽

Low Dimensional

Neural signal decoding is a critical technology in brain-machine interfaces (BMIs) for interpreting movement intention from multi-neural activity collected from paralyzed patients. As a commonly used decoding algorithm, the Kalman filter is often applied to derive the movement states from high-dimensional neural firing observations. However, it is less effective for noisy nonlinear neural systems with high-dimensional measurements. In this paper, we propose a nonlinear maximum correntropy information filter, aiming at better state estimation in the filtering process for a noisy high-dimensional measurement system. We reconstruct the measurement model between the high-dimensional measurements and low-dimensional states using a neural network, and derive the state estimate using the correntropy criterion to cope with non-Gaussian noise and eliminate large initial uncertainty. Moreover, analyses of convergence and robustness are given. The effectiveness of the proposed algorithm is evaluated by applying it to multiple segments of neural spiking data from two rats to interpret the movement states when the subjects perform a two-lever discrimination task. Our results demonstrate better and more robust state estimation performance compared with other filters.
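To give a feel for why a correntropy criterion copes with non-Gaussian noise, the sketch below applies it to the simplest possible problem: estimating a single location parameter from samples containing one gross outlier. This is not the paper's filter. The Gaussian kernel width `sigma`, the fixed-point iteration, the median starting point, and the toy data are all assumptions chosen for illustration.

```python
import math


def correntropy_location(samples, sigma=1.0, iterations=50):
    """Estimate a location by maximizing Gaussian-kernel correntropy.

    Each fixed-point step is a weighted mean in which a sample's weight
    decays with its squared distance from the current estimate.
    """
    estimate = sorted(samples)[len(samples) // 2]  # start near the median
    for _ in range(iterations):
        weights = [math.exp(-(x - estimate) ** 2 / (2.0 * sigma ** 2))
                   for x in samples]
        estimate = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
    return estimate


# Toy measurements (assumed): five values near 1.0 and one gross outlier.
measurements = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
plain_mean = sum(measurements) / len(measurements)
robust_estimate = correntropy_location(measurements)
```

The plain mean is dragged far from 1.0 by the outlier, while the correntropy estimate stays close to it because the outlier receives a negligible kernel weight. A least-squares criterion, which underlies the standard Kalman update, has no such down-weighting.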

Perspectives of the high-dimensional dynamics of neural microcircuits from the point of view of low-dimensional readouts

Complexity ◽

10.1002/cplx.10089 ◽

2003 ◽

Vol 8 (4) ◽

pp. 39-50 ◽

Cited By ~ 11

Author(s):

Stefan Häusler ◽

Henry Markram ◽

Wolfgang Maass

Keyword(s):

Point Of View ◽

High Dimensional ◽

Low Dimensional

Improved interactive color visualization approach for hyperspectral images

Information Visualization ◽

10.1177/14738716211048142 ◽

2021 ◽

pp. 147387162110481

Author(s):

Haijun Yu ◽

Shengyang Li

Keyword(s):

Real Time ◽

Hyperspectral Images ◽

High Dimensional ◽

Interactive Control ◽

Output Image ◽

Dr Method ◽

The Rich ◽

Low Dimensional ◽

Color Visualization ◽

Fusion Coefficient

Hyperspectral images (HSIs) have become increasingly prominent as they can maintain the subtle spectral differences of the imaged objects. Designing approaches and tools for analyzing HSIs presents a unique set of challenges due to their high-dimensional characteristics. An improved color visualization approach is proposed in this article to achieve communication between users and HSIs in the field of remote sensing. Under the real-time interactive control and color visualization, this approach can help users intuitively obtain the rich information hidden in original HSIs. Using the dimensionality reduction (DR) method based on band selection, high-dimensional HSIs are reduced to low-dimensional images. Through drop-down boxes, users can freely specify images that participate in the combination of RGB channels of the output image. Users can then interactively and independently set the fusion coefficient of each image within an interface based on concentric circles. At the same time, the output image will be calculated and visualized in real time, and the information it reflects will also be different. In this approach, channel combination and fusion coefficient setting are two independent processes, which allows users to interact more flexibly according to their needs. Furthermore, this approach is also applicable for interactive visualization of other types of multi-layer data.

LSTM-Guided Coaching Assistant for Table Tennis Practice

Sensors ◽

10.3390/s18124112 ◽

2018 ◽

Vol 18 (12) ◽

pp. 4112 ◽

Cited By ~ 6

Author(s):

Se-Min Lim ◽

Hyeong-Cheol Oh ◽

Jaein Kim ◽

Juwon Lee ◽

Jooyoung Park

Keyword(s):

Time Series ◽

State Space ◽

Time Series Data ◽

State Space Model ◽

Skill Assessment ◽

Series Data ◽

High Dimensional ◽

Table Tennis ◽

Space Model ◽

Low Dimensional

Recently, wearable devices have become a prominent health care application domain by incorporating a growing number of sensors and adopting smart machine learning technologies. One closely related topic is the strategy of combining the wearable device technology with skill assessment, which can be used in wearable device apps for coaching and/or personal training. Particularly pertinent to skill assessment based on high-dimensional time series data from wearable sensors is classifying whether a player is an expert or a beginner, which skills the player is exercising, and extracting some low-dimensional representations useful for coaching. In this paper, we present a deep learning-based coaching assistant method, which can provide useful information in supporting table tennis practice. Our method uses a combination of LSTM (Long short-term memory) with a deep state space model and probabilistic inference. More precisely, we use the expressive power of LSTM when handling high-dimensional time series data, and state space model and probabilistic inference to extract low-dimensional latent representations useful for coaching. Experimental results show that our method can yield promising results for characterizing high-dimensional time series patterns and for providing useful information when working with wearable IMU (Inertial measurement unit) sensors for table tennis coaching.

Genetic Diversity and Population Structure in South African, French and Argentinian Angora Goats from Genome-Wide SNP Data

PLoS ONE ◽

10.1371/journal.pone.0154353 ◽

2016 ◽

Vol 11 (5) ◽

pp. e0154353 ◽

Cited By ~ 34

Author(s):

Carina Visser ◽

Simon F. Lashmar ◽

Este Van Marle-Köster ◽

Mario A. Poli ◽

Daniel Allain

Keyword(s):

Genetic Diversity ◽

Population Structure ◽

South African ◽

Snp Data ◽

Genome Wide ◽

Angora Goats

Low-Dimensional Partial Integro-differential Equations for High-Dimensional Asian Options

Inspired by Finance ◽

10.1007/978-3-319-02069-3_15 ◽

2014 ◽

pp. 331-348

Author(s):

Peter Hepperger

Keyword(s):

Differential Equations ◽

High Dimensional ◽

Asian Options ◽

Low Dimensional
