Consequences of PCA graphs, SNP codings, and PCA variants for elucidating population structure

2018 ◽  
Author(s):  
Hugh G. Gauch ◽  
Sheng Qian ◽  
Hans-Peter Piepho ◽  
Linda Zhou ◽  
Rui Chen

SNP datasets are high-dimensional, often with thousands to millions of SNPs and hundreds to thousands of samples or individuals. Accordingly, PCA graphs are frequently used to provide a low-dimensional visualization in order to display and discover patterns in SNP data from humans, animals, plants, and microbes, especially to elucidate population structure. Given the popularity of PCA, one might expect that PCA is understood well and applied effectively. However, our literature survey of 125 representative articles that apply PCA to SNP data shows that three choices have usually been made poorly: PCA graph, SNP coding, and PCA variant. Our three main recommendations are simple and easily implemented: use PCA biplots, SNP coding 1 for the rare allele and 0 for the common allele, and double-centered PCA (or AMMI1 if main effects are of interest). The ultimate benefit from informed and optimal choices of PCA graph, SNP coding, and PCA variant is expected to be discovery of more biology, and thereby acceleration of medical, agricultural, and other vital applications.
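
The three recommendations can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the toy 0/1 matrix, and the symmetric biplot scaling (square roots of the singular values on both sides) are assumptions made here for illustration, and other biplot scalings are equally legitimate.

```python
import numpy as np

def recode_rare_as_one(g):
    """Flip each SNP column so that 1 marks the rarer allele and 0 the common one."""
    g = np.asarray(g, dtype=float)
    flip = g.mean(axis=0) > 0.5          # columns where 1 currently marks the common allele
    return np.where(flip, 1.0 - g, g)

def double_center(x):
    """Subtract row means and column means, then add back the grand mean."""
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()

def dc_pca_biplot(g, n_components=2):
    """Double-centered PCA via SVD; returns biplot scores for samples and for SNPs."""
    x = double_center(recode_rare_as_one(g))
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    k = n_components
    scale = np.sqrt(s[:k])               # assumed symmetric scaling, one convention of several
    sample_scores = u[:, :k] * scale     # one row per sample
    snp_scores = vt[:k].T * scale        # one row per SNP
    return sample_scores, snp_scores, s

# Toy matrix (hypothetical data): rows are samples, columns are SNPs.
g = np.array([[1, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 0, 1]])
samples, snps, singular_values = dc_pca_biplot(g)
```

Plotting `samples` and `snps` on the same axes gives a biplot, which shows samples and SNPs jointly rather than samples alone.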

Author(s):  
Fumiya Akasaka ◽  
Kazuki Fujita ◽  
Yoshiki Shimomura

This paper proposes the PSS Business Case Map as a tool to support designers’ idea generation in PSS design. The map visualizes the similarities among PSS business cases in a two-dimensional diagram. To make the map, PSS business cases are first collected by conducting, for example, a literature survey. The collected business cases are then classified from multiple aspects that characterize each case, such as its product type, service type, and target customer. Based on the results of this classification, the similarities among the cases are calculated and visualized by using the Self-Organizing Map (SOM) technique. A SOM is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional) view from high-dimensional data. The visualization result is offered to designers in the form of a two-dimensional map, which is called the PSS Business Case Map. By using the map, designers can figure out the position of their current business and can acquire ideas for the servitization of their business.
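
As a rough illustration of the SOM step, the sketch below trains a small map on business cases encoded as 0/1 attribute vectors. The encoding, grid size, learning-rate and neighborhood schedules, and all names are assumptions made here; the paper's actual settings are not given in the abstract.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) of the map unit whose weight vector is closest to x."""
    dist = ((weights - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM with a Gaussian neighborhood that shrinks over time."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # linearly shrinking neighborhood
            bmu = best_matching_unit(weights, x)
            d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))     # neighborhood strength per unit
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

# Hypothetical cases: columns are one-hot flags for product type, service type, customer.
cases = np.array([[1, 0, 1, 0, 1, 0],
                  [1, 0, 1, 0, 0, 1],
                  [0, 1, 0, 1, 1, 0],
                  [0, 1, 0, 1, 0, 1]], dtype=float)
som = train_som(cases, rows=3, cols=3, epochs=20, seed=1)
positions = [best_matching_unit(som, c) for c in cases]   # map position of each case
```

Each entry of `positions` is the grid cell where a case would be drawn on the two-dimensional map.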


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 709-709
Author(s):  
Yan W. Asmann ◽  
Mariza de Andrade ◽  
Sumit Middha ◽  
Martha E. Matsumoto ◽  
Sebastian M. Armasu ◽  
...  

Abstract 709
Background: A recent analysis of merged genome-wide and candidate gene genotypes in VTE cases and controls identified multiple tag SNPs that were strongly associated with VTE.
Objective: To identify rare and/or novel functional variants by sequencing the implicated genes.
Methods: Cases (n=1488) were Mayo Clinic European-American patients of non-Hispanic ancestry with objectively-diagnosed VTE in the absence of active cancer, venous catheter or antiphospholipid antibodies. Controls (n=1439) were Mayo Clinic outpatients without VTE who were frequency-matched on case age, gender, race, MI/stroke status and state of residence. For this analysis, we selected a subset of these cases and controls for sequencing to take advantage of the joint configuration of two ABO SNPs of primary interest, rs8176719 (ABO exon 6 deletion determining type O blood group) and rs2519093 (ABO intron 1 tag SNP), which were previously shown to be strongly associated with VTE (p=5.7E-12 and p=3.0E-16, respectively). We randomly sampled 82 cases and 14 controls within 3 of the 9 potential allele frequency cells (Figure). The rs8176719 genotypes are −/− (the deletion is the common allele), −/G, and G/G (G is the rare allele). The rs2519093 genotypes are GG (G is the common allele), AG, and AA (A is the rare allele). For each SNP, the genotypes are represented as 0, 1, or 2 copies of the minor allele. We represented the joint allelic configuration of the two SNPs with the number of copies for rs8176719 given first: 0/0 (0 copies of the minor allele at both SNPs), 0/1, and 0/2 (0 copies of the rare rs8176719 allele); 1/0, 1/1, and 1/2 (1 copy of the rare rs8176719 allele); and 2/0 (2 copies of the rare rs8176719 allele and 0 copies of the rare rs2519093 allele), 2/1, and 2/2. From the Figure one observes discrepancies between cases and controls at the 0/0, 1/1 and 2/2 combinations. We randomly sampled from these three combinations, taking one third of the case series. We had 28 cases with the 0/0 combination, 27 cases with the 1/1 combination, and 27 cases with the 2/2 combination. We compared these 82 cases with 14 controls that do not have any of these combinations. Sixteen genes were selected for deep sequencing, including 5 genes harboring SNPs significantly associated with VTE (F5, SLC19A2, ABO, NME7, ATP1B1), 10 genes with SNPs marginally associated with VTE (C1orf114, KLKB1, SELP, F11, SCUBE1, PRKCB1, CD44, ITPR1, GFRA1, BLZF1), and CYP4V2, which reportedly confounds F11 and KLKB1. Agilent SureSelect probes were designed to capture and enrich the ∼2 Mb genomic regions of these 16 genes. Samples were multiplexed (12-plex) and sequenced using Illumina HiSeq 2000. The sequence reads were aligned to the human genome build 36 using Burrows-Wheeler Aligner, and the single nucleotide variants (SNVs) and small INDELs were called using SNVMix and GATK, respectively. For this analysis, novel ABO SNVs were tested for an association with VTE using age- and sex-adjusted logistic regression and Fisher's Exact Test.
Results: 98% of the targeted regions were sequenced with > 20X coverage. On average, ∼2500 SNVs and ∼200 INDELs were detected in each sample. Fifteen novel SNVs in intron 6 and 3' of the ABO gene were associated with VTE (p<E-06) and belonged to 3 distinctive LD blocks; none were in LD with the coding or tag ABO SNPs (rs8176719; rs2519093). SNVs inside the middle LD block at the 3' of ABO are located within a region carrying enhancer and promoter histone marks and putative transcription factor binding sites. In addition, strong evidence from both ENCODE and dbEST supports the middle LD block as lying within a novel transcript, probably an extension of the 3' of ABO. We also discovered a novel, significant, protective, frame-shifting single base (G) deletion at ABO chr9:135120877.
Conclusion: Novel ABO functional variants that are associated with VTE were identified by deep sequencing.
Disclosures: Heit: Daiichi Sankyo: Consultancy, Honoraria.
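
The genotype coding described in the Methods (0, 1, or 2 copies of the minor allele, with rs8176719 given first) can be sketched as follows. The tuple representation of a genotype, the `"-"` symbol for the deletion allele, and the function names are assumptions made for this illustration only.

```python
def minor_allele_copies(genotype, minor_allele):
    """Count how many of the two alleles in a genotype are the minor allele (0, 1, or 2)."""
    return sum(1 for allele in genotype if allele == minor_allele)

def joint_configuration(g_rs8176719, g_rs2519093):
    """Encode a pair of genotypes as 'a/b', with the rs8176719 copy number first."""
    a = minor_allele_copies(g_rs8176719, "G")   # G is the rare rs8176719 allele
    b = minor_allele_copies(g_rs2519093, "A")   # A is the rare rs2519093 allele
    return f"{a}/{b}"

# The three sampled cells: 0/0, 1/1, and 2/2.
sampled = [
    joint_configuration(("-", "-"), ("G", "G")),
    joint_configuration(("-", "G"), ("A", "G")),
    joint_configuration(("G", "G"), ("A", "A")),
]
```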


2020 ◽  
Vol 10 (5) ◽  
pp. 1797 ◽  
Author(s):  
Mera Kartika Delimayanti ◽  
Bedy Purnama ◽  
Ngoc Giang Nguyen ◽  
Mohammad Reza Faisal ◽  
Kunti Robiatul Mahmudah ◽  
...  

Manual classification of sleep stages is a time-consuming but necessary step in the diagnosis and treatment of sleep disorders, and its automation has been an area of active study. Previous works have applied low-dimensional fast Fourier transform (FFT) features with many machine learning algorithms. In this paper, we demonstrate the use of features extracted from EEG signals via FFT to improve the performance of automated sleep stage classification through machine learning methods. Unlike previous works using FFT, we incorporated thousands of FFT features in order to classify the sleep stages into 2–6 classes. Using the expanded version of the Sleep-EDF dataset with 61 recordings, our method outperformed other state-of-the-art methods. This result indicates that high-dimensional FFT features in combination with simple feature selection are effective for the improvement of automated sleep stage classification.
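
A minimal sketch of how one EEG epoch could be turned into a high-dimensional FFT feature vector is shown below. The 30-second epoch length, the 100 Hz sampling rate, the Hann window, and the function name are assumptions chosen for illustration; the abstract does not state the authors' exact preprocessing.

```python
import numpy as np

def fft_features(epoch, fs):
    """Return (frequencies in Hz, magnitude spectrum) for a single EEG epoch."""
    epoch = np.asarray(epoch, dtype=float)
    window = np.hanning(len(epoch))                  # assumed taper to reduce spectral leakage
    spectrum = np.abs(np.fft.rfft((epoch - epoch.mean()) * window))
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    return freqs, spectrum

# Synthetic stand-in for an EEG epoch: a 10 Hz sine sampled at 100 Hz for 30 s.
fs = 100
t = np.arange(30 * fs) / fs
freqs, features = fft_features(np.sin(2 * np.pi * 10 * t), fs)
```

With these assumed settings each epoch yields 1501 magnitude values, which is the scale of "thousands of FFT features" that a subsequent feature selection step would then reduce.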


Biology ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 522
Author(s):  
Régis Santos ◽  
Wendell Medeiros-Leal ◽  
Osman Crespo ◽  
Ana Novoa-Pabon ◽  
Mário Pinho

With the commercial fishery expansion to deeper waters, some vulnerable deep-sea species have been increasingly captured. To reduce the fishing impacts on these species, exploitation and management must be based on detailed and precise information about their biology. The common mora Mora moro has become the main deep-sea species caught by longliners in the Northeast Atlantic at depths between 600 and 1200 m. In the Azores, landings have more than doubled from the early 2000s to recent years. Despite its growing importance, its life history and population structure are poorly understood, and the current stock status has not been assessed. To better determine its distribution, biology, and long-term changes in abundance and size composition, this study analyzed a fishery-dependent and survey time series from the Azores. M. moro was found on mud and rock bottoms at depths below 300 m. A larger–deeper trend was observed, and females were larger and more abundant than males. The reproductive season took place from August to February. Abundance indices and mean sizes in the catch were marked by changes in fishing fleet operational behavior. M. moro is considered vulnerable to overfishing because it exhibits a long life span, a large size, slow growth, and a low natural mortality.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 743
Author(s):  
Xi Liu ◽  
Shuhang Chen ◽  
Xiang Shen ◽  
Xiang Zhang ◽  
Yiwen Wang

Neural signal decoding is a critical technology in brain-machine interfaces (BMIs) to interpret movement intention from multi-neural activity collected from paralyzed patients. As a commonly-used decoding algorithm, the Kalman filter is often applied to derive the movement states from high-dimensional neural firing observations. However, it is less effective for noisy nonlinear neural systems with high-dimensional measurements. In this paper, we propose a nonlinear maximum correntropy information filter, aiming at better state estimation in the filtering process for a noisy high-dimensional measurement system. We reconstruct the measurement model between the high-dimensional measurements and low-dimensional states using a neural network, and derive the state estimation using the correntropy criterion to cope with the non-Gaussian noise and eliminate large initial uncertainty. Moreover, analyses of convergence and robustness are given. The effectiveness of the proposed algorithm is evaluated by applying it to multiple segments of neural spiking data from two rats to interpret the movement states when the subjects perform a two-lever discrimination task. Our results demonstrate better and more robust state estimation performance when compared with other filters.
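
For orientation, the baseline that the abstract contrasts against, the standard linear Kalman filter, can be written as one predict-and-update step. This is the textbook filter, not the proposed nonlinear maximum correntropy information filter; the variable names and the toy one-state, three-measurement setup are assumptions for illustration.

```python
import numpy as np

def kalman_step(x, P, z, A, H, Q, R):
    """One predict/update cycle of the standard linear Kalman filter."""
    # Predict the low-dimensional state and its covariance.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update with the high-dimensional measurement z.
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy setup: one hidden state observed through three noisy channels.
x0 = np.array([0.0])
P0 = np.array([[1.0]])
A = np.array([[1.0]])
H = np.ones((3, 1))
Q = np.array([[0.0]])
R = np.eye(3)
z = np.array([1.0, 1.0, 1.0])
x1, P1 = kalman_step(x0, P0, z, A, H, Q, R)   # x1 -> [0.75], P1 -> [[0.25]]
```

The Gaussian, linear assumptions built into this update are what make it sensitive to the non-Gaussian noise that the correntropy criterion is intended to handle.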


Complexity ◽  
2003 ◽  
Vol 8 (4) ◽  
pp. 39-50 ◽  
Author(s):  
Stefan Häusler ◽  
Henry Markram ◽  
Wolfgang Maass

2021 ◽  
pp. 147387162110481
Author(s):  
Haijun Yu ◽  
Shengyang Li

Hyperspectral images (HSIs) have become increasingly prominent as they can maintain the subtle spectral differences of the imaged objects. Designing approaches and tools for analyzing HSIs presents a unique set of challenges due to their high-dimensional characteristics. An improved color visualization approach is proposed in this article to achieve communication between users and HSIs in the field of remote sensing. Under the real-time interactive control and color visualization, this approach can help users intuitively obtain the rich information hidden in original HSIs. Using the dimensionality reduction (DR) method based on band selection, high-dimensional HSIs are reduced to low-dimensional images. Through drop-down boxes, users can freely specify images that participate in the combination of RGB channels of the output image. Users can then interactively and independently set the fusion coefficient of each image within an interface based on concentric circles. At the same time, the output image will be calculated and visualized in real time, and the information it reflects will also be different. In this approach, channel combination and fusion coefficient setting are two independent processes, which allows users to interact more flexibly according to their needs. Furthermore, this approach is also applicable for interactive visualization of other types of multi-layer data.


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4112 ◽  
Author(s):  
Se-Min Lim ◽  
Hyeong-Cheol Oh ◽  
Jaein Kim ◽  
Juwon Lee ◽  
Jooyoung Park

Recently, wearable devices have become a prominent health care application domain by incorporating a growing number of sensors and adopting smart machine learning technologies. One closely related topic is the strategy of combining the wearable device technology with skill assessment, which can be used in wearable device apps for coaching and/or personal training. Particularly pertinent to skill assessment based on high-dimensional time series data from wearable sensors is classifying whether a player is an expert or a beginner, which skills the player is exercising, and extracting some low-dimensional representations useful for coaching. In this paper, we present a deep learning-based coaching assistant method, which can provide useful information in supporting table tennis practice. Our method uses a combination of LSTM (Long short-term memory) with a deep state space model and probabilistic inference. More precisely, we use the expressive power of LSTM when handling high-dimensional time series data, and state space model and probabilistic inference to extract low-dimensional latent representations useful for coaching. Experimental results show that our method can yield promising results for characterizing high-dimensional time series patterns and for providing useful information when working with wearable IMU (Inertial measurement unit) sensors for table tennis coaching.


PLoS ONE ◽  
2016 ◽  
Vol 11 (5) ◽  
pp. e0154353 ◽  
Author(s):  
Carina Visser ◽  
Simon F. Lashmar ◽  
Este Van Marle-Köster ◽  
Mario A. Poli ◽  
Daniel Allain
