t-SNE visualization of large-scale neural recordings

2016 ◽  
Author(s):  
George Dimitriadis ◽  
Joana Neto ◽  
Adam R. Kampff

Abstract Electrophysiology is entering the era of ‘Big Data’. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, i.e. single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention [22, 23], but is still far from solved. Current methods still rely on human input and thus become unfeasible as the size of the data sets grows exponentially. Here we introduce the t-student stochastic neighbor embedding (t-SNE) dimensionality reduction method [27] as a visualization tool in the spike sorting process. t-SNE embeds the n-dimensional extracellular spikes (n = number of features by which each spike is decomposed) into a low (usually two) dimensional space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm on labeled data sets both from hybrid [23] and paired juxtacellular/extracellular recordings [15]. We have released a graphical user interface (GUI) written in Python as a tool for the manual clustering of the t-SNE embedded spikes and as a tool for an informed overview and fast manual curation of results from other clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes. They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.

2018 ◽  
Vol 30 (7) ◽  
pp. 1750-1774 ◽  
Author(s):  
George Dimitriadis ◽  
Joana P. Neto ◽  
Adam R. Kampff

Electrophysiology is entering the era of big data. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, that is, single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention (Rey, Pedreira, & Quian Quiroga, 2015; Rossant et al., 2016) but is still far from solved. Current methods still rely on human input and thus become unfeasible as the size of the data sets grows exponentially. Here we introduce the t-student stochastic neighbor embedding (t-SNE) dimensionality reduction method (Van der Maaten & Hinton, 2008) as a visualization tool in the spike sorting process. t-SNE embeds the n-dimensional extracellular spikes (n = number of features by which each spike is decomposed) into a low- (usually two-) dimensional space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm on labeled data sets from both hybrid (Rossant et al., 2016) and paired juxtacellular/extracellular recordings (Neto et al., 2016). We have released a graphical user interface (GUI) written in Python as a tool for the manual clustering of the t-SNE embedded spikes and as a tool for an informed overview and fast manual curation of results from different clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes.
They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.
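The core mechanics described above, Gaussian affinities in the high-dimensional feature space, a Student-t kernel in the embedding, and gradient descent on the KL divergence, can be sketched in a few lines of NumPy. This is an illustration only, not the authors' released GUI; the single fixed bandwidth `sigma` (in place of the usual per-point perplexity calibration) and the plain gradient descent without momentum or early exaggeration are our simplifications.

```python
import numpy as np

def tsne_sketch(X, n_iter=300, lr=100.0, sigma=1.0, seed=0):
    n = X.shape[0]
    # High-dimensional affinities P (symmetric Gaussian kernel, normalized).
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    P = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P = np.maximum(P / P.sum(), 1e-12)

    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, 2))  # 2-D embedding, small random init
    for _ in range(n_iter):
        # Low-dimensional affinities Q (Student-t kernel, one degree of freedom).
        dy2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        num = 1.0 / (1.0 + dy2)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # Gradient of KL(P || Q) with respect to the embedding coordinates.
        PQ = (P - Q) * num
        Y -= lr * 4 * (PQ.sum(1)[:, None] * Y - PQ @ Y)
    return Y

# Two well-separated "feature clusters" of spikes in a 10-D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (15, 10)), rng.normal(5, 0.5, (15, 10))])
Y = tsne_sketch(X)
```

On data like this, the two feature-space clusters land as two separated groups in the 2-D embedding, which is exactly the property the paper exploits for manual delineation of putative single units.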


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: volume, velocity and variety of data. Big data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Since Big Data is a newly emerging field, there is a need for new technologies and algorithms to handle it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is discussed in this paper. For each type of analytics, the paper describes the process steps and tools, and a banking application is given. Some of the research challenges of big data analytics, along with possible solutions, are also discussed.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Stinus Lindgreen ◽  
Karen L. Adair ◽  
Paul P. Gardner

Abstract Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html


2019 ◽  
Author(s):  
Hao Chen ◽  
Shizhe Chen ◽  
Xinyi Deng

Summary Neuropixels probes present exciting new opportunities for neuroscience, but such large-scale high-density recordings also introduce unprecedented challenges in data analysis. Neuropixels data usually consist of hundreds or thousands of long stretches of sequential spiking activities that evolve non-stationarily over time and are often governed by complex, unknown dynamics. Extracting meaningful information from the Neuropixels recordings is a non-trivial task. Here we introduce a general-purpose, graph-based statistical framework that, without imposing any parametric assumptions, detects points in time at which population spiking activity exhibits simultaneous changes as well as changes that only occur in a subset of the neural population, referred to as “change-points”. The sequence of change-point events can be interpreted as a footprint of neural population activities, which allows us to relate behavior to simultaneously recorded high-dimensional neural activities across multiple brain regions. We demonstrate the effectiveness of our method with an analysis of Neuropixels recordings during spontaneous behavior of an awake mouse in darkness. We observe that change-point dynamics in some brain regions display biologically interesting patterns that hint at functional pathways, as well as temporally precise coordination with behavioral dynamics. We hypothesize that neural activities underlying spontaneous behavior, though distributed brainwide, show evidence of network modularity. Moreover, we envision the proposed framework to be a useful off-the-shelf analysis tool for the neuroscience community as new electrophysiological recording techniques continue to drive an explosive proliferation in the number and size of data sets.
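The paper's framework is graph-based and nonparametric; as a much simpler illustration of the change-point idea it addresses, here is a classic CUSUM-style scan over a binned spike-count series. All names, the Poisson simulation, and the single-change assumption are ours, not the authors'.

```python
import numpy as np

def detect_change_point(counts):
    """Locate the single most likely mean shift in a 1-D count series."""
    x = np.asarray(counts, dtype=float)
    # Cumulative deviation from the global mean; under a single mean shift,
    # the most extreme deviation marks the change-point.
    s = np.cumsum(x - x.mean())
    return int(np.argmax(np.abs(s[:-1]))) + 1  # series splits after index k-1

rng = np.random.default_rng(1)
# Simulated spike counts: the firing rate jumps from 5 to 15 spikes/bin at bin 200.
counts = np.concatenate([rng.poisson(5, 200), rng.poisson(15, 100)])
cp = detect_change_point(counts)
```

Unlike this toy, the paper's graph-based statistic also handles changes confined to a subset of a high-dimensional neural population, which a univariate CUSUM cannot.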


Author(s):  
Xiaojun Chen ◽  
Feiping Nie ◽  
Joshua Zhexue Huang ◽  
Min Yang

Many spectral clustering algorithms have been proposed and successfully applied to high-dimensional applications. However, two problems remain to be solved: 1) existing methods for obtaining the final clustering assignments may deviate from the true discrete solution, and 2) most of these methods have very high computational complexity. In this paper, we propose a Scalable Normalized Cut method for clustering large-scale data. The new method efficiently constructs a small representation matrix and then performs clustering on that matrix. In the clustering process, an improved spectral rotation method is proposed to obtain the final clustering assignments. A series of experiments was conducted on 14 benchmark data sets, and the results show the superior performance of the new method.
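For intuition, here is a plain two-way normalized cut in NumPy. Caveat: the paper's contribution is scalability (a small representation matrix plus an improved spectral rotation); this sketch instead builds the full affinity matrix, which is exactly the O(n²) cost their method avoids, and uses simple zero-thresholding rather than spectral rotation for discretization.

```python
import numpy as np

def normalized_cut_2way(X, sigma=1.0):
    # RBF (Gaussian) affinity matrix over all pairs of points.
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # The second-smallest eigenvector is the relaxed 2-way partition;
    # thresholding it at zero is the simplest discretization step.
    return (vecs[:, 1] > 0).astype(int)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(4, 0.3, (10, 2))])
labels = normalized_cut_2way(X)
```

The relaxed-versus-discrete gap mentioned in the abstract lives precisely in that last thresholding step, which is what the improved spectral rotation replaces.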


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Liang Chen ◽  
Weinan Wang ◽  
Yuyao Zhai ◽  
Minghua Deng

Abstract Single-cell RNA sequencing (scRNA-seq) allows researchers to study cell heterogeneity at the cellular level. A crucial step in analyzing scRNA-seq data is to cluster cells into subpopulations to facilitate subsequent downstream analysis. However, frequent dropout events and the increasing size of scRNA-seq data make clustering such high-dimensional, sparse and massive transcriptional expression profiles challenging. Although some existing deep learning-based clustering algorithms for single cells combine dimensionality reduction with clustering, they either ignore the distance and affinity constraints between similar cells or make additional latent space assumptions, such as a mixture Gaussian distribution, failing to learn a cluster-friendly low-dimensional space. Therefore, in this paper, we use a denoising autoencoder to characterize scRNA-seq data and propose a soft self-training K-means algorithm to cluster the cell population in the learned latent space. The self-training procedure effectively aggregates similar cells and pursues a more cluster-friendly latent space. Our method, called ‘scziDesk’, alternately performs data compression, data reconstruction and soft clustering iteratively, and the results exhibit excellent compatibility and robustness in both simulated and real data. Moreover, our proposed method scales well with the number of cells in large-scale datasets.
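The soft K-means step can be sketched as follows. This is our simplified, autoencoder-free illustration run directly on input features (scziDesk applies the clustering in the autoencoder's learned latent space); the Student-t soft assignment in the style of deep-embedded-clustering layers and the deterministic initialization are our choices.

```python
import numpy as np

def soft_kmeans(Z, k, n_iter=50, alpha=1.0):
    # Deterministic init: spread the initial centers across the sample order.
    idx = np.linspace(0, len(Z) - 1, k).astype(int)
    centers = Z[idx].copy()
    for _ in range(n_iter):
        d2 = np.square(Z[:, None, :] - centers[None, :, :]).sum(-1)
        # Student-t soft assignment: nearby centers get most of the weight,
        # but every cell retains a nonzero responsibility for every cluster.
        q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
        q = q / q.sum(1, keepdims=True)
        # Responsibility-weighted center update.
        centers = (q.T @ Z) / q.sum(0)[:, None]
    return q

rng = np.random.default_rng(0)
# Toy "latent space": two well-separated cell subpopulations.
Z = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(5, 0.3, (10, 2))])
q = soft_kmeans(Z, 2)
labels = q.argmax(1)
```

The soft responsibilities `q` are what a self-training scheme can sharpen iteratively, pulling similar cells together rather than committing to hard assignments early.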


2008 ◽  
Vol 7 (1) ◽  
pp. 18-33 ◽  
Author(s):  
Niklas Elmqvist ◽  
John Stasko ◽  
Philippas Tsigas

Supporting visual analytics of multiple large-scale multidimensional data sets requires a high degree of interactivity and user control beyond the conventional challenges of visualizing such data sets. We present the DataMeadow, a visual canvas providing rich interaction for constructing visual queries using graphical set representations called DataRoses. A DataRose is essentially a starplot of selected columns in a data set displayed as multivariate visualizations with dynamic query sliders integrated into each axis. The purpose of the DataMeadow is to allow users to create advanced visual queries by iteratively selecting and filtering into the multidimensional data. Furthermore, the canvas provides a clear history of the analysis that can be annotated to facilitate dissemination of analytical results to stakeholders. A powerful direct manipulation interface allows for selection, filtering, and creation of sets, subsets, and data dependencies. We have evaluated our system using a qualitative expert review involving two visualization researchers. Results from this review are favorable for the new method.
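The dynamic-query idea behind the DataRose axes, where a row survives only if every selected column falls inside its slider's range, can be sketched in a few lines. The function and data names here are ours; the actual DataMeadow is a full interactive visual canvas, not a batch filter.

```python
def dynamic_query(rows, ranges):
    """rows: list of dicts; ranges: {column: (lo, hi)} from the axis sliders.

    A row is kept only if every constrained column lies within its range,
    mirroring how dragging a slider on one DataRose axis filters the plot.
    """
    return [r for r in rows
            if all(lo <= r[col] <= hi for col, (lo, hi) in ranges.items())]

cars = [
    {"mpg": 30, "hp": 90},
    {"mpg": 15, "hp": 200},
    {"mpg": 22, "hp": 150},
]
selected = dynamic_query(cars, {"mpg": (20, 35), "hp": (80, 160)})
```

Chaining such filters, with each result feeding the next DataRose, is the iterative select-and-filter workflow the canvas supports, while also recording the chain as an analysis history.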


2015 ◽  
Author(s):  
Stinus Lindgreen ◽  
Karen L Adair ◽  
Paul Gardner

Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming, and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html


Author(s):  
Gourav Bathla ◽  
Himanshu Aggarwal ◽  
Rinkle Rani

Clustering is one of the most important applications of data mining. It has attracted the attention of researchers in statistics and machine learning, and it is used in many applications such as information retrieval, image processing and social network analytics. It helps the user understand the similarity and dissimilarity between objects, and cluster analysis makes users understand complex and large data sets more clearly. Different types of clustering algorithms have been analyzed by various researchers. K-means is the most popular partitioning-based algorithm, as it provides good results through accurate calculation on numerical data, but it works well for numerical data only. Big data is a combination of numerical and categorical data, and the K-prototype algorithm is used to deal with both: it combines the distances calculated from numeric and categorical data. With the growth of data due to social networking websites, business transactions, scientific calculation etc., there is a vast collection of structured, semi-structured and unstructured data, so K-prototype needs to be optimized so that these varieties of data can be analyzed efficiently. In this work, the K-prototype algorithm is implemented on MapReduce. Experiments show that K-prototype implemented on MapReduce gives better performance on multiple nodes than on a single node, with CPU execution time and speedup used as the evaluation metrics. An intelligent splitter is also proposed that splits mixed big data into numerical and categorical data. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
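The mixed dissimilarity at the heart of K-prototype combines squared Euclidean distance on the numeric attributes with a weighted count of mismatched categorical attributes. A minimal sketch (the function name, example values, and choice of gamma are ours; the intelligent splitter in the paper is what separates a mixed record into the two attribute streams this function consumes):

```python
import numpy as np

def kprototype_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """K-prototype dissimilarity between a record and a cluster prototype.

    gamma balances the numeric part against the categorical part.
    """
    numeric = float(np.square(np.asarray(x_num) - np.asarray(proto_num)).sum())
    categorical = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric + gamma * categorical

# Numeric part: (1-1)^2 + (2-4)^2 = 4; categorical part: 1 mismatch ("suv" vs "sedan").
d = kprototype_distance([1.0, 2.0], ["red", "suv"],
                        [1.0, 4.0], ["red", "sedan"], gamma=0.5)
```

In a MapReduce setting, each mapper can evaluate this distance for its shard of records against the current prototypes, and reducers aggregate the per-cluster sums and mode counts to update the prototypes.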


2020 ◽  
Author(s):  
Nan Xu ◽  
Peter C. Doerschuk ◽  
Shella D. Keilholz ◽  
R. Nathan Spreng

Abstract The macro-scale intrinsic functional network architecture of the human brain has been well characterized. Early studies revealed robust and enduring patterns of static connectivity, while more recent work has begun to explore the temporal dynamics of these large-scale brain networks. Little work to date has investigated directed connectivity within and between these networks, or the temporal patterns of afferent (input) and efferent (output) connections between network nodes. Leveraging a novel analytic approach, prediction correlation, we investigated the causal interactions within and between large-scale networks of the brain using resting-state fMRI. This technique allows us to characterize information transfer between brain regions on both spatial (direction) and temporal (duration) scales. Using data from the Human Connectome Project (N=200), we applied prediction correlation techniques to four resting state fMRI runs (total TRs = 4800). Three central observations emerged. First, the strongest and longest duration connections were observed within the somatomotor, visual and dorsal attention networks. Second, short duration connections were observed for high-degree nodes in the visual and default networks, as well as in the hippocampus. Specifically, the connectivity profile of the highest-degree nodes was dominated by efferent connections to multiple cortical areas. Moderately high-degree nodes, particularly in hippocampal regions, showed an afferent connectivity profile. Finally, multimodal association nodes in lateral prefrontal brain regions demonstrated a short duration, bidirectional connectivity profile, consistent with this region’s role in integrative and modulatory processing. These results provide novel insights into the spatiotemporal dynamics of human brain function.
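Prediction correlation asks how well one region's signal predicts another's at a later time, yielding both a direction and a duration. As a much simpler stand-in for that idea, the sketch below estimates the direction and lag of influence between two time series from the lag that maximizes their cross-correlation; the function, simulation, and lag convention are ours, not the paper's method.

```python
import numpy as np

def best_lag(x, y, max_lag=10):
    """Return the lag (in samples) maximizing corr(x[t], y[t + lag]).

    A positive result means x leads y; a negative result means y leads x.
    """
    lags = list(range(-max_lag, max_lag + 1))
    corrs = []
    for lag in lags:
        if lag >= 0:
            c = np.corrcoef(x[:len(x) - lag], y[lag:])[0, 1]
        else:
            c = np.corrcoef(x[-lag:], y[:len(y) + lag])[0, 1]
        corrs.append(c)
    return lags[int(np.argmax(corrs))]

rng = np.random.default_rng(0)
x = rng.normal(size=500)                       # "source" region signal
y = np.roll(x, 3) + 0.1 * rng.normal(size=500)  # y follows x by 3 samples, plus noise
```

Here `best_lag(x, y)` recovers a positive lag, i.e. an efferent (output) connection from x to y in the abstract's terminology; the magnitude of the lag is a crude analogue of connection duration.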

