t-SNE visualization of large-scale neural recordings

2016 ◽  
Author(s):  
George Dimitriadis ◽  
Joana Neto ◽  
Adam R. Kampff

Abstract Electrophysiology is entering the era of ‘Big Data’. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, i.e. single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention [22, 23], but is still far from solved. Current methods still rely on human input and thus become unfeasible as the size of the data sets grows exponentially. Here we introduce the t-student stochastic neighbor embedding (t-SNE) dimensionality reduction method [27] as a visualization tool in the spike sorting process. t-SNE embeds the n-dimensional extracellular spikes (n = number of features by which each spike is decomposed) into a low (usually two) dimensional space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm on labeled data sets both from hybrid [23] and paired juxtacellular/extracellular recordings [15]. We have released a graphical user interface (GUI) written in Python as a tool for the manual clustering of the t-SNE embedded spikes and as a tool for an informed overview and fast manual curation of results from other clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes. They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.

2018 ◽  
Vol 30 (7) ◽  
pp. 1750-1774 ◽  
Author(s):  
George Dimitriadis ◽  
Joana P. Neto ◽  
Adam R. Kampff

Electrophysiology is entering the era of big data. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, that is, single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention (Rey, Pedreira, & Quian Quiroga, 2015; Rossant et al., 2016) but is still far from solved. Current methods still rely on human input and thus become unfeasible as the size of the data sets grows exponentially. Here we introduce the t-student stochastic neighbor embedding (t-SNE) dimensionality reduction method (Van der Maaten & Hinton, 2008) as a visualization tool in the spike sorting process. t-SNE embeds the n-dimensional extracellular spikes (n = number of features by which each spike is decomposed) into a low- (usually two-) dimensional space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm on labeled data sets from both hybrid (Rossant et al., 2016) and paired juxtacellular/extracellular recordings (Neto et al., 2016). We have released a graphical user interface (GUI) written in Python as a tool for the manual clustering of the t-SNE embedded spikes and as a tool for an informed overview and fast manual curation of results from different clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes.
They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.
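The core mechanics described above, Gaussian affinities in the high-dimensional feature space, a Student-t kernel in the embedding, and gradient descent on the KL divergence, can be sketched in a few lines of NumPy. This is an illustration only, not the authors' released GUI; the single fixed bandwidth `sigma` (in place of the usual per-point perplexity calibration) and the plain gradient descent without momentum or early exaggeration are our simplifications.

```python
import numpy as np

def tsne_sketch(X, n_iter=300, lr=100.0, sigma=1.0, seed=0):
    n = X.shape[0]
    # High-dimensional affinities P (symmetric Gaussian kernel, normalized).
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    P = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    P = np.maximum(P / P.sum(), 1e-12)

    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-2, size=(n, 2))  # 2-D embedding, small random init
    for _ in range(n_iter):
        # Low-dimensional affinities Q (Student-t kernel, one degree of freedom).
        dy2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
        num = 1.0 / (1.0 + dy2)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # Gradient of KL(P || Q) with respect to the embedding coordinates.
        PQ = (P - Q) * num
        Y -= lr * 4 * (PQ.sum(1)[:, None] * Y - PQ @ Y)
    return Y

# Two well-separated "feature clusters" of spikes in a 10-D feature space.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (15, 10)), rng.normal(5, 0.5, (15, 10))])
Y = tsne_sketch(X)
```

On data like this, the two feature-space clusters land as two separated groups in the 2-D embedding, which is exactly the property the paper exploits for manual delineation of putative single units.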


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: volume, velocity and variety of data. Big data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Since Big Data is a newly emerging field, there is a need for new technologies and algorithms to handle it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is discussed in this paper. For each type of analytics, the paper describes the process steps and tools, and a banking application is given. Some of the research challenges of big data analytics, along with possible solutions, are also discussed.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Stinus Lindgreen ◽  
Karen L. Adair ◽  
Paul P. Gardner

Abstract Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html


2019 ◽  
Author(s):  
Hao Chen ◽  
Shizhe Chen ◽  
Xinyi Deng

Summary Neuropixels probes present exciting new opportunities for neuroscience, but such large-scale high-density recordings also introduce unprecedented challenges in data analysis. Neuropixels data usually consist of hundreds or thousands of long stretches of sequential spiking activities that evolve non-stationarily over time and are often governed by complex, unknown dynamics. Extracting meaningful information from the Neuropixels recordings is a non-trivial task. Here we introduce a general-purpose, graph-based statistical framework that, without imposing any parametric assumptions, detects points in time at which population spiking activity exhibits simultaneous changes as well as changes that only occur in a subset of the neural population, referred to as “change-points”. The sequence of change-point events can be interpreted as a footprint of neural population activities, which allows us to relate behavior to simultaneously recorded high-dimensional neural activities across multiple brain regions. We demonstrate the effectiveness of our method with an analysis of Neuropixels recordings during spontaneous behavior of an awake mouse in darkness. We observe that change-point dynamics in some brain regions display biologically interesting patterns that hint at functional pathways, as well as temporally precise coordination with behavioral dynamics. We hypothesize that neural activities underlying spontaneous behavior, though distributed brainwide, show evidence of network modularity. Moreover, we envision the proposed framework to be a useful off-the-shelf analysis tool for the neuroscience community as new electrophysiological recording techniques continue to drive an explosive proliferation in the number and size of data sets.
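The paper's framework is graph-based and nonparametric; as a much simpler illustration of the change-point idea it addresses, here is a classic CUSUM-style scan over a binned spike-count series. All names, the Poisson simulation, and the single-change assumption are ours, not the authors'.

```python
import numpy as np

def detect_change_point(counts):
    """Locate the single most likely mean shift in a 1-D count series."""
    x = np.asarray(counts, dtype=float)
    # Cumulative deviation from the global mean; under a single mean shift,
    # the most extreme deviation marks the change-point.
    s = np.cumsum(x - x.mean())
    return int(np.argmax(np.abs(s[:-1]))) + 1  # series splits after index k-1

rng = np.random.default_rng(1)
# Simulated spike counts: the firing rate jumps from 5 to 15 spikes/bin at bin 200.
counts = np.concatenate([rng.poisson(5, 200), rng.poisson(15, 100)])
cp = detect_change_point(counts)
```

Unlike this toy, the paper's graph-based statistic also handles changes confined to a subset of a high-dimensional neural population, which a univariate CUSUM cannot.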


Author(s):  
Xiaojun Chen ◽  
Feiping Nie ◽  
Joshua Zhexue Huang ◽  
Min Yang

Many spectral clustering algorithms have been proposed and successfully applied to high-dimensional applications. However, two problems remain to be solved: 1) existing methods for obtaining the final clustering assignments may deviate from the true discrete solution, and 2) most of these methods have very high computational complexity. In this paper, we propose a Scalable Normalized Cut method for clustering large-scale data. The new method efficiently constructs a small representation matrix and then performs clustering on that matrix. In the clustering process, an improved spectral rotation method is proposed to obtain the final clustering assignments. A series of experiments was conducted on 14 benchmark data sets, and the results show the superior performance of the new method.
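For intuition, here is a plain two-way normalized cut in NumPy. Caveat: the paper's contribution is scalability (a small representation matrix plus an improved spectral rotation); this sketch instead builds the full affinity matrix, which is exactly the O(n²) cost their method avoids, and uses simple zero-thresholding rather than spectral rotation for discretization.

```python
import numpy as np

def normalized_cut_2way(X, sigma=1.0):
    # RBF (Gaussian) affinity matrix over all pairs of points.
    d2 = np.square(X[:, None, :] - X[None, :, :]).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # The second-smallest eigenvector is the relaxed 2-way partition;
    # thresholding it at zero is the simplest discretization step.
    return (vecs[:, 1] > 0).astype(int)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(4, 0.3, (10, 2))])
labels = normalized_cut_2way(X)
```

The relaxed-versus-discrete gap mentioned in the abstract lives precisely in that last thresholding step, which is what the improved spectral rotation replaces.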


2020 ◽  
Vol 2 (2) ◽  
Author(s):  
Liang Chen ◽  
Weinan Wang ◽  
Yuyao Zhai ◽  
Minghua Deng

Abstract Single-cell RNA sequencing (scRNA-seq) allows researchers to study cell heterogeneity at the cellular level. A crucial step in analyzing scRNA-seq data is to cluster cells into subpopulations to facilitate subsequent downstream analysis. However, frequent dropout events and the increasing size of scRNA-seq data make clustering such high-dimensional, sparse and massive transcriptional expression profiles challenging. Although some existing deep learning-based clustering algorithms for single cells combine dimensionality reduction with clustering, they either ignore the distance and affinity constraints between similar cells or make additional latent space assumptions, such as a mixture Gaussian distribution, failing to learn a cluster-friendly low-dimensional space. Therefore, in this paper, we use a denoising autoencoder to characterize scRNA-seq data and propose a soft self-training K-means algorithm to cluster the cell population in the learned latent space. The self-training procedure effectively aggregates similar cells and pursues a more cluster-friendly latent space. Our method, called ‘scziDesk’, alternately performs data compression, data reconstruction and soft clustering iteratively, and the results exhibit excellent compatibility and robustness in both simulated and real data. Moreover, our proposed method scales well with the number of cells in large-scale datasets.
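The soft K-means step can be sketched as follows. This is our simplified, autoencoder-free illustration run directly on input features (scziDesk applies the clustering in the autoencoder's learned latent space); the Student-t soft assignment in the style of deep-embedded-clustering layers and the deterministic initialization are our choices.

```python
import numpy as np

def soft_kmeans(Z, k, n_iter=50, alpha=1.0):
    # Deterministic init: spread the initial centers across the sample order.
    idx = np.linspace(0, len(Z) - 1, k).astype(int)
    centers = Z[idx].copy()
    for _ in range(n_iter):
        d2 = np.square(Z[:, None, :] - centers[None, :, :]).sum(-1)
        # Student-t soft assignment: nearby centers get most of the weight,
        # but every cell retains a nonzero responsibility for every cluster.
        q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
        q = q / q.sum(1, keepdims=True)
        # Responsibility-weighted center update.
        centers = (q.T @ Z) / q.sum(0)[:, None]
    return q

rng = np.random.default_rng(0)
# Toy "latent space": two well-separated cell subpopulations.
Z = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(5, 0.3, (10, 2))])
q = soft_kmeans(Z, 2)
labels = q.argmax(1)
```

The soft responsibilities `q` are what a self-training scheme can sharpen iteratively, pulling similar cells together rather than committing to hard assignments early.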


2008 ◽  
Vol 7 (1) ◽  
pp. 18-33 ◽  
Author(s):  
Niklas Elmqvist ◽  
John Stasko ◽  
Philippas Tsigas

Supporting visual analytics of multiple large-scale multidimensional data sets requires a high degree of interactivity and user control beyond the conventional challenges of visualizing such data sets. We present the DataMeadow, a visual canvas providing rich interaction for constructing visual queries using graphical set representations called DataRoses. A DataRose is essentially a starplot of selected columns in a data set displayed as multivariate visualizations with dynamic query sliders integrated into each axis. The purpose of the DataMeadow is to allow users to create advanced visual queries by iteratively selecting and filtering into the multidimensional data. Furthermore, the canvas provides a clear history of the analysis that can be annotated to facilitate dissemination of analytical results to stakeholders. A powerful direct manipulation interface allows for selection, filtering, and creation of sets, subsets, and data dependencies. We have evaluated our system using a qualitative expert review involving two visualization researchers. Results from this review are favorable for the new method.
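The dynamic-query idea behind the DataRose axes, where a row survives only if every selected column falls inside its slider's range, can be sketched in a few lines. The function and data names here are ours; the actual DataMeadow is a full interactive visual canvas, not a batch filter.

```python
def dynamic_query(rows, ranges):
    """rows: list of dicts; ranges: {column: (lo, hi)} from the axis sliders.

    A row is kept only if every constrained column lies within its range,
    mirroring how dragging a slider on one DataRose axis filters the plot.
    """
    return [r for r in rows
            if all(lo <= r[col] <= hi for col, (lo, hi) in ranges.items())]

cars = [
    {"mpg": 30, "hp": 90},
    {"mpg": 15, "hp": 200},
    {"mpg": 22, "hp": 150},
]
selected = dynamic_query(cars, {"mpg": (20, 35), "hp": (80, 160)})
```

Chaining such filters, with each result feeding the next DataRose, is the iterative select-and-filter workflow the canvas supports, while also recording the chain as an analysis history.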


2015 ◽  
Author(s):  
Stinus Lindgreen ◽  
Karen L Adair ◽  
Paul Gardner

Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming, and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html


Author(s):  
Gourav Bathla ◽  
Himanshu Aggarwal ◽  
Rinkle Rani

Clustering is one of the most important applications of data mining. It has attracted the attention of researchers in statistics and machine learning, and it is used in many applications such as information retrieval, image processing and social network analytics. It helps the user understand the similarity and dissimilarity between objects, and cluster analysis makes users understand complex and large data sets more clearly. Different types of clustering algorithms have been analyzed by various researchers. K-means is the most popular partitioning-based algorithm, as it provides good results through accurate calculation on numerical data, but it works well for numerical data only. Big data is a combination of numerical and categorical data, and the K-prototype algorithm is used to deal with both: it combines the distances calculated from numeric and categorical data. With the growth of data due to social networking websites, business transactions, scientific calculation etc., there is a vast collection of structured, semi-structured and unstructured data, so K-prototype needs to be optimized so that these varieties of data can be analyzed efficiently. In this work, the K-prototype algorithm is implemented on MapReduce. Experiments show that K-prototype implemented on MapReduce gives better performance on multiple nodes than on a single node, with CPU execution time and speedup used as the evaluation metrics. An intelligent splitter is also proposed that splits mixed big data into numerical and categorical data. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
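The mixed dissimilarity at the heart of K-prototype combines squared Euclidean distance on the numeric attributes with a weighted count of mismatched categorical attributes. A minimal sketch (the function name, example values, and choice of gamma are ours; the intelligent splitter in the paper is what separates a mixed record into the two attribute streams this function consumes):

```python
import numpy as np

def kprototype_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """K-prototype dissimilarity between a record and a cluster prototype.

    gamma balances the numeric part against the categorical part.
    """
    numeric = float(np.square(np.asarray(x_num) - np.asarray(proto_num)).sum())
    categorical = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric + gamma * categorical

# Numeric part: (1-1)^2 + (2-4)^2 = 4; categorical part: 1 mismatch ("suv" vs "sedan").
d = kprototype_distance([1.0, 2.0], ["red", "suv"],
                        [1.0, 4.0], ["red", "sedan"], gamma=0.5)
```

In a MapReduce setting, each mapper can evaluate this distance for its shard of records against the current prototypes, and reducers aggregate the per-cluster sums and mode counts to update the prototypes.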


2020 ◽  
Author(s):  
Nan Xu ◽  
Peter C. Doerschuk ◽  
Shella D. Keilholz ◽  
R. Nathan Spreng

Abstract The macro-scale intrinsic functional network architecture of the human brain has been well characterized. Early studies revealed robust and enduring patterns of static connectivity, while more recent work has begun to explore the temporal dynamics of these large-scale brain networks. Little work to date has investigated directed connectivity within and between these networks, or the temporal patterns of afferent (input) and efferent (output) connections between network nodes. Leveraging a novel analytic approach, prediction correlation, we investigated the causal interactions within and between large-scale networks of the brain using resting-state fMRI. This technique allows us to characterize information transfer between brain regions on both spatial (direction) and temporal (duration) scales. Using data from the Human Connectome Project (N=200), we applied prediction correlation techniques to four resting state fMRI runs (total TRs = 4800). Three central observations emerged. First, the strongest and longest duration connections were observed within the somatomotor, visual and dorsal attention networks. Second, short duration connections were observed for high-degree nodes in the visual and default networks, as well as in the hippocampus. Specifically, the connectivity profile of the highest-degree nodes was dominated by efferent connections to multiple cortical areas. Moderately high-degree nodes, particularly in hippocampal regions, showed an afferent connectivity profile. Finally, multimodal association nodes in lateral prefrontal brain regions demonstrated a short duration, bidirectional connectivity profile, consistent with this region’s role in integrative and modulatory processing. These results provide novel insights into the spatiotemporal dynamics of human brain function.
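Prediction correlation asks how well one region's signal predicts another's at a later time, yielding both a direction and a duration. As a much simpler stand-in for that idea, the sketch below estimates the direction and lag of influence between two time series from the lag that maximizes their cross-correlation; the function, simulation, and lag convention are ours, not the paper's method.

```python
import numpy as np

def best_lag(x, y, max_lag=10):
    """Return the lag (in samples) maximizing corr(x[t], y[t + lag]).

    A positive result means x leads y; a negative result means y leads x.
    """
    lags = list(range(-max_lag, max_lag + 1))
    corrs = []
    for lag in lags:
        if lag >= 0:
            c = np.corrcoef(x[:len(x) - lag], y[lag:])[0, 1]
        else:
            c = np.corrcoef(x[-lag:], y[:len(y) + lag])[0, 1]
        corrs.append(c)
    return lags[int(np.argmax(corrs))]

rng = np.random.default_rng(0)
x = rng.normal(size=500)                       # "source" region signal
y = np.roll(x, 3) + 0.1 * rng.normal(size=500)  # y follows x by 3 samples, plus noise
```

Here `best_lag(x, y)` recovers a positive lag, i.e. an efferent (output) connection from x to y in the abstract's terminology; the magnitude of the lag is a crude analogue of connection duration.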

