A Novel Reduction Circuit Based on Binary Tree Path Partition on FPGAs

Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 30
Author(s):  
Linhuai Tang ◽  
Zhihong Huang ◽  
Gang Cai ◽  
Yong Zheng ◽  
Jiamin Chen

Due to their high parallelism, field-programmable gate arrays (FPGAs) are widely used as accelerators in engineering and scientific fields that involve large numbers of vector and matrix operations. High-performance accumulation circuits are key to large-scale matrix operations. By selecting the adder as the reduction operator, a reduction circuit can implement the accumulation function. However, the multi-cycle latency of a pipelined adder complicates the design of the reduction circuit, because a new input cannot be accumulated into a partial sum that is still in flight. To solve this problem, we propose a novel reduction circuit based on binary tree path partition, which can simultaneously handle multiple data sets of arbitrary lengths. It divides the input data into multiple groups and sends them to different iterations for calculation. The elements belonging to the same data set in each group are added to obtain a partial result, and the partial results of the same data set are then added to obtain the final result. Compared with other reduction methods, it achieves the lowest area-time product.
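A minimal software sketch (Python, not the authors' RTL) of the two-stage idea described above: tagged inputs are split into fixed-size groups, elements of the same data set are summed within each group, and the per-group partial results are then combined. The group size and tagging scheme here are illustrative assumptions.

```python
# Software model of the two-stage reduction: group-wise partial sums,
# then a final combination per data set.
from collections import defaultdict

def reduce_tagged_stream(stream, group_size=4):
    """stream: iterable of (set_id, value) pairs from arbitrarily
    interleaved data sets of arbitrary length."""
    partials = defaultdict(list)
    group = []
    for item in stream:
        group.append(item)
        if len(group) == group_size:       # one "iteration" worth of inputs
            _fold_group(group, partials)
            group = []
    if group:                              # flush the last, partial group
        _fold_group(group, partials)
    # second stage: combine the per-group partial results per data set
    return {sid: sum(vals) for sid, vals in partials.items()}

def _fold_group(group, partials):
    acc = defaultdict(float)
    for sid, val in group:                 # elements of the same data set
        acc[sid] += val                    # are added within the group
    for sid, s in acc.items():
        partials[sid].append(s)            # keep one partial per group

# example: two interleaved data sets of different lengths
stream = [(0, 1.0), (1, 2.0), (0, 3.0), (0, 4.0), (1, 5.0), (1, 6.0), (0, 7.0)]
print(reduce_tagged_stream(stream))        # {0: 15.0, 1: 13.0}
```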

2020 ◽  
Author(s):  
Markus Wiedemann ◽  
Bernhard S.A. Schuberth ◽  
Lorenzo Colli ◽  
Hans-Peter Bunge ◽  
Dieter Kranzlmüller

Precise knowledge of the forces acting at the base of tectonic plates is of fundamental importance, but models of mantle dynamics to date are still often qualitative in nature. One particular problem is that we cannot access the deep interior of our planet and therefore cannot make direct in situ measurements of the relevant physical parameters. Fortunately, modern software and powerful high-performance computing infrastructures allow us to generate complex three-dimensional models of the time evolution of mantle flow through large-scale numerical simulations.

In this project, we aim to visualize the resulting convective patterns that occur thousands of kilometres below our feet and to make them "accessible" using high-end virtual reality techniques.

Models with several hundred million grid cells are nowadays possible using modern supercomputing facilities, such as those available at the Leibniz Supercomputing Centre. These models provide quantitative estimates of inaccessible parameters, such as buoyancy and temperature, as well as predictions of the associated gravity field and seismic wavefield that can be tested against Earth observations.

3-D visualizations of the computed physical parameters allow us to inspect the models as if one were actually travelling down into the Earth. In this way, convective processes that occur thousands of kilometres below our feet become virtually accessible by combining the simulations with high-end VR techniques.

The large data set used here poses severe challenges for real-time visualization, because it cannot fit into graphics memory while requiring rendering with strict deadlines. This raises the necessity of balancing the amount of displayed data against the time needed to render it.

As a solution, we introduce a rendering framework and describe a workflow that allows us to visualize this geoscientific dataset. Our example exceeds 16 TByte in size, which is beyond the capabilities of most visualization tools. To display this dataset in real time, we reduce and declutter it through isosurfacing and mesh optimization techniques.

Our rendering framework relies on multithreading and data-decoupling mechanisms that allow us to upload data to graphics memory while maintaining high frame rates. The final visualization application can be executed in a CAVE installation as well as on head-mounted displays such as the HTC Vive or Oculus Rift. The latter devices will allow for viewing our example on-site at the EGU conference.
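As a rough illustration of the data-reduction step, the sketch below extracts an isosurface from a synthetic scalar field using scikit-image's marching cubes; it stands in for, rather than reproduces, the project's actual pipeline.

```python
# Isosurfacing reduces a full volume to a triangle mesh that can fit in
# graphics memory; a further mesh-optimization pass (e.g. decimation)
# would then shrink the mesh toward the VR frame-time budget.
import numpy as np
from skimage import measure  # assumes scikit-image is available

# stand-in for one time step of a temperature field (real runs: ~1e8 cells)
rng = np.random.default_rng(0)
temperature = rng.random((64, 64, 64))

# marching cubes turns the chosen iso-level into a renderable surface
verts, faces, normals, _ = measure.marching_cubes(temperature, level=0.5)
print(f"{len(verts)} vertices, {len(faces)} triangles")
```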


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Hanjing Jiang ◽  
Yabing Huang

Abstract
Background: Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, up to now, few DDAs have been verified by experiments. Previous evidence indicates that combining multiple sources of information would be conducive to discovering new DDAs. How to integrate different biological data sources and identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms is still a challenging problem.
Results: In this paper, we proposed a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). More specifically, we first constructed a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. Then, a graph embedding model was used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features were fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experimental results showed that GRLMN is very accurate, with an area under the ROC curve (AUC) of 87.9%, outperforming all previous works on this benchmark in terms of both accuracy and AUC. To further verify the performance of GRLMN, we carried out case studies for two common diseases. As a result, in the ranking of drugs predicted to be related to certain diseases (such as kidney disease and fever), 15 of the top 20 drugs have been experimentally confirmed.
Conclusions: The experimental results show that our model performs well in DDA prediction. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies concerning their participation in drug repositioning.
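A minimal sketch of the final prediction stage, with randomly generated stand-ins for the learned embeddings; the sizes and the scikit-learn setup are illustrative assumptions, not GRLMN's published configuration.

```python
# Pair a drug embedding with a disease embedding and let a random forest
# score the association; the graph-embedding step that would produce the
# vectors is assumed to have run already.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_pairs, dim = 1000, 64                        # hypothetical sizes
drug_vecs = rng.normal(size=(n_pairs, dim))    # stand-ins for learned
disease_vecs = rng.normal(size=(n_pairs, dim)) # network embeddings
X = np.hstack([drug_vecs, disease_vecs])       # combined pair feature
y = rng.integers(0, 2, size=n_pairs)           # known vs. unknown DDA

clf = RandomForestClassifier(n_estimators=200, random_state=0)
aucs = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print("5-fold AUC:", aucs.mean())
```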


Author(s):  
Asiye Sahin ◽  
Nermin Ozcan ◽  
Gokhan Nur

Ovarian cancer, which is among the most common cancers in women and occurs mostly in the post-menopausal period, develops through the uncontrolled proliferation of cells in the ovaries and the formation of tumors. Early diagnosis is very difficult, and in most cases the disease is already at an advanced stage when first diagnosed. While it tends to be treated successfully in the early stages, when it is confined to the ovary, it is more difficult to treat in the advanced stages and is often fatal. For this reason, research has focused on predicting whether a person has ovarian cancer. In our study, we designed an RF-based ovarian cancer prediction model using a data set consisting of 49 features, including routine blood tests, general chemistry tests, and tumor marker data, from 349 real patients. Since a high-dimensional data set increases the time and resources that must be spent, we reduced the dimensionality of the data with the PCA, K-PCA, and ICA methods and examined the effect on the results and on time savings. The best result, an F1 score of 0.895, was obtained by feeding the RF method the smaller data set produced by PCA, in which the dimensionality was reduced from 49 to 6; training the model took 18.191 seconds. This result was both more successful and more economical in terms of training time than the prediction made over the larger 49-feature data set with no dimension reduction. The study shows that in predictions made with machine learning models over large-scale medical data, dimension reduction methods provide advantages in time and resources while improving prediction results.
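A minimal sketch of the winning pipeline on synthetic data (the real 349-patient set is not reproduced here): PCA from 49 features down to 6, followed by a random forest scored with F1.

```python
# Reduce 49 clinical features to 6 principal components, then train a
# random forest and evaluate with the F1 score.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(349, 49))        # 349 patients x 49 features (synthetic)
y = rng.integers(0, 2, size=349)      # cancer / no-cancer labels (synthetic)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pca = PCA(n_components=6).fit(X_tr)   # learn the 49 -> 6 projection on train only
clf = RandomForestClassifier(random_state=0).fit(pca.transform(X_tr), y_tr)
pred = clf.predict(pca.transform(X_te))
print("F1:", f1_score(y_te, pred))
```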


2020 ◽  
Vol 32 (1) ◽  
pp. 182-204 ◽  
Author(s):  
Xiping Ju ◽  
Biao Fang ◽  
Rui Yan ◽  
Xiaoliang Xu ◽  
Huajin Tang

A spiking neural network (SNN) is a biologically plausible model that performs information processing based on spikes. Training a deep SNN effectively is challenging due to the non-differentiability of spike signals. Recent advances have shown that high-performance SNNs can be obtained by converting convolutional neural networks (CNNs). However, large-scale SNNs are poorly served by conventional architectures due to the dynamic nature of spiking neurons. In this letter, we propose a hardware architecture to enable efficient implementation of SNNs. All layers in the network are mapped onto one chip so that the computation of different time steps can be done in parallel to reduce latency. We propose a new spiking max-pooling method to reduce computational complexity. In addition, we apply approaches based on shift registers and coarse-grained parallelism to accelerate the convolution operation. We also investigate the effect of different encoding methods on SNN accuracy. Finally, we validate the hardware architecture on the Xilinx Zynq ZCU102. The experimental results on the MNIST data set show that it can achieve an accuracy of 98.94% with eight-bit quantized weights. Furthermore, it achieves 164 frames per second (FPS) under a 150 MHz clock frequency, a 41× speed-up over a CPU implementation, and 22 times lower power than a GPU implementation.
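A minimal sketch of one common spiking max-pooling heuristic from the CNN-to-SNN conversion literature: in each window, forward only the spike of the unit with the largest running spike count. This illustrates the operation's intent, not necessarily the paper's exact circuit.

```python
# Rate-based spiking max-pooling: the unit with the highest accumulated
# spike count "wins" each 2x2 window, approximating max over firing rates.
import numpy as np

def spiking_max_pool(spikes, counts):
    """spikes, counts: (H, W) binary spikes and accumulated spike counts."""
    H, W = spikes.shape
    out = np.zeros((H // 2, W // 2), dtype=spikes.dtype)
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            win_c = counts[i:i+2, j:j+2]
            win_s = spikes[i:i+2, j:j+2]
            k = np.unravel_index(np.argmax(win_c), win_c.shape)
            out[i // 2, j // 2] = win_s[k]   # gate the winning unit's spike
    return out

# one simulated time step on a 4x4 feature map
rng = np.random.default_rng(0)
spikes = (rng.random((4, 4)) < 0.3).astype(np.uint8)
counts = rng.integers(0, 10, size=(4, 4))
print(spiking_max_pool(spikes, counts))
```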


2002 ◽  
Vol 1 (4) ◽  
pp. 403-420 ◽  
Author(s):  
D. Stanescu ◽  
J. Xu ◽  
M.Y. Hussaini ◽  
F. Farassat

The purpose of this paper is to demonstrate the feasibility of computing the fan inlet noise field around a real twin-engine aircraft, including the radiation of the main spinning modes from the engine as well as their reflection and scattering by the fuselage and the wing. This first-cut large-scale computation is based on time-domain and frequency-domain approaches that employ spectral element methods for spatial discretization. The numerical algorithms are designed to exploit high-performance computers such as the IBM SP4. Although the simulations could not match the exact conditions of the only available experimental data set, they predict the trends of the measured noise field fairly well.
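As a small illustration of the spectral-element building block mentioned above, the sketch below computes Gauss-Lobatto-Legendre collocation points, the per-element node set such methods commonly use for spatial discretization; the paper's own discretization details are not reproduced here.

```python
# GLL nodes: the interval endpoints plus the roots of P_N'(x), where P_N
# is the Legendre polynomial of degree N.
import numpy as np
from numpy.polynomial import legendre

def gll_nodes(N):
    """The N+1 Gauss-Lobatto-Legendre points on [-1, 1]."""
    inner = legendre.Legendre.basis(N).deriv().roots()
    return np.concatenate(([-1.0], np.sort(inner), [1.0]))

print(gll_nodes(4).round(4))  # 5 collocation points per 1-D element
```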


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Jun Li ◽  
Hairong Wei ◽  
Patrick Xuechun Zhao

Analysis of genome-scale gene networks (GNs) using large-scale gene expression data provides unprecedented opportunities to uncover gene interactions and regulatory networks involved in various biological processes and developmental programs, leading to accelerated discovery of novel knowledge of biological processes, pathways, and systems. The widely used context likelihood of relatedness (CLR) method, which scores the similarity of gene pairs based on mutual information (MI), is one of the most accurate methods currently available for inferring GNs. However, MI-based reverse-engineering methods achieve satisfactory performance only when the sample size exceeds one hundred, which limits their application to GN construction from expression data sets with small sample sizes. We developed a high-performance web server, DeGNServer, to reverse engineer and decipher genome-scale networks. It extends the CLR method by integrating different correlation methods suitable for analyzing data sets ranging from moderate to large scale, such as expression profiles with tens to hundreds of microarray hybridizations, and implements all analysis algorithms using parallel computing techniques to infer gene-gene associations at extraordinary speed. In addition, we integrated the SNBuilder and GeNa algorithms for subnetwork extraction and functional module discovery. DeGNServer is publicly and freely available online.
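A minimal sketch of the CLR scoring step that DeGNServer builds on: each pair's mutual-information (or correlation) value is converted to a z-score against the backgrounds of both genes, and the two z-scores are combined. The input matrix here is random stand-in data.

```python
# CLR: z-score each similarity within the contexts of both genes and
# combine as sqrt(z_i^2 + z_j^2), clipping negative z-scores to zero.
import numpy as np

def clr_scores(M):
    """M: symmetric (n, n) similarity matrix; returns CLR score matrix."""
    mu = M.mean(axis=1, keepdims=True)
    sd = M.std(axis=1, keepdims=True) + 1e-12
    z = np.maximum((M - mu) / sd, 0.0)   # z-score within each gene's row
    return np.sqrt(z**2 + z.T**2)        # combine both genes' contexts

rng = np.random.default_rng(0)
mi = np.abs(rng.normal(size=(5, 5)))
mi = (mi + mi.T) / 2                     # make the stand-in MI symmetric
print(clr_scores(mi).round(2))
```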


2011 ◽  
Vol 37 (4) ◽  
pp. 753-809 ◽  
Author(s):  
David Vadas ◽  
James R. Curran

Noun phrases (NPs) are a crucial part of natural language and can have very complex structure. However, this NP structure is largely ignored by the statistical parsing field, as the most widely used corpus is not annotated with it. This lack of gold-standard data has restricted previous efforts to parse NPs, making it impossible to perform the supervised experiments that have achieved high performance in so many Natural Language Processing (NLP) tasks. We comprehensively solve this problem by manually annotating NP structure for the entire Wall Street Journal section of the Penn Treebank. The inter-annotator agreement scores that we attain dispel the belief that the task is too difficult and demonstrate that consistent NP annotation is possible. Our gold-standard NP data is now available for use in all parsers. We experiment with this new data, applying the Collins (2003) parsing model, and find that its recovery of NP structure is significantly worse than its overall performance. The parser's F-score is up to 5.69% lower than a baseline that uses deterministic rules. Through extensive experimentation, we determine that this result is primarily caused by a lack of lexical information. To solve this problem we construct a wide-coverage, large-scale NP bracketing system. With our Penn Treebank data set, which is orders of magnitude larger than those used previously, we build a supervised model that achieves excellent results. Our model performs at 93.8% F-score on the simple task that most previous work has undertaken, and extends to bracket longer, more complex NPs that are rarely dealt with in the literature. We attain 89.14% F-score on this much more difficult task. Finally, we implement a post-processing module that brackets NPs identified by the Bikel (2004) parser. Our NP bracketing model includes a wide variety of features that provide the lexical information that was missing during the parser experiments, and as a result, we outperform the parser's F-score by 9.04%. These experiments demonstrate the utility of the corpus and show that many NLP applications can now make use of NP structure.
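As a small illustration of the kind of deterministic baseline mentioned above, the sketch below brackets a multi-word NP right-branching; it shows the task, not the paper's supervised bracketing model.

```python
# Deterministic right-branching NP bracketing:
# ["crude", "oil", "prices"] -> (crude (oil prices))
def right_branching(tokens):
    if len(tokens) <= 2:
        return tuple(tokens)
    return (tokens[0], right_branching(tokens[1:]))

print(right_branching(["crude", "oil", "prices"]))
# ('crude', ('oil', 'prices'))
```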


2019 ◽  
Vol 34 (4) ◽  
pp. 335-348
Author(s):  
Do Quoc Truong ◽  
Pham Ngoc Phuong ◽  
Tran Hoang Tung ◽  
Luong Chi Mai

Automatic Speech Recognition (ASR) systems convert human speech into the corresponding transcription automatically. They have a wide range of applications, such as controlling robots, call-center analytics, and voice chatbots. Recent studies on ASR for English have achieved performance that surpasses human ability. Those systems were trained on large amounts of data and performed well in many environments. With regard to Vietnamese, there have been many studies on improving the performance of existing ASR systems; however, many of them were conducted on small-scale data, which does not reflect realistic scenarios. Although the corpora used to train the systems were carefully designed to maintain phonetic balance, efforts to collect them at a large scale are still limited. Specifically, existing works evaluate only a certain accent of Vietnam. In this paper, we first describe our efforts in collecting a large data set that covers all three major accents of Vietnam, located in the Northern, Central, and Southern regions. Then, we detail our ASR system development procedure, utilizing the collected data set and evaluating different model architectures to find the best structure for Vietnamese. In the VLSP 2018 challenge, our system achieved the best performance with 6.5% WER, and on our internal test set of more than 10 hours of speech collected in real environments, the system also performs well, with 11% WER.
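A minimal sketch of the word error rate (WER) metric quoted above: the word-level Levenshtein distance between reference and hypothesis, normalized by the reference length.

```python
# WER = (substitutions + insertions + deletions) / reference length,
# computed here with a standard edit-distance dynamic program.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i-1][j-1] + (r[i-1] != h[j-1])
            d[i][j] = min(sub, d[i-1][j] + 1, d[i][j-1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("xin chao cac ban", "xin chao ban"))  # 0.25: one deletion
```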


2020 ◽  
Vol 2020 (3) ◽  
pp. 153-174
Author(s):  
Seung Geol Choi ◽  
Dana Dachman-soled ◽  
Mukul Kulkarni ◽  
Arkady Yerukhimovich

Abstract
We consider a scenario where multiple organizations holding large amounts of sensitive data from their users wish to compute aggregate statistics on this data while protecting the privacy of individual users. To support large-scale analytics, we investigate how this privacy can be provided for sketching algorithms running in time sub-linear in the input size. We begin with the well-known LogLog sketch for computing the number of unique elements in a data stream. We show that this algorithm already achieves differential privacy (even without adding any noise) when computed using a private hash function by a trusted curator. Next, we show how to eliminate this requirement of a private hash function by injecting a small amount of noise, allowing us to instantiate an efficient LogLog protocol for the multi-party setting. To demonstrate the practicality of this approach, we run extensive experiments on multiple data sets, including the publicly available IP address data set from the University of Michigan's scans of the internet IPv4 space, to determine the trade-offs among efficiency, privacy, and accuracy of our implementation for varying numbers of parties and input sizes. Finally, we generalize our approach for the LogLog sketch and obtain a general framework for constructing multi-party differentially private protocols for several other sketching algorithms.
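A minimal sketch of a LogLog distinct-count estimator with Laplace noise added to the output; the noise placement and scale here are illustrative assumptions, not the paper's calibrated multi-party mechanism.

```python
# LogLog: hash each item, use low bits to pick a bucket, and keep the
# maximum "rank" (position of the lowest set bit) seen per bucket; the
# estimate combines the bucket maxima, and Laplace noise is added last.
import hashlib, math, random

def loglog_estimate(items, m=64, epsilon=1.0):
    buckets = [0] * m
    for x in items:
        h = int(hashlib.sha256(str(x).encode()).hexdigest(), 16)
        j = h % m                            # bucket index
        rest = h // m
        rank = 1
        while rest % 2 == 0 and rank < 64:   # rank of lowest set bit
            rank += 1
            rest //= 2
        buckets[j] = max(buckets[j], rank)
    alpha = 0.39701                          # LogLog bias constant
    est = alpha * m * 2 ** (sum(buckets) / m)
    # Laplace(1/epsilon) noise as the difference of two exponentials
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return est + noise

print(round(loglog_estimate(range(10_000))))  # roughly 10,000
```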


Author(s):  
Hongyu Li ◽  
Li Chen ◽  
Zaoli Huang ◽  
Xiaotong Luo ◽  
Huiqin Li ◽  
...  

2′-O-methylations (2′-O-Me or Nm) are one of the most important layers of regulatory control over gene expression. With increasing attention focused on the characteristics, mechanisms, and influences of 2′-O-Me, a revolutionary technique termed Nm-seq was established, allowing the identification of precise 2′-O-Me sites in RNA sequences with high sensitivity. However, because of the costs and complexities involved with this new method, large-scale detection and in-depth study of 2′-O-Me are still largely limited. Therefore, the development of a novel computational method to identify 2′-O-Me sites with adequate reliability is urgently needed at the current stage. To address this issue, we propose a hybrid deep-learning algorithm named DeepOMe that combines convolutional neural networks (CNN) and bidirectional long short-term memory (BLSTM) to accurately predict 2′-O-Me sites in the human transcriptome. Validating under 4-, 6-, 8-, and 10-fold cross-validation, we confirmed that our proposed model achieves high performance (AUC close to 0.998 and AUPR close to 0.880). When tested on an independent data set, DeepOMe was substantially superior to NmSEER V2.0. To facilitate the usage of DeepOMe, a user-friendly web server was constructed, which can be freely accessed at http://deepome.renlab.org.
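A minimal PyTorch sketch of the hybrid CNN + BLSTM shape described above; the window length, layer sizes, and output head are illustrative assumptions, not DeepOMe's published hyperparameters.

```python
# A CNN front end extracts local sequence motifs; a bidirectional LSTM
# models context in both directions; a linear head scores the center
# position of the window as a candidate 2'-O-Me site.
import torch
import torch.nn as nn

class CnnBlstmSiteClassifier(nn.Module):
    def __init__(self, n_bases=4, conv_ch=32, lstm_hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_bases, conv_ch, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.blstm = nn.LSTM(conv_ch, lstm_hidden, batch_first=True,
                             bidirectional=True)
        self.head = nn.Linear(2 * lstm_hidden, 1)   # site or not

    def forward(self, x):                 # x: (batch, 4, seq_len) one-hot RNA
        h = self.conv(x).transpose(1, 2)  # -> (batch, seq_len, conv_ch)
        out, _ = self.blstm(h)
        return torch.sigmoid(self.head(out[:, out.size(1) // 2]))

model = CnnBlstmSiteClassifier()
window = torch.zeros(2, 4, 41)            # two one-hot 41-nt windows
print(model(window).shape)                # torch.Size([2, 1])
```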

