Memory-driven computing accelerates genomic data processing

Mapping Intimacies ◽

10.1101/519579 ◽

2019 ◽

Cited By ~ 4

Author(s):

Matthias Becker ◽

Milind Chabbi ◽

Stefanie Warnat-Herresthal ◽

Kathrin Klee ◽

Jonas Schulte-Schrepping ◽

...

Keyword(s):

Energy Consumption ◽

Data Processing ◽

Data Privacy ◽

Large Scale ◽

Transcriptome Assembly ◽

Primary Data ◽

Attractive Alternative ◽

Rna Seq ◽

Local Data ◽

Ngs Data

Next generation sequencing (NGS) is the driving force behind precision medicine and is revolutionizing most, if not all, areas of the life sciences. Particularly when targeting the major common diseases, an exponential growth of NGS data is foreseen for the next decades. This enormous increase of NGS data and the need to process the data quickly for real-world applications require us to rethink our current compute infrastructures. Here we provide evidence that memory-driven computing (MDC), a novel memory-centric hardware architecture, is an attractive alternative to current processor-centric compute infrastructures. To illustrate how MDC can change NGS data handling, we used RNA-seq assembly and pseudoalignment followed by quantification as two first examples. Adapting transcriptome assembly pipelines for MDC reduced compute time by 5.9-fold for the first step (SAMtools). Even more impressive, pseudoalignment by near-optimal probabilistic RNA-seq quantification (kallisto) was accelerated by more than two orders of magnitude with identical accuracy and indicated 66% reduced energy consumption. One billion RNA-seq reads were processed in just 92 seconds. Clearly, MDC simultaneously reduces data processing time and energy consumption. Together with the MDC-inherent solutions for local data privacy, a new compute model can be projected that pushes large-scale NGS data processing and primary data analytics closer to the edge by directly combining high-end sequencers with local MDC, thereby also reducing movement of large raw data to central cloud storage. We further envision that other data-rich areas will similarly benefit from this new memory-centric compute architecture.
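
As a back-of-the-envelope check on the figures quoted above, one billion reads in 92 seconds works out to an average of just under 11 million reads per second. The sketch below only restates the abstract's two numbers; the constant and function names are illustrative and do not come from the paper.

```python
# Figures taken from the abstract above; all names are illustrative only.
READS = 1_000_000_000   # one billion RNA-seq reads
SECONDS = 92            # reported processing time

def reads_per_second(reads: int, seconds: float) -> float:
    """Average throughput implied by a read count and a wall-clock time."""
    return reads / seconds

throughput = reads_per_second(READS, SECONDS)
print(f"{throughput / 1e6:.2f} million reads per second")  # prints 10.87 million reads per second
```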


Large Scale Hotel Resource Retrieval Algorithm Based on Characteristic Threshold Extraction

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2022.16.4 ◽

2022 ◽

Vol 16 ◽

pp. 26-31

Author(s):

Min Fang

Keyword(s):

Energy Consumption ◽

Data Processing ◽

Large Scale ◽

High Energy ◽

Error Component ◽

Retrieval Algorithm ◽

Data Detection ◽

Redundant Data ◽

Variable Window ◽

Resource Data

Current hotel resource retrieval algorithms suffer from low retrieval efficiency, low accuracy, low security, and high energy consumption. This study proposes a large-scale hotel resource retrieval algorithm based on characteristic threshold extraction. In the large-scale hotel resource data, the mass sequence is decomposed into a periodic component, a trend component, a random error component, and a burst component. The components are extracted, singular point detection is performed on the extraction results, and the abnormal data in the hotel resource data are obtained. Based on the attributes of the hotel resource data, data similarity is computed with a variable window, the total similarity of the data is obtained, and redundant resource data are detected. The abnormal data detection results and redundant data detection results are substituted into the space-time filter, and the data processing is completed. The retrieval problem is identified, and the data processing results are fed into the hotel resource retrieval based on characteristic threshold extraction to normalize the data source and rule knowledge. The characteristic threshold and retrieval strategy are determined, and data fusion reasoning is carried out. After repeated iteration, effective solutions are obtained and fused to get the best retrieval result. Experimental results showed that the algorithm has higher retrieval accuracy, efficiency, and security coefficient, and the average search energy consumption is 56 nJ/bit.
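
The decomposition and singular point detection steps described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simple additive model, a centered moving average for the trend, per-phase averages for the periodic component, and a k-sigma rule on the residual. The burst component is not modelled separately, and the names `decompose`, `singular_points`, `period`, and `k` are hypothetical.

```python
from statistics import mean, pstdev

def decompose(series, period):
    """Split a series into trend, periodic and residual parts (additive model)."""
    n = len(series)
    half = period // 2
    # Trend: centered moving average; the window shrinks near the edges.
    trend = [mean(series[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    detrended = [x - t for x, t in zip(series, trend)]
    # Periodic component: average detrended value at each phase of the cycle.
    phase_mean = [mean(detrended[p::period]) for p in range(period)]
    periodic = [phase_mean[i % period] for i in range(n)]
    # Residual: whatever the trend and periodic parts do not explain.
    residual = [d - s for d, s in zip(detrended, periodic)]
    return trend, periodic, residual

def singular_points(residual, k=3.0):
    """Indices whose residual lies more than k standard deviations from the mean."""
    mu, sigma = mean(residual), pstdev(residual)
    if sigma == 0:
        return []
    return [i for i, r in enumerate(residual) if abs(r - mu) > k * sigma]
```

For example, a series that repeats with period 4 and has one large spike added at a single index should have that index reported by `singular_points` when it is applied to the residual returned by `decompose`.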


DNA Sequence Chromatogram Browsing Using JAVA and CORBA

Genome Research ◽

10.1101/gr.9.3.277 ◽

1999 ◽

Vol 9 (3) ◽

pp. 277-281 ◽

Cited By ~ 1

Author(s):

Jeremy D. Parsons ◽

Eugen Buehler ◽

LaDeana Hillier

Keyword(s):

Dna Sequence ◽

Large Scale ◽

Primary Data ◽

Local Data ◽

Contig Assembly ◽

Client Server ◽

Link Type ◽

Data Source ◽

Washington University ◽

Expressed Sequence

DNA sequence chromatograms (traces) are the primary data source for all large-scale genomic and expressed sequence tag (EST) sequencing projects. Access to the sequencing trace assists many later analyses, for example contig assembly and polymorphism detection, but obtaining and using traces is problematic. Traces are not collected and published centrally, they are much larger than the base calls derived from them, and viewing them requires the interactivity of a local graphical client with local data. To provide efficient global access to DNA traces, we developed a client/server system based on flexible Java components integrated into other applications, including an applet for use in a WWW browser and a stand-alone trace viewer. Client/server interaction is facilitated by CORBA middleware, which provides a well-defined interface, a naming service, and location independence. [The software is packaged as a Jar file available from the following URL: http://www.ebi.ac.uk/∼jparsons. Links to working examples of the trace viewers can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.]


High-confidence Coding and Noncoding Transcriptome Maps

10.1101/109363 ◽

2017 ◽

Author(s):

Bo-Hyun You ◽

Sang-Ho Yoon ◽

Jin-Wu Nam

Keyword(s):

Large Scale ◽

Transcriptome Assembly ◽

Likelihood Estimation ◽

The Cancer Genome Atlas ◽

Rna Seq ◽

High Performing ◽

Transcription Start Sites ◽

Assembly Pipeline ◽

Cancer Genome Atlas ◽

Cleavage And Polyadenylation

The advent of high-throughput RNA-sequencing (RNA-seq) has led to the discovery of unprecedentedly immense transcriptomes encoded by eukaryotic genomes. However, the transcriptome maps are still incomplete, partly because they were mostly reconstructed based on RNA-seq reads that lack their orientations (known as unstranded reads) and certain boundary information. Methods to expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly benefit the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves the original assemblies, respectively assembled with stranded and/or unstranded RNA-seq data, by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applying large-scale transcriptomic data comprising ninety-nine billion RNA-seq reads from the ENCODE, human BodyMap projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about eighty-nine billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should not only help to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.


A Secured Data Processing Technique for Effective Utilization of Cloud Computing

Journal of Data Mining & Digital Humanities ◽

10.46298/jdmdh.4085 ◽

2018 ◽

Vol Special Issue on Scientific... ◽

Author(s):

Mbarek Marwan ◽

Ali Kartit ◽

Hassan Ouahmane

Keyword(s):

Data Processing ◽

Digital Humanities ◽

Data Privacy ◽

High Performance ◽

Processing Technique ◽

Cloud Services ◽

It Services ◽

Local Data ◽

Sensitive Data ◽

Analytical Tools

Digital humanities require IT infrastructure and sophisticated analytical tools, including data visualization, data mining, statistics, text mining and information retrieval. Regarding funding, building a local data center necessitates substantial investments. Fortunately, there is another option that will help researchers take advantage of these IT services to access, use and share information easily. Cloud services ideally offer on-demand software and resources over the Internet to read and analyze ancient documents. More interestingly, the billing system is completely flexible and based on resource usage and Quality of Service (QoS) level. In spite of its multiple advantages, outsourcing computations to an external provider raises several challenges. Specifically, security is the major factor hindering the widespread acceptance of this new concept. As a case study, we review the use of cloud computing to process digital images safely. Recently, various solutions have been suggested to secure data processing in cloud environments. However, ensuring privacy and high performance needs more improvements to protect the organization's most sensitive data. To this end, we propose a framework based on segmentation and watermarking techniques to ensure data privacy. In this respect, a segmentation algorithm is used to protect the client's data against unauthorized access, while a watermarking method determines and maintains ownership. Consequently, this framework will increase the speed of development of ready-to-use digital humanities tools.


A systematic NGS-based approach for contaminant detection and functional inference

10.1101/741934 ◽

2019 ◽

Author(s):

Sung-Joon Park ◽

Satoru Onizuka ◽

Masahide Seki ◽

Yutaka Suzuki ◽

Takanori Iwata ◽

...

Keyword(s):

Large Scale ◽

Precise Determination ◽

Host Cells ◽

Rna Seq ◽

Functional Inference ◽

Apoptotic Pathways ◽

Multiple Species ◽

Next Generation Sequencing Ngs ◽

Ngs Data

Background: Microbial contamination impedes successful biological and biomedical research. Computational approaches utilizing next-generation sequencing (NGS) data offer promising diagnostics to assess the presence of contaminants. However, as host cells are often contaminated by multiple microorganisms, these approaches require careful attention to intra- and interspecies sequence similarities, which have not yet been fully addressed. Results: We present a computational approach that rigorously investigates the genomic origins of sequenced reads, including those mapped to multiple species that have been discarded in previous studies. Through the analysis of large-scale synthetic and public NGS samples, we approximated that 1,000−100,000 microbial reads prevail when one million host reads are sequenced by RNA-seq. The microbe catalog we established included Cutibacterium as a prevalent contaminant, suggesting that contamination mostly originates from the laboratory environment. Importantly, by applying a systematic method to infer the functional impact of contamination, we revealed that host-contaminant interactions cause profound changes in the host molecular landscapes, as exemplified by changes in inflammatory and apoptotic pathways during Mycoplasma infection. Conclusions: These findings reinforce the concept that precise determination of the origins and functional impacts of contamination is imperative for quality research and illustrate the usefulness of the proposed approach to comprehensively characterize contamination landscapes.


Interference mitigation in point to point wireless sensor networks using LSP protocol and time division multiplexing approach

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189028 ◽

2020 ◽

Vol 39 (4) ◽

pp. 5449-5458

Author(s):

A. Arokiaraj Jovith ◽

S.V. Kasmir Raja ◽

A. Razia Sulthana

Keyword(s):

Energy Consumption ◽

Large Scale ◽

Control Parameter ◽

Interference Mitigation ◽

Wireless Sensor ◽

Point To Point ◽

A Minor ◽

Two Stages ◽

Time Division ◽

And Control

Interference in a Wireless Sensor Network (WSN) predominantly affects the performance of the WSN. Energy consumption in WSNs is one of the greatest concerns in the current generation. This work presents an approach for interference measurement and interference mitigation in point-to-point networks. The nodes are distributed in the network, and interference is measured by grouping the nodes in a region of a specific diameter. Hence this approach is scalable and is extended to large-scale WSNs. Interference is measured in two stages. In the first stage, interference is overcome by allocating time slots to the node stations in Time Division Multiple Access (TDMA) fashion. The node area is split into larger regions and smaller regions. The time slots are allocated to smaller regions in TDMA fashion. A TDMA-based time slot allocation algorithm is proposed in this paper to enable reuse of time slots with minimal interference between smaller regions. In the second stage, the network density and a control parameter are introduced to reduce interference at a minor level within smaller node regions. The algorithm is simulated and the system is tested with a varying control parameter. The node-level interference and the energy dissipation at nodes are captured by varying the node density of the network. The results indicate that the proposed approach measures the interference and mitigates it with minimal energy consumption at nodes and with less overhead transmission.
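
The idea of reusing time slots between regions that do not interfere can be sketched as a greedy assignment over an interference map. This is an illustrative stand-in, not the allocation algorithm proposed in the paper; the function name `allocate_slots`, the `interferes` mapping, and the region labels are all assumptions made for the example.

```python
def allocate_slots(interferes):
    """Greedy TDMA slot assignment with slot reuse.

    `interferes` maps each region to the set of regions it interferes with.
    Interfering regions never share a slot; all other regions may reuse one.
    """
    slots = {}
    # Place the most constrained regions first (most interfering neighbours).
    for region in sorted(interferes, key=lambda r: (-len(interferes[r]), r)):
        taken = {slots[n] for n in interferes[region] if n in slots}
        slot = 0
        while slot in taken:  # lowest slot not used by an interfering neighbour
            slot += 1
        slots[region] = slot
    return slots

# Four regions in a line: each interferes only with its direct neighbours.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(allocate_slots(chain))
```

In the chain example, regions that are not direct neighbours end up sharing a slot, so the four regions need only two slots in total.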


Faculty Opinions recommendation of Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.13296969.14657090 ◽

2011 ◽

Author(s):

Steven Salzberg ◽

Michael Schatz

Keyword(s):

Reference Genome ◽

Transcriptome Assembly ◽

Full Length ◽

Rna Seq


Bibliometric Analysis of Specific Energy Consumption (SEC) in Machining Operations: A Sustainable Response

Sustainability ◽

10.3390/su13105617 ◽

2021 ◽

Vol 13 (10) ◽

pp. 5617

Author(s):

Raman Kumar ◽

Sehijpal Singh ◽

Ardamanbir Singh Sidhu ◽

Catalin I. Pruncu

Keyword(s):

Energy Consumption ◽

Bibliometric Analysis ◽

Specific Energy ◽

Specific Energy Consumption ◽

Primary Data ◽

Data Content ◽

Author Keywords ◽

Link Strength ◽

Inclusive Analysis ◽

Machining Operations

This paper aims to provide an inclusive analysis of 268 documents about specific energy consumption (SEC) in machining operations from 2001 to 2020 in the Scopus database. A systematic approach collects information on SEC documents’ primary data; their types, publications, citations, and predictions are presented. The VOSviewer 1.1.16 and Biblioshiny 2.0 software are used for visualization analysis to show the progress standing of SEC publications. The selection criteria of documents are set for citation analysis. Ranks are assigned to the most prolific and dominant authors, sources, articles, countries, and organizations based on the total citations, number of documents, average total citation, and total link strength. Author-keyword, index-keyword, and text data content analyses have been conducted to find the hotspots and progress trend in SEC in machining operations. The most prolific and dominant article, source, author, organization, and country are Anderson et al. “Laser-assisted machining of Inconel 718 with an economic analysis”, the Int J Mach Tools Manuf, Shin Y.C., from Purdue University Singapore, and the United States, respectively, based on total citations as per defined criteria. The author keywords “specific cutting energy” and “surface roughness” dominate the machining operations SEC. The implication of this review and bibliometric analysis of SEC in machining operations is to deliver an inclusive perception for the scholars working in this field. It is the primary paper that utilizes bibliometric research to analyze the SEC in machining operations publications expansively. It is valuable for scholars to grasp the hotspots in this field in time and helps researchers in the SEC exploration arena rapidly comprehend the expansion status and trend.


Transcriptional and morphological profiling of parvalbumin interneuron subpopulations in the mouse hippocampus

Nature Communications ◽

10.1038/s41467-020-20328-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Lin Que ◽

David Lukacsovich ◽

Wenshu Luo ◽

Csaba Földy

Keyword(s):

Large Scale ◽

Cell Types ◽

Rna Seq ◽

Neuronal Identity ◽

Parvalbumin Interneurons ◽

Different Types ◽

Parvalbumin Interneuron ◽

Cam Profile ◽

Developmental Domains

The diversity reflected by >100 different neural cell types fundamentally contributes to brain function and a central idea is that neuronal identity can be inferred from genetic information. Recent large-scale transcriptomic assays seem to confirm this hypothesis, but a lack of morphological information has limited the identification of several known cell types. In this study, we used single-cell RNA-seq in morphologically identified parvalbumin interneurons (PV-INs), and studied their transcriptomic states in the morphological, physiological, and developmental domains. Overall, we find high transcriptomic similarity among PV-INs, with few genes showing divergent expression between morphologically different types. Furthermore, PV-INs show a uniform synaptic cell adhesion molecule (CAM) profile, suggesting that CAM expression in mature PV cells does not reflect wiring specificity after development. Together, our results suggest that while PV-INs differ in anatomy and in vivo activity, their continuous transcriptomic and homogenous biophysical landscapes are not predictive of these distinct identities.


Multiple Alu exonization in 3’UTR of a primate specific isoform of CYP20A1 creates a potential miRNA sponge

Genome Biology and Evolution ◽

10.1093/gbe/evaa233 ◽

2020 ◽

Author(s):

Aniket Bhattacharya ◽

Vineet Jha ◽

Khushboo Singhal ◽

Mahar Fatima ◽

Dayanidhi Singh ◽

...

Keyword(s):

Heat Shock ◽

Cortical Neurons ◽

Regulatory Networks ◽

Large Scale ◽

Neuronal Development ◽

Random Sets ◽

Rna Seq ◽

Orphan Gene ◽

Mirna Sponge ◽

Human Neurons

Alu repeats contribute to phylogenetic novelties in conserved regulatory networks in primates. Our study highlights how exonized Alus could nucleate large-scale mRNA-miRNA interactions. Using a functional genomics approach, we characterize a transcript isoform of an orphan gene, CYP20A1 (CYP20A1_Alu-LT), that has exonization of 23 Alus in its 3’UTR. CYP20A1_Alu-LT, confirmed by 3’RACE, is an outlier in length (9 kb 3’UTR) and widely expressed. Using publicly available datasets, we demonstrate its expression in higher primates and presence in single nucleus RNA-seq of 15928 human cortical neurons. miRanda predicts ∼4700 miRNA recognition elements (MREs) for ∼1000 miRNAs, primarily originating within these 3’UTR-Alus. CYP20A1_Alu-LT could be a potential multi-miRNA sponge as it harbors ≥10 MREs for 140 miRNAs and has cytosolic localization. We further tested whether expression of CYP20A1_Alu-LT correlates with mRNAs harboring similar MRE targets. RNA-seq with conjoint miRNA-seq analysis was done in primary human neurons, where we observed CYP20A1_Alu-LT to be downregulated during heat shock response and upregulated in HIV1-Tat treatment. 380 genes were positively correlated with its expression (significantly downregulated in heat shock and upregulated in Tat) and they harbored MREs for nine expressed miRNAs which were also enriched in CYP20A1_Alu-LT. MREs were significantly enriched in these 380 genes compared to random sets of differentially expressed genes (p = 8.134e-12). Gene ontology suggested involvement of these genes in neuronal development and hemostasis pathways, thus proposing a novel component of Alu-miRNA mediated transcriptional modulation that could govern specific physiological outcomes in higher primates.
