external sorting Latest Research Papers

Facilitating external sorting on SMR-based large-scale storage systems

Future Generation Computer Systems ◽

10.1016/j.future.2020.10.032 ◽

2021 ◽

Vol 116 ◽

pp. 333-348

Author(s):

Chih-Hsuan Chen ◽

Shuo-Han Chen ◽

Yu-Pei Liang ◽

Tseng-Yi Chen ◽

Tsan-sheng Hsu ◽

...

Keyword(s):

Large Scale ◽

Storage Systems ◽

External Sorting


Efficient External Sorting for Memory-Constrained Embedded Devices with Flash Memory

ACM Transactions on Embedded Computing Systems ◽

10.1145/3446976 ◽

2021 ◽

Vol 20 (4) ◽

pp. 1-21

Author(s):

Riley Jackson ◽

Jonathan Gresl ◽

Ramon Lawrence

Keyword(s):

Data Collection ◽

Flash Memory ◽

Health And Safety ◽

Sorting Algorithm ◽

Output Buffer ◽

Embedded Devices ◽

Sorting Data ◽

Limited Application ◽

External Sorting ◽

Merge Sort

Embedded devices are ubiquitous in areas of industrial and environmental monitoring, health and safety, and consumer appliances. A common use case is data collection, processing, and performing actions based on data analysis. Although many Internet of Things (IoT) applications use the embedded device simply for data collection, there are benefits to having more data processing done closer to data collection to reduce network transmissions and power usage and provide faster response. This work implements and evaluates algorithms for sorting data on embedded devices, with a specific focus on the smallest memory devices. In devices with less than 4 KB of available RAM, the standard external merge sort algorithm has limited application, as it requires a minimum of three memory buffers and is not flash-aware. The contribution is a memory-optimized external sorting algorithm called no output buffer sort (NOBsort) that reduces the minimum memory required for sorting, has excellent performance for sorted or near-sorted data, and sorts on external memory such as SD cards or raw flash chips. When sorting large datasets, no output buffer sort reduces I/O and execution time by 20% to 35% compared to standard external merge sort.
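For orientation, the standard external merge sort that the abstract uses as its baseline can be sketched as below. This is a minimal illustration of the textbook two-phase scheme (run generation, then k-way merge), not NOBsort and not the authors' code. The function name `external_merge_sort`, the `run_size` parameter, and the one-integer-per-line run format are assumptions made for the example, and the result is collected into a list purely for demonstration.

```python
import heapq
import os
import tempfile


def external_merge_sort(values, run_size):
    """Sort an iterable of ints while holding at most run_size of them in the buffer."""
    run_paths = []
    buffer = []

    def flush():
        # Sort the in-memory buffer and write it to disk as one sorted run.
        buffer.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in buffer)
        run_paths.append(path)
        buffer.clear()

    # Phase 1: run generation.
    for v in values:
        buffer.append(v)
        if len(buffer) >= run_size:
            flush()
    if buffer:
        flush()

    # Phase 2: k-way merge; heapq.merge pulls one item at a time from each run.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        return list(heapq.merge(*streams))
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

In this sketch the merge phase needs one input stream per run plus somewhere to put the output, which is the kind of buffer requirement the abstract says becomes a problem below 4 KB of RAM.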


External Sorting Algorithm: State-of-the-Art and Future Directions

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/806/1/012040 ◽

2020 ◽

Vol 806 ◽

pp. 012040

Author(s):

Wenhan Chen ◽

Yang Liu ◽

Zhiguang Chen ◽

Fang Liu ◽

Nong Xiao

Keyword(s):

State Of The Art ◽

Sorting Algorithm ◽

Future Directions ◽

External Sorting


BioSeqZip: a collapser of NGS redundant reads for the optimization of sequence analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa051 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2705-2711 ◽

Cited By ~ 2

Author(s):

Gianvito Urgese ◽

Emanuele Parisi ◽

Orazio Scicolone ◽

Santa Di Cataldo ◽

Elisa Ficarra

Keyword(s):

Sequence Analysis ◽

Supplementary Information ◽

Sorting Algorithm ◽

Rna Seq ◽

Compact Sets ◽

Analysis Pipeline ◽

Alignment Algorithms ◽

External Sorting ◽

Computational Resources ◽

Generation Sequencing

Motivation: High-throughput next-generation sequencing can generate huge sequence files, whose analysis requires alignment algorithms that are typically very demanding in terms of memory and computational resources. This is a significant issue, especially for machines with limited hardware capabilities. As the redundancy of the sequences typically increases with coverage, collapsing such files into compact sets of non-redundant reads has the twofold advantage of reducing file size and speeding up the alignment, since the same sequence is no longer mapped multiple times.

Method: BioSeqZip generates compact and sorted lists of alignment-ready non-redundant sequences, keeping track of their occurrences in the raw files as well as of their quality score information. By exploiting a memory-constrained external sorting algorithm, it can be executed on either single- or multi-sample datasets, even on computers with medium computational capabilities. On request, it can even re-expand the compacted files to their original state.

Results: Our extensive experiments on RNA-Seq data show that BioSeqZip considerably brings down the computational costs of a standard sequence analysis pipeline, with particular benefits for the alignment procedures, which typically have the highest requirements in terms of memory and execution time. In our tests, BioSeqZip was able to compact 2.7 billion reads into 963 million unique tags, reducing the size of sequence files by up to 70% and speeding up the alignment by at least 50%.

Availability and implementation: BioSeqZip is available at https://github.com/bioinformatics-polito/BioSeqZip.

Supplementary information: Supplementary data are available at Bioinformatics online.
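The collapsing step described above relies on sorting: once reads are in sorted order, identical sequences sit next to each other and can be counted in a single pass. The sketch below illustrates only that general idea. It is not BioSeqZip's implementation or file format, it ignores quality scores, and the name `collapse_reads` is hypothetical.

```python
from itertools import groupby


def collapse_reads(sorted_reads):
    """Collapse an already-sorted sequence of reads into (read, count) pairs."""
    # groupby only merges adjacent equal items, which is why the input must be sorted.
    return [(seq, sum(1 for _ in group)) for seq, group in groupby(sorted_reads)]


reads = ["ACGT", "TTGA", "ACGT", "ACGT", "GGCC"]
tags = collapse_reads(sorted(reads))
```

Here `sorted(reads)` stands in for the external sort. Keeping the counts is what would allow the original multiset of reads to be re-expanded later.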


Performance Analysis of a Faster In-place External Sorting Algorithm

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2019/v4i430122 ◽

2020 ◽

pp. 1-7

Author(s):

Asaduzzaman Nur Shuvo ◽

Apurba Adhikary ◽

Md. Bipul Hossain ◽

Sultana Jahan Soheli

Keyword(s):

Time Complexity ◽

Divide And Conquer ◽

Sorting Algorithm ◽

Data Sets ◽

Data Set ◽

Internal Memory ◽

Huge Data ◽

External Sorting ◽

Bottle Neck ◽

Quick Sort

Data sets in large applications are often too large to fit completely inside the computer's internal memory. The resulting input/output communication (I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. Sorting such a data set requires external sorting. This paper is concerned with a new in-place external sorting algorithm. Our proposed algorithm combines the Quick-Sort and Divide-and-Conquer approaches, resulting in a faster sorting algorithm that avoids any additional disk space. In addition, we show that the average time complexity can be reduced compared to existing external sorting approaches.
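To make "in-place" concrete, the sketch below runs a plain quicksort directly on a file of fixed-width records, swapping records where they lie so that no second file is needed. It is a generic illustration and not the algorithm proposed in the paper. The record layout (4-byte little-endian integers) and all function names are assumptions, and it reads one record per I/O call, which a practical implementation would avoid by working in blocks.

```python
import struct
import tempfile

REC = struct.Struct("<i")  # assumed record layout: one 4-byte little-endian int


def read_rec(f, i):
    f.seek(i * REC.size)
    return REC.unpack(f.read(REC.size))[0]


def write_rec(f, i, value):
    f.seek(i * REC.size)
    f.write(REC.pack(value))


def quicksort_file(f, lo, hi):
    """Sort records lo..hi (inclusive) of file f in place."""
    while lo < hi:
        pivot = read_rec(f, (lo + hi) // 2)
        i, j = lo, hi
        while i <= j:
            while read_rec(f, i) < pivot:
                i += 1
            while read_rec(f, j) > pivot:
                j -= 1
            if i <= j:
                a, b = read_rec(f, i), read_rec(f, j)
                write_rec(f, i, b)
                write_rec(f, j, a)
                i += 1
                j -= 1
        # Recurse into the smaller part and loop on the larger to bound stack depth.
        if j - lo < hi - i:
            quicksort_file(f, lo, j)
            lo = i
        else:
            quicksort_file(f, i, hi)
            hi = j


def sort_ints_in_file(values):
    """Demo helper: write values to a temporary file, sort it in place, read it back."""
    with tempfile.TemporaryFile() as f:
        for v in values:
            f.write(REC.pack(v))
        quicksort_file(f, 0, len(values) - 1)
        f.seek(0)
        data = f.read()
    return [x[0] for x in REC.iter_unpack(data)]
```

The only extra storage is the recursion stack, which is the property the abstract refers to as avoiding additional disk space.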


Facilitate External Sorting for Large-Scale Storage on Shingled Magnetic Recording Drives

Lecture Notes in Networks and Systems - Advances in Information and Communication ◽

10.1007/978-3-030-12385-7_80 ◽

2019 ◽

pp. 1159-1164

Author(s):

Yu-Pei Liang ◽

Min-Hong Shen ◽

Yi-Han Lien ◽

Wei-Kuan Shih

Keyword(s):

Magnetic Recording ◽

Large Scale ◽

Shingled Magnetic Recording ◽

External Sorting


deGSM: memory scalable construction of large scale de Bruijn Graph

10.1101/388454 ◽

2018 ◽

Cited By ~ 2

Author(s):

Hongzhe Guo ◽

Yilei Fu ◽

Yan Gao ◽

Junyi Li ◽

Yadong Wang ◽

...

Keyword(s):

Genome Sequence ◽

Large Scale ◽

High Throughput Sequencing ◽

De Novo ◽

Rapid Development ◽

Main Idea ◽

Supplementary Information ◽

De Bruijn Graph ◽

External Sorting ◽

De Bruijn

Motivation: The de Bruijn graph, a fundamental data structure for representing and organizing genome sequences, plays an important role in various kinds of sequence analysis tasks such as de novo assembly, high-throughput sequencing (HTS) read alignment, pan-genome analysis, metagenomics analysis, and HTS read correction. With the rapid development of HTS data and the ever-increasing number of assembled genomes, there is a high demand to construct de Bruijn graphs for sequences up to the tera-base-pair level. This is non-trivial, since the graph to be constructed can be very large, with each graph consisting of hundreds of billions of vertices and edges. Existing approaches may have unaffordable memory footprints when handling such a large de Bruijn graph. Moreover, the construction approach must handle very large datasets efficiently, even within a relatively small RAM space.

Results: We propose a lightweight parallel de Bruijn graph construction approach, de Bruijn Graph Constructor in Scalable Memory (deGSM). The main idea of deGSM is to efficiently construct the Burrows-Wheeler Transformation (BWT) of the unipaths of the de Bruijn graph in constant RAM space and transform the BWT into the original unitigs. It is mainly implemented by a fast parallel external sorting of k-mers, in which a novel organization of the k-mers allows only a part of them to be kept in RAM. The experimental results demonstrate that, using just a commonly used machine, deGSM is able to handle very large genome sequence(s), e.g., the contigs (305 Gbp) and scaffolds (1.1 Tbp) recorded in the GenBank database and the Picea abies HTS dataset (9.7 Tbp). Moreover, deGSM has faster or comparable construction speed compared with state-of-the-art approaches. With its high scalability and efficiency, deGSM has enormous potential in many large-scale genomics studies.

Availability: https://github.com/hitbc/[email protected] (YW) and [email protected] (BL)

Supplementary information: Supplementary data are available online.
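One common way to sort k-mers while keeping only part of them in memory is to partition them by a short prefix and sort each partition separately. The abstract does not describe deGSM's k-mer organization, so the sketch below is an assumption-laden illustration of prefix partitioning only, with in-memory lists standing in for on-disk partitions. The names `kmers` and `sorted_kmers_by_bucket` and the `prefix_len` parameter are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def sorted_kmers_by_bucket(seqs, k, prefix_len):
    """Yield the distinct k-mers of seqs in sorted order, one prefix bucket at a time."""
    # Each bucket stands in for a partition file; only one needs sorting at a time.
    buckets = defaultdict(list)
    for s in seqs:
        for km in kmers(s, k):
            buckets[km[:prefix_len]].append(km)
    # Visiting the buckets in prefix order makes the concatenated output globally sorted.
    for prefix in sorted(buckets):
        yield from sorted(set(buckets[prefix]))


result = list(sorted_kmers_by_bucket(["ACGTACGT", "TTACG"], k=3, prefix_len=1))
```

Because every k-mer in one bucket shares a prefix that sorts before every k-mer in the next bucket, the buckets can also be sorted independently and in parallel.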


MONTRES-NVM: An External Sorting Algorithm for Hybrid Memory

2018 IEEE 7th Non-Volatile Memory Systems and Applications Symposium (NVMSA) ◽

10.1109/nvmsa.2018.00013 ◽

2018 ◽

Author(s):

Mohammed Bey Ahmed Khernache ◽

Arezki Laga ◽

Jalil Boukhobza

Keyword(s):

Sorting Algorithm ◽

Hybrid Memory ◽

External Sorting


MONTRES: Merge ON-the-Run External Sorting Algorithm for Large Data Volumes on SSD Based Storage Systems

IEEE Transactions on Computers ◽

10.1109/tc.2017.2706678 ◽

2017 ◽

Vol 66 (10) ◽

pp. 1689-1702 ◽

Cited By ~ 4

Author(s):

Arezki Laga ◽

Jalil Boukhobza ◽

Frank Singhoff ◽

Michel Koskas

Keyword(s):

Storage Systems ◽

Large Data ◽

Sorting Algorithm ◽

External Sorting


ActiveSort: Efficient external sorting using active SSDs in the MapReduce framework

Future Generation Computer Systems ◽

10.1016/j.future.2016.03.003 ◽

2016 ◽

Vol 65 ◽

pp. 76-89 ◽

Cited By ~ 8

Author(s):

Young-Sik Lee ◽

Luis Cavazos Quero ◽

Sang-Hoon Kim ◽

Jin-Soo Kim ◽

Seungryoul Maeng

Keyword(s):

Mapreduce Framework ◽

External Sorting

