Big telescope, big data: towards exascale with the Square Kilometre Array

A. M. M. Scaife

doi:10.1098/rsta.2019.0060

Big telescope, big data: towards exascale with the Square Kilometre Array

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2019.0060 ◽

2020 ◽

Vol 378 (2166) ◽

pp. 20190060

Author(s):

A. M. M. Scaife

Keyword(s):

High Performance ◽

Numerical Algorithms ◽

Image Formation ◽

Science Data ◽

Square Kilometre Array ◽

Huge Data ◽

Data Rates ◽

Data Processor ◽

Optical Telescopes ◽

Scale Data

Unlike optical telescopes, radio interferometers do not image the sky directly but require specialized image formation algorithms. For the Square Kilometre Array (SKA), the computational requirements of this image formation are extremely demanding due to the huge data rates produced by the telescope. This processing will be performed by the SKA Science Data Processor facilities and a network of SKA Regional Centres, which must not only deal with SKA-scale data volumes but also with stringent science-driven image fidelity requirements. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

Download Full-text

Building the World's Largest Radio Telescope: The Square Kilometre Array Science Data Processor

2018 IEEE 14th International Conference on e-Science (e-Science) ◽

10.1109/escience.2018.00101 ◽

2018 ◽

Author(s):

Jamie S. Farnes ◽

Ben Mort ◽

Fred Dulwich ◽

Karel Adamek ◽

Anna Brown ◽

...

Keyword(s):

Radio Telescope ◽

Science Data ◽

Square Kilometre Array ◽

Data Processor

Download Full-text

Source-Finding for the Australian Square Kilometre Array Pathfinder

Publications of the Astronomical Society of Australia ◽

10.1071/as12028 ◽

2012 ◽

Vol 29 (3) ◽

pp. 371-381 ◽

Cited By ~ 22

Author(s):

M. Whiting ◽

B. Humphreys

Keyword(s):

High Performance ◽

Distributed Processing ◽

Square Kilometre Array ◽

Data Rates ◽

Different Types ◽

Automated Processing ◽

Pipeline Processing ◽

Science Community ◽

Line Imaging ◽

Performance Computing

AbstractThe Australian Square Kilometre Array Pathfinder (ASKAP) presents a number of challenges in the area of source finding and cataloguing. The data rates and image sizes are very large, and require automated processing in a high-performance computing environment. This requires development of new tools, that are able to operate in such an environment and can reliably handle large datasets. These tools must also be able to accommodate the different types of observations ASKAP will make: continuum imaging, spectral-line imaging, transient imaging. The ASKAP project has developed a source-finder known as selavy, built upon the duchamp source-finder. selavy incorporates a number of new features, which we describe here.Since distributed processing of large images and cubes will be essential, we describe the algorithms used to distribute the data, find an appropriate threshold and search to that threshold and form the final source catalogue. We describe the algorithm used to define a varying threshold that responds to the local, rather than global, noise conditions, and provide examples of its use. And we discuss the approach used to apply two-dimensional fits to detected sources, enabling more accurate parameterisation. These new features are compared for timing performance, where we show that their impact on the pipeline processing will be small, providing room for enhanced algorithms.We also discuss the development process for ASKAP source finding software. By the time of ASKAP operations, the ASKAP science community, through the Survey Science Projects, will have contributed important elements of the source finding pipeline, and the mechanisms in which this will be done are presented.

Download Full-text

The Square Kilometre Array Science Data Processor. Preliminary compute platform design

Journal of Instrumentation ◽

10.1088/1748-0221/10/07/c07004 ◽

2015 ◽

Vol 10 (07) ◽

pp. C07004-C07004 ◽

Cited By ~ 15

Author(s):

P.C. Broekema ◽

R.V. van Nieuwpoort ◽

H.E. Bal

Keyword(s):

Science Data ◽

Platform Design ◽

Square Kilometre Array ◽

Data Processor

Download Full-text

Numerical algorithms for high-performance computational science

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2019.0066 ◽

2020 ◽

Vol 378 (2166) ◽

pp. 20190066 ◽

Cited By ~ 2

Author(s):

Jack Dongarra ◽

Laura Grigori ◽

Nicholas J. Higham

Keyword(s):

High Performance ◽

Numerical Algorithms ◽

Computational Science ◽

Floating Point ◽

Important Criterion ◽

Data Movement ◽

Floating Point Arithmetic ◽

High Performance Computers ◽

Point Arithmetic ◽

Speed And Accuracy

A number of features of today’s high-performance computers make it challenging to exploit these machines fully for computational science. These include increasing core counts but stagnant clock frequencies; the high cost of data movement; use of accelerators (GPUs, FPGAs, coprocessors), making architectures increasingly heterogeneous; and multi- ple precisions of floating-point arithmetic, including half-precision. Moreover, as well as maximizing speed and accuracy, minimizing energy consumption is an important criterion. New generations of algorithms are needed to tackle these challenges. We discuss some approaches that we can take to develop numerical algorithms for high-performance computational science, with a view to exploiting the next generation of supercomputers. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.

Download Full-text

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

ACM Transactions on Mathematical Software ◽

10.1145/3441850 ◽

2021 ◽

Vol 47 (2) ◽

pp. 1-28

Author(s):

Goran Flegar ◽

Hartwig Anzt ◽

Terry Cojean ◽

Enrique S. Quintana-Ortí

Keyword(s):

Linear Algebra ◽

Graphics Processing Units ◽

High Performance ◽

Numerical Algorithms ◽

Mixed Precision ◽

Before And After ◽

Memory Accesses ◽

Specialized Hardware ◽

The Individual ◽

Graphics Processing

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.

Download Full-text

Flood Inundation Prediction

Annual Review of Fluid Mechanics ◽

10.1146/annurev-fluid-030121-113138 ◽

2021 ◽

Vol 54 (1) ◽

Author(s):

Paul D. Bates

Keyword(s):

Fluid Mechanics ◽

Land Surface ◽

High Performance ◽

Numerical Algorithms ◽

Economic Damage ◽

Annual Review ◽

Publication Date ◽

Flood Inundation ◽

Damage Mapping ◽

Performance Computing

Every year flood events lead to thousands of casualties and significant economic damage. Mapping the areas at risk of flooding is critical to reducing these losses, yet until the last few years such information was available for only a handful of well-studied locations. This review surveys recent progress to address this fundamental issue through a novel combination of appropriate physics, efficient numerical algorithms, high-performance computing, new sources of big data, and model automation frameworks. The review describes the fluid mechanics of inundation and the models used to predict it, before going on to consider the developments that have led in the last five years to the creation of the first true fluid mechanics models of flooding over the entire terrestrial land surface. Expected final online publication date for the Annual Review of Fluid Mechanics, Volume 54 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Download Full-text

Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining ◽

10.1145/3219819.3219927 ◽

2018 ◽

Cited By ~ 2

Author(s):

Alex Gittens ◽

Kai Rothauge ◽

Shusen Wang ◽

Michael W. Mahoney ◽

Lisa Gerhardt ◽

...

Keyword(s):

Data Analysis ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Large Scale Data ◽

Performance Computing ◽

Scale Data

Download Full-text

High-performance SAR image formation and post-processing

Lecture Notes in Computer Science - Euro-Par'96 Parallel Processing ◽

10.1007/bfb0024695 ◽

1996 ◽

pp. 139-146

Author(s):

E. Appiani ◽

M. Corvi ◽

G. Garibotto ◽

C. Coelho

Keyword(s):

High Performance ◽

Image Formation ◽

Sar Image ◽

Post Processing

Download Full-text

Systematic Methods on Machine Learning Techniques for Clinical Predictive Modelling

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2138.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 288-297

Keyword(s):

Machine Learning ◽

Data Mining ◽

High Performance ◽

Predictive Modelling ◽

Medical Application ◽

Medical Data ◽

Machine Learning Techniques ◽

Screening Process ◽

Huge Data ◽

System Data

Predictive modelling is a mathematical technique which uses Statistics for prediction, due to the rapid growth of data over the cloud system, data mining plays a significant role. Here, the term data mining is a way of extracting knowledge from huge data sources where it’s increasing the attention in the field of medical application. Specifically, to analyse and extract the knowledge from both known and unknown patterns for effective medical diagnosis, treatment, management, prognosis, monitoring and screening process. But the historical medical data might include noisy, missing, inconsistent, imbalanced and high dimensional data.. This kind of data inconvenience lead to severe bias in predictive modelling and decreased the data mining approach performances. The various pre-processing and machine learning methods and models such as Supervised Learning, Unsupervised Learning and Reinforcement Learning in recent literature has been proposed. Hence the present research focuses on review and analyses the various model, algorithm and machine learning technique for clinical predictive modelling to obtain high performance results from numerous medical data which relates to the patients of multiple diseases.

Download Full-text

New Python-based methods for data processing

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444913000863 ◽

2013 ◽

Vol 69 (7) ◽

pp. 1274-1282 ◽

Cited By ~ 74

Author(s):

Nicholas K. Sauter ◽

Johan Hattne ◽

Ralf W. Grosse-Kunstleve ◽

Nathaniel Echols

Keyword(s):

Data Reduction ◽

Graphics Processing Units ◽

High Performance ◽

Array Detectors ◽

Data Rates ◽

Python Scripting ◽

Linac Coherent Light Source ◽

Future Data ◽

Scripting Language ◽

Computational Resources

Current pixel-array detectors produce diffraction images at extreme data rates (of up to 2 TB h−1) that make severe demands on computational resources. New multiprocessing frameworks are required to achieve rapid data analysis, as it is important to be able to inspect the data quickly in order to guide the experiment in real time. By utilizing readily available web-serving tools that interact with the Python scripting language, it was possible to implement a high-throughput Bragg-spot analyzer (cctbx.spotfinder) that is presently in use at numerous synchrotron-radiation beamlines. Similarly, Python interoperability enabled the production of a new data-reduction package (cctbx.xfel) for serial femtosecond crystallography experiments at the Linac Coherent Light Source (LCLS). Future data-reduction efforts will need to focus on specialized problems such as the treatment of diffraction spots on interleaved lattices arising from multi-crystal specimens. In these challenging cases, accurate modeling of close-lying Bragg spots could benefit from the high-performance computing capabilities of graphics-processing units.

Download Full-text