Algorithms for in‐place matrix transposition

Given a linear map $\Phi : M_n \to M_m$, its multiplicity maps are defined as the family of linear maps $\Phi \otimes \mathrm{id}_k : M_n \otimes M_k \to M_m \otimes M_k$, where $\mathrm{id}_k$ denotes the identity on $M_k$. Let $\|\cdot\|_1$ denote the trace-norm on matrices, as well as the induced trace-norm on linear maps of matrices, i.e. $\|\Phi\|_1 = \max\{\|\Phi(X)\|_1 : X \in M_n,\ \|X\|_1 = 1\}$. A fact of fundamental importance in both operator algebras and quantum information is that $\|\Phi \otimes \mathrm{id}_k\|_1$ can grow with $k$. In general, the rate of growth is bounded by $\|\Phi \otimes \mathrm{id}_k\|_1 \le k\,\|\Phi\|_1$, and matrix transposition is the canonical example of a map achieving this bound. We prove that, up to an equivalence, the transpose is the unique map achieving this bound. The equivalence is given in terms of complete trace-norm isometries, and the proof relies on a particular characterization of complete trace-norm isometries regarding preservation of certain multiplication relations.

We use this result to characterize the set of single-shot quantum channel discrimination games satisfying a norm relation that, operationally, implies that the game can be won with certainty using entanglement, but is hard to win without entanglement. Specifically, we show that the well-known example of such a game, involving the Werner-Holevo channels, is essentially the unique game satisfying this norm relation. This constitutes a step towards a characterization of single-shot quantum channel discrimination games with maximal gap between optimal performance of entangled and unentangled strategies.
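The claim that transposition attains the growth bound can be checked numerically in a small case. The following is a minimal sketch, not taken from the paper: it applies the transpose to the first factor of a maximally entangled state on an n-by-n system (so k = n). The input is a density matrix with trace norm 1, and the output is the swap operator divided by n, whose trace norm is n. Since the transpose itself has induced trace norm 1, this matches the bound k times the norm of the map. All function names are hypothetical, and the trace norm is computed through a shortcut that only works for symmetric matrices whose square is a multiple of the identity.

```python
import math


def maximally_entangled_density(n):
    """Density matrix of (1/sqrt(n)) * sum_i |i>|i>, as an n^2 x n^2 nested list."""
    dim = n * n
    rho = [[0.0] * dim for _ in range(dim)]
    for i in range(n):
        for j in range(n):
            # The basis vector |a, b> has flat index a * n + b.
            rho[i * n + i][j * n + j] = 1.0 / n
    return rho


def partial_transpose_first(mat, n):
    """Apply (transpose tensor identity): swap the first-factor row and column indices."""
    dim = n * n
    out = [[0.0] * dim for _ in range(dim)]
    for a in range(n):
        for b in range(n):
            for c in range(n):
                for d in range(n):
                    # <a,b| out |c,d> = <c,b| mat |a,d>
                    out[a * n + b][c * n + d] = mat[c * n + b][a * n + d]
    return out


def matmul(x, y):
    """Plain triple-loop product of two square nested-list matrices."""
    dim = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]


def trace_norm_of_scaled_involution(m, tol=1e-12):
    """Trace norm of a real symmetric m with m @ m = c^2 * I.

    Under that assumption |m| = c * I, so the trace norm is dim * c.
    Raises ValueError if the assumption does not hold.
    """
    dim = len(m)
    for i in range(dim):
        for j in range(dim):
            if abs(m[i][j] - m[j][i]) > tol:
                raise ValueError("matrix is not symmetric")
    sq = matmul(m, m)
    c2 = sq[0][0]
    for i in range(dim):
        for j in range(dim):
            expected = c2 if i == j else 0.0
            if abs(sq[i][j] - expected) > tol:
                raise ValueError("m @ m is not a multiple of the identity")
    return dim * math.sqrt(c2)


if __name__ == "__main__":
    n = 3
    rho = maximally_entangled_density(n)
    swapped = partial_transpose_first(rho, n)
    print("trace of input state:", sum(rho[i][i] for i in range(n * n)))
    print("trace norm after partial transpose:", trace_norm_of_scaled_involution(swapped))
```

For n = 3 the second printed value is 3 up to floating-point rounding, while the input state has trace 1 (and, being positive semidefinite, trace norm 1). The sketch only exhibits a witness for the lower bound; it says nothing about the uniqueness result.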

FFT for the APE Parallel Computer

International Journal of Modern Physics C ◽ 10.1142/s012918319700117x ◽ 1997 ◽ Vol 08 (06) ◽ pp. 1317-1334 ◽ Cited By ~ 5

Author(s): Thomas Lippert ◽ Klaus Schilling ◽ Sven Trentmann ◽ Federico Toschi ◽ Raffaele Tripiccione

Keyword(s): Systolic Array ◽ Parallel Computer ◽ Massively Parallel ◽ Two Dimensional ◽ One Dimensional ◽ Data Field ◽ Matrix Transposition ◽ Neighbor Connectivity ◽ Parallel FFT ◽ Parallel FFT Algorithm

We present a parallel FFT algorithm for SIMD systems following the "Transpose Algorithm" approach. The method is based on the assignment of the data field onto a one-dimensional ring of systolic cells. The systolic array can be universally mapped onto any parallel system. In particular, for systems with next-neighbor connectivity, our method has the potential to improve the efficiency of matrix transposition by use of hyper-systolic communication. We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a two-dimensional hydrodynamics code for turbulence studies.
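The "Transpose Algorithm" idea can be illustrated with a sequential sketch: transform every row, transpose so that columns become rows, transform every row again, and transpose back. This is not the authors' implementation. The systolic ring, the hyper-systolic communication and the APE100/Quadrics mapping are not modeled, a naive O(N^2) DFT stands in for a real FFT, and all function names are hypothetical. In a distributed setting, the `transpose` step is where the interprocessor communication would occur.

```python
import cmath


def dft(x):
    """Naive 1D discrete Fourier transform (stand-in for a real FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def transpose(rows):
    """Out-of-place transpose of a rectangular nested list."""
    return [list(col) for col in zip(*rows)]


def dft2_transpose(grid):
    """2D DFT using only row-wise 1D transforms plus two transpositions."""
    step1 = [dft(row) for row in grid]    # transform along each row
    step2 = transpose(step1)              # former columns are now rows
    step3 = [dft(row) for row in step2]   # transform along the former columns
    return transpose(step3)               # restore the original layout


def dft2_direct(grid):
    """Reference 2D DFT computed from the defining double sum."""
    r, c = len(grid), len(grid[0])
    return [[sum(grid[y][x] * cmath.exp(-2j * cmath.pi * (u * y / r + v * x / c))
                 for y in range(r) for x in range(c))
             for v in range(c)]
            for u in range(r)]


if __name__ == "__main__":
    grid = [[1, 2, 3], [4, 5, 6]]
    fast = dft2_transpose(grid)
    ref = dft2_direct(grid)
    worst = max(abs(fast[u][v] - ref[u][v]) for u in range(2) for v in range(3))
    print("largest deviation from the direct double sum:", worst)
```

Each 1D transform touches only one row, so rows can be processed independently; the transpositions are the only steps that mix data between rows.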

An efficient algorithm for large-scale matrix transposition

Proceedings 2000 International Conference on Parallel Processing ◽ 10.1109/icpp.2000.876148 ◽ 2002 ◽ Cited By ~ 1

Author(s): Jinwoo Suh ◽ V.K. Prasanna

Keyword(s): Efficient Algorithm ◽ Large Scale ◽ Matrix Transposition ◽ Scale Matrix

The complexity of matrix transposition on one-tape off-line Turing machines

Theoretical Computer Science ◽ 10.1016/0304-3975(91)90175-2 ◽ 1991 ◽ Vol 82 (1) ◽ pp. 113-129 ◽ Cited By ~ 11

Author(s): Martin Dietzfelbinger ◽ Wolfgang Maass ◽ Georg Schnitger

Keyword(s): Turing Machines ◽ Matrix Transposition

2-D pseudo-spectral viscoacoustic modeling in a distributed-memory multi-processor computer

Bulletin of the Seismological Society of America ◽ 10.1785/bssa0830051345 ◽ 1993 ◽ Vol 83 (5) ◽ pp. 1345-1354

Author(s): Quingbo Liao ◽ George A. McMechan

Keyword(s): Distributed Memory ◽ Fourier Transforms ◽ Relaxation Times ◽ Empirical Method ◽ Interprocessor Communication ◽ Computing Environment ◽ Absorbing Boundaries ◽ Matrix Transposition ◽ Pseudo Spectral ◽ Frequency Curves

Two pseudo-spectral implementations of 2-D viscoacoustic modeling are developed in a distributed-memory multi-processor computing environment. The first involves simultaneous computation of the response of one model to many source locations and, as it requires no interprocessor communication, is perfectly parallel. The second involves computation of the response, to one source, of a large model that is distributed across all processors. In the latter, local, rather than global, Fourier transforms are used to minimize interprocessor communication and to eliminate the need for matrix transposition. In both algorithms, absorbing boundaries are defined as zones of decreased Q as part of the model, and so require no extra computation. An empirical method of determining sets of relaxation times for a broad range of Q values eliminates the need for iterative fitting of Q-frequency curves.
