Application of Vector Processors To Solve Finite Difference Equations
Abstract

As computer technology approaches limitations imposed by the speed of light, increased emphasis is placed on vector processors, which can greatly increase arithmetic speed even without improvements in such basic computer characteristics as memory cycle time. This paper deals with solving systems of finite difference equations on the STAR 100 and the CYBER 203, two Control Data Corp. computers with built-in vector processors. Systems of three-dimensional finite difference equations having from 2,000 to 8,000 unknowns were solved by means of Gaussian elimination and line successive overrelaxation (LSOR). On these machines, the D4 Gaussian elimination technique reduced computer time by factors as large as 4.6 relative to standard Gaussian elimination. Vectorization of the D4 code on the STAR 100 reduced computer times relative to scalar results by factors as large as 26, despite nonoptimal coding. LSOR was vectorized successfully, with computer-time reduction factors of 35 to 43 on the STAR 100. On the CYBER 203, run times were reduced by factors of 45 to 54 relative to the scalar performance of the STAR 100. On an 8,000-block problem, the average processing speed for a complete LSOR solution was approximately 25 million floating-point operations per second (megaflops).

Introduction

Large computers with hardware specifically designed for vector processing offer the potential for solving large systems of finite difference equations with exceptional speed. Our work was intended to test certain solution algorithms and determine which perform best on two such computers: the STAR 100 and the CYBER 203. The algorithms discussed are both well known: (1) Gaussian elimination and (2) successive overrelaxation (SOR). The STAR 100 has as much as 1,024,000 words of 64-bit core memory and a virtual operating system.
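To fix ideas, the sketch below applies pointwise SOR to a simple one-dimensional finite-difference system. This is only an illustration of the relaxation idea, not the paper's three-dimensional LSOR (which relaxes whole grid lines at a time); the function and parameter names here are hypothetical, and the relaxation factor omega = 1.5 is an arbitrary choice in the admissible range (0, 2).

```python
# Pointwise SOR on the tridiagonal system arising from -u'' = f
# discretized on n interior points with zero boundary values.
# Illustrative only; names and parameters are not from the paper.

def sor_1d(f, h, omega=1.5, tol=1e-10, max_iters=10000):
    """Solve (-u[i-1] + 2*u[i] - u[i+1]) / h**2 = f[i] by SOR."""
    n = len(f)
    u = [0.0] * n
    for _ in range(max_iters):
        max_change = 0.0
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            # Gauss-Seidel value, then over-relax toward it.
            gs = (h * h * f[i] + left + right) / 2.0
            new = u[i] + omega * (gs - u[i])
            max_change = max(max_change, abs(new - u[i]))
            u[i] = new
        if max_change < tol:
            break
    return u

# Example: -u'' = 2 on (0, 1), u(0) = u(1) = 0; exact u(x) = x*(1 - x).
n = 9
h = 1.0 / (n + 1)
u = sor_1d([2.0] * n, h)
```

Because the second-order difference scheme is exact for quadratics, the converged iterate reproduces u(x) = x(1 - x) at the grid points (0.25 at the midpoint).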
Its most unusual feature, however, is that processing speed can vary over two orders of magnitude, depending on the structure of the computer code being processed. The speed of 64-bit arithmetic ranges from about 0.5 to 50 megaflops. (A floating-point operation is an add, multiply, divide, etc.) At the low end of the speed range, its performance is similar to that of the CDC 6600, a 1960's-technology computer, but at the high end it can outrun the fastest of modern scalar computers. This large speed variation results from the fact that the STAR 100's core memory has a destructive-read characteristic that prevents the same core area from being referenced for 31 machine cycles following a previous read. (This results in a memory cycle time of 1,280 nanoseconds.) Coupled with this slow core memory is a vector arithmetic unit that can produce two 64-bit adds or one 64-bit multiply during every 40-nanosecond clock cycle, once the arithmetic unit reaches steady state (see Appendix for details). All vector operations (adds, multiplies, etc.) have a linear performance characteristic of the form

C = S + R·L, ........................ (1)

where C is the number of clock cycles required to complete the operation, S is the vector start-up time, R is the steady-state result rate, and L is the vector length.
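The consequence of Eq. 1 is that start-up cost dominates short vectors while the steady-state rate governs long ones. The sketch below evaluates the model with the paper's 40-nanosecond clock and R = 0.5 cycles per result (two 64-bit adds per clock); the start-up value S = 100 cycles is an assumed figure for illustration only, not a measured STAR 100 constant.

```python
# Effective speed implied by the vector-timing model C = S + R*L (Eq. 1).
# CLOCK_NS comes from the paper; S = 100 cycles is an assumption.

CLOCK_NS = 40.0  # STAR 100 clock period, nanoseconds


def cycles(S, R, L):
    """Clock cycles to complete a vector operation of length L (Eq. 1)."""
    return S + R * L


def megaflops(S, R, L):
    """L results delivered in cycles(S, R, L) clocks, in megaflops."""
    seconds = cycles(S, R, L) * CLOCK_NS * 1e-9
    return L / seconds / 1e6

# R = 0.5 cycles per result for 64-bit adds (two adds per 40-ns clock).
S, R = 100, 0.5
for L in (10, 100, 1000, 10000):
    print(f"L = {L:6d}: {megaflops(S, R, L):5.1f} megaflops")
```

For long vectors the rate approaches the 50-megaflop ceiling quoted in the text, while a length-10 vector runs at only a few megaflops, which is the mechanism behind the two-orders-of-magnitude speed variation described above.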