In the early nineties, the National Aeronautics and Space Administration (NASA) issued requirements for a 1 GFLOPS (one billion floating-point operations per second) machine to be built for less than $50,000. The request was driven by a desire to avoid the massive costs of deploying vector-processor and massively parallel processor (MPP) supercomputers, such as those from Cray and Thinking Machines. According to Bell and Gray, “In 1994, a 16-node $40,000 cluster built from Intel 486 computers achieved that goal.” (Bell & Gray, 2002; Moyer & Umar, 2001) The winning cluster utilized a design known as the Beowulf cluster, which “builds on decades of parallel processing research and on many attempts to apply loosely coupled computers to a variety of applications.” (Bell & Gray, 2002) The Beowulf architecture displaced and largely eliminated vector and MPP systems over a very short period. Bell and Gray’s article analyzes data from the Top500 list, which has rated supercomputer performance since the mid-nineties, and includes a table that clearly demonstrates the current dominance of Beowulf clusters.
Figure 1 Growth of Beowulf (scalar) clusters (Bell & Gray, 2002)
The scalar Beowulf cluster ASCI Red was the first supercomputer to achieve 1 TFLOPS (one trillion floating-point operations per second) performance. (Meuer, 2008) In June 2008, the Top500 list announced that the first 1 PFLOPS machine had been measured, and it followed the Beowulf architecture. (“Top 500 List – June 2008,” 2008) In spite of the current dominance of the Beowulf architecture, there are emerging platforms, representing radical departures from past computational models, that may provide the foundation for future supercomputers.
Originally developed to provide fast three-dimensional graphics processing for the consumer electronic gaming industry, graphics processing units (GPUs) are a commodity CMOS version of single instruction, multiple data (SIMD) processors. (Fan, Qiu, Kaufman, & Yoakum-Stover, 2004) GPU technology leverages high-density chip architectures like those described by Nair in his 2002 article, “Effect of Increasing Chip Density on the Evolution of Computer Architectures.” (Nair, 2002) As scalar-processor-based clusters have displaced vector and MPP architectures, certain problem sets, particularly image processing, have been left without a highly efficient platform for computation. The use of the GPU to service general computation instead of rendering video game images is an innovative use of a specialized processor: the GPU is to SIMD processors what Beowulf was to vector and MPP systems. Existing GPUs are not fully equivalent to classical SIMD machines, as Suda et al. note in their 2009 article. (Suda, et al., 2009) However, the low-cost CMOS solution has the potential to be developed into a replacement, much as Beowulf clusters replaced vector-processing machines.
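The SIMD model described above can be sketched with a minimal, purely illustrative example: a single “instruction” (here, an add) is applied in lockstep across every element of a dataset, in contrast with the one-element-per-instruction scalar model. This is a conceptual Python analogy only, not actual GPU code; real GPUs execute such operations across thousands of hardware lanes.

```python
# Conceptual sketch of SIMD execution: one instruction stream applied
# in lockstep to many data elements (each pair below plays the role of
# one SIMD "lane"). Pure-Python analogy only -- no real GPU hardware.

def simd_add(a, b):
    """Apply a single 'add' instruction across every lane at once."""
    assert len(a) == len(b)
    return [x + y for x, y in zip(a, b)]

def scalar_add(a, b):
    """Scalar (SISD) equivalent: one element handled per instruction issue."""
    out = []
    for i in range(len(a)):
        out.append(a[i] + b[i])
    return out

lanes_a = [1.0, 2.0, 3.0, 4.0]
lanes_b = [10.0, 20.0, 30.0, 40.0]
print(simd_add(lanes_a, lanes_b))  # [11.0, 22.0, 33.0, 44.0]
```

Both functions compute the same result; the distinction the sketch draws is architectural, namely how many data elements a single instruction touches, which is the property that makes GPUs attractive for data-parallel workloads such as image processing.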
Bell, G., & Gray, J. (2002). What’s Next in High-Performance Computing? [Article]. Communications of the ACM, 45(2), 91-95.
Fan, Z., Qiu, F., Kaufman, A., & Yoakum-Stover, S. (2004). GPU Cluster for High Performance Computing. Paper presented at the Proceedings of the 2004 ACM/IEEE conference on Supercomputing.
Meuer, H. (2008). The TOP500 Project: Looking Back over 15 Years of Supercomputing Experience. Retrieved from http://www.top500.org/files/TOP500_Looking_back_HWM.pdf
Nair, R. (2002). Effect of Increasing Chip Density on the Evolution of Computer Architectures. IBM Journal of Research & Development, 46(2/3), 11.
Suda, R., Aoki, T., Hirasawa, S., Nukada, A., Honda, H., & Matsuoka, S. (2009). Aspects of GPU for general purpose high performance computing. Paper presented at the Proceedings of the 2009 Asia and South Pacific Design Automation Conference.
Top 500 List – June 2008. (2008). Retrieved September 20, 2009, from http://top500.org/list/2008/06/100