GEMM
The GEMM-based Level 3 BLAS concept uses the fact that the Level 3 BLAS operations can be formulated in terms of the Level 3 operation for general matrix multiply and add, SGEMM, together with some Level 1 and Level 2 BLAS operations. The Level 3 Basic Linear Algebra Subprograms (BLAS) are designed to perform various matrix multiply and triangular system solving computations. Because of the complex hardware organization of advanced computer architectures, developing optimal Level 3 BLAS code is costly and time-consuming. However, it is possible to develop a portable, high-performance Level 3 BLAS library that relies mainly on a highly optimized GEMM, the routine for the general matrix multiply and add operation. With suitable partitioning, all the other Level 3 BLAS can be defined in terms of GEMM and a small amount of Level 1 and Level 2 computations. Our contribution is twofold. First, the model implementations in Fortran 77 of the GEMM-based Level 3 BLAS are structured to effectively reduce data traffic in a memory hierarchy. Second, the GEMM-based Level 3 BLAS performance evaluation benchmark is a tool for evaluating different implementations of the Level 3 BLAS and comparing them with the GEMM-based model implementations.
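The partitioning idea can be illustrated with a triangular solve. The following is a minimal sketch in plain Python, not the Fortran 77 model implementation described above: the names `gemm` and `trsm_lower`, the block size `nb`, and the row-major list-of-lists storage are all assumptions made for illustration. It solves L*X = B for a lower triangular L by handling each small diagonal block with forward substitution (a Level 2 style computation) and pushing all remaining work into one GEMM update per block step.

```python
def gemm(alpha, A, B, beta, C):
    """C := alpha*A*B + beta*C, updating the rows of C in place.

    Hypothetical stand-in for an optimized GEMM; matrices are
    row-major lists of lists.
    """
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]
            C[i][j] = alpha * acc + beta * C[i][j]


def trsm_lower(L, B, nb=2):
    """Solve L*X = B for X, with L lower triangular and non-unit diagonal.

    B is overwritten with X. The block size nb is an illustrative choice.
    """
    n = len(L)
    nrhs = len(B[0])
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)

        # Step 1: forward substitution on the small diagonal block
        # L[k0:k1, k0:k1] (Level 2 style work, O(nb^2) per right-hand side).
        for i in range(k0, k1):
            for j in range(nrhs):
                s = B[i][j]
                for p in range(k0, i):
                    s -= L[i][p] * B[p][j]
                B[i][j] = s / L[i][i]

        # Step 2: GEMM update of the rows below the block:
        # B[k1:, :] := B[k1:, :] - L[k1:, k0:k1] * X[k0:k1, :]
        if k1 < n:
            L21 = [row[k0:k1] for row in L[k1:]]
            X1 = [row[:] for row in B[k0:k1]]
            # B[k1:] is a new outer list that shares the row objects of B,
            # so gemm's in-place row updates are visible in B.
            gemm(-1.0, L21, X1, 1.0, B[k1:])
```

For example, with `L = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [4.0, 5.0, 6.0]]` and `B = [[2.0, 4.0], [10.0, 14.0], [49.0, 64.0]]`, calling `trsm_lower(L, B, nb=2)` leaves `B` equal to `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]`. As the matrix grows relative to `nb`, a growing share of the arithmetic is done inside `gemm`, which is why a single highly tuned GEMM can carry most of the performance.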
References in zbMATH (referenced in 30 articles, 1 standard article)
Showing results 1 to 20 of 30, sorted by year.
- Ranjan, Desh; Savage, John; Zubair, Mohammad: Upper and lower I/O bounds for pebbling r-pyramids (2012)
- Al-Mohy, Awad H.; Higham, Nicholas J.: The complex step approximation to the Fréchet derivative of a matrix function (2010)
- Granat, Robert; Kågström, Bo; Kressner, Daniel: A novel parallel QR algorithm for hybrid distributed memory HPC systems (2010)
- Tomov, Stanimire; Dongarra, Jack; Baboulin, Marc: Towards dense linear algebra for hybrid GPU accelerated manycore systems (2010)
- Granat, Robert; Jonsson, Isak; Kågström, Bo: RECSY and SCASY library software: Recursive blocked and parallel algorithms for Sylvester-type matrix equations with some applications (2009)
- Woodsend, Kristian; Gondzio, Jacek: Hybrid MPI/OpenMP parallel linear support vector machine training (2009)
- Coulaud, O.; Fortin, P.; Roman, J.: High performance BLAS formulation of the multipole-to-local operator in the fast multipole method (2008)
- Kågström, B.; Kressner, D.; Quintana-Ortí, E. S.; Quintana-Ortí, G.: Blocked algorithms for the reduction to Hessenberg-triangular form revisited (2008)
- Grigori, Laura; Li, Xiaoye S.: Towards an accurate performance modeling of parallel sparse factorization (2007)
- Kågström, Bo; Kressner, Daniel: Multishift variants of the QZ algorithm with aggressive early deflation (2006)
- Elmroth, Erik; Gustavson, Fred; Jonsson, Isak; Kågström, Bo: Recursive blocked algorithms and hybrid data structures for dense matrix library software (2004)
- Granat, Robert; Jonsson, Isak; Kågström, Bo: Combining explicit and recursive blocking for solving triangular Sylvester-type matrix equations on distributed memory platforms (2004)
- Irony, Dror; Toledo, Sivan; Tiskin, Alexander: Communication lower bounds for distributed-memory matrix multiplication (2004)
- -: An updated set of basic linear algebra subprograms (BLAS) (2002)
- Andersen, Bjarne S.; Gunnels, John A.; Gustavson, Fred; Waśniewski, Jerzy: A recursive formulation of the inversion of symmetric positive definite matrices in packed storage data format (2002)
- Henry, Greg; Watkins, David; Dongarra, Jack: A parallel implementation of the nonsymmetric QR algorithm for distributed memory architectures (2002)
- Andersen, Bjarne Stig; Waśniewski, Jerzy; Gustavson, Fred G.: A recursive formulation of Cholesky factorization of a matrix in packed storage (2001)
- Bilardi, Gianfranco; D’Alberto, Paolo; Nicolau, Alex: Fractal matrix multiplication: A case study on portability of cache performance (2001)
- Whaley, R. Clint; Petitet, A.; Dongarra, J. J.: Automated empirical optimizations of software and the ATLAS project (2001)
- Gunnels, John A.; Gustavson, Fred G.; Henry, Greg M.; van de Geijn, Robert A.: FLAME: formal linear algebra methods environment (2001)