Leveraging index compression techniques to optimize the use of co-processors

Authors

  • Manuel Freire Instituto de Computación (InCo), Facultad de Ingeniería, Universidad de la República, Uruguay https://orcid.org/0000-0002-4887-8918
  • Raul Marichal Instituto de Computación (InCo), Facultad de Ingeniería, Universidad de la República, Uruguay https://orcid.org/0000-0003-4974-0803
  • Agustin Martinez Instituto de Computación (InCo), Facultad de Ingeniería, Universidad de la República, Uruguay
  • Daniel Padron Instituto de Computación (InCo), Facultad de Ingeniería, Universidad de la República, Uruguay
  • Ernesto Dufrechou Instituto de Computación (InCo), Facultad de Ingeniería, Universidad de la República, Uruguay https://orcid.org/0000-0003-4971-340X
  • Pablo Ezzatti Instituto de Computación (InCo), Facultad de Ingeniería, Universidad de la República, Uruguay https://orcid.org/0000-0002-2368-8907

DOI:

https://doi.org/10.24215/16666038.24.e01

Keywords:

blocked formats, matrix storage reduction, memory access, reordering techniques, sparse matrices

Abstract

The significant presence of many-core devices such as GPUs, together with their enormous computational power, motivates the study of sparse matrix operations on this hardware. The essential sparse kernels in scientific computing, such as the sparse matrix-vector multiplication (SpMV), usually have many different high-performance GPU implementations. Sparse matrix problems typically involve memory-bound operations, a characteristic that is particularly limiting on massively parallel processors. This work revisits the main ideas behind reducing the volume of data required by sparse storage formats and advances the understanding of several compression techniques. In particular, we study the use of index compression combined with sparse matrix reordering techniques in CSR, and we explore other approaches based on a blocked format. A systematic experimental evaluation on a large set of real-world matrices confirms that this approach achieves meaningful reductions in data storage. Additionally, we find promising results regarding the impact of the storage reduction on execution time when using accelerators to run these mathematical kernels.
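
As a rough illustration of the index compression idea discussed above, the sketch below delta-encodes the column indices of a CSR matrix and runs the SpMV directly on the compressed structure. This is a minimal, hypothetical C++ example of the general technique, not the format evaluated in the paper; the names (first_col, delta) are invented for the illustration, and it assumes every column gap within a row fits in one byte.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
        // A 4x6 sparse matrix in CSR form: row pointers, column indices, values.
        std::vector<int>    row_ptr = {0, 3, 5, 8, 10};
        std::vector<int>    col_idx = {0, 2, 5, 1, 3, 0, 2, 4, 3, 5};
        std::vector<double> val     = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        std::vector<double> x = {1, 1, 1, 1, 1, 1}, y(4, 0.0);

        // Compress the index structure: keep the first column of each row
        // verbatim and store the gaps between consecutive columns as 8-bit
        // deltas (valid only while every gap is below 256; a real format
        // needs an escape mechanism for wider gaps).
        std::vector<int>          first_col;
        std::vector<std::uint8_t> delta;
        for (std::size_t r = 0; r + 1 < row_ptr.size(); ++r)
            for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
                if (k == row_ptr[r])
                    first_col.push_back(col_idx[k]);
                else
                    delta.push_back(static_cast<std::uint8_t>(col_idx[k] - col_idx[k - 1]));

        // SpMV directly on the compressed indices: each column index is
        // reconstructed on the fly by accumulating the deltas.
        std::size_t d = 0;
        for (std::size_t r = 0; r + 1 < row_ptr.size(); ++r) {
            int col = 0;
            for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k) {
                col = (k == row_ptr[r]) ? first_col[r] : col + delta[d++];
                y[r] += val[k] * x[col];
            }
        }
        std::printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);  // expected: 6 9 21 19
        return 0;
    }

With 32-bit indices, plain CSR spends 4 bytes per nonzero on col_idx, whereas the delta form spends 4 bytes per row plus 1 byte per remaining nonzero. This also hints at why reordering matters: permutations that cluster the nonzeros of each row shrink the gaps, making narrow delta widths applicable to more matrices.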



Published

2024-04-22

Section

Original Articles

How to Cite

[1] "Leveraging index compression techniques to optimize the use of co-processors", JCS&T, vol. 24, no. 1, p. e01, Apr. 2024, doi: 10.24215/16666038.24.e01.
