Performance Portable GPU Code Generation for Matrix Multiplication
Toomas Remmelg, Thibaut Lutz, Michel Steuwer, and Christophe Dubach
Published in Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, GPGPU@PPoPP 2016