Parallel GEMM

Sep 4, 2013 at 1:47 AM
Edited Sep 4, 2013 at 2:00 AM
I have 4 Titan GPUs and I want to run GEMM on different data on each of them in parallel. GEMM is blocking, so I can't just call it on each GPU in sequence without serializing the work.

Do I need to create 4 BLAS objects
GPGPUBLAS blas1 = GPGPUBLAS.Create(gpu1);
GPGPUBLAS blas2 = GPGPUBLAS.Create(gpu2);
GPGPUBLAS blas3 = GPGPUBLAS.Create(gpu3);
GPGPUBLAS blas4 = GPGPUBLAS.Create(gpu4);
and call blasN.GEMM from 4 different threads?
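using System.Threading;

private void main()
{
    // One worker thread per GPU; each thread receives its own GPGPUBLAS object.
    Thread gemmThread1 = new Thread(GEMM_Thread);
    Thread gemmThread2 = new Thread(GEMM_Thread);
    Thread gemmThread3 = new Thread(GEMM_Thread);
    Thread gemmThread4 = new Thread(GEMM_Thread);

    // Pass each BLAS object to its thread (blas1..blas4 as created above).
    gemmThread1.Start(blas1);
    gemmThread2.Start(blas2);
    gemmThread3.Start(blas3);
    gemmThread4.Start(blas4);

    // Block until every worker has finished instead of spinning on IsAlive.
    gemmThread1.Join();
    gemmThread2.Join();
    gemmThread3.Join();
    gemmThread4.Join();
}

private void GEMM_Thread(object blasObj)
{
    GPGPUBLAS blas = (GPGPUBLAS)blasObj;
    // ... call blas.GEMM(...) on this GPU's data ...
}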
Or is there a more elegant solution?
Sep 4, 2013 at 7:22 AM
That's quite a set-up you've got there! Would love to know more about what you are doing...
As far as I know, you do indeed need separate BLAS objects, since each one is associated with one GPU. You could put your objects in an array or list and cycle through them in a loop.
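A minimal sketch of the "array of BLAS objects plus a loop" idea, using `Parallel.For` so the loop body runs concurrently and the call returns only once every GEMM has finished. It assumes one BLAS object per GPU; `FakeGpuBlas` is a hypothetical stand-in (a naive CPU SGEMM, row-major, C = alpha*A*B + beta*C) so the sketch runs without a GPU or Cudafy. In real code each element would be the `GPGPUBLAS` created for its own GPU, and whether GPGPUBLAS tolerates this many concurrent contexts is exactly what is in question in this thread.

```csharp
using System;
using System.Threading.Tasks;

// HYPOTHETICAL stand-in for one per-GPU GPGPUBLAS object.
// Gemm computes C = alpha * A * B + beta * C on the CPU (row-major storage).
class FakeGpuBlas
{
    public int DeviceId { get; }
    public FakeGpuBlas(int deviceId) { DeviceId = deviceId; }

    public void Gemm(int m, int n, int k, float alpha, float[] a, float[] b, float beta, float[] c)
    {
        for (int i = 0; i < m; i++)
            for (int j = 0; j < n; j++)
            {
                float sum = 0f;
                for (int p = 0; p < k; p++)
                    sum += a[i * k + p] * b[p * n + j];
                c[i * n + j] = alpha * sum + beta * c[i * n + j];
            }
    }
}

static class MultiGpuGemm
{
    // Runs one n x n GEMM per BLAS object concurrently and returns each C matrix.
    public static float[][] RunAll(FakeGpuBlas[] blasList, int n)
    {
        var results = new float[blasList.Length][];
        Parallel.For(0, blasList.Length, i =>
        {
            // Each worker owns its inputs: A = identity, B filled with (i + 1), so C == B.
            var a = new float[n * n];
            var b = new float[n * n];
            var c = new float[n * n];
            for (int r = 0; r < n; r++) a[r * n + r] = 1f;
            for (int e = 0; e < n * n; e++) b[e] = i + 1;
            blasList[i].Gemm(n, n, n, 1f, a, b, 0f, c);
            results[i] = c; // distinct index per iteration, so no lock is needed
        });
        return results; // Parallel.For returns only after all iterations complete
    }

    static void Main()
    {
        var blasList = new FakeGpuBlas[4];
        for (int d = 0; d < blasList.Length; d++) blasList[d] = new FakeGpuBlas(d);

        float[][] results = RunAll(blasList, 3);
        for (int d = 0; d < results.Length; d++)
            Console.WriteLine($"device {blasList[d].DeviceId}: C[0] = {results[d][0]}");
    }
}
```

Compared with hand-managed threads, `Parallel.For` removes both the per-thread boilerplate and the busy-wait, and it scales to however many GPUs are in the array.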
Sep 5, 2013 at 8:53 AM
Unfortunately, because only one context is allowed at a time, we can't use multiple GPUs with GPGPUBLAS.
Sep 5, 2013 at 4:57 PM
That seems... odd. Can you elaborate?
Sep 5, 2013 at 6:18 PM
Take a look at the multi-threaded unit tests in Cudafy.Host.UnitTests project.
Sep 9, 2013 at 1:09 PM
Edited Sep 9, 2013 at 1:10 PM
Unfortunately, I think it's a problem with GEMM. I revised my code based on the multi-GPU tests, and now the GEMM call does nothing. I ended up writing my own interop for GEMM :/