Cublas matrix multiplication

Jan 13, 2013 at 5:22 PM

Has anyone succeeded in using cuBLAS GEMM for matrix multiplication?

I only get a NotImplementedException.

I would be really happy if anyone could give me a clue.

/Javerberg

************ Code ***********

public static void TestMatrixMultiplication()
{
    GPGPU gpu = CudafyHost.GetDevice(CudafyModes.Target);
    GPGPUBLAS blas = GPGPUBLAS.Create(gpu);

    int matrixAHeight = 3;
    int matrixAWidth = 3;
    float[] matrixA = new[] { -0.9f, -0.8f, -0.7f, -0.6f, -0.5f, -0.4f, -0.3f, -0.2f, -0.1f };

    int matrixBHeight = 3;
    int matrixBWidth = 3;
    float[] matrixB = new[] { -0.9f, -0.8f, -0.7f, -0.6f, -0.5f, -0.4f, -0.3f, -0.2f, -0.1f };
           
    int matrixCHeight = 3;
    int matrixCWidth = 3;
    float[] matrixC = new float[matrixCWidth * matrixCHeight];

    // allocate the memory on the GPU
    float[] devA = gpu.Allocate<float>(matrixAHeight * matrixAWidth);
    float[] devB = gpu.Allocate<float>(matrixBHeight * matrixBWidth);
    float[] devC = gpu.Allocate<float>(matrixCHeight * matrixCWidth);

    // copy the arrays to the GPU
    gpu.CopyToDevice(matrixA, devA);
    gpu.CopyToDevice(matrixB, devB);
    gpu.CopyToDevice(matrixC, devC);

    float alpha = 1.0f;
    float beta = 0.0f;
    int m = matrixAHeight;  //number of rows of matrix op(A) and C.
    int n = matrixBWidth;    //number of columns of matrix op(B) and C.
    int k = matrixAWidth;    //number of columns of op(A) and rows of op(B).
    int lda = matrixAWidth; //leading dimension of two-dimensional array used to store the matrix A.
    int ldb = matrixBWidth; //leading dimension of two-dimensional array used to store matrix B.
    int ldc = matrixCWidth; //leading dimension of a two-dimensional array used to store the matrix C.

    blas.GEMM(m, n, k, alpha, devA, devB, beta, devC,
        cublasOperation.N, cublasOperation.N, lda, ldb, ldc);

    gpu.CopyFromDevice(devC, matrixC);

    for (int i = 0; i < 9; i++)
    {
        Console.WriteLine(matrixC[i]);
    }
}

Jan 17, 2013 at 11:50 PM

Sorry for the late reply.

What is the build target of your CUDAfy project, 32-bit or 64-bit? The BLAS level 2, level 3 and SPARSE routines in CUDAfy currently support only 64-bit; 32-bit is not supported yet. Try 64-bit. 32-bit support is a work in progress :)

Jan 18, 2013 at 7:00 PM

Thanks a lot!

Changed to 64 bits, and it works!

 

Feb 6, 2013 at 9:32 AM
Hi guys, I'm facing the same problem with the example above. When I call the method I get a NotImplementedException: the method GEMM is not implemented. What do you mean by the build target?
Feb 6, 2013 at 9:36 AM
You need to use a 64-bit Windows OS. In the BUILD menu in VS 2012, click Configuration Manager and change the platform from x86 to x64.
Feb 6, 2013 at 9:45 AM
Thanks for the speedy reply. I have changed it but I am still having the same problem.
Feb 6, 2013 at 10:03 AM
Check your active solution configuration in Configuration Manager (Debug or Release). The platform setting is stored separately for each solution configuration, so you need to change it for both.
Feb 6, 2013 at 10:31 AM
Both have been set to x64 and it's not working.
Feb 6, 2013 at 11:29 PM
Which type of CUDAfy are you using? If it is the pre-built binary (from the Download section), try building CUDAfy from source for the x64 platform. Source code can be downloaded from here.
Feb 7, 2013 at 11:50 AM
I have tried that and I still get the same error. Is it possible that CUDAfy is not seeing the cuBLAS library?
Feb 7, 2013 at 12:07 PM
If the cuBLAS library were missing, the program would throw a different exception, not NotImplementedException.

NotImplementedException is thrown by BLAS when the program is running in 32-bit mode. Are both CUDAfy and your project set to the x64 platform?
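A quick way to rule out the 32-bit case is to check the process bitness at startup. This is only a minimal sketch, assuming .NET 4.0 or later (where Environment.Is64BitProcess is available); BlasPreflight and Check64Bit are made-up names for illustration and are not part of CUDAfy.

```csharp
using System;

// Hypothetical helper, not part of CUDAfy.
static class BlasPreflight
{
    // Returns null when the process is 64-bit, otherwise a message describing the problem.
    public static string Check64Bit()
    {
        if (Environment.Is64BitProcess)
            return null;

        return "Process is running as 32-bit; per this thread, " +
               "CUDAfy BLAS level 2/3 and SPARSE routines need an x64 build.";
    }
}
```

Calling BlasPreflight.Check64Bit() before GPGPUBLAS.Create and printing any non-null message shows whether the process really runs as 64-bit. If it reports 32-bit for an AnyCPU build, the "Prefer 32-bit" project option in VS 2012 may be worth checking.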
Feb 7, 2013 at 12:20 PM
So I downloaded CUDAfy again and compiled it in x64 debug mode. I then copied your example, pasted it into the Program file inside CudafyByExample, and called that method inside the try block of Main. All the other examples run perfectly. I must also add that I don't have a GPU card yet, so I'm using the emulator.
Feb 7, 2013 at 12:22 PM
Edited Feb 7, 2013 at 12:32 PM
Oh, I got it. BLAS and SPARSE need an actual device, since they call the CUDA libraries. They are not implemented for the CPU emulator yet :)
Feb 7, 2013 at 12:38 PM
Thanks for that. Do you know how much performance you actually lose by going through CUDAfy compared to coding in CUDA from C++?
Feb 7, 2013 at 12:47 PM
CUDAfy's BLAS and SPARSE routines call the native C++ functions using DllImport, so I think there is only a very slight difference between C# and C++ (in pure calculation speed, not overall speed).

But .NET programs are compiled to machine code while the program is running (by the just-in-time compiler), so in general the first run of a C# program is slower than a native C++ program.

If you need actual performance test results, I recommend creating a new thread to get other users' opinions.
Apr 11, 2013 at 9:46 AM
Hi,

In this same example, when I change the sizes of the matrices, more precisely when I change the lines above to

int matrixAHeight = 2;
int matrixAWidth = 2;
//float[] matrixA = new[] { -0.9f, -0.8f, -0.7f, -0.6f, -0.5f, -0.4f, -0.3f, -0.2f, -0.1f };
float[] matrixA = new[] { -0.9f, -0.8f, -0.7f, -0.6f };

int matrixBHeight = 2;
int matrixBWidth = 6;
//float[] matrixB = new[] { -0.9f, -0.8f, -0.7f, -0.6f, -0.5f, -0.4f, -0.3f, -0.2f, -0.1f, 0.5f, 0.5f, 0.5f };
float[] matrixB = new[] { -0.9f, -0.8f, -0.7f, -0.6f, -0.5f, -0.4f, -0.3f, -0.2f, -0.1f, 0.5f, 0.5f, 0.5f };

int matrixCHeight = 2;
int matrixCWidth = 6;
float[] matrixC = new float[matrixCWidth * matrixCHeight];

I get zeros and NaNs as output. Is there anything I did wrong?
Apr 11, 2013 at 1:32 PM
The sample code's GEMM parameters are wrong.

The prototype of GEMM in the CUDAfy.NET cuBLAS wrapper is:

public abstract void GEMM(int m, int k, int n, float alpha, float[] A, float[] B, float beta, float[] C, cublasOperation transa = cublasOperation.N, cublasOperation transb = cublasOperation.N, int lda = 0, int ldb = 0, int ldc = 0);

The order of the matrix dimensions is M, K, N; you passed them as M, N, K. (This differs from the usual BLAS convention. Sorry :) )

You have to change the call from

blas.GEMM(m, n, k, alpha, devA, devB, beta, devC,
    cublasOperation.N, cublasOperation.N, lda, ldb, ldc);
to

blas.GEMM(m, k, n, alpha, devA, devB, beta, devC,
    cublasOperation.N, cublasOperation.N);
(lda, ldb and ldc are normally calculated automatically, so you can omit these parameters.)
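To make the argument order concrete, here is a small sketch of how the dimensions relate for C = alpha * A * B + beta * C with no transposes. It assumes cuBLAS's column-major storage, where the leading dimension of a non-transposed matrix is its number of rows. GemmShape is a made-up helper for illustration, not a CUDAfy type, and it is not a statement of how the wrapper computes its defaults internally.

```csharp
using System;

// Hypothetical helper (not a CUDAfy type) describing a non-transposed GEMM.
sealed class GemmShape
{
    // A is M x K, B is K x N, C is M x N.
    public readonly int M, K, N;

    public GemmShape(int m, int k, int n) { M = m; K = k; N = n; }

    // Number of elements each flat array must hold.
    public int LengthA { get { return M * K; } }
    public int LengthB { get { return K * N; } }
    public int LengthC { get { return M * N; } }

    // Column-major leading dimension = row count of the stored matrix.
    public int Lda { get { return M; } }
    public int Ldb { get { return K; } }
    public int Ldc { get { return M; } }
}
```

For the 2x2 times 2x6 case above, new GemmShape(2, 2, 6) gives array lengths 4, 12 and 12, and the dimensions would be passed to the wrapper in the order shape.M, shape.K, shape.N. Note that the original sample set lda, ldb and ldc to the matrix widths, which only happens to match for square 3x3 matrices.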
Apr 12, 2013 at 1:13 PM
Edited Apr 12, 2013 at 1:14 PM
Hi,

Thanks for the response. I have now tried this:
double[] check = new double[] { 1, 2, 3, 4 };
double[] checkDev = _gpu.Allocate(check);
_gpu.CopyToDevice(check, checkDev);
double[] numbers = new double[] { 1, 2, 3, 4, 1, 2, 3, 4 };
double[] numbersDev = _gpu.Allocate(numbers);
double[] resultCheckDev = _gpu.Allocate(numbers);
double[] res = new double[8];
_gpu.CopyToDevice(numbers, numbersDev);
blas.GEMM(2, 2, 4, 1, checkDev, numbersDev, 0, resultCheckDev, cublasOperation.N, cublasOperation.N);
_gpu.CopyFromDevice(resultCheckDev, res);

I am trying to compute

[1 2]   [1 2 3 4]
[3 4] x [1 2 3 4]

Which should be
3 6 9 12
7 14 21 28

But I get something totally different from the code above.
Apr 12, 2013 at 1:24 PM
cuBLAS uses column-major matrix format. This means that the elements of a matrix are stored column by column (as "vertical" vectors), not row by row (as "horizontal" vectors). So your double[]{1, 2, 3, 4} represents the matrix
[1, 3]
[2, 4]

and {1,2,3,4,1,2,3,4} represents
[1, 3, 1, 3]
[2, 4, 2, 4]

so the result
[7, 15, 7, 15]
[10, 22, 10, 22]
is totally correct, not
[3, 6, 9, 12]
[7, 14, 21, 28]

You can check this article for the difference between row-major and column-major format: http://en.wikipedia.org/wiki/Row-major_order#Column-major_order
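The layout difference can be checked on the CPU without any GPU. The following is just a sketch: ColumnMajor, FromRowMajor and Multiply are illustrative names, not part of CUDAfy or cuBLAS. It converts a row-by-row array into column-by-column storage and multiplies two column-major matrices the way a non-transposed GEMM with alpha = 1 and beta = 0 would.

```csharp
using System;

// Illustrative helper, not part of CUDAfy or cuBLAS.
static class ColumnMajor
{
    // Converts a rows x cols matrix stored row by row into column-by-column storage.
    public static double[] FromRowMajor(double[] rowMajor, int rows, int cols)
    {
        var result = new double[rows * cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                result[c * rows + r] = rowMajor[r * cols + c];
        return result;
    }

    // CPU reference for C = A * B with all three matrices in column-major order.
    // A is m x k, B is k x n, C is m x n.
    public static double[] Multiply(double[] a, double[] b, int m, int k, int n)
    {
        var c = new double[m * n];
        for (int col = 0; col < n; col++)
        {
            for (int row = 0; row < m; row++)
            {
                double sum = 0;
                for (int i = 0; i < k; i++)
                    sum += a[i * m + row] * b[col * k + i];   // A(row, i) * B(i, col)
                c[col * m + row] = sum;
            }
        }
        return c;
    }
}
```

Multiplying the arrays exactly as posted, ColumnMajor.Multiply(new double[] { 1, 2, 3, 4 }, new double[] { 1, 2, 3, 4, 1, 2, 3, 4 }, 2, 2, 4), yields { 7, 10, 15, 22, 7, 10, 15, 22 }, which is the matrix [7, 15, 7, 15; 10, 22, 10, 22] read column by column. Passing both inputs through FromRowMajor first (giving { 1, 3, 2, 4 } and { 1, 1, 2, 2, 3, 3, 4, 4 }) yields { 3, 7, 6, 14, 9, 21, 12, 28 }, which is the intended [3, 6, 9, 12; 7, 14, 21, 28]. The result array is also column-major, so it has to be read back the same way.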