Feb 20, 2012 at 4:47 PM


Does anybody have a sample code for Cudafy.Maths.BLAS.GPGPUBLAS?



Feb 27, 2012 at 4:00 AM

You can find sample BLAS code in BLAS1_1D.cs, BLAS2.cs and BLAS3.cs in Cudafy.Math.UnitTests.

Here is a very simple sample that calls the BLAS routines.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using Cudafy;
using Cudafy.Host;
using Cudafy.Maths.BLAS;

namespace CUDAfySample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Get GPU device
            GPGPU gpu = CudafyHost.GetDevice(CudafyModes.Target);

            // Create GPGPUBLAS (CUBLAS wrapper)
            GPGPUBLAS blas = GPGPUBLAS.Create(gpu);

            // Prepare sample data
            Random rand = new Random();
            int n = 500;
            double[] cpuVectorX = new double[n];
            double[] cpuVectorY = new double[n];
            double[] cpuMatrixA = new double[n * n];

            for (int i = 0; i < n; i++)
            {
                cpuVectorX[i] = rand.Next(100);
                cpuVectorY[i] = rand.Next(100);
            }

            for (int i = 0; i < n * n; i++)
            {
                cpuMatrixA[i] = rand.Next(100);
            }

            // Copy CPU memory to GPU memory.
            // Before using GPGPUBLAS, you have to copy the data from the CPU to the GPU.
            double[] gpuVectorX = gpu.CopyToDevice(cpuVectorX);
            double[] gpuVectorY = gpu.CopyToDevice(cpuVectorY);
            double[] gpuMatrixA = gpu.CopyToDevice(cpuMatrixA);

            // BLAS1 sample: y = x + y (alpha = 1.0)
            blas.AXPY(1.0, gpuVectorX, gpuVectorY);

            // BLAS2 sample: y = Ax + y (alpha = 1.0, beta = 1.0)
            blas.GEMV(n, n, 1.0, gpuMatrixA, gpuVectorX, 1.0, gpuVectorY);

            // Get the result back from the GPU
            gpu.CopyFromDevice<double>(gpuVectorY, cpuVectorY);
            // Now you can use the result in cpuVectorY for any other purpose.
        }
    }
}
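
To check what comes back from the GPU, it can help to compute the same two operations on the CPU. The following is only a sketch and not part of CUDAfy: the class and method names are made up for illustration, and it assumes the matrix is stored column-major, which is the convention CUBLAS itself uses.

```csharp
using System;

// Hypothetical CPU reference for the two BLAS calls used in the sample above.
public static class BlasReference
{
    // AXPY: y = alpha * x + y
    public static void Axpy(double alpha, double[] x, double[] y)
    {
        for (int i = 0; i < x.Length; i++)
        {
            y[i] += alpha * x[i];
        }
    }

    // GEMV: y = alpha * A * x + beta * y
    // A has m rows and n columns and is assumed to be stored column-major,
    // so element (row, col) lives at a[col * m + row].
    public static void Gemv(int m, int n, double alpha, double[] a,
                            double[] x, double beta, double[] y)
    {
        for (int row = 0; row < m; row++)
        {
            double sum = 0.0;
            for (int col = 0; col < n; col++)
            {
                sum += a[col * m + row] * x[col];
            }
            y[row] = alpha * sum + beta * y[row];
        }
    }

    // Largest absolute element-wise difference between two vectors.
    public static double MaxAbsDiff(double[] u, double[] v)
    {
        double max = 0.0;
        for (int i = 0; i < u.Length; i++)
        {
            max = Math.Max(max, Math.Abs(u[i] - v[i]));
        }
        return max;
    }
}
```

To use it with the sample, clone cpuVectorY before the GPU calls, run Axpy and then Gemv on the clone with the same arguments, and compare it with the cpuVectorY you copy back using MaxAbsDiff.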

Aug 3, 2012 at 6:50 AM
Edited Aug 3, 2012 at 8:22 AM


I am receiving the following error while trying to run this example:

Unable to load DLL 'cublas32_42_9': The specified module could not be found. (Exception from HRESULT: 0x8007007E)

Any suggestions? I am running with 4.2 toolkit and 4.2 SDK. I have Cudafy.NET.dll in my references and it is v.1.9.

Any help would be much appreciated. This looks like a DLL dependency problem, but I cannot figure out what I should do to fix it. Do I need an additional DLL in order to make it work?



Aug 3, 2012 at 9:54 AM


If you get a DLL exception, please check that your CUDA toolkit version is 4.2 and that your system path contains the toolkit's bin folder. By default, the CUDA toolkit is installed at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin (in my environment it is v4.1 instead of v4.2, because that is my toolkit version). Please also check that cublas32_42_9.dll is in that folder.

PS. Cudafy's BLAS level 2, level 3 and SPARSE routines are currently supported on 64-bit only. Be careful :)
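
The checks above can also be done from inside the program. This is a rough sketch under some assumptions: DllCheck and its methods are hypothetical helper names, and cudart32_42_9.dll is only my guess at a DLL that cublas32_42_9.dll may depend on. Error 0x8007007E can also show up when a dependency of the named DLL is missing, so it is worth looking for more than one file.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper; not part of CUDAfy.
public static class DllCheck
{
    // Returns the full path of the first match for fileName in the
    // directories listed in pathVariable, or null if it is not found.
    public static string FindOnPath(string fileName, string pathVariable)
    {
        string[] dirs = pathVariable.Split(
            new[] { Path.PathSeparator }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string dir in dirs)
        {
            string candidate;
            try
            {
                candidate = Path.Combine(dir.Trim(), fileName);
            }
            catch (ArgumentException)
            {
                continue; // skip malformed PATH entries
            }
            if (File.Exists(candidate))
            {
                return candidate;
            }
        }
        return null;
    }

    // Prints the process bitness and where each DLL is found on PATH.
    public static void Report(params string[] dllNames)
    {
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
        string path = Environment.GetEnvironmentVariable("PATH") ?? "";
        foreach (string name in dllNames)
        {
            string found = FindOnPath(name, path);
            Console.WriteLine(name + ": " + (found ?? "NOT FOUND on PATH"));
        }
    }
}
```

Calling DllCheck.Report("cublas32_42_9.dll", "cudart32_42_9.dll") just before GPGPUBLAS.Create(gpu) shows whether the process sees the same PATH you checked by hand, and whether it is running as 32-bit or 64-bit.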

Aug 3, 2012 at 10:02 AM
Edited Aug 3, 2012 at 10:05 AM


Thank you for your quick response.

However, I checked everything you listed and it all seems to be fine. My CUDA toolkit version is in fact 4.2, the system path points to the right folder, and the DLL does exist in that folder.

I am considering whether I should reinstall the whole environment.

If you have any other suggestions I would be grateful :).

And to be more precise, the exception is thrown by the GPGPUBLAS.Create(gpu) call.



Aug 3, 2012 at 10:06 AM

flipordie: I suggest that you copy cublas32_42_9.dll to your working folder (the same place as your program binary).

Aug 3, 2012 at 10:07 AM

Done already :(.

Aug 3, 2012 at 10:09 AM

Wondering if the problem could be my Quadro 600 graphic card. It shouldn't be though, right?

Aug 3, 2012 at 10:10 AM

My system has both CUDA and the toolkit at 4.1, so I can't test the latest version of Cudafy. Sorry :(

Aug 3, 2012 at 10:11 AM

Alrighty. Well out of curiosity I am going to try it with CUDA 4.1 :). Nevertheless, thanks for your time.

Aug 3, 2012 at 10:11 AM

The Quadro 600 has CUDA cores too, so I don't think your GPU is the problem.

Aug 3, 2012 at 10:13 AM

If you are using the latest Cudafy, you need CUDA toolkit 4.2, because Cudafy's internal references are all to the 4.2 DLLs.

Aug 3, 2012 at 10:15 AM

Ah, damnit. Well, anyhow, gotto try to find a fix. :).

Oct 3, 2013 at 2:49 PM
Hi guys, can someone help me with generating random numbers from the Poisson distribution through Cudafy, now that CUDA 5 and 5.5 have it?
Oct 3, 2013 at 2:58 PM
Cudafy's curand wrapper does not have a wrapper for the curandGeneratePoisson function yet, so you can't use that function for now.
Oct 3, 2013 at 3:03 PM
When do you think we can expect to have that function? Do you know of a way to do it, perhaps?
Oct 3, 2013 at 3:11 PM
Edited Oct 3, 2013 at 3:13 PM
CUDAfy is an open source library, so if you need that function, you can add a curandGeneratePoisson wrapper yourself.

Here is a guide for adding a wrapper.
  1. Add the driver function to Cudafy.Math/RAND/CURANDDriver.cs
    You can see a lot of curand function wrappers there; simply imitate them for the curandGeneratePoisson function.
    You can check the details of the original curand Poisson function prototype here :
  2. Add the interface function to Cudafy.Math/RAND/GPGPURAND.cs
  3. Add the implementation of the interface function to Cudafy.Math/RAND/CudaRAND.cs
You can expose any CURAND function of the original CUDA library this way.
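
As a rough illustration of step 1, here is what a P/Invoke declaration for the native function might look like. It is a standalone sketch, not CUDAfy's actual code: the class name, the helper method and the DLL name "curand64_55" (assumed for a 64-bit CUDA 5.5 install) are my assumptions, and it uses plain IntPtr and int where CUDAfy's driver file uses its own handle and status types. The native prototype takes a generator handle, a device pointer to unsigned int output, the element count as size_t, and lambda as a double.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical standalone sketch; CUDAfy's CURANDDriver.cs has its own types.
public static class CurandPoissonSketch
{
    // Assumption: DLL name for a 64-bit CUDA 5.5 toolkit. Adjust to your install.
    private const string CurandDll = "curand64_55";

    // Native prototype from curand.h:
    // curandStatus_t curandGeneratePoisson(curandGenerator_t generator,
    //     unsigned int *outputPtr, size_t n, double lambda);
    [DllImport(CurandDll)]
    public static extern int curandGeneratePoisson(
        IntPtr generator, IntPtr outputPtr, UIntPtr n, double lambda);

    // Thin managed helper: validates the count, calls the native function
    // and throws if the status is not CURAND_STATUS_SUCCESS (0).
    public static void GeneratePoisson(
        IntPtr generator, IntPtr devicePtr, int count, double lambda)
    {
        if (count < 0)
        {
            throw new ArgumentOutOfRangeException("count");
        }
        int status = curandGeneratePoisson(
            generator, devicePtr, (UIntPtr)(uint)count, lambda);
        if (status != 0)
        {
            throw new InvalidOperationException(
                "curandGeneratePoisson failed with status " + status);
        }
    }
}
```

Note that the output buffer holds unsigned 32-bit integers, not floats or doubles, so the device array passed in steps 2 and 3 would have to be a uint array.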
Oct 4, 2013 at 8:17 AM
@phetsa - if you would like to add this code then I can give you access to our internal SVN and help you further. Please email me.
Oct 4, 2013 at 1:06 PM
I've never added anything before, but I will try.
Oct 7, 2013 at 8:33 AM
Can anyone confirm whether I can do this in VS2012? I'm having a problem with some simple tests, like calling one of the rand functions from the CudafyByExample project. It crashes completely. kkc0923, can you please help?
Oct 7, 2013 at 8:59 AM
phetsa, paste your code to , then I will check it.
Oct 7, 2013 at 9:39 AM
Basically I'm getting this error


I followed the steps quite closely.

The exception is being thrown by the method:

protected void SafeCall(curandStatus status, DevicePtrEx ptrEx = null)
{
    if (ptrEx != null)
    {
        // ...
    }
    if (status != curandStatus.CURAND_STATUS_SUCCESS)
        throw new CudafyMathException(CudafyMathException.csRAND_ERROR_X, status.ToString());
}
Oct 7, 2013 at 9:47 AM
I'm fairly sure that CudafyByExample project does not use any NVIDIA random functions. The Cudafy.Host.UnitTests project does. Please provide specific information relating to your problem: source file and project and whether you changed anything from the default CUDAfy source code.
Oct 7, 2013 at 10:05 AM
I added a snippet of code to the file ripple_gpu.cs

/*
 * This software is based upon the book CUDA By Example by Sanders and Kandrot
 * and source code provided by NVIDIA Corporation.
 * It is a good idea to read the book while studying the examples!
 */
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Cudafy;
using Cudafy.Host;
using Cudafy.Maths.RAND;
using Cudafy.Translator;

namespace CudafyByExample
{
    public class ripple_gpu
    {
        public ripple_gpu()
        {
        }

        public const int DIM = 1024;

        private byte[] _dev_bitmap;

        private GPGPU _gpu;

        private dim3 _blocks;

        private dim3 _threads;

        public void Initialize(int bytes)
        {
            CudafyModule km = CudafyTranslator.Cudafy();

            _gpu = CudafyHost.GetDevice(CudafyModes.Target, CudafyModes.DeviceId);
            _dev_bitmap = _gpu.Allocate<byte>(bytes);

            /* Added piece of code */
            double[] devData = _gpu.Allocate<double>(30);
            double[] hostData = new double[30];
            _gpu.CopyToDevice(hostData, devData);
            GPGPURAND rand = GPGPURAND.Create(_gpu);
            _gpu.CopyFromDevice(devData, hostData);

            _blocks = new dim3(DIM / 16, DIM / 16);
            _threads = new dim3(16, 16);
        }

        public void Execute(byte[] resultBuffer, int ticks)
        {
            _gpu.Launch(_blocks, _threads).thekernel(_dev_bitmap, ticks);
            _gpu.CopyFromDevice(_dev_bitmap, resultBuffer);
        }

        public static void thekernel(GThread thread, byte[] ptr, int ticks)
        {
            // map from threadIdx/BlockIdx to pixel position
            int x = thread.threadIdx.x + thread.blockIdx.x * thread.blockDim.x;
            int y = thread.threadIdx.y + thread.blockIdx.y * thread.blockDim.y;
            int offset = x + y * thread.blockDim.x * thread.gridDim.x;

            // now calculate the value at that position
            float fx = x - DIM / 2;
            float fy = y - DIM / 2;
            float d = GMath.Sqrt(fx * fx + fy * fy);
            //float d = thread.sqrtf(fx * fx + fy * fy);
            byte grey = (byte)(128.0f + 127.0f * GMath.Cos(d / 10.0f - ticks / 7.0f) /
                                                 (d / 10.0f + 1.0f));
            ptr[offset * 4 + 0] = grey;
            ptr[offset * 4 + 1] = grey;
            ptr[offset * 4 + 2] = grey;
            ptr[offset * 4 + 3] = 255;
        }

        public void ShutDown()
        {
        }
    }
}

Thanks in advance
Oct 7, 2013 at 10:38 AM
CUDAfy does not contain a function for GeneratePoisson!
Oct 7, 2013 at 11:04 AM
I have added it, but even the other functions, like GenerateNormal, return the same error.