CUDA.NET exception: ErrorNoBinaryForGPU (Ensure that compiled architecture version is suitable for device).

Jul 2, 2013 at 9:17 PM
Hi,

I have been using Cudafy V1.22 without issues on my Win7 64-bit machine when I compile my VS2010 Professional C# apps as x86.

Today I have been trying to change my C# .NET 4 CUDA app to 64-bit so I can load more data onto the GPU (I run into the x86 process memory limit when reading large files).

I now get an ErrorNoBinaryForGPU exception when calling LoadModule.

I have the 64-bit CUDA v5.0 toolkit installed. My video card is a GTX 650 Ti. Using the GetDeviceProperties method I can see that the device CUDA capability is 3.0. I have tried calling the CudafyTranslator.Cudafy method with various options. With x86 I use sm_30 without any problem.

During the build in VS I see the following in the output:

Compiler version: v5.0
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\bin\nvcc
-I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include" -m64 -arch=sm_30 "C:\Work\Dev\Cuda\TestApp\TestApp\bin\Debug\CUDAFYSOURCETEMP.cu" -o "C:\Work\Dev\Cuda\TestApp\TestApp\bin\Debug\CUDAFYSOURCETEMP.ptx" --ptx
CUDAFYSOURCETEMP.cu

I'm new to CUDA, so I am a bit lost as to what the problem could be. If I change my app configuration back to x86, the problem disappears.

Any advice or questions to help me solve this are appreciated. Thanks
Coordinator
Jul 3, 2013 at 8:42 AM
Make sure you are not using a cached cudafy module. Can you show the code you use for creating and loading module?
Jul 3, 2013 at 9:07 AM
Hi Nick,

I created a new project from scratch and it worked, so I guess this means that a module from the existing project I was trying to convert must have been cached somewhere.

Can you advise on how the module should be loaded so that a cached copy is not used, or are there any flags that can be set so that the cache is overwritten on each build?

To help others I have posted my working 64-bit app below. If anybody else comes across this problem, they should be able to run the code below without problems (although the capability and grid/block size variables may need changing depending on the GPU).
using System;
using System.Windows.Forms;
using Cudafy;
using Cudafy.Host;
using Cudafy.Translator;

namespace Test64Bit
{
    public partial class FormMain : Form
    {
        public FormMain()
        {
            InitializeComponent();
        }

        private void buttonRunModel_Click(object sender, EventArgs e)
        {
            // create an array to hold the example data
            float[,] dataArray = new float[10, 10];

            // fill the example data
            for (int x = 0; x < 10; x++)
            {
                for (int y = 0; y < 10; y++)
                {
                    dataArray[x, y] = 1.0f;
                }
            }

            // debug our environment
            if (Environment.Is64BitProcess) Console.WriteLine("64 bit");
            else Console.WriteLine("32 bit");

            // translate the class to CUDA C and compile it for the GPU (sm_30)
            CudafyModule km = CudafyTranslator.Cudafy(eArchitecture.sm_30);

            // get the device and unload any existing modules on the device
            GPGPU gpu = CudafyHost.GetDevice(eGPUType.Cuda);
            gpu.UnloadModules();
            
            // get some device properties
            GPGPUProperties gpprop = gpu.GetDeviceProperties(false);
            Console.WriteLine("Device Cuda Capability = " + gpprop.Capability);

            // load the module onto the device
            gpu.LoadModule(km);

            // copy the data array to the gpu
            float[,] dev_dataArray = gpu.CopyToDevice(dataArray);

            // launch gridSize blocks of blockSize threads each
            int gridSize = 512;
            int blockSize = 512;
            gpu.Launch(gridSize, blockSize, "GPUModel_64Bit", dev_dataArray);

            // wait for the GPU to finish all launched work
            gpu.Synchronize();

            // free the memory allocated on the GPU
            gpu.FreeAll();

            // debug we are complete without issues
            Console.WriteLine("Done");
        }

        // the model that we run on the gpu in parallel
        [Cudafy]
        public static void GPUModel_64Bit(GThread thread, float[,] dataArray)
        {
            // compute the global thread index (unused in this minimal example)
            int my_tid = thread.threadIdx.x + thread.blockIdx.x * thread.blockDim.x;
        }
    }
}
Coordinator
Jul 3, 2013 at 3:01 PM
I can only imagine that the checksum for the DLL did not change between builds. I'm not sure how that could happen, unless you had your code in a separate Any CPU DLL.
The next release of CUDAfy will feature tighter checks for architecture and platform to prevent such eventualities.
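
For anyone hitting the same problem in the meantime, one possible workaround is to delete any cached module files before translating, so that a stale x86 module cannot be picked up by a 64-bit build. The following is only a minimal sketch. It assumes the cache lives in *.cdfy files next to the executable, which you should confirm by looking in your own output folder. ModuleCache and DeleteCachedModules are illustrative names and not part of CUDAfy.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of CUDAfy.
public static class ModuleCache
{
    // Deletes every file in 'directory' that matches 'pattern' and returns
    // the number of files removed. The "*.cdfy" default is an assumption
    // about how cached modules are named.
    public static int DeleteCachedModules(string directory, string pattern = "*.cdfy")
    {
        if (!Directory.Exists(directory))
        {
            return 0;
        }

        int deleted = 0;
        foreach (string path in Directory.GetFiles(directory, pattern))
        {
            File.Delete(path);
            deleted++;
        }
        return deleted;
    }
}
```

It could be called once at startup, before CudafyTranslator.Cudafy, for example as ModuleCache.DeleteCachedModules(AppDomain.CurrentDomain.BaseDirectory). The cost is that the module is translated and compiled again on every run.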