Unable to switch to eArchitecture.sm_20 : Exception ErrorNoBinaryForGPU

Mar 16, 2012 at 7:19 PM


I am trying to target architecture 2.0 (x64), but it does not seem to work.  I can build and run on x64 with 1.3 but not with 2.0.  I have downloaded the latest CUDAfy (Version: 1.8.4427.36820, Runtime Version: v4.0.30319).

In my code when I use :

1)  CudafyModule km = CudafyTranslator.Cudafy(ePlatform.x64, eArchitecture.sm_13);   =>  Works well !

2)  CudafyModule km = CudafyTranslator.Cudafy(ePlatform.x64, eArchitecture.sm_20);  => Throws an exception at gpu.LoadModule()
I have also tried: CudafyModule km = CudafyTranslator.Cudafy(ePlatform.x64, eArchitecture.sm_20, new Version(4, 1), true);  No success, same error :-(

GPGPU gpu = CudafyHost.GetDevice(CudafyModes.Target);

Exception:
CUDA.NET exception: ErrorNoBinaryForGPU (Ensure that compiled architecture version is suitable for device).
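
The hint in that message can be read as a simple rule: PTX built for an sm_XY target can only be loaded on a device whose compute capability is at least X.Y. The following is a minimal sketch of that check, assuming the device capability is available as a System.Version. ArchCheck, ParseTarget and CanLoad are hypothetical names for illustration, not part of CUDAfy, and passing this check does not guarantee that a module will load.

```csharp
using System;

// Hypothetical helper (not part of CUDAfy): checks whether PTX built for a
// given "sm_XY" target could be loaded on a device of the given capability.
public static class ArchCheck
{
    // Converts "sm_20" to Version 2.0. Assumes the one-digit-major,
    // one-digit-minor naming seen in this thread (sm_11, sm_13, sm_20, sm_30).
    public static Version ParseTarget(string target)
    {
        if (target == null || !target.StartsWith("sm_", StringComparison.Ordinal))
            throw new ArgumentException("Expected a name like sm_20", "target");
        int n = int.Parse(target.Substring(3));
        return new Version(n / 10, n % 10);
    }

    // PTX for sm_XY needs a device with compute capability >= X.Y.
    public static bool CanLoad(string target, Version deviceCapability)
    {
        Version t = ParseTarget(target);
        return t.Major < deviceCapability.Major
            || (t.Major == deviceCapability.Major && t.Minor <= deviceCapability.Minor);
    }
}
```

For example, ArchCheck.CanLoad("sm_20", new Version(1, 3)) is false, while ArchCheck.CanLoad("sm_13", new Version(2, 0)) is true.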

My environment variables are below; I have verified each of them.

CUDA_BIN_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\bin
CUDA_INC_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\include
CUDA_LIB_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\lib\x64
CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\
CUDA_PATH_V3_2=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2\
CUDA_PATH_V4_0=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\
CUDA_PATH_V4_1=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\
NVTOOLSEXT_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\nvToolsExt\

A bit confused ...

Thx for help.

Mar 18, 2012 at 9:35 PM


Hi, following up on my problem: I downloaded the CUDAfy source code and tried to reproduce it manually:

Looking at the KernelModule.Compile() method, here are the parameters used to create the nvcc process:

"C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v4.1\\bin\\nvcc.exe"

" -I\"C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v4.1\\include\" -m64  -arch=sm_20  \"D:\\DEV\\C#\\Cudafy_src\\Cudafy\\Cudafy.Host.UnitTests\\bin\\x64\\Debug\\CUDAFYSOURCETEMP.cu\"  -o \"D:\\DEV\\C#\\Cudafy_src\\Cudafy\\Cudafy.Host.UnitTests\\bin\\x64\\Debug\\CUDAFYSOURCETEMP.ptx\"  --ptx"


process.StandardError.ReadToEnd() :



 -I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\include " -m64  -arch=sm_20  -G0   "D:\DEV\C#\Cudafy_src\Cudafy\Cudafy.Host.UnitTests\bin\x64\Debug\CUDAFYSOURCETEMP.cu"  -o  "D:\DEV\C#\Cudafy_src\Cudafy\Cudafy.Host.UnitTests\bin\x64\Debug\CUDAFYSOURCETEMP.ptx"  --ptx"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\bin>nvcc  -I "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\include " -m64  -arch=sm_20  -G0   "D:\DEV\C#\Cudafy_src\Cudafy\Cudafy.Host.UnitTests\bin\x64\Debug\CUDAFYSOURCETEMP.cu"  -o  "D:\DEV\C#\Cudafy_src\Cudafy\Cudafy.Host.UnitTests\bin\x64\Debug\CUDAFYSOURCETEMP.ptx"


Later, "ptxModule.PTX" contains:
// Generated by NVIDIA NVVM Compiler
// Compiler built on Fri Jan 13 04:24:03 2012 (1326446643)
// Cuda compilation tools, release 4.1, V0.2.1221
.version 3.0
.target sm_20
.address_size 64
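
The .target line in this header records the architecture the PTX was built for, so it can be checked in code before the module is loaded. Here is a small sketch, assuming the PTX text is available as a string (as ptxModule.PTX is here). PtxHeader and ReadTarget are hypothetical names, not CUDAfy APIs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of CUDAfy): reads the ".target" directive
// from PTX text such as the header shown above.
public static class PtxHeader
{
    // Returns the target name, e.g. "sm_20", or null if no .target
    // directive is found at the start of a line.
    public static string ReadTarget(string ptx)
    {
        if (ptx == null) return null;
        Match m = Regex.Match(ptx, @"^\s*\.target\s+(sm_\d+)", RegexOptions.Multiline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Comparing the returned name with the architecture you asked for is a cheap way to spot a stale or mismatched PTX file.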


The error is still raised in CudaGPU.LoadModule() at these lines:

     try
     {
         CUmodule cumod = _cuda.LoadModule(Encoding.ASCII.GetBytes(ptxModule.PTX));   // <=== raises the error
         _module = module;
         _module.Tag = cumod;
     }
     catch (CUDAException ex)
     {
         HandleCUDAException(ex);  // {"Exception of type 'GASS.CUDA.CUDAException' was thrown."} ErrorNoBinaryForGPU
     }


I will continue to investigate, but at first glance the nvcc compilation looks fine ??!!!


Mar 25, 2012 at 5:15 PM


I gave up and reinstalled everything (including VS2010, the NVIDIA drivers ...) one by one.  Now it works :-(   Man, I lost too much time on this problem :-(

Mar 27, 2012 at 7:57 AM

Glad you sorted it.  These kinds of issues are a killer for productivity.

Oct 5, 2012 at 3:54 PM

I have a similar problem.  ErrorNoBinaryForGPU (Ensure that compiled architecture version is suitable for device).

Mine is that I have one CUDAfy app that works and one that does not.  The one that works has 29 files listed at the beginning of CUDAFYSOURCETEMP.ptx; the one that does not work has 28.  The missing file ends with CUDAFYSOURCETEMP.cudafe1.gpu.  Both have files that end with CUDAFYSOURCETEMP.cudafe2.gpu.  I am not sure what's causing this.


Oct 7, 2012 at 6:21 PM


I have not come across this before. What are the differences between the projects?


Oct 8, 2012 at 8:25 PM

Hi Nick,

This is tied to another thread, but the only difference is calling GMath.Pow.  When I include that call, the CUDAFYSOURCETEMP.cudafe1.gpu file is not generated.

Oct 10, 2012 at 1:01 PM

This file is generated by nvcc, not cudafy.  Is the generated CUDAFYSOURCETEMP.cu correctly using the floating point version of pow?

Dec 3, 2012 at 7:23 AM

This isn't going to help much, but I've been experiencing a similar problem.

I have a single project which I access and build from 2 separate PCs.
One PC has a CUDA card capable of sm_30, whilst the other is only capable of sm_11, so I need to regularly switch back and forth between them (which, at the moment, I am doing manually by amending the code).
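
One way to avoid editing the code on each machine would be to derive the architecture name from the device's reported compute capability at run time. This is a minimal sketch, assuming the capability is available as a System.Version. ArchName and FromCapability are hypothetical helpers, and mapping the resulting string onto CUDAfy's eArchitecture enum is an assumption to verify against your CUDAfy version. Note that a later post in this thread describes a card whose reported capability was lower than expected, so the reported value may need a sanity check.

```csharp
using System;

// Hypothetical helper (not part of CUDAfy): turns a compute capability
// such as 3.0 into an architecture name such as "sm_30".
public static class ArchName
{
    public static string FromCapability(Version capability)
    {
        if (capability == null)
            throw new ArgumentNullException("capability");
        // Assumes single-digit minor versions, as in sm_11, sm_13, sm_20, sm_30.
        if (capability.Major < 1 || capability.Minor > 9)
            throw new ArgumentOutOfRangeException("capability");
        return "sm_" + capability.Major + capability.Minor;
    }
}
```

With this, ArchName.FromCapability(new Version(1, 1)) yields "sm_11" and ArchName.FromCapability(new Version(3, 0)) yields "sm_30", so the same code could run unchanged on both PCs.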

The project is located on a network drive, and I build to that same network location from both PCs.

I've been getting the ErrorNoBinaryForGPU error when I attempt to move from one PC to the other, and rebuild, even though I change the architecture. I tried cleaning the solution, manually deleting all files in the Debug directory etc. Eventually it does work, and I'm able to run the program.

When I switch back to the original PC, I experience the same problem again - and once again eventually it rectifies itself (or my cleaning/deleting files rectifies it).

As I said originally, I don't think this will help much. I don't know exactly what is causing the error, or exactly how to rectify it, but I do think there is a bug there of some sort.

Dec 3, 2012 at 7:58 AM

There are too many factors involved to say what could be happening. Do you copy the binaries between machines, too?  Do you also build for the same platform each time (x64 or x86)?

Do you do caching of cudafy modules?  You should also serialize cudafy modules - you can then open these in the cudafymoduleviewer utility and see the ptx code which also contains the CUDA compute capability.

Dec 3, 2012 at 8:03 AM
Edited Dec 3, 2012 at 8:05 AM

Yes, as I said, I don't think you'll be able to get to the bottom of it from the information I have / have given.

But to answer your specific questions:

  • I don't copy the binaries between machines, I rebuild before running on each machine (always in debug from within the IDE at this point) 
  • I'm always building for x86
  • I don't cache cudafy modules
  • I've tried serializing and then opening using cudafymoduleviewer, but when I open the .cdfy file I get the error "Could not load file or assembly "THEDLL.dll" or one of its dependencies. An attempt was made to load a program with an incorrect format".

Dec 3, 2012 at 8:09 AM

Make sure you are using the same version of CUDAfy on all machines.

The incorrect format error is most likely due to loading your x86 module into an x64 cudafymoduleviewer - i.e. your Windows is 64-bit. This highlights a need for both x64 and x86 versions of the two utilities shipped with cudafy - will attempt to include this in future.
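
A quick way to confirm which side of such a mismatch a process is on is to log its bitness at startup. This is a minimal sketch using only the .NET base class library (Environment.Is64BitProcess and Environment.Is64BitOperatingSystem require .NET 4.0 or later). PlatformInfo is a hypothetical name, and the "x64"/"x86" labels assume an Intel-compatible Windows machine as in this thread.

```csharp
using System;

// Hypothetical helper (not part of CUDAfy): reports whether the current
// process and operating system are 32-bit or 64-bit.
public static class PlatformInfo
{
    // "x64" for a 64-bit process, "x86" for a 32-bit one.
    public static string ProcessPlatform()
    {
        return Environment.Is64BitProcess ? "x64" : "x86";
    }

    // Example result on 64-bit Windows running a 32-bit build:
    // "process=x86, os=x64"
    public static string Describe()
    {
        return "process=" + ProcessPlatform()
             + ", os=" + (Environment.Is64BitOperatingSystem ? "x64" : "x86");
    }
}
```

Printing PlatformInfo.Describe() from both your application and any tool that loads its assemblies makes an x86/x64 mismatch visible immediately.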

Dec 3, 2012 at 8:18 AM

You're right, I did have different cudafy versions on each machine. Rectified now; I will report back whether the error persists.

Also, my windows is 64-bit (and my code 32-bit as I said previously).

Dec 5, 2012 at 1:42 AM

The "eventually it works itself out" symptom has me thinking about hardware issues, such as whether something is cached on the GPU that needs to be cleaned out. Does that make any sense? My physics background leads me to believe that software always behaves the same, but hardware is (more) vulnerable to other influences.

Jun 10, 2014 at 3:02 AM
I've had this issue again today.

2 things I've noticed:
  • It "Cudafies" in debug mode, but not in release.
  • The Capability of one of my GPUs is listed as 1.1 when using GetDeviceProperties. It is a 660 Ti, and should be capable of 3.0. Given the reported value I was using architecture sm_11, which, as I said, worked in debug but not in release. If I override it with sm_30, it always works.