Demystifying threads and blocks

Oct 25, 2012 at 10:21 AM


I have been using CUDAfy on a Quadro FX 570M. For the sample apps I currently use a blockSize and gridSize of 128 each (that is what is in the examples). The specs returned from this device show the following:

Threads in warp : 32

Max threads in block : 512

Max thread dimensions : (512, 512, 1)

Does this mean that I can launch the GPU with a gridSize and blockSize of 512 each, e.g.

gpu.Launch(512, 512, "GPUModel", etc etc)?

and then in the model will each of the threads (512 * 512 = 262,144) be run in parallel? e.g.

public static void GPUModel(GThread thread, etc etc...) {

      int tid = thread.threadIdx.x + thread.blockIdx.x * thread.blockDim.x;

I guess it is the last line that I do not understand. Basically, I want to know how to determine how many unique threads I can run in parallel. If you could explain the last line of code above, where tid is defined, that would be appreciated.


Oct 30, 2012 at 4:28 AM

There is a CUDA Occupancy Calculator implemented in Excel available as part of the CUDA 5.0 download (possibly earlier versions as well). It can be accessed from tab 3 of the NVIDIA CUDA Samples Browser utility that comes with the download. This may answer some of your questions.

I also stumbled on a white paper earlier today while browsing the web, discussing the occasional occurrence of performance sinkholes for small changes in the set-up parameters. I have lost the link, but the Calculator should help you stay away from those. Mainly, it warned against assuming that performance will always change smoothly for small changes in the set-up.
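
On the tid line from the original question: in a one-dimensional launch, blockIdx.x says which block a thread is in, blockDim.x is the number of threads per block, and threadIdx.x is the thread's position within its block. Adding them as in the question gives each thread a unique number from 0 up to gridSize * blockSize - 1. The sketch below is not CUDAfy code. It is a plain Python emulation of that arithmetic on the host, and the function names global_thread_id and all_thread_ids are made up for illustration. It assumes a 1-D grid and 1-D blocks. Note that it only shows the numbering is unique. It does not show that all threads execute at the same instant. The hardware schedules blocks onto the multiprocessors in batches, which is what the Occupancy Calculator helps you reason about.

```python
# Host-side emulation of the index arithmetic only; no GPU or CUDAfy involved.
# Function names are hypothetical, chosen for this illustration.

def global_thread_id(block_idx_x: int, block_dim_x: int, thread_idx_x: int) -> int:
    """Mirror of: tid = threadIdx.x + blockIdx.x * blockDim.x"""
    return thread_idx_x + block_idx_x * block_dim_x


def all_thread_ids(grid_size: int, block_size: int) -> list:
    """Enumerate the tid of every thread in a 1-D launch of grid_size blocks."""
    return [
        global_thread_id(b, block_size, t)
        for b in range(grid_size)      # one iteration per block
        for t in range(block_size)     # one iteration per thread in the block
    ]


if __name__ == "__main__":
    grid_size, block_size = 512, 512   # the launch proposed in the question
    ids = all_thread_ids(grid_size, block_size)
    print(len(ids))                      # 262144 threads in total
    print(ids[0], ids[-1])               # 0 262143
    print(global_thread_id(2, 512, 5))   # thread 5 of block 2 -> 1029
```

So a launch of 512 blocks of 512 threads stays within the "Max threads in block : 512" limit reported above and yields 262,144 distinct tid values, but how many of those are resident on the device at once depends on the hardware and the kernel's resource usage.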