Optimal gridsize and thread size

Oct 8, 2014 at 11:19 AM
Hi Guys,

Could you please help with a good approximation of the optimal grid and block size for:

Tesla 20c
Quadro K2000

on large data sets.

I was using 512 for both when launching, but I can't seem to get the speedup I was expecting to see on the Tesla 20c.

I haven't been able to use the occupancy calculator effectively.

Would appreciate your help.
Oct 8, 2014 at 9:35 PM
I have no experience with Teslas, but have you considered using the CUDA profiler? If I recall correctly, it also reports occupancy and lets you find other bottlenecks in your implementation. But perhaps you already know this.
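
In case it helps before you get the profiler going, here is a rough sketch of the arithmetic the occupancy calculator does, written in Python only so it is easy to run. It is not the calculator itself: it looks at the thread and block limits per multiprocessor only and ignores registers and shared memory, which often end up being the real limit. The function names are made up for illustration, and the limits passed in (1536 threads and 8 blocks per multiprocessor) and the data set size are example values, so substitute whatever deviceQuery reports for your cards. It also shows the grid size being derived from the data size instead of being fixed at 512.

```python
import math


def grid_size(n_elements, block_size):
    """Blocks needed so every element gets one thread (ceiling division)."""
    return (n_elements + block_size - 1) // block_size


def occupancy_estimate(block_size, max_threads_per_sm, max_blocks_per_sm,
                       warp_size=32):
    """Rough theoretical occupancy from thread and block limits only.

    Ignores register and shared-memory usage, so treat the result as an
    upper bound rather than what the kernel will actually achieve.
    """
    warps_per_block = math.ceil(block_size / warp_size)
    max_warps_per_sm = max_threads_per_sm // warp_size
    # Resident blocks are capped by both the block limit and the warp limit.
    resident_blocks = min(max_blocks_per_sm,
                          max_warps_per_sm // warps_per_block)
    return resident_blocks * warps_per_block / max_warps_per_sm


if __name__ == "__main__":
    n = 10_000_000  # hypothetical data set size
    for block in (64, 128, 256, 512, 1024):
        # 1536 and 8 are example limits; replace with your device's values.
        occ = occupancy_estimate(block, max_threads_per_sm=1536,
                                 max_blocks_per_sm=8)
        print(block, grid_size(n, block), round(occ, 2))
```

With these example limits, 256 and 512 threads per block both reach the upper bound, while 64 and 1024 do not, so a block size of 512 is not necessarily the problem. A fixed grid of 512 blocks is more likely to matter on a large data set. If the computed grid is larger than the device's maximum grid dimension, the usual workaround is to launch fewer blocks and have each thread loop over several elements.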