Creating Streams on Device

Oct 2, 2015 at 2:32 PM
Just wondering whether it's possible to create streams on the device, so that new child grids (launched via dynamic parallelism) can run on separate streams to maximise concurrency?
Oct 5, 2015 at 7:16 AM
You may need to dig around a little in the source code, but dynamic parallelism is supported on suitable devices (compute capability 3.5 or higher).
Oct 5, 2015 at 8:49 AM
Yes, I've been using dynamic parallelism successfully, but (I think) the current support in Cudafy does not allow you to specify a stream for the child kernel.

The CUDA Programming Guide appears to suggest that you can create streams in device code and then launch child kernels asynchronously on those streams. I'm fairly certain that would give a good speedup (at least in my case).
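
To make it concrete, here is a minimal sketch in plain CUDA C++ (not Cudafy, which as far as I can tell does not expose this) of the pattern I mean: each parent thread creates its own stream in device code and launches a child grid on it. The kernel names, sizes and launch configuration are made up for illustration. It assumes a device with dynamic parallelism support and a build with relocatable device code, e.g. `nvcc -rdc=true streams.cu -lcudadevrt` plus a suitable `-arch` flag.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical child kernel: each launch fills one slice of the output.
__global__ void childKernel(int *out, int offset, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[offset + i] = offset + i;
}

// Parent kernel: every thread creates its own stream and launches a child
// grid on it, so the child grids are free to run concurrently.
__global__ void parentKernel(int *out, int sliceSize)
{
    int slice = blockIdx.x * blockDim.x + threadIdx.x;

    cudaStream_t s;
    // Streams created in device code must use the cudaStreamNonBlocking flag.
    if (cudaStreamCreateWithFlags(&s, cudaStreamNonBlocking) != cudaSuccess)
        return;

    int threads = 128;
    int blocks = (sliceSize + threads - 1) / threads;
    childKernel<<<blocks, threads, 0, s>>>(out, slice * sliceSize, sliceSize);

    // Destroying the stream does not cancel the launch; its resources are
    // released once the work already queued on it has finished.
    cudaStreamDestroy(s);
}

int main()
{
    const int numSlices = 4, sliceSize = 256, n = numSlices * sliceSize;

    int *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemset(d_out, 0, n * sizeof(int));

    // One parent thread per slice.
    parentKernel<<<1, numSlices>>>(d_out, sliceSize);

    // The parent grid only completes after all of its child grids have,
    // so a host-side synchronize is enough here.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    // Verify that every slice was written by its child grid.
    for (int i = 0; i < n; ++i) {
        if (h_out[i] != i) {
            std::printf("mismatch at %d\n", i);
            return 1;
        }
    }
    std::printf("all child grids completed\n");
    return 0;
}
```

Without the explicit stream, child grids launched by threads of the same block go to that block's default (NULL) stream and are serialised, which is why I'd expect per-thread streams to help. Whether they actually overlap still depends on the hardware and on how many resources each child grid uses.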

I'd be willing to help implement this. Do you have any high-level overview of the Cudafy code architecture that would speed up my work? The code base is quite large, so I could do with some pointers!