Issues with multithreading

Sep 26, 2015 at 3:54 PM
Edited Sep 26, 2015 at 3:57 PM
Hi,

What is the recipe for accessing the GPU from multiple threads?

I have GPU code that runs on multiple threads created by a task scheduler. In the UI thread I call gpu.EnableMultithreading() and gpu.SetCurrentContext(). The code that allocates memory, copies to and from the device, launches the kernels and frees the memory is inside a procedure which is run by the other threads.

I call gpu.Lock() before the memory allocation and gpu.Unlock() after the deallocation, but I always get an InvalidContext error.

In the CUDAfy unit tests, memory allocation is done in the UI thread before setting the multithreading flag and locking the GPU. That is not possible in my code because it is highly dynamic: I don't know the size of the arrays a priori. I'd like to understand the mechanics of multithreading with CUDAfy so I can see what I'm doing wrong.

This is my code sequence:

1 - GPU is initialized in the UI thread (InitComputeDevice() does the module loading stuff).
If My.Settings.EnableGPUProcessing Then
    DWSIM.App.InitComputeDevice()
    My.MyApplication.gpu.SetCurrentContext()
    My.MyApplication.gpu.EnableMultithreading()
End If
2 - Multiple parallel Tasks are created to run the solving code and, at some point, a GPU function may be called from a thread other than the UI thread. This doesn't work at all, with or without the SetCurrentContext() call. Lock() always throws an invalid context exception.
        Public Shared Sub pr_gpu_func(n As Integer, Vx As Double(), VKij As Double(,), Tc As Double(), Pc As Double(), w As Double(), T As Double, alpha As Double(), ai As Double(), bi As Double(), a As Double(,), aml_temp As Double(), bml_temp As Double(), aml2_temp As Double())

            Dim gpu As GPGPU = My.MyApplication.gpu

            Dim dev_alpha As Double() = Nothing
            Dim dev_ai As Double() = Nothing
            Dim dev_bi As Double() = Nothing
            Dim dev_Tc As Double() = Nothing
            Dim dev_Pc As Double() = Nothing
            Dim dev_W As Double() = Nothing
            Dim dev_a As Double(,) = Nothing
            Dim dev_vkij As Double(,) = Nothing
            Dim dev_Vx As Double() = Nothing
            Dim dev_aml2_temp As Double() = Nothing
            Dim dev_aml_temp As Double() = Nothing
            Dim dev_bml_temp As Double() = Nothing

            If Not gpu.IsCurrentContext Then gpu.SetCurrentContext()
            If gpu.IsMultithreadingEnabled Then gpu.Lock()

            ' allocate the memory on the GPU
            dev_alpha = gpu.Allocate(Of Double)(alpha)
            dev_ai = gpu.Allocate(Of Double)(ai)
            dev_bi = gpu.Allocate(Of Double)(bi)
            dev_Tc = gpu.Allocate(Of Double)(Tc)
            dev_Pc = gpu.Allocate(Of Double)(Pc)
            dev_W = gpu.Allocate(Of Double)(w)
            dev_a = gpu.Allocate(Of Double)(a)
            dev_vkij = gpu.Allocate(Of Double)(VKij)
            dev_Vx = gpu.Allocate(Of Double)(Vx)
            dev_aml2_temp = gpu.Allocate(Of Double)(aml2_temp)
            dev_aml_temp = gpu.Allocate(Of Double)(aml_temp)
            dev_bml_temp = gpu.Allocate(Of Double)(bml_temp)

            ' copy the arrays to the GPU
            gpu.CopyToDevice(alpha, dev_alpha)
            gpu.CopyToDevice(ai, dev_ai)
            gpu.CopyToDevice(bi, dev_bi)
            gpu.CopyToDevice(Tc, dev_Tc)
            gpu.CopyToDevice(Pc, dev_Pc)
            gpu.CopyToDevice(w, dev_W)
            gpu.CopyToDevice(a, dev_a)
            gpu.CopyToDevice(VKij, dev_vkij)
            gpu.CopyToDevice(Vx, dev_Vx)
            gpu.CopyToDevice(aml2_temp, dev_aml2_temp)
            gpu.CopyToDevice(aml_temp, dev_aml_temp)
            gpu.CopyToDevice(bml_temp, dev_bml_temp)

            ' launch subs
            gpu.Launch(n + 1, 1).pr_gpu_sum1(dev_alpha, dev_ai, dev_bi, dev_Tc, dev_Pc, dev_W, T)
            gpu.Launch(New dim3(n + 1, n + 1), 1).pr_gpu_sum2(dev_a, dev_ai, dev_vkij)
            gpu.Launch(n + 1, 1).pr_gpu_sum3(dev_Vx, dev_a, dev_aml_temp, dev_aml2_temp)
            gpu.Launch(n + 1, 1).pr_gpu_sum4(dev_Vx, dev_bi, dev_bml_temp)

            ' copy the arrays back from the GPU
            gpu.CopyFromDevice(dev_alpha, alpha)
            gpu.CopyFromDevice(dev_ai, ai)
            gpu.CopyFromDevice(dev_bi, bi)
            gpu.CopyFromDevice(dev_Tc, Tc)
            gpu.CopyFromDevice(dev_Pc, Pc)
            gpu.CopyFromDevice(dev_W, w)
            gpu.CopyFromDevice(dev_a, a)
            gpu.CopyFromDevice(dev_vkij, VKij)
            gpu.CopyFromDevice(dev_Vx, Vx)
            gpu.CopyFromDevice(dev_aml2_temp, aml2_temp)
            gpu.CopyFromDevice(dev_aml_temp, aml_temp)
            gpu.CopyFromDevice(dev_bml_temp, bml_temp)

            ' free the memory allocated on the GPU
            gpu.Free(dev_alpha)
            gpu.Free(dev_ai)
            gpu.Free(dev_bi)
            gpu.Free(dev_Tc)
            gpu.Free(dev_Pc)
            gpu.Free(dev_W)
            gpu.Free(dev_a)
            gpu.Free(dev_vkij)
            gpu.Free(dev_Vx)
            gpu.Free(dev_aml2_temp)
            gpu.Free(dev_aml_temp)
            gpu.Free(dev_bml_temp)

            If gpu.IsMultithreadingEnabled Then gpu.Unlock()

        End Sub
Thanks
Daniel
Sep 26, 2015 at 4:13 PM
I read some posts here and thought that maybe I needed to reload the module, but even with that the problem persists:

[Image: screenshot not preserved]
Sep 26, 2015 at 10:53 PM
It seems to work if I remove the Lock() and Unlock() calls from the above code, but I'm not sure if it's safe to leave it like that.
Feb 19, 2016 at 4:45 PM
Edited Feb 19, 2016 at 4:47 PM
Hi,

Just for the record, I've been able to solve this issue by setting the .NET task scheduler (which runs the parallel tasks/threads) to
TaskScheduler.FromCurrentSynchronizationContext
instead of
TaskScheduler.Default
Using the default scheduler was the root cause of the errors.
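
A minimal sketch of that scheduler change, for anyone hitting the same error. The names SchedulerSketch, RunGpuWork and StartOnUiScheduler are hypothetical and only stand in for the real solver code; RunGpuWork takes the place of pr_gpu_func. It assumes a WinForms or WPF application and must be called from the UI thread, the same thread that called SetCurrentContext(). Tasks queued this way are posted back to that thread instead of running on thread-pool threads, which is presumably why the invalid context error goes away.

```vb
Imports System.Threading
Imports System.Threading.Tasks

Module SchedulerSketch

    ' Hypothetical placeholder for the GPU routine (pr_gpu_func in the first post).
    Sub RunGpuWork()
        Console.WriteLine("GPU work on thread " & Thread.CurrentThread.ManagedThreadId)
    End Sub

    ' Call this from the UI thread. FromCurrentSynchronizationContext throws an
    ' InvalidOperationException if the calling thread has no SynchronizationContext.
    Function StartOnUiScheduler() As Task
        ' Capture a scheduler bound to the current (UI) synchronization context.
        Dim uiScheduler As TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()

        ' Queue the work on that scheduler instead of TaskScheduler.Default.
        Return Task.Factory.StartNew(AddressOf RunGpuWork,
                                     CancellationToken.None,
                                     TaskCreationOptions.None,
                                     uiScheduler)
    End Function

End Module
```

Note the trade-off: tasks started on this scheduler run one at a time on the UI thread, so the GPU calls are serialized and long-running work will block the UI while it executes.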

Regards
Daniel