
Parallel execution of CUDAfy code

Apr 18, 2013 at 2:21 PM
Hi,

Is it possible to run CUDA code on a single GPU from inside a parallel CPU loop (e.g. Parallel.For)? I tried something like that, but an exception was thrown saying that the .cu file was locked and in use by another process...

Thanks,
Daniel
Apr 19, 2013 at 8:30 AM
Please post some code so we can see what you are trying.
Apr 19, 2013 at 12:29 PM
Edited Apr 19, 2013 at 11:57 PM
Hi Nick,

Please take a look at this file, "CalcLnFugGPU" function: https://github.com/DanWBR/dwsim3/blob/master/DWSIM/Objects/PropertyPackages/LeeKeslerPlocker/Helper%20Classes/LKP.vb

It works fine in a single-threaded scenario, but my problem is that this function may be called in a multithreaded context so I would like to adapt it in order to make it work in such a case.

If the user chooses to enable CPU parallel processing, this function will be called indirectly twice in parallel by "DW_CalcKvalue" ("DW_CalcFugCoeff" => "CalcLnFug" => "CalcLnFugGPU"), located in the base property package class:

https://github.com/DanWBR/dwsim3/blob/master/DWSIM/Objects/PropertyPackages/Base/PropertyPackage.vb
If My.Settings.EnableParallelProcessing Then
    My.Application.IsRunningParallelTasks = True
    Try
        Dim task1 As Task = New Task(Sub()
                                         fugliq = Me.DW_CalcFugCoeff(Vx, T, P, State.Liquid)
                                     End Sub)
        Dim task2 As Task = New Task(Sub()
                                         If type = "LV" Then
                                             fugvap = Me.DW_CalcFugCoeff(Vy, T, P, State.Vapor)
                                         Else ' LL
                                             fugvap = Me.DW_CalcFugCoeff(Vy, T, P, State.Liquid)
                                         End If
                                     End Sub)

        task1.Start()
        task2.Start()
        Task.WaitAll(task1, task2)
    Catch ae As AggregateException
        For Each ex As Exception In ae.InnerExceptions
            Throw
        Next
    End Try
    My.Application.IsRunningParallelTasks = False
Else
    fugliq = Me.DW_CalcFugCoeff(Vx, T, P, State.Liquid)
    If type = "LV" Then
        fugvap = Me.DW_CalcFugCoeff(Vy, T, P, State.Vapor)
    Else ' LL
        fugvap = Me.DW_CalcFugCoeff(Vy, T, P, State.Liquid)
    End If
End If
I've found something in the forums about EnableMultithreading(), SetCurrentContext(), Lock(), Unlock(), and also took a look at the Unit Tests sample, but couldn't get it to work. I kept getting an "ErrorInvalidContext" exception.

I should probably move the GPU instance outside the function and make it 'global' in DWSIM, but I haven't had time to try that yet.

Thanks,
Daniel
Apr 19, 2013 at 1:55 PM
You need to create/get the device on the main thread. Remember to also allocate any device memory that is shared between threads on the main thread. You must free memory on the same thread that allocated it.

See MultiThreadedTests.cs
        [Test]
        public void Test_TwoThreadCopy()
        {
            _gpu = CudafyHost.GetDevice(eGPUType.Cuda);
            _gpuuintBufferIn3 = _gpu.Allocate(_uintBufferIn1);
            _gpuuintBufferIn4 = _gpu.Allocate(_uintBufferIn1);
            _gpu.EnableMultithreading();
            bool j1 = false;
            bool j2 = false;
            for (int i = 0; i < 10; i++)
            {
                Console.WriteLine(i);
                SetInputs();
                ClearOutputs();
                Thread t1 = new Thread(Test_TwoThreadCopy_Thread1);
                Thread t2 = new Thread(Test_TwoThreadCopy_Thread2);
                t1.Start();
                t2.Start();
                j1 = t1.Join(10000);
                j2 = t2.Join(10000);
                if (!j1 || !j2)
                    break;
            }

            _gpu.DisableMultithreading();           
            _gpu.FreeAll();
            Assert.IsTrue(j1);
            Assert.IsTrue(j2);
        }

        private void Test_TwoThreadCopy_Thread1()
        {
            try
            {
                _gpu.Lock();
                _gpuuintBufferIn1 = _gpu.CopyToDevice(_uintBufferIn1);
                _gpu.CopyOnDevice(_gpuuintBufferIn1, _gpuuintBufferIn3);
                _gpu.CopyFromDevice(_gpuuintBufferIn3, _uintBufferOut1);
                Assert.IsTrue(Compare(_uintBufferIn1, _uintBufferOut1));
                _gpu.Free(_gpuuintBufferIn1);
                _gpu.Unlock();
            }
            catch (Exception ex)
            {
                Debug.WriteLine(ex.ToString());
            }
        }
        
        private void Test_TwoThreadCopy_Thread2()
        {
            try
            {
                _gpu.Lock();
                _gpuuintBufferIn2 = _gpu.CopyToDevice(_uintBufferIn2);
                _gpu.CopyOnDevice(_gpuuintBufferIn2, _gpuuintBufferIn4);
                _gpu.CopyFromDevice(_gpuuintBufferIn4, _uintBufferOut2);
                Assert.IsTrue(Compare(_uintBufferIn2, _uintBufferOut2));
                _gpu.Free(_gpuuintBufferIn2);
                _gpu.Unlock();
            }
            catch (Exception ex)
            {
                Debug.WriteLine(ex.ToString());
            }
        }
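
The same pattern can be sketched with tasks instead of raw threads. This is only a minimal sketch, not code from the CUDAfy test suite: the class name LockedGpuWork and the helper RoundTrip are made up for illustration, and it assumes that Lock()/Unlock() behave on thread-pool threads the same way they do on the dedicated threads in the test above. The one deliberate change is that Unlock() and Free() sit in finally blocks, so an exception in one worker cannot leave the device locked for the other.

```csharp
using System;
using System.Threading.Tasks;
using Cudafy;
using Cudafy.Host;

// Hypothetical example class; not part of CUDAfy or DWSIM.
public static class LockedGpuWork
{
    private static GPGPU _gpu;

    public static void Main()
    {
        // Get the device and enable multithreading on the main thread.
        _gpu = CudafyHost.GetDevice(eGPUType.Cuda);
        _gpu.EnableMultithreading();

        uint[] inA = { 1, 2, 3, 4 };
        uint[] inB = { 5, 6, 7, 8 };
        uint[] outA = new uint[inA.Length];
        uint[] outB = new uint[inB.Length];

        // Two workers share the single device; each takes the lock in turn.
        Task t1 = Task.Factory.StartNew(() => RoundTrip(inA, outA));
        Task t2 = Task.Factory.StartNew(() => RoundTrip(inB, outB));
        Task.WaitAll(t1, t2);

        _gpu.DisableMultithreading();
        _gpu.FreeAll();
    }

    // Copies a host array to the device and back while holding the device lock.
    private static void RoundTrip(uint[] input, uint[] output)
    {
        _gpu.Lock();
        try
        {
            // Allocated on this thread, so it is also freed on this thread.
            uint[] dev = _gpu.CopyToDevice(input);
            try
            {
                _gpu.CopyFromDevice(dev, output);
            }
            finally
            {
                _gpu.Free(dev);
            }
        }
        finally
        {
            // Always release the device, even if a copy throws.
            _gpu.Unlock();
        }
    }
}
```

Note that the lock serializes GPU access, so the two workers overlap only in their CPU-side work.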
Apr 19, 2013 at 2:10 PM
Thanks Nick, I'll try your suggestions asap.

I have another question: in the example you posted above, if I were to launch a kernel (a Sub, in my case) inside the thread function, where should I put the LoadModule() call? In the main thread or inside the function? Before or after the Lock() call? Also, is the CudafyModule instance created per GPU or per thread?

Thanks,
Daniel
Apr 19, 2013 at 2:24 PM
Create and load the module in the main thread. Think of it as loading all possible functions on the GPU before you start working.
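
A rough sketch of that arrangement follows. It assumes an application-level holder class; GpuContext, Kernels, Scale and ScaleOnGpu are hypothetical names, not taken from CUDAfy or DWSIM. The module is translated and loaded exactly once in Initialize(), and worker threads only lock, copy, launch and unlock. Translating once also keeps CudafyTranslator.Cudafy() from being invoked by several threads at the same time, which is presumably where the locked .cu file came from.

```csharp
using System;
using Cudafy;
using Cudafy.Host;
using Cudafy.Translator;

// Hypothetical kernel container.
public static class Kernels
{
    [Cudafy]
    public static void Scale(GThread thread, float[] data, float factor)
    {
        int i = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
        if (i < data.Length)
            data[i] *= factor;
    }
}

// Hypothetical application-level holder for the single GPU instance.
public static class GpuContext
{
    public static GPGPU Gpu { get; private set; }

    // Call once from the main thread, before any parallel work starts.
    public static void Initialize()
    {
        CudafyModule km = CudafyTranslator.Cudafy(typeof(Kernels));
        Gpu = CudafyHost.GetDevice(eGPUType.Cuda);
        Gpu.LoadModule(km);           // all kernels are now on the device
        Gpu.EnableMultithreading();
    }

    // Call once from the main thread after all workers have finished.
    public static void Shutdown()
    {
        Gpu.DisableMultithreading();
        Gpu.FreeAll();
    }

    // Safe to call from worker threads: no LoadModule() here, only lock and use.
    public static float[] ScaleOnGpu(float[] host, float factor)
    {
        Gpu.Lock();
        try
        {
            float[] dev = Gpu.CopyToDevice(host);
            try
            {
                int block = 128;
                int grid = (host.Length + block - 1) / block;
                Gpu.Launch(grid, block, "Scale", dev, factor);

                float[] result = new float[host.Length];
                Gpu.CopyFromDevice(dev, result);
                return result;
            }
            finally
            {
                Gpu.Free(dev);        // freed on the thread that allocated it
            }
        }
        finally
        {
            Gpu.Unlock();
        }
    }
}
```

With this layout the module belongs to the GPU instance rather than to any thread, so a function such as CalcLnFugGPU would only need the equivalent of ScaleOnGpu().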
Apr 20, 2013 at 1:26 AM
Thanks Nick. I've moved the GPU instance to the application level and it is working fine now.