One GPU and multi-threading

Nov 28, 2012 at 7:16 AM


I wrote a multi-threaded system that runs several algorithms. I have only one GPU, with compute capability 2.1, and I use CUDA Toolkit 4.2; I will move to version 5 in the near future.

I will illustrate my problem with two of my algorithms.

The first is an Inverse Filter, the other is a Convolution Matrix.

If I run the Inverse Filter and the Convolution Matrix separately (only one or only the other), there is no problem. The problem appears when I run the Inverse Filter before or after the Convolution Matrix, for example when the Inverse Filter's output is the input of the Convolution Matrix.

The code of the Inverse filter looks like:

		private object lockingObj = new object();

		private Bitmap CUDAInverzFilter(Bitmap bmp)
		{
			Bitmap bitmap = new Bitmap(bmp);
			GPGPU gpu = GetGPU();

			byte[] dev_a0; // device-side buffer for the pixel data
			BitmapData data1 = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, bitmap.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
			int length = data1.Stride * data1.Height; // total buffer size in bytes
			int pixnum = bitmap.Height * bitmap.Width; // number of pixels in the image
			int blockSize = 1024;
			int gridSize = (int)Math.Ceiling((double)pixnum / blockSize); // enough blocks to cover pixnum threads
			dev_a0 = gpu.Allocate<byte>(length);
			IntPtr data1Ptr = data1.Scan0;
			gpu.CopyToDevice<byte>(data1Ptr, 0, dev_a0, 0, length);
			//gpu.Launch(gridSize, blockSize, "inverseFilter", dev_a0);

			//===========================Strongly typed launch v1.12===========================
			gpu.Launch(gridSize, blockSize, (Action<GThread, byte[]>)(inverseFilter), dev_a0);

			gpu.CopyFromDevice<byte>(dev_a0, 0, data1Ptr, 0, length);
			bitmap.UnlockBits(data1); // release the bits locked by LockBits above

			lock (lockingObj)
			{
				// Body not shown in this post; the gpu.Free() call discussed below presumably goes here.
			}
			return bitmap;
		}

My Convolution Matrix looks like:

		private object lockingObj = new object();

		private Bitmap ConvM_3x3(ConvolutionMatrix3x3 matrix3x3, Bitmap bmp)
		{
			Bitmap input_Bitmap = new Bitmap(bmp);
			GPGPU gpu = GetGPU();
			int[] matrix = new int[11];

			if (matrix3x3 != null)
			{
				matrix[0] = matrix3x3.TopLeft;
				matrix[1] = matrix3x3.TopMid;
				matrix[2] = matrix3x3.TopRight;
				matrix[3] = matrix3x3.MidLeft;
				matrix[4] = matrix3x3.Pixel;
				matrix[5] = matrix3x3.MidRight;
				matrix[6] = matrix3x3.BottomLeft;
				matrix[7] = matrix3x3.BottomMid;
				matrix[8] = matrix3x3.BottomRight; // the post had BottomLeft here, presumably a typo

				matrix[9] = matrix3x3.Factor;
				matrix[10] = matrix3x3.Offset;
			}

			byte[] device_Input, device_Output; // byte arrays to be allocated on the GPU
			BitmapData input_Bitmap_Data = input_Bitmap.LockBits(new Rectangle(0, 0, input_Bitmap.Width, input_Bitmap.Height), ImageLockMode.ReadWrite, PixelFormat.Format24bppRgb);
			int length = input_Bitmap_Data.Stride * input_Bitmap_Data.Height; // total data length in bytes
			int pixnumber = input_Bitmap.Height * input_Bitmap.Width; // number of pixels in the image
			int x = (int)Math.Ceiling((double)input_Bitmap.Width / 16);
			int y = (int)Math.Ceiling((double)input_Bitmap.Height / 8);
			dim3 blockSize = new dim3(128, 3);
			dim3 gridSize = new dim3(x, y);
			// allocate GPU memory for the two byte arrays and the one int array
			device_Input = gpu.Allocate<byte>(length);
			device_Output = gpu.Allocate<byte>(length);
			matrix = gpu.CopyToDevice<int>(matrix);
			IntPtr input_Bitmap_Data_Pointer = input_Bitmap_Data.Scan0; // points to the first byte of the bitmap data
			gpu.CopyToDevice<byte>(input_Bitmap_Data_Pointer, 0, device_Input, 0, length); // copy the bitmap data into the device array device_Input
			//gpu.Launch(gridSize, blockSize, "convolutionMatrix3x3", device_a, device_b, matrix, bitmap.Width, bitmap.Height, matrix3x3.Factor, matrix3x3.Offset); // this is how I launch and parameterize the GPU code

			//===========================Strongly typed launch v1.12===========================
			gpu.Launch(gridSize, blockSize, (Action<GThread, byte[], byte[], int[], int, int, int, int>)(convolutionMatrix3x3), device_Input, device_Output, matrix, input_Bitmap.Width, input_Bitmap.Height, matrix3x3.Factor, matrix3x3.Offset);

			gpu.CopyFromDevice<byte>(device_Output, 0, input_Bitmap_Data_Pointer, 0, length); // copy the device array device_Output back into the bitmap data
			input_Bitmap.UnlockBits(input_Bitmap_Data); // release the bits locked by LockBits above

			lock (lockingObj)
			{
				// Body not shown in this post; the gpu.Free() call discussed below presumably goes here.
			}

			return input_Bitmap;
		}

The problem is that I get an "Invalid value" exception at gpu.Free().

It is a CudafyHostException with the message: CUDA.NET exception: ErrorInvalidValue

With Cudafy v1.9 this exception did not appear. It started when I switched to Cudafy v1.12.

I have to use gpu.Free() instead of gpu.FreeAll(), because gpu.FreeAll() would also remove the other algorithm's data from GPU memory.
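To make the difference concrete, here is a minimal sketch of what I mean, not my real code. The class name, buffer names and sizes are made up for illustration, and it assumes a CUDA device with id 0 obtained through CudafyHost.GetDevice rather than my own GetGPU() helper.

```csharp
using Cudafy;
using Cudafy.Host;

public static class FreeSketch // hypothetical name, for illustration only
{
    public static void Main()
    {
        // Assumption: a CUDA-capable device with id 0 is present.
        GPGPU gpu = CudafyHost.GetDevice(eGPUType.Cuda, 0);

        // Two independent device buffers, standing in for the two algorithms' data.
        byte[] devInverse = gpu.Allocate<byte>(1024);
        byte[] devConvolution = gpu.Allocate<byte>(1024);

        // Free releases only the buffer passed in; devConvolution stays allocated.
        gpu.Free(devInverse);

        // ... the other algorithm can keep using devConvolution here ...

        gpu.Free(devConvolution);

        // gpu.FreeAll() at the first Free above would have released both buffers at once.
    }
}
```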

What do you suggest to solve this problem?

Thank you,


Nov 28, 2012 at 2:10 PM
Edited Nov 28, 2012 at 2:11 PM

Hi Zollie

May I suggest you trim your code down to the minimum number of instructions required to still reproduce the error? In other words, a very small working example of your situation. It would not only help you pinpoint the error, but also help us interpret your code.

Also, you say it's a multi-threading problem, but I see no thread creation in your code.


Nov 28, 2012 at 2:32 PM

Dear All,


Thank you for your replies. We managed to solve the problem with the gpu.SetCurrentContext() method: the source of the problem was that the gpu instance was used in a different thread from the one it was created in.
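For anyone who finds this thread later, a minimal sketch of the idea follows. It is not our production code: the class name, the Worker method, the gpuLock object and the buffer size are made up for illustration, and it assumes a CUDA device with id 0. We have not checked it against every Cudafy version.

```csharp
using System.Threading;
using Cudafy;
using Cudafy.Host;

public static class ContextSketch // hypothetical name, for illustration only
{
    // Serializes GPU access so only one thread talks to the device at a time.
    private static readonly object gpuLock = new object();

    public static void Main()
    {
        // The GPGPU instance, and with it the CUDA context, is created on the main thread.
        GPGPU gpu = CudafyHost.GetDevice(eGPUType.Cuda, 0);

        // The GPU work itself runs on a different thread.
        Thread worker = new Thread(() => Worker(gpu));
        worker.Start();
        worker.Join();
    }

    private static void Worker(GPGPU gpu)
    {
        lock (gpuLock)
        {
            // Make the context of this gpu instance current on the calling thread
            // before any allocate, copy, launch or free call.
            gpu.SetCurrentContext();

            byte[] devBuffer = gpu.Allocate<byte>(1024);
            // ... copy data, launch kernels ...
            gpu.Free(devBuffer);
        }
    }
}
```

In our case the call went at the start of each filter method, before the first gpu.Allocate.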

Thanks again for the support!