
Simulation

Jul 13, 2012 at 10:56 AM
Edited Jul 13, 2012 at 12:41 PM

Hi Nick,

I'd like to know how I can emulate a CUDA GPU.

I tried this:

 

CudafyModule module = CudafyTranslator.Cudafy(ePlatform.All, eArchitecture.sm_20);
GPGPU egpu = CudafyHost.GetGPGPU(eGPUType.Emulator, 0);
egpu.LoadModule(module, false);

CudafyTranslator.GenerateDebug = true;
GPGPUProperties gprop = egpu.GetDeviceProperties(true);

 

 

My program throws an exception at CopyToDevice:

 

egpu.CopyToDevice<byte>(data_Small_Picture.Scan0, 0, device_small_picture, 0, small_picture_data_length); 
egpu.CopyToDevice<byte>(data_Large_Picture.Scan0, 0, device_large_picture, 0, large_picture_data_length); 

 

Data is not host allocated.

What should I do?

Thank you!
Zollie 

Jul 14, 2012 at 2:03 PM

Did you allocate device_small_picture before copying to the device? Either do this explicitly, or use the overload of CopyToDevice that allocates automatically and returns the device array.

Nick

Jul 16, 2012 at 8:07 AM
Edited Jul 17, 2012 at 9:12 AM

Thank you for the fast answer.
Yes, I did allocate.
My code looks like this:

            CudafyModule cm = CudafyTranslator.Cudafy();
            GPGPU gpu = CudafyHost.GetDevice(eGPUType.Emulator);
            gpu.LoadModule(cm);

            CudafyTranslator.GenerateDebug = true;
            GPGPUProperties gprop = gpu.GetDeviceProperties(true);

            BitmapData data_Small_Picture = small_Bitmap.LockBits(new Rectangle(0, 0, small_Bitmap.Width, small_Bitmap.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
            BitmapData data_Large_Picture = large_Bitmap.LockBits(new Rectangle(0, 0, large_Bitmap.Width, large_Bitmap.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);


            int smallpic_padding = data_Small_Picture.Stride - small_Bitmap.Width * 3;
            int largepic_padding = data_Large_Picture.Stride - large_Bitmap.Width * 3;

            int pixnumber_Small_Picture = small_Bitmap.Width * small_Bitmap.Height;// total pixel count of the small picture
            int pixnumber_Large_Picture = large_Bitmap.Width * large_Bitmap.Height;// total pixel count of the large picture

            int histogram_width = large_Bitmap.Width - small_Bitmap.Width;// large picture width minus small picture width
            int histogram_height = large_Bitmap.Height - small_Bitmap.Height;// large picture height minus small picture height

            // computation for grid dimensions
            int thread_x = (int)Math.Ceiling((double)histogram_width / 24);
            int thread_y = (int)Math.Ceiling((double)histogram_height / 16);

            // instantiate dimensions
            dim3 blockSize = new dim3(24, 16);
            dim3 gridSize = new dim3(thread_x, thread_y);

            int small_picture_data_length = data_Small_Picture.Stride * data_Small_Picture.Height;
            int large_picture_data_length = data_Large_Picture.Stride * data_Large_Picture.Height;

            byte[] device_small_picture = gpu.Allocate<byte>(small_picture_data_length);//small
            byte[] device_large_picture = gpu.Allocate<byte>(large_picture_data_length);//large
           
            IntPtr device_small_picture_Ptr = gpu.HostAllocate<byte>(small_picture_data_length);
            IntPtr device_large_picture_Ptr = gpu.HostAllocate<byte>(large_picture_data_length);

            int[] histogram_device = gpu.Allocate<int>(histogram_width * histogram_height);
            gpu.Set<int>(histogram_device);

            Console.WriteLine("\nCopyToDevice");

            gpu.CopyToDevice<byte>(device_small_picture_Ptr, 0, device_small_picture, 0, small_picture_data_length); // copy the sample picture
            gpu.CopyToDevice<byte>(device_large_picture_Ptr, 0, device_large_picture, 0, large_picture_data_length); // copy the large picture

            gpu.StartTimer();
            // start matching
            gpu.Launch(gridSize, blockSize, "gpuFineMatch",
                histogram_device, histogram_width, histogram_height,
                device_large_picture, large_Bitmap.Width, largepic_padding,
                device_small_picture, small_Bitmap.Width, small_Bitmap.Height, smallpic_padding,
                treshold);
            Console.WriteLine("Elapsed time: " + gpu.StopTimer() + " msec");
            int[] histogram_host = new int[histogram_width * histogram_height];// in pixels
            gpu.CopyFromDevice<int>(histogram_device, histogram_host);

            gpu.FreeAll();

            small_Bitmap.UnlockBits(data_Small_Picture);
            large_Bitmap.UnlockBits(data_Large_Picture);

Now this code doesn't throw an exception, but it runs without any data: all of the histogram_device elements are 0. (With eGPUType.Cuda the histogram has the right elements, so there is no problem with the Cudafying :) )

I want to work with data_Small_Picture and data_Large_Picture, but the data is lost at CopyToDevice, because the host-allocated buffers I copy from are never filled with the bitmap data.

I know I can't copy the data from the BitmapData pointers to the device in emulation, so what should I do?

Thank You!
Zollie 

 
Jul 17, 2012 at 9:01 AM

Is it working on the GPU?

Jul 17, 2012 at 9:12 AM

Yes, it works fine with these lines:

 

            int small_picture_data_length = data_Small_Picture.Stride * data_Small_Picture.Height;
            int large_picture_data_length = data_Large_Picture.Stride * data_Large_Picture.Height;

            byte[] device_small_picture = gpu.Allocate<byte>(small_picture_data_length);//small
            byte[] device_large_picture = gpu.Allocate<byte>(large_picture_data_length);//large

            int[] histogram_device = gpu.Allocate<int>(histogram_width * histogram_height);
            gpu.Set<int>(histogram_device);

         
            gpu.CopyToDeviceAsync<byte>(data_Small_Picture.Scan0, 0, device_small_picture, 0, small_picture_data_length); // copy the sample picture
            gpu.CopyToDeviceAsync<byte>(data_Large_Picture.Scan0, 0, device_large_picture, 0, large_picture_data_length); // copy the large picture

For emulation I needed the IntPtrs.

This is my FineMatching algorithm. Its running time is ~7 milliseconds on a compute capability 1.2 card. (The large picture is 143x96 pixels and the small picture is 59x63 pixels.)

Thank You!
Zollie 

Jul 17, 2012 at 9:39 AM

Time is a little short right now on my side.  Have you downloaded the CUDAfy source code and compiled?  You can then link to the individual projects and step into the code.

Nick

Jul 17, 2012 at 10:05 AM

Yes, I downloaded it, and it works fine.

But what I need is this: in the Emulator I want to work with Bitmaps/BitmapData. I can't use CopyToDevice on the BitmapData pointers in emulation mode, so I'd like to know what I should do to access the BitmapData.

In Cuda mode I get results in my histogram array. In Emulator mode every element of the array is 0, because the algorithm can't access the data.

In the CudafyIntroduction I saw the following, which runs in emulation mode:

#region Add vectors
                // Add vectors - GPUs are best at algorithms like working on matrices and large vectors
                // where lots of calculations can be done independently in parallel.
                int[] a = new int[N];
                int[] b = new int[N];
                int[] c = new int[N];
                // fill the arrays 'a' and 'b' on the CPU
                for (int i = 0; i < N; i++)
                {
                    a[i] = -i;
                    b[i] = i * i;
                }
                // copy the arrays 'a' and 'b' to the GPU - these overloads automatically allocate GPU memory
                int[] dev_a = _gpu.CopyToDevice(a);
                int[] dev_b = _gpu.CopyToDevice(b);
                // allocate memory on the GPU for the result - this allocate enough memory to hold a vector the
                // same length as vector c - it does not copy vector c (same as _gpu.Allocate<int>(c.Length);)
                int[] dev_c = _gpu.Allocate<int>(c);
                // Threads are grouped in Blocks. Blocks are grouped in a Grid. Here we launch N Blocks where
                // each block contains 1 thread. Note addVector contains a GThread arg - no need to pass this.
                // GThread is the Cudafy equivalent of the built-in CUDA variables. Use it to identify thread id.
                _gpu.Launch(N, 1).addVector(dev_a, dev_b, dev_c);
                // copy the array 'c' back from the GPU to the CPU
                _gpu.CopyFromDevice(dev_c, c);
                for (int i = 0; i < N; i++)
                    Debug.Assert(a[i] + b[i] == c[i]);
                Console.WriteLine("We just added {0} elements of our two vectors in {0} parallel threads.", N);
                // This used a bit more precious GPU memory than the earlier examples, so let's free it
                _gpu.FreeAll();
                #endregion

I know that CopyToDevice allocates automatically.
I think I should convert my Bitmaps into byte[] arrays, then use CopyToDevice and then Launch().
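
A minimal sketch of that conversion, using only Marshal.Copy from the .NET framework. The class and method names (PixelCopy, ToManagedBytes) are made up for illustration, and I am assuming that the array overload of CopyToDevice from the sample above accepts a byte[] the same way it accepts an int[]:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of CUDAfy.
static class PixelCopy
{
    // Copies stride * height bytes starting at scan0 into a new managed array.
    // For a locked bitmap, pass BitmapData.Scan0, BitmapData.Stride and BitmapData.Height.
    public static byte[] ToManagedBytes(IntPtr scan0, int stride, int height)
    {
        if (scan0 == IntPtr.Zero)
            throw new ArgumentNullException("scan0");
        // A bottom-up bitmap reports a negative stride; this sketch does not handle that case.
        if (stride <= 0 || height <= 0)
            throw new ArgumentOutOfRangeException("stride", "stride and height must be positive");

        byte[] buffer = new byte[checked(stride * height)];
        Marshal.Copy(scan0, buffer, 0, buffer.Length);
        return buffer;
    }
}

// Intended use with the variables from my code above (assumption, not tested in emulation):
//   byte[] host_small = PixelCopy.ToManagedBytes(
//       data_Small_Picture.Scan0, data_Small_Picture.Stride, data_Small_Picture.Height);
//   byte[] device_small_picture = gpu.CopyToDevice(host_small);
```

The managed array keeps the stride padding, so the padding values passed to the kernel would stay the same.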

Thank You!
Zollie