
ErrorUnknown - Memory Transfer?

Jun 19, 2013 at 5:50 PM
Hello,

I am getting a non-deterministic error that reports "ErrorUnknown" in my Cudafy code.

It points to line 42 of my code, which is the first memory transfer in this snippet:
gpu.Launch((arrayOnHost.GetUpperBound(0) + 1) / N + 1, N).methodCall(... Parameters...);

gpu.Synchronize();
gpu.CopyFromDevice<float>(Array1OnDevice, Array1OnHost); //<---Line 42
gpu.Synchronize();
gpu.CopyFromDevice<float>(Array2OnDevice, Array2OnHost);
gpu.Synchronize();
gpu.CopyFromDevice<float>(Array3OnDevice, Array3OnHost);
gpu.Synchronize();
gpu.CopyFromDevice<float>(Array4OnDevice, Array4OnHost);
gpu.Synchronize();
gpu.CopyFromDevice<float>(Array5OnDevice, Array5OnHost);
gpu.Synchronize();
After getting this error I added the gpu.Synchronize(); calls, thinking the memory transfers might be stepping on each other and causing errors, but to no avail.

The reason I say non-deterministic is that it works without an issue in about 1/3 of run attempts and throws this error in the other 2/3, with nothing changing between run attempts as far as I can tell.

Any ideas on how to fix this? Or even what might be the issue?

Thanks for your help,

Zach
Jun 19, 2013 at 6:01 PM
Hi Zach
Your error is actually happening inside the kernel call, and not on the memory transfer. As for the exact reason of your error, I once wrote a post on this board regarding possible causes for "error unknown". Just check past posts for "error unknown" and I'm sure you'll find it.
Jun 19, 2013 at 6:10 PM
Edited Jun 19, 2013 at 6:11 PM
For reference, I'm assuming you're referring to this post: https://cudafy.codeplex.com/discussions/407193

Particularly:
  • If you’re getting an ErrorUnknown after either launching your kernel or (if launching it asynchronously) on the next cudafy instruction, that means that your kernel aborted unexpectedly. There could be a plethora of reasons for that, but the most frequent is memory access violations. Use emulation mode to pinpoint your problem. The CUDA Toolkit “cuda-memcheck.exe” can also help you there. Other frequent reasons for ErrorUnknown are calling “return” on only a few of the threads within the block, or dividing by zero, or even launching a kernel with an excessively large blockDim*GridDim.
I'll take a look and see what I can figure out. Thank you.
Jun 19, 2013 at 6:41 PM
memory access violations - There are many arrays, but they are all very simple. And if this was occurring it would error out regularly, not sporadically, correct?

calling “return” - This code doesn't call return at all.

dividing by zero - I put a check around every divide in the kernel and the error persists. This code also runs flawlessly on the CPU version, so it is unlikely there are any divide by zero errors as that would throw an error on the CPU as well. When it does work, the GPU output matches the CPU output, which suggests it is coded correctly, as it processes tens of thousands of data points. (If they were even slightly different we would expect to see some differences in output.)

excessively large blockDim*GridDim - I assume it can be as high as an Integer? I am only pushing up to 83,000 and would be surprised if this was too high.
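A note on the divide-by-zero point above: in C#, only integer division by zero throws. Floating-point division by zero quietly yields infinity or NaN, so a clean CPU run does not by itself rule out a zero divisor in float arithmetic. The class and method names in this minimal sketch are hypothetical and only illustrate that language behaviour.

```csharp
using System;

static class DivideByZeroDemo
{
    // Float division: never throws, even for a zero divisor.
    public static float DivideFloats(float a, float b) { return a / b; }

    // Integer division: throws DivideByZeroException for a zero divisor.
    public static int DivideInts(int a, int b) { return a / b; }

    static void Main()
    {
        Console.WriteLine(float.IsPositiveInfinity(DivideFloats(1f, 0f))); // True
        Console.WriteLine(float.IsNaN(DivideFloats(0f, 0f)));              // True

        try
        {
            DivideInts(1, 0);
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("integer division threw");
        }
    }
}
```

Since every division in the kernel operates on floats, a matching CPU run says little about zero divisors either way.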

Did I miss anything in my assessment? What are the other possibilities?
Jun 19, 2013 at 10:05 PM
excessively large blockDim*GridDim
what are your values for each, respectively?
mem access problems won't necessarily always result in a kernel error. Have you tried running the mem checker tool?

could you provide us with a simplified version of your kernel code that still produces the error? It might help.
Jun 19, 2013 at 10:28 PM
pedritolo1 wrote:
excessively large blockDim*GridDim
what are your values for each, respectively?
mem access problems won't necessarily always result in a kernel error. Have you tried running the mem checker tool?

could you provide us with a simplified version of your kernel code that still produces the error? It might help.
I'm looking at 256x320 (give or take 2-3 on the 320 depending on base size)
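For scale, 256 x 320 is 81,920 threads. That is well inside the limits commonly quoted for CUDA hardware of this period (512 or 1,024 threads per block and 65,535 blocks per grid dimension), though the actual limits depend on the device and should be checked there. The sketch below compares the block-count formula from the Launch call with plain ceiling division; the class name, method name and element count are hypothetical.

```csharp
using System;

static class LaunchSize
{
    // Ceiling division: the smallest block count whose threads cover every element.
    public static int BlocksFor(int elementCount, int threadsPerBlock)
    {
        return (elementCount + threadsPerBlock - 1) / threadsPerBlock;
    }

    static void Main()
    {
        const int threadsPerBlock = 256; // N in the Launch call
        const int elementCount = 81920;  // hypothetical size, exactly 320 * 256

        // Formula from the Launch call: count / N + 1.
        int fromLaunchCall = elementCount / threadsPerBlock + 1;
        int ceiling = BlocksFor(elementCount, threadsPerBlock);

        Console.WriteLine("count / N + 1:    " + fromLaunchCall + " blocks"); // 321
        Console.WriteLine("ceiling division: " + ceiling + " blocks");        // 320
    }
}
```

The extra block launched by count / N + 1 when the count is an exact multiple of N is harmless only if the kernel's own bounds check is correct, because every thread in that block has an index past the end of the data.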

I could not get the memchecker tool to work - it just blinks up for a quick second and disappears, but I'll dig into that some more and see if I can get it working.

Here's the simplified code. Keep in mind it's a proof of concept: there are various reasons I did it the way I did, but they may not be self-evident.
public static void findInfo(GThread thread, float[,] inputArray1, int[,] inputArray2, float[,] resultArray1,
                    float[,] resultArray2, float[,] resultArray3, float[, ,] resultArray4, float[] resultArray5)
        {
            int i = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x; //which index are we working with
            if (i < inputArray2.Length)
            {
                    int v0 = inputArray2[i, 0];
                    int v1 = inputArray2[i, 1];
                    int v2 = inputArray2[i, 2];

                    if (3*v0 < inputArray1.Length && 3*v2 < inputArray1.Length && 3*v1 < inputArray1.Length)
                    {
                        resultArray1[i, 0] = (inputArray1[v0, 0] + inputArray1[v1, 0] + inputArray1[v2, 0]) / ((float)3.0); 
                        resultArray1[i, 1] = (inputArray1[v0, 1] + inputArray1[v1, 1] + inputArray1[v2, 1]) / ((float)3.0); 
                        resultArray1[i, 2] = (inputArray1[v0, 2] + inputArray1[v1, 2] + inputArray1[v2, 2]) / ((float)3.0); 
                    

                        float temp1X= resultArray1[i, 0] - inputArray1[v0, 0];     
                        float temp1Y= resultArray1[i, 1] - inputArray1[v0, 1];
                        float temp1Z= resultArray1[i, 2] - inputArray1[v0, 2];

                        float temp2X= resultArray1[i, 0] - inputArray1[v1, 0];     
                        float temp2Y= resultArray1[i, 1] - inputArray1[v1, 1];
                        float temp2Z= resultArray1[i, 2] - inputArray1[v1, 2];

                        float temp3X= resultArray1[i, 0] - inputArray1[v2, 0];     
                        float temp3Y= resultArray1[i, 1] - inputArray1[v2, 1];
                        float temp3Z= resultArray1[i, 2] - inputArray1[v2, 2];

                        float len0 = (float)Math.Sqrt((temp1X* temp1X) + (temp1Y* temp1Y) + (temp1Z* temp1Z));
                        float len1 = (float)Math.Sqrt((temp2X* temp2X) + (temp2Y* temp2Y) + (temp2Z* temp2Z));
                        float len2 = (float)Math.Sqrt((temp3X* temp3X) + (temp3Y* temp3Y) + (temp3Z* temp3Z));

                        resultArray3[i, 0] = (float)len0;       
                        resultArray3[i, 1] = (float)len1;
                        resultArray3[i, 2] = (float)len2;

                        if (len0 != 0 && len1 != 0 && len2 != 0)
                        {
                            resultArray4[i, 0, 0] = (float)(temp1X/ len0);   
                            resultArray4[i, 0, 1] = (float)(temp1Y/ len0);
                            resultArray4[i, 0, 2] = (float)(temp1Z/ len0);

                            resultArray4[i, 1, 0] = (float)(temp2X/ len1);   
                            resultArray4[i, 1, 1] = (float)(temp2Y/ len1);
                            resultArray4[i, 1, 2] = (float)(temp2Z/ len1);

                            resultArray4[i, 2, 0] = (float)(temp3X/ len2);   
                            resultArray4[i, 2, 1] = (float)(temp3Y/ len2);
                            resultArray4[i, 2, 2] = (float)(temp3Z/ len2);

                            float AX = inputArray1[v1, 0] - inputArray1[v0, 0];
                            float AY = inputArray1[v1, 1] - inputArray1[v0, 1];
                            float AZ = inputArray1[v1, 2] - inputArray1[v0, 2];

                            float BX = inputArray1[v2, 0] - inputArray1[v0, 0];
                            float BY = inputArray1[v2, 1] - inputArray1[v0, 1];
                            float BZ = inputArray1[v2, 2] - inputArray1[v0, 2];

                            float cpX = AY * BZ - AZ * BY;
                            float cpY = AZ * BX - AX * BZ;
                            float cpZ = AX * BY - AY * BX;

                            float len = (float)Math.Sqrt((cpX * cpX) + (cpY * cpY) + (cpZ * cpZ));

                            resultArray2[i, 0] = (cpX / len);
                            resultArray2[i, 1] = (cpY / len);
                            resultArray2[i, 2] = (cpZ / len);

                            resultArray5[i] = ((float)0.5) * len;
                        }
                    }
                }
            }

Jun 20, 2013 at 1:27 AM
ok, from what I can tell, inputArray2 contains indexes into all the other arrays.
1 - are you sure that the indexes v0,v1 or v2 won't ever result in a silent out-of-range access?
2 - are you sure inputArray2.Length gives the correct value? You should use that array method that returns a specific dimension length (don't remember the name).
3 - there's at least 20 (many are hidden) arguments for your kernel. Some cards have limited stack size on their kernels. if all else fails, you could try to reduce the number of arguments, perhaps merging some of the arrays together somehow.

besides that, i've got nothing...

good luck
Jun 20, 2013 at 5:13 PM
pedritolo1 wrote:
ok, from what I can tell, inputArray2 contains indexes into all the other arrays.
1 - are you sure that the indexes v0,v1 or v2 won't ever result in a silent out-of-range access?
2 - are you sure inputArray2.Length gives the correct value? You should use that array method that returns a specific dimension length (don't remember the name).
3 - there's at least 20 (many are hidden) arguments for your kernel. Some cards have limited stack size on their kernels. if all else fails, you could try to reduce the number of arguments, perhaps merging some of the arrays together somehow.

besides that, i've got nothing...

good luck
You are correct, inputArray2 contains indices into the other arrays.

1 - I did implement checks to verify that v0-v2 were within range, but took those out before posting the code here.

2 - I think you hit it here. I overlooked the fact that inputArray2 is multi-dimensional (n x 3), so .Length returns 3n rather than n and I have to check that 3i<inputArray2.Length.
(That's why I normally use .GetUpperBound(0), but unfortunately that doesn't work in the GPU methods.)

3 - While this is possible, I would be surprised as this should cause it to fail every time, not intermittently (unless the hidden arguments change in number between runs?)
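On point 1, the index range can also be verified once on the host before the kernel is launched, which keeps the check out of the kernel. This is a minimal sketch in plain C# with hypothetical names and data. Unlike the 3*v0 < inputArray1.Length test in the kernel, it also rejects negative indices.

```csharp
using System;

static class IndexCheck
{
    // Returns the first row of `indices` that holds a value outside [0, rowCount),
    // or -1 when every index is valid.
    public static int FirstBadRow(int[,] indices, int rowCount)
    {
        for (int r = 0; r < indices.GetLength(0); r++)
        {
            for (int c = 0; c < indices.GetLength(1); c++)
            {
                if (indices[r, c] < 0 || indices[r, c] >= rowCount)
                {
                    return r;
                }
            }
        }
        return -1;
    }

    static void Main()
    {
        float[,] vertices = new float[4, 3];             // 4 rows, so valid indices are 0..3
        int[,] triangles = { { 0, 1, 2 }, { 1, 2, 4 } }; // 4 is out of range

        Console.WriteLine(FirstBadRow(triangles, vertices.GetLength(0))); // prints 1
    }
}
```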

Anyway, I have updated the code and it appears to be working, although I suppose I could just be getting very lucky (the curse of indeterminate errors...). I'll be back if it resurfaces.

Thank you for your help!
Jun 21, 2013 at 9:11 AM
Try using GetLength on arrays in GPU code.
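To illustrate the difference, here is a minimal sketch in plain C# (no Cudafy dependency; class and method names are hypothetical) comparing the guard from the posted kernel with one based on GetLength(0). For an n x 3 array, Length is 3n, so the original guard lets through row indices up to 3n - 1.

```csharp
using System;

static class LengthDemo
{
    // Guard as posted: compares the row index against the total element count.
    public static bool GuardWithLength(int i, int[,] a) { return i < a.Length; }

    // Guard using the size of dimension 0, i.e. the number of rows.
    public static bool GuardWithGetLength(int i, int[,] a) { return i < a.GetLength(0); }

    static void Main()
    {
        int[,] indices = new int[5, 3];

        Console.WriteLine(indices.Length);               // 15 (5 * 3 elements)
        Console.WriteLine(indices.GetLength(0));         // 5 rows
        Console.WriteLine(indices.GetUpperBound(0) + 1); // 5, same as GetLength(0)

        int i = 7; // a thread index past the last row
        Console.WriteLine(GuardWithLength(i, indices));    // True: thread would proceed
        Console.WriteLine(GuardWithGetLength(i, indices)); // False: thread is skipped
    }
}
```

In managed code, an access such as indices[7, 0] on this array throws IndexOutOfRangeException. A GPU kernel has no such bounds check, so the same access touches whatever memory lies beyond the buffer, which fits a failure that shows up on some runs and not others.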