
SynchronizeStream.... error

Mar 28, 2013 at 5:31 PM
Edited Mar 28, 2013 at 6:03 PM
OK, after a week of syntax errors and compile errors,
it now crashes on _gpu.SynchronizeStream.

Error message:
The runtime has encountered a fatal error. The address of the error was at 0x6fab7de7, on thread 0x7f0. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.

anyone spot my mistake?

            // start gpu eval bit
            CudafyModule km = CudafyModule.TryDeserialize(typeof(Program).Name);
            if (km == null || !km.TryVerifyChecksums())
                km = CudafyTranslator.Cudafy(typeof(Program));
            GPGPU _gpu = CudafyHost.GetDevice(CudafyModes.Target);


            Single[] Ind0BufferIn = new Single[N];
            Single[] Ind1BufferIn = new Single[N];

            Random rand = new Random(DateTime.Now.Millisecond);
            for (int i = 0; i < N; i++)                                     // load data
            {
                Ind0BufferIn[i] = (Single)rand.NextDouble() / 4;
                Ind1BufferIn[i] = (Single)rand.NextDouble() / 4;
            }

            int batchSize = 8;
            int loops = 6;

            IntPtr[] stagingPostIn1 = new IntPtr[N];
            IntPtr[] stagingPostOut = new IntPtr[N];

            for (int i = 0; i < batchSize; i++)
            {
                stagingPostIn1[i] = _gpu.HostAllocate<Single>(N);
                stagingPostOut[i] = _gpu.HostAllocate<Single>(N);
            }

            Single[] _gpuuintBufferIn0 = _gpu.Allocate<Single>(N);
            Single[] _gpuuintBufferOut = _gpu.Allocate<Single>(N);

            Single[] _CPU_dataOut = _gpu.Allocate<Single>(N);

            for (int x = 0; x < loops; x++)
            {
                for (int i = 0; i < batchSize; i++)
                    _gpu.CopyToDeviceAsync(Ind0BufferIn, 0, _gpuuintBufferIn0, 0, N, i + 1, stagingPostIn1[i]);
                for (int i = 0; i < batchSize; i++)
                    _gpu.LaunchAsync(1, 2, i + 1, "DoubleAllValues", _gpuuintBufferIn0, _gpuuintBufferOut);
                for (int i = 0; i < batchSize; i++)
                    _gpu.CopyFromDeviceAsync(_gpuuintBufferOut, 0, _CPU_dataOut, 0, N, i + 1, stagingPostOut[i]);
                for (int i = 0; i < batchSize; i++)
                    _gpu.SynchronizeStream(i + 1);
            }
N = 1024 ;
Mar 29, 2013 at 11:03 AM
An error showing up in Synchronize does not tell you much, only that it happened in one of the previously queued operations. As is often said on this forum, try simplifying your program until the error goes away and then start adding things back. Try setting batchSize and loops to 1 and commenting out the LaunchAsync section.
Mar 29, 2013 at 11:46 AM

Really appreciate your help here, but I feel I'm missing something, probably quite a lot!

While looking at this, I can carry on as I have for the past few months, by trial and error: say 60 changes an hour, 60 compiles, 8 hours a day, etc.

You and I both know that a brute-force approach like this will not work, and it certainly won't be the most efficient.
So how can we see "exactly" what is going on here? What advantages does the emulator give me? Can I see more detail in Nsight?

I feel we have something in common: I used to live in Holland and worked at ESTEC for a few years on the Columbus programme, and loved living there.
**** Can I offer to write some bits of help for Cudafy? **** I'm happy to come over and have a chat?

Trying to solve the problem above, I looked at running in emulation mode. Please direct me to the document that says how to set emulation mode, because I can't find it!
its not here:,0,179

so where?

By trial and error I found a bit on CodeProject, by luck! But that's not a good way to discover your beautiful product. I'm not criticising, far from it: there is more help on Cudafy than on most code, but it's not easy to navigate and find the info.

Regards Carl
Mar 29, 2013 at 5:31 PM
Edited Mar 29, 2013 at 9:29 PM
Hi Carl

From your error description, it really feels like an invalid memory access during one of the many mem copy operations in your code. You're likely trying to copy/move/set memory beyond its allocated size.
"While looking at this, I can try as I have been for the past few months by trial and error, say 60 changes an hour, 60 compiles, 8 hours a day...... etc"
That's a very scary turn of phrase. Usually a development process is a bit more than a mere random walk. At least try embedding a simulated annealing approach into your search algorithm (joke).

Try regarding your problem as an onion. Yes, an onion. Each layer stands for an extra level of complexity, and your bug may be in one of those layers. The idea is to start removing the outer layers and work your way towards the core, as it were. I look at your code and two things come to mind:
1 - You didn't provide us with the source for the kernel function. As far as I can tell, it's the one from Cudafy's unit tests?
2 - Your code is complex, i.e., it uses "advanced" concepts such as Cudafy's SmartCopy and multi-stream operations. Each of these complex wrappers is like one of the layers in my onion analogy. Get rid of them, simplify your problem as much as possible, and see if the problem persists. When it no longer persists, whatever you just removed will contain your bug.
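
For illustration only, the innermost layer of the onion could look roughly like the sketch below: one synchronous copy in, one launch, one synchronous copy out, with no streams and no staging posts. The kernel body is hypothetical, since the real DoubleAllValues was not posted, and the launch dimensions (N / 256 blocks of 256 threads) are an assumption chosen to cover N = 1024 elements. The host-side calls are the plain CUDAfy ones as far as I know them, so check them against your version.

```csharp
using System;
using Cudafy;
using Cudafy.Host;
using Cudafy.Translator;

public class Program
{
    const int N = 1024;

    // Hypothetical kernel: a stand-in for the DoubleAllValues that was not posted.
    [Cudafy]
    public static void DoubleAllValues(GThread thread, float[] input, float[] output)
    {
        int tid = thread.blockIdx.x * thread.blockDim.x + thread.threadIdx.x;
        if (tid < input.Length)          // guard against out-of-range threads
            output[tid] = input[tid] * 2.0f;
    }

    public static void Main()
    {
        CudafyModule km = CudafyTranslator.Cudafy(typeof(Program));
        GPGPU gpu = CudafyHost.GetDevice(CudafyModes.Target);
        gpu.LoadModule(km);

        // Host buffers are ordinary managed arrays.
        float[] hostIn = new float[N];
        float[] hostOut = new float[N];
        for (int i = 0; i < N; i++)
            hostIn[i] = i;

        // Device buffers come from gpu.Allocate and are only handles on the host side.
        float[] devIn = gpu.Allocate<float>(N);
        float[] devOut = gpu.Allocate<float>(N);

        // Synchronous round trip: no stream ids, no staging posts.
        gpu.CopyToDevice(hostIn, devIn);
        gpu.Launch(N / 256, 256, "DoubleAllValues", devIn, devOut);
        gpu.CopyFromDevice(devOut, hostOut);

        // Check the result on the host.
        for (int i = 0; i < N; i++)
            if (hostOut[i] != hostIn[i] * 2.0f)
                Console.WriteLine("Mismatch at index " + i);

        gpu.FreeAll();
    }
}
```

If this runs cleanly, add back one layer at a time (the async copies with staging posts first, then more than one stream, then the outer loop) and re-test after each step. While doing so, it is worth checking that every host-side buffer passed to a copy call is a plain managed array or a HostAllocate pointer, and that every device-side buffer came from gpu.Allocate.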


P.S: Emulation mode runs your kernels on the CPU and mem copy is only host2host. Use

CudafyModes.Target = eGPUType.Emulator;
Apr 2, 2013 at 10:31 AM
Hi Carl,
Can you send me an email?
Apr 2, 2013 at 11:01 AM
Hi Nick!