Double precision floating point numbers

Jan 8, 2015 at 12:01 PM
Hi there,

I have been reviewing the discussions and issues but I can't find much information about working with double precision floating point numbers.
Is it actually possible to work with doubles?

I have tried to run a simple test where arrays of doubles are summed, but I get an error when the code is being translated, saying that the identifier "double" is not recognized. If I replace the doubles with floats, it works. My code is in VB. This is the version with singles:

    Private Const N As Integer = 1000

    Public Event PublishResult(ByVal Result As String)

    Public Sub Execute(device As Integer)

        ' Generate and load module:

        Dim MyModule As CudafyModule = CudafyTranslator.Cudafy(eArchitecture.OpenCL12) '.CudafyOld(ePlatform.All, eArchitecture.OpenCL, Nothing, False)
        Dim GPU As GPGPU = CudafyHost.GetDevice(eArchitecture.OpenCL12, device)
        GPU.LoadModule(MyModule)

        ' Run: 

        Dim a(N) As Single
        Dim b(N) As Single
        Dim c(N) As Single

        ' Allocate the memory on the GPU:

        Dim dev_a() As Single = GPU.Allocate(Of Single)(a)
        Dim dev_b() As Single = GPU.Allocate(Of Single)(b)
        Dim dev_c() As Single = GPU.Allocate(Of Single)(c)

        ' Fill the arrays 'a' and 'b' on the CPU

        For i = 0 To N - 1
            a(i) = i ' Math.Sin(Math.PI * i / N)
            b(i) = i
        Next

        ' Copy the arrays 'a' and 'b' to the GPU

        GPU.CopyToDevice(a, dev_a)
        GPU.CopyToDevice(b, dev_b)

        ' Launch Add on N threads

        Dim nThreads As Integer = 24
        Dim nBlocks As Integer = Math.Ceiling(N / nThreads)

        Dim start As Date = DateAndTime.Now

        GPU.Launch(nBlocks, nThreads, "Calculate", dev_a, dev_b, dev_c)

        ' Elapsed time is "now minus start"; the reverse order yields a negative TimeSpan
        Dim elapsed As TimeSpan = DateAndTime.Now - start

        RaiseEvent PublishResult(String.Format("Device {0}: {1}s{2}ms", device, elapsed.Seconds, elapsed.Milliseconds))

        ' Copy the array 'c' back from the GPU to the CPU

        GPU.CopyFromDevice(dev_c, c)

        ' Free the memory allocated on the GPU

        GPU.Free(dev_a)
        GPU.Free(dev_b)
        GPU.Free(dev_c)

    End Sub

    <Cudafy()>
    Public Shared Sub Calculate(ByVal thread As GThread, ByVal a() As Single, ByVal b() As Single, ByVal c As Single())

        Dim i As Integer = index(thread.threadIdx.x, thread.blockDim.x, thread.blockIdx.x)

        If (i < a.Length) Then c(i) = a(i) + b(i)

    End Sub

    <Cudafy()>
    Public Shared Function index(ByVal tid As Integer, bdi As Integer, ByVal bid As Integer) As Integer
        Return tid + bid * bdi
    End Function
Jan 8, 2015 at 1:26 PM
Edited Jan 8, 2015 at 1:57 PM
I have found that the compute capability of my device (AMD Radeon HD 6450) is 1.2. Could that be the reason why Cudafy crashes when attempting to run a kernel with doubles?
I found the following comment in part D of the programming guide:

Double-Precision Floating-Point Functions
The errors listed below only apply when compiling for devices with native double-precision support. When compiling for devices without such support, such as devices of compute capability 1.2 and lower, the double type gets demoted to float by default and the double-precision math functions are mapped to their single-precision equivalents.
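
To make the effect of that demotion concrete, here is a minimal CPU-side sketch in VB. It is not taken from the programming guide, and the module name `DemotionDemo` is made up for illustration. It converts a Double that needs more than 24 significant bits to Single and back, which is roughly what happens to a value when a kernel's doubles are treated as floats.

```vb
Imports System

Module DemotionDemo
    Sub Main()
        ' 2^24 + 1 needs 25 significant bits: exact as Double, not as Single.
        Dim original As Double = 16777217.0
        Dim demoted As Single = CSng(original)
        Dim roundTrip As Double = CDbl(demoted)

        Console.WriteLine("Double value:      {0}", original)
        Console.WriteLine("After Single trip: {0}", roundTrip)
        Console.WriteLine("Difference:        {0}", original - roundTrip)
    End Sub
End Module
```

The last line should report a nonzero difference, because Single carries only 24 significant bits. In the summing test from this thread the values stay below 2000, so Single and Double would produce identical results there.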
Jan 9, 2015 at 10:18 AM
You need to check carefully whether your AMD device supports doubles or not. Likely yours does not. Last I looked - a year ago - you needed to spend quite a bit of money to get an AMD with double support.
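
One way to check, sketched here under assumptions: OpenCL devices that support doubles advertise the `cl_khr_fp64` extension (some AMD drivers list `cl_amd_fp64` instead) in their extension string, which tools such as `clinfo` print. The helper below is hypothetical and not part of CUDAfy; it expects you to supply that string yourself.

```vb
Imports System

Module Fp64Check
    ' Hypothetical helper: returns True if a space-separated OpenCL
    ' extension list advertises double-precision support.
    Function AdvertisesDoubles(extensions As String) As Boolean
        If String.IsNullOrEmpty(extensions) Then Return False
        Dim separators() As Char = {" "c}
        For Each extensionName As String In extensions.Split(separators, StringSplitOptions.RemoveEmptyEntries)
            If extensionName = "cl_khr_fp64" OrElse extensionName = "cl_amd_fp64" Then Return True
        Next
        Return False
    End Function

    Sub Main()
        ' Example strings only; paste the list your own device reports.
        Console.WriteLine(AdvertisesDoubles("cl_khr_global_int32_base_atomics cl_khr_fp64"))
        Console.WriteLine(AdvertisesDoubles("cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"))
    End Sub
End Module
```

Comparing whole tokens rather than searching for a substring avoids false positives from extensions whose names merely contain the same text.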
Jan 9, 2015 at 11:05 AM
Hi Nick,

Thanks for replying. I have been using the HD 6450 just for testing purposes. I have checked its capabilities on Wikipedia, and now I am 100% sure it doesn't support double precision. In a couple of days I should be getting a new laptop with an NVIDIA GeForce 840M, which should support double precision (its compute capability is 5.0). Now I would only like to know whether the problem I had was caused by the HD 6450 lacking double precision, or by Cudafy being unable to handle double precision.
Has anyone ever tried working with double precision?
Jan 9, 2015 at 1:21 PM
CUDAfy handles doubles fine. You don't give details of the crash so I really cannot say. With the 840M and using CUDA with CUDAfy you will find life much easier than OpenCL with CUDAfy.
Jan 9, 2015 at 1:42 PM
This is the error I get. The code compiles, and the error happens when "GPU.LoadModule(MyModule)" is called.
An unhandled exception of type 'Cudafy.CudafyCompileException' occurred in Cudafy.NET.dll

Additional information: Compilation error: "C:\Users\GUILLE~1\AppData\Local\Temp\OCL7784T5.cl", line 8: error: identifier

          "double" is undefined

  __kernel  void Calculate(global double* a, int aLen0, global double* b, int bLen0, global double* c, int cLen0);
                                  ^

"C:\Users\GUILLE~1\AppData\Local\Temp\OCL7784T5.cl", line 8: error: identifier

          "double" is undefined

  __kernel  void Calculate(global double* a, int aLen0, global double* b, int bLen0, global double* c, int cLen0);
                                                               ^

"C:\Users\GUILLE~1\AppData\Local\Temp\OCL7784T5.cl", line 8: error: identifier

          "double" is undefined

  __kernel  void Calculate(global double* a, int aLen0, global double* b, int bLen0, global double* c, int cLen0);
                                                                                            ^

"C:\Users\GUILLE~1\AppData\Local\Temp\OCL7784T5.cl", line 13: error: 

          identifier "double" is undefined

  __kernel  void Calculate(global double* a, int aLen0, global double* b, int bLen0, global double* c, int cLen0)
                                  ^

"C:\Users\GUILLE~1\AppData\Local\Temp\OCL7784T5.cl", line 13: error: 

          identifier "double" is undefined

  __kernel  void Calculate(global double* a, int aLen0, global double* b, int bLen0, global double* c, int cLen0)
                                                               ^

"C:\Users\GUILLE~1\AppData\Local\Temp\OCL7784T5.cl", line 13: error: 

          identifier "double" is undefined

  __kernel  void Calculate(global double* a, int aLen0, global double* b, int bLen0, global double* c, int cLen0)
                                                                                            ^

6 errors detected in the compilation of "C:\Users\GUILLE~1\AppData\Local\Temp\OCL7784T5.cl".

Frontend phase failed compilation.

.
And this is the code:
Public Class MyTest

    Private Const N As Integer = 1000

    Public Event PublishResult(ByVal Result As String)

    Public Sub Execute(device As Integer)

        ' Generate and load module:

        Dim MyModule As CudafyModule = CudafyTranslator.Cudafy(eArchitecture.OpenCL) '.CudafyOld(ePlatform.All, eArchitecture.OpenCL, Nothing, False)
        Dim GPU As GPGPU = CudafyHost.GetDevice(eArchitecture.OpenCL, device)
        GPU.LoadModule(MyModule)

        ' Run: 

        Dim a(N) As Double
        Dim b(N) As Double
        Dim c(N) As Double

        ' Allocate the memory on the GPU:

        Dim dev_a() As Double = GPU.Allocate(Of Double)(a)
        Dim dev_b() As Double = GPU.Allocate(Of Double)(b)
        Dim dev_c() As Double = GPU.Allocate(Of Double)(c)

        ' Fill the arrays 'a' and 'b' on the CPU

        For i = 0 To N - 1
            a(i) = i
            b(i) = i
        Next

        ' Copy the arrays 'a' and 'b' to the GPU

        GPU.CopyToDevice(a, dev_a)
        GPU.CopyToDevice(b, dev_b)

        ' Launch Add on N threads

        Dim nThreads As Integer = 24
        Dim nBlocks As Integer = Math.Ceiling(N / nThreads)

        Dim start As Date = DateAndTime.Now

        GPU.Launch(nBlocks, nThreads, "Calculate", dev_a, dev_b, dev_c)

        ' Elapsed time is "now minus start"; the reverse order yields a negative TimeSpan
        Dim elapsed As TimeSpan = DateAndTime.Now - start

        RaiseEvent PublishResult(String.Format("Device {0} / {1} / {2}MHz: {3}s{4}ms", device, GPU.GetDeviceProperties.Name, GPU.GetDeviceProperties.ClockRate, elapsed.Seconds, elapsed.Milliseconds))

        ' Copy the array 'c' back from the GPU to the CPU

        GPU.CopyFromDevice(dev_c, c)

        ' Free the memory allocated on the GPU

        GPU.Free(dev_a)
        GPU.Free(dev_b)
        GPU.Free(dev_c)

    End Sub

    <Cudafy()>
    Public Shared Sub Calculate(ByVal thread As GThread, ByVal a() As Double, ByVal b() As Double, ByVal c As Double())

        Dim i As Integer = index(thread.threadIdx.x, thread.blockDim.x, thread.blockIdx.x)

        If (i < a.Length) Then c(i) = a(i) + b(i)

    End Sub

    <Cudafy()>
    Public Shared Function index(ByVal tid As Integer, bdi As Integer, ByVal bid As Integer) As Integer
        Return tid + bid * bdi
    End Function

End Class
Jan 9, 2015 at 1:50 PM
I think it would be a good idea to automatically cast any double to single when the device doesn't support doubles, the same way it seems to be done in CUDA.
Jan 9, 2015 at 2:03 PM
OpenCL only compiles during the LoadModule, so yes the error is from OpenCL in combination with your target card.
As for your request: You would indeed think that OpenCL / AMD would do this on the fly but alas no. With CUDA we are relatively spoiled - spend a day in pure OpenCL world and one quits complaining about any CUDA headaches!
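
Because the OpenCL compile happens inside `LoadModule`, that call is the place to catch the failure. Here is a minimal sketch that reuses only the calls and the exception type already shown in this thread. The class name `ModuleLoader`, the `TryLoad` method and the message test are illustrative assumptions, and matching on the error text is a heuristic rather than a documented CUDAfy feature.

```vb
Imports System
Imports Cudafy
Imports Cudafy.Host
Imports Cudafy.Translator

Public Class ModuleLoader

    ' Hypothetical helper: returns True if the module compiled and loaded on the device.
    Public Shared Function TryLoad(device As GPGPU, kernelModule As CudafyModule) As Boolean
        Try
            ' With an OpenCL target, the kernel source is compiled here.
            device.LoadModule(kernelModule)
            Return True
        Catch ex As CudafyCompileException
            ' Assumption: the compiler log is carried in the exception message,
            ' as in the error output posted above.
            If ex.Message.Contains("""double"" is undefined") Then
                Console.WriteLine("The device compiler rejected 'double'; try Single on this device.")
            Else
                Console.WriteLine("Kernel compilation failed: " & ex.Message)
            End If
            Return False
        End Try
    End Function

End Class
```

A caller could use the return value to fall back to a Single version of the kernel instead of letting the exception go unhandled.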
Jan 9, 2015 at 2:18 PM
Luckily I won't have to deal with OpenCL for much longer. I am looking forward to trying out the GeForce with CUDA.
Thanks for your help!