Non-primitive/non-blittable data in kernel argument list

Dec 19, 2011 at 8:28 PM
Edited Dec 19, 2011 at 8:57 PM

Encountered upon kernel launch:

ArgumentException was unhandled: "Object contains non-primitive or non-blittable data."

It seems this struct is at fault:

 

[Cudafy]
struct SimConfig
{
	public bool DoMur;
}

After half an hour or so of fiddling... is bool not supported!? +___+

Coordinator
Dec 21, 2011 at 10:10 AM

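Boolean is not a blittable type! By default .NET marshals a bool as a 4-byte Win32 BOOL, while in managed memory it occupies 1 byte, so it cannot be copied across as-is.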
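To see why bool is a problem, you can compare its managed size with the size the default marshaler uses for it. This is a minimal host-side sketch, assuming the default .NET marshaling rules; SimConfigWithBool and BoolLayoutCheck are illustrative names, not Cudafy types.

```csharp
using System.Runtime.InteropServices;

// Mirrors the SimConfig from the question (illustrative name, no Cudafy attributes).
public struct SimConfigWithBool
{
    public bool DoMur;
}

public static class BoolLayoutCheck
{
    // Size of bool in managed memory: 1 byte.
    public static int ManagedBoolSize() => sizeof(bool);

    // Size the default marshaler uses for bool: a 4-byte Win32 BOOL.
    public static int MarshaledBoolSize() => Marshal.SizeOf(typeof(bool));

    // The struct is marshaled with the 4-byte BOOL layout as well.
    public static int MarshaledStructSize() => Marshal.SizeOf(typeof(SimConfigWithBool));
}
```

Because the managed and unmanaged layouts differ, the struct cannot simply be copied byte-for-byte, which is exactly what "blittable" requires.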

Dec 29, 2011 at 9:09 AM

A byte is probably the best blittable replacement for a bool. Alternatively, passing a bool[] as a parameter to the launch function seems to work fine.
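A sketch of the byte replacement for the struct from the original question. The DoMurFlag property is a hypothetical host-side convenience; the assumption (not confirmed in this thread for this struct) is that with Cudafy you would mark it [CudafyIgnore], as the unit-test struct later in the thread does with its Message property, so only the byte field is translated.

```csharp
using System.Runtime.InteropServices;

// SimConfig with the flag stored as a byte (0 = false, non-zero = true).
// A byte has the same layout in managed and unmanaged memory, so the struct is blittable.
[StructLayout(LayoutKind.Sequential)]
public struct SimConfig
{
    public byte DoMur;

    // Hypothetical host-side accessor; with Cudafy, presumably marked [CudafyIgnore]
    // so that kernel code tests DoMur != 0 directly.
    public bool DoMurFlag
    {
        get { return DoMur != 0; }
        set { DoMur = value ? (byte)1 : (byte)0; }
    }
}
```

Properties do not contribute to the struct's layout, so the marshaled size stays at one byte.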

Sep 5, 2012 at 4:11 AM
Edited Sep 5, 2012 at 3:01 PM


What about this type? I get the same error while trying to convert an Array of Structures program to a Structure of Arrays, which I am under the impression might improve my CUDA performance. These are all primitive types, aren't they?

    [Cudafy]
    public struct MarketData
    {
        public int Length;
        public float[] Open;
        public float[] Close;
        public float[] High;
        public float[] Low;
        public int[] Volume;
        public uint[] Time;
    } 

The same types work when they are not arrays (that's how the program runs now), but I want to see whether SoA gives a performance increase. I have even tried giving all the arrays fixed initializers, but no luck.

Any ideas?

Sep 5, 2012 at 6:04 PM

I needed to do the same thing and ended up writing some code to construct a new type by reflection with arrays replaced by IntPtrs (which are blittable). Your example would yield the equivalent of:

    public struct MarketDataForDevice
    {
        public int Length;
        public IntPtr Open;
        public IntPtr Close;
        public IntPtr High;
        public IntPtr Low;
        public IntPtr Volume;
        public IntPtr Time;
    } 
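The difference between the two structs can be checked on the host. This sketch assumes .NET Core 2.0 or later, where RuntimeHelpers.IsReferenceOrContainsReferences<T>() is available: the array version holds references to managed arrays, so it can never be blittable, while the IntPtr version holds only unmanaged values. ReferenceCheck is a hypothetical helper name.

```csharp
using System;
using System.Runtime.CompilerServices;

public struct MarketData
{
    public int Length;
    public float[] Open;
    public float[] Close;
    public float[] High;
    public float[] Low;
    public int[] Volume;
    public uint[] Time;
}

public struct MarketDataForDevice
{
    public int Length;
    public IntPtr Open;
    public IntPtr Close;
    public IntPtr High;
    public IntPtr Low;
    public IntPtr Volume;
    public IntPtr Time;
}

public static class ReferenceCheck
{
    // True if T is a reference type or a struct with reference-typed fields (such as arrays).
    public static bool HoldsManagedReferences<T>() => RuntimeHelpers.IsReferenceOrContainsReferences<T>();
}
```

Having no managed references is necessary but not sufficient for blittability (a bool field would still break it), which is why the IntPtr version keeps only int and IntPtr fields.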

I then allocated the arrays on the device in the usual way and, for each device array, got an IntPtr from the corresponding DevicePtrEx (via its Pointer member). Next I created an instance of the new type, assigned the IntPtrs to it, and passed the instance as an argument in the kernel launch in the usual way (the launch is dynamic, so the generated type is no problem there).
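The steps above can be mimicked on the host alone. This is a host-only analogue, not Cudafy code: Marshal.AllocHGlobal plus Marshal.Copy stand in for allocating a device array and copying to it, and the returned address plays the role of DevicePtrEx.Pointer. PricesForDevice and PointerStructSketch are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

// Pointer-only struct: each array is represented only by its address (hypothetical name).
public struct PricesForDevice
{
    public int Length;
    public IntPtr Open;
    public IntPtr Close;
}

public static class PointerStructSketch
{
    // Stand-in for "allocate on the device and copy": unmanaged memory instead of GPU memory.
    public static IntPtr CopyToUnmanaged(float[] host)
    {
        IntPtr buffer = Marshal.AllocHGlobal(host.Length * sizeof(float));
        Marshal.Copy(host, 0, buffer, host.Length);
        return buffer;
    }

    // Creates the pointer struct and assigns one buffer address per array field.
    public static PricesForDevice Build(float[] open, float[] close)
    {
        if (open.Length != close.Length)
            throw new ArgumentException("Arrays must have the same length.");
        return new PricesForDevice
        {
            Length = open.Length,
            Open = CopyToUnmanaged(open),
            Close = CopyToUnmanaged(close)
        };
    }

    // Reads element 'index' through the pointer, as kernel code would index a float*.
    public static float ReadFloat(IntPtr buffer, int index)
    {
        var value = new float[1];
        Marshal.Copy(IntPtr.Add(buffer, index * sizeof(float)), value, 0, 1);
        return value[0];
    }

    // Counterpart of freeing the device arrays once the struct is no longer needed.
    public static void Free(PricesForDevice data)
    {
        Marshal.FreeHGlobal(data.Open);
        Marshal.FreeHGlobal(data.Close);
    }
}
```

Since the struct stores only addresses, two instances can point at the same buffer, which is how arrays can be shared between structs.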

It seems to work OK (since everything is plain old data) and is very flexible (it allows arrays to be shared between structs, for example). I suspect there is a nicer way to do the same thing, perhaps by serializing the host struct, so I would be very interested in any better ideas. If there isn't one, I could certainly neaten up my code and make it available.

 

Coordinator
Sep 6, 2012 at 10:41 AM

Yes, in .NET a struct containing arrays is not blittable unless the arrays are fixed-size buffers. Here's an example from the Cudafy.Host.UnitTests project:

    [Cudafy]
    [StructLayout(LayoutKind.Sequential, Size=80, CharSet = CharSet.Unicode)]
    public unsafe struct PrimitiveStruct
    {
        public int Value1;
        public int Value2;
        public int Value3;
        public int Value4;
        public fixed sbyte _message[32];
        public fixed char _messageChars[16];
        [CudafyIgnore]
        public string Message
        {
            get
            {
                fixed (char* ptr = _messageChars)
                {
                    string ts = new string(ptr);
                    return ts;
                }
            }
            set
            {
                fixed (char* srcptr = value)
                {
                    fixed (char* dstptr = _messageChars)
                    {

                        IntPtr src = new IntPtr(srcptr);
                        IntPtr dst = new IntPtr(dstptr);
                        GPGPU.CopyMemory(dst, src, (uint)Math.Min(32, value.Length * 2));
                    }
                }
            }
        }
    }

If you can share, Joe, I'd be keen to see how you are handling things.

Cheers,

Nick

Sep 6, 2012 at 9:03 PM
Edited Sep 6, 2012 at 9:07 PM

Hi Nick,

I ended up pasting large chunks of code into the discussion(!), but here is an example:

 

namespace Framework
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Cudafy;
    using Cudafy.Host;
    using Cudafy.Translator;

    [Cudafy]
    public struct TestStruct
    {
        public float[] Test1;
        public int Count;
        public float[] Test2;
        public int[] Test3;
        public uint[] Test4;
        
        public float Calculate()
        {
            float test = 0;
            for (int i = 0; i < Count; ++i)
            {
                test += Test1[i] + Test2[i] + (float)Test3[i] + (float)Test4[i];
            }
            return test;
        }
    }

    public class StructWithArraysTest
    {
        public static void Execute()
        {
            CudafyModule km = CudafyTranslator.Cudafy(
                typeof(TestStruct), typeof(StructWithArraysTest));
            GPGPU gpu = CudafyHost.GetDevice(CudafyModes.Target, 0);

            gpu.LoadModule(km);

            float[] result = gpu.Allocate<float>(1);

            TestStruct testStruct = new TestStruct();
            testStruct.Test1 = new float[3] { 12, 48, 32 };
            testStruct.Test2 = new float[3] { 11, 17, 31 };
            testStruct.Test3 = new int[3] { 2, 3, 4 };
            testStruct.Test4 = new uint[3] { 5, 6, 7 };
            testStruct.Count = 3;

            object testStructObj = (object)testStruct;
            var instanceDevice = DeviceStructCreator.CreateDeviceObject(ref testStructObj, gpu);

            gpu.Launch(1, 1).kernel(instanceDevice, result);

            float[] resultHost = new float[1];
            gpu.CopyFromDevice(result, resultHost);

            // assert that resultHost[0] == Test1.Sum() + Test2.Sum() + Test3.Sum() + Test4.Sum() = 92 + 59 + 9 + 18 = 178
            Console.WriteLine(resultHost[0].ToString());
        }

        [Cudafy]
        public static void kernel(GThread thread, TestStruct[] testStruct, float[] result)
        {
            float tempResult = testStruct[0].Calculate();
            result[0] = tempResult;
        }
    }
}

 

and here is the helper to create the struct:

namespace Framework
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Reflection;
    using System.Reflection.Emit;
    using Cudafy.Host;

    /// <summary>
    /// A helper to create structs that can reference GPU device arrays and which can be transferred to the device.
    /// This can be used to create a struct of arrays.
    /// </summary>
    public class DeviceStructCreator
    {
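        /// <summary>
        /// Send a struct containing arrays to the device.
        /// </summary>
        /// <param name="hostObject">The boxed host struct; its array fields are copied to the device.</param>
        /// <param name="gpu">The device to allocate on and copy to.</param>
        /// <returns>A one-element device array holding the device-side struct, usable as a kernel argument.</returns>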
        public static object CreateDeviceObject(ref object hostObject, GPGPU gpu)
        {
            List<FieldInfo> hostObjectArrayFields;
            List<FieldInfo> hostObjectBlittableFields;
            Type deviceType = CreateDeviceType(hostObject.GetType(), out hostObjectArrayFields, out hostObjectBlittableFields);
            
            // The device struct: 
            var deviceObject = Activator.CreateInstance(deviceType);
            
            // A single element array container for the struct so we can use the existing API:
            var deviceObjectArray = Array.CreateInstance(deviceType, 1);

            // Copy hostObject arrays to the device and set pointers to these on the deviceObject.
            AssignArrays(gpu, ref hostObject, ref deviceObject, hostObjectArrayFields);
            // Copy over everything else, assuming these are blittable.
            AssignBlittableFields(gpu, ref hostObject, ref deviceObject, hostObjectBlittableFields);

            deviceObjectArray.SetValue(deviceObject, 0);
            
            // Finally, copy the deviceObject to the device.
            var copy1DArrayToDevice = Copy1DArrayToDevice(deviceType);
            var instanceDevice = copy1DArrayToDevice.Invoke(gpu, new object[] { deviceObjectArray });

            return instanceDevice;
        }

        private static MethodInfo Copy1DArrayToDevice(Type arrayElementType)
        {
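            // Select the single-parameter overload CopyToDevice<T>(T[] hostArray); this guards against
            // other overloads that may also take a T[] first but need more arguments than Invoke passes.
            var methods = typeof(GPGPU).GetMethods().Where(t => t.Name == "CopyToDevice"
                && t.IsGenericMethodDefinition
                && t.GetParameters().Length == 1
                && t.GetParameters().First().ParameterType.Name == "T[]");
            var method = methods.First();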
            return method.MakeGenericMethod(new Type[] { arrayElementType });
        }

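        /// <summary>
        /// Create a struct type that can contain pointers to device arrays.
        /// </summary>
        /// <param name="hostType">The host struct type to mirror.</param>
        /// <param name="arrayFields">Receives the host fields that are arrays (mapped to IntPtr).</param>
        /// <param name="hostObjectBlittableFields">Receives the remaining host fields (copied as-is, assumed blittable).</param>
        /// <returns>The generated device struct type.</returns>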
        private static Type CreateDeviceType(Type hostType, out List<FieldInfo> arrayFields, out List<FieldInfo> hostObjectBlittableFields)
        {
            // We are assuming that the device class is plain old data (POD).
            // We further assume (can be relaxed later) that all fields will be mapped to pointers in CUDA
            // and either pointers to arrays or pointers to pointers.

            AssemblyName assemblyName = new AssemblyName("DynamicAssembly");
            AssemblyBuilder ab =
                AppDomain.CurrentDomain.DefineDynamicAssembly(
                    assemblyName,
                    AssemblyBuilderAccess.RunAndSave);

            ModuleBuilder mb =
                ab.DefineDynamicModule(assemblyName.Name, assemblyName.Name + ".dll");

            TypeBuilder tb = mb.DefineType("DynamicType", TypeAttributes.Public |
                TypeAttributes.Sealed | TypeAttributes.SequentialLayout |
                TypeAttributes.Serializable, typeof(ValueType));

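            // Note: GetFields does not guarantee declaration order, yet the sequential layout below
            // presumably has to match the field order of the CUDA struct Cudafy generates; this
            // relies on the order being preserved in practice.
            var fields = hostType.GetFields();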

            arrayFields = new List<FieldInfo>();
            hostObjectBlittableFields = new List<FieldInfo>();

            foreach (var field in fields)
            {
                if (field.FieldType.IsArray)
                {
                    tb.DefineField(field.Name, typeof(IntPtr), FieldAttributes.Public);
                    arrayFields.Add(field);
                }
                else
                {
                    tb.DefineField(field.Name, field.FieldType, FieldAttributes.Public);
                    hostObjectBlittableFields.Add(field);
                }
            }

            return tb.CreateType();
        }

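        /// <summary>
        /// Copy the arrays of the hostObject to the device and set pointers to these on the deviceObject.
        /// </summary>
        /// <param name="gpu">The device to copy the arrays to.</param>
        /// <param name="hostObject">The boxed host struct.</param>
        /// <param name="deviceObject">The boxed device struct whose IntPtr fields are set.</param>
        /// <param name="hostObjectArrayFields">The array fields of the host struct.</param>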
        private static void AssignArrays(GPGPU gpu, ref object hostObject, ref object deviceObject, List<FieldInfo> hostObjectArrayFields)
        {
            var deviceObjectFields = deviceObject.GetType().GetFields();

            foreach (FieldInfo hostObjectField in hostObjectArrayFields)
            {
                var deviceObjectField = deviceObjectFields.Where(f => f.Name == hostObjectField.Name).FirstOrDefault();
                if (deviceObjectField == null) throw new ArgumentException("Field not found.");
                
                // Get array and copy to the device.
                Array hostArray = (Array)hostObjectField.GetValue(hostObject);
                var copy1DArrayToDevice = Copy1DArrayToDevice(hostArray.GetType().GetElementType());
                
                // Insert a pointer to the array on the device into the deviceObject. 
                var deviceArray = copy1DArrayToDevice.Invoke(gpu, new object[] { hostArray });
              
                deviceObjectField.SetValue(deviceObject, gpu.TryGetDeviceMemory(deviceArray).Pointer);
            }
        }

        /// <summary>
        /// Copy blittable members to the deviceObject.
        /// </summary>
        /// <param name="gpu"></param>
        /// <param name="hostObject"></param>
        /// <param name="deviceObject"></param>
        /// <param name="hostObjectBlittableFields"></param>
        private static void AssignBlittableFields(GPGPU gpu, ref object hostObject, ref object deviceObject, List<FieldInfo> hostObjectBlittableFields)
        {
            var deviceObjectFields = deviceObject.GetType().GetFields();

            foreach (FieldInfo hostObjectField in hostObjectBlittableFields)
            {
                var deviceObjectField = deviceObjectFields.Where(f => f.Name == hostObjectField.Name).FirstOrDefault();
                if (deviceObjectField == null) throw new ArgumentException("Field not found.");

                object blittableField = hostObjectField.GetValue(hostObject);

                deviceObjectField.SetValue(deviceObject, blittableField);
            }
        }
    }
}

Cheers,

Joe
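One portability note on the helper above: AppDomain.DefineDynamicAssembly and AssemblyBuilderAccess.RunAndSave are .NET Framework APIs. As far as I know, on .NET Core and later the type-building step would need the static AssemblyBuilder.DefineDynamicAssembly with AssemblyBuilderAccess.Run instead. The sketch below covers only that step (no Cudafy calls); DeviceTypeBuilder is a hypothetical name and TestStruct mirrors the example above.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Host-side struct of arrays, mirroring TestStruct from the example above (without methods).
public struct TestStruct
{
    public float[] Test1;
    public int Count;
    public float[] Test2;
    public int[] Test3;
    public uint[] Test4;
}

public static class DeviceTypeBuilder
{
    // Builds a sequential-layout struct that copies every public instance field of hostType,
    // except that array fields become IntPtr (to hold device pointers).
    public static Type CreateDeviceType(Type hostType)
    {
        // A unique assembly name per call avoids clashes if several types are built.
        var assemblyName = new AssemblyName("DeviceTypes_" + Guid.NewGuid().ToString("N"));
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(assemblyName, AssemblyBuilderAccess.Run);
        ModuleBuilder mb = ab.DefineDynamicModule(assemblyName.Name);

        TypeBuilder tb = mb.DefineType(hostType.Name + "ForDevice",
            TypeAttributes.Public | TypeAttributes.Sealed | TypeAttributes.SequentialLayout,
            typeof(ValueType));

        // Caution: GetFields does not guarantee declaration order, but the sequential
        // layout only matches the device-side struct if that order is preserved.
        foreach (FieldInfo field in hostType.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            Type deviceFieldType = field.FieldType.IsArray ? typeof(IntPtr) : field.FieldType;
            tb.DefineField(field.Name, deviceFieldType, FieldAttributes.Public);
        }

        return tb.CreateType();
    }
}
```

The rest of the helper (filling the fields via FieldInfo.SetValue and copying the one-element array) would stay as it is.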

Mar 23, 2013 at 7:15 AM
Hi Joe,
This is working pretty well for me (I think), but I'm not sure how to get my array back off the device now. I really want to process the vector and get the results back. I guess I could use a second vector and store them there, but I'm not 100% sure. Any thoughts?
-Dan
Mar 24, 2013 at 10:40 PM
Hi Dan,

If you keep a reference to the device memory of your arrays (the arrays that form the struct of arrays), you can copy them back to the host using the usual Cudafy methods. You should be able to get each reference by storing the 'deviceArray' returned by this line:

    var deviceArray = copy1DArrayToDevice.Invoke(gpu, new object[] { hostArray });

and then pass it to gpu.CopyFromDevice in the usual Cudafy way.

Cheers,
Joe
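The suggestion above amounts to keeping, per array field, a handle to the device copy so it can be copied back after the kernel runs. Here is a host-only sketch of that bookkeeping, with unmanaged heap memory again standing in for device memory; in Cudafy you would instead store the object returned by copy1DArrayToDevice.Invoke and later pass it to gpu.CopyFromDevice. DeviceArrayRegistry and its members are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Records, per struct field name, the unmanaged buffer that stands in for the device array.
public sealed class DeviceArrayRegistry : IDisposable
{
    private readonly Dictionary<string, (IntPtr Buffer, int Length)> buffers =
        new Dictionary<string, (IntPtr Buffer, int Length)>();

    // "Copy to device": allocate, copy, remember the buffer under the field name, return its address.
    public IntPtr Upload(string fieldName, float[] host)
    {
        if (buffers.TryGetValue(fieldName, out var old))
            Marshal.FreeHGlobal(old.Buffer); // replacing an earlier upload: release it first
        IntPtr buffer = Marshal.AllocHGlobal(host.Length * sizeof(float));
        Marshal.Copy(host, 0, buffer, host.Length);
        buffers[fieldName] = (buffer, host.Length);
        return buffer;
    }

    // "Copy from device": read the (possibly modified) buffer back into a new host array.
    public float[] Download(string fieldName)
    {
        var (buffer, length) = buffers[fieldName];
        var host = new float[length];
        Marshal.Copy(buffer, host, 0, length);
        return host;
    }

    // Releases every buffer in one place.
    public void Dispose()
    {
        foreach (var entry in buffers.Values)
            Marshal.FreeHGlobal(entry.Buffer);
        buffers.Clear();
    }
}
```

Writing through the returned pointer, as a kernel would, and then calling Download returns the updated values, so no second vector is needed just to get results back.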