Calling Cuda functions from C#

This is a demonstration of creating a C# wrapper for a Cuda function.

The example Cuda function is ‘invertImageCuda()’ and it is contained in a Cuda dll called ‘image_processor.dll’. This dll file must be in the same directory as the C# exe or in a directory listed in the PATH environment variable.

The C# File

In a C# file, create a C# entry point called ‘Invert()’. This entry point is a standard C# function, so callers can pass in any C# types, including complex objects. Only the code inside it deals with raw pointers.

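    // Requires: using System; using System.Runtime.InteropServices;

    /// <summary>
    /// Takes an array of float values, assumed to be pixels in the range [0, 1], and applies
    /// 'pixel = 1 - pixel' to every pixel in parallel Cuda operations.
    /// The original array is unchanged; the inverted image is returned in a new array.
    /// </summary>
    /// <param name="SrcPixels">Source pixels; must hold srcWidth * srcHeight values.</param>
    /// <param name="srcWidth">Image width in pixels.</param>
    /// <param name="srcHeight">Image height in pixels.</param>
    /// <returns>A new array containing the inverted pixels.</returns>
    public static float[] Invert(float[] SrcPixels, int srcWidth, int srcHeight)
    {
        // The native code trusts the sizes it is given, so check them here.
        if (SrcPixels == null || SrcPixels.Length < srcWidth * srcHeight)
            throw new ArgumentException("SrcPixels must hold srcWidth * srcHeight values.");

        float[] DstPixels = new float[srcWidth * srcHeight];

        // Pin both arrays so the garbage collector cannot move them during the native call.
        GCHandle handleSrcImage = GCHandle.Alloc(SrcPixels, GCHandleType.Pinned);
        GCHandle handleDstImage = GCHandle.Alloc(DstPixels, GCHandleType.Pinned);
        try
        {
            unsafe
            {
                float* srcPtr = (float*)handleSrcImage.AddrOfPinnedObject();
                float* dstPtr = (float*)handleDstImage.AddrOfPinnedObject();

                // call a local function that takes c style raw pointers;
                // this local function will in turn call the Cuda function
                int result = invert(srcPtr, dstPtr, srcWidth, srcHeight);
                if (result != 0) // 0 is cudaSuccess
                    throw new InvalidOperationException("invertImageCuda failed with Cuda error " + result);
            }
        }
        finally
        {
            // Always unpin, even if the native call fails.
            handleSrcImage.Free();
            handleDstImage.Free();
        }
        return DstPixels;
    }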

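The ‘unsafe’ block tells C# that we are intentionally using raw c-style pointers. C# only compiles unsafe code when the project allows it, so in the Visual Studio project properties editor we must also check the box that allows unsafe code (on the Build page; it sets the AllowUnsafeBlocks project property).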

GCHandle.Alloc() with GCHandleType.Pinned pins the float[] in place, so the garbage collector cannot move the memory while the Cuda code is accessing it. The source and destination arrays each need their own pinned GCHandle, and each handle must be released with Free() once the Cuda call returns.

AddrOfPinnedObject() returns the address of the first element of the pinned array as an IntPtr. We cast it to a c-style float* because the Cuda function expects raw pointers.

Finally, the helper function invert() is called with only simple types: two raw pointers and two ints.

In the same C# file, create the Cuda wrapper function:

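        // The C export uses the default __cdecl calling convention, so state it explicitly
        // (this only matters in 32-bit processes; x64 has a single calling convention).
        [DllImport("image_processor.dll", CallingConvention = CallingConvention.Cdecl)]
        unsafe static extern int invertImageCuda(float* src, float* dst, Int32 width, Int32 height);

        // Forwards the raw pointers and returns the Cuda status code (0 means cudaSuccess).
        unsafe static int invert(float* src, float* dst, Int32 width, Int32 height)
        {
            return invertImageCuda(src, dst, width, height);
        }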

The [DllImport] attribute applies to the extern declaration directly below it. It tells the .NET runtime, not the compiler, to load image_processor.dll and look up invertImageCuda() in it the first time the function is called.

The ‘invert()’ function is a static, unsafe helper that accepts raw c-style pointers and calls into the Cuda function, returning the value Cuda returns: a cudaError_t status code, where 0 means cudaSuccess. The Cuda function uses the dst pointer as the location to write its output values.
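The int that invert() hands back is just the numeric value of the cudaError_t enum. If a readable message is wanted on the C# side, one option is to export a small helper from the same Cuda dll. The sketch below assumes a hypothetical function named getCudaErrorString(), which is not part of the original project; it only forwards to the Cuda runtime's cudaGetErrorString(). The _WIN32 guard is an assumption added so the sketch also builds with nvcc on non-Windows hosts.

```cuda
// Hypothetical addition to the Cuda dll project (not in the original article).
#include <cuda_runtime.h>

// __declspec is Windows-only; the #else branch just lets the sketch build elsewhere.
#ifdef _WIN32
#define CUDA_CLIB_API __declspec(dllexport)
#else
#define CUDA_CLIB_API
#endif

// Hypothetical helper: converts the int status returned by invertImageCuda()
// (the numeric value of a cudaError_t) into the Cuda runtime's message text.
// The returned pointer refers to a static string owned by the runtime; do not free it.
extern "C" CUDA_CLIB_API const char* getCudaErrorString(int code)
{
    return cudaGetErrorString(static_cast<cudaError_t>(code));
}
```

On the C# side such a helper would be declared with [DllImport] returning IntPtr rather than string, and read with Marshal.PtrToStringAnsi(). Declaring the return type as string would make the marshaler try to free the runtime's static buffer.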

The Cuda File

In a separate header file in the Cuda dll project, declare the entry point:

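//invertimage.h
#ifndef INVERTIMAGE_H
#define INVERTIMAGE_H
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

// Defined here on purpose: only the Cuda dll build ever includes this header,
// so the public function below is always marked for export.
#define CUDA_CLIB_EXPORTS
#ifdef CUDA_CLIB_EXPORTS
#define CUDA_CLIB_API __declspec(dllexport)
#else
#define CUDA_CLIB_API __declspec(dllimport)
#endif

// public
// extern "C" turns off C++ name mangling, so the dll exports the plain name
// "invertImageCuda" that the C# [DllImport] declaration looks up.
#ifdef __cplusplus
extern "C" {
#endif

    CUDA_CLIB_API cudaError_t invertImageCuda(float* src, float* dst, unsigned int width, unsigned int height);

#ifdef __cplusplus
}
#endif

//private
// Device kernel launched by invertImageCuda(); not exported from the dll.
__global__ void invertImageKernel(float* src, float* dst, unsigned int width, unsigned int height);

#endif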

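This C header is never read by the C# build. The C# compiler relies only on its own matching invertImageCuda() declaration, so the two declarations must be kept in agreement by hand. (The C# side declares the sizes as Int32 while the C side uses unsigned int; both are 32 bits wide, so non-negative sizes arrive unchanged.) What the header does do is mark invertImageCuda() with CUDA_CLIB_API, i.e. __declspec(dllexport), which tells the Cuda build to export it from the dll as a public function. The CUDA_CLIB_EXPORTS preprocessor symbol is defined in the header itself because the Cuda compiler building invert.cu is the only compiler that will ever see this code.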
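The header only declares invertImageCuda() and invertImageKernel(); the definitions in invert.cu are not shown. A minimal sketch of what they could look like follows, assuming one thread per pixel and an arbitrary block size of 256 threads. Because src and dst are host pointers to the pinned C# arrays, the sketch copies the pixels into device memory, runs the kernel, copies the result back into dst, and returns the first cudaError_t it encounters. To stay self-contained it repeats the export macro and extern "C" instead of including invertimage.h; the _WIN32 guard is an added assumption for non-Windows builds.

```cuda
// invert.cu: a minimal sketch of the definitions declared in invertimage.h.
#include <cstddef>
#include <cuda_runtime.h>

// Export macro repeated so the sketch compiles on its own.
#ifdef _WIN32
#define CUDA_CLIB_API __declspec(dllexport)
#else
#define CUDA_CLIB_API
#endif

// One thread per pixel: dst[i] = 1 - src[i].
__global__ void invertImageKernel(float* src, float* dst, unsigned int width, unsigned int height)
{
    unsigned int n = width * height;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // the last block can extend past the image
        dst[i] = 1.0f - src[i];
}

// src and dst are host pointers. Copy to device memory, run the kernel, copy
// back into dst, and return the first error seen (cudaSuccess if none).
extern "C" CUDA_CLIB_API cudaError_t invertImageCuda(float* src, float* dst, unsigned int width, unsigned int height)
{
    size_t n = (size_t)width * height;
    if (n == 0)
        return cudaSuccess;    // nothing to do
    size_t bytes = n * sizeof(float);

    float* dSrc = nullptr;
    float* dDst = nullptr;
    cudaError_t err = cudaMalloc(&dSrc, bytes);
    if (err == cudaSuccess)
        err = cudaMalloc(&dDst, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dSrc, src, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const unsigned int threads = 256;  // arbitrary block size for this sketch
        unsigned int blocks = (unsigned int)((n + threads - 1) / threads);
        invertImageKernel<<<blocks, threads>>>(dSrc, dDst, width, height);
        err = cudaGetLastError();          // reports launch errors
    }
    if (err == cudaSuccess)                // waits for the kernel and reports errors raised while it ran
        err = cudaMemcpy(dst, dDst, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dSrc);                        // cudaFree(nullptr) does nothing
    cudaFree(dDst);
    return err;
}
```

Copying through device buffers on every call keeps the C# side simple, but for an operation this cheap the two cudaMemcpy calls will likely cost more than the kernel itself.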
