Archive for NVIDIA Jetson

Porting a Keras trained model to Jetson Xavier

I have been training resnet models on my Windows desktop machine with big GPUs. Now I need to deploy these models on a Jetson Xavier for real-time predictions.

My model format historically has been the Keras .h5 file. So now I need to convert the .h5 model file into the ONNX exchange format.

There are two steps:

1) convert the .h5 model to TensorFlow's SavedModel (.pb) format – this step is done on the desktop machine

2) convert the SavedModel to the ONNX format – this step is done on the Jetson.

On the PC

# python 

from keras.models import load_model

# you can either load an existing h5 model using whatever you normally use for this
#  purpose, or just train a new model and save it using the template below.
# custom_objects is only needed if the model uses custom layers or losses;
#  replace models.custom_objects() with your own mapping, or omit the argument

model = load_model('model.h5', custom_objects=models.custom_objects())

# save in the TensorFlow SavedModel (.pb) format; this format saves into a folder,
#  and you are supplying the folder name

model.save('./outputs/models/saved_model/')  # SavedModel format

After running this python script, the folder saved_model will typically contain saved_model.pb, keras_metadata.pb, and the subfolders variables and assets.

On Jetson

Assuming the TensorFlow 2 runtime (JetPack 5.0) has already been installed on the Jetson, along with python3 and pip3, follow the install procedure in the tensorflow-onnx project, but use pip3:

https://github.com/onnx/tensorflow-onnx

pip3 install onnxruntime
pip3 install -U tf2onnx

Then copy the entire saved_model folder from the PC to the Jetson workspace.

From a command line on the Jetson, run the following command:


python3 -m tf2onnx.convert --saved-model path_to/saved_model --output model.onnx

Now, in the location where the command was run, there will be a new file called model.onnx.
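To sanity-check the converted model, it can be loaded back with the onnxruntime package installed above. This is a minimal sketch, assuming the model file is named model.onnx and takes a single float32 input; run_onnx is a hypothetical helper, not part of the onnxruntime API:

```python
def run_onnx(model_path, batch):
    """Run one inference pass on a converted model and return its outputs."""
    # imports are deferred so the sketch only needs the packages when actually called
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path)
    # use the model's own input name rather than hard-coding it
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: np.asarray(batch, dtype=np.float32)})

# e.g. outputs = run_onnx("model.onnx", some_image_batch)
```

If the conversion went wrong, InferenceSession will refuse to load the file, so this doubles as a quick validity check.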


NVIDIA 2D Image and Signal Performance Primitives (NPP)

Based on the lack of examples and discussion in the forums, I assume the NPP are under-utilized and under-appreciated. Since I discovered these libraries, they have been a game changer in my image processing work. With machine vision camera resolutions now at 12 megapixels and higher, GPU acceleration of the processing is a necessity. No longer do I need to create many of my own Cuda algorithms for 2D image processing – many of them already exist.

For example, resizing an image (x,y re-scale) is fully supported on any pixel data type and with multiple filter types, all accelerated with Cuda parallel operations (see my post and example project on an image resize implementation here).

The NVIDIA documentation is a bit sparse, and the sheer number of functions and sub-libraries is daunting. I suggest starting with this page:

https://docs.nvidia.com/cuda/npp/modules.html

Within this page, open the topics and drill down; I think you will be impressed with the number of Cuda functions available.


Save Framos IMX 253 images as Jpeg on NVIDIA Jetson

I was recently working on a demonstration of a Framos IMX 253 monochrome camera with a 12-bit sensor, supported on a Jetson Xavier. For the demo, I needed to save the images in jpeg format. I thought it may be useful for others to see the inner workings of a jpeg compression implementation on a Jetson, using the hardware-assisted jpeg compressor.

The image comes from the camera driver as 12-bit data stored in a 16-bit integer array. The sensor is monochrome, but the Jetson jpeg compressor accepts only a single YUV format as input.

For this demo, I skip the step of re-mapping the luminance values from the 12-bit range to the 8-bit range and simply take the lower 8 bits. A production implementation needs a scheme for this luminance re-mapping, and on a Jetson it should take advantage of the hardware assist.
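Two simple re-mapping schemes, sketched in Python for clarity (hypothetical helpers, not part of the camera SDK): a straight 12-to-8-bit scale that drops the low 4 bits, and a min/max stretch that spreads the frame's actual dynamic range over the full 8-bit output.

```python
def scale_12_to_8(pixels):
    """Map the full 12-bit range (0..4095) onto 8 bits by dropping the low 4 bits."""
    return [p >> 4 for p in pixels]

def stretch_to_8(pixels):
    """Linearly stretch the frame's actual min..max onto 0..255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0 for _ in pixels]  # flat frame: nothing to stretch
    return [(p - lo) * 255 // (hi - lo) for p in pixels]

print(scale_12_to_8([0, 2048, 4095]))    # → [0, 128, 255]
print(stretch_to_8([1000, 2000, 3000]))  # → [0, 127, 255]
```

The stretch gives better contrast on low-light frames, at the cost of a per-frame min/max pass; on a Jetson both are good candidates for a trivial CUDA kernel.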

Step 1: get things setup


// Info is a structure provided from the caller that contains info about the frame 

// prepare to time execution of the jpeg compression    
auto start = std::chrono::steady_clock::now();

// prepare the output file (open in binary mode for jpeg data)
std::string outFile = "/path/to/file.jpg";
std::ofstream* outFileStr = new std::ofstream(outFile, std::ios::out | std::ios::binary);
if (!outFileStr->is_open())
        return false;

// create an instance of the nvidia jetson jpeg encoder

NvJPEGEncoder* jpegenc = NvJPEGEncoder::createJPEGEncoder("jpenenc");

// YUV 4:2:0 stores width*height luma bytes plus two quarter-size chroma planes,
//  so 1.5 * width * height is a safe upper bound for the jpeg output buffer
unsigned long out_buf_size = Info.Width * Info.Height * 3 / 2;
unsigned char *out_buf = new unsigned char[out_buf_size];
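The 1.5× figure comes from the YUV 4:2:0 plane layout that the later steps fill in one plane at a time. A quick sketch (hypothetical helpers, purely to show the arithmetic):

```python
def yuv420_plane_sizes(width, height):
    """Y plane at full resolution; Cb and Cr subsampled 2x in each direction."""
    return [(width, height), (width // 2, height // 2), (width // 2, height // 2)]

def yuv420_total_bytes(width, height):
    """Sum of the three plane sizes: width * height * 3/2 bytes."""
    return sum(w * h for w, h in yuv420_plane_sizes(width, height))

print(yuv420_total_bytes(3840, 2160))  # → 12441600
```

The compressed jpeg will be far smaller than this, so the raw-frame size is a comfortable upper bound for the output buffer.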

Step 2: create an nvidia native buffer

// V4L2_PIX_FMT_YUV420M appears to be the only format supported by the Jetson jpeg encoder

// allocate the buffer    
NvBuffer buffer(V4L2_PIX_FMT_YUV420M, Info.Width, Info.Height , 0);

buffer.allocateMemory();

NvBuffer::NvBufferPlane* plane = &buffer.planes[0];

// take the low 8 bits of each 16-bit luminance value and copy into the nvidia buffer
for(int y=0; y < Info.Height;y++)
{
    for(int x=0; x < Info.Width;x++)
    {
        plane->data[x+(y*plane->fmt.stride)] = (unsigned char) (m_img[x+(y*Info.Width)]);
    }
}

plane->bytesused = 1 * plane->fmt.stride * plane->fmt.height;

Step 3: the Framos camera driver provides a monochrome image, so make the jpeg truly monochrome by setting the Cb and Cr planes to the neutral chroma value (127).

 // initialize the Cb plane to 127, which means no color

plane = &buffer.planes[1];
char* data = (char *) plane->data;
plane->bytesused = 0;
for (int j = 0; j < plane->fmt.height; j++)
{
    memset(data,127,plane->fmt.width);
    data += plane->fmt.stride;
}
plane->bytesused = plane->fmt.stride * plane->fmt.height;

// initialize the Cr plane to 127, which means no color

plane = &buffer.planes[2];
data = (char *) plane->data;
plane->bytesused = 0;
for (int j = 0; j < plane->fmt.height; j++)
{
    memset(data,127,plane->fmt.width);
    data += plane->fmt.stride;
}
plane->bytesused = plane->fmt.stride * plane->fmt.height;

Step 4: run the actual jpeg encode function, save the file, and measure the results:

// encode; on return, out_buf_size is updated to the actual jpeg byte count
int ret = jpegenc->encodeFromBuffer(buffer, JCS_YCbCr, &out_buf, out_buf_size, 95);
if (ret < 0)
    return false;

auto end = std::chrono::steady_clock::now();

outFileStr->write((char *) out_buf, out_buf_size);
outFileStr->close();

printf("Jpeg Encode Elapsed time in nanoseconds: %lld\n",
       (long long) std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());

delete[] out_buf;
delete outFileStr;
delete jpegenc;

Result:

I am seeing roughly 25 milliseconds for the encode and save of a 3840 x 2160 pixel image.
