
Exploring NVIDIA Jetson Orin Nano Super Mode performance using Generative AI

eduardosalazar40


Introduction to NVIDIA Jetson Super Mode

RidgeRun is always looking for the newest technologies to improve the service we offer to our clients and the community by staying ahead of industry trends. So, in this post, we are exploring the new NVIDIA Jetson Orin Nano™ Super Developer Kit and its potential by running some of NVIDIA's Jetson Generative AI examples.


This post explores the Text Generation and Image Generation NVIDIA Jetson AI Lab tutorials on a Jetson Orin Nano with JetPack 6.1, which is the latest JetPack 6 release delivered by NVIDIA as of the date of writing and includes Super Developer Kit support. The reader will find CPU, GPU, and memory performance metrics comparing the examples running in the 15-watt power mode against the MAX power mode (that is, the Super Developer Kit mode), which operates at 25 watts.


Results summary

The Jetson Orin Nano Super Developer Kit mode capabilities, the testing environment, and instructions on how to enable the NVIDIA Jetson Orin Nano Super Developer Kit mode and run the NVIDIA generative AI tutorials can be found in our developer's wiki.
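As a quick reference, power modes are switched with the nvpmodel utility. A minimal sketch, assuming the MAXN SUPER mode is exposed as mode index 2 on this JetPack 6.1 image; the index can differ between releases and modules, so verify it against /etc/nvpmodel.conf first:

```shell
# Query the currently active power mode (name and index)
sudo nvpmodel -q

# Switch to the 15 W mode (index 0 on the Orin Nano Developer Kit)
sudo nvpmodel -m 0

# Switch to the MAXN SUPER (25 W) mode -- index 2 is an assumption here;
# check /etc/nvpmodel.conf for the indices defined on your image
sudo nvpmodel -m 2

# Optionally lock clocks at the maximum for the selected mode
sudo jetson_clocks
```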


In addition, detailed performance metrics can be taken from the figures in the next sections, which are summarized in the following bullet points:

  • The duration of the text generation output decreased by 26.74% when using the Super Developer Kit mode, while the tokens/s increased by 31.38% compared with the 15-watt mode.

  • The GPU's average usage increased by 1.25% when using the MAX power mode, and its frequency reached a maximum of 1015 MHz during inference. Meanwhile, the CPU and memory usage decreased by 0.2% and 15.23% on average, respectively, when testing the text generation example.

  • The image generation duration decreased by 34.37% when using the Super Developer Kit mode, while the iterations/s increased by 34.63% in comparison with the 15-watt mode.

  • For image generation, the average GPU usage increased by 5.07% when using the MAX power mode in comparison with the 15-watt mode, and its frequency reached a maximum of 1016 MHz during inference. The CPU usage increased by 1.36%, while the memory usage decreased by 0.37% on average.


Results and Performance

One of the main purposes of this post is to explore the performance that generative models can achieve when using the NVIDIA Jetson Orin Nano Super Developer Kit mode and to compare it with the 15-watt mode. There are plenty of metrics that can be extracted from these executions; the ones measured here are:

  • CPU usage

  • GPU usage

  • RAM usage

  • Tokens/s or iterations/s


The first three parameters are obtained by parsing the output of the NVIDIA tegrastats utility and plotting the measurements to better understand resource consumption. The execution time is reported by the application itself.
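As an illustration of how these numbers can be gathered, the sketch below pulls RAM, average CPU load, and GPU (GR3D) utilization out of a single tegrastats line. The sample line is invented for illustration, and the field layout of tegrastats can vary between JetPack releases, so treat the regular expressions as a starting point rather than a definitive parser:

```python
import re

def parse_tegrastats(line):
    """Extract RAM, average CPU, and GPU usage percentages from one tegrastats line."""
    ram = re.search(r"RAM (\d+)/(\d+)MB", line)
    cpus = re.findall(r"(\d+)%@\d+", line)          # per-core "load%@freq" entries
    gpu = re.search(r"GR3D_FREQ (\d+)%", line)
    return {
        "ram_pct": 100.0 * int(ram.group(1)) / int(ram.group(2)),
        "cpu_pct": sum(map(int, cpus)) / len(cpus),  # average over all cores
        "gpu_pct": int(gpu.group(1)),
    }

# Illustrative sample line (not a real capture)
sample = ("RAM 6380/7620MB (lfb 4x1MB) SWAP 0/3810MB (cached 0MB) "
          "CPU [12%@1510,3%@1510,5%@1510,2%@1510,1%@1510,4%@1510] "
          "GR3D_FREQ 94%")
print(parse_tegrastats(sample))
```

Feeding each line of `tegrastats --interval 1000` through a parser like this produces the time series plotted in the following sections.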


Text Generation with ollama client

The text generation model was tested on its ability to generate a response on what approach to follow to prevent race conditions in an application. The following prompt was used:


What approach would you use to detect and prevent race conditions in a multithreaded application?


This is one example of the kind of question a user can ask the AI model.
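For reference, when the request is made against ollama's REST endpoint with streaming disabled, the response carries eval_count (tokens generated) and eval_duration (in nanoseconds), from which the tokens/s figure can be derived. A minimal sketch; the endpoint and field names come from the upstream ollama API, while the sample numbers below are illustrative placeholders:

```python
import json
from urllib import request

PROMPT = ("What approach would you use to detect and prevent "
          "race conditions in a multithreaded application?")

def tokens_per_second(resp):
    """Generation rate from an ollama /api/generate response."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

def ask_ollama(model, prompt, host="http://localhost:11434"):
    """Send a non-streaming generate request to a local ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = request.Request(f"{host}/api/generate", data=body.encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as r:
        return json.load(r)

# Illustrative response fields (not a real capture): 492 tokens in 50 s
sample_resp = {"eval_count": 492, "eval_duration": 50_000_000_000}
print(tokens_per_second(sample_resp))  # 9.84 tokens/s
```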


15-watt mode results

This request took 49.97 seconds to complete, and the output was generated at a rate of 9.84 tokens/s when using the 15-watt mode.


This same execution was used to measure performance and profile resource usage for the model on the Jetson. The resources of interest are GPU, CPU and RAM consumption.  The following plots show a summary of the performance achieved by the model.

GPU, CPU and Memory resource usage for text generation in 15 watts mode

These plots show the impact on CPU, GPU, and memory usage when the model starts. On the GPU, usage reached an average of 93.88% during inference. On the CPU, however, we see a smaller increase, with an average usage of 4.45% and some spikes during inference. Regarding memory, average usage was 86.35% with no significant change (a variation of approximately 2%) during inference; this is because the model was already loaded beforehand.


MAX mode results

When using the MAX mode, the request took 36.61 seconds to complete, and the output was generated at a rate of 14.36 tokens/s.


The resource usage plots that summarize the performance are the following:

GPU, CPU and Memory resource usage for text generation in Super mode

The GPU usage reached an average of 96.41% during inference. On the CPU side, we see an average usage of 3.74%. Regarding memory, average usage was 72.75%, increasing by approximately 1.75% during inference; this is because the model was already loaded beforehand.


We can see that the time taken to generate the text output decreased by 26.74% when using the Super Developer Kit mode, while the tokens/s increased by 31.38% in comparison with the 15-watt mode.
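These percentages follow directly from the raw readings above; a quick sanity check, assuming the changes are reported relative to the larger of the two values (the tokens/s figure reproduces to within rounding of the raw readings):

```python
def pct_change(baseline, new):
    """Percent change relative to the larger of the two readings."""
    return 100.0 * abs(new - baseline) / max(new, baseline)

# Text generation: 49.97 s -> 36.61 s, 9.84 -> 14.36 tokens/s
print(round(pct_change(49.97, 36.61), 2))  # 26.74 (% shorter duration)
print(round(pct_change(9.84, 14.36), 2))   # ~31.5 (% more tokens/s)
```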


On the resource consumption side, the average GPU usage increased by 2.53% when using the MAX power mode, and the GPU reached a frequency of 1015 MHz, while the CPU and memory usage decreased by 0.71% and 13.6% on average, respectively.


Image Generation

The image generation example produces an image based on a text input prompt. Here is the prompt used to create the following image and measure resource consumption:


Futuristic city with sunset, high quality, 4K image
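The Jetson AI Lab image generation tutorial runs stable-diffusion-webui, which exposes a REST endpoint (/sdapi/v1/txt2img) when launched with the --api flag. Below is a minimal sketch of driving it with the prompt above; the host, port, step count, and image dimensions are assumptions for illustration:

```python
import base64
import json
from urllib import request

def build_txt2img_payload(prompt, steps=20, width=512, height=512):
    """Request body for stable-diffusion-webui's /sdapi/v1/txt2img endpoint."""
    return {"prompt": prompt, "steps": steps, "width": width, "height": height}

def generate_image(prompt, host="http://localhost:7860"):
    """POST the prompt and decode the first base64-encoded image returned."""
    body = json.dumps(build_txt2img_payload(prompt)).encode()
    req = request.Request(f"{host}/sdapi/v1/txt2img", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as r:
        resp = json.load(r)
    return base64.b64decode(resp["images"][0])  # raw image bytes

payload = build_txt2img_payload(
    "Futuristic city with sunset, high quality, 4K image")
print(payload["steps"], payload["width"])
```

The iterations/s figure reported below comes from the webui's own console output, not from this client side.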


15-watt mode results

For the example executed in this mode, the model was loaded in 12.8 s and the image was generated in 16 seconds at a rate of 1.34 iterations/s.


The performance and resource usage of the model were measured during this same execution on the Jetson; the following plots show a summary of the performance achieved in this power mode.

GPU, CPU and Memory resource usage for image generation in 15 watts mode

These plots show the CPU, GPU, and memory usage percentages while the tutorial runs. The GPU usage reached an average of 93.93% during inference. On the CPU side, the average was 43.94%. Regarding memory, average usage was 81.13% with an increase of 5.2%; the model was already loaded beforehand.


MAX mode results

On the Super Developer Kit mode side, the model was loaded in 9.4 s and the image was generated in 10.5 seconds at a rate of 2.05 iterations per second.


The plots that summarize the resource consumption are shown as follows.

GPU, CPU and Memory resource usage for image generation in Super mode

The GPU usage reached an average of 99% during inference. On the CPU side, the average was 45.3%. Memory usage increased by approximately 1%, since the model was already loaded beforehand, while the average usage was 80.76%.


We can see that the image generation duration decreased by 34.37% when using the Super Developer Kit mode, while the iterations/s increased by 34.63% in comparison with the 15-watt mode. On the resource consumption side, the average GPU usage increased by 5.07% when using the MAX power mode in comparison with the 15-watt mode, and its frequency reached a maximum of 1016 MHz. The CPU usage increased by 1.36%, while the memory usage decreased by 0.37% on average.


Conclusions

With the results presented in this blog, the NVIDIA Jetson Orin Nano Super Developer Kit mode was successfully enabled and some Generative AI examples were executed.


On the performance metrics side, when using the Super Developer Kit mode, the text generation duration and the tokens/s improved by 26.74% and 31.38%, respectively, while the GPU usage increased by 1.25%, reaching a maximum frequency of 1015 MHz during inference. The CPU and memory usage decreased by 0.2% and 15.23% on average, respectively.


For the image generation example in the Super Developer Kit mode, the iterations/s and the generation duration improved by 34.63% and 34.37%, respectively. The average GPU usage increased by 5.07%, reaching a maximum frequency of 1016 MHz during inference. Also, the CPU usage increased by 1.36%, while the memory usage decreased by 0.37% on average.


Finally, at RidgeRun we see a clear improvement when enabling the NVIDIA Jetson Orin Nano Super Developer Kit mode and running the Jetson Generative AI examples. There is still a lot more to explore in depth to adapt this technology to custom applications, but this survey allowed us to explore the new NVIDIA Jetson Orin Nano capabilities. Watch out for more blogs on the topic and technical documentation in our developer's wiki.


