NVIDIA DeepStream for highly performant video stream analytics
If, like me, you have ever experimented with applying deep learning models to video streams for object detection or scene classification, you have probably run into approaches where a Python script extracts each individual frame from a video file in a loop and feeds the frames to a model one by one. While this is a nice and flexible approach that lets you plug in your preferred model easily, its performance is usually not great, and it sometimes requires writing out millions of image files to disk that make up the video, which then have to be stitched back together by an external tool like ffmpeg.
If you have read my blog post about detecting riders in a Tour de France stage you will recognize the same approach.
In my search for a better performing solution I came across NVIDIA’s DeepStream SDK (https://developer.nvidia.com/deepstream-sdk), which makes much better use of your NVIDIA GPU’s performance while retaining the option to use your own models and the flexibility of Python integration. As I did not find much information on getting started on your own machine using a Docker image, I decided to write up the steps I took to get this working.
Installing Docker and DeepStream image
I recommend using the NVIDIA-provided Docker image to get started with DeepStream on your system as this bundles all dependencies in a single image. For NVIDIA’s instructions please refer to the following link: https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_docker_containers.html.
NVIDIA’s documentation covers many platforms at once, with a focus on their Jetson single-board computers. For the purposes of this blog post I will be using a standard PC with an NVIDIA GPU running Ubuntu.
I had the most success installing Docker directly from the original developers as shown below; do not install Docker from apt, snap, or another Ubuntu source. Run the following from your home directory:
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh ./get-docker.sh
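To verify that Docker itself works, you can run the standard hello-world image:
sudo docker run --rm hello-world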
After installing Docker we need to install the NVIDIA Container Toolkit, which allows Docker containers to use the host’s GPU. You may refer to the documentation at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html.
For Ubuntu, execute the steps for installation with Apt. First add the NVIDIA repository:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
Now update Apt as usual:
sudo apt-get update
And install the toolkit:
sudo apt-get install -y nvidia-container-toolkit
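The toolkit then needs to be registered as a Docker runtime, followed by a restart of the Docker daemon; both steps are part of the same NVIDIA install guide:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker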
At this point you’ll require an NVIDIA NGC account from https://ngc.nvidia.com if you haven’t set this up before. After creating this and logging in, create an API key by clicking on your name at the top-right and going to Setup -> Generate API Key.
Now log in using your API key:
docker login nvcr.io
Enter the credentials below. Note: use the literal string $oauthtoken as your username; do not replace it. Do replace YOUR_NGC_API_KEY with the API key you created (and stored safely for later use).
a. Username: "$oauthtoken"
b. Password: "YOUR_NGC_API_KEY"
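If you prefer not to paste the key interactively, Docker can also read the password from stdin. A small sketch, assuming you have stored your key in an environment variable named NGC_API_KEY (my own naming, not an NGC convention); note the single quotes, which prevent your shell from expanding $oauthtoken:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin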
At this point we are finally ready to pull the Docker container from the NVIDIA repository using:
docker pull nvcr.io/nvidia/deepstream:7.1-triton-multiarch
Running the Docker image
Before the Docker container can output to your display, you need to point it at the X display your Ubuntu desktop is running on (in my case :1; check yours with echo $DISPLAY, as :0 is also common) and open up access to the X server:
export DISPLAY=:1
xhost +
To be able to exchange files between your host operating system and the Docker container, I suggest creating a specific folder in your home directory. In my example this is called /home/dkemper/deepstream_files.
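Create it up front on the host (substitute your own username or preferred path):
mkdir -p /home/dkemper/deepstream_files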
Now run the DeepStream Docker image as follows:
docker run -it --rm --net=host --gpus all -e DISPLAY=$DISPLAY --device /dev/snd -v /tmp/.X11-unix/:/tmp/.X11-unix -v /home/dkemper/deepstream_files:/opt/nvidia/deepstream/local_files nvcr.io/nvidia/deepstream:7.1-triton-multiarch
A few flags are important here:
- The -e flag allows the Docker container to graphically output to your display
- The second -v flag maps a host OS folder to the /opt/nvidia/deepstream/local_files folder in the container, allowing you to exchange models and output between the container and your host OS
- The --gpus all flag allows the Docker container to use the host’s GPUs
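As a quick sanity check that GPU passthrough works, you can ask the container to run nvidia-smi, which the container toolkit makes available inside the container; you should see your GPU listed:
docker run --rm --gpus all nvcr.io/nvidia/deepstream:7.1-triton-multiarch nvidia-smi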
Testing the DeepStream framework
At this point you should be in a running DeepStream container. The container bundles quite a few examples, making a quick test straightforward.
cd /opt/nvidia/deepstream/deepstream/samples/configs/deepstream-app
deepstream-app -c source30_1080p_dec_infer-resnet_tiled_display_int8.txt
Note: when the container has just started, running this example may take 1-2 minutes to get going, depending on your hardware, as the TensorRT inference engine is built first. Just ignore the many warnings and errors displayed at this point and be patient.
This demo runs a ResNet-based detection network on 30 parallel copies of a 1080p video stream, tracking the objects in each. The individual streams are displayed in a tiled layout, illustrating how DeepStream is able to process many video streams in parallel.
TAO framework
The TAO framework allows you to apply custom models, either trained from scratch or finetuned from other models, and integrates nicely with DeepStream. This allows you to use DeepStream for your specific object or scene detection use case. In a later post I will provide a worked example of a custom model running on DeepStream for a specific application. To limit the scope of this manual, however, I’ll show how to use the TAO sample apps for now.
The DeepStream container expects the sample apps in an already created folder; navigate there:
cd /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps
Now clone the sample apps repository here:
git clone https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps.git
Copy config files to the correct folder:
cp /opt/nvidia/deepstream/deepstream/sources/apps/sample_apps/deepstream_reference_apps/deepstream_app_tao_configs/* /opt/nvidia/deepstream/deepstream/samples/configs/tao_pretrained_models/ -r
Download the models:
cd /opt/nvidia/deepstream/deepstream/samples/configs/tao_pretrained_models
./download_models.sh
You are now able to run the downloaded TAO models with the configuration files from the repository, for example:
deepstream-app -c deepstream_app_source1_trafficcamnet.txt
This runs TrafficCamNet (https://docs.nvidia.com/tao/archive/5.3.0/text/model_zoo/cv_models/trafficcamnet.html), which is able to detect cars, persons, road signs, and bicycles on the bundled example video stream.
Other computer vision models contained within TAO are documented at the following link and allow for some very interesting experiments: https://docs.nvidia.com/tao/archive/5.3.0/text/model_zoo/cv_models/index.html
Modifying the TAO configuration
By modifying one of the bundled configuration files you can apply the detection network to another video stream and write the result to a file in parallel. I will be using the TrafficCamNet network on a video file I used in my Tour de France stage analysis.
Open deepstream_app_source1_trafficcamnet.txt in a text editor and modify the following. Under the [source0] section, update the input video stream to be your own file, e.g. file:///opt/nvidia/deepstream/local_files/tdf_2022_stage_13_example2.mp4. Note that you have to use URI notation here. Remember that this folder is mapped to the deepstream_files folder in your home directory on the host OS.
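For reference, the [source0] group in my config ended up looking roughly like the snippet below. The exact set of keys may differ slightly between DeepStream versions, so treat this as a sketch and keep the other keys from the bundled file as they are:
[source0]
enable=1
# type: 1=CameraV4L2, 2=URI, 3=MultiURI
type=3
uri=file:///opt/nvidia/deepstream/local_files/tdf_2022_stage_13_example2.mp4
num-sources=1
gpu-id=0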
Now update the [sink1] section as follows:
enable=1
output-file=/opt/nvidia/deepstream/local_files/out.mp4
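For context, the full [sink1] group contains a few more keys; in my case it ended up looking roughly like this (again a sketch, exact keys and defaults may vary per DeepStream version):
[sink1]
enable=1
# type: 1=FakeSink, 2=EglSink, 3=File
type=3
# container: 1=mp4, 2=mkv
container=1
# codec: 1=h264, 2=h265
codec=1
sync=0
bitrate=2000000
output-file=/opt/nvidia/deepstream/local_files/out.mp4
source-id=0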
This will write the output to out.mp4 in $HOME/deepstream_files on your host OS. An example of the result for my file looks as follows:
Notice that the video file is written to disk while the contents are simultaneously displayed on screen in real time, without any slowdown or frame drops. This is much faster than the Python-based approach I used for my Tour de France stage analysis!
Next steps
Although quite a few steps have been discussed at this point, I am still making use of the standard deepstream-app demo application bundled with the DeepStream Docker container. It would be interesting to see how to leverage Python from here, for instance for recording a log of object occurrences at certain timepoints. It remains to be seen whether DeepStream’s performance holds up equally well when using custom Python scripts.
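To give an idea of what that could look like: NVIDIA’s deepstream_python_apps project provides Python bindings (pyds) that let you attach a GStreamer buffer probe and walk the metadata DeepStream attaches to every frame. The sketch below is based on those bindings and is a starting point rather than a tested recipe; it assumes pyds is installed and that an nvdsosd element called osd exists in a pipeline you have built elsewhere:
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def osd_sink_pad_buffer_probe(pad, info, u_data):
    # Walk the batch metadata DeepStream attaches to each buffer and log detections
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            # Log frame number, detected class label and detector confidence
            print(f"frame {frame_meta.frame_num}: {obj_meta.obj_label} ({obj_meta.confidence:.2f})")
            l_obj = l_obj.next
        l_frame = l_frame.next
    return Gst.PadProbeReturn.OK

# Attach the probe to the sink pad of the nvdsosd element ('osd' is hypothetical,
# created elsewhere when constructing the pipeline):
# osd.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)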
Another experiment would be using a custom finetuned model with the TAO framework to see how that holds up. I will be documenting these next steps in a future blog post and update this post when ready.
Wrapup
In this blog post I have shown that DeepStream provides well-performing, real-time object classification and tracking capabilities on video streams, easily processing a 30 fps video that is output to the screen and to a file on disk at the same time. The TAO plugin architecture should make custom models, whether trained from scratch or finetuned from another model, easy to integrate with DeepStream.