
SEI or In-band: Which Metadata solution do you need?

allannavarro1

Updated: Apr 29, 2024



Introduction

Metadata is ubiquitous. Whether you need to add synchronized subtitles to a movie, GPS information to snapshots captured by a drone, or custom data to robot video recordings for ML inference, metadata injection/extraction is one of the most frequent requirements of our customers' multimedia/AI projects.


Over the years, RidgeRun has created several solutions to help our customers accelerate their time to market, two of which were specifically engineered to solve metadata requirements: GstSEIMetadata and In-band Metadata GStreamer plugins.


This article is a guide to help you understand and choose the metadata solution that best fits your project requirements. The GstSEIMetadata and In-band Metadata sections explain the basic concepts and usage of each solution. The GstSEIMetadata vs In-band Metadata section presents the main limitations of each solution and provides a flowchart to guide the choice according to your project requirements. The Conclusion section summarizes the highlights of each option.


GstSEIMetadata

GstSEIMetadata is a GStreamer plugin developed by RidgeRun that provides elements to inject and extract metadata in H.264/H.265 encoded video streams. The metadata is injected using a provision of the H.264 (AVC) and H.265 (HEVC) standards called “Supplemental Enhancement Information”, or SEI messages. SEI messages are a type of NAL unit.


NAL Units

Encoded video bytes in an H.264/H.265 stream are contained in “Network Abstraction Layer” (NAL) units. The structure of the NAL units is shown in Figure 1:


Diagram describing the structure of NAL units in H264 messages
Figure 1. Structure of NAL units.

There are different types of NAL units. SEI messages have a NAL unit type of 6 in the H.264 standard. You can learn more about GstSEIMetadata in our developer wiki.
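
To make the NAL unit type concrete, here is a minimal Python sketch that scans a raw H.264 byte stream and reports the type of each NAL unit. It is not part of GstSEIMetadata. It assumes the stream uses Annex B start codes (00 00 01 or 00 00 00 01), and the sample bytes are hand-written placeholders, not a decodable video. The function names are illustrative.

```python
from typing import Iterator, List

H264_NAL_TYPE_SEI = 6  # SEI NAL unit type in H.264


def iter_nal_units(stream: bytes) -> Iterator[bytes]:
    """Yield each NAL unit (start code removed) from an Annex B byte stream."""
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos >= 0:
        starts.append(pos)
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    for n, start in enumerate(starts):
        end = starts[n + 1] if n + 1 < len(starts) else len(stream)
        # Trailing zero bytes belong to the next start code, not to this unit.
        nal = stream[start + 3:end].rstrip(b"\x00")
        if nal:
            yield nal


def nal_types(stream: bytes) -> List[int]:
    """Return the H.264 nal_unit_type (low 5 bits of the first byte) per unit."""
    return [nal[0] & 0x1F for nal in iter_nal_units(stream)]


# Placeholder stream: three NAL units with made-up payload bytes.
SAMPLE_STREAM = (
    b"\x00\x00\x00\x01\x67\x42\x00\x1f"          # type 7 (SPS)
    b"\x00\x00\x00\x01\x06\x05\x02\xaa\xbb\x80"  # type 6 (SEI), dummy payload
    b"\x00\x00\x01\x65\x88\x84"                  # type 5 (IDR slice)
)

sei_units = [
    nal for nal in iter_nal_units(SAMPLE_STREAM)
    if nal[0] & 0x1F == H264_NAL_TYPE_SEI
]
print(nal_types(SAMPLE_STREAM))  # [7, 6, 5]
print(len(sei_units))            # 1
```

This sketch only covers the one-byte H.264 NAL header. H.265 uses a two-byte header, so the type would have to be read differently there.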


Usage Example

The following subsections provide examples of how to inject and extract metadata using GstSEIMetadata.


Inject metadata

Figure 2 shows a diagram with the basic blocks needed in a GStreamer pipeline to inject metadata into a recording:

  1. A video source, such as a camera.

  2. An encoder (such as x264enc or nvv4l2h264enc).

  3. The seiinject element, which adds the message to the encoded video.

  4. A muxer to store the video (such as mp4mux).

  5. A sink element that stores the video to a file or streams it through the network (with udpsink for example).

Block diagram showing GStreamer elements connected together to encode SEI metadata in a H264 stream
Figure 2. Injecting SEI metadata into the stream.

Concretely, the following GStreamer pipeline injects a "Hello World" string message into a sample H.264 video stream and stores it as a video file in an MP4 container:

gst-launch-1.0 videotestsrc num-buffers=1000 ! 'video/x-raw,width=1280,height=720' ! x264enc ! seiinject metadata="Hello World" ! qtmux ! filesink location=testsei.mp4 -e


Extract metadata

The process for extracting the metadata stored within the H.264 stream is shown in Figure 3. The basic elements are:

  1. A recording file reader, such as filesrc. The input stream can also come from the network.

  2. A demuxer, such as qtdemux.

  3. The seiextract element, which pulls the metadata and decodes it back to text.

  4. A decoder, like avdec_h264.

  5. An element to display the video such as xvimagesink.


Block diagram showing GStreamer elements connected together to decode SEI metadata in a H264 stream
Figure 3. Extracting SEI metadata from the stream.

The following Python gist shows a concrete example of how to extract the encoded metadata:

When you run this example pipeline, the video is displayed along with the decoded metadata, as in Figure 4.


A test pattern showing an overlay of decoded metadata
Figure 4. Decoded video and metadata.

Use cases

Consider using GstSEIMetadata in the following cases:

  • Container-agnostic metadata is required. You can use any container that supports H.264/H.265 (e.g. 3GP, MP4, TS, QuickTime) without having to worry about its particular method for storing metadata.

  • No particular metadata coding standard is needed.



In-Band Metadata

In-Band Metadata also allows the injection and extraction of metadata in your GStreamer-based application. It lets you process, record, or stream video together with its associated metadata. Figure 5 shows the general block diagram of In-Band Metadata injection.


A block diagram showing the encoding process of metadata in a Transport Stream container
Figure 5. In-Band Metadata General Diagram.

Concretely, In-band Metadata augments the GStreamer mpegtsmux element to multiplex metadata along with video and audio streams into an MPEG Transport Stream (a standard container for video, audio, and metadata).


In some cases, the metadata needs to follow specific standards so that consumer applications can extract it from an MPEG-TS stream. Often the required metadata standards are defined by the Motion Imagery Standards Board (MISB), whose mission is to maintain interoperability, integrity, and quality of motion imagery, associated metadata, audio, and others. Some of the standards defined by MISB are based on KLV encoding, such as ST0601 (UAS Datalink Local Set). KLV is a data encoding standard often used to transport metadata along with video. Each item has three parts: the first bytes indicate the Key (or data type), the next ones give the Length, and the remaining bytes carry the data itself (Value).
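
The Key-Length-Value layout can be sketched in a few lines of Python. This is an illustration only, not the In-band Metadata implementation. It assumes a 16-byte key and a BER-encoded length, as commonly used for KLV in MISB-style metadata, and EXAMPLE_KEY is a made-up placeholder, not a registered key.

```python
from typing import Tuple

# Hypothetical 16-byte key for illustration; real keys come from the standards.
EXAMPLE_KEY = bytes(range(16))


def encode_ber_length(length: int) -> bytes:
    """Encode a length in BER: one byte below 128, otherwise long form."""
    if length < 128:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    # Long form: first byte is 0x80 plus the number of length bytes that follow.
    return bytes([0x80 | len(body)]) + body


def decode_ber_length(data: bytes, offset: int) -> Tuple[int, int]:
    """Return (length, offset of the first value byte)."""
    first = data[offset]
    if first < 128:
        return first, offset + 1
    count = first & 0x7F
    length = int.from_bytes(data[offset + 1:offset + 1 + count], "big")
    return length, offset + 1 + count


def encode_klv(key: bytes, value: bytes) -> bytes:
    """Concatenate Key, Length, and Value into one packet."""
    return key + encode_ber_length(len(value)) + value


def decode_klv(packet: bytes, key_size: int = 16) -> Tuple[bytes, bytes]:
    """Split a single KLV packet back into (key, value)."""
    key = packet[:key_size]
    length, start = decode_ber_length(packet, key_size)
    return key, packet[start:start + length]


packet = encode_klv(EXAMPLE_KEY, b"Hello World")
key, value = decode_klv(packet)
print(len(packet))  # 28: 16 key bytes + 1 length byte + 11 value bytes
print(value)        # b'Hello World'
```

A consumer that does not recognize a key can still skip the item, because the length tells it how many bytes to jump over.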


You can learn more about In-band Metadata in our developer wiki.


Usage Example

The following subsections provide examples of how to inject and extract metadata using In-band Metadata.


Inject metadata

An artificial source of metadata can be easily used to generate metadata and store it in a recording along with a video feed. Figure 6 shows the blocks and connections used to inject metadata into a recording file. The recording can be replaced with a streaming service over the internet, for example.


A block diagram showing the GStreamer elements needed to encode metadata in Transport Stream
Figure 6. Metadata injection using GStreamer In-Band metadata.

The block diagram of Figure 6 can be translated into a GStreamer pipeline like the following:

gst-launch-1.0 -v metasrc metadata=%T period=1 ! 'meta/x-klv' !  mpegtsmux name=mux ! filesink location=metadata.ts videotestsrc is-live=true ! queue ! x264enc ! mux.

Extract metadata

Figure 7 shows how metadata extraction is done with GStreamer In-Band Metadata. The demuxer separates the two streams: the video is extracted from the container, decoded, and displayed, while the metadata is extracted and parsed by a sink element, which can send the metadata back to the application as a signal. A pipeline matching this diagram follows the figure.


A block diagram showing the GStreamer elements needed to decode metadata in Transport Stream
Figure 7. Metadata extraction using GStreamer In-Band metadata.

gst-launch-1.0 -v filesrc location=metadata.ts ! tsdemux name=demux demux. ! queue ! h264parse ! 'video/x-h264, stream-format=byte-stream, alignment=au' ! avdec_h264 ! autovideosink demux. ! queue ! 'meta/x-klv' ! metasink

Use case

The main use case of In-band Metadata is when you need a standard way to carry MISB-compliant metadata within a Transport Stream container. The resulting stream should then be compatible with Transport Stream players that support extracting this kind of metadata.



GstSEIMetadata vs In-band Metadata

Both solutions handle metadata injection into and extraction from a video stream; however, the solution that better suits your project will depend on your specific requirements. The main factors to take into account are:


Playback

Some media player applications support the extraction of KLV metadata compliant with specific MISB standards from MPEG-TS streams. If your use case needs an existing media player application to extract MISB-compliant metadata from the MPEG-TS stream, then In-band Metadata is the most convenient solution.


Video Compression Formats

GstSEIMetadata supports only H.264 and H.265 encoded video streams. In-band Metadata is limited to the video codecs supported by the GStreamer mpegtsmux element. As of GStreamer 1.20, mpegtsmux supports MPEG and Dirac compression formats in addition to H.264 and H.265. Therefore, In-band Metadata offers more video compression formats to choose from than GstSEIMetadata.


Multimedia Containers

With In-band Metadata, you are limited to the MPEG-TS container. On the other hand, GstSEIMetadata is container-agnostic, so you can use any container that supports H.264 or H.265 compressed video.


Image to Metadata Mapping

If you need to associate the metadata with a specific video frame, then GstSEIMetadata is the most convenient solution, because the metadata is injected as an SEI NAL unit alongside the encoded frame and becomes part of the same stream. It is also possible to associate each metadata buffer with a corresponding video frame with In-band Metadata; however, with this solution the metadata and the video are separate streams, so some work is required to map each video buffer timestamp to the closest metadata buffer timestamp.
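
As a rough idea of what that timestamp mapping can look like, the following Python sketch picks, for a given video frame timestamp, the metadata buffer with the closest timestamp. It is a generic nearest-neighbor lookup, not code from In-band Metadata. It assumes you have already collected the metadata timestamps in ascending order (here in nanoseconds), and the function name and sample values are illustrative.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def closest_metadata_index(frame_pts: int, meta_pts: Sequence[int]) -> Optional[int]:
    """Return the index of the metadata timestamp closest to frame_pts.

    meta_pts must be sorted in ascending order. Ties go to the earlier buffer.
    """
    if not meta_pts:
        return None
    pos = bisect_left(meta_pts, frame_pts)
    if pos == 0:
        return 0
    if pos == len(meta_pts):
        return len(meta_pts) - 1
    before, after = meta_pts[pos - 1], meta_pts[pos]
    return pos if after - frame_pts < frame_pts - before else pos - 1


SECOND = 1_000_000_000  # nanoseconds

# Metadata produced once per second; video frames arrive far more often.
meta_timestamps = [0, 1 * SECOND, 2 * SECOND]

print(closest_metadata_index(33_333_333, meta_timestamps))     # 0
print(closest_metadata_index(600_000_000, meta_timestamps))    # 1
print(closest_metadata_index(2_400_000_000, meta_timestamps))  # 2
```

In a real application you might also reject matches that are further away than some tolerance, so that a frame is never paired with stale metadata.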



Which solution should I evaluate?

RidgeRun can provide evaluation versions of GstSEIMetadata and In-band Metadata on demand. Figure 8 presents a flow chart to help you choose which of the two solutions is more convenient for your project:



A flow chart with the important questions to decide if it is more convenient to evaluate GstSEIMetadata or In-band Metadata
Figure 8. Flowchart for choosing between GstSEIMetadata and In-band Metadata.

Conclusion

RidgeRun offers two metadata injection/extraction solutions that can be used in different scenarios. In-band Metadata is the most convenient for use cases that rely on a playback application that needs MISB-compliant metadata inside an MPEG-TS stream, or when you need the flexibility to choose from more video codec options. On the other hand, GstSEIMetadata is the most convenient when you need flexibility in the multimedia container or an easy way to map each video frame to its associated metadata.


Feel free to contact us if you have any questions about our metadata solutions!
