Questions regarding design choices #5626

treasan · 2024-09-06T19:13:59Z

Describe the question.

Hello everyone,

I have a question about some design choices for building a video dataset with DALI. My pipeline consists of several steps: some run inside DALI pipelines and some are plain Python code. Specifically, I have a WebDataset made of tar files containing videos, so my first step invokes DALI's webdataset reader within a pipeline. Next, I would like to filter out unwanted video files based on their metadata, before decoding. Then I invoke a second DALI pipeline to decode the video files. After that, I process the decoded videos (e.g., cutting them up into smaller snippets) and finally forward the results to another DALI pipeline for further processing (e.g., resizing). Dummy code looks something like this:

from nvidia.dali import fn, pipeline_def

@pipeline_def()
def wds_extraction(paths):
    raw_video_bytes = fn.readers.webdataset(paths=paths, ...)
    return raw_video_bytes

def filter_videos(source):
    # Plain Python step: drop unwanted videos based on their metadata, before decoding
    for video_bytes in source:
        duration, fps = get_metadata(video_bytes)
        ...
        yield video_bytes, duration, fps

@pipeline_def()
def decoding(source, device):
    inputs = fn.external_source(source, num_outputs=3)  # bytes, duration, fps
    video = fn.experimental.decoders.video(inputs[0], device=device)
    return video, *inputs[1:]  # forward duration and fps unchanged

def cutting_snippets(source):
    ...

@pipeline_def()
def resizing(source):
    snippets = fn.external_source(source, ...)
    ...

def iterator(paths):
    source = wds_extraction_iter(paths)  # wraps the wds_extraction pipeline in a DALIRaggedIterator
    source = filter_videos(source)
    source = decoding_iter(source)  # wraps the decoding pipeline in a DALIRaggedIterator
    source = cutting_snippets(source)
    source = resizing_iter(source)  # wraps the resizing pipeline in a DALIRaggedIterator
    yield from source
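
The `cutting_snippets` step is elided above. A minimal sketch of what it might do, assuming each decoded video reaches Python as a NumPy array shaped `(frames, height, width, channels)` together with its duration and fps (the iterator's tensor output would have to be converted first), and assuming fixed-length clips where an incomplete trailing clip is dropped. `cut_snippets`, `snippet_len` and `stride` are hypothetical names chosen for illustration:

```python
import numpy as np
from typing import Iterable, Iterator, List, Optional, Tuple


def cut_snippets(video: np.ndarray, snippet_len: int,
                 stride: Optional[int] = None) -> List[np.ndarray]:
    """Split a (frames, H, W, C) array into clips of `snippet_len` frames.

    `stride` defaults to `snippet_len` (non-overlapping clips); a trailing
    clip shorter than `snippet_len` is dropped.
    """
    if snippet_len <= 0:
        raise ValueError("snippet_len must be positive")
    stride = snippet_len if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    starts = range(0, video.shape[0] - snippet_len + 1, stride)
    # Basic slicing returns views, so no frame data is copied here.
    return [video[s:s + snippet_len] for s in starts]


def cutting_snippets(source: Iterable[Tuple[np.ndarray, float, float]],
                     snippet_len: int = 16) -> Iterator[Tuple[np.ndarray, float, float]]:
    """Generator stage: one decoded video in, zero or more snippets out.

    Duration and fps are attached to every snippet cut from a video.
    """
    for video, duration, fps in source:
        for snippet in cut_snippets(video, snippet_len):
            yield snippet, duration, fps
```

For a 50-frame video and `snippet_len=16`, this yields three clips starting at frames 0, 16 and 32; the last two frames are dropped. Passing a smaller `stride` produces overlapping clips instead.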

I wanted to ask whether this design is efficient, given the context switches between pure Python and DALI pipelines. Are there any performance disadvantages? Another thing that bothers me is that I have to forward every piece of data through every DALI pipeline, even data that is no longer modified. For example, I extract the duration and fps of each video in the filter step and want to pass them all the way through to the user. Hence, I must also feed them into each DALI pipeline and simply output them again.

Is there a better way to achieve a pipeline like this?

Check for duplicates

I have searched the open bugs/issues and have found no duplicates for this bug report


mdabek-nvidia · 2024-09-11T08:08:19Z

Hi @treasan,

Thank you for reaching out.
Overall, your design is not bad. One improvement I can think of is to use parallel external source (`fn.external_source` with `parallel=True`) to load and filter the videos asynchronously before decoding.
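
To illustrate the idea of overlapping the Python-side loading and filtering with downstream work, here is a plain-Python sketch. It is not DALI's parallel external source, which runs the source in worker processes configured on the pipeline; this swaps in a single background thread feeding a bounded queue. `prefetch` and `depth` are hypothetical names. A thread only gives real overlap when the work releases the GIL (file I/O, native parsing code), which is why a process-based mechanism is usually preferable for Python-heavy filtering.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the stream


class _Failure:
    """Carries an exception raised in the worker over to the consumer."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


def _drain(q: "queue.Queue") -> Iterator:
    # Consumer side: yield items until the sentinel, re-raising worker errors.
    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item


def prefetch(source: Iterable[T], depth: int = 8) -> Iterator[T]:
    """Consume `source` in a background thread, keeping up to `depth` items ready.

    The thread starts immediately, so loading begins before the first item
    is requested; the bounded queue stops it from running too far ahead.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    q: "queue.Queue" = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for item in source:
                q.put(item)  # blocks while the queue is full
        except Exception as exc:
            q.put(_Failure(exc))
        else:
            q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    return _drain(q)
```

For example, `prefetch(load_and_filter(paths), depth=16)`, where `load_and_filter` is a hypothetical generator that reads and filters videos, keeps up to 16 filtered samples ready while the consumer is busy decoding. If the consumer stops early, the worker stays blocked on the full queue; it is a daemon thread, so it does not keep the process alive.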
Regarding the metadata you are passing through the pipelines, my impression is that it is not that heavy, so the overhead should be small.
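
If you would rather not feed duration and fps into each DALI pipeline at all, one alternative is to keep them on the Python side and re-attach them to the outputs afterwards. A minimal sketch, assuming the wrapped stage emits exactly one output per item it pulls from its source (per sample or per batch, whichever granularity the stage works at) and preserves input order: no shuffling, no filtering, no padding or dropping of a last partial batch. `bypass_metadata` and `stage` are hypothetical names; `stage` stands in for something like the decoding pipeline wrapped in an iterator. Because the stage may read ahead (prefetch), metadata waits in a FIFO until its result comes out.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, Tuple, TypeVar

P = TypeVar("P")  # payload sent through the stage (e.g. encoded video bytes)
M = TypeVar("M")  # metadata kept on the Python side (e.g. (duration, fps))
R = TypeVar("R")  # stage output (e.g. decoded frames)


def bypass_metadata(
    source: Iterable[Tuple[P, M]],
    stage: Callable[[Iterator[P]], Iterable[R]],
) -> Iterator[Tuple[R, M]]:
    """Run only the payloads through `stage`; pair each result with its metadata.

    Assumes `stage` yields exactly one result per payload, in input order.
    A count mismatch is only detected once the stream ends.
    """
    pending: Deque[M] = deque()

    def payloads() -> Iterator[P]:
        for payload, meta in source:
            pending.append(meta)  # remember metadata in arrival order
            yield payload

    for result in stage(payloads()):
        if not pending:
            raise RuntimeError("stage produced more outputs than inputs")
        yield result, pending.popleft()
    if pending:
        raise RuntimeError(f"stage dropped {len(pending)} input(s)")
```

The trade-off is that correctness now relies on the ordering assumption, whereas forwarding the values through the pipeline, as in the original design, keeps them attached to their samples by construction.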

treasan added the question Further information is requested label Sep 6, 2024

dali-automaton assigned mdabek-nvidia Sep 9, 2024

treasan mentioned this issue Sep 10, 2024: Add an operator for receiving video metadata #5630 (open)
