Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions regarding design choices #5626

Open
1 task done
treasan opened this issue Sep 6, 2024 · 1 comment
Open
1 task done

Questions regarding design choices #5626

treasan opened this issue Sep 6, 2024 · 1 comment
Assignees
Labels
question Further information is requested

Comments

@treasan
Copy link

treasan commented Sep 6, 2024

Describe the question.

Hello everyone,

I have a question regarding some design choices when building a video dataset with DALI. My pipeline consists of several steps where some steps happen within DALI pipelines and some steps are normal python code. Specifically, I have a web dataset consisting of video containing tar files, so my first step is to invoke DALI's webdataset reader within a pipeline. Afterwards, I would like to filter out unwanted video files before decoding based on their metadata. Afterwards I invoke a second DALI pipeline for decoding the video files. Then, I process the decoded videos (e.g. cutting them up into smaller snippets and finally forward those to another DALI processing pipeline (e.g., for resizing etc). A dummy code looks something like this:

@pipeline_def()
def wds_extraction(paths):
    raw_video_bytes = fn.readers.webdataset(paths=paths, ...)
    return raw_video_bytes

def filter(source):
    for video_bytes in source:
        duration, fps = get_metadata(video_bytes)
        ...
        yield video_bytes, duration, fps

@pipeline_def()
def decoding(source, device):
    inputs = fn.external_source(source, num_outputs=3) # bytes, duration, fps
    video = fn.experimental.decoders.video(inputs [0], device=device)
    return video, *inputs[1:] # simply forward duration and fps unchanged ...

def cutting_snippets(source):
    ...

@pipeline_def()
def resizing(source):
    fn.external_source(source, ...)
    ...

def iterator(paths):
    source = wds_extraction_iter(paths) # wraps the wds_extraction pipeline in a DALIRaggedIterator
    source = filter(source)
    source = decoding_iter(source) # wraps the decoding pipeline in a DALIRaggedIterator
    source = cutting_snippets(source)
    source = resizing_iter(source) # wraps the resizing pipeline in a DALIRaggedIterator
    yield from source

I wanted to ask whether this design choice is efficient even with the context switches between pure python and DALI pipelines. Are there some disadvantages performance-wise? Another quite bothering thing is that I have to forward each piece of data through every DALI pipeline even though they do not get updated anymore. For example, I extract the duration and fps of each video in the filter method and want to forward them until the end to the user. Hence, I must also load them into the DALI pipelines and simply output them again.

Is there a better way to achieve a pipeline like this?

Check for duplicates

  • I have searched the open bugs/issues and have found no duplicates for this bug report
@treasan treasan added the question Further information is requested label Sep 6, 2024
@mdabek-nvidia
Copy link
Collaborator

Hi @treasan,

Thank you for reaching out.
Your design is overall not that bad. The improvement I can think of is using the parallel external sources to asynchronously load and filter videos before the decoding.
Regarding the metadata you are passing through pipelines, my impression is that they are not that heavy and the overhead will be small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants