Support checkpointing in experimental video reader #5180
Signed-off-by: Szymon Karpiński <skarpinski@nvidia.com>
@@ -19,7 +19,7 @@
 #include "dali/operators/reader/loader/video/video_loader_decoder_cpu.h"

 namespace dali {
-class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>> {
+class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>, VideoSample<CPUBackend>, true> {
This applies to all checkpointing PRs:
I recommend adding a template alias:
template <typename Backend, typename LoadTarget, typename ParseTarget = LoadTarget>
using CheckpointingDataReader = DataReader<Backend, LoadTarget, ParseTarget, true>;
to avoid retyping the ParseTarget manually all over the place.
Suggested change:
-class VideoReaderDecoderCpu : public DataReader<CPUBackend, VideoSample<CPUBackend>, VideoSample<CPUBackend>, true> {
+class VideoReaderDecoderCpu : public CheckpointingDataReader<CPUBackend, VideoSample<CPUBackend>> {
Alternatively, check whether we need a separate ParseTarget at all, or swap the order of the arguments (supports_checkpointing, ParseTarget).
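A minimal, self-contained sketch of the alias suggested above. DataReader, CPUBackend and VideoSample here are empty stand-ins (hypothetical, not DALI's real definitions); only the parameter order Backend, LoadTarget, ParseTarget, supports_checkpointing is taken from the comment. The static_assert checks that the short and long spellings name the same base class.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the DALI types named in the review.
struct CPUBackend {};
template <typename Backend>
struct VideoSample {};

// Assumed shape of DataReader: ParseTarget defaults to LoadTarget and
// checkpointing is opt-in through a trailing bool parameter.
template <typename Backend, typename LoadTarget, typename ParseTarget = LoadTarget,
          bool supports_checkpointing = false>
class DataReader {
 public:
  static constexpr bool kSupportsCheckpointing = supports_checkpointing;
};

// The alias proposed in the review: LoadTarget is written only once.
template <typename Backend, typename LoadTarget, typename ParseTarget = LoadTarget>
using CheckpointingDataReader = DataReader<Backend, LoadTarget, ParseTarget, true>;

// Both spellings denote exactly the same instantiation.
static_assert(std::is_same_v<
    CheckpointingDataReader<CPUBackend, VideoSample<CPUBackend>>,
    DataReader<CPUBackend, VideoSample<CPUBackend>, VideoSample<CPUBackend>, true>>);

// The reader from the diff, rewritten with the alias.
class VideoReaderDecoderCpu
    : public CheckpointingDataReader<CPUBackend, VideoSample<CPUBackend>> {};
```

Since alias templates may carry their own default template arguments, the alias keeps ParseTarget overridable for readers that do parse into a different type.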
I'll be removing the supports_checkpointing parameter soon, once we have full support, so it's only temporary.
!build
CI MESSAGE: [10993505]: BUILD STARTED
CI MESSAGE: [10993505]: BUILD FAILED
Signed-off-by: Szymon Karpiński <skarpinski@nvidia.com>
!build
CI MESSAGE: [10993812]: BUILD STARTED
CI MESSAGE: [10993812]: BUILD PASSED
Category:
New feature (non-breaking change which adds functionality)
Description:
This PR adds checkpointing support to fn.experimental.readers.video.
Additional information:
Adding checkpointing support is a simple task, similar for every loader and reader.
The following changes were required to enable checkpointing in the video loader:
- Inheriting from Loader<..., supports_checkpointing=true>.
- Implementing the Skip() method. Skip should behave like ReadSample in terms of side effects, but should skip a sample instead of reading it. This method is used to implement fast-forwarding in the Loader base class.
- Changing Reset to use virtual_shard_id_ (the shard currently being processed) instead of shard_id_ (the initial shard requested by the user). See [2] for more details.
The following changes were required to enable checkpointing in the video reader:
- Inheriting from DataReader<..., supports_checkpointing=true>. See [1] for more details.
- Calling this->SetInitialSnapshot(). Subsequent snapshots are saved by the DataReader base class.
[1] Changing DataReader template parameters
Inheriting from DataReader<..., supports_checkpointing=true> might look strange in the diff. In DataReader, both ParseTarget (defaulting to LoadTarget) and supports_checkpointing come after LoadTarget and have default values, and the class is mostly used as DataReader<Backend, Target>. To enable checkpointing, one therefore needs to add two parameters:
DataReader<Backend, Target, Target, true>
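To show why two parameters must be added rather than one, here is a minimal sketch assuming DataReader declares ParseTarget and supports_checkpointing as trailing defaulted parameters (defaulting to LoadTarget and false). CPUBackend and Frame are hypothetical stand-ins, and this DataReader is a toy, not DALI's class.

```cpp
#include <type_traits>

// Hypothetical stand-ins; not DALI's real definitions.
struct CPUBackend {};
struct Frame {};  // plays the role of Target

// Assumed declaration: two defaulted parameters at the end of the list.
template <typename Backend, typename LoadTarget, typename ParseTarget = LoadTarget,
          bool supports_checkpointing = false>
struct DataReader {
  using Parse = ParseTarget;
  static constexpr bool checkpointing = supports_checkpointing;
};

// Typical use: both defaults apply, so ParseTarget == Frame and checkpointing is off.
using Plain = DataReader<CPUBackend, Frame>;
static_assert(!Plain::checkpointing);
static_assert(std::is_same_v<Plain::Parse, Frame>);

// C++ cannot skip an earlier defaulted argument and set only a later one,
// so opting in means spelling ParseTarget explicitly, i.e. Target twice.
using Checkpointing = DataReader<CPUBackend, Frame, Frame, true>;
static_assert(Checkpointing::checkpointing);

// Opting in yields a different base-class type than the plain reader.
static_assert(!std::is_same_v<Plain, Checkpointing>);
```

Because only trailing template arguments can be omitted, either an alias that fixes the bool or moving supports_checkpointing before ParseTarget would remove the duplicated Target.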
[2] virtual_shard_id_
virtual_shard_id_ and shard_id_ might differ when stick_to_shard=False. This change doesn't affect existing code, because Reset is normally called after each full pass over the data, at which point the two are equal. However, a checkpoint might be saved while the reader is processing a shard other than shard_id_, so to restore from such a checkpoint we need to be able to reset to the current shard (virtual_shard_id_). The same change was made in FileReader in #4954.
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-3702