
H5Dataset with PyTorch DataLoader #4

Answered by francois-rozet
Dingel321 asked this question in Q&A

Hello 👋

The advantage of a DataLoader with num_workers > 0 is that batches are loaded and processed in worker processes, concurrently with the main process. Therefore, if a loop iteration takes longer than fetching a batch and transferring it to the main process, the next iteration does not have to wait for data. However, if iterations are fast, the overhead of transferring data between processes can outweigh the benefits.
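To make the trade-off concrete, here is a minimal sketch that times one pass over a loader with and without workers. It does not use H5Dataset: `SlowDataset` and `time_epoch` are hypothetical helpers invented for illustration, and the `time.sleep` calls only simulate the cost of reading a sample and of running a training step. The timings you get will depend on your machine, so treat this as a way to experiment rather than as a benchmark.

```python
import time

import torch
import torch.utils.data as data


class SlowDataset(data.Dataset):
    """Hypothetical stand-in for an on-disk dataset: each item costs `delay` seconds."""

    def __init__(self, size=64, delay=0.001):
        self.size = size
        self.delay = delay

    def __len__(self):
        return self.size

    def __getitem__(self, i):
        time.sleep(self.delay)  # simulated read/decode cost (assumption)
        return torch.full((4,), float(i))


def time_epoch(loader, step=0.0):
    """Iterate once over `loader`, sleeping `step` seconds per batch to mimic a training step."""
    start = time.perf_counter()
    count = 0
    for batch in loader:
        time.sleep(step)  # simulated work in the main process (assumption)
        count += len(batch)
    return count, time.perf_counter() - start


if __name__ == "__main__":
    dataset = SlowDataset()
    for workers in (0, 2):
        loader = data.DataLoader(dataset, batch_size=16, num_workers=workers)
        count, elapsed = time_epoch(loader, step=0.02)
        print(f"num_workers={workers}: {count} samples in {elapsed:.3f}s")
```

Raising `step` relative to `delay` should favor the multi-worker loader, while setting `step=0.0` lets the cost of starting workers and passing batches between processes show up in the total.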

Here is an example (train.h5 contains 1M samples) where using a DataLoader is worthwhile. The effect is accentuated by the (very) large batch size.

>>> import lampe
>>> import time
>>> import torch
>>> import torch.utils.data as data
>>> import tqdm
>>>
>>> dataset = lampe.data.H5Dataset('…

Answer selected by francois-rozet