Releases: NVIDIA/DALI

DALI v1.13.0

22 Apr 17:30

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Added support for per-frame (temporal) arguments to the Gaussian Blur and Laplacian operators (#3715 and #3723).
  • Optimized audio decoder resampling for ARM (#3745).
  • Improved the debug (immediate execution) mode:
    • Added direct operator calls in debug mode (#3734).
    • Added a debug mode benchmark (#3762).
  • Added support for GPU positional arguments in the Slice operator (#3741).
  • Documentation improvements:
    • Split the operator documentation into separate pages (#3794).
    • Added a mechanism for cross-referencing examples and operators (#3748).
    • Added an FAQ section to the DALI user guide (#3761).
    • Added new GTC talks (#3757).
    • Added shuffling and shards handling snippets to the parallel external source examples (#3744).
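
The shuffling and shard handling mentioned in the last item can be expressed as a per-sample callback. The following is a minimal sketch of the idea, not the code from the DALI examples. The `SampleInfo` namedtuple is a stand-in that mirrors the fields of DALI's `SampleInfo` so the snippet runs without DALI installed. The class name, the seed handling, and returning a dataset index instead of loaded data are assumptions made for illustration.

```python
import random
from collections import namedtuple

# Stand-in for DALI's SampleInfo (assumption: same field names), so this runs without DALI.
SampleInfo = namedtuple("SampleInfo", ["idx_in_epoch", "idx_in_batch", "iteration", "epoch_idx"])


class ShardedShuffledSource:
    """Hypothetical per-sample callback: maps a SampleInfo to a dataset index."""

    def __init__(self, num_samples, batch_size, shard_id, num_shards, seed=42):
        self.num_samples = num_samples
        self.seed = seed
        # Each shard covers a contiguous slice of the (shuffled) index permutation.
        self.shard_size = num_samples // num_shards
        self.shard_offset = self.shard_size * shard_id
        # Only full batches are produced; the remainder of the shard is dropped.
        self.full_iterations = self.shard_size // batch_size
        self._perm_epoch = None
        self._perm = None

    def _permutation(self, epoch_idx):
        # Reshuffle once per epoch; the same seed gives every shard the same permutation.
        if self._perm_epoch != epoch_idx:
            rng = random.Random(self.seed + epoch_idx)
            perm = list(range(self.num_samples))
            rng.shuffle(perm)
            self._perm, self._perm_epoch = perm, epoch_idx
        return self._perm

    def __call__(self, sample_info):
        if sample_info.iteration >= self.full_iterations:
            raise StopIteration  # signals the end of the epoch
        perm = self._permutation(sample_info.epoch_idx)
        return perm[self.shard_offset + sample_info.idx_in_epoch]
```

Because every shard derives its permutation from the same seed and epoch number, the shards stay disjoint without any communication between processes. A real callback would load and return the sample stored at the computed index.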

Fixed Issues

  • Fixed the handling of samples that exceed 2 GB in the parallel external source (#3768).

Improvements

  • Add per-frame operator (#3723)
  • Add support for per-frame arguments to Gaussian Blur and Laplacian operators (#3715)
  • Separate the documentation pages! (#3794)
  • Update zlib to 1.2.12 version (#3787)
  • Trim TL0_tensorflow_plugin and TL0_python-self-test-readers-decoders tests (#3796)
  • Add _schema_name attribute in fn API (#3798)
  • Add resize checkerboard tests, comparing to ONNX reference precomputed data (#3792)
  • Update nvJPEG2000 to 0.5.0 version (#3791)
  • Fix header in parallel external source notebook (#3790)
  • Update documentation link to the '22 roadmap (#3786)
  • Bump Nvidia TF1 version used in tests to 22.03 (#3769)
  • Add mechanism for cross-referencing examples and operators (#3748)
  • Add direct operator calls in debug mode (#3734)
  • Make number of samples in batch signed (#3789)
  • Add debug mode benchmark (#3762)
  • Fix the cuBLAS version to one compatible with nvTF 22.01 (#3781)
  • Apply changes from TV sample encapsulation in NVJPEG2K (#3780)
  • Ensure sample encapsulation in Tensor Vector (#3701)
  • Add a TL0 test that runs on more than 1 GPU (#3772)
  • Add FAQ section to the DALI documentation (#3761)
  • Remove the compose operator from the fn API table (#3767)
  • Add new GTC talks. Update old link (#3757)
  • Update to CUDA 11.6u2 (#3764)
  • RNG to use pinned memory for kernel launch args (#3765)
  • Revert "Pin webdataset version to the last compatible with python 3.6 (#3746)" (#3763)
  • Fix the wrong patch for CVE-2022-0907 which by mistake duplicated CVE-2022-0909 (#3760)
  • Quantize GDS chunk size to 1 MB. (#3759)
  • Add GDS-compatible allocator with 4k alignment. (#3754)
  • Update error messaging of nvJPEG (#3756)
  • Allow GPU slice arguments (#3741)
  • Add filename to the error message in the numpy reader (#3753)
  • Fix libtiff vulnerabilities (#3752)
  • Update parallel external source notebook and include shuffling example. (#3744)
  • Add supported python version classifier to DALI TF plugin setup.py (#3751)
  • Vectorize audio resampling for ARM NEON. (#3745)
  • Remove prints from the regular DALI execution flow (#3740)
  • Pin webdataset version to the last compatible with python 3.6 (#3746)
  • Align test expectations with slice implementation rounding logic (#3738)
  • Update RapidJSON (#3737)
  • Regenerate getting started jupyter examples (#3732)
  • Improve documentation for AccessOrder wait and set_order. (#3736)
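
Two of the items above refer to size quantization: a GDS chunk size quantized to 1 MB and an allocator with 4k alignment. The arithmetic behind both is rounding a size to a multiple of a fixed unit. The sketch below shows that arithmetic only. It assumes rounding up, which the release notes do not state, and the function name `align_up` is hypothetical rather than a DALI API.

```python
MiB = 1 << 20  # 1 MB as used for the chunk size (assumed to mean 2**20 bytes)
PAGE_4K = 4096  # 4k alignment


def align_up(size, alignment):
    """Round size up to the nearest multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    # Ceiling division on integers, then scale back to a multiple of alignment.
    return -(-size // alignment) * alignment
```

For example, `align_up(MiB + 1, MiB)` yields two full 1 MB chunks, and `align_up(1, PAGE_4K)` yields one 4k page.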

Bug Fixes

  • Add missing copying of pinned prop when sharing buffer (#3797)
  • Disable PES large sample test on Xavier runner (#3788)
  • Fix source device in PyTorch cross-device test. (#3775)
  • Fix large mini-batch handling in parallel external source (#3768)
  • Fix Yolo v4 example non-fatal teardown error (#3739)
  • Rework Image Decoder example (#3731)
  • Check return value of a CUDA function call. (#3733)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream.
    If key frames occur less frequently than that, the returned frames might be out of sync.
  • The DALI TensorFlow plug-in might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have the prebuilt plug-in binary that is shipped with DALI, ensure that the compiler that is used to build TensorFlow exists on the system during the plug-in installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
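
The key-frame requirement in the first known issue can be checked before a video is fed to the loader. The sketch below assumes that the key-frame positions have already been extracted with an external tool such as ffprobe. The function names and the conservative default threshold of 10 frames are assumptions for illustration and are not part of DALI.

```python
def max_keyframe_gap(keyframe_indices, num_frames):
    """Largest distance, in frames, from a key frame to the next one (or to the end of the stream)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    keys = sorted(set(keyframe_indices))
    if not keys:
        return num_frames  # no key frames at all: treat the whole stream as one gap
    boundaries = keys + [num_frames]
    return max(b - a for a, b in zip(boundaries, boundaries[1:]))


def keyframes_dense_enough(keyframe_indices, num_frames, max_gap=10):
    """True if no gap between key frames exceeds max_gap frames."""
    return max_keyframe_gap(keyframe_indices, num_frames) <= max_gap
```

A stream that fails this check may need to be re-encoded with a shorter key-frame interval before use.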

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.13.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.13.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.13.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.13.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.12.0

24 Mar 12:07

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Added support for the GPU-accelerated decoding of videos with a variable frame rate (experimental.readers.video) (#3668).
  • Reduced the binary size (#3680 and #3682).
  • Improved the TensorFlow plug-in installation: when none of the prebuilt binaries matches the exact TensorFlow version, the installer tries to build the plug-in from source (#3720).
  • Improved performance by increasing the usage of pinned memory in argument input buffers (#3728).
  • Documentation improvements (#3722, #3684, and #3674).

Fixed Issues

  • Fixed the TensorFlow plug-in issue that prevented it from working in the CPU-only mode (#3719).

Improvements

  • [DALI TF] Try building from source when TF version doesn't match exactly. Add test step to installation script. (#3720)
  • Add supported layouts to Crop, CropMirrorNormalize (#3722)
  • Make output buffers for argument inputs to GPU operators pinned. (#3728)
  • Bump up TensorFlow version used in tests (#3688)
  • Fix coverity issues (#3679)
  • Bump up CUDA to 11.6U1 (#3709)
  • Add test to check if importing DALI doesn't break Torch process forking (#3669)
  • Add non-owning SampleView (#3706)
  • Use pinned buffers for kernel parameters and for ToContiguousGPU. (#3689)
  • Update deps version for libtiff-CVE-2022-0561 fix (#3693)
  • Update documentation regarding GDS being part of CUDA toolkit (#3684)
  • Add VideoReaderDecoder GPU (#3668)
  • Custom build: subset of file patterns for kernel and operators (#3672)
  • Remove lineinfo from RelWithDebInfo DALI builds (#3680)
  • Build DALI only for major arch versions (#3682)
  • Remove mpiexec affinity binding in TensorFlow TL1 and TL3 RN50 test (#3681)
  • Remove Scratchpad from KernelManager (#3678)
  • Update dependencies (#3677)
  • Use DynamicScratchpad in KernelManager. (#3670)
  • Add an info about fill_values being used by pad_output in crop_mirror_normalize (#3674)

Bug Fixes

  • Fix CVE-2022-0626 in libtiff (#3727)
  • Fix TensorFlow plugin operation without GPU (#3719)
  • Synchronize at the end of BoxEncoder's constructor. (#3724)
  • Fix ES debug mode test failing with missing batch (#3712)
  • Add missing import nose.SkipTest in optical flow tests (#3707)
  • Fix stream handling in video loader and nvdecoder. (#3705)
  • Fix typos found in tensor_shape.h docs (#3695)
  • Fix optical flow tests for Turing (#3685)
  • Fix Slice's adaptive tiling for smaller output types (#3687)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently than that, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.12.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.12.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.12.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.12.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

v1.11.1: Fix stream usage in C API (#3713)

04 Mar 13:36
2b25d34

Key Features and Enhancements

This is a patch release.

Fixed Issues

  • Fixed the incorrect handling of input data by the GPU external source in multi-GPU scenarios.
  • Fixed the incorrect usage of streams in the C API.

Improvements

  • None

Bug Fixes

  • Fix multi-device GPU external source. (#3710)
  • Fix constructing GPU Tensor from DLPack capsule (#3711)
  • Fix stream usage in C API (#3713)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently than that, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
  • The experimental.readers.video operator causes a crash during the process teardown with driver versions 460 to 470.21
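
Whether a machine falls into the driver range named in the last known issue can be checked by comparing version strings numerically. The sketch below assumes that both ends of the range (460 and 470.21) are inclusive and that only the first two version components matter. The release notes specify neither, and both function names are hypothetical.

```python
def parse_driver_version(text):
    """Turn a version string such as '470.57.02' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def in_affected_range(version, low=(460,), high=(470, 21)):
    """True if the major.minor part of the driver version lies within [low, high]."""
    major_minor = parse_driver_version(version)[:2]
    # Tuples compare element by element, so (460, 32) >= (460,) holds.
    return low <= major_minor <= high
```

The driver version string itself can be obtained from tools such as nvidia-smi and passed in as plain text.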

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.11.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.11.1

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.11.1
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.11.1

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.11.0

28 Feb 16:42
cd794e5

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Added the GPU laplacian operator (#3644, #3618).
  • Updated the optical_flow operator to use the latest SDK capabilities (#3625).
  • Extended the readers.webdataset operator to support the pax (POSIX.1-2001) tar format (#3645).
  • Improved the performance of the slice operator (#3604, #3600).
  • Improved the debug (immediate execution) mode:
    • Added the direct use of external sources (#3605).
    • Extended the API and added a string representation and the .shape method to data nodes (#3647, #3591).
    • Added support for deterministic seed generation (#3589).
    • Added a tutorial notebook (#3648).

Fixed Issues

  • Fixed the incorrect construction of TensorList from a list of tensors (#3626).
  • Fixed an issue in the CPU readers.video operator that prevented it from working in the CPU-only mode (#3660).

Improvements

  • Improve checking if it is safe to fork the DALI process (#3671)
  • Add debug mode tutorial notebook (#3648)
  • Dynamic & stream-aware scratchpad (#3667)
  • Use fn API in non-silent tests (#3666)
  • Frames decoder gpu (#3615)
  • Add Laplacian GPU operator (#3644)
  • Update third party (#3632)
  • Improve the documentation about CPU tensors and named arguments (#3655)
  • Update docs for the parallel option in external source (#3654)
  • Update optical flow operator to use the latest OF SDK capabilities (#3625)
  • Remove deprecated usage of .dtype() method (#3650)
  • Update pattern used to generate TFRecord idx files (#3653)
  • Add one_hot benchmark (#3553)
  • Add str and repr for Tensor, TensorList and DataNode[Debug] (#3647)
  • Relax test tolerance in DisplacementTest/Sphere and Water (#3649)
  • Update warp_affine test and docs (#3639)
  • Remove unnecessary Dockerfile.cuda116.x86_64deps file (#3642)
  • Updates FindNVJPEG.cmake (#3643)
  • Add JPEG compression distortion to augmentation gallery (#3633)
  • Use index slicing in geometric transformation notebook (#3635)
  • Add support for tar pax POSIX.1-2001 WebDataset (#3645)
  • Remove redundant tests (#3634)
  • Add dtype member for TensorList and modify dtype for Tensor (#3628)
  • Remove dependency between dali_test.bin and dali_operators lib (#3637)
  • Add Laplacian GPU kernel (#3618)
  • Updated PR template (#3619)
  • Remove synchronization from deallocate. (#3497)
  • ArgHelper tests to not depend on operators from dali_operators lib (#3631)
  • Add dtype argument to ExternalSource in examples (#3611)
  • Add CUDA 11.6 support (#3623)
  • Make data objects stream-aware (#3536)
  • Changing WDS Reader source_info property (#3614)
  • Relax test tolerance in DisplacementTest/Sphere (#3621)
  • Video tests utils and refactor (#3620)
  • Debug mode direct ExternalSource (#3605)
  • Remove Buffer inheritance from TensorList (#3576)
  • Relax test tolerance in DisplacementTest/Water (#3616)
  • Improve Slice's adaptive tiling (#3604)
  • Explicitly coalesce stores in Slice for smaller output types (#3600)
  • Add an upper bound for the video decoder workaround (#3609)
  • Deterministic seeds in debug mode (#3589)
  • Move from zlib to zlib-ng optimized fork (#3570)
  • TensorList shape (#3591)

Bug Fixes

  • Fix frames decoder destruction (#3662)
  • Removes check of CUDA runtime and linked libs from the backend (#3664)
  • Remove CUDA call from CUDAStreamPool's constructor (#3663)
  • Fix librosa bugs after 0.9 release (#3665)
  • Fix VideoReader CPU only variant (#3660)
  • Add a separate initialization method to OpticalFlowAdapter (#3657)
  • Fix get-pip.py for python 3.6 (#3652)
  • Fix sphinx warnings in the docs (#3651)
  • Fix synchronization bug in operator benchmark (#3638)
  • Replace calls to exp2 with std::exp2f (#3646)
  • Fix null_stream constant evaluation fallback (#3630)
  • Fix CVE-2021-4156 in libsnd (#3624)
  • Fix TensorList constructor from list of tensors. (#3626)
  • Fix CVE-2022-22844 in libtiff (#3612)
  • Fix dtype in external_source with multiple outputs. (#3608)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently than that, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
  • The experimental.readers.video operator causes a crash during the process teardown with driver versions 460 to 470.21

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.11.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.11.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.11.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.11.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.10.0

25 Jan 10:44

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • The get_property operator (CPU and GPU) that is used to fetch tensor metadata, such as the source file name (#3572).
    • The laplacian operator (CPU) (#3563).
  • Color-based augmentations were extended to support video data (#3580).
  • Improved performance of the slice operator (#3584, #3573, and #3568).
  • Added an experimental debug (immediate execution) mode (#3586 and #3531).

Fixed Issues

No major issues were fixed in this release.

Improvements

  • Adds video support to color based augmentations (#3580)
  • Fixed cmake error (#3601)
  • Fix debug build failures in benchmark code (#3585)
  • Make sanitizers tests fail when it encounters the first issue (#3583)
  • Use proper attribute filters for nosetests (#3592)
  • Fix wrong parameter name in Laplacian docs (#3593)
  • QA script fix: Add an empty negative branch to a conditional to prevent automatic error (#3588)
  • Small refactoring in Slice GPU kernel (#3584)
  • GetProperty operator CPU+GPU (#3572)
  • Add comments about scale argument (#3581)
  • Fix coverity issues (#3579)
  • Check when using ES source and feed_input (#3574)
  • Prototype of the debug mode (#3531)
  • Enable tests for dynamically loaded cuda libraries (#3540)
  • Add Laplacian operator [CPU] (#3563)
  • Add CUDAStreamPool & CUDAStreamLease. (#3569)
  • Coalesce stores in Slice for smaller output types (#3568)
  • Turn off OpticalFlow test on aarch64 platform for driver r495.x and newer (#3566)

Bug Fixes

  • Fixing typos in WDS's source_info (#3602)
  • Fix handling of scalar argument in slice operator (#3596)
  • Use the same device for debug mode test and baseline (#3594)
  • Fix JPEG distortion GPU quality argument handling for sequences (#3590)
  • Use current device in _as_gpu (#3586)
  • Fix version_ge: command not found error in TL0_python-self-test-base-cuda (#3582)
  • Disable coalescing values in Slice for CUDA 10 (#3573)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently than that, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.10.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.10.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.10.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.10.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.9.0

03 Jan 11:30

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Extended the jpeg_compression_distortion operator to support video inputs (#3482 and #3447).
  • Added the file_filter argument to the readers.file operator that allows you to filter files by names (#3459).
  • Extended the slice operator to support per-sample axes arguments and negative axis indexing (#3516).
  • Extended the pad operator to support per-sample axes, fill_value arguments, and negative axis indexing (#3534).
  • Improved the performance of the slice operator for small batch sizes (#3557).
  • Added the Laplacian CPU kernel (#3565, #3535, and #3518).
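
Negative axis indexing, added to the slice and pad operators in this release, counts dimensions from the back. The sketch below shows how such indices map to non-negative ones, assuming the usual convention of a valid range from -ndim to ndim - 1. The function `normalize_axes` is a hypothetical helper for illustration and is not part of the DALI API.

```python
def normalize_axes(axes, ndim):
    """Map possibly negative axis indices to the range [0, ndim)."""
    normalized = []
    for axis in axes:
        if not -ndim <= axis < ndim:
            raise ValueError(f"axis {axis} is out of range for {ndim} dimensions")
        # Python's modulo already wraps negative values: -1 % 3 == 2.
        normalized.append(axis % ndim)
    return normalized
```

With per-sample axes, the same normalization would be applied separately to each sample, using that sample's own number of dimensions.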

Fixed Issues

This DALI release includes the following fixes:

  • Fixed a race condition that randomly caused incorrect outputs in the TensorFlow plugin (#3547).
  • Fixed synchronization issues in the PaddlePaddle plugin that may have caused incorrect results (#3498 and #3487).

Improvements

  • Make Slice kernel tiling adaptive (#3557)
  • Add Laplacian CPU kernel (#3518)
  • Allows DALI to dlopen dependent CUDA toolkit libraries: NPP, cuFFT and nvJPEG (#3519)
  • Fix test code to be compatible with python 3.6 (#3550)
  • Fix a typo in warp jupyter notebook. (#3554)
  • Add Cast and CoinFlip GPU benchmarks (#3541)
  • Fix DALI TL3 test for 21.11 (#3529)
  • Pad operator: Add support for per-sample axes and fill_value arguments, and negative axes (#3534)
  • Add FlipGPU and GaussianBlurGPU benchmarks (#3538)
  • Make bundle-wheel.sh more configurable (#3539)
  • Enable DALI test on python 3.9 and add 3.10 support (#3522)
  • Add transform parameter to convolution cpu (#3535)
  • Remove nvJPEG leak sanitizer workaround in tests (#3532)
  • Dependency update Nov 2021 (#3523)
  • Add support for per-sample axes and negative axes in Slice (#3516)
  • Refactor ArgValue to support empty samples and batch shape expectations (#3528)
  • Move to CUDA 11.5 update 1 (#3526)
  • Add Copy GPU benchmark (#3517)
  • Move to CUDA_CALL for nvJPEG, nvJPEG2k, and NPP (#3521)
  • Silence warning in LookupTable (#3508)
  • Move unfold_outer_dim to common utilities. (#3486)
  • Remove Context from memory resources. (#3485)
  • Set minimum python version to 3.7 for TF 2.7 (#3489)
  • Allow video inputs to JpegCompressionDistortion (#3482)
  • Bump up TensorFlow version to 2.7 in tests (#3475)
  • Change the way how NVML wrapper is linked internally (#3481)
  • Add support for file_filters in FileReader (#3459)
  • Allow video inputs to JpegCompressionDistortion (#3447)
  • Move to Ubuntu 20.04 for cuda 10.2 toolkit image (#3477)
  • Move to Ubuntu 20.04 for cuda toolkit image (#3476)
  • Pin Keras version for TensorFlow 2.6 (#3474)
  • Add support for BatchInfo in experimental TF DALI Dataset (#3468)

Bug Fixes

  • Replace equality with EqualEpsRel in Laplacian kernel tests (#3565)
  • Synchronize CUDA stream once in operator benchmark (#3525)
  • Ensure that num_devices and device are stored in correct order. (#3560)
  • Fix conda test for CUDA 10.x (#3556)
  • Fix race condition when initializing per-device default memory resources (#3555)
  • Fix data race when copying outputs in TF plugin (#3547)
  • CUDA VM resource bugfixes (#3545)
  • Fix build of DALI TensorFlow plugin during installation (#3546)
  • Fix issues found during static analysis (#3524)
  • Fix lack of proper device id used to obtain relevant cuda stream in paddle plugin (#3498)
  • Add type check to last_batch_policy argument (#3490)
  • Fix DALI paddle plugin stream synchronization error (#3487)
  • Reuse GaussianBlur windows between iterations (#3484)
  • Add synchronization when destroying the Executor. Make all destructors noexcept. (#3492)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently than that, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.9.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.9.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.9.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.9.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.8.0

22 Nov 18:51

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Added batch mode support to the external_source operator with a parallel callback (#3420 and #3397).
  • Extended the crop_mirror_normalize operator to support per-sample normalization parameters (#3455).
  • Improved the error messages shown when trying to decode images in an unsupported format (#3445).
  • Documentation improvements (#3448 and #3439).
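
In batch mode, the parallel external source callback is called once per batch rather than once per sample. The following is a minimal sketch of such a callback. The `BatchInfo` namedtuple is a stand-in that mirrors the `iteration` and `epoch_idx` fields of DALI's `BatchInfo` so the snippet runs without DALI installed. The factory function, returning sample indices instead of loaded data, and treating `iteration` as the batch index within the epoch are assumptions made for illustration.

```python
from collections import namedtuple

# Stand-in for DALI's BatchInfo (assumption: same field names), so this runs without DALI.
BatchInfo = namedtuple("BatchInfo", ["iteration", "epoch_idx"])


def make_batch_callback(num_samples, batch_size):
    """Hypothetical factory: returns a callback that yields one batch of sample indices."""
    iterations_per_epoch = num_samples // batch_size  # the partial last batch is dropped

    def callback(batch_info):
        if batch_info.iteration >= iterations_per_epoch:
            raise StopIteration  # signals the end of the epoch
        start = batch_info.iteration * batch_size
        return list(range(start, start + batch_size))

    return callback
```

Keeping the callback stateless, with everything derived from the `BatchInfo` argument, lets any worker process compute any batch independently.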

Fixed Issues

This DALI release includes the following fixes:

  • Fixed the incorrect interpretation of the aspect ratio parameter in the random_bbox_crop operator when the input shape is provided (#3425).
  • Fixed the incorrect output shape in the experimental.readers.video operator (#3460).

Improvements

  • Remove reseeding of numpy in RandomlyShapedDataIterator (#3466)
  • Add indexing information to TF external source tests (#3467)
  • Extend setup_packages.py to bing package with its dependencies (#3464)
  • Update dependency versions (#3457)
  • Optionally load plugins global symbols. (#3462)
  • Add NVIDIA Video Codec SDK - NVDECODE API (#3458)
  • CropMirrorNormalize: Add support for per-sample normalization arguments (#3455)
  • Support batch mode in parallel external source (#3397)
  • Turn off part of TL0_FW_iterators tests when sanitizers are enabled (#3456)
  • Read ArgValue constant arguments only once (#3453)
  • Rename InputRef/OutputRef to Input/Output in workspace API (#3451)
  • Reduce number of Workspace Input/Output APIs (#3446)
  • Fix error reporting in image factory (#3445)
  • Update custom op example for newer CMake (#3448)
  • Update TF dataset to 2.8 (#3442)
  • Fix documentation of CropMirrorNormalize dtype argument (#3439)
  • Bump up nvJPEG2k version to 0.4 (#3440)
  • Enable CUDA 11.5 builds (#3436)
  • Enable sanitizers in regular CI runs (#3422)
  • Improve how the available Python version is determined (#3438)
  • RandomBBoxCrop: Fix interpretation of aspect ratio, when input shape is provided (#3425)
  • Change the permute function to infer the output size from the indices. (#3434)
  • Move to the upstream deb packages for JetPack compilation (#3432)
  • Change C++ standard to c++17 for non-CUDA sources (#3423)
  • Add epoch number to SampleInfo and introduce BatchInfo (#3420)
  • Separate type setting from data access in Buffer (#3414)
  • Make SBSA build compatible with all armv8-a CPUs (#3417)
  • Update TF plugin for future API change (#3415)
  • Replace pointers with references for ShareData parameter (#3408)
  • Code cleanup: remove unused variables, fix buffer overflow (#3410)
  • Enable usage of sanitizers in tests (#3377)

Bug Fixes

  • Update tensorflow version in conda build (#3471)
  • Fix STRING_VEC default arguments presentation in docs (#3470)
  • Remove broken class method from DALI Dataset (#3465)
  • Fix experimental.readers.video output shape (#3460)
  • Fix static analysis detected issues (#3444)
  • Silence output from build_per_python_lib cmake utility (#3454)
  • Make Workspace::Input return const reference (#3452)
  • Update imports from collections to collections.abc where needed (#3429)
  • Install boost/preprocessor headers (#3443)
  • Fix ShareData for TensorVector with no elements (#3435)
  • Update GCC version in conda recipe to 7.5 to workaround GCC bug 82461. (#3431)
  • Add a missing state destruction for the NVJPEG HW decoder (#3416)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
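The key-frame requirement in the first known issue above can be checked before feeding a video to the loader. The sketch below is a minimal, hypothetical helper: it assumes you already have the zero-based indices of the key frames (for example, from a probing tool of your choice) and uses 15 as the threshold, taken from the upper end of the "10 to 15 frames" range stated above.

```python
from typing import Sequence


def max_key_frame_gap(key_frames: Sequence[int], num_frames: int) -> int:
    """Return the longest stretch of frames without a new key frame."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not key_frames:
        return num_frames
    ordered = sorted(set(key_frames))
    # Distances between consecutive key frames, plus the tail up to the end.
    boundaries = ordered + [num_frames]
    gaps = [b - a for a, b in zip(boundaries, boundaries[1:])]
    # Frames that precede the first key frame also count as a gap.
    gaps.append(ordered[0])
    return max(gaps)


MAX_GAP = 15  # assumption: upper bound of the range quoted in the known issue

sparse_gap = max_key_frame_gap([0, 12, 24, 50], num_frames=60)
dense_gap = max_key_frame_gap([0, 10, 20], num_frames=30)
print(sparse_gap > MAX_GAP, dense_gap > MAX_GAP)
```

A stream whose largest gap exceeds the threshold is a candidate for re-encoding with more frequent key frames before use.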

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.8.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.8.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.8.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.8.0
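When the install commands above need to be generated for several versions or CUDA flavors (for example, in a CI script), they can be assembled programmatically. This is a small sketch, not an official tool: pip_install_args is a hypothetical helper, and the "102" and "110" suffixes are the only ones confirmed by these release notes.

```python
from typing import List

INDEX_URL = "https://developer.download.nvidia.com/compute/redist/"
KNOWN_CUDA_SUFFIXES = ("102", "110")  # the flavors listed in these notes


def pip_install_args(version: str, cuda: str, tf_plugin: bool = False) -> List[str]:
    """Build the pip argument list for a DALI wheel (hypothetical helper)."""
    if cuda not in KNOWN_CUDA_SUFFIXES:
        raise ValueError(f"unknown CUDA suffix: {cuda!r}")
    package = "nvidia-dali-tf-plugin" if tf_plugin else "nvidia-dali"
    return [
        "pip", "install",
        "--extra-index-url", INDEX_URL,
        f"{package}-cuda{cuda}=={version}",
    ]


command = " ".join(pip_install_args("1.8.0", "110"))
print(command)
```

Returning a list rather than a string keeps the arguments safe to pass to subprocess.run without shell quoting.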

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.7.0

25 Oct 07:02

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • readers.webdataset, which is a reader for the Webdataset format (#3395, #3385, #3375, #3372, #3360, and #3306).
    • experimental.readers.video (CPU), which is an experimental video reader and decoder that includes support for the variable frame rate (#3412, #3411, #3391, and #3362).
  • Performance improvements:
    • warp_affine performance has been improved for some common cases (#3370).
    • Other minor general performance improvements (#3363 and #3338).
  • Added the DALI_DISABLE_NVML and DALI_RESTRICT_PINNED_MEM environment variables. These variables allow you to limit the use of NVML and pinned memory and enable DALI on more platforms (#3404 and #3382).
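One way to apply the two environment variables above is to set them in the environment of the process that will load DALI. The sketch below only builds such an environment. The value "1" is an assumption about how the variables are enabled, so check the DALI documentation for the accepted values; dali_restricted_env is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping, Optional


def dali_restricted_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with both DALI switches set."""
    env = dict(os.environ if base is None else base)
    # Assumption: "1" enables each switch.
    env["DALI_DISABLE_NVML"] = "1"
    env["DALI_RESTRICT_PINNED_MEM"] = "1"
    return env


env = dali_restricted_env({"PATH": "/usr/bin"})
print(sorted(env))
```

The result can be passed as the env argument of subprocess.run when launching a training script, which ensures the variables are in place before DALI is imported.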

Fixed Issues

This DALI release includes the following fixes:

  • Fixed an issue in the pad operator that caused a crash when the operator was used with a variable batch size (#3354).
  • Fixed a race condition that occurred in the readers.video operator (#3355).
  • Fixed a bug in the C API that caused invalid memory access in some use cases (#3350).

Improvements

  • Add more logging to FramesDecoder (#3412)
  • Reduce the TensorList and TensorVector API scope (#3403)
  • Add an env variable DALI_DISABLE_NVML to disable NVML usage on demand (#3404)
  • Enable BUILD_LMDB by default (#3406)
  • Add error message checking into existing python tests (#3401)
  • Bump up Nvidia TensorFlow version in tests to 21.09 (#3383)
  • Add VideoReaderDecoder (#3391)
  • Webdataset automatic index file inference (#3385)
  • Add an environment variable that determines whether pinned memory usage should be restricted. (#3382)
  • Notebook with an example of webdataset usage (#3372)
  • Add frames decoder (#3362)
  • Move to libtar fork - https://github.com/tklauser/libtar (#3375)
  • Remove possibility of access to contiguous TL buffer (#3373)
  • Add error message checks (#3371)
  • Update libcudacxx to include fix for build with ASAN. (#3374)
  • Specialize warp kernels for common numbers of channels. (#3370)
  • Webdataset performance and cosmetic optimizations (#3360)
  • Update documentation about enabling sanitizers (#3365)
  • general perf changes alongside WDS perf (#3363)
  • Update CUTLASS and Google Benchmark (#3361)
  • Remove access to contiguous TL buffer from Coco Reader tests (#3351)
  • Remove access to contiguous TL buffer from BoxEncoder, Resize, Shapes and Warp (#3339)
  • Bump clang version to 12.0.1 in deps image (#3342)
  • Use DALIDataType where possible. (#3338)
  • Update asserts in python tests (#3336)
  • Webdataset reader operator implementation (#3306)
  • Work around PyTorch internal fragmentation in L3 SSD test. (#3343)
  • Make view converters operate on samples only (#3325)
  • Add an ability to avoid class remapping in coco reader (#3333)
  • Remove access to underlying contiguous TL buffer from tests (#3319)

Bug Fixes

  • Fix the Webdataset documentation formatting (#3395)
  • Fix documentation formatting (#3369)
  • Fix sharding and shuffling in VideoLoaderDecoder (#3411)
  • Fix pool process tracking in parallel ES tests, cleanup batches properly (#3400)
  • Fix ownership issues in Share APIs for Tensor, TL and TV (#3407)
  • Fix memory leak in async_pool destructor. (#3402)
  • Fix off build (#3399)
  • Fix HW decoder overwriting growth factor for CPU buffers (#3398)
  • Fix libtiff build (#3392)
  • Fix the memory kind stored in AllocInfo in nvjpeg memory. (#3393)
  • Fix bug in TensorList test (#3388)
  • Adjust default eps in video test (#3389)
  • Fix FFMPEG conda build (#3386)
  • Fix errors in TF YOLO example (#3379)
  • Adjust growth and shrink threshold for cpu buffers (#3378)
  • Fix error reporting in TL3_EfficientDet_convergence and TL3_YOLO_convergence (#3376)
  • Fix problems detected by asan and lsan (#3367)
  • Fix Coverity issues (#3366)
  • Fix EfficientDet docs link (#3364)
  • Fix Video reader race condition (#3355)
  • Fix variable batch size handling in pad operator (#3354)
  • Fix bugs in C API and refactor tests (#3350)
  • Fix and optimize name handling in TypeInfo. (#3349)
  • Fix sequence rearrange python test (#3353)
  • Handle SIGSEGV situation when trying to load prebuilt DALI TF Plugin (#3347)
  • Fix DeviceBuffer copy - use proper copy function. (#3344)
  • Skip Keras TF tests in versions with broken exception handling (#3341)
  • Fix squeeze operator test on Python3.7 and earlier (#3337)
  • Use memory resources in DeviceBuffer and TestTensorList. (#3334)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.7.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.7.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.7.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.7.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.6.0

24 Sep 11:23

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Added support for lambdas and local functions as callbacks in the parallel external_source operator (#3270, #3269).
  • Added the following tutorials:
    • TensorFlow DALI Dataset input handling (#3212).
    • Parallel external_source operator (#3199).
  • Added DALI preprocessing to the EfficientDet example (#3118).
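Some background on why lambda and local-function callbacks needed explicit support: a parallel external_source runs its callback in worker processes, and depending on how those workers are started, the callback may have to be serialized. The sketch below uses only the Python standard library to show that plain pickle rejects lambdas and local functions. It does not show DALI's own serialization mechanism, and the helper names are hypothetical.

```python
import math
import pickle


def is_picklable(obj) -> bool:
    """Report whether the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def make_callback(scale):
    # A local (nested) function that closes over `scale`.
    def callback(sample_idx):
        return sample_idx * scale
    return callback


results = {
    "builtin": is_picklable(math.sqrt),        # importable by name
    "lambda": is_picklable(lambda i: i * 2),   # has no importable name
    "local": is_picklable(make_callback(3)),   # defined inside a function
}
print(results)
```

Functions that can be looked up by module and name serialize fine, while the other two kinds cannot be pickled this way, which is the gap this release closes for parallel callbacks.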

Fixed Issues

This DALI release includes the following fixes:

  • Fixed a crash that happened in the gaussian_blur operator for inputs where one of the dimensions equals 1 (#3291).
  • Fixed random Python crashes on the process teardown when the external_source operator was used (#3245).
  • Fixed readers.video hanging on some HEVC samples (#3247).

Improvements

  • Add error message checking in python tests (#3324)
  • Optimize bundling wheel by using multiprocessing in build_helper.sh (#3323)
  • Changed "accross" to "across" in README.rst (#3329)
  • Move to CUDA 11.4 update 2 (#3322)
  • Fix FFmpeg vulnerabilities (CVE-2020-22037, CVE-2021-38171, CVE-2021-38291) (#3315)
  • Rework displacement filter to sample-based approach (#3311)
  • Remove kernels/alloc.h (#3309)
  • Adjust usage of raises and assert_raises in tests (#3318)
  • Move static UserStream variable to the Get function inside the class (#3242)
  • Adjust usage of raise and assert_raises (#3316)
  • Update README with third parties dependencies (#3320)
  • Add input type validation to feed_ndarray in MXNet and PyTorch (#3308)
  • Add parameters checks when deserializing a pipeline (#3253)
  • Extend BlockSetup with 1-dim specialization (#3304)
  • Move back to upstream libtar from conda (#3301)
  • Rework LUT to batch processing and remove access to TL buffer (#3298)
  • Add checking a message of the expected exception against a pattern in nose tests (#3302)
  • Use libcu++ interfaces. (#3297)
  • Update third party dependencies (#3300)
  • Pin nvJPEG2000 and GPU Direct dependencies (#3299)
  • Bump up nvidia tensorflow version to 21.08 in tests (#3296)
  • Implement InputDatasets for DALIDataset (#3292)
  • Remove access to underlying contiguous TL buffer in bb_flip op (#3283)
  • Make memory kind a tag type instead of an enum value. (#3290)
  • Add examples on serialization to parallel external source notebook (#3270)
  • Support lambdas and local functions as callbacks in parallel ExternalSource (#3269)
  • TarArchive::TellArchive implementation + renaming (#3286)
  • Remove access to underlying contiguous TL buffer in Flip op (#3280)
  • Remove access to underlying contiguous TL buffer in Normalize op (#3281)
  • Use default resources for allocating tensors (#2948)
  • Remove access to underlying contiguous TL buffer in Constant op (#3276)
  • TarArchive additional features (#3273)
  • Add ScatterGatherCPU and rework Copy op to batch processing (#3266)
  • Change how start and end timestamps are converted to frames (#3252)
  • Update RMM to an up-to-date version with the interface rework applied. (#3254)
  • Test fused decoder out-of-bounds error (#3175)
  • Bump supported tested TensorFlow versions (#3250)
  • Update supported CUDA version in docker/build.sh (#3248)
  • Adjust capitalization in tutorials (#3246)
  • Remove non-applicable clarifying note from PyTorch and Paddle iterators (#3235)
  • Add tutorial about TF DALI Dataset input handling (#3212)
  • Add tutorials for Parallel External Source (#3199)
  • Add DALI to EfficientDet example (#3118)
  • Use fn.random module in tests and examples (#3174)

Bug Fixes

  • Improve tests for expected errors + fix PythonFunction (#3332)
  • Fix incorrect use of a global variable in the test of operator Shapes. (#3310)
  • Rework Cast to batch processing (#3278)
  • Fix HEVC video handling (#3247)
  • Fix infinite loop for convolution with extent equal 1 (#3291)
  • Add yaml as a Webdataset test dependency, adjust to new WDS API (#3295)
  • Fix missing condition variable include (#3289)
  • Remove the inclusion of scatter_gather.h from types.h (#3288)
  • Fix cast warning in ScatterGather (#3284)
  • Clear to_dealloc and notify under a lock. (#3282)
  • Fix notification method in deferred deallocation. (#3279)
  • Fix race condition when initializing plain host memory resource. (#3268)
  • Fix alignment constraints in CUDA VM resource. (#3274)
  • Fix missing sizeof in Tensor Test (#3267)
  • Fix hw decoder tests disabled on old drivers (#3257)
  • Don't increase alignment to upstream alignment when retrying to allocate (#3264)
  • Avoid creating primary context for synchronization. (#3263)
  • Avoid upstream allocation stampede by retrying to allocate from free after gaining the upstream lock. (#3258)
  • Remove excessive synchronization in AsyncPool. (#3256)
  • Ensure keeping py_pool alive until pipeline is garbage collected (#3245)
  • Fix running Python core tests (#3249)
  • Fix an assignment of py::none() to py::dict in backend_impl.cc (#3244)
  • Fix interoperation between DALI and PyTorch lightning due to buffering (#3239)
  • Reduce number of iterations in L0 tests (#3173)
  • Fix memory leak in backend_impl.cc caused by PyObject_GetAttr (#3233)
  • Fix FFmpeg CVE-2021-38114 (#3231)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.6.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.6.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.6.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.6.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

DALI v1.5.0

23 Aug 09:28

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • Extended decoders.image to support WebP decoding (#3206)
  • Added indexing (NumPy-like) API for tensor slicing (#3200 and #3195)
  • Extended external_source to support source argument in TensorFlow DALI Dataset (#3215, #3193, #3177 and #3176)
  • Added examples:
    • TensorFlow YOLOv4 (#2883)
    • WebDataset usage with external_source (#3153)

Fixed Issues

This DALI release includes the following fixes:

  • Fixed include paths that prevented including some parts of DALI in other C/C++ projects (#3210)
  • Fixed a crash when only anchors and no shapes were provided in multi_paste (#3166)
  • In the spectrogram operator, extracted windows are now correctly centered before the FFT calculation when the nfft argument is larger than the window length (#3180)
  • Fixed a minor memory leak in decoders.image (#3148)
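To make the spectrogram fix above more concrete, the sketch below centers a short window inside a longer FFT buffer in plain Python. It illustrates the general idea only and is not DALI's implementation: center_window is a hypothetical name, and both the zero padding and the choice to put the extra sample on the right when the padding is odd are assumptions.

```python
from typing import List, Sequence


def center_window(window: Sequence[float], nfft: int) -> List[float]:
    """Place `window` in the middle of a zero-filled buffer of length nfft."""
    n = len(window)
    if nfft < n:
        raise ValueError("nfft must be at least the window length")
    left = (nfft - n) // 2
    right = nfft - n - left  # assumption: any odd remainder goes to the right
    return [0.0] * left + list(window) + [0.0] * right


padded = center_window([1.0, 2.0, 3.0], nfft=8)
print(padded)
```

Without centering, the window would sit at the start of the buffer, which shifts the phase of every FFT bin relative to the centered layout.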

Improvements

  • Add documentation for indexing. (#3200)
  • Move to CUDA 11.4U1 (#3213)
  • Add WebP support to image decoder (#3206)
  • libtar API implementation (#3198)
  • Tensor indexing (#3195)
  • Make TF graph-mode tests faster (#3204)
  • Add support for ES source in TF DALI Dataset (#3177)
  • Add TensorFlow YOLOv4 example (#2883)
  • Refactor Python External Source code (#3176)
  • Update third party dependencies to latest release versions (#3184)
  • Add deferred deallocation to cuda_vm_resource. (#3154)
  • Adjust test scripts and section header for webdataset notebook (#3162)
  • Add Webdataset-ExternalSource Jupyter notebook (#3153)
  • Update PR template (#3150)
  • Update PR template (#3129)

Bug Fixes

  • Fix failing TarArchive tests (#3226)
  • Build custom libtar in conda (#3223)
  • Improve validation in DALIDataset (#3215)
  • Update DALI_DEPS_VERSION to include NVIDIA/DALI_deps#19 (#3224)
  • Fix identity check in _is_generator_function. Add test. (#3216)
  • Fix unused imports in test_utils.py (#3214)
  • Remove the usage of ManagedMemory from the OpticalFlow tests (#3211)
  • Suppress test using unified memory when it is not supported (#3209)
  • Remove include prefix from include paths (#3210)
  • Fix CVE-2021-3246 in libsnd (#3208)
  • Fix pytorch-lightning test (#3196)
  • Fix coverity issues + skip tests involving managed memory when not supported. (#3190)
  • Disable NVJPEG HW decoder for driver < 455 for performance reasons (#3189)
  • Fix compilation with newer GCC (#3188)
  • Disallow some types of sources for parallel ES explicitly (#3193)
  • Center windows when extracting windows to a bigger output window (#3180)
  • Add a compute cap value before running the GDS test (#3185)
  • MultiPaste to adjust the region shape to cover up to the end of the input shape (#3166)
  • Fix wording in docs (#3165)
  • Fix image decode (#3148)
  • Fix LastBatchPolicy doc and update Parallel ES wording (#3152)
  • Fix some errors (#3147)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.5.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.5.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0-capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.5.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.5.0

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code: