DALI v1.7.0

@banasraf banasraf released this 25 Oct 07:02

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • New operators:
    • readers.webdataset, which is a reader for the Webdataset format (#3395, #3385, #3375, #3372, #3360, and #3306).
    • experimental.readers.video (CPU), which is an experimental video reader and decoder that supports variable frame rate video (#3412, #3411, #3391, and #3362).
  • Performance improvements:
    • warp_affine performance has been improved for some common cases (#3370).
    • Other minor general performance improvements (#3363 and #3338).
  • Added the DALI_DISABLE_NVML and DALI_RESTRICT_PINNED_MEM environment variables. These variables allow you to limit the use of NVML and pinned memory and enable DALI on more platforms (#3404 and #3382).
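As a minimal sketch of how these two variables might be set from Python: the variable names come from this release, but the value "1" is an assumption (these notes do not state the accepted values), and dali_env is a hypothetical helper name. Setting the variables before DALI is imported is the conservative choice.

```python
import os


def dali_env(disable_nvml=True, restrict_pinned_mem=True, base=None):
    """Return a copy of the environment with the requested DALI flags set.

    The value "1" is an assumption; check the DALI documentation for the
    values the library actually accepts.
    """
    env = dict(os.environ if base is None else base)
    if disable_nvml:
        env["DALI_DISABLE_NVML"] = "1"
    if restrict_pinned_mem:
        env["DALI_RESTRICT_PINNED_MEM"] = "1"
    return env


# Start from an empty base so only the DALI flags are shown.
env = dali_env(base={})
print(sorted(env))
```

The resulting mapping can be passed to subprocess.run(..., env=env) when launching a training script, or merged into os.environ before nvidia.dali is imported.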

Fixed Issues

This DALI release includes the following fixes:

  • Fixed an issue in the pad operator that caused a crash when the operator was used with a variable batch size (#3354).
  • Fixed a race condition that occurred in the readers.video operator (#3355).
  • Fixed a bug in the C API that caused invalid memory access in some use cases (#3350).

Improvements

  • Add more logging to FramesDecoder (#3412)
  • Reduce the TensorList and TensorVector API scope (#3403)
  • Add an env variable DALI_DISABLE_NVML to disable NVML usage on demand (#3404)
  • Enable BUILD_LMDB by default (#3406)
  • Add error message checking into existing python tests (#3401)
  • Bump up Nvidia TensorFlow version in tests to 21.09 (#3383)
  • Add VideoReaderDecoder (#3391)
  • Webdataset automatic index file inference (#3385)
  • Add an environment variable that determines whether pinned memory usage should be restricted. (#3382)
  • Notebook with an example of webdataset usage (#3372)
  • Add frames decoder (#3362)
  • Move to libtar fork - https://github.com/tklauser/libtar (#3375)
  • Remove possibility of access to contiguous TL buffer (#3373)
  • Add error message checks (#3371)
  • Update libcudacxx to include fix for build with ASAN. (#3374)
  • Specialize warp kernels for common numbers of channels. (#3370)
  • Webdataset performance and cosmetic optimizations (#3360)
  • Update documentation about enabling sanitizers (#3365)
  • General performance changes alongside WDS performance work (#3363)
  • Update CUTLASS and Google Benchmark (#3361)
  • Remove access to contiguous TL buffer from Coco Reader tests (#3351)
  • Remove access to contiguous TL buffer from BoxEncoder, Resize, Shapes and Warp (#3339)
  • Bump clang version to 12.0.1 in deps image (#3342)
  • Use DALIDataType where possible. (#3338)
  • Update asserts in python tests (#3336)
  • Webdataset reader operator implementation (#3306)
  • Work around PyTorch internal fragmentation in L3 SSD test. (#3343)
  • Make view converters operate on samples only (#3325)
  • Add an ability to avoid class remapping in coco reader (#3333)
  • Remove access to underlying contiguous TL buffer from tests (#3319)
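For readers new to the Webdataset reader listed above, here is a rough sketch of the input it expects. The Webdataset format stores samples in plain tar archives, where files that share a basename form one sample and the extension names the component. The code writes a tiny shard with the standard library and defines a DALI pipeline lazily, so it runs without DALI installed. The file names, payloads, and the helpers write_shard and make_pipeline are hypothetical; the paths and ext arguments should be checked against the DALI documentation for your version.

```python
import io
import os
import tarfile
import tempfile


def write_shard(path, samples):
    """Write {basename: {extension: bytes}} as a Webdataset-style tar shard."""
    with tarfile.open(path, "w") as tar:
        for key, components in samples.items():
            for ext, payload in components.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def make_pipeline(shard_path, batch_size=2):
    """Build a CPU-only pipeline that reads the shard (requires DALI)."""
    from nvidia.dali import fn, pipeline_def  # imported lazily on purpose

    @pipeline_def(batch_size=batch_size, num_threads=1, device_id=None)
    def wds_pipeline():
        # One output per requested extension; no index file is passed here.
        image_bytes, label_bytes = fn.readers.webdataset(
            paths=[shard_path], ext=["jpg", "cls"]
        )
        return image_bytes, label_bytes

    return wds_pipeline()


# Placeholder payloads: the bytes are not real JPEG data.
shard = os.path.join(tempfile.mkdtemp(), "shard-000000.tar")
write_shard(shard, {
    "000000": {"jpg": b"placeholder-0", "cls": b"0"},
    "000001": {"jpg": b"placeholder-1", "cls": b"1"},
})

with tarfile.open(shard) as tar:
    members = tar.getnames()
print(members)
```

With DALI installed, make_pipeline(shard) returns a pipeline object that can then be built and run; the reader returns the raw bytes of each component, so image decoding would be a separate step.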

Bug Fixes

  • Fix the Webdataset documentation formatting (#3395)
  • Fix documentation formatting (#3369)
  • Fix sharding and shuffling in VideoLoaderDecoder (#3411)
  • Fix pool process tracking in parallel ES tests, cleanup batches properly (#3400)
  • Fix ownership issues in Share APIs for Tensor, TL and TV (#3407)
  • Fix memory leak in async_pool destructor. (#3402)
  • Fix off build (#3399)
  • Fix HW decoder overwriting growth factor for CPU buffers (#3398)
  • Fix libtiff build (#3392)
  • Fix the memory kind stored in AllocInfo in nvjpeg memory. (#3393)
  • Fix bug in TensorList test (#3388)
  • Adjust default eps in video test (#3389)
  • Fix FFMPEG conda build (#3386)
  • Fix errors in TF YOLO example (#3379)
  • Adjust growth and shrink threshold for cpu buffers (#3378)
  • Fix error reporting in TL3_EfficientDet_convergence and TL3_YOLO_convergence (#3376)
  • Fix problems detected by asan and lsan (#3367)
  • Fix Coverity issues (#3366)
  • Fix EfficientDet docs link (#3364)
  • Fix Video reader race condition (#3355)
  • Fix variable batch size handling in pad operator (#3354)
  • Fix bugs in C API and refactor tests (#3350)
  • Fix and optimize name handling in TypeInfo. (#3349)
  • Fix sequence rearrange python test (#3353)
  • Handle the SIGSEGV situation when trying to load the prebuilt DALI TF Plugin (#3347)
  • Fix DeviceBuffer copy - use proper copy function. (#3344)
  • Skip Keras TF tests in versions with broken exception handling (#3341)
  • Fix squeeze operator test on Python3.7 and earlier (#3337)
  • Use memory resources in DeviceBuffer and TestTensorList. (#3334)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that was used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker
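The bare Docker options above can be assembled programmatically. The following is only a sketch: the image name my-dali-image:latest, the train.py command, and the helper docker_run_command are hypothetical, and only the --privileged and --security-opt seccomp=unconfined flags come from the notes above.

```python
import shlex


def docker_run_command(image, command, privileged=True):
    """Build a `docker run` argument list with one of the two options above."""
    args = ["docker", "run", "--rm"]
    if privileged:
        args.append("--privileged")
    else:
        # Narrower alternative: only lift the seccomp profile.
        args += ["--security-opt", "seccomp=unconfined"]
    args.append(image)
    args += list(command)
    return args


# Hypothetical image and command, for illustration only.
cmd = docker_run_command(
    "my-dali-image:latest", ["python", "train.py"], privileged=False
)
print(shlex.join(cmd))
```

The list form can be handed directly to subprocess.run(cmd, check=True). Both options reduce container isolation, so the seccomp variant is the less permissive of the two.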

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.7.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.7.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.7.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.7.0
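
After installing, it can be useful to confirm which wheel and version ended up in the environment. This is a small sketch using only the standard library (Python 3.8 or later); installed_version is a hypothetical helper name, and the package names are the ones from the pip commands above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the pip commands above.
for name in ("nvidia-dali-cuda102", "nvidia-dali-cuda110"):
    print(name, installed_version(name))
```

If one of the commands above succeeded, the matching package should report 1.7.0; a value of None means that wheel is not installed in the current environment.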

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code: