Parallelize AveragePool op #138

robertknight · 2024-05-01T07:47:56Z

Generalize the existing parallel MaxPool implementation and use it to implement both MaxPool and AveragePool. This should extend to LpPool in the future too. The result is still far from optimal, but it serves as a starting point for implementing pooling ops for each reduction type (max, average, lp) and for future data types.

On a Yolov9e ONNX model taken from HuggingFace, this roughly halves the time spent in AveragePool ops on my system (30ms -> 14ms with a 256x256 input).
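For readers who want a feel for the shape of this generalization, here is a minimal sketch. It is not the code from this PR: the `Reducer` trait, the `Max` and `Average` types, and the `pool2d` function are hypothetical names chosen for illustration. It also assumes a flat `[channels, height, width]` `f32` buffer, a square kernel, and no padding, and it uses `std::thread::scope` with one thread per channel rather than whatever thread pool the library actually uses.

```rust
use std::thread;

/// Hypothetical reduction trait: one implementation per pooling type.
trait Reducer: Sync {
    /// Initial accumulator value for a window.
    fn init(&self) -> f32;
    /// Fold one input element into the accumulator.
    fn update(&self, acc: f32, x: f32) -> f32;
    /// Turn the accumulator into the output value for a window of `count` elements.
    fn finish(&self, acc: f32, count: usize) -> f32;
}

struct Max;
struct Average;

impl Reducer for Max {
    fn init(&self) -> f32 { f32::NEG_INFINITY }
    fn update(&self, acc: f32, x: f32) -> f32 { acc.max(x) }
    fn finish(&self, acc: f32, _count: usize) -> f32 { acc }
}

impl Reducer for Average {
    fn init(&self) -> f32 { 0.0 }
    fn update(&self, acc: f32, x: f32) -> f32 { acc + x }
    fn finish(&self, acc: f32, count: usize) -> f32 { acc / count as f32 }
}

/// Pool a flat `[channels, height, width]` buffer with a square kernel and no
/// padding. Each channel is processed on its own scoped thread (an assumption
/// of this sketch, not necessarily how the PR distributes work).
fn pool2d<R: Reducer>(
    input: &[f32],
    channels: usize,
    height: usize,
    width: usize,
    kernel: usize,
    stride: usize,
    reducer: &R,
) -> Vec<f32> {
    assert_eq!(input.len(), channels * height * width);
    assert!(kernel >= 1 && stride >= 1 && kernel <= height && kernel <= width);
    let out_h = (height - kernel) / stride + 1;
    let out_w = (width - kernel) / stride + 1;
    let mut output = vec![0.0f32; channels * out_h * out_w];

    thread::scope(|s| {
        // Pair each input channel with its disjoint output channel.
        let in_chans = input.chunks(height * width);
        let out_chans = output.chunks_mut(out_h * out_w);
        for (in_chan, out_chan) in in_chans.zip(out_chans) {
            s.spawn(move || {
                for oy in 0..out_h {
                    for ox in 0..out_w {
                        let mut acc = reducer.init();
                        for ky in 0..kernel {
                            for kx in 0..kernel {
                                let y = oy * stride + ky;
                                let x = ox * stride + kx;
                                acc = reducer.update(acc, in_chan[y * width + x]);
                            }
                        }
                        out_chan[oy * out_w + ox] = reducer.finish(acc, kernel * kernel);
                    }
                }
            });
        }
    });
    output
}
```

With this layout, `pool2d(&data, c, h, w, k, stride, &Max)` and `pool2d(&data, c, h, w, k, stride, &Average)` share the same loop nest and parallel split. An LpPool variant would, under the same assumptions, only need another `Reducer` implementation.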

robertknight merged commit 96c55c3 into main May 1, 2024
2 checks passed

robertknight deleted the generic-pool branch May 1, 2024 07:49
