Further tune vector-matrix product #253

robertknight · 2024-06-27T21:12:24Z

The kernel prefers the K dimension to be small if B has unit column stride or large if it has unit row stride. In the case of unit column stride, reducing the K dimension further helped.

These changes made GPT-2 medium fp32 about 10% faster on an Ice Lake i5.
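
For readers unfamiliar with the stride terminology, here is a minimal scalar sketch of a vector-matrix product `y = a * B` that picks its K block size from the strides of `B`. It is an illustration under stated assumptions, not the kernel changed in this PR: the names `k_block_size` and `gemv`, and both block-size constants, are hypothetical, and the real kernel is presumably vectorized rather than a plain loop.

```rust
/// Hypothetical K block sizes, chosen only for illustration.
/// They are NOT the values used by the kernel in this PR.
const K_BLOCK_UNIT_COL_STRIDE: usize = 8;
const K_BLOCK_UNIT_ROW_STRIDE: usize = 256;

/// Pick how many elements of the K dimension to process per block.
/// Small when B has unit column stride (row-major layout), large when
/// B has unit row stride (column-major layout).
fn k_block_size(row_stride: usize, col_stride: usize) -> usize {
    if col_stride == 1 {
        K_BLOCK_UNIT_COL_STRIDE
    } else if row_stride == 1 {
        K_BLOCK_UNIT_ROW_STRIDE
    } else {
        // Neither stride is 1: arbitrary fallback for this sketch.
        K_BLOCK_UNIT_COL_STRIDE
    }
}

/// Compute y = a * B, where `a` has length K and `B` is a K x `n` matrix
/// stored in `b` with the given row and column strides.
fn gemv(a: &[f32], b: &[f32], n: usize, row_stride: usize, col_stride: usize) -> Vec<f32> {
    let k = a.len();
    let mut y = vec![0.0f32; n];
    let k_block = k_block_size(row_stride, col_stride);

    // Walk the K dimension one block at a time.
    for k_start in (0..k).step_by(k_block) {
        let k_end = (k_start + k_block).min(k);
        for col in 0..n {
            // Partial dot product of this K block with one column of B.
            let mut acc = 0.0f32;
            for kk in k_start..k_end {
                acc += a[kk] * b[kk * row_stride + col * col_stride];
            }
            y[col] += acc;
        }
    }
    y
}
```

Calling `gemv` with the same 2x3 matrix stored row-major (`row_stride = 3`, `col_stride = 1`) and column-major (`row_stride = 1`, `col_stride = 2`) gives the same result; only the block size selected along K differs.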

robertknight merged commit 9eea36d into main Jun 27, 2024
2 checks passed

robertknight deleted the gemv-tune-2 branch June 27, 2024 21:15
