Vectorize audio resampling for ARM NEON. #3745

mzient · 2022-03-17T13:55:37Z

Signed-off-by: Michał Zientkiewicz mzient@gmail.com

Category:

Other: performance optimization (main purpose)
Bug fix (additional, in the same file)

Description:

Implements vectorized single-channel audio resampling for ARM NEON.

Additional information:

Vectorization is done differently than on SSE (pairwise loads are faster on ARM).
Other changes include improved handling of the floor function.
Bug fix: the SSE implementation used rounding instead of truncation; this is now fixed.
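The index computation and pairwise lookup described above can be modeled in scalar code. This is a minimal sketch, not the PR's implementation: the `WindowLookup` name, the table layout, and the linear interpolation between adjacent entries are assumptions made for illustration. Only the `x * scale + center` computation and the truncating conversion appear in the diff excerpts quoted in this thread.

```cpp
#include <vector>

// Hypothetical scalar model of an interpolated window lookup (names are illustrative).
struct WindowLookup {
  float scale = 1.0f;
  float center = 0.0f;
  std::vector<float> lookup;  // must hold one entry past the largest index used

  float operator()(float x) const {
    float fi = x * scale + center;
    // Truncation toward zero; this matches floor(fi) only when fi >= 0 (assumed here).
    int i = static_cast<int>(fi);
    float di = fi - static_cast<float>(i);
    // Two adjacent entries: the pair that a single two-element load would fetch.
    float c0 = lookup[i];
    float c1 = lookup[i + 1];
    return c0 + di * (c1 - c0);
  }
};
```

With `scale = 2`, `center = 1` and a table of `{0, 10, 20, 30, 40}`, the input `0.75` maps to the fractional index `2.5`, so the result lies halfway between entries 2 and 3.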

Affected modules and functionalities:

Audio resampling (audio decoder).

Key points relevant for the review:

Checklist

Tests

Documentation

DALI team only

Requirements

Implements new requirements
Affects existing requirements
N/A

REQ IDs: N/A

JIRA TASK: DALI-2651

dali-automaton · 2022-03-17T13:57:49Z

CI MESSAGE: [4171295]: BUILD STARTED

JanuszL · 2022-03-17T14:54:31Z

dali/kernels/signal/resampling.h

+        vgetq_lane_s32(i, 2),
+        vgetq_lane_s32(i, 3)
+    };
+    float32x2_t c0 = vld1_f32(&lookup[idx[0]]);


JanuszL · 2022-03-17T14:57:16Z

dali/kernels/signal/resampling.h

+            f4 = vfmaq_f32(f4, vld1q_f32(in_block_ptr + i), w4);
+            x4 = vaddq_f32(x4, vdupq_n_f32(4));
+        }
+        // Reduce elements in f4


You can add the same comment at L214; it was unclear to me what is happening there.
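For readers unfamiliar with the "reduce" step, the following is a minimal sketch of summing the four lanes of an accumulator into one scalar. It is not the code from the PR: `horizontal_sum4` is a hypothetical name, and the scalar branch exists only so the example also builds on non-ARM targets.

```cpp
#ifdef __ARM_NEON
#include <arm_neon.h>
#endif

// Sums four consecutive floats, illustrating a horizontal reduction of a 4-lane vector.
inline float horizontal_sum4(const float *v) {
#ifdef __ARM_NEON
  float32x4_t f4 = vld1q_f32(v);
  // Add the low and high halves: {v0 + v2, v1 + v3}.
  float32x2_t s2 = vadd_f32(vget_low_f32(f4), vget_high_f32(f4));
  // Pairwise add: both lanes now hold (v0 + v2) + (v1 + v3).
  s2 = vpadd_f32(s2, s2);
  return vget_lane_f32(s2, 0);
#else
  // Scalar fallback using the same summation order.
  return (v[0] + v[2]) + (v[1] + v[3]);
#endif
}
```

The summation order differs from a plain left-to-right loop, so results may differ in the last bits for inputs that are not exactly representable.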

JanuszL · 2022-03-17T14:58:10Z

dali/kernels/signal/resampling.h

        float f = 0;
        int i = i0;

+#ifdef __ARM_NEON


Maybe we can extract these vectorized parts into separate functions, instead of having a lot of variants for different architectures in one body.

Possibly, but I'd have to re-evaluate the performance. Quite a few variables are modified in the loop, and they would all need to be passed by reference.
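To make the trade-off concrete, here is a rough sketch of what such an extracted helper might look like. Everything in it is hypothetical: the name `accumulate_scalar`, the parameter list, the half-open range `[i, i1)`, and the function-pointer `weight` standing in for the window function are assumptions, not code from the PR.

```cpp
// Hypothetical helper: processes input samples in [i, i1) and updates the loop
// state in place. The index, window position and accumulator are all taken by
// reference because the caller continues to use them after the call.
inline void accumulate_scalar(const float *in, int &i, int i1, float &x, float &f,
                              float (*weight)(float)) {
  for (; i < i1; i++) {
    f += in[i] * weight(x);  // weighted sum of input samples
    x += 1.0f;               // advance the window position by one sample
  }
}
```

A vectorized variant would share this signature and leave `i` pointing at the first unprocessed sample, so a scalar call could finish the tail. Whether the compiler keeps the by-reference state in registers after inlining is exactly what would need to be measured.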

dali-automaton · 2022-03-17T18:02:36Z

CI MESSAGE: [4171320]: BUILD STARTED

dali-automaton · 2022-03-17T18:06:32Z

CI MESSAGE: [4173226]: BUILD STARTED

dali-automaton · 2022-03-17T23:59:15Z

CI MESSAGE: [4173226]: BUILD FAILED

dali-automaton · 2022-03-18T14:20:18Z

CI MESSAGE: [4180682]: BUILD STARTED

JanuszL · 2022-03-18T14:25:12Z

dali/kernels/signal/resampling.h

+    int i = i_ref;
+    float32x4_t x4 = vaddq_f32(vdupq_n_f32(i - in_pos), _0123);
+
+    for (; i + 3 <= i1; i += 4) {


Shouldn't you change that as well, as in L174?

True. Fixed.

dali-automaton · 2022-03-18T14:37:18Z

CI MESSAGE: [4180791]: BUILD STARTED

dali-automaton · 2022-03-18T19:52:58Z

CI MESSAGE: [4180791]: BUILD PASSED

prak-nv · 2022-03-21T10:08:33Z

dali/kernels/signal/resampling.h

+#ifdef __ARM_NEON
+
+inline float32x4_t vsetq_f32(float x0, float x1, float x2, float x3) {
+    float32x4_t x;


A very minor nitpick: this could be float32x4_t x = vdupq_n_f32(x0);
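A sketch of how that suggestion could look in a four-argument constructor helper: broadcast the first value with `vdupq_n_f32`, then overwrite the remaining lanes with `vsetq_lane_f32`. The names `make_f32x4` and `store_lanes` are hypothetical, the body is not taken from the PR, and the non-NEON branch is only there so the example builds on other targets.

```cpp
#ifdef __ARM_NEON
#include <arm_neon.h>

// Builds {x0, x1, x2, x3}: broadcast x0 to all lanes, then overwrite lanes 1..3.
inline float32x4_t make_f32x4(float x0, float x1, float x2, float x3) {
  float32x4_t x = vdupq_n_f32(x0);
  x = vsetq_lane_f32(x1, x, 1);
  x = vsetq_lane_f32(x2, x, 2);
  x = vsetq_lane_f32(x3, x, 3);
  return x;
}
#endif

// Writes the four lanes to `out` so the result can be inspected on any target.
inline void store_lanes(float x0, float x1, float x2, float x3, float *out) {
#ifdef __ARM_NEON
  vst1q_f32(out, make_f32x4(x0, x1, x2, x3));
#else
  out[0] = x0; out[1] = x1; out[2] = x2; out[3] = x3;
#endif
}
```

Initializing with the broadcast avoids starting from an uninitialized vector and saves one lane insert.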

prak-nv

LGTM, asked out of curiosity

prak-nv · 2022-03-21T10:20:10Z

dali/kernels/signal/resampling.h

 #ifdef __SSE2__
  inline __m128 operator()(__m128 x) const {
    __m128 fi = _mm_add_ps(x * _mm_set1_ps(scale), _mm_set1_ps(center));
-    __m128i i = _mm_cvtps_epi32(fi);
+    __m128i i = _mm_cvttps_epi32(fi);


Was converting without truncation an issue before?

Yes, it was a bug. I noticed a slight difference between the ARM and SSE results, dug deeper, and found that the SSE version was the one that was wrong.
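The difference between the two conversions can be shown on a single value. This is a small sketch with hypothetical helper names (`convert_rounding`, `convert_truncating`); it assumes the default round-to-nearest mode, and the non-SSE branch is a stand-in so the example builds on other targets.

```cpp
#include <cmath>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

// Converts using the current rounding mode (round-to-nearest-even by default).
inline int convert_rounding(float v) {
#ifdef __SSE2__
  return _mm_cvtsi128_si32(_mm_cvtps_epi32(_mm_set1_ps(v)));
#else
  return static_cast<int>(std::nearbyint(v));
#endif
}

// Converts by truncating toward zero, regardless of the rounding mode.
inline int convert_truncating(float v) {
#ifdef __SSE2__
  return _mm_cvtsi128_si32(_mm_cvttps_epi32(_mm_set1_ps(v)));
#else
  return static_cast<int>(v);
#endif
}
```

For an index of `2.7`, rounding selects entry 3 and leaves a fractional part of `-0.3`, while truncation selects entry 2 with a fractional part of `0.7`. If the fractional part is then used to interpolate between adjacent table entries, the rounded index gives a different result.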

* Vectorize audio resampling for ARM NEON.
* Fix rounding mode in SSE vectorization.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

Vectorize audio resampling for ARM NEON.

752db6a

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

JanuszL self-assigned this Mar 17, 2022

Revert change made for building outside DALI source tree.

da2f3b6

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

JanuszL reviewed Mar 17, 2022

Refactoring, as per review.

a818c56

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

JanuszL approved these changes Mar 17, 2022

jantonguirao assigned prak-nv Mar 18, 2022

JanuszL reviewed Mar 18, 2022

Bugfix.

907fd20

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

mzient force-pushed the AudioResamplingNEON branch from e91d112 to 907fd20 Compare March 18, 2022 14:35

JanuszL self-requested a review March 18, 2022 16:26

JanuszL approved these changes Mar 18, 2022

JanuszL approved these changes Mar 21, 2022

prak-nv reviewed Mar 21, 2022

prak-nv approved these changes Mar 21, 2022

mzient merged commit 0fdc119 into NVIDIA:main Mar 21, 2022

JanuszL mentioned this pull request Mar 30, 2022

DALI 2022 roadmap #3774

Closed

cyyever pushed a commit to cyyever/DALI that referenced this pull request May 13, 2022

Vectorize audio resampling for ARM NEON. (NVIDIA#3745)

6369365

* Vectorize audio resampling for ARM NEON.
* Fix rounding mode in SSE vectorization.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

cyyever pushed a commit to cyyever/DALI that referenced this pull request Jun 7, 2022

Vectorize audio resampling for ARM NEON. (NVIDIA#3745)

9869808

* Vectorize audio resampling for ARM NEON.
* Fix rounding mode in SSE vectorization.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>
