Quda work ndg force #612

Merged: 23 commits merged into master on May 8, 2024
Conversation

@Marcogarofalo (Contributor)

No description provided.

@kostrzewa (Member) commented Apr 19, 2024

Awesome, this is working great. Here's a comparison on JUWELS Booster on 4 nodes, 64c128 at the physical point, with consistent random numbers.

no force offloading (for reference)

00001054 0.545663519531 0.020930789411 9.792867e-01 81 14613 1688 261376 327 32074 36839 0 1658 10002 1160 54301 143 9753 4907 97531 1 1.469851e+04 3.131941e-01
00001055 0.545676315454 -0.208904463798 1.232327e+00 80 14626 1609 261254 323 32075 36858 0 1706 9976 1140 54258 140 9681 4871 97607 1 1.460724e+04 3.132264e-01
00001056 0.545682037202 0.138889044523 8.703246e-01 80 14594 1654 260446 324 31963 35986 0 1649 10182 1133 54098 148 9950 4767 96687 1 1.474566e+04 3.132276e-01
00001057 0.545682509451 0.226925754920 7.969800e-01 80 14507 1617 258577 320 31691 35898 0 1681 10081 1123 53740 140 9920 4733 96450 1 1.461649e+04 3.132220e-01

light force offloading only

00001401 0.545681973756 0.569591499865 5.657565e-01 80 14628 1603 259506 315 31451 35516 0 1624 10011 1109 53519 138 9568 4718 96323 1 1.242204e+04 3.131983e-01
00001402 0.545701067640 0.041082913056 9.597496e-01 81 14689 1610 261058 316 31719 35647 0 1627 10023 1111 53903 145 9794 4727 97032 1 1.396068e+04 3.132205e-01
00001403 0.545696849214 0.160429969430 8.517775e-01 79 14590 1584 259024 311 31397 35699 0 1634 10081 1099 53497 141 9808 4745 96752 1 1.282431e+04 3.132271e-01
00001404 0.545662972281 -0.045102979988 1.046136e+00 80 14479 1594 256674 314 31056 35580 0 1608 10038 1103 52961 143 9822 4731 95628 1 1.241667e+04 3.131858e-01

+ ND force offloading

(first trajectory includes tuning)

00001401 0.545681973797 0.569108584896 5.660298e-01 80 14628 1603 259486 315 31451 35526 0 1626 10007 1107 53504 137 9569 4719 96307 1 1.025544e+04 3.131983e-01
00001402 0.545701067693 0.040016509593 9.607736e-01 81 14690 1611 261098 316 31727 35650 0 1627 10010 1112 53930 136 9832 4725 97018 1 8.803802e+03 3.132205e-01
00001403 0.545696849273 0.159103434533 8.529081e-01 79 14586 1583 258986 312 31393 35698 0 1636 10063 1101 53510 139 9783 4743 96757 1 8.879981e+03 3.132271e-01
00001404 0.545662972387 -0.047294547781 1.048431e+00 80 14479 1596 256673 313 31054 35575 0 1606 10046 1102 52984 143 9825 4727 95639 1 9.201252e+03 3.131858e-01
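For reference, the second-to-last column of these output.data lines appears to be the wall-clock time per trajectory in seconds (roughly 1.46e4 without offloading, 1.24e4 to 1.40e4 with the light force offloaded, and 8.8e3 to 1.03e4 with the ND force offloaded as well). A minimal sketch for pulling that column out for a side-by-side comparison; the file name and the column position are assumptions here, not something fixed by this PR:

```cpp
// Hypothetical helper, not part of this PR: print the trajectory number and the
// assumed time-per-trajectory column (second-to-last field) of a tmLQCD
// output.data file, so runs with and without offloading can be diffed quickly.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

int main(int argc, char** argv) {
  std::ifstream in(argc > 1 ? argv[1] : "output.data");
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream iss(line);
    std::vector<std::string> cols;
    for (std::string tok; iss >> tok;) cols.push_back(tok);
    if (cols.size() < 2) continue;
    // cols.front(): trajectory number; cols[cols.size()-2]: time in seconds (assumed)
    std::cout << cols.front() << "  " << cols[cols.size() - 2] << "\n";
  }
  return 0;
}
```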

@kostrzewa (Member)

The speed-up will be even greater on a machine like Leonardo or LUMI-G. I'll let Andrey know that he can run some initial tests for the finite-temperature runs. I'll put you in CC, @Marcogarofalo.

@urbach (Contributor) commented Apr 19, 2024 via email

@kostrzewa (Member)

14500 -> 12500 -> 9000 seconds per trajectory!

@kostrzewa (Member)

@Marcogarofalo There's an issue with the timing on the QUDA side. It seems like the time spent in computeTMCloverForceQuda is counted internally multiple times.

@kostrzewa (Member)

What I mean is the following:

   computeTMCloverForceQuda Total time =   382.651 secs
                 download     =    95.261 secs ( 24.895%),       with     2582 calls at 3.689e+04 us per call
                   upload     =    83.468 secs ( 21.813%),       with     1033 calls at 8.080e+04 us per call
                     init     =    15.232 secs (  3.981%),       with    26165 calls at 5.821e+02 us per call
                  compute     =  7920.459 secs (2069.889%),      with   292924 calls at 2.704e+04 us per call
                    comms     =    54.129 secs ( 14.146%),       with     6426 calls at 8.423e+03 us per call
                     free     =    20.674 secs (  5.403%),       with   236599 calls at 8.738e+01 us per call
        total accounted       =  8189.223 secs (2140.127%)
        total missing         = -7806.571 secs (-2040.127%)
WARNING: Accounted time  8189.223 secs in computeTMCloverForceQuda is greater than total time   382.651 secs

This doesn't affect anything on our side, but it does mess with the QUDA profile.
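For illustration only (this is a generic sketch of the failure mode, not QUDA's actual profiling code): accounted time can exceed the total when the same wall-clock interval is added to a profile bucket once per level of a nested call tree, e.g.:

```cpp
// Generic sketch of how a hierarchical profiler can over-count: if every nested
// call adds its own duration to the same "compute" bucket, overlapping intervals
// in the call tree are summed more than once, so the accounted time can exceed
// the wall-clock total of the enclosing region.
#include <chrono>
#include <iostream>
#include <thread>

struct Profile {
  double total = 0.0;    // wall-clock time of the enclosing region
  double compute = 0.0;  // sum of all nested "compute" intervals
};

static double seconds_since(std::chrono::steady_clock::time_point t0) {
  return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

void inner_kernel(Profile& p) {
  auto t0 = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(std::chrono::milliseconds(10));  // stand-in for real work
  p.compute += seconds_since(t0);
}

void outer_kernel(Profile& p) {
  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < 5; ++i) inner_kernel(p);  // each nested call adds its own time
  p.compute += seconds_since(t0);  // outer duration already contains the inner ones
}

int main() {
  Profile p;
  auto t0 = std::chrono::steady_clock::now();
  outer_kernel(p);
  p.total = seconds_since(t0);
  // compute is roughly 2x total: the same ~50 ms is counted at both nesting levels
  std::cout << "total " << p.total << " s, accounted compute " << p.compute << " s\n";
}
```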

@Marcogarofalo (Contributor, Author)

Here is a comparison of the data before and after the last commit; the speedup cannot be seen in such a small test:

debug level 1 rel precision + no strict checks, b415eb6

00000000 0.112190826944 8544.821712773226 0.000000e+00 56 182 128 246 209 338 0 1.132254e+00 5.069101e-02
00000001 0.112190826944 10141.057194254070 0.000000e+00 56 184 127 245 206 340 0 2.598577e-01 5.069101e-02
00000002 0.112190826944 8569.440461946775 0.000000e+00 55 182 125 241 206 335 0 2.536940e-01 5.069101e-02
00000003 0.112190826944 7382.444491649191 0.000000e+00 55 181 124 239 204 334 0 2.550959e-01 5.069101e-02

debug level 1 rel precision + no strict checks, e29573f

00000000 0.112190826944 8544.821712773226 0.000000e+00 56 182 128 246 209 338 0 4.350938e+00 5.069101e-02
00000001 0.112190826944 10141.057194254070 0.000000e+00 56 184 127 245 206 340 0 2.661940e-01 5.069101e-02
00000002 0.112190826944 8569.440461946775 0.000000e+00 55 182 125 241 206 335 0 2.616898e-01 5.069101e-02
00000003 0.112190826944 7382.444491649191 0.000000e+00 55 181 124 239 204 334 0 2.592305e-01 5.069101e-02

debug level 4 rel precision + no strict checks, b415eb6

00000000 0.112190826944 8544.821712773226 0.000000e+00 56 364 128 492 209 676 0 1.407556e+00 5.069101e-02
00000001 0.112190826944 10141.057194254070 0.000000e+00 56 368 127 490 206 680 0 5.201004e-01 5.069101e-02
00000002 0.112190826944 8569.440461946775 0.000000e+00 55 364 125 482 206 670 0 5.171427e-01 5.069101e-02
00000003 0.112190826944 7382.444491649191 0.000000e+00 55 362 124 478 204 668 0 5.416352e-01 5.069101e-02

debug level 4 rel precision + no strict checks, e29573f

00000000 0.112190826944 8544.821712773226 0.000000e+00 56 364 128 492 209 676 0 1.419756e+00 5.069101e-02
00000001 0.112190826944 10141.057194254070 0.000000e+00 56 368 127 490 206 680 0 4.816360e-01 5.069101e-02
00000002 0.112190826944 8569.440461946775 0.000000e+00 55 364 125 482 206 670 0 4.727600e-01 5.069101e-02
00000003 0.112190826944 7382.444491649191 0.000000e+00 55 362 124 478 204 668 0 4.695284e-01 5.069101e-02
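Reading the same (assumed) time-per-trajectory column as above, both commits sit at roughly 0.25 to 0.27 s per trajectory at debug level 1, and at debug level 4 the numbers are about 0.52 to 0.54 s versus 0.47 to 0.48 s, so any gain at this small volume is marginal at best.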

@kostrzewa kostrzewa marked this pull request as ready for review May 8, 2024 12:45
@kostrzewa kostrzewa self-requested a review May 8, 2024 12:46
@kostrzewa kostrzewa merged commit 950a3a1 into master May 8, 2024
3 checks passed