DAOS-14105 object: collectively punch object #13386
Conversation
LGTM. No errors found by checkpatch.
Test stage NLT on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/1/testReport/
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/1/execution/node/1560/log
Force-pushed from 54abfb6 to 4456c34.
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/2/execution/node/1331/log
Force-pushed from 4456c34 to 489c83a.
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/3/execution/node/1332/log
Force-pushed from 489c83a to b092617.
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/4/execution/node/1332/log
Force-pushed from b092617 to 8c62f1a.
LGTM. No errors found by checkpatch.
Force-pushed from 8c62f1a to 9d63542.
LGTM. No errors found by checkpatch.
Passed CI tests, but have to rebase to resolve the merge conflict.
Force-pushed from 9d63542 to 19a0fc8.
LGTM. No errors found by checkpatch.
ftest LGTM
Due to the size of this PR, shouldn't it probably run with some …
Any suggested features to be tested? Thanks! @daltonbohning
LGTM. No errors found by checkpatch.
LGTM. No errors found by checkpatch.
Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/403/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/380/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/401/log
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/395/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/373/log
Force-pushed from 194e753 to 8d7b90d.
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/20/testReport/
Force-pushed from 8d7b90d to 756d339.
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/21/testReport/
Force-pushed from 756d339 to 0646655.
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/22/execution/node/1462/log
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/22/testReport/
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/22/execution/node/1435/log
From the client perspective, the latency of a collective punch will be reduced. Signed-off-by: Fan Yong <fan.yong@intel.com>
Then it bypasses pool_map_find_target() when it needs to locate a DAOS target according to the object layout. Signed-off-by: Fan Yong <fan.yong@intel.com>
That will distribute the collective punch load to the IO handler XS. Signed-off-by: Fan Yong <fan.yong@intel.com>
For locating the performance bottleneck. Signed-off-by: Fan Yong <fan.yong@intel.com>
Force-pushed from 0646655 to 15704d4.
LGTM. No errors found by checkpatch.
Replaced by #13493
Currently, when punching an object with multiple redundancy groups, the whole punch is handled via a single internal distributed transaction to guarantee atomicity. The DTX leader forwards the CPD RPC to every object shard within that transaction. For a large-scale object, such as an SX object, punching it generates N RPCs, where N equals the number of VOS targets in the system. That is very slow and holds a lot of system resources for a relatively long time. If the system is under heavy load, the related RPCs may time out and trigger DTX abort; the client then resends the RPC to the DTX leader for retry, which makes the situation progressively worse.
To resolve this, we punch the object collectively.
The basic idea: when punching an object with multiple redundancy groups, the client sends an OBJ_COLL_PUNCH RPC to the DTX leader. Instead of forwarding the request to all related VOS targets, the DTX leader uses a bcast RPC to spread the OBJ_COLL_PUNCH request to all involved engines. Each engine then generates collective tasks to punch the object shards on its own local VOS targets. That saves a lot of RPCs and resources.
On the other hand, for a large-scale object, transferring the related DTX participant information (which can be huge) would be a heavy burden, whether carried in the RPC body or via RDMA (as bulk data). So the OBJ_COLL_PUNCH RPC does not transfer dtx_memberships; instead, each related engine, leader or not, calculates the dtx_memberships data by itself based on the object layout. That causes some overhead, but compared with broadcasting huge DTX participant information over the network, it is likely the better choice.
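As a rough illustration of the decision and the RPC savings described above, here is a small stand-alone C sketch. It is not the DAOS implementation: the struct, helper names, example numbers, and the assumption that the threshold is compared against the redundancy-group count are all hypothetical. It only models the idea that narrowly striped objects keep the per-shard DTX path while widely striped objects switch to one OBJ_COLL_PUNCH RPC plus an engine broadcast.

```c
/*
 * Stand-alone sketch (NOT the actual DAOS code) of the collective-punch
 * decision and its approximate RPC cost. All names are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

struct obj_layout {
	unsigned int redundancy_groups; /* redundancy groups in the object */
	unsigned int vos_targets;       /* shards == VOS targets to punch  */
	unsigned int engines;           /* engines hosting those targets   */
};

/* Client-side decision: collective punch only for widely striped objects.
 * Assumption: the threshold is compared against the redundancy-group count. */
static bool
use_collective_punch(const struct obj_layout *lo, unsigned int threshold)
{
	return lo->redundancy_groups >= threshold;
}

/* Rough count of point-to-point RPCs each path needs. */
static unsigned int
punch_rpc_count(const struct obj_layout *lo, bool collective)
{
	if (!collective)
		return lo->vos_targets;                 /* CPD RPC to every shard      */
	return 1 /* client -> leader */ + lo->engines;  /* bcast reaches each engine   */
}

int main(void)
{
	/* Hypothetical SX-like layout spanning every target in the system. */
	struct obj_layout sx_like = { .redundancy_groups = 256,
				      .vos_targets = 4096, .engines = 256 };
	bool coll = use_collective_punch(&sx_like, 16);

	printf("collective: %s, approx RPCs: %u (vs %u per-shard)\n",
	       coll ? "yes" : "no",
	       punch_rpc_count(&sx_like, coll),
	       punch_rpc_count(&sx_like, false));
	return 0;
}
```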
Two environment variables are introduced to control the collective punch:
DAOS_DTX_COLL_TREE_WIDTH:
The bcast RPC tree width for a collective transaction on the server. The valid range is [4, 64].
The default value is 16.
DAOS_OBJ_COLL_PUNCH_THRESHOLD:
The threshold for triggering a collective object punch on the client.
The default (and also the minimum) value is 16.
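The stand-alone sketch below shows how tunables like these might be read and clamped with plain getenv(); it is not the actual DAOS code, and the helper name env_uint is hypothetical. The ranges and defaults come from the description above; the bcast tree depth shown at the end is only a back-of-the-envelope estimate (compile with -lm).

```c
/* Sketch only: reading and clamping the two tunables described above. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse an unsigned integer from the environment, clamped to [min, max];
 * fall back to the default when unset or not a number. */
static unsigned int
env_uint(const char *name, unsigned int def, unsigned int min, unsigned int max)
{
	const char *val = getenv(name);
	char *end;
	unsigned long n;

	if (val == NULL || *val == '\0')
		return def;
	n = strtoul(val, &end, 10);
	if (*end != '\0')
		return def;
	if (n < min)
		return min;
	if (n > max)
		return max;
	return (unsigned int)n;
}

int main(void)
{
	/* Bcast RPC tree width for collective transactions: [4, 64], default 16. */
	unsigned int tree_width = env_uint("DAOS_DTX_COLL_TREE_WIDTH", 16, 4, 64);
	/* Client-side threshold for switching to collective punch: min/default 16. */
	unsigned int punch_thd = env_uint("DAOS_OBJ_COLL_PUNCH_THRESHOLD", 16, 16, ~0u);

	/* Rough bcast tree depth for a hypothetical engine count. */
	unsigned int engines = 512;
	double depth = ceil(log((double)engines) / log((double)tree_width));

	printf("tree width %u, punch threshold %u, ~%.0f bcast hops for %u engines\n",
	       tree_width, punch_thd, depth, engines);
	return 0;
}
```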
Required-githooks: true
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: