DAOS-14105 object: collectively punch object #13386
Conversation
LGTM. No errors found by checkpatch.
Test stage NLT on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/1/testReport/
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/1/execution/node/1560/log
Force-pushed from 54abfb6 to 4456c34.
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/2/execution/node/1331/log
Force-pushed from 4456c34 to 489c83a.
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/3/execution/node/1332/log
Force-pushed from 489c83a to b092617.
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/4/execution/node/1332/log
Force-pushed from b092617 to 8c62f1a.
LGTM. No errors found by checkpatch.
Force-pushed from 8c62f1a to 9d63542.
LGTM. No errors found by checkpatch.
Passed CI tests, but have to rebase to resolve the merge conflict.
Force-pushed from 9d63542 to 19a0fc8.
LGTM. No errors found by checkpatch.
ftest LGTM
Due to the size of this PR, shouldn't it probably run with some …
Any suggested features to be tested? Thanks! @daltonbohning
LGTM. No errors found by checkpatch.
LGTM. No errors found by checkpatch.
Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/403/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/380/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/401/log
Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/395/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/373/log
Force-pushed from 194e753 to 8d7b90d.
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/20/testReport/
Force-pushed from 8d7b90d to 756d339.
LGTM. No errors found by checkpatch.
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/21/testReport/
Force-pushed from 756d339 to 0646655.
LGTM. No errors found by checkpatch.
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/22/execution/node/1462/log
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/22/testReport/
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/22/execution/node/1435/log
From the client perspective, the latency of a collective punch will be reduced. Signed-off-by: Fan Yong <fan.yong@intel.com>
Then it bypasses pool_map_find_target() when it needs to locate a DAOS target according to the object layout. Signed-off-by: Fan Yong <fan.yong@intel.com>
That will distribute the collective punch load to the IO handler XS. Signed-off-by: Fan Yong <fan.yong@intel.com>
For locating the performance bottleneck. Signed-off-by: Fan Yong <fan.yong@intel.com>
Force-pushed from 0646655 to 15704d4.
LGTM. No errors found by checkpatch.
Replaced by #13493
Currently, when punching an object with multiple redundancy groups, the whole punch is handled via a single internal distributed transaction to guarantee atomicity. The DTX leader forwards the CPD RPC to every object shard within that transaction. For a large-scale object, such as an SX object, punching it generates N RPCs, where N equals the number of VOS targets in the system. That is very slow and holds a lot of system resources for a relatively long time. If the system is under heavy load, the related RPCs may time out and trigger DTX abort; the client then resends the RPC to the DTX leader for retry, which makes the situation progressively worse.
To resolve this, we punch the object collectively.
The basic idea: when punching an object with multiple redundancy groups, the client sends an OBJ_COLL_PUNCH RPC to the DTX leader. Instead of forwarding the request to all related VOS targets, the DTX leader uses a bcast RPC to spread the OBJ_COLL_PUNCH request to all involved engines. Each engine then generates collective tasks to punch the object shards on its own local VOS targets. That saves a lot of RPCs and resources.
On the other hand, for a large-scale object, transferring the related DTX participant information (which can be huge) would be a heavy burden, whether carried in the RPC body or via RDMA (as bulk data). So the OBJ_COLL_PUNCH RPC does not transfer dtx_memberships; instead, each related engine, leader or not, calculates the dtx_memberships data by itself based on the object layout. That causes some overhead, but compared with broadcasting huge DTX participant information over the network, it is likely the better choice.
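As a rough illustration of the decision and the RPC savings described above, here is a small stand-alone C sketch. It is not the DAOS implementation: the struct, helper names, example numbers, and the assumption that the threshold is compared against the redundancy-group count are all hypothetical. It only models the idea that narrowly striped objects keep the per-shard DTX path while widely striped objects switch to one OBJ_COLL_PUNCH RPC plus an engine broadcast.

```c
/*
 * Stand-alone sketch (NOT the actual DAOS code) of the collective-punch
 * decision and its approximate RPC cost. All names are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

struct obj_layout {
	unsigned int redundancy_groups; /* redundancy groups in the object */
	unsigned int vos_targets;       /* shards == VOS targets to punch  */
	unsigned int engines;           /* engines hosting those targets   */
};

/* Client-side decision: collective punch only for widely striped objects.
 * Assumption: the threshold is compared against the redundancy-group count. */
static bool
use_collective_punch(const struct obj_layout *lo, unsigned int threshold)
{
	return lo->redundancy_groups >= threshold;
}

/* Rough count of point-to-point RPCs each path needs. */
static unsigned int
punch_rpc_count(const struct obj_layout *lo, bool collective)
{
	if (!collective)
		return lo->vos_targets;                 /* CPD RPC to every shard      */
	return 1 /* client -> leader */ + lo->engines;  /* bcast reaches each engine   */
}

int main(void)
{
	/* Hypothetical SX-like layout spanning every target in the system. */
	struct obj_layout sx_like = { .redundancy_groups = 256,
				      .vos_targets = 4096, .engines = 256 };
	bool coll = use_collective_punch(&sx_like, 16);

	printf("collective: %s, approx RPCs: %u (vs %u per-shard)\n",
	       coll ? "yes" : "no",
	       punch_rpc_count(&sx_like, coll),
	       punch_rpc_count(&sx_like, false));
	return 0;
}
```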
Two environment variables are introduced to control the collective punch:
DAOS_DTX_COLL_TREE_WIDTH:
The bcast RPC tree width for a collective transaction on the server. The valid range is [4, 64].
The default value is 16.
DAOS_OBJ_COLL_PUNCH_THRESHOLD:
The threshold for triggering a collective object punch on the client.
The default (and also the minimum) value is 16.
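The stand-alone sketch below shows how tunables like these might be read and clamped with plain getenv(); it is not the actual DAOS code, and the helper name env_uint is hypothetical. The ranges and defaults come from the description above; the bcast tree depth shown at the end is only a back-of-the-envelope estimate (compile with -lm).

```c
/* Sketch only: reading and clamping the two tunables described above. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse an unsigned integer from the environment, clamped to [min, max];
 * fall back to the default when unset or not a number. */
static unsigned int
env_uint(const char *name, unsigned int def, unsigned int min, unsigned int max)
{
	const char *val = getenv(name);
	char *end;
	unsigned long n;

	if (val == NULL || *val == '\0')
		return def;
	n = strtoul(val, &end, 10);
	if (*end != '\0')
		return def;
	if (n < min)
		return min;
	if (n > max)
		return max;
	return (unsigned int)n;
}

int main(void)
{
	/* Bcast RPC tree width for collective transactions: [4, 64], default 16. */
	unsigned int tree_width = env_uint("DAOS_DTX_COLL_TREE_WIDTH", 16, 4, 64);
	/* Client-side threshold for switching to collective punch: min/default 16. */
	unsigned int punch_thd = env_uint("DAOS_OBJ_COLL_PUNCH_THRESHOLD", 16, 16, ~0u);

	/* Rough bcast tree depth for a hypothetical engine count. */
	unsigned int engines = 512;
	double depth = ceil(log((double)engines) / log((double)tree_width));

	printf("tree width %u, punch threshold %u, ~%.0f bcast hops for %u engines\n",
	       tree_width, punch_thd, depth, engines);
	return 0;
}
```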
Required-githooks: true
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: