
DAOS-14105 object: collectively punch object #13386

Closed
wants to merge 5 commits

Conversation

Nasf-Fan
Contributor

@Nasf-Fan Nasf-Fan commented Nov 26, 2023

Currently, when punching an object that spans multiple redundancy groups, we handle the whole punch via a single internal distributed transaction to guarantee atomicity. The DTX leader forwards the CPD RPC to every object shard within the same transaction. For a large-scale object, such as an SX object, punching it generates N RPCs, where N is the count of all the VOS targets in the system. That is very slow and holds a lot of system resources for a relatively long time. If the system is under heavy load, the related RPC(s) may time out and trigger DTX abort; the client then resends the RPC to the DTX leader for retry, which makes the situation progressively worse.

To resolve this bad situation, we punch the object collectively.

The basic idea is this: when punching an object with multiple redundancy groups, the client sends an OBJ_COLL_PUNCH RPC to the DTX leader. Instead of forwarding the request to all related VOS targets, the DTX leader uses a broadcast RPC to spread the OBJ_COLL_PUNCH request to all involved engines. The involved engines then generate collective tasks to punch the object shards on their own local VOS targets. That saves a lot of RPCs and resources.
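
For illustration only, here is a minimal C sketch of that client-side dispatch decision. All names (`obj_punch_dispatch`, `obj_nr_redundancy_groups`, `obj_punch_regular`, `obj_coll_punch_send`) are hypothetical placeholders rather than the actual DAOS client API; the sketch only captures the threshold-based choice between the existing CPD path and the new OBJ_COLL_PUNCH path described above.

```c
/*
 * Hypothetical sketch, not the actual DAOS client code: pick between the
 * existing CPD-based distributed transaction and the new collective punch
 * based on how many redundancy groups the object spans.
 */
#include <stdint.h>

#define OBJ_COLL_PUNCH_THRESHOLD_DEF 16 /* matches the documented default */

struct obj_handle;      /* opaque placeholder for an open object handle */

/* Placeholders standing in for the real client-side helpers. */
extern uint32_t obj_nr_redundancy_groups(struct obj_handle *oh);
extern int      obj_punch_regular(struct obj_handle *oh);   /* CPD path */
extern int      obj_coll_punch_send(struct obj_handle *oh); /* OBJ_COLL_PUNCH to DTX leader */

static int
obj_punch_dispatch(struct obj_handle *oh, uint32_t coll_threshold)
{
        uint32_t grp_nr = obj_nr_redundancy_groups(oh);

        if (coll_threshold == 0)
                coll_threshold = OBJ_COLL_PUNCH_THRESHOLD_DEF;

        /* Small objects keep the existing per-shard CPD transaction. */
        if (grp_nr < coll_threshold)
                return obj_punch_regular(oh);

        /*
         * Large objects (e.g. class SX): one OBJ_COLL_PUNCH RPC to the DTX
         * leader, which then broadcasts to the involved engines.
         */
        return obj_coll_punch_send(oh);
}
```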

On the other hand, for a large-scale object, transferring the related DTX participants information (which can be huge) would be a heavy burden whether carried in the RPC body or via RDMA (as bulk data). So the OBJ_COLL_PUNCH RPC does not transfer dtx_memberships; instead, the related engines, leader or not, calculate the dtx_memberships data from the object layout by themselves. That causes some overhead, but compared with broadcasting huge DTX participants information over the network, it may be the better choice.
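
To make the recomputation idea concrete, below is a simplified, hypothetical C sketch: each engine walks the object layout and collects only its own local VOS targets, rather than receiving the full dtx_memberships blob over the wire. `struct layout_shard`, `struct obj_layout`, and `coll_punch_local_targets` are invented stand-ins, not the actual DAOS data structures or functions.

```c
/*
 * Illustrative sketch, not the real DAOS DTX code: derive the local DTX
 * participant targets from the object layout instead of shipping a
 * (potentially huge) membership blob in the OBJ_COLL_PUNCH RPC.
 */
#include <stdint.h>
#include <stdlib.h>

struct layout_shard {                   /* simplified stand-in for a shard descriptor */
        uint32_t        ls_rank;        /* engine rank hosting the shard */
        uint32_t        ls_tgt_idx;     /* VOS target index on that engine */
};

struct obj_layout {                     /* simplified stand-in for the object layout */
        uint32_t                ol_shard_nr;
        struct layout_shard     ol_shards[];
};

/* Collect the VOS targets on this engine that hold shards of the object. */
static int
coll_punch_local_targets(const struct obj_layout *layout, uint32_t self_rank,
                         uint32_t **tgts_out, uint32_t *tgt_nr_out)
{
        uint32_t        *tgts;
        uint32_t         nr = 0;
        uint32_t         i;

        tgts = calloc(layout->ol_shard_nr, sizeof(*tgts));
        if (tgts == NULL)
                return -1;

        for (i = 0; i < layout->ol_shard_nr; i++) {
                if (layout->ol_shards[i].ls_rank == self_rank)
                        tgts[nr++] = layout->ol_shards[i].ls_tgt_idx;
        }

        *tgts_out = tgts;
        *tgt_nr_out = nr;
        return 0;
}
```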

Two environment variables are introduced to control the collective punch (a usage sketch follows the list):

DAOS_DTX_COLL_TREE_WIDTH:
The broadcast RPC tree width for collective transactions on the server. The valid range is [4, 64].
The default value is 16.

DAOS_OBJ_COLL_PUNCH_THRESHOLD:
The threshold for triggering a collective object punch on the client.
The default (and also the minimum) value is 16.
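
For clarity, here is a minimal, hypothetical C sketch of how such tunables could be read and clamped to the documented ranges. It uses plain getenv()/strtoul() rather than DAOS's own environment helpers, and `env_u32`/`coll_punch_load_tunables` are invented names, not part of the DAOS code.

```c
/*
 * Illustrative only: read the two tunables and clamp them to the documented
 * ranges. The real code path may differ; this just shows the intended
 * semantics of the variables.
 */
#include <stdint.h>
#include <stdlib.h>

static uint32_t
env_u32(const char *name, uint32_t def, uint32_t min, uint32_t max)
{
        const char      *val = getenv(name);
        char            *end;
        unsigned long    v;

        if (val == NULL || *val == '\0')
                return def;

        v = strtoul(val, &end, 0);
        if (*end != '\0')
                return def;     /* not a number: fall back to the default */
        if (v < min)
                return min;
        if (v > max)
                return max;
        return (uint32_t)v;
}

/* Tree width in [4, 64] with default 16; threshold >= 16 with default 16. */
static void
coll_punch_load_tunables(uint32_t *tree_width, uint32_t *punch_threshold)
{
        *tree_width      = env_u32("DAOS_DTX_COLL_TREE_WIDTH", 16, 4, 64);
        *punch_threshold = env_u32("DAOS_OBJ_COLL_PUNCH_THRESHOLD", 16, 16, UINT32_MAX);
}
```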

Required-githooks: true

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate watchers.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is the master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.


Bug-tracker data:
Ticket title is 'Punch large-scaled object collectively'
Status is 'Awaiting Verification'
Labels: 'tds'
https://daosio.atlassian.net/browse/DAOS-14105

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage NLT on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/1/testReport/

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/1/execution/node/1560/log

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/2/execution/node/1331/log

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/3/execution/node/1332/log

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/4/execution/node/1332/log

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@Nasf-Fan
Contributor Author

Nasf-Fan commented Dec 5, 2023

Passed CI tests, but have to rebase to resolve the merge conflict.

@Nasf-Fan Nasf-Fan marked this pull request as ready for review December 5, 2023 15:45
@Nasf-Fan Nasf-Fan requested review from a team as code owners December 5, 2023 15:45
@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daltonbohning daltonbohning left a comment
Contributor

ftest LGTM

@daltonbohning
Contributor

Due to the size of this PR, shouldn't it probably run with some Features:?

@Nasf-Fan
Contributor Author

Nasf-Fan commented Dec 6, 2023

Due to the size of this PR, shouldn't it probably run with some Features:?

Any suggested features to be tested? Thanks! @daltonbohning

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Build on Leap 15.4 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/403/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/380/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/401/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.4 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/395/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/15/execution/node/373/log

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-14105_6 branch 4 times, most recently from 194e753 to 8d7b90d on December 8, 2023 15:46
@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/20/testReport/

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/21/testReport/

@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@daosbuild1
Collaborator

Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/22/execution/node/1462/log

@daosbuild1
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13386/22/testReport/

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13386/22/execution/node/1435/log

From the client's perspective, the latency of collective punch will be reduced.

Signed-off-by: Fan Yong <fan.yong@intel.com>
Then it bypasses pool_map_find_target() when it needs to locate a DAOS target according to the object layout.

Signed-off-by: Fan Yong <fan.yong@intel.com>
That will distribute collective punch load to IO handler XS.

Signed-off-by: Fan Yong <fan.yong@intel.com>
For locating the performance bottleneck.

Signed-off-by: Fan Yong <fan.yong@intel.com>
@daosbuild1 daosbuild1 left a comment
Collaborator

LGTM. No errors found by checkpatch.

@Nasf-Fan
Contributor Author

Replaced by #13493

@Nasf-Fan Nasf-Fan closed this Dec 14, 2023
@Nasf-Fan Nasf-Fan deleted the Nasf-Fan/DAOS-14105_6 branch March 8, 2024 16:10