DAOS-14105 object: collectively punch object
Currently, when punching an object with multiple redundancy groups, we
handle the whole punch as a single internal distributed transaction to
guarantee atomicity. The DTX leader forwards the CPD RPC to every
object shard within the same transaction. For a large-scale object,
such as an SX object, punching it generates N RPCs, where N equals the
count of all VOS targets in the system. That is very slow and holds a
lot of system resources for a relatively long time. If the system is
under heavy load, the related RPC(s) may time out and trigger DTX
abort; the client then resends the RPC to the DTX leader for retry,
which makes the situation worse and worse.

To resolve this problem, we punch the object collectively.

The basic idea: when punching an object with multiple redundancy
groups, the client sends an OBJ_COLL_PUNCH RPC to the DTX leader.
Instead of forwarding the request to all related VOS targets, the DTX
leader uses a bcast RPC to spread the OBJ_COLL_PUNCH request to all
involved engines. Each engine then generates collective tasks to punch
the object shards on its own local VOS targets. That saves a lot of
RPCs and resources.
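
As a rough, back-of-the-envelope illustration (not DAOS code), the
following standalone C snippet compares the RPC counts of the two
approaches for an assumed system of 32 engines with 16 VOS targets
each; the counts are placeholder assumptions, not measurements:

/*
 * Illustrative only: compare how many RPCs the DTX leader issues when
 * punching an object striped across every VOS target in the system.
 * The engine/target counts below are assumptions for the example.
 */
#include <stdio.h>

int main(void)
{
	int engines = 32;		/* server engines in the system */
	int tgts_per_engine = 16;	/* VOS targets per engine */
	int total_tgts = engines * tgts_per_engine;

	/* Old path: one RPC forwarded per VOS target (N = 512 here). */
	int old_rpcs = total_tgts;

	/* New path: one OBJ_COLL_PUNCH bcast spread over the engines;
	 * each engine punches its local shards without extra RPCs.
	 */
	int new_rpcs = engines;

	printf("per-target forwarding: %d RPCs, collective punch: ~%d RPCs\n",
	       old_rpcs, new_rpcs);
	return 0;
}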

On the other hand, for a large-scale object, transferring the related
DTX participants information (which can be huge) would be a heavy
burden, whether carried in the RPC body or via RDMA (as bulk data). So
the OBJ_COLL_PUNCH RPC does not transfer dtx_memberships; instead, the
related engines, leader or not, calculate the dtx_memberships data from
the object layout by themselves. That introduces some computation
overhead, but compared with broadcasting the huge DTX participants
information over the network, it is the better choice.
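
A conceptual sketch, with hypothetical types and a hypothetical helper
(not the actual dtx_memberships definitions), of how each engine could
derive the participant list locally from the object layout instead of
receiving it over the wire:

/*
 * Hypothetical sketch: collect the set of participating engine ranks
 * from a simplified object layout. Types and names are illustrative,
 * not the DAOS dtx_memberships layout.
 */
#include <stdbool.h>
#include <stdlib.h>

struct shard_loc {
	unsigned int rank;	/* engine rank hosting this shard */
	unsigned int tgt_idx;	/* VOS target index on that engine */
};

struct obj_layout {
	unsigned int	  ol_nr;
	struct shard_loc *ol_shards;
};

/* Every engine can compute this independently from the layout, so the
 * (potentially huge) participant list never crosses the network.
 */
static int
ranks_from_layout(const struct obj_layout *lo, unsigned int **ranks_out,
		  unsigned int *nr_out)
{
	unsigned int *ranks;
	unsigned int  nr = 0;
	unsigned int  i, j;
	bool	      found;

	ranks = malloc(sizeof(*ranks) * lo->ol_nr);
	if (ranks == NULL)
		return -1;

	for (i = 0; i < lo->ol_nr; i++) {
		found = false;
		for (j = 0; j < nr; j++) {
			if (ranks[j] == lo->ol_shards[i].rank) {
				found = true;
				break;
			}
		}
		if (!found)
			ranks[nr++] = lo->ol_shards[i].rank;
	}

	*ranks_out = ranks;
	*nr_out = nr;
	return 0;
}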

Introduce two environment variables to control the collective punch:

DTX_COLL_TREE_WIDTH: the bcast RPC tree width for collective
transactions on the server. The valid range is [4, 64]; the default
value is 16.

OBJ_COLL_PUNCH_THRESHOLD: the threshold for triggering collective
object punch on the client. The default (and also the minimum) value
is 16.
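
For illustration, a minimal sketch (not the DAOS implementation) of
reading the two tunables and clamping them to the documented ranges:

/* Hypothetical standalone example of honoring the two tunables. */
#include <stdio.h>
#include <stdlib.h>

#define DTX_COLL_TREE_WIDTH_MIN	 4
#define DTX_COLL_TREE_WIDTH_MAX	 64
#define DTX_COLL_TREE_WIDTH_DEF	 16
#define OBJ_COLL_PUNCH_THD_MIN	 16

static int env_int(const char *name, int def)
{
	const char *val = getenv(name);

	return val != NULL ? atoi(val) : def;
}

int main(void)
{
	int width = env_int("DTX_COLL_TREE_WIDTH", DTX_COLL_TREE_WIDTH_DEF);
	int thd   = env_int("OBJ_COLL_PUNCH_THRESHOLD", OBJ_COLL_PUNCH_THD_MIN);

	/* Clamp the tree width to the valid range [4, 64]; default 16. */
	if (width < DTX_COLL_TREE_WIDTH_MIN)
		width = DTX_COLL_TREE_WIDTH_MIN;
	else if (width > DTX_COLL_TREE_WIDTH_MAX)
		width = DTX_COLL_TREE_WIDTH_MAX;

	/* The threshold may not go below its minimum (16). */
	if (thd < OBJ_COLL_PUNCH_THD_MIN)
		thd = OBJ_COLL_PUNCH_THD_MIN;

	printf("tree width %d, punch threshold %d\n", width, thd);
	return 0;
}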

Required-githooks: true

Signed-off-by: Fan Yong <fan.yong@intel.com>
Nasf-Fan committed Nov 29, 2023
1 parent c7df5df commit 4456c34
Showing 34 changed files with 2,953 additions and 500 deletions.
25 changes: 16 additions & 9 deletions src/container/srv_target.c
@@ -1621,6 +1621,8 @@ ds_cont_tgt_open(uuid_t pool_uuid, uuid_t cont_hdl_uuid,
uuid_t cont_uuid, uint64_t flags, uint64_t sec_capas,
uint32_t status_pm_ver)
{
int *exclude_tgts = NULL;
uint32_t exclude_tgt_nr = 0;
struct cont_tgt_open_arg arg = { 0 };
struct dss_coll_ops coll_ops = { 0 };
struct dss_coll_args coll_args = { 0 };
@@ -1657,18 +1659,22 @@ ds_cont_tgt_open(uuid_t pool_uuid, uuid_t cont_hdl_uuid,
coll_args.ca_func_args = &arg;

/* setting aggregator args */
rc = ds_pool_get_failed_tgt_idx(pool_uuid, &coll_args.ca_exclude_tgts,
&coll_args.ca_exclude_tgts_cnt);
if (rc) {
rc = ds_pool_get_failed_tgt_idx(pool_uuid, &exclude_tgts, &exclude_tgt_nr);
if (rc != 0) {
D_ERROR(DF_UUID "failed to get index : rc "DF_RC"\n",
DP_UUID(pool_uuid), DP_RC(rc));
return rc;
goto out;
}

rc = dss_thread_collective_reduce(&coll_ops, &coll_args, 0);
D_FREE(coll_args.ca_exclude_tgts);
if (exclude_tgts != NULL) {
rc = dss_build_coll_bitmap(exclude_tgts, exclude_tgt_nr, &coll_args.ca_tgt_bitmap,
&coll_args.ca_tgt_bitmap_sz);
if (rc != 0)
goto out;
}

if (rc != 0) {
rc = dss_thread_collective_reduce(&coll_ops, &coll_args, 0);
if (rc != 0)
/* Once it exclude the target from the pool, since the target
* might still in the cart group, so IV cont open might still
* come to this target, especially if cont open/close will be
@@ -1678,9 +1684,10 @@ ds_cont_tgt_open(uuid_t pool_uuid, uuid_t cont_hdl_uuid,
D_ERROR("open "DF_UUID"/"DF_UUID"/"DF_UUID":"DF_RC"\n",
DP_UUID(pool_uuid), DP_UUID(cont_uuid),
DP_UUID(cont_hdl_uuid), DP_RC(rc));
return rc;
}

out:
D_FREE(coll_args.ca_tgt_bitmap);
D_FREE(exclude_tgts);
return rc;
}

3 changes: 2 additions & 1 deletion src/dtx/SConscript
@@ -18,7 +18,8 @@ def scons():
# dtx
denv.Append(CPPDEFINES=['-DDAOS_PMEM_BUILD'])
dtx = denv.d_library('dtx',
['dtx_srv.c', 'dtx_rpc.c', 'dtx_resync.c', 'dtx_common.c', 'dtx_cos.c'],
['dtx_srv.c', 'dtx_rpc.c', 'dtx_resync.c', 'dtx_common.c', 'dtx_cos.c',
'dtx_coll.c'],
install_off="../..")
denv.Install('$PREFIX/lib64/daos_srv', dtx)
