DAOS-16168 build: Ignore scons version deprecation (#14715) #14745

Closed
wants to merge 77 commits into from

jolivier23
Contributor

Disable the warning about deprecated support for the Python
version so that it doesn't fail the build.

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

chowes and others added 30 commits April 10, 2024 13:30
…iptor

libfuse supports opening /dev/fuse and passing the file descriptor as the
mountpoint. In some cases, realpath may not work for these file descriptors,
and so we should ignore ENOENT errors and instead check that we can get file
descriptor attributes from the given path.

Change-Id: I2e9aad0e11a4c6f27ec2c4b1aeb75fc651d2540d
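
A minimal sketch of the descriptor check described in the commit message above, loosely modeled on the `sscanf`/`fcntl` fragments that appear in the review comments further down. The helper name `fd_from_mountpoint` and its exact return convention are assumptions for illustration, not the code from this PR.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return the descriptor number encoded in a
 * "/dev/fd/N" mountpoint if N is an open descriptor, or -1 otherwise.
 */
int
fd_from_mountpoint(const char *mountpoint)
{
	unsigned int fd  = 0;
	int          len = 0;

	/* %n records how many characters were consumed by the match. */
	if (sscanf(mountpoint, "/dev/fd/%u%n", &fd, &len) != 1)
		return -1;

	/* Reject trailing characters such as "/dev/fd/3x". */
	if ((size_t)len != strlen(mountpoint))
		return -1;

	/* F_GETFD fails (EBADF) when the descriptor is not open. */
	if (fcntl((int)fd, F_GETFD) == -1)
		return -1;

	return (int)fd;
}
```

A caller could try this first and fall back to the usual realpath-based validation only when it returns -1.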
The setuid, setgid, and sticky bit can cause fatal errors when the datamover
tool sets file permissions after copying a file, since these are not
supported by DFS.  We can just ignore this bit when calling dfs_chmod.

Change-Id: Ibf2b6d793f95dd59c902c8d847bc087fb479c5ea
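
As an illustration of the masking this commit message describes, the sketch below clears the setuid, setgid, and sticky bits from a mode before it would be handed to `dfs_chmod`. The helper name `strip_special_bits` is hypothetical, and the call into DFS itself is intentionally left out.

```c
#define _XOPEN_SOURCE 700 /* for S_ISVTX under strict compiler modes */
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: drop the bits the commit message says DFS rejects. */
mode_t
strip_special_bits(mode_t mode)
{
	return mode & ~(mode_t)(S_ISUID | S_ISGID | S_ISVTX);
}
```

All other permission and file-type bits pass through unchanged.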
To prevent a known race caused by the lack of locking in
the Glibc environment APIs (getenv()/[un]setenv()/
putenv()/clearenv()), these functions have been overloaded
and strengthened in Gurt with hooks that all use a common
lock/mutex.

Libgurt is the preferred place for this because it is the lowest
layer in DAOS: it is loaded earliest, which ensures the hooks
are installed as early as possible, and could prevent usage
of LD_PRELOAD.

This addresses the main lack of multi-thread protection
in the Glibc APIs but does not handle all unsafe use cases
(such as changing or removing an env var after its value address
has already been obtained by an earlier getenv(), ...).

Change-Id: I38cda09746ddb4e79f0297fee26c2a22e1cb881b
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
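
The common-lock idea in this commit message can be sketched as follows. This is only an illustration under stated assumptions: the `locked_*` names are hypothetical, and the sketch wraps the libc calls rather than overloading the Glibc symbols as the commit does. The copying getter shows one way to sidestep the stale-pointer case the message lists as unhandled.

```c
#define _POSIX_C_SOURCE 200809L /* for setenv(), unsetenv(), strdup() */
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* One lock shared by every wrapper. */
static pthread_mutex_t env_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: setenv() under the shared lock. */
int
locked_setenv(const char *name, const char *value, int overwrite)
{
	int rc;

	pthread_mutex_lock(&env_lock);
	rc = setenv(name, value, overwrite);
	pthread_mutex_unlock(&env_lock);
	return rc;
}

/* Hypothetical wrapper: unsetenv() under the shared lock. */
int
locked_unsetenv(const char *name)
{
	int rc;

	pthread_mutex_lock(&env_lock);
	rc = unsetenv(name);
	pthread_mutex_unlock(&env_lock);
	return rc;
}

/*
 * Hypothetical wrapper: return a heap copy of the value (or NULL if the
 * variable is unset), so the caller never keeps a pointer into the live
 * environment. The caller frees the result.
 */
char *
locked_getenv_copy(const char *name)
{
	const char *val;
	char       *copy = NULL;

	pthread_mutex_lock(&env_lock);
	val = getenv(name);
	if (val != NULL)
		copy = strdup(val);
	pthread_mutex_unlock(&env_lock);
	return copy;
}
```

The protection only holds if every reader and writer in the process goes through the same wrappers.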
Change-Id: Ic0eeee9df2f0ef29f3f3f047080fdce109af71bf
TESTED=https://paste.googleplex.com/6208972604833792
BUG=311738671

Change-Id: Ia6658d7c99c8d21c35d724b86fa2c1c48b41069f
The upstream 2.4 release has support for storing engine
metadata outside of tmpfs, but it is tied to the new
MD-on-SSD feature preview. With some small adjustments
to the code, we can enable external metadata without
MD-on-SSD.

Required-githooks: true

Change-Id: If3e728a2db7a4994572bbe53c92654f2e9b01ee0
Signed-off-by: Michael MacDonald <mjmac@google.com>
- D_QUOTA_RPCS environment variable added. When set, it limits the number of RPCs on the wire being sent out by the process.
- RPCs that exceed the quota limit (if set) will now be queued by the sender.
- Quota support code added to handle and track resources.

Required-githooks: true

Signed-off-by: Alexander A Oganezov <alexander.a.oganezov@intel.com>
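
The quota behaviour listed above could be modeled roughly as below. The struct, the function names, and the convention that 0 means "no limit" are assumptions made for this sketch; only the `D_QUOTA_RPCS` variable name comes from the commit message, and the real CaRT code also needs locking that is omitted here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for an in-flight RPC quota. */
struct rpc_quota {
	unsigned long limit;     /* 0 means no limit is enforced */
	unsigned long in_flight; /* RPCs currently on the wire */
	unsigned long queued;    /* RPCs waiting for a free slot */
};

/* env_value is the text of the quota variable, or NULL if it is unset. */
void
rpc_quota_init(struct rpc_quota *q, const char *env_value)
{
	q->limit     = env_value != NULL ? strtoul(env_value, NULL, 10) : 0;
	q->in_flight = 0;
	q->queued    = 0;
}

/* Returns true if the RPC may be sent now, false if it was queued. */
bool
rpc_quota_submit(struct rpc_quota *q)
{
	if (q->limit != 0 && q->in_flight >= q->limit) {
		q->queued++;
		return false;
	}
	q->in_flight++;
	return true;
}

/* Called when an RPC completes; returns true if a queued RPC took its slot. */
bool
rpc_quota_complete(struct rpc_quota *q)
{
	if (q->queued > 0) {
		q->queued--; /* slot is handed over, in_flight is unchanged */
		return true;
	}
	if (q->in_flight > 0)
		q->in_flight--;
	return false;
}
```

A process would typically initialize this once with `rpc_quota_init(&q, getenv("D_QUOTA_RPCS"))`.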
Adds some cart-level metrics for RPC quota
exceeded and RPC queue depth.

Required-githooks: true
Change-Id: I5760c255e13ca9a70d352017cae2f6bcee5a6959
Signed-off-by: Michael MacDonald <mjmac@google.com>
Matches new default in 2.6+; aligns default value with
standard tuning practices.

Required-githooks: true
Change-Id: I817927a160fc3dbb2c60a12107da668147e78706
Signed-off-by: Michael MacDonald <mjmac@google.com>
It should be part of server build, not tests

Required-githooks: true

Change-Id: I28b537e1ea7c32a323036c3ec935517ec97ad80c
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
This PR is a subset of PR #13250, which allows thread-safe management of environment variables; it has been split into smaller PRs to facilitate the review process.
This PR mainly adds thread-safe environment variable management functions.
It also removes and replaces the old non-thread-safe custom environment management functions.
Finally, it replaces the setenv() function with d_setenv().

Required-githooks: true

Change-Id: Ife6690e2c63dd6c47279a2ac8c3c5a3da5cf8213
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Fix regression in the d_getenv_xxx() functions used to retrieve integer
environment variables: support strings representing signed integers.

Required-githooks: true

Change-Id: I7a7f84fe17378ffca1cc0179e1119c1f17a3c4da
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
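
For illustration, signed-integer parsing of an environment string might look like the sketch below. `parse_env_int` is a hypothetical name, and this is not the body of the `d_getenv_xxx()` functions; it only shows the kind of input ("-42", "+7") the fix refers to.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical parser: convert a string such as "-42" to an int.
 * Returns 0 on success, -1 if the text is not a whole, in-range integer.
 */
int
parse_env_int(const char *text, int *out)
{
	char *end = NULL;
	long  val;

	if (text == NULL || *text == '\0')
		return -1;

	errno = 0;
	val   = strtol(text, &end, 10); /* accepts an optional leading sign */
	if (errno == ERANGE || end == text || *end != '\0')
		return -1;
	if (val < INT_MIN || val > INT_MAX)
		return -1;

	*out = (int)val;
	return 0;
}
```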
Replace getenv() function with d_agetenv_str() and d_freeenv_str()

Required-githooks: true

Change-Id: I6a3e3fafc82327c091bfe96bea3e5f0ef5bece48
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Required-githooks: true

Change-Id: I886d130eb20194a1870579bd47ade2b6e4b3b35a
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
…13053)

Allow metadata caching even when the file is open. This was
initially disabled due to conflicts with the interception library;
however, dfuse now tracks interception library use, so it's possible
to disable it only when the interception library is in use rather than
all the time.

Required-githooks: true

Change-Id: Ida03a854030f6b9ded24c5465e0f1126fcba310e
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
fuse will call this often to read non-existent xattrs for every write request
so short-circuit these to avoid server round-trips.

Required-githooks: true

Change-Id: I3337b1724f237cc50a5a537e0844f05f0ed9cc61
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
* DAOS-14981 gurt: restore d_getenv_int undefined symbol

Restore missing plain function d_getenv_int() to fix missing symbol with
libdaos.

Required-githooks: true

Change-Id: I86d5c2f5d4d8bbd3c4ab3fdef70ffc5b41ce0921
Signed-off-by: Cedric Koch-Hofer <cedric.koch-hofer@intel.com>
Change-Id: I6bf8765142024e3fd404d51f186c830e8af4bca5
getlogin does not work on the GKE pods that host our presubmits.

BUG=318885377

Change-Id: If4175d8a19b0174d489754659f34d4237cab6e97
Add a STATIC_FUSE option, default is off.  When enabled
DAOS will link statically with the fuse library.
Also add developer build.  This needs some work on
the libfuse RPM side.

Change-Id: I976f135af29d4e3da61cad9129ee19cbb419cddb
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
This ensures the dfuse we ship uses the version of
libfuse we want.

Required-githooks: true

Change-Id: I5aca28fdcb0e678fbd19df94cbf7428f5b9d61d2
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
1. Target count calculation should not use pool_tree_count, which might
also count targets under other domains and thus corrupt the pool map
during extension.

2. Return the correct error code in migrate_pool_tls_lookup_create() and
mrone_one_fetch.

3. Add the missing free in regenerate_task_of_type.

Signed-off-by: Di Wang <di.wang@intel.com>
Adds a gauge to measure SWIM delay and a counter
for glitches (temporary network outages).

Change-Id: Ibd85c08ab3e3a38931d795d62270f3e4059d7c67
Required-githooks: true

Change-Id: I854937dd249ad9f7211a3b7d40d3365a3e2f79f2
Signed-off-by: Michael MacDonald <mjmac@google.com>
During migration, choose the minimum of the rebuild stable epoch
and the EC aggregation boundary to make sure the correct data
is fetched during recovery.

Add tests to verify the process.

Signed-off-by: Di Wang <di.wang@intel.com>
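
The epoch selection in this commit message amounts to taking the smaller of two values. A minimal sketch, assuming epochs are 64-bit unsigned integers and using a hypothetical function name:

```c
#include <stdint.h>

/*
 * Hypothetical helper: pick the fetch epoch for migration as the smaller
 * of the rebuild stable epoch and the EC aggregation boundary.
 */
uint64_t
migrate_fetch_epoch(uint64_t rebuild_stable_epoch, uint64_t ec_agg_boundary)
{
	return rebuild_stable_epoch < ec_agg_boundary ? rebuild_stable_epoch
						      : ec_agg_boundary;
}
```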
Use the stable epoch for partial parity updates to make sure
these partial updates are not below the stable epoch boundary;
otherwise both EC and VOS aggregation might operate on
the same recxs at the same time, which can corrupt the data
during rebuild.

EC aggregation should also consider the un-aggregated epoch on
non-leader parity shards. Otherwise, if the leader parity fails, it is
excluded from the global EC stable epoch calculation immediately, and
before the leader parity is rebuilt the global stable epoch might pass
the un-aggregated epoch on the failed target. The partial updates on
the data shard might then be aggregated before EC aggregation, which
might cause data corruption.

It should also choose a shard with a lower fseq among all parity shards
as the aggregation leader, in case the last parity cannot be rebuilt in
time.

Signed-off-by: Di Wang <di.wang@intel.com>
Add missing properties to the check (for testing purpose) in ds_pool_query_handler.

Add missing DAOS_FAIL_ALWAYS to POOL10.

Clear fail_loc in the MGMT and POOL tests even if DAOS_FAIL_ONCE has been
requested. Other fail_loc-using tests will be cleaned up later.

Change-Id: Ied6c248763ec60fc722a1c636bad08ffff0cc58c
Signed-off-by: Li Wei <wei.g.li@intel.com>
Fix and clean up fail_loc usage in daos_test CONTAINER tests. Also, fix
bugs revealed by the fixed tests:

  - cont_iv_prop_l2g should set DAOS_CO_QUERY_PROP_SCRUB_DIS for
    DAOS_PROP_CO_SCRUBBER_DISABLED.

  - CONT_ACL_UPDATE should update the IV.

Change-Id: I1fa3a25d8283c9e5ef0b7ddaa76febd29b100cfb
Signed-off-by: Li Wei <wei.g.li@intel.com>
Correct some doxygen style formatting that was not valid doxygen.

Change-Id: If332fc006b7ed615903a19f1ee59337322a406c0
Signed-off-by: Ashley Pittman <ashley.m.pittman@intel.com>
Check RF and other performance before the retry check, so that
a disallowed write returns failure immediately
instead of retrying endlessly.

Use rebuild/reintegrate_pool_rank in daos_container test
to avoid DER_BUSY failure.

Change-Id: I421defce185a928ebd3e52f59f1b19247d90f420
Signed-off-by: Di Wang <di.wang@intel.com>
* DAOS-14010 rebuild: add delay rebuild

Add a "delay rebuild" healing mode. The delay rebuild process is:

1) SWIM detects dead ranks and reports them to the PS leader, which updates
the pool map, i.e. marks the related targets as DOWN.
2) The rebuild job is not scheduled, however, until there are further
manual pool operations, for example drain, extend, or reintegration.
3) All of these pool operations are then merged into one rebuild job,
which is then scheduled.

Update the placement algorithm so that it can calculate the layout with
merged pool operations.

Abort the rebuild job immediately if it finds a further pool map update,
so that the current job is merged into the following rebuild job. Concurrent
pool operations are therefore allowed, with no more EBUSY check.

Add various tests to verify the delay rebuild process.

Change-Id: If6f163345938bb7e1ee7550124770babd815c695
Signed-off-by: Di Wang <di.wang@intel.com>
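
The three-step delay rebuild flow in this commit message can be modeled with a small state machine. Everything in the sketch (type names, flag names, the bitmask representation) is an assumption for illustration and does not reflect the actual rebuild scheduler data structures.

```c
#include <stdbool.h>

/* Hypothetical operation flags; the names are illustrative only. */
enum pool_op {
	POOL_OP_EXCLUDE = 1u << 0, /* targets marked DOWN after SWIM detection */
	POOL_OP_DRAIN   = 1u << 1,
	POOL_OP_EXTEND  = 1u << 2,
	POOL_OP_REINT   = 1u << 3,
};

struct rebuild_sched {
	bool         delay_mode;  /* true when "delay rebuild" is selected */
	unsigned int pending_ops; /* operations not yet covered by a job */
	unsigned int jobs;        /* number of rebuild jobs scheduled */
	unsigned int last_job;    /* merged operations of the latest job */
};

/* Schedule one job covering everything that is pending. */
static void
schedule_merged_job(struct rebuild_sched *s)
{
	s->last_job    = s->pending_ops;
	s->pending_ops = 0;
	s->jobs++;
}

/* Steps 1 and 2: a dead rank updates the map but may not trigger a job. */
void
on_rank_dead(struct rebuild_sched *s)
{
	s->pending_ops |= POOL_OP_EXCLUDE;
	if (!s->delay_mode)
		schedule_merged_job(s);
}

/* Step 3: a manual operation folds in whatever was deferred. */
void
on_manual_op(struct rebuild_sched *s, enum pool_op op)
{
	s->pending_ops |= op;
	schedule_merged_job(s);
}
```

In delay mode, a rank failure followed by an extend yields a single job whose mask contains both operations.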
DAOS-16039 object: fix EC aggregation wrong peer address (#14593)
DAOS-16009 rebuild: fix O_TRUNC file size related handling
DAOS-15056 rebuild: add rpt to the rgt list properly (#13862)
DAOS-15517 rebuild: refine lock handling for rpt list (#14064)
DAOS-13812 container: fix destroy vs lookup (#12757)
DAOS-15627 dtx: reduce stack usage for DTX resync to avoid overflow (#14189)
DAOS-14845 rebuild: do not wait for EC agg for reclaim (#13610)

Signed-off-by: Xuezhao Liu <xuezhao.liu@intel.com>
Signed-off-by: Mohamad Chaarawi <mohamad.chaarawi@intel.com>
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
Signed-off-by: Wang, Di <wddi218@gmail.com>
Signed-off-by: Di Wang <di.wang@intel.com>
Signed-off-by: Wang Shilong <shilong.wang@intel.com>
Signed-off-by: Fan Yong <fan.yong@intel.com>
Disable the warning about deprecated support for the Python
version so that it doesn't fail the build.

Signed-off-by: Jeff Olivier <jeffolivier@google.com>
@jolivier23 jolivier23 requested review from a team as code owners July 11, 2024 14:25

Bug-tracker data:
Ticket title is 'Functional on <> / FTEST_dfuse.DaosBuild.<>-./dfuse/daos_build.py:DaosBuild.test_dfuse_daos_build_wt_il'
Status is 'Awaiting backport'
Labels: 'ci_master_daily,daily_test,pr_test,scrubbed_2.8'
Job should run at elevated priority (1)
https://daosio.atlassian.net/browse/DAOS-16168

@github-actions github-actions bot added the priority Ticket has high priority (automatically managed) label Jul 11, 2024
mjmac
mjmac previously approved these changes Jul 11, 2024
@jolivier23 jolivier23 requested review from a team as code owners July 11, 2024 14:28
@jolivier23 jolivier23 closed this Jul 11, 2024
@jolivier23 jolivier23 deleted the jvolivie/fix_scons_2.4 branch July 11, 2024 14:30
Collaborator

@daosbuild1 daosbuild1 left a comment

@@ -16,6 +16,8 @@ run-parts() {
for i in $(LC_ALL=C; echo "${dir%/}"/*[^~,]); do
# don't run vim .swp files
[ "${i%.sw?}" != "${i}" ] && continue
# for new repo, skip old changeId script
[ $(basename "${i}") == "20-user-changeId" ] && continue

(lint) Quote this to prevent word splitting. [SC2046]

@@ -1167,6 +1212,7 @@ crt_context_req_track(struct crt_rpc_priv *rpc_priv)
d_list_t *rlink;
d_rank_t ep_rank;
int rc = 0;
int quota_rc = 0;

Suggested change
int quota_rc = 0;
int quota_rc = 0;

int len = 0;

int res = sscanf(mountpoint, "/dev/fd/%u%n", &fd, &len);
if (res != 1) {

Suggested change
if (res != 1) {
int res = sscanf(mountpoint, "/dev/fd/%u%n", &fd, &len);

}

int fd_flags = fcntl(fd, F_GETFD);
if (fd_flags == -1) {

Suggested change
if (fd_flags == -1) {
int fd_flags = fcntl(fd, F_GETFD);

* fail for these paths.
*/
int fd = check_fd_mountpoint(dfuse_info->di_mountpoint);
if (fd == -1) {

Suggested change
if (fd == -1) {
int fd = check_fd_mountpoint(dfuse_info->di_mountpoint);

if (version != 0 && version < rpc_map_ver) {
D_DEBUG(DB_IO, DF_UUID" retry rpc ver %u > rebuilding %u\n",
DP_UUID(child->sc_pool_uuid), rpc_map_ver,
version);

Suggested change
version);
version);

Comment on lines +1556 to +1557
mrone, oh, &iod, 1, fetch_eph, update_eph,
DIOF_EC_RECOV_FROM_PARITY | DIOF_FOR_MIGRATION, ds_cont);

Suggested change
mrone, oh, &iod, 1, fetch_eph, update_eph,
DIOF_EC_RECOV_FROM_PARITY | DIOF_FOR_MIGRATION, ds_cont);
mrone, oh, &iod, 1, fetch_eph, update_eph,
DIOF_EC_RECOV_FROM_PARITY | DIOF_FOR_MIGRATION, ds_cont);

if (rebuild_ver == 0 || rebuild_ver != migrate_in->om_version) {
rc = -DER_SHUTDOWN;
DL_ERROR(rc, DF_UUID" rebuild ver %u om version %u",
DP_UUID(migrate_in->om_pool_uuid), rebuild_ver, migrate_in->om_version);
DP_UUID(migrate_in->om_pool_uuid), rebuild_ver, migrate_in->om_version);

Suggested change
DP_UUID(migrate_in->om_pool_uuid), rebuild_ver, migrate_in->om_version);
DP_UUID(migrate_in->om_pool_uuid), rebuild_ver, migrate_in->om_version);

* member of the specific placement map we're
* converting to.
* \param[in] map A pointer to a pl_map which is the first member of the specific placement
* map we're converting to.

Suggested change
* map we're converting to.
* map we're converting to.

debug_print_allow_status(allow_status);
layout->ol_ver = allow_version;
D_DEBUG(DB_PL, "Building layout. map version: %d/%u/%u/%u\n",
pl_map_version(&(jmap->jmp_map)), layout_ver, allow_version, gen_mode);

Suggested change
pl_map_version(&(jmap->jmp_map)), layout_ver, allow_version, gen_mode);
pl_map_version(&jmap->jmp_map), layout_ver, allow_version, gen_mode);

Labels
priority Ticket has high priority (automatically managed)