Valgrind Reporting Many Warnings With Graphs and Contexts #544

Open
lkaneda opened this issue Jan 6, 2023 · 9 comments

@lkaneda

lkaneda commented Jan 6, 2023

Running valgrind on our software, we found many errors coming from TIM-VX related to the graph and context instances. To check whether the problem was in our software or internal to the library, we ran valgrind against the lenet example provided in this repo and saw the same output. I've attached the valgrind log here. It's hard for me to tell whether this is a tim-vx issue, an OpenVX issue, or potentially an issue on our side, so I'm hoping this log helps pin down what is happening.

The trend I see in the log is that it happens in all tim-vx functions: creating, initializing, validating (compile), executing (run), and destroying.

The command we ran to get this output:
valgrind --tool=memcheck --leak-check=full --error-limit=no --log-file="{filename}" ./{program executable filename}

valgrindOutput3.txt
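
To narrow down whether the reports come from tim-vx, the OpenVX driver, or the application, it can help to tally which shared objects appear in the stack frames of the log. The following is a minimal sketch, assuming the usual memcheck text layout in which a frame without debug info ends in `(in /path/to/lib.so)`. The function name `count_objects` is illustrative and not part of any tool.

```python
import re
from collections import Counter

# Matches a memcheck stack frame line such as:
#   ==123==    by 0x4A2B3C: some_func (in /usr/lib/libfoo.so)
# Frames that carry debug info end in "(file.c:42)" instead and are skipped.
FRAME_RE = re.compile(
    r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: .*\(in (?P<obj>[^)]+)\)\s*$"
)


def count_objects(log_text):
    """Count how many stack frames fall inside each shared object."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            counts[match.group("obj")] += 1
    return counts
```

Reading the attached log into a string and calling `count_objects(text).most_common()` would list the objects by frame count, which gives a rough picture of where the reports concentrate. It does not by itself say which library is at fault.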

@sunshinemyson
Contributor

@lhawana ,

Thanks for sharing. We are working on this internally and will keep you posted once we have addressed these issues.

@sunshinemyson
Contributor

@lhawana ,

We fixed some of the issues detected by valgrind in tim-vx/vx-delegate over the past month. You can check the commit history for the fixes.

We also double-checked and confirmed that most of the issues reported in our low-level driver are false positives.

Thanks
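
For reports confirmed as false positives inside a driver library, memcheck can be told to ignore them with a suppressions file passed via `--suppressions=<file>`. Below is a minimal sketch that builds one suppression entry per library. The entry name and library pattern are placeholders, since the thread does not say which driver objects are involved, and the list of error kinds is only a subset of those memcheck supports.

```python
# A subset of memcheck suppression kinds that need no extra lines.
# ("Param" is left out because it requires an additional syscall line.)
SIMPLE_KINDS = {"Cond", "Leak", "Addr4", "Addr8", "Value4", "Value8"}


def make_suppression(name, kind, obj_pattern):
    """Build one suppression entry for stacks that pass through obj_pattern.

    The leading "..." lets any number of frames precede the matching
    object, so the entry applies wherever the library shows up in the stack.
    """
    if kind not in SIMPLE_KINDS:
        raise ValueError(f"unsupported kind for this sketch: {kind}")
    lines = [
        "{",
        f"   {name}",
        f"   Memcheck:{kind}",
        "   ...",
        f"   obj:{obj_pattern}",
        "}",
    ]
    return "\n".join(lines) + "\n"
```

An entry this broad also hides genuine errors whose stacks pass through the same library, so it is safer to begin from the exact entries valgrind prints with `--gen-suppressions=all` and widen them only where needed.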

@BralSLA

BralSLA commented Jun 29, 2023

Hey @sunshinemyson, sorry for the silence on this ticket; I'll be handling it from here. Do you have a particular commit that I should check out? I tried merging this commit without success. Valgrind still reports the errors/warnings after merging just the change in this file.

If I try merging the entire file, there are a lot of other dependencies I have to merge as well to get it to compile in our version, and even still I'm unable to get it to load successfully.

Is there either a commit you can point me to that should have this addressed, or are you able to tell me what I need to merge between the version of tim-vx we are using, and this version?

Thanks

@sunshinemyson
Contributor

@BralSLA ,

Can you update to the latest version? We don't currently maintain legacy versions.

Thanks

@BralSLA

BralSLA commented Jul 20, 2023

Hey @sunshinemyson ,

I've updated to the latest version, but it's failing to compile in my yocto build. I'm getting the 2 following errors:

In constructor 'tim::vx::ops::Topk::Topk(tim::vx::Graph*, uint32_t, int32_t)':
| /home/slroot/build_001/NXPBuild/build-ucm-imx8m-plus/workspace/sources/tim-vx/src/tim/vx/ops/topk.cc:37:39: error: 'vsi_nn_topk_param' {aka 'struct _vsi_nn_topk_param'} has no member named 'axis'
|    37 |   this->impl()->node()->nn_param.topk.axis = axis;

If I look at the definition of the vsi_nn_topk_param struct, it does have an axis member, but I'm not sure which header the struct is actually being pulled from at the moment, so I can't tell whether the Topk class is seeing that definition as it appears it should. I suppose it's also possible that something in our Yocto build process is interfering.
Have you run into this error?

Thanks again

@BralSLA

BralSLA commented Aug 4, 2023

Hey @sunshinemyson ,

Update: I needed to update to the latest available GPU drivers and then clean my build environment. The latest version now builds and is on our system; however, since upgrading tim-vx and updating our GPU drivers to be compatible, we are getting a segfault.
We are seeing a lot of "Create Tensor Fail" messages in the output, followed by the segfault.
Walking up the call stack from the segfault leads to our call to CreateOperation(). Below is the output:

Program received signal SIGSEGV, Segmentation fault.
0x0000fffff7ddcca0 in tim::vx::BuiltinOpImpl::SetRoundingPolicy(tim::vx::OverflowPolicy, tim::vx::RoundingPolicy, tim::vx::RoundType, unsigned int) () from /usr/lib/libtim-vx.so
Segmentation fault

Here is the line where we are calling CreateOperation()

auto conv1 = this->graph->CreateOperation<tim::vx::ops::Conv2d>(conv1_weight_shape[3], conv1_pad_type, conv1_ksize, conv1_stride, conv1_dilation, conv1_pad);

Do you know why this may be happening? Let me know if you need any more information.

Thanks again

@sunshinemyson
Contributor

@BralSLA

I suppose you are hitting this issue on an NXP platform. We haven't received any such report internally, even though we run daily tests on NXP platforms.

Can you provide more version information about your system and driver so that I can forward it to NXP?

@BralSLA

BralSLA commented Aug 9, 2023

@sunshinemyson
Thanks for getting back to me. We are running Yocto version 5.15.71, with the GPU driver version imx-gpu-viv-6.4.11.

Let me know if you need any more info.

@sunshinemyson
Contributor

@BralSLA ,

We haven't seen this issue in our internal tests or heard of it from NXP. Since the crash location is unusual, I suspect your build is not clean. Please double-check that you built tim-vx against the external SDK correctly; this looks like a binary incompatibility issue.

BTW, our CI verifies TIM-VX on an NXP i.MX 8MP silicon board with the 6.4.11 driver for each patch, and no such issue has appeared.
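
One way to check for the kind of binary mismatch suggested here is to look at which copy of each shared library the executable resolves at load time, for instance to spot a stale libtim-vx.so left over from an earlier build. The following is a minimal sketch that parses `ldd` output. It assumes the common glibc `name => path (address)` line format, and the function name `resolved_libraries` is illustrative.

```python
import re

# Matches ldd lines such as:
#   libtim-vx.so => /usr/lib/libtim-vx.so (0x0000ffff9c000000)
#   libmissing.so => not found
# Lines without "=>" (the vdso, the dynamic loader) are skipped.
LDD_RE = re.compile(
    r"^\s*(?P<name>\S+) => (?P<path>.+?)(?: \(0x[0-9A-Fa-f]+\))?\s*$"
)


def resolved_libraries(ldd_output):
    """Map each library name to its resolved path, or None if not found."""
    libs = {}
    for line in ldd_output.splitlines():
        match = LDD_RE.match(line)
        if not match:
            continue
        path = match.group("path")
        libs[match.group("name")] = None if path == "not found" else path
    return libs
```

Feeding it the text printed by `ldd` for the application binary, and comparing the resulting paths with where the freshly built tim-vx and driver libraries were installed, would show whether an older copy is being picked up. This only covers libraries linked at build time, not ones loaded later with `dlopen`.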
