Report invalid profiles #62

Closed
junotx opened this issue Jul 3, 2024 · 9 comments

@junotx

junotx commented Jul 3, 2024

I tried to configure the agent to report profiles to a server that implements the ProfilesServiceServer interface, but it reports the following error:

INFO[0000] Starting OTEL profiling agent v0.0.0 (revision main-41f251a7, build timestamp 1719993165) 
INFO[0000] Interpreter tracers: perl,php,python,hotspot,ruby,v8,dotnet 
INFO[0000] Automatically determining environment and machine ID ... 
INFO[0000] Environment: hardware, machine ID: 0xd391511e7677f6b0 
INFO[0000] Assigned ProjectID: 1 HostID: 1409997349622118064 
INFO[0000] Found offsets: task stack 0x20, pt_regs 0x3f58, tpbase 0x1528 
INFO[0000] Supports generic eBPF map batch operations   
INFO[0000] Supports LPM trie eBPF map batch operations  
INFO[0000] eBPF tracer loaded                           
INFO[0004] Attached tracer program                      
INFO[0004] Attached sched monitor                       
INFO[0004] Environment variable KUBERNETES_SERVICE_HOST not set 
ERRO[0010] Request failed: rpc error: code = Unknown desc = invalid request: invalid profile: sample value length 1 does not match sample type length 143 
ERRO[0016] Request failed: rpc error: code = Unknown desc = invalid request: invalid profile: sample value length 1 does not match sample type length 70 
ERRO[0020] Request failed: rpc error: code = Unknown desc = invalid request: invalid profile: sample value length 1 does not match sample type length 84 
ERRO[0026] Request failed: rpc error: code = Unknown desc = invalid request: invalid profile: sample value length 1 does not match sample type length 50 
ERRO[0030] Request failed: rpc error: code = Unknown desc = invalid request: invalid profile: sample value length 1 does not match sample type length 99

It looks like the reported profiles do not follow the spec: https://github.com/open-telemetry/oteps/blob/main/text/profiles/0239-profiles-data-model.md#message-sample requires that all samples have the same number of values, equal to the length of Profile.sample_type.
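
To make the requirement concrete, here is a minimal sketch of the kind of check a receiver might run. The `Sample` and `Profile` types and the `validateSampleValues` helper are simplified, hypothetical stand-ins, not the actual OTLP messages or the server's real validation code.

```go
package main

import "fmt"

// Sample and Profile are simplified, hypothetical stand-ins for the OTLP
// profiles messages; only the fields needed for the check are modeled.
type Sample struct {
	Value []int64
}

type Profile struct {
	SampleType []string
	Sample     []Sample
}

// validateSampleValues reports the first sample whose number of values
// differs from the number of sample types declared on the profile.
func validateSampleValues(p Profile) error {
	want := len(p.SampleType)
	for i, s := range p.Sample {
		if got := len(s.Value); got != want {
			return fmt.Errorf("sample %d: value length %d does not match sample type length %d", i, got, want)
		}
	}
	return nil
}

func main() {
	ok := Profile{
		SampleType: []string{"samples"},
		Sample:     []Sample{{Value: []int64{3}}, {Value: []int64{1}}},
	}
	bad := Profile{
		SampleType: []string{"samples", "cpu"},
		Sample:     []Sample{{Value: []int64{3}}},
	}
	fmt.Println(validateSampleValues(ok))
	fmt.Println(validateSampleValues(bad))
}
```

In this sketch the second profile is rejected because its only sample carries one value while two sample types are declared, which is the same shape of mismatch as in the log above.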

@florianl
Contributor

florianl commented Jul 3, 2024

Hi @junotx

Thanks for reaching out! Is the implementation you are using available somewhere? There might be some confusion caused by the log message shown.
Are these logs referencing Sample.value and Profile.sample_type?

@junotx
Author

junotx commented Jul 3, 2024

@florianl yes, the reference and returned error in the server code is here

@brancz

brancz commented Jul 3, 2024

@junotx the otel agent and Parca aren't compatible with each other yet. There are a few things to work out about how the agent and Parca are supposed to interpret the data. I'll start a separate thread about this specific case.

@brancz

brancz commented Jul 3, 2024

Opened a thread here: #63

@florianl
Contributor

This issue should be fixed by the merge of #77 - can you confirm this @junotx ?

@brancz

brancz commented Jul 11, 2024

This specific error, yes. After a small additional bug fix in Parca, data can now be ingested, but some fields are either not set or set incorrectly, so it doesn't quite work yet (but we're very close!).

  1. Set the duration to be the actual duration over which the data was collected. Parca could (and probably should) fall back to end timestamp - start timestamp, but the otel-profiling-agent currently sets these values in a way that may result in 0 if only one sample was collected. So either the duration needs to be set "correctly", or the start and end timestamps need to be based not on the samples but on the actual times when collection started and ended. Parca needs either this or (4).
  2. Set all mapping values to 0 (start, limit, file offset) for symbolization to work correctly.
  3. (nice to have) It would be great if .ResourceProfiles.ScopeProfiles.Scope.Name was set, as currently it will say "unknown" as a fallback.
  4. (nice to have) Set aggregation temporality to delta in sample type.
  5. (nice to have) Prefer reporting the GNU build ID when available for native binaries.
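
The fallback mentioned in item 1 can be sketched roughly as follows. `effectiveDurationNanos` is a hypothetical receiver-side helper written for illustration; it is not Parca's actual code, and the parameter names are assumptions.

```go
package main

import "fmt"

// effectiveDurationNanos is a hypothetical receiver-side helper: it prefers
// the reported duration and falls back to end - start when the duration is
// not set.
func effectiveDurationNanos(durationNanos int64, startUnixNano, endUnixNano uint64) int64 {
	if durationNanos > 0 {
		return durationNanos
	}
	if endUnixNano > startUnixNano {
		return int64(endUnixNano - startUnixNano)
	}
	// Neither source is usable, for example a single-sample profile
	// where start and end are the same timestamp.
	return 0
}

func main() {
	fmt.Println(effectiveDurationNanos(5_000_000_000, 100, 200))
	fmt.Println(effectiveDurationNanos(0, 1_000, 4_000))
	fmt.Println(effectiveDurationNanos(0, 1_000, 1_000))
}
```

The last call shows why the fallback alone is not enough: when start and end are derived from a single sample, the result is still 0.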

@florianl
Contributor

  1. Set the duration to be the actual duration over which the data was collected. [..]

The otel-profiling agent does set Profile.duration_nanos with profile.DurationNanos = int64(endTS - startTS). For a profile that contains only a single sample, Profile.duration can be 0. The reporting part of otel-profiling-agent is not aware of the timeframe in which a number of samples were collected. So Duration, as well as StartTimeUnixNano and EndTimeUnixNano, are extracted from the samples that are reported.
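
As an illustration of why a single-sample profile ends up with a duration of 0, here is a minimal sketch of deriving the time range from sample timestamps. `timeRange` is a hypothetical helper, not the agent's actual reporter code; only the `int64(endTS - startTS)` expression comes from the description above.

```go
package main

import "fmt"

// timeRange derives start, end and duration from per-sample timestamps
// (Unix nanoseconds). It is an illustrative sketch of the endTS - startTS
// computation, not the agent's real implementation.
func timeRange(timestamps []uint64) (startTS, endTS uint64, durationNanos int64) {
	if len(timestamps) == 0 {
		return 0, 0, 0
	}
	startTS, endTS = timestamps[0], timestamps[0]
	for _, ts := range timestamps[1:] {
		if ts < startTS {
			startTS = ts
		}
		if ts > endTS {
			endTS = ts
		}
	}
	return startTS, endTS, int64(endTS - startTS)
}

func main() {
	_, _, many := timeRange([]uint64{1_000, 4_000, 2_500})
	_, _, single := timeRange([]uint64{1_000})
	fmt.Println(many, single)
}
```

With only one timestamp, start and end coincide, so the computed duration is 0 regardless of how long the collection window actually was.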

  3. (nice to have) It would be great if .ResourceProfiles.ScopeProfiles.Scope.Name was set, as currently it will say "unknown" as a fallback.

This might be my personal interpretation of Instrumentation scope, but it reads to me as if full-system profiling is not a topic that is well covered by Instrumentation scope. What value would you set for Scope.Name and Scope.Version?

  5. (nice to have) Prefer reporting the GNU build ID when available for native binaries.

A semantic convention could help to report GNU build IDs. So I assume this will be possible soonish, once the pprof compatibility topic is clear.

@brancz

brancz commented Jul 12, 2024

The otel-profiling agent does set Profile.duration_nanos with profile.DurationNanos = int64(endTS - startTS). For a profile that contains only a single sample, Profile.duration can be 0. The reporting part of otel-profiling-agent is not aware of the timeframe in which a number of samples were collected. So Duration, as well as StartTimeUnixNano and EndTimeUnixNano, are extracted from the samples that are reported.

My bad, I thought it didn't because I see 0 pretty much all of the time. Any objections to setting aggregation temporality to delta?

This might be my personal interpretation of Instrumentation scope, but it reads to me as if full-system profiling is not a topic that is well covered by Instrumentation scope. What value would you set for Scope.Name and Scope.Version?

I kind of see it as the "producer". So I think I'd set Name to otel-profiling-agent (or whatever the new one will be when moved/renamed) and "Version" to the version, which could just be "alpha" for now until there is a tagged release.

A semantic convention could help to report GNU build IDs. So I assume this will be possible soonish, once the pprof compatibility topic is clear.

I agree, though it doesn't hurt to set the current mapping's build ID to the GNU build ID, since that's the only thing pprof could possibly know what to do with, no?

@junotx
Author

junotx commented Jul 17, 2024

This issue should be fixed by the merge of #77 - can you confirm this @junotx ?

Yes, the error is gone when I run a build from the latest code, except that some fields in Parca don't quite work correctly yet, which may be an issue on the Parca side. So I will close this issue.

@junotx junotx closed this as completed Jul 17, 2024