[otlp] Fix panic in dropped count (again!) #3538
Conversation
This doesn't accurately count the dropped samples. For example, if a single metric with multiple samples is faulty, we get a single error rather than an error per sample. But I believe it's the best best-effort measurement.

Before, we used to do `DatapointCount() - samplesInMap()`. The problem is the following:

1. `target_info` is a synthetic metric added in Prometheus, so the final sample count could be higher.
2. A single histogram datapoint in OTLP corresponds to many samples in Prometheus.

Signed-off-by: Goutham Veeramachaneni <gouthamve@gmail.com>
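To illustrate point 2 above: a classic Prometheus histogram is exposed as one `_bucket` series per bucket (including `+Inf`), plus a `_sum` and a `_count` series, so a single OTLP histogram datapoint can expand into many samples. A minimal sketch (the function name and bucket count here are hypothetical, not from the PR's code):

```go
package main

import "fmt"

// prometheusSamplesForHistogram is a hypothetical helper showing why
// DatapointCount() underestimates the Prometheus sample count: one OTLP
// histogram datapoint with N explicit buckets becomes N _bucket samples,
// plus the +Inf bucket, plus one _sum and one _count sample.
func prometheusSamplesForHistogram(explicitBuckets int) int {
	return explicitBuckets + 1 + 2
}

func main() {
	// One datapoint with 10 explicit buckets yields 13 Prometheus samples.
	fmt.Println(prometheusSamplesForHistogram(10))
}
```

This is why `DatapointCount() - samplesInMap()` can go negative: the subtrahend counts post-translation samples while the minuend counts pre-translation datapoints.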
🚀
Code change looks good to me.
It's a little strange to me that the tenant check is only done in case of errors. Should we do it before we even try to convert the metrics?
The added unit tests don't seem to check for dropped metrics at all. In other words, they seem unrelated to the PR. 🤔
It is on an authenticated route so the check essentially NEVER fails. It only exists to load the
I first wrote the unit tests to reproduce the panic and then proceeded to fix it, which is why the unit tests seem unrelated.
This reverts commit 3744cb5.
Checklist
- Documentation added
- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]