Add experimental support for APM latency with Exp Histogram #288

Merged
srikanthccv merged 2 commits into main from span-metrics-exp-hist on Feb 13, 2024

@srikanthccv srikanthccv commented Feb 11, 2024

Follow-up to #279 (comment). This is more accurate (i.e. closer to the latency computed from traces) and faster.

  • Not enabled by default
  • The default max size is 160, which supports a high-resolution tail latency distribution covering 1ms-100s (we need to bump the size if we want more). To achieve the same with an explicit bucket histogram, we would need to define 160 buckets, which would make the queries slower and increase the number of samples to about 10x the current count.
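
To make the 160-bucket figure concrete, here is a rough sketch of the arithmetic behind it. It assumes the usual base-2 exponential histogram mapping, where bucket i covers (base^i, base^(i+1)] with base = 2^(2^-scale), and it assumes latencies are measured in milliseconds. The function names are illustrative and are not taken from the PR or from the go-expohisto library.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns the bucket index for v > 0 at the given scale, using the
// convention that bucket i covers (base^i, base^(i+1)] with base = 2^(2^-scale).
func bucketIndex(v float64, scale int) int {
	return int(math.Ceil(math.Log2(v)*math.Exp2(float64(scale)))) - 1
}

// bucketsNeeded counts the buckets required to span [lo, hi] at a scale.
func bucketsNeeded(lo, hi float64, scale int) int {
	return bucketIndex(hi, scale) - bucketIndex(lo, scale) + 1
}

// maxScaleFor returns the largest scale in [-10, 20] whose bucket count for
// [lo, hi] fits within maxSize (falls back to -10 if none fits).
func maxScaleFor(lo, hi float64, maxSize int) int {
	for scale := 20; scale >= -10; scale-- {
		if bucketsNeeded(lo, hi, scale) <= maxSize {
			return scale
		}
	}
	return -10
}

// relativeError is the worst-case relative error when each bucket is reported
// by the point that minimizes that error: (base-1)/(base+1).
func relativeError(scale int) float64 {
	base := math.Exp2(math.Exp2(float64(-scale)))
	return (base - 1) / (base + 1)
}

func main() {
	// Assumption: values are in milliseconds, so 1ms-100s is [1, 100000].
	scale := maxScaleFor(1, 100000, 160)
	fmt.Printf("scale=%d buckets=%d relErr=%.4f\n",
		scale, bucketsNeeded(1, 100000, scale), relativeError(scale))
	// prints: scale=3 buckets=134 relErr=0.0433
}
```

Under these assumptions the next finer scale would need more than 160 buckets for the same range, which is why a wider range or finer resolution requires bumping the size.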

The query would change

From
SELECT
    service_name,
    ts,
    histogramQuantile(arrayMap(x -> toFloat64(x), groupArray(le)), groupArray(value), 0.99) AS value
FROM
(
    SELECT
        service_name,
        le,
        ts,
        sum(rate_value) AS value
    FROM
    (
        SELECT
            service_name,
            le,
            ts,
            If((value - lagInFrame(value, 1, 0) OVER rate_window) < 0, nan, If((ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window) >= 86400, nan, (value - lagInFrame(value, 1, 0) OVER rate_window) / (ts - lagInFrame(ts, 1, toDate('1970-01-01')) OVER rate_window))) AS rate_value
        FROM
        (
            SELECT
                fingerprint,
                service_name,
                le,
                toStartOfInterval(toDateTime(intDiv(timestamp_ms, 1000)), toIntervalSecond(60)) AS ts,
                max(value) AS value
            FROM signoz_metrics.distributed_samples_v2
            INNER JOIN
            (
                SELECT
                    JSONExtractString(labels, 'service_name') AS service_name,
                    JSONExtractString(labels, 'le') AS le,
                    fingerprint
                FROM signoz_metrics.time_series_v2
                WHERE (metric_name = 'signoz_latency_bucket') AND (temporality IN ['Cumulative', 'Unspecified'])
            ) AS filtered_time_series USING (fingerprint)
            WHERE (metric_name = 'signoz_latency_bucket')
            GROUP BY
                fingerprint,
                service_name,
                le,
                ts
            ORDER BY
                fingerprint ASC,
                service_name ASC,
                le ASC,
                ts ASC
        )
        WINDOW rate_window AS (PARTITION BY fingerprint, service_name, le ORDER BY fingerprint ASC, service_name ASC, le ASC, ts ASC)
    )
    WHERE isNaN(rate_value) = 0
    GROUP BY
        GROUPING SETS (
            (service_name, le, ts),
            (service_name, le))
    ORDER BY
        service_name ASC,
        le ASC,
        ts ASC
)
GROUP BY
    service_name,
    ts
ORDER BY
    service_name ASC,
    ts ASC
To
SELECT
    service_name,
    toStartOfInterval(toDateTime(intDiv(unix_milli, 1000)), toIntervalSecond(60)) AS ts,
    quantilesDDMerge(0.01, 0.99)(sketch)[1] AS value
FROM signoz_metrics.distributed_exp_hist
INNER JOIN
(
    SELECT
        JSONExtractString(labels, 'service_name') AS service_name,
        JSONExtractString(labels, 'le') AS le,
        fingerprint
    FROM signoz_metrics.time_series_v2
    WHERE (metric_name = 'signoz_latency')
) AS filtered_time_series USING (fingerprint)
WHERE (metric_name = 'signoz_latency')
GROUP BY
    service_name,
    ts
ORDER BY
    service_name ASC,
    ts ASC

Summary by CodeRabbit

  • New Features
    • Introduced support for exponential histograms in metrics processing, enhancing data representation and accuracy.
  • Enhancements
    • Added new configurations and functionalities to better handle and process exponential histograms in span metrics.

coderabbitai bot commented Feb 11, 2024

Walkthrough

The recent changes introduce support for exponential histograms in the span metrics processor. A new configuration option allows users to enable this feature, which is backed by the implementation of a new exponential histogram type, updates to the processor to handle these histograms, and functions to manage and collect histogram metrics. This enhancement aims to improve metric precision and efficiency in representing distribution data.

Changes

Files | Change Summary
--- | ---
.../config.go, .../factory.go | Added EnableExpHistogram boolean field to Config with default false.
.../processor.go | Introduced exponential histogram handling with new types, cache, and functions for metrics collection and processing.
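
The opt-in behaviour described above can be pictured with a small Go sketch. Only the EnableExpHistogram field name comes from the PR; the rest (the processor type, its fields, and the record method) is a hypothetical stand-in, and it assumes the existing explicit-bucket path keeps recording regardless of the flag.

```go
package main

import "fmt"

// Config is a trimmed-down stand-in; only EnableExpHistogram is from the PR.
type Config struct {
	EnableExpHistogram bool
}

// defaultConfig mirrors the described default: the feature is off.
func defaultConfig() Config {
	return Config{EnableExpHistogram: false}
}

// processor is a toy accumulator, not the real processorImp.
type processor struct {
	cfg          Config
	latencies    map[string][]float64 // stand-in for explicit-bucket histograms
	expLatencies map[string][]float64 // stand-in for exponential histograms
}

func newProcessor(cfg Config) *processor {
	return &processor{
		cfg:          cfg,
		latencies:    map[string][]float64{},
		expLatencies: map[string][]float64{},
	}
}

// record always updates the existing path (an assumption) and updates the
// exponential path only when the flag is enabled.
func (p *processor) record(key string, latencyMs float64) {
	p.latencies[key] = append(p.latencies[key], latencyMs)
	if p.cfg.EnableExpHistogram {
		p.expLatencies[key] = append(p.expLatencies[key], latencyMs)
	}
}

func main() {
	off := newProcessor(defaultConfig())
	off.record("frontend", 12.5)

	on := newProcessor(Config{EnableExpHistogram: true})
	on.record("frontend", 12.5)

	fmt.Println(len(off.expLatencies), len(on.expLatencies))
}
```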

🐰✨
In the realm of code, where data streams flow,
A rabbit hopped in, with a histogram glow.
"Exponential," it said, "is the way to go,
For metrics that scale and beautifully show."
So with a leap and a cheer, let's welcome the change,
For precision and efficiency, now within range.
🌟📊

@srikanthccv (Member, Author) commented

There is no apples-to-apples comparison, since one has 10x more buckets, yet it is roughly 8x faster with better accuracy (relative error no greater than 4% in the worst case) on sample data. Production data is the real ground on which to verify these numbers, though. This is disabled by default; I will enable it for select customers and do some analysis. If there are no concerns, I would like to see this get merged.
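
One way to check an accuracy claim like this is to compare a histogram-based quantile against the exact quantile of the raw latencies. The Go sketch below does that for a coarse explicit-bucket histogram using Prometheus-style linear interpolation. The histogramQuantile function in the original query may behave differently, and the bucket bounds and sample data here are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// exactQuantile returns the nearest-rank quantile of samples (0 < q <= 1).
func exactQuantile(samples []float64, q float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(math.Ceil(q*float64(len(s)))) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(s) {
		rank = len(s) - 1
	}
	return s[rank]
}

// cumulativeCounts returns, for each upper bound, how many samples are <= it.
func cumulativeCounts(samples, bounds []float64) []float64 {
	cum := make([]float64, len(bounds))
	for i, b := range bounds {
		for _, v := range samples {
			if v <= b {
				cum[i]++
			}
		}
	}
	return cum
}

// bucketQuantile estimates a quantile from cumulative bucket counts by linear
// interpolation inside the bucket that contains the target rank.
func bucketQuantile(bounds, cum []float64, q float64) float64 {
	target := q * cum[len(cum)-1]
	for i := range bounds {
		if cum[i] >= target {
			lo, prev := 0.0, 0.0
			if i > 0 {
				lo, prev = bounds[i-1], cum[i-1]
			}
			if cum[i] == prev {
				return bounds[i]
			}
			return lo + (bounds[i]-lo)*(target-prev)/(cum[i]-prev)
		}
	}
	return bounds[len(bounds)-1]
}

// relErr is the relative error of est against a positive exact value.
func relErr(est, exact float64) float64 {
	return math.Abs(est-exact) / exact
}

// demoSamples builds made-up data: 985 requests at 120 ms and 15 at 300 ms.
func demoSamples() []float64 {
	s := make([]float64, 0, 1000)
	for i := 0; i < 985; i++ {
		s = append(s, 120)
	}
	for i := 0; i < 15; i++ {
		s = append(s, 300)
	}
	return s
}

func main() {
	samples := demoSamples()
	bounds := []float64{100, 250, 500, 1000} // hypothetical explicit buckets (ms)
	exact := exactQuantile(samples, 0.99)
	est := bucketQuantile(bounds, cumulativeCounts(samples, bounds), 0.99)
	fmt.Printf("exact=%.1f estimate=%.1f relErr=%.3f\n", exact, est, relErr(est, exact))
}
```

With wide buckets, the interpolated estimate can land far from the true tail value, which is the kind of error that a finer, exponentially spaced layout is meant to bound.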

@nityanandagohain nityanandagohain left a comment

I am only reviewing the syntax here, as I don't have the expertise. Since you want to test this on customer instances, I have no concerns as of now.

Also, it would be great if you could do a short session on exponential histograms 👍

@srikanthccv srikanthccv merged commit 936dc65 into main Feb 13, 2024
2 checks passed
@srikanthccv srikanthccv deleted the span-metrics-exp-hist branch February 13, 2024 12:30
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 44c3807 and 5f0e95d.
Files selected for processing (3)
  • processor/signozspanmetricsprocessor/config.go (1 hunks)
  • processor/signozspanmetricsprocessor/factory.go (1 hunks)
  • processor/signozspanmetricsprocessor/processor.go (17 hunks)
Additional comments: 19
processor/signozspanmetricsprocessor/factory.go (1)
  • 51-51: The addition of EnableExpHistogram with a default value of false is aligned with the PR objectives and best practices for introducing experimental features.
processor/signozspanmetricsprocessor/config.go (1)
  • 87-87: The addition of the EnableExpHistogram boolean field to the Config struct is necessary for the feature's configuration and follows good coding practices.
processor/signozspanmetricsprocessor/processor.go (17)
  • 71-73: The exponentialHistogram struct is introduced correctly with a field for the histogram. Ensure that the structure.Histogram[float64] type from the go-expohisto library is the most suitable choice for the intended use case.
  • 88-88: Adding expDimensions to the processorImp struct is appropriate for handling dimensions specific to exponential histograms. This aligns with the existing design pattern for handling dimensions.
  • 100-100: The introduction of expHistograms map in the processorImp struct is consistent with the existing structure for handling histograms. This is necessary for storing exponential histogram data.
  • 116-116: The addition of expHistogramKeyToDimensions cache is consistent with the design pattern of using caches for dimension mapping. This is crucial for performance optimization when handling exponential histograms.
  • 158-185: The function expoHistToExponentialDataPoint correctly converts an exponential histogram to an exponential data point. However, ensure that the handling of positive and negative buckets is correctly implemented according to the expected behavior of the structure.Histogram library.
  • 187-189: The Observe method in the exponentialHistogram struct correctly updates the histogram with a new value. This is a standard approach for histogram observation.
  • 230-233: Initializing the expHistogramKeyToDimensionsCache with the configured cache size is done correctly. This ensures that the cache size is configurable and aligns with the existing pattern for other caches.
  • 259-259: Creating the expHistograms map during processor initialization is necessary for storing exponential histogram data. This aligns with the initialization of other histogram maps.
  • 268-268: Adding expDimensions during processor initialization ensures that exponential histogram-specific dimensions are correctly handled. This is consistent with the handling of other dimensions.
  • 274-274: Initializing the expHistogramKeyToDimensions cache during processor setup is done correctly. This ensures that the cache is ready for use when processing spans.
  • 484-487: The collectExpHistogramMetrics function is correctly implemented to collect and write exponential histogram metrics. Ensure that the metrics are correctly aggregated and reported according to the configured aggregation temporality.
  • 515-519: The logic to delete keys from expHistograms that are no longer present in the expHistogramKeyToDimensions cache is correctly implemented. This helps in managing memory usage effectively.
  • 571-594: The implementation of collectExpHistogramMetrics for collecting and writing exponential histogram metrics is correct. Ensure that the data points are correctly populated and that the dimensions are accurately copied to the metrics.
  • 714-720: The getDimensionsByExpHistogramKey function correctly retrieves dimensions from the expHistogramKeyToDimensions cache. This is consistent with the pattern used for other dimension retrieval functions.
  • 889-895: The conditional logic to handle exponential histograms based on the EnableExpHistogram configuration is correctly implemented. This ensures that the feature is opt-in and does not affect existing functionality unless explicitly enabled.
  • 958-964: Resetting the expHistograms map and purging the expHistogramKeyToDimensions cache during the reset of accumulated metrics is correctly implemented. This is necessary for correctly handling delta metrics and managing memory usage.
  • 988-1004: The updateExpHistogram function correctly updates the exponential histogram with a new latency value. Ensure that the histogram configuration (e.g., max size) is appropriately set for the intended use case.
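
The pruning and reset behaviour mentioned in the comments above (dropping histograms whose key has left the dimensions cache, and clearing state when accumulated metrics are reset) could look roughly like this. It is a minimal sketch with plain maps standing in for the real histogram type and cache; the store type and its methods are hypothetical and are not the code from processor.go.

```go
package main

import "fmt"

// store is a toy stand-in for the processor state.
type store struct {
	expHistograms map[string]int      // key -> observation count (stand-in for a histogram)
	keyToDims     map[string][]string // stand-in for the dimensions cache
}

// prune deletes histogram entries whose key is no longer in the cache and
// returns how many were removed. Deleting while ranging over a map is safe in Go.
func (s *store) prune() int {
	removed := 0
	for k := range s.expHistograms {
		if _, ok := s.keyToDims[k]; !ok {
			delete(s.expHistograms, k)
			removed++
		}
	}
	return removed
}

// reset clears all accumulated state, as one would after exporting deltas.
func (s *store) reset() {
	s.expHistograms = map[string]int{}
	s.keyToDims = map[string][]string{}
}

func main() {
	s := &store{
		expHistograms: map[string]int{"svc-a": 10, "svc-b": 4},
		keyToDims:     map[string][]string{"svc-a": {"service_name=a"}}, // svc-b was evicted
	}
	fmt.Println("pruned:", s.prune(), "remaining:", len(s.expHistograms))
	s.reset()
	fmt.Println("after reset:", len(s.expHistograms), len(s.keyToDims))
}
```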
