Add 'memory request' feature #11039

Open
mike9421 opened this issue Nov 20, 2023 · 16 comments
Labels
enhancement New feature or request

Comments

@mike9421

Component(s)

No response

Is your feature request related to a problem? Please describe.

While using OTEL, we found that when the system memory is insufficient, the startup of OTEL will cause a system OOM, which in turn causes user processes to exit.

Describe the solution you'd like

Before OTEL starts, it reads the available memory of the runtime environment (k8s, docker, etc.) and compares it with the value configured in 'memory request'. If the available memory cannot meet the 'memory request', OTEL does not start and reports that system memory is insufficient.

The configurable solution is as follows:

  1. Expose the 'memory request' configuration item to users. OTEL can set a default value for this configuration item.
  2. There are two options for where the configuration item lives in the configuration file:
    a. Place it in an extension.
    Reason: 'memory request' does not need to access telemetry data and does not belong to the pipeline.
    b. Add a 'config' option to the service configuration and make 'memory request' a subset of it (recommended).
    Reason: 'memory request' is a check that must be carried out before OTEL starts. If it runs as an extension, the start order of extensions could easily cause the 'memory request' check to fail.
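
The check described above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions, not collector code: the function names `parseMemAvailable` and `checkMemoryRequest` and the 120 MiB request value are hypothetical, and the sketch reads `MemAvailable` from `/proc/meminfo`, which exists only on Linux and, inside a container, typically reflects the host rather than the container's cgroup limit.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemAvailable extracts the MemAvailable value, converted to bytes,
// from text in /proc/meminfo format (values there are reported in kB).
func parseMemAvailable(meminfo string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected line shape: "MemAvailable:   123456 kB"
		if len(fields) >= 2 && fields[0] == "MemAvailable:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, fmt.Errorf("parse MemAvailable: %w", err)
			}
			return kb * 1024, nil
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("MemAvailable not found")
}

// checkMemoryRequest returns an error when the available memory is below
// the (hypothetical) 'memory request' value.
func checkMemoryRequest(availableBytes, requestBytes uint64) error {
	if availableBytes < requestBytes {
		return fmt.Errorf("insufficient memory: %d bytes available, %d bytes requested",
			availableBytes, requestBytes)
	}
	return nil
}

func main() {
	const requestBytes = 120 << 20 // hypothetical 'memory request' of 120 MiB

	raw, err := os.ReadFile("/proc/meminfo") // Linux only
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/meminfo:", err)
		os.Exit(1)
	}
	available, err := parseMemAvailable(string(raw))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := checkMemoryRequest(available, requestBytes); err != nil {
		fmt.Fprintln(os.Stderr, "refusing to start:", err)
		os.Exit(1)
	}
	fmt.Println("memory request satisfied; startup may proceed")
}
```

Whether such a check would live in an extension or under the service configuration is exactly the open question in option 2; the sketch takes no position on that.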

Describe alternatives you've considered

No response

Additional context

We found that the 'ballast' component may itself consume memory, so the 'memory request' feature might need to take this into account.

@mike9421 added the enhancement label Nov 20, 2023
@crobert-1
Member

Hello @10249421, historically we've used the memory ballast extension to limit memory, but it will be deprecated soon in favor of using the GOMEMLIMIT environment variable.

Setting the golang environment variable should be enough to save you from hitting system OOM errors, but please let us know if it's not effective for your use case.

@crobert-1 closed this as not planned Nov 20, 2023
@crobert-1
Member

There's also the memory limiter processor if that fits your use case better.

@mike9421
Author

Hello @10249421, historically we've used the memory ballast extension to limit memory, but it will be deprecated soon in favor of using the GOMEMLIMIT environment variable.

Setting the golang environment variable should be enough to save you from hitting system OOM errors, but please let us know if it's not effective for your use case.

Excuse me, I have read about the GOMEMLIMIT environment variable. However, it doesn't quite match the feature I expected. The 'memory request' feature should not limit the memory usage of OTEL, but stop OTEL from running when the system's available memory is less than the value specified by 'memory request'.

@crobert-1
Member

crobert-1 commented Nov 21, 2023

My apologies, I'm just having a bit of a hard time understanding the value here over existing options.

the startup of OTEL will cause a system OOM, which in turn causes user processes to exit

In this comment are you saying that when the collector starts up in your environment, it ends up OOMing other unrelated processes?

I understand the difference between options I've suggested and your use case, but I'm trying to understand in what scenario we don't even want the collector to run.

@crobert-1 reopened this Nov 21, 2023
@mike9421
Author

In this comment are you saying that when the collector starts up in your environment, it ends up OOMing other unrelated processes?

Here's an example:
The server has 500MB of memory in total. Assume the business process is currently at its peak memory usage of 400MB (typically because user activity in the business process has increased).

If the network administrator deploys and starts OTEL at this point, then while the components in the OTEL configuration file are being started (not all components have started yet, and no telemetry data has been received), the following may occur:

  1. If the memory usage of OTEL exceeds 100MB, a system OOM is triggered, and the business process is likely to be killed by the system.

  2. If OTEL provides the 'memory request' feature and the user sets 'memory request' to 120MB, then in the situation above OTEL stops creating and starting components and reports insufficient memory, so no system OOM occurs. The business process keeps running, and the user is told that the available memory is not enough to start OTEL and can try again after a while.

The 'memory request' is the minimum requirement for the currently available system memory before all OTEL components are started. This ensures that the business process continues running before OTEL is completely started.
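
The arithmetic of this example can be written out as a small Go sketch. The function name `canStart` is hypothetical, and the figures are the ones from the example, treated as whole megabytes.

```go
package main

import "fmt"

// canStart reports whether the memory left over after other processes
// covers the requested minimum. All values are in the same unit (MB here).
func canStart(totalMB, usedMB, requestMB uint64) bool {
	if usedMB > totalMB {
		return false // guard against unsigned underflow on inconsistent input
	}
	return totalMB-usedMB >= requestMB
}

func main() {
	// 500MB total, 400MB used by the business process, 120MB requested:
	// only 100MB is free, so startup is refused.
	fmt.Println(canStart(500, 400, 120))

	// If the business process later drops to 300MB, 200MB is free and
	// the same request can be met.
	fmt.Println(canStart(500, 300, 120))
}
```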

@crobert-1
Member

So to summarize, this request is essentially to set a minimum memory usage boundary for the collector. If 100MB is not available (using the example you've provided), don't allow the collector to start, as the collector can't accomplish what it needs to do with less than 100MB of memory.

However, what happens when available memory for the collector dips below 100MB when it's running, due to environmental changes (other processes using more memory)? This proposed solution doesn't have any impact, and the situation you're concerned about is not addressed.

Wouldn't it make sense to just do a check independent of the collector to see what your environment's available memory is, and then just not start the collector if it's too low? Instead of it being a collector-specific functionality? Is there some reason why that's insufficient?

@mike9421
Author

So to summarize, this request is essentially to set a minimum memory usage boundary for the collector. If 100MB is not available (using the example you've provided), don't allow the collector to start, as the collector can't accomplish what it needs to do with less than 100MB of memory.

Yes, this is the case.

However, what happens when available memory for the collector dips below 100MB when it's running, due to environmental changes (other processes using more memory)? This proposed solution doesn't have any impact, and the situation you're concerned about is not addressed.

'memory request' is used to determine whether the available memory is sufficient before all OTEL components are started; after startup, the existing 'memory limit' component is used to limit OTEL's memory at runtime.

If environmental changes (other processes using more memory) cause a shortage of available memory, and OTEL has not exceeded the 'memory limit', we are powerless in this situation as we cannot determine how much memory other processes are using.

Since OTEL is a program designed to help users analyze software performance and behavior, I think what we can do is: ensure that the system's available memory meets the minimum requirement ('memory request') before all OTEL components start, and, while OTEL is running, try as far as possible not to exceed the configured upper limit (the 'memory limit' component).

Wouldn't it make sense to just do a check independent of the collector to see what your environment's available memory is, and then just not start the collector if it's too low? Instead of it being a collector-specific functionality? Is there some reason why that's insufficient?

A free-memory check independent of OTEL works, but since every user who wants this behavior would have to implement the check themselves, might it be better for OTEL to provide this feature?

@crobert-1
Member

Thanks for clarifying @10249421. I'm going to defer to others here for now to see if anyone else has thoughts here 👍

@atoulme
Contributor

atoulme commented Nov 29, 2023

@10249421 sounds like something you'd do outside the otel collector as part of a service initialization. There are a lot of implications to what you're referring to, since just because the collector could have enough memory to start initially doesn't guarantee it won't clash with user processes a minute in. Memory measurement at a point in time is not going to give you enough guarantees that it's safe to run the collector if it's competing with user processes for resources.

You need to size all the processes on your box so it's possible to keep everything running.

@mike9421
Author

mike9421 commented Dec 4, 2023

@atoulme Thanks for your answer.

@10249421 sounds like something you'd do outside the otel collector as part of a service initialization. There are a lot of implications to what you're referring to, since just because the collector could have enough memory to start initially doesn't guarantee it won't clash with user processes a minute in. Memory measurement at a point in time is not going to give you enough guarantees that it's safe to run the collector if it's competing with user processes for resources.

Yes, this scenario can happen. As I mentioned above, generally, we cannot determine how much memory other processes occupy.

The 'memory limit' component currently works like this: when the memory used by OTEL exceeds the limit, it frees memory through GC and discards telemetry data. This approach tries to keep OTEL's memory usage from growing further rather than strictly capping it, because 'memory limit' cannot determine how much memory the other OTEL components occupy.

Similarly, 'memory request' is also a soft request for available system memory, rather than a hard demand.

You need to size all the processes on your box so it's possible to keep everything running.

This is a workaround, but I don't know yet how to limit the memory usage of all processes. It would be nice if it could be done and not so complicated.

Contributor

github-actions bot commented Feb 2, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions bot added the Stale label Jun 10, 2024
Contributor

github-actions bot commented Aug 9, 2024

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions bot closed this as not planned Aug 9, 2024
@mike9421
Author

Hi @crobert-1, please help me reopen this issue. Thank you.

@atoulme
Contributor

atoulme commented Sep 3, 2024

Moving to core as this is not related to contrib.

I don't think this is the collector's responsibility. Please feel free to create systemd scripts and run scripts that set the collector with the right memory settings so it cohabits with your other user processes.
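
As one possible shape for such a run script, the Go sketch below builds a collector command with GOMEMLIMIT set in its environment. It is a sketch under assumptions rather than a recommended setup: the helper name `collectorCommand`, the binary name `otelcol`, the config path, and the 80 MiB value are all placeholders to adapt to the actual deployment.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// collectorCommand builds (but does not start) a command that runs the
// collector binary with GOMEMLIMIT set to limitMiB mebibytes.
func collectorCommand(binary, configPath string, limitMiB int) *exec.Cmd {
	cmd := exec.Command(binary, "--config="+configPath)
	// Appended last so it overrides any GOMEMLIMIT inherited from the
	// parent: for duplicate keys, os/exec uses the last value in Env.
	cmd.Env = append(os.Environ(), fmt.Sprintf("GOMEMLIMIT=%dMiB", limitMiB))
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Binary name, config path, and limit are placeholder assumptions.
	cmd := collectorCommand("otelcol", "/etc/otelcol/config.yaml", 80)
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "collector exited:", err)
		os.Exit(1)
	}
}
```

A pre-start check of available memory could run before `cmd.Run()` in the same wrapper, which keeps the policy outside the collector as suggested here.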
