[Fleet] Defining Output per integration #143905

Closed
nimarezainia opened this issue Oct 24, 2022 · 31 comments · Fixed by #189125
Assignees
Labels
QA:Needs Validation Issue needs to be validated by QA Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@nimarezainia
Contributor

nimarezainia commented Oct 24, 2022

There are many legitimate reasons why an operator may need or want to send data from integrations to different outputs within a policy. Some may even need to send individual data streams to different outputs. Currently we only allow an output to be defined on a per-policy basis. To support this request, the per-policy output definition needs to be overridden by the output defined in the integration. Our config should support this already.

Use Cases:

  1. As an operator, I need the security logs from an agent to be sent to one Logstash instance, whereas informational logs should be sent to another Logstash instance.

  2. We operate multiple beats on a given system and would like to migrate to using Elastic Agent. For historical and operational reasons these beats are writing data to distinct outputs. Once we migrate over to using Agent, we would like to keep the upstream pipeline intact.

@nimarezainia nimarezainia transferred this issue from elastic/elastic-agent Oct 24, 2022
@nimarezainia nimarezainia added the Team:Fleet Team label for Observability Data Collection Fleet team label Oct 24, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@nimarezainia nimarezainia self-assigned this Mar 7, 2023
@willemdh

willemdh commented May 3, 2023

There are many reasons to need multiple agents on a host. One example, which applies to the Elastic ecosystem itself: a customer typically needs to forward the Elasticsearch / Logstash / Kibana logs and metrics to a separate monitoring cluster. In general this is not possible, as agents are already running on this node to index system logs and metrics.

Elastic should really support multiple outputs per integration or provide a supported way to install and manage multiple identical agents on a system.

@amitkanfer

amitkanfer commented May 3, 2023

@nimarezainia what else is needed from you on this one?

@nimarezainia
Contributor Author

@amitkanfer the definition is fairly self-explanatory, but I need to create a mockup for the UI.

@amitkanfer

Thanks @nimarezainia - once ready let's chat online and pass it to @jlind23 for development.

@nicpenning

Another big reason for output per integration is when you have 20+ integrations in a specific policy, there is a good chance that some of those integrations have very different performance requirements.

The biggest need for this feature for me is the ability to set the number of workers and the bulk max size to account for a particular integration that ingests 30K events per second. We have some integrations that only receive 1-5 events per minute, so it doesn't make sense to crank up the workers and bulk max size when not all integrations need that performance adjustment.

Here is a sample policy listing each integration's EPS and the per-integration output settings it would need:

  1. Firewall - 30K EPS
    - 12 workers, 2500 bulk max size
  2. Windows Events - 3000 EPS
    - 4 workers, 500 bulk max size
  3. HTTP Input - 0.5 EPS
    - Default
  4. API - 20 EPS
    - Default
  5. Web Logs - 300 EPS
    - Default

@jlind23 jlind23 changed the title Defining Output per integration [Fleet]Defining Output per integration Oct 24, 2023
@jlind23 jlind23 changed the title [Fleet]Defining Output per integration [Fleet] Defining Output per integration Oct 24, 2023
@jlind23
Contributor

jlind23 commented Oct 24, 2023

@nimarezainia What would be the user experience here?
Shall we display per output/policy a list of integrations that users can check to see which one is using what output?
Or shall we in the UI below offer the option to switch the output for each integration?
Screenshot 2023-10-24 at 16 53 43

@nimarezainia
Contributor Author

nimarezainia commented Oct 26, 2023

@jlind23 I propose the following: (@zombieFox we need to discuss this also)

In the integrations settings page we need a drop down which would display the set of outputs available to the user (configured on the Fleet->settings tab). This should default to whatever output is configured in the policy for integration data. We may want to put this in the advanced settings drop down.

image

The agent policy page should be modified also to show a summary of what output is allocated to which integration:

image

@nimarezainia
Contributor Author

In scenarios where the user needs to send different data streams to different outputs, the above model still works, as the user can add two instances of the same integration to the policy. For example, with NGINX:

nginx-1 instance:

  • enable access logs datastream
  • disable error logs datastream
  • set integration to send data to output Logstash-A

nginx-2 instance:

  • disable access logs datastream
  • enable error logs datastream
  • set integration to send data to output Logstash-B

@zombieFox
Contributor

zombieFox commented Oct 26, 2023

We reviewed this in the UX sync. Looks good to go. The additions indicated above don't require design mocks.

The copy sounds right to me too, but we might want to pass it by David Kilfoyle or Karen Metts.

@nimarezainia nimarezainia removed their assignment Oct 31, 2023
@jen-huang
Contributor

Moving this to tech definition for this sprint. If the work identified is a small amount, we'll proceed with implementation.

@nchaulet
Member

nchaulet commented Nov 6, 2023

Proposed technical implementation for that

I did a small POC PR implementing only the API part, with some shortcuts, to check that it will work as expected, and it seems it will.

Package policy/saved object changes

We will introduce a new property named output_id to the package policy. This property will be added/updated in the following components:

  • Saved object
  • Type and schema for package policy, preconfigured package policy, and simplified package policy

We will need to validate that creating/editing a package policy output respects the same rules as agent policy outputs:

  • APM and Fleet Server package policies cannot use a non-ES output.
  • License restriction: it should only be available with an enterprise license, like multiple outputs, correct? @nimarezainia
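
As a rough illustration of the validation rules above, here is a minimal sketch. The type shapes, the function name `validatePackagePolicyOutput`, the package names in `ES_ONLY_PACKAGES`, and the `hasEnterpriseLicense` flag are all assumptions made for this example, not Fleet's actual code.

```typescript
// Hypothetical shapes for illustration only; not Fleet's actual types.
type OutputType = "elasticsearch" | "remote_elasticsearch" | "logstash" | "kafka";

interface Output {
  id: string;
  type: OutputType;
}

interface PackagePolicy {
  name: string;
  packageName: string;
  output_id?: string; // the new property proposed above
}

// Assumption: package names of the two integrations named in the proposal.
const ES_ONLY_PACKAGES: string[] = ["apm", "fleet_server"];

function validatePackagePolicyOutput(
  policy: PackagePolicy,
  outputs: Output[],
  hasEnterpriseLicense: boolean
): string[] {
  const errors: string[] = [];
  if (policy.output_id === undefined) {
    return errors; // no override: the agent policy output applies
  }
  if (!hasEnterpriseLicense) {
    errors.push("per-integration output requires an enterprise license");
  }
  let output: Output | undefined;
  for (const candidate of outputs) {
    if (candidate.id === policy.output_id) {
      output = candidate;
    }
  }
  if (output === undefined) {
    errors.push(`output ${policy.output_id} does not exist`);
    return errors;
  }
  // Assumption: only a same-cluster "elasticsearch" output counts as "ES" here.
  if (ES_ONLY_PACKAGES.indexOf(policy.packageName) !== -1 && output.type !== "elasticsearch") {
    errors.push(`${policy.packageName} cannot use a ${output.type} output`);
  }
  return errors;
}
```

Returning a list of messages rather than throwing keeps the license check and the output-type check independent, so a caller could report both problems at once.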

Deleting/Editing output changes

We will have to implement the same rules as we have for agent policy:

  • When an output assigned to a package policy is deleted, the associated package policy will revert to using the default output
  • Furthermore, if an output is updated, we will increment the revision for package policies and agent policies utilizing it
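
The delete rule could look roughly like the sketch below. `PackagePolicy` is reduced to two fields and `clearDeletedOutput` is a hypothetical helper name; the real implementation would work against saved objects.

```typescript
// Hypothetical, minimal package policy shape for illustration.
interface PackagePolicy {
  id: string;
  output_id?: string;
}

// Returns copies of the package policies in which any reference to the
// deleted output is removed, so those policies fall back to the default output.
function clearDeletedOutput(
  policies: PackagePolicy[],
  deletedOutputId: string
): PackagePolicy[] {
  return policies.map((policy) =>
    policy.output_id === deletedOutputId ? { id: policy.id } : { ...policy }
  );
}
```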

Full agent policy generation changes (aka policy sent to agents)

We need to adapt the policy sent to the agent to reflect our model change. The agent already supports this through the use_output property and already supports multiple outputs.

I tested this locally with the POC PR using multiple logs package policies, and it seems to work as expected.

The use_output field has to be populated with the package policy output id or the default data output (code here)

The role permission has to change so we generate a role permission for each output based on the package policy assigned to them instead of one for data and one for monitoring (code here)

fullAgentPolicy.output_permissions = Object.keys(fullAgentPolicy.outputs).reduce<
  NonNullable<FullAgentPolicy['output_permissions']>
>((outputPermissions, outputId) => {
  const output = fullAgentPolicy.outputs[outputId];
  if (
    output &&
    (output.type === outputType.Elasticsearch || output.type === outputType.RemoteElasticsearch)
  ) {
    const permissions: FullAgentPolicyOutputPermissions = {};
    if (outputId === getOutputIdForAgentPolicy(monitoringOutput)) {
      Object.assign(permissions, monitoringPermissions);
    }
    if (outputId === getOutputIdForAgentPolicy(dataOutput)) {
      Object.assign(permissions, dataPermissions);
    }
    outputPermissions[outputId] = permissions;
  }
  return outputPermissions;
}, {});
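
The snippet above still assigns permissions per data/monitoring output. A minimal sketch of the proposed change, grouping permissions by the output each package policy resolves to, might look like this. The `IntegrationPolicy` shape, the `"monitoring"` key, and the plain index lists are simplifications assumed for the example, and the Elasticsearch-only filter from the snippet above is left out for brevity.

```typescript
// Hypothetical shapes: permissions are reduced to a list of index patterns.
interface IntegrationPolicy {
  name: string;
  output_id?: string;
  indices: string[];
}

type OutputPermissions = Record<string, Record<string, string[]>>;

function buildOutputPermissions(
  policies: IntegrationPolicy[],
  dataOutputId: string,
  monitoringOutputId: string,
  monitoringIndices: string[]
): OutputPermissions {
  const result: OutputPermissions = {};
  for (const policy of policies) {
    // A package policy without its own output uses the default data output.
    const outputId = policy.output_id ?? dataOutputId;
    const forOutput = result[outputId] ?? {};
    forOutput[policy.name] = policy.indices;
    result[outputId] = forOutput;
  }
  // Monitoring permissions go to whichever output carries monitoring data.
  const monitoring = result[monitoringOutputId] ?? {};
  monitoring["monitoring"] = monitoringIndices;
  result[monitoringOutputId] = monitoring;
  return result;
}
```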

Things to verify

  • Ensure that the changes are compatible with all input types. It has been tested with log inputs and seems functional. cc @cmacknz
  • It seems that if one package policy output is broken, the input is still reported as healthy in the UI. We need to verify this and create a follow-up Elastic Agent issue if that is the case.

@nimarezainia
Contributor Author

> Package policy/saved object changes
>
> We will introduce a new property named output_id to the package policy. This property will be added/updated in the following components:
>
> * Saved object
> * Type and schema for package policy, preconfigured package policy, and simplified package policy
>
> We will need to validate that creating/editing a package policy output respects the same rules as agent policy outputs:
>
> * APM and Fleet Server package policies cannot use a non-ES output.
> * License restriction: it should only be available with an enterprise license, like multiple outputs, correct? @nimarezainia

thanks @nchaulet - yes this is correct, same licensing restriction as we have for per policy output.

@cmacknz
Member

cmacknz commented Nov 7, 2023

> Ensure that the changes are compatible with all input types. It has been tested with log inputs and seems functional. cc @cmacknz

We don't have any special handling for specific input types. The use_output option in the agent supports multiple outputs like this already. The only under the hood effect of multiple outputs is the possibility that the agent will run more processes than before. This will add additional queues increasing the memory usage of the agent.

For example, the following results in one logfile input process (or component in the agent model) named logfile-default implemented by Filebeat:

outputs:
  default:
    type: elasticsearch
    ...
inputs:
  - id: logfileA
    type: logfile
    use_output: default
    ...
  - id: logfileB
    type: logfile
    use_output: default
    ...

While the configuration below with two distinct outputs will result in two Filebeat processes/components, one named logfile-outputA and one named logfile-outputB:

outputs:
  outputA:
    type: elasticsearch
    ...
  outputB:
    type: elasticsearch
    ...
inputs:
  - id: logfileA
    type: logfile
    use_output: outputA
    ...
  - id: logfileB
    type: logfile
    use_output: outputB
    ...

You should be able to observe this directly in the output of elastic-agent status and in the set of components states reported to Fleet.

@cmacknz
Member

cmacknz commented Nov 7, 2023

I should note that you only end up with additional processes when assigning inputs of the same type to different outputs. If, in the example above, there were a system/metrics input instead of logfileB, there would be no change. This is because the agent runs instances of the same input type in the same process and already isolates different input types into their own processes.
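
To make the grouping rule concrete, here is a small sketch that derives the set of components from a list of inputs. The `<input type>-<output name>` naming is inferred from the examples above, and `AgentInput` and `groupIntoComponents` are hypothetical names used only for this illustration; this is not the agent's actual component model code.

```typescript
// Hypothetical input shape mirroring the YAML examples above.
interface AgentInput {
  id: string;
  type: string;
  use_output: string;
}

// Groups input ids by component, assuming one component per
// (input type, output) pair, named "<type>-<output>".
function groupIntoComponents(inputs: AgentInput[]): Record<string, string[]> {
  const components: Record<string, string[]> = {};
  for (const input of inputs) {
    const name = `${input.type}-${input.use_output}`;
    const members = components[name] ?? [];
    members.push(input.id);
    components[name] = members;
  }
  return components;
}
```

With both logfile inputs on `default` this yields a single `logfile-default` entry; pointing them at `outputA` and `outputB` yields two entries. A logfile input and a system/metrics input produce two entries whether or not they share an output, which matches the "no change" case described above.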

@jen-huang
Contributor

Thanks @nchaulet, @nimarezainia, @cmacknz for the work & discussion here. Based on recent discussions about priority, I am going to kick this by a few sprints for implementation work.

@BenB196

BenB196 commented Dec 13, 2023

One of the biggest drivers from our company's end would be APM Server, which only supports the Elasticsearch output. We mainly use the Logstash output for agents. This requires us to run a second agent just for APM Server, and at scale (100+ APM Server/Elastic Agent deployments across multiple Kubernetes clusters) we end up "wasting" 500MB on each node just operating the second agent for APM, rather than being able to use our existing agents that default to Logstash.

Depending on how you look at it, 500MB might not seem like a lot, but when you're having to operate 50-100 deployments, that is 25GB-50GB of memory. This also indirectly generates additional monitoring data from the additional agents that we need to run and be monitored.

@nimarezainia
Contributor Author

> One of the biggest drivers from our company's end would be APM Server, which only supports the Elasticsearch output. We mainly use the Logstash output for agents. This requires us to run a second agent just for APM Server, and at scale (100+ APM Server/Elastic Agent deployments across multiple Kubernetes clusters) we end up "wasting" 500MB on each node just operating the second agent for APM, rather than being able to use our existing agents that default to Logstash.
>
> Depending on how you look at it, 500MB might not seem like a lot, but when you're having to operate 50-100 deployments, that is 25GB-50GB of memory. This also indirectly generates additional monitoring data from the additional agents that we need to run and be monitored.

Thanks @BenB196. How would you deploy the agent if you could define the output per integration?

@BenB196

BenB196 commented Dec 14, 2023

Hi @nchaulet, currently for each Kubernetes cluster we deploy 2 DaemonSets: one that uses the Logstash output and contains all normal integrations, and a second that uses the Elasticsearch output and contains just APM Server. If per-integration output were supported, we'd switch to deploying a single DaemonSet that uses Logstash as the default and specifies the Elasticsearch output solely for the APM Server integration.

@nicpenning

👋 Just checking in on this feature! Is there any progress, or are any details needed to get this implemented?

8.12 added the remote Elasticsearch output, which was significant! The ability to do this per integration would be very beneficial for the reasons previously stated. Thank you!

@nimarezainia
Contributor Author

Thanks @nicpenning, this is still prioritized, but we have other higher-impact issues to resolve first. We should get to this one soon as well.

@nicpenning

Thank you for the update, Nima!

@jlind23
Contributor

jlind23 commented Apr 3, 2024

@nimarezainia I might have missed them but do we have any UI/UX mockup for this?

@nimarezainia
Contributor Author

> @nimarezainia I might have missed them but do we have any UI/UX mockup for this?

#143905 (comment)

@kpollich
Member

Want to bump Nicolas's comment above with the necessary implementation plan as this is coming up soon in our roadmap: #143905 (comment)

@karnamonkster

karnamonkster commented Jun 20, 2024

We really need to see this one running in our cluster. We have made a brave (and not yet stupid) decision to move to a unified agent that will be used for all infra, security, and application-specific data logging for different teams, with different ECE instances as consumers.
From a data quality perspective, governance of ECS compliance led to this decision. We cannot have anyone sending the same data in different ways.
Of course there are exceptions, but we still aim to keep them to a minimum.
This is a sincere request to expedite this enhancement/feature request.

@mbudge
Copy link

mbudge commented Jul 16, 2024

We also need this so we can send system metrics (to avoid the Logstash 403 forbidden infinite retry issue crashing Logstash), as well as firewall security logs and NetFlow, to different Logstash pipeline inputs, as they are higher throughput and we don't want them impacting Windows security log collection.

@nimarezainia
Contributor Author

We will soon have news for you all on this issue with the targeted release. Thanks for your patience.

@supu2

supu2 commented Jul 22, 2024

@nimarezainia Is there any ETA for the release? In which release will we get this feature?
Thank you so much for that integration.

jen-huang added a commit that referenced this issue Aug 13, 2024
## Summary

Resolves #143905. This PR adds support for integration-level outputs.
This means that different integrations within the same agent policy can
now be configured to send data to different locations. This feature is
gated behind `enterprise` level subscription.

For each input, the agent policy will configure sending data to the
following outputs in decreasing order of priority:
1. Output set specifically on the integration policy
2. Output set specifically on the integration's parent agent policy
(including the case where an integration policy belongs to multiple
agent policies)
3. Global default data output set via Fleet Settings
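
The priority order above amounts to a simple fallback chain. A minimal sketch, with a hypothetical function name and parameters that stand in for the three configuration levels:

```typescript
// Picks the output for an input following the priority list above:
// integration policy first, then the parent agent policy, then the
// global default from Fleet Settings.
function resolveOutputId(
  integrationOutputId: string | null | undefined,
  agentPolicyDataOutputId: string | null | undefined,
  globalDefaultOutputId: string
): string {
  return integrationOutputId ?? agentPolicyDataOutputId ?? globalDefaultOutputId;
}
```

For example, `resolveOutputId(undefined, "logstash-a", "default")` falls through the first level and picks the agent policy output.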

Integration-level outputs will respect the same rules as agent
policy-level outputs:
- Certain integrations are disallowed from using certain output types;
attempting to add them to each other via creation, updating, or
"defaulting" will fail
  - `fleet-server`, `synthetics`, and `apm` can only use same-cluster
  Elasticsearch output
- When an output is deleted, any integrations that were specifically
using it will "clear" their output configuration and revert back to
either `#2` or `#3` in the above list
- When an output is edited, all agent policies across all spaces that
use it will be bumped to a new revision. This includes:
  - Agent policies that have that output specifically set in their
  settings (existing behavior)
  - Agent policies that contain integrations which specifically have that
  output set (new behavior)
- When a proxy is edited, the same new revision bump above will apply
for any outputs using that proxy
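
As an illustration of the revision-bump rule for edited outputs, the sketch below collects the agent policies affected by an edit through either path. The field names (`data_output_id`, `monitoring_output_id`, `policy_ids`) and the helper name are assumptions for this example, and policies that only reach the output through the global default are not covered here.

```typescript
// Hypothetical, reduced shapes for illustration.
interface AgentPolicy {
  id: string;
  data_output_id?: string;
  monitoring_output_id?: string;
}

interface IntegrationPolicy {
  id: string;
  output_id?: string;
  policy_ids: string[]; // an integration policy may belong to several agent policies
}

function agentPolicyIdsToBump(
  editedOutputId: string,
  agentPolicies: AgentPolicy[],
  integrationPolicies: IntegrationPolicy[]
): string[] {
  const ids: string[] = [];
  const add = (id: string): void => {
    if (ids.indexOf(id) === -1) {
      ids.push(id);
    }
  };
  // Existing behavior: the output is set on the agent policy itself.
  for (const policy of agentPolicies) {
    if (policy.data_output_id === editedOutputId || policy.monitoring_output_id === editedOutputId) {
      add(policy.id);
    }
  }
  // New behavior: the output is set on an integration inside the agent policy.
  for (const integration of integrationPolicies) {
    if (integration.output_id === editedOutputId) {
      for (const agentPolicyId of integration.policy_ids) {
        add(agentPolicyId);
      }
    }
  }
  return ids;
}
```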

The final agent policy YAML that is generated will have:
- `outputs` block that includes:
  - Data and monitoring outputs set at the agent policy level (existing
  behavior)
  - Any additional outputs set at the integration level, if they differ
  from the above
- `output_permissions` block that includes permissions for each
Elasticsearch output depending on which integrations and/or agent
monitoring are assigned to it
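
A rough sketch of how the set of outputs in that block could be assembled, assuming outputs are identified by plain string ids (the function name and parameters are hypothetical):

```typescript
// Collects the ids that would appear in the `outputs` block: the agent
// policy's data and monitoring outputs first, then any integration-level
// outputs that are not already present.
function collectOutputIds(
  dataOutputId: string,
  monitoringOutputId: string,
  integrationOutputIds: Array<string | undefined>
): string[] {
  const ids: string[] = [dataOutputId];
  if (ids.indexOf(monitoringOutputId) === -1) {
    ids.push(monitoringOutputId);
  }
  for (const id of integrationOutputIds) {
    if (id !== undefined && ids.indexOf(id) === -1) {
      ids.push(id);
    }
  }
  return ids;
}
```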

Integration policies table now includes `Output` column. If the output
is defaulting to agent policy-level output, or global setting output, a
tooltip is shown:

<img width="1392" alt="image"
src="https://github.com/user-attachments/assets/5534716b-49b5-402a-aa4a-4ba6533e0ca8">

Configuring an integration-level output is done under Advanced options
in the policy editor. Setting to the blank value will "clear" the output
configuration. The list of available outputs is filtered by what outputs
are available for that integration (see above):

<img width="799" alt="image"
src="https://github.com/user-attachments/assets/617af6f4-e8f8-40b1-b476-848f8ac96e76">

An example of failure: ES output cannot be changed to Kafka while there
is an integration
<img width="1289" alt="image"
src="https://github.com/user-attachments/assets/11847eb5-fd5d-4271-8464-983d7ab39218">


## TODO
- [x] Adjust side effects of editing/deleting output when policies use
it across different spaces
- [x] Add API integration tests
- [x] Update OpenAPI spec
- [x] Create doc issue

### Checklist

Delete any items that are not applicable to this PR.

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
bryce-b pushed a commit to bryce-b/kibana that referenced this issue Aug 13, 2024
@kpollich kpollich added the QA:Needs Validation Issue needs to be validated by QA label Aug 13, 2024
@nimarezainia
Contributor Author

> @nimarezainia Is there any ETA for the release? In which release will we get this feature? Thank you so much for that integration.

If our testing completes successfully, the target is 8.16.

@amolnater-qasource

Hi Team,

We have created 7 test cases in Testmo for this feature, in the Fleet test suite, under the section below:

Please let us know if any other scenario needs to be added from our end.

Thanks!
