Suggestion for a new processor: Replacer #669

Open
Malutthias opened this issue Sep 18, 2024 · 1 comment

@Malutthias
Collaborator

Based on Issue #643 we agreed to implement a new processor to conduct replacements in logs.

The following requirements should be met; every point listed here is open for further discussion:

Pattern Matching: The replacer processor should support several pattern matching techniques to identify and replace specific patterns or strings in the logs. This usually means regular expressions or predefined string patterns, and in some cases fuzzy matching.

Customizable Replacement Rules: It should allow users to define and customize replacement rules. This could include simple string replacements, complex regex-based substitutions, or conditional replacements based on specific log content, e.g. severity. Replacements with hashes are open for discussion. If random or otherwise non-deterministic replacements are considered at all, a history should be kept in case rollbacks are needed.

Rule Prioritization: If wildcard rules (like test.*) and specific rules (like test.subfield) exist, there should be clear prioritization. Typically, specific rules take precedence over wildcard rules.

Granularity in Replacement Operations: The processor should allow users to apply replacements at different granularities, such as:

  • Field level: Apply to a single field like test.subfield.
  • Subtree level: Apply to an entire field hierarchy like test.*.
  • Global level: Apply across all fields if needed, for global transformations.
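
The prioritization and granularity requirements could be combined into a single lookup. The following is only a sketch for discussion: the function name `select_rule`, the flat `rules` dict, and the use of a bare `*` for the global level are assumptions, not an existing logprep interface.

```python
from typing import Optional


def select_rule(field: str, rules: dict) -> Optional[str]:
    """Return the replacement of the most specific rule matching ``field``."""
    # 1. Field level: an exact dotted path always wins.
    if field in rules:
        return rules[field]
    # 2. Subtree level: among patterns like "test.*", prefer the longest prefix.
    best_prefix = None
    for pattern in rules:
        if pattern.endswith(".*"):
            prefix = pattern[:-1]  # "test.*" -> "test."
            if field.startswith(prefix) and (
                best_prefix is None or len(prefix) > len(best_prefix)
            ):
                best_prefix = prefix
    if best_prefix is not None:
        return rules[best_prefix + "*"]
    # 3. Global level: a bare "*" (hypothetical syntax) applies to every field.
    return rules.get("*")


rules = {
    "test.*": "<subtree>",
    "test.subfield": "<specific>",
    "*": "<global>",
}
print(select_rule("test.subfield", rules))  # specific rule beats the wildcard
print(select_rule("test.other", rules))     # falls back to the subtree rule
print(select_rule("host.name", rules))      # falls back to the global rule
```

With this ordering a specific rule such as test.subfield always takes precedence over test.*, and the global rule is only used when nothing else matches.
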

Malutthias added the enhancement label on Sep 18, 2024
@ekneg54
Collaborator

ekneg54 commented Sep 18, 2024

How about a simple solution for the first try:

event:

{
  "target": {
    "field": "This is the message and this should be replaced"
  }
}

rule format:

filter: message
replacer:
  mapping:
    target.field: "This is the message and %{ this is the replacement string }"

results in:

{
  "target": {
    "field": "This is the message and this is the replacement string"
  }
}

I suggest avoiding all kinds of regex, because they are slow. If we can solve 80% of the problem with simple string operations, that should be the way to go.
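
To illustrate that the example above needs no regex, here is a minimal sketch using only plain string operations. The semantics are an assumption on my side: the literal text before `%{` (and after `}`, if any) has to match the field value, and the rest of the value is replaced by the stripped text inside the braces. The helper names `get_dotted`, `apply_template` and `replace_fields` are hypothetical and not part of logprep.

```python
def get_dotted(event: dict, path: str):
    """Follow a dotted path such as "target.field"; return (parent, key) or None."""
    *parents, key = path.split(".")
    node = event
    for part in parents:
        node = node.get(part) if isinstance(node, dict) else None
        if node is None:
            return None
    if isinstance(node, dict) and key in node:
        return node, key
    return None


def apply_template(value: str, template: str) -> str:
    """Replace the part of ``value`` that corresponds to the "%{ ... }" section."""
    prefix, marker, rest = template.partition("%{")
    replacement, closing, suffix = rest.partition("}")
    if not marker or not closing:
        return value  # template has no replacement section
    # The literal text around the marker must match, otherwise keep the value.
    if len(value) < len(prefix) + len(suffix):
        return value
    if not value.startswith(prefix) or not value.endswith(suffix):
        return value
    return prefix + replacement.strip() + suffix


def replace_fields(event: dict, mapping: dict) -> dict:
    """Apply every template in ``mapping`` to its dotted field, in place."""
    for path, template in mapping.items():
        found = get_dotted(event, path)
        if found is None:
            continue  # field is missing, nothing to replace
        parent, key = found
        if isinstance(parent[key], str):
            parent[key] = apply_template(parent[key], template)
    return event


event = {"target": {"field": "This is the message and this should be replaced"}}
mapping = {
    "target.field": "This is the message and %{ this is the replacement string }"
}
print(replace_fields(event, mapping))
```

Only the fields named in the mapping are touched, so the cost per event depends on the number of mapping entries and not on the size of the event.
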

If you want to replace the same string in several fields, you can add the other fields with their replacements to the mapping of the rule, like this:

filter: message
replacer:
  mapping:
    target.field: "This is the message and %{ this is the replacement string }"
    other.field: "This is another field with another %{ the replacement string }"

The advantage of this solution is that the user already knows the interface, because it works like the dissector. The interface of the processor aligns with the field_manager, so it is easy to implement in logprep and in the supporting tool chain like the fda. Because users should not have to edit yaml files to configure rules, all necessary simplifications could be done by the fda user interface, which then produces the corresponding rule in logprep. That covers things like "I want to replace the same string in all subfields" (but you have to know all subfields ;) ).

But yes, you have to add all fields and their replacements to the mapping if you want to write your yaml files yourself.

Let me know if you are fine with it.

Additionally, we should avoid adding rules that work on all subfields of an event. In my opinion this creates a potential denial-of-service surface: if an attacker nests events arbitrarily deep, this leads to a never-ending loop or recursion.
The same goes for global replacements, because nobody knows how many fields an event will have. It would also perform very poorly, because traversing all fields of a dict of unknown size is really slow and, as already said, dangerous.
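
To make the concern concrete: if subtree or global rules were considered anyway, the traversal would at least need hard limits. The sketch below is only meant for discussion. `MAX_DEPTH`, `MAX_FIELDS` and `iter_string_fields` are hypothetical names and values, not existing logprep settings. It walks the event iteratively, so it cannot hit Python's recursion limit, and it aborts as soon as an event is too deep or too large.

```python
MAX_DEPTH = 8      # hypothetical limit on nesting depth
MAX_FIELDS = 1000  # hypothetical limit on visited fields


def iter_string_fields(event: dict, max_depth: int = MAX_DEPTH,
                       max_fields: int = MAX_FIELDS):
    """Yield (dotted_path, value) for string leaves, iteratively and bounded."""
    stack = [("", event, 0)]  # (path prefix, dict to visit, depth of that dict)
    seen = 0
    while stack:
        prefix, node, depth = stack.pop()
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            seen += 1
            if seen > max_fields:
                raise ValueError("event has too many fields")
            if isinstance(value, dict):
                if depth + 1 >= max_depth:
                    raise ValueError("event is nested too deeply")
                stack.append((path, value, depth + 1))
            elif isinstance(value, str):
                yield path, value


event = {"a": {"b": "x"}, "c": "y"}
print(sorted(iter_string_fields(event)))
```

Even with such limits every event pays for a full walk, which is the performance cost mentioned above. An explicit mapping only looks up the fields it names.
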
