
Adopting ECS in Logstash #11306

Closed
jsvd opened this issue Nov 14, 2019 · 9 comments
jsvd commented Nov 14, 2019

The Elastic Common Schema (ECS) was introduced by Elastic in February 2019, both to facilitate the use of data across multiple components of the Elastic Stack and to standardize external data source formats.

The schema, along with documentation and examples, is being maintained at elastic/ecs.

While products such as Beats and Kibana can either produce or consume data in ECS format, Logstash has yet to support it, even though there are plenty of community requests for it.

The adoption has been discussed in a few places before, and the current main ideas are to approach it on two fronts:

a) Add a toggleable "ecs_compatibility" behaviour to all plugins that produce fixed schemas, such as the geoip/useragent filters and the http/tcp inputs.
b) Create a new filter that outputs ECS compatible events but allows the user to perform most of the wiring between the source schema and ECS.
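For idea (a), the toggle could look like the following in a pipeline definition. This is a sketch only: the `ecs_compatibility` option name and its values are part of the proposal, not an implemented setting; the geoip filter's `source` option is real.

```
filter {
  geoip {
    source => "[source][ip]"
    # proposed toggle: emit ECS field names when enabled,
    # keep the legacy flat "geoip" target when disabled (default, for bwc)
    ecs_compatibility => "v1"
  }
}
```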

Another idea that was considered was having an Elasticsearch template that adds ECS fields as aliases, as was done in elastic/beats. However, for Logstash the source schema is highly unpredictable, which reduces the benefit of this tactic.

@jsvd jsvd added the meta label Nov 14, 2019
@monicasarbu

Pinging @tsg @megatrontony @mchopda for awareness. This is targeting the 7.x release. cc @jsvd


webmat commented Nov 29, 2019

Ping @webmat ;-)


webmat commented Nov 29, 2019

I was about to open the same issue 😄 Here are a few ideas.

Documentation

I've recently discussed going over the docs with @karenzone, and I think we could do a few things at that level:

  • Replacing examples to be more in line with ECS. This could be as simple as changing the field name in the example.
  • Addressing predictable problems head on, perhaps have a Logstash docs page specifically about ECS, covering:
    • how to install the correct template
      • If users are getting a "field data" error, they probably forgot that step
    • talking about the two multi-field conventions
      • ES and Logstash do:
        • text on the canonical field (e.g. myfield)
        • keyword on the multi field myfield.keyword
      • ECS went with the reverse convention, to be in line with Beats
        • keyword on the canonical field (e.g. myfield)
        • text on the multi field (e.g. myfield.text).
        • To be precise, ECS doesn't have any multi fields yet, but they are coming
    • Mapping conflicts on fields such as host (Logstash), source (Beats 6), etc.
    • Having more examples with nested fields, which are a requirement for ECS. E.g. showing groks with brackets for the nesting
    • I'm not sure if the docs mention discuss.elastic.co. If they do, users should still create their posts in the Logstash section, but also consider applying the elastic-common-schema tag if it's a question about ECS :-) (I just tagged 10 of them this way)
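As an illustration of the nested-fields point above: grok already supports writing captures into nested fields with the bracket syntax that ECS names require. The log format here is made up; the bracket syntax itself is real.

```
filter {
  grok {
    # flat, pre-ECS style would be e.g. %{IP:clientip}
    # nested, ECS style writes to [source][ip], [http][request][method], [url][original]
    match => {
      "message" => "%{IP:[source][ip]} %{WORD:[http][request][method]} %{URIPATHPARAM:[url][original]}"
    }
  }
}
```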

Template

Logstash could offer a flag (off by default, for bwc) to install the latest ECS template instead of the default Logstash template.

We also need to think about how to name the resulting index: should it still be logstash-*, or should we come up with a second convention?

We try to release ECS a few weeks before stack releases, which should leave time to update the Logstash ECS template. Note that the sample templates in the ECS repo are made for experimentation more than prod (see settings and index_patterns), so they shouldn't be used as-is. But producing a template ready to use as-is for Logstash is easy. Happy to help with that.
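For reference, the existing elasticsearch output can already install a custom template, which is where such a flag would hook in. The file path and index name below are illustrative; the `template*` and `index` options are real.

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # point at a locally generated ECS template instead of the default Logstash one
    template           => "/etc/logstash/templates/ecs-template.json"
    template_name      => "ecs-logstash"
    template_overwrite => true
    # the naming question from above: keep logstash-*, or adopt a new convention?
    index => "ecs-logstash-%{+YYYY.MM.dd}"
  }
}
```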

grok patterns

We should not change the existing grok patterns, obviously.

But we should publish new ones for each grok pattern that has field names in it. People are looking for them.
Having ECS_SYSLOGBASE right beside SYSLOGBASE would solve the need, for example.

I did some analysis a while ago (use this to extract field names from groks) to figure out which groks had field names in them, and whether they would cause a mapping conflict. At the time I counted 800+ field names in the groks, so this will probably need to be done gradually :-)
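To make the pattern-pair idea concrete: an ECS variant would keep the same match text but swap the capture names for nested ECS fields. SYSLOGBASE below is the existing shipped pattern; the ECS_SYSLOGBASE sibling is a sketch whose field choices would need review.

```
# existing pattern (unchanged, for bwc):
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:

# hypothetical ECS-named sibling, same text, nested ECS captures:
ECS_SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:[host][hostname]} %{PROG:[process][name]}(?:\[%{POSINT:[process][pid]:int}\])?:
```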

plugins

Totally agree we should adjust some plugins to offer the option to output using the ECS field names. 👍

I had laid down some thoughts about creating a "mass rename" plugin in logstash#9768, which would remove the need for long series of mutate/rename + type coercion and so on. That issue is no longer quite up to date, so if/when someone wants to start on this, please ping me :-)
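The rough shape of such a filter might look like this. Everything here is hypothetical: neither the filter name nor its options exist; it only sketches "bulk rename + type coercion in one pass" as discussed in logstash#9768.

```
filter {
  # hypothetical "mass rename" filter: map a source schema onto ECS in one block
  ecs_map {
    rename => {
      "src_ip"   => "[source][ip]"
      "src_port" => "[source][port]"
      "dest_ip"  => "[destination][ip]"
    }
    # coerce types as part of the same pass, instead of separate mutate blocks
    coerce => { "[source][port]" => "integer" }
  }
}
```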

call to action

Please feel free to:

  • loop me in on discussions around this, happy to join the appropriate meetings
  • ping me on any ECS-related issue

Really looking forward to this! ❤️


webmat commented Dec 6, 2019

Related: #10490 (comment)


Aqualie commented Dec 30, 2019

In the Beats stack we have two processors, community_id and registered_domain, which look for certain criteria and output the relevant ECS fields. Can we have these added to Logstash as well? Right now I do not see any way of duplicating these processors in Logstash without extensive scripting knowledge, and they are very beneficial to have, especially the community_id field. I have multiple firewall logs being sent to my Logstash for parsing which are being output to the correct ECS fields.

If we could perhaps use the same terminology and say something like:

    processors:
      - community_id:

Logstash would then emit the community_id, provided the prerequisite fields are in the correct ECS fields. Otherwise we can simply define them:

processors:
  - community_id:
      fields:
        source_ip: my_source_ip
        source_port: my_source_port
        destination_ip: my_dest_ip
        destination_port: my_dest_port
        iana_number: my_iana_number
        transport: my_transport
        icmp_type: my_icmp_type
        icmp_code: my_icmp_code
      target: network.community_id

Perhaps we can do the same for source.bytes and destination.bytes; simplifying this would be great as well, as at the moment it also requires scripting in order to populate the network.bytes field.


webmat commented Jan 3, 2020

@Aqualie Note that in the meantime, if your Logstash sends directly to Elasticsearch, you can configure your ES output to send to an Elasticsearch ingest pipeline. This will let you compute the community ID anyway, until a Logstash filter is created for this.
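On the Logstash side this workaround only needs the `pipeline` option on the elasticsearch output (a real option); the pipeline name below is an assumption, and the ingest pipeline itself, with its community ID logic, has to be created on the Elasticsearch side.

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # hand each event to an ES ingest pipeline that computes network.community_id
    pipeline => "compute-community-id"
  }
}
```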


tomrade commented Mar 10, 2020

I've had this issue with an upstream grok pattern (RT_FLOW/JUNOS) creating a field called "event". I've fixed the pattern (in a fork), but how are we handling grok patterns and ECS? It's not easy to toggle between ECS and non-ECS in a pattern.

I'm not sure if I should open a merge request, as my fix was to put everything under a subfolder (ECS style), but that isn't what is done for non-ECS patterns.

=>{"type"=>"mapper_parsing_exception", "reason"=>"object mapping for [event] tried to parse field [event] as object, but found a concrete value"}}}}
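Until ECS-named patterns land, one in-pipeline workaround for this kind of conflict (a sketch; the choice of `[event][action]` as the target is an assumption) is to move the offending top-level string under a nested key, so it no longer clashes with the ECS `event` object mapping:

```
filter {
  # if the upstream grok produced a concrete string field named "event",
  # relocate it before the event reaches Elasticsearch
  mutate {
    rename => { "event" => "[event][action]" }
  }
}
```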


webmat commented Mar 12, 2020

Users landing on this issue can also look at more recent issues #11623 and #11635 🙂

@roaksoax

Closing this issue as Logstash now supports ECS.
