Pore over all existing text fields to identify more that should be multi-fields #104

Closed
2 tasks
webmat opened this issue Aug 23, 2018 · 6 comments

@webmat
Contributor

webmat commented Aug 23, 2018

Moving from keyword-only indexing to multi-fields is a breaking change, so we need to make sure we apply it consistently.

  • Review all current textual fields with an open mind and consider if they could benefit from being multi-field.
    • Since we currently do keyword by default, the question will mostly be: would we benefit from the ability to do partial searches on these values? E.g. filter for "guest" or "wifi" in field network.name or "prod" in host.name.
  • Document this as an ECS convention: we consider doing multi-fields on any non-trivial text field.
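
To make the multi-field idea above concrete, here is a minimal sketch of an Elasticsearch mapping in which network.name stays keyword at the canonical name and gains an analyzed sub-field. The sub-field name `text` and the helper `keyword_with_text` are assumptions made for illustration, not a convention settled at this point in the issue.

```python
import json


def keyword_with_text(subfield="text"):
    """Build a keyword mapping with an analyzed text sub-field.

    The sub-field name is a hypothetical choice for this sketch.
    """
    return {
        "type": "keyword",
        # `fields` is the Elasticsearch mapping parameter for multi-fields.
        "fields": {subfield: {"type": "text"}},
    }


mapping = {
    "properties": {
        "network": {
            "properties": {
                "name": keyword_with_text(),
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

With a mapping along these lines, exact filtering and aggregation would keep using network.name, while a full-text query for a token such as "guest" could target network.name.text, provided the analyzer actually emits that token for the stored value.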
@vbohata

vbohata commented Aug 23, 2018

Not only text fields should be multi-fields. For example, file.uid, gid, and http.response.status_code are good to have as integers (for comparison), but they are also very useful as keyword (in visualisations, machine learning, ...).

@webmat
Contributor Author

webmat commented Aug 23, 2018

Actually discrete numbers like integers can be used for aggregations / visualizations. I don't think that's the case for floats, but integers definitely so. I often do visualizations on port fields, indexed only as integer.

@webmat webmat self-assigned this Aug 23, 2018
@vbohata

vbohata commented Aug 23, 2018

I know there are some limits I have hit in the past. For example, in a multi-metric machine learning job you cannot split data by an integer value, so I had to make the HTTP response code a multi-field to be able to use it for the split.
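
A minimal sketch of the workaround described above, assuming the numeric field keeps its integer type at the canonical name and a keyword sub-field is added next to it. The sub-field name `keyword` and the helper `with_subfield` are hypothetical, chosen only for this example.

```python
import json


def with_subfield(base_type, sub_name, sub_type):
    """Build a field mapping of `base_type` plus one sub-field of `sub_type`."""
    return {
        "type": base_type,
        "fields": {sub_name: {"type": sub_type}},
    }


# Integer at the canonical name for range comparisons,
# keyword sub-field for tools that need a categorical value.
status_code = with_subfield("integer", "keyword", "keyword")

mapping = {
    "properties": {
        "http": {
            "properties": {
                "response": {
                    "properties": {"status_code": status_code}
                }
            }
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Range queries would still run against http.response.status_code, while a job that needs a categorical split could reference http.response.status_code.keyword instead.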

@webmat
Contributor Author

webmat commented Aug 23, 2018

Ah, this is great information! I'm not that familiar with ML yet. Thanks for pointing that out, @vbohata

@ruflin
Member

ruflin commented Aug 24, 2018

I'm starting to wonder whether we should decide on text or keyword as the default on a per-field basis. For example, for host.name my take is that it should be keyword by default, as it's often used for filtering and aggregation, and I see this as one of the main goals of ECS. For me, in this case text is an addition and a nice-to-have. This means that if it is a multi-field, the text variant would be host.name.text.

If we added host.name.text to the default fields and someone ran a query without specifying a field, they would also get results from this field.
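
To illustrate the distinction drawn here, the following sketch builds two Elasticsearch query bodies: an exact `term` query against the keyword field host.name and a `match` query against the analyzed sub-field host.name.text. The function names are hypothetical, and the sketch assumes host.name.text exists in the mapping.

```python
import json


def exact_host_query(name):
    """Exact-value filter on the canonical keyword field."""
    return {"query": {"term": {"host.name": name}}}


def partial_host_query(text):
    """Full-text match on the (assumed) analyzed sub-field."""
    return {"query": {"match": {"host.name.text": text}}}


print(json.dumps(exact_host_query("prod-web-01")))
print(json.dumps(partial_host_query("prod")))
```

A query that names no field at all would search whatever the index treats as its default fields, which is why adding host.name.text to that list would widen the results.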

@webmat
Contributor Author

webmat commented Oct 19, 2018

We're moving to almost exclusively keyword indexing at the canonical field name. Two exceptions are message and error.message which remain text on the canonical field name.

A rewrite of the section about text indexing is coming up, where we'll suggest people who need text indexing on other fields add text indexing as a sub-field instead, which will not be a conflict with ECS, and will not cause a breaking change.
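
As a sketch of why this approach is non-breaking, the helper below adds a text sub-field to an existing keyword mapping without touching the type at the canonical field name. The helper `add_text_subfield`, the sub-field name `text`, and the sample mapping are all assumptions made for illustration.

```python
import copy


def add_text_subfield(properties, dotted_path, subfield="text"):
    """Return a copy of `properties` with a text sub-field at `dotted_path`.

    The canonical field keeps its original type; only `fields` is extended.
    """
    result = copy.deepcopy(properties)
    *parents, leaf = dotted_path.split(".")
    node = result
    for part in parents:
        # Object fields nest their children under "properties".
        node = node[part]["properties"]
    node[leaf].setdefault("fields", {})[subfield] = {"type": "text"}
    return result


# Hypothetical starting point: keyword at the canonical field name.
ecs_like = {"host": {"properties": {"name": {"type": "keyword"}}}}

customized = add_text_subfield(ecs_like, "host.name")
print(customized["host"]["properties"]["name"])
```

Because host.name remains keyword in the customized mapping, queries and aggregations written against the canonical field would behave as before.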

So closing this issue.

3 participants