[Feature Request] Support for object type in Derived Fields #13143

Closed
rishabhmaurya opened this issue Apr 10, 2024 · 6 comments · Fixed by #13592
Labels: enhancement, Search:Performance


@rishabhmaurya
Contributor

Is your feature request related to a problem? Please describe

Issue: #12281

With the current implementation of derived fields in OpenSearch, we support various primitive types such as keyword, long, double, geo_point, ip, date, and boolean. However, there are scenarios where users may need to derive fields of type object based on source fields containing JSON data. This proposal aims to enhance derived fields to support nested JSON structures, enabling users to query subfields within derived JSON fields.

Current Scenario
Consider the following index mapping definition:

{
  "mappings": {
    "properties": {
      "product_name": { "type": "keyword" },
      "product_json": { "type": "text", "index": false }
    },
    "derived": {
      "derived_product_json": {
        "type": "keyword",
        "script": {
          "source": "emit(params._source[\"product_json\"])"
        }
      }
    }
  }
}

In this example, product_json contains a JSON object, and derived_product_json derives a field of type keyword representing the entire JSON object. However, querying specific subfields within derived_product_json is currently not possible since subfields are not defined within the derived field context.

Describe the solution you'd like

Introduce a new field type called json or object within the derived fields context to support nested JSON structures. This enhancement will enable users to query subfields of derived JSON fields, providing greater flexibility in data querying and analysis.

Proposed Mapping

{
  "mappings": {
    "properties": {
      "product_name": { "type": "keyword" },
      "product_json": { "type": "text", "index": false }
    },
    "derived": {
      "derived_product_json": {
        "type": "json", // New type introduced: json or object
        "script": {
          "source": "emit(params._source[\"product_json\"])"
        }
      }
    }
  }
}

Example Document
Consider a document with the following structure:

{
  "product_name": "canyon ultimate road bike",
  "product_json": {
    "brand": "canyon",
    "model": "ultimate",
    "price": 4500
  }
}

Querying Subfields
With the proposed enhancement, users can query subfields within derived_product_json. For instance:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "derived_product_json.brand": "canyon" } },
        { "match": { "derived_product_json.model": "ultimate" } }
      ]
    }
  },
  "fields": ["derived_product_json.brand", "derived_product_json.model"]
}

This query retrieves documents whose derived_product_json field contains the specified brand and model values, and returns those two subfields in the response.
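
To make the intended semantics concrete, here is a rough Python sketch of how a dotted subfield such as derived_product_json.brand could be resolved against the object emitted by the script. It is an illustration only, not the OpenSearch implementation: the helper names resolve_subfield and subfield_value are hypothetical, and plain equality stands in for the match query.

```python
import json
from typing import Any, Optional

# Field names taken from the example mapping in this issue.
DERIVED_FIELD = "derived_product_json"
SOURCE_FIELD = "product_json"


def resolve_subfield(value: Any, path: str) -> Optional[Any]:
    """Walk a dotted path through a JSON object; return None if a segment is missing."""
    # The emitted value may be a JSON string or an already-parsed object.
    node = json.loads(value) if isinstance(value, str) else value
    for segment in path.split("."):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


def subfield_value(doc: dict, field: str) -> Optional[Any]:
    """Resolve e.g. 'derived_product_json.brand' against the document's source (hypothetical helper)."""
    prefix = DERIVED_FIELD + "."
    if not field.startswith(prefix):
        raise ValueError(f"{field!r} is not a subfield of {DERIVED_FIELD!r}")
    return resolve_subfield(doc.get(SOURCE_FIELD), field[len(prefix):])


doc = {
    "product_name": "canyon ultimate road bike",
    "product_json": {"brand": "canyon", "model": "ultimate", "price": 4500},
}
must = {
    "derived_product_json.brand": "canyon",
    "derived_product_json.model": "ultimate",
}
# Assumption: plain equality stands in for the match query semantics.
is_match = all(subfield_value(doc, field) == value for field, value in must.items())
print(is_match)
```

A missing subfield resolves to None in this sketch, so a document without the requested key simply does not match.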

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@rishabhmaurya rishabhmaurya added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 10, 2024
@rishabhmaurya rishabhmaurya self-assigned this Apr 10, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5 6]
@rishabhmaurya Thanks for filing, we'd love to see a pull request to add this functionality.

@rishabhmaurya
Contributor Author

We decided in the search community meeting to use object as the field type instead of json.

@rishabhmaurya rishabhmaurya changed the title [Feature Request]Support for Nested JSON Fields in Derived Fields [Feature Request]Support for object type in Derived Fields Apr 11, 2024
@rishabhmaurya rishabhmaurya changed the title [Feature Request]Support for object type in Derived Fields [Feature Request] Support for object type in Derived Fields Apr 11, 2024
@rishabhmaurya
Contributor Author

rishabhmaurya commented Apr 11, 2024

@msfroh @reta
For users who want to explicitly specify the type of subfields instead of relying on type inference, what do you think is the right way to define these explicit mappings? I propose the following:

{
  "mappings": {
    "properties": {
      "product_name": { "type": "keyword" },
      "product_json": { "type": "text", "index": false }
    },
    "derived": {
      "derived_product_json": {
        "type": "object",
        "script": {
          "source": "emit(params._source[\"product_json\"])"
        },
        "properties": {
          "default": "keyword", // inferring logic will use default type if specified instead of inferring the type
          "derived_product_json.brand": "keyword",
          "derived_product_json.price": "long"
        }
      }
    }
  }
}

@reta
Collaborator

reta commented Apr 11, 2024

@rishabhmaurya I think it should follow the same pattern as mappings:

"derived_product_json": {
  "type": "object",
  "script": {
    "source": "emit(params._source[\"product_json\"])"
  },
  "properties": {
    "derived_product_json.brand": {
      "type": "keyword"
    },
    "derived_product_json.price": {
      "type": "long"
    }
  }
}

I think the

          "default": "keyword", // inferring logic will use default type if specified instead of inferring the type

should not be there; the same logic that is applied to dynamic mappings could be used instead. That would bring consistent behaviour to all variations of mapping.
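
As a rough illustration of that idea, the Python sketch below resolves a subfield's type by checking the explicit properties first and otherwise inferring the type from the JSON value, loosely following dynamic mapping rules. The rule set is a simplified assumption (date detection is omitted and strings map to text), and the names infer_type and resolve_type are hypothetical, not part of OpenSearch.

```python
from typing import Any, Dict


def infer_type(value: Any) -> str:
    """Infer a field type from a JSON value (simplified, assumed rule set)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        # Assumption: date detection is skipped; every string is treated as text.
        return "text"
    raise TypeError(f"no inference rule for {type(value).__name__}")


def resolve_type(subfield: str, value: Any, properties: Dict[str, Dict[str, str]]) -> str:
    """Prefer an explicitly mapped type; otherwise fall back to inference."""
    explicit = properties.get(subfield)
    if explicit is not None:
        return explicit["type"]
    return infer_type(value)


# Explicit subfield mappings in the shape suggested in the comment above.
properties = {
    "derived_product_json.brand": {"type": "keyword"},
    "derived_product_json.price": {"type": "long"},
}

brand_type = resolve_type("derived_product_json.brand", "canyon", properties)
model_type = resolve_type("derived_product_json.model", "ultimate", properties)
print(brand_type, model_type)
```

With this shape no "default" entry is needed: any subfield without an explicit mapping goes through the same inference path.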

@rishabhmaurya
Contributor Author

That makes sense, thanks @reta.

Projects
Status: Done
3 participants