Associated record / Store all relations in index. #4912

Merged 2 commits on Aug 10, 2020

fxprunayre (Member) commented Aug 7, 2020

Store all associated records using a structure like:

"recordLink" : [
    {
      "to" : "792361bb-4cfa-409f-9762-ab42e5a05b39",
      "origin" : "catalog",
      "created" : "bySearch",
      "title" : "Concentration en habitants dans un rayon de 500m en Wallonie - Service de visualisation REST",
      "url" : "http://localhost:8080/geonetwork/srv/api/records/792361bb-4cfa-409f-9762-ab42e5a05b39",
      "type" : "services"
    }, ...

in the document index. Using this information, the list of related records can be displayed directly in search results or in the record view.
The main drawback is that user privileges are not taken into account: the title of a record that is not visible to the current user may be displayed.
The main advantage of this approach is that it is much faster than the related API.
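
As an illustration of how a client could use the indexed field, here is a minimal sketch that groups `recordLink` entries by `type` so that one list per relation type can be rendered. The `doc` variable and the `group_links_by_type` helper are hypothetical; only the `recordLink` structure comes from the example above.

```python
from collections import defaultdict

# Hypothetical indexed document; only recordLink follows the structure shown above.
doc = {
    "recordLink": [
        {
            "to": "792361bb-4cfa-409f-9762-ab42e5a05b39",
            "origin": "catalog",
            "created": "bySearch",
            "title": "Concentration en habitants dans un rayon de 500m en Wallonie - Service de visualisation REST",
            "url": "http://localhost:8080/geonetwork/srv/api/records/792361bb-4cfa-409f-9762-ab42e5a05b39",
            "type": "services",
        }
    ]
}


def group_links_by_type(document):
    """Group recordLink entries by their "type" (hypothetical helper)."""
    grouped = defaultdict(list)
    for link in document.get("recordLink", []):
        grouped[link["type"]].append({"title": link["title"], "url": link["url"]})
    return dict(grouped)


related = group_links_by_type(doc)
```

Note that nothing in this sketch checks privileges: the title is read straight from the index, which is exactly the drawback described above.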

A couple of issues have been identified. For now, this mode is disabled by default and marked as experimental.

As we use the index to query for relations (relations not stored in children, bidirectional siblings, dataset operatedBy), indexing must be done in 2 steps:

  • First index all records
  • Then index relations
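
The two steps above can be sketched as follows. This is a simplified illustration that uses an in-memory dict as a stand-in for the index; the `uuid`, `title` and `operatesOn` field names and both functions are assumptions made for the example, not the actual indexing code.

```python
def index_records(records):
    """Step 1: index every record, without any relation information yet."""
    return {r["uuid"]: {**r, "recordLink": []} for r in records}


def index_relations(index):
    """Step 2: query the complete index to resolve relations declared on the other side."""
    for uuid, doc in index.items():
        for other in index.values():
            # Hypothetical field: a service lists the datasets it operates on.
            if uuid in other.get("operatesOn", []):
                doc["recordLink"].append(
                    {"to": other["uuid"], "title": other["title"], "type": "services"}
                )
    return index


records = [
    {"uuid": "dataset-1", "title": "Population density"},
    {"uuid": "service-1", "title": "View service", "operatesOn": ["dataset-1"]},
]
index = index_relations(index_records(records))
```

The dataset does not reference the service itself, so its link can only be found once the service is already in the index, which is why a single pass is not enough.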

Currently it is hard to do only a partial indexing. In this case, we only need to collect recordLink and update the document in the index.
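
One possible way to do such a partial update is the Elasticsearch Update API, which merges a partial `doc` into an existing document. The sketch below only builds the request path and body. The index name `gn-records`, the use of the record UUID as document id, and the `build_partial_update` helper are assumptions for illustration.

```python
import json


def build_partial_update(index_name, uuid, record_links):
    """Build path and body of an Update API request that only touches recordLink."""
    path = f"/{index_name}/_update/{uuid}"
    body = {"doc": {"recordLink": record_links}}
    return path, json.dumps(body)


path, body = build_partial_update(
    "gn-records",  # assumed index name
    "dataset-1",   # hypothetical document id (record UUID)
    [{"to": "service-1", "type": "services"}],
)
```

The body would be sent with `POST` to the returned path; all other fields of the document stay untouched.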

While editing a record, all records related to it before and after the editing session need to be updated.
We should collect all UUIDs affected by the current session, and index them following the rule above.
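
Collecting the affected UUIDs could look like the following sketch: take the union of the links present before and after the editing session, plus the edited record itself. The `affected_uuids` helper and the sample UUIDs are hypothetical.

```python
def affected_uuids(edited_uuid, links_before, links_after):
    """Return every UUID to reindex after an editing session (hypothetical helper)."""
    before = {link["to"] for link in links_before}
    after = {link["to"] for link in links_after}
    # A record removed from the relations still needs reindexing to drop its stale link.
    return {edited_uuid} | before | after


links_before = [{"to": "service-1"}, {"to": "parent-1"}]
links_after = [{"to": "service-2"}, {"to": "parent-1"}]
to_reindex = affected_uuids("dataset-1", links_before, links_after)
```

Here `service-1` is no longer related after the edit, but it is still included so that its own `recordLink` can be refreshed.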

fxprunayre added this to the 4.0.0 milestone Aug 7, 2020
fxprunayre added a commit to metadata101/iso19115-3.2018 that referenced this pull request Aug 7, 2020
fxprunayre marked this pull request as ready for review August 10, 2020 11:03
fxprunayre modified the milestones: 4.0.0, 4.0.0-alpha.2 Aug 10, 2020
fxprunayre merged commit cd0009e into geonetwork:4.0.x Aug 10, 2020
MichelGabriel pushed a commit to MichelGabriel/core-geonetwork that referenced this pull request Aug 10, 2020
* Associated record / Store all relations in index.

* Update en-admin.json
fxprunayre added a commit to metadata101/iso19115-3.2018 that referenced this pull request Aug 10, 2020
josegar74 added a commit that referenced this pull request Jan 8, 2024
fxprunayre added a commit that referenced this pull request Feb 9, 2024
* Update to Elasticsearch 8. Use of Elasticsearch Java API Client instead of Java High Level REST Client

* Update to Elasticsearch 8 / WFS indexing draft. (#88)

* Update to Elasticsearch 8 / remove TODOs

* Update Elasticsearch client to version 8.11.3

* Elasticsearch / Update maven plugin.

* Associated record / Store all relations in index / Remove experimental feature, not used. Related to #4912

* Elasticsearch / Update maven plugin configuration. Avoid error like ERROR: Elasticsearch exited unexpectedly, with exit code 143

* Elasticsearch / Update MetadataUtils.getAssociated to retrieve scripted overview field

* Elasticsearch / Update MetadataUtils.getAssociated remove TODO comment

* Elasticsearch / Fix and refactor index readonly health check

* Elasticsearch / Log query error details

* Elasticsearch / Sonarlint improvements

* Elasticsearch / WrapperQuery use base64 encoded JSON string query.

* Elasticsearch / Remove unused commented code from EsSearchManager

* Elasticsearch / More strict Xlink query based on UUID and fix check on hits. A request may return no hits but can still be used to check the number of hits. In such cases we should avoid using hits.hits.size and use hits.total.value to get the number of matches.

* Elasticsearch / Health check / Fix number of hits info.

* Elasticsearch / Cleaning / No need to retrieve hits to only get matches.

* Elasticsearch / Deprecated field [include] used, expected [includes] instead.

* Elasticsearch / Remove 'Clear XLink cache' from Administration > Tools, clear the Xlink cache automatically before indexing and remove non-implemented code to retrieve metadata with XLink (not required anymore)

* Kibana / Update install instruction

Related to elastic/kibana#82521.

* Elasticsearch / Remove unused imports

* Kibana / Update default dashboards.

* Elasticsearch / Documentation / Update Elasticsearch version

* Elasticsearch / Fix logger module name typo

---------

Co-authored-by: François Prunayre <fx.prunayre@gmail.com>
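
To illustrate the note above about `hits.hits.size` versus `hits.total.value`, here is a small sketch using the shape of an Elasticsearch search response for a request sent with `"size": 0`. The values are made up and `count_matches` is a hypothetical helper.

```python
# Search response shape for a request with "size": 0 (values are made up).
response = {"hits": {"total": {"value": 3, "relation": "eq"}, "hits": []}}


def count_matches(search_response):
    """Read the number of matching documents from hits.total.value."""
    return search_response["hits"]["total"]["value"]


returned = len(response["hits"]["hits"])  # number of documents returned, 0 here
matched = count_matches(response)         # number of documents matching the query
```

The length of `hits.hits` only reflects how many documents were returned, so it reads as zero here even though three documents match.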