Added a generic convert function strategy #163

Merged
jesper-friis merged 23 commits into master from convert-function-strategy
Sep 21, 2023

Conversation

jesper-friis
Contributor

Description:

Added a generic convert function strategy.

It calls a Python function to convert zero or more input instances to zero or more output instances.

The module containing the Python function must exist in your PYTHONPATH.
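
A rough sketch of the idea, not the actual code in `oteapi_dlite/strategies/convert.py`: the function is looked up by module and function name with `importlib` and called on the input instances. The helper name `call_convert_function` and the way the return value is normalised to a tuple of outputs are assumptions made for illustration only.

```python
import importlib
from typing import Any, Callable, Sequence, Tuple


def call_convert_function(
    module_name: str, function_name: str, inputs: Sequence[Any]
) -> Tuple[Any, ...]:
    """Import `module_name`, look up `function_name` and call it on `inputs`.

    Hypothetical helper: the real strategy may handle inputs and outputs
    differently.
    """
    # Raises ModuleNotFoundError if the module is not importable,
    # e.g. because it is not on PYTHONPATH.
    module = importlib.import_module(module_name)
    function: Callable[..., Any] = getattr(module, function_name)
    result = function(*inputs)

    # Normalise the return value to a tuple of zero or more outputs.
    if result is None:
        return ()
    if isinstance(result, (tuple, list)):
        return tuple(result)
    return (result,)


# Standard-library functions stand in for a user-provided converter here.
single_output = call_convert_function("math", "hypot", [3, 4])
two_outputs = call_convert_function("os.path", "split", ["a/b"])
```

Any importable module works the same way, which is why the module only needs to be discoverable through PYTHONPATH.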

Type of change:

  • Bug fix.
  • New feature.
  • Documentation update.

Checklist for the reviewer:

This checklist should be used as a help for the reviewer.

  • Is the change limited to one issue?
  • Does this PR close the issue?
  • Is the code easy to read and understand, including clearly named variables?
  • Do all new features have an accompanying new test?
  • Has the documentation been updated as necessary?

codecov-commenter commented Aug 29, 2023

Codecov Report

Patch coverage: 87.30% and project coverage change: +19.21% 🎉

Comparison is base (e10eaf2) 68.13% compared to head (58bfc73) 87.35%.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #163       +/-   ##
===========================================
+ Coverage   68.13%   87.35%   +19.21%     
===========================================
  Files          15       15               
  Lines         408      419       +11     
===========================================
+ Hits          278      366       +88     
+ Misses        130       53       -77     
Flag Coverage Δ
linux 87.35% <87.30%> (+19.21%) ⬆️
windows 87.25% <87.30%> (+19.35%) ⬆️

Files Changed Coverage Δ
oteapi_dlite/utils/utils.py 69.81% <0.00%> (ø)
oteapi_dlite/strategies/generate.py 78.43% <50.00%> (ø)
oteapi_dlite/strategies/parse.py 89.13% <71.42%> (+89.13%) ⬆️
oteapi_dlite/strategies/convert.py 92.15% <92.15%> (ø)
oteapi_dlite/strategies/mapping.py 100.00% <100.00%> (ø)
oteapi_dlite/strategies/parse_excel.py 89.85% <100.00%> (ø)

Contributor

@francescalb francescalb left a comment

I have a hard time understanding the purpose of this converter strategy, and how to use it. What are the input instances, and what are the output instances?

Can you be more explicit in the documentation? I have not properly reviewed the code yet because I am confused about what is supposed to do what here.

@jesper-friis jesper-friis mentioned this pull request Sep 1, 2023
9 tasks
@@ -119,6 +126,8 @@ jobs:
strategy:
fail-fast: false
matrix:
# There seems to be an issue with module search in Python 3.11
# python-version: ["3.9", "3.10", "3.11"]
python-version: ["3.9", "3.10"]
Contributor

Is there an issue for addressing the 3.11 problems?

Contributor Author

Added issue #168

@@ -135,7 +144,8 @@ jobs:
run: |
python -m pip install -U pip
pip install -U setuptools wheel
pip install -e .[dev]
Contributor

Why are 'requirements.*' preferred over setup.py?

Contributor Author

Copy of my answer to Francesca:

I split it into two lines because I wanted to install the requirements with the update (-U) option, which doesn't make sense for a development (-e .) installation.

But it might not be necessary. I tried changing a lot of things before the CI on GitHub finally went through. It could very well be the change in lines 91-92 that did the work...

Returns:
SessionUpdate instance.
"""
config = self.convert_config.configuration
Contributor

Remember to update the config with the relevant fields from the session

Contributor Author

Yes, good point. That is needed if we want a later filter to have an effect.

I was thinking about encouraging not specifying labels in the configuration. But fetching the label from the session could be useful. However, a single label in the session will not do, since each input and output has an optional label, so we need to be more specific when specifying labels in the session.

Would it make sense to allow variable substitutions in the configuration of a partial pipeline, like

  https://www.ntnu.edu/physmet/data#image_analyser:
    function:
      functionType: application/vnd.dlite-convert
      configuration:
        module_name: temdata.image_analyser
        function_name: image_analyser
        inputs:
          - datamodel: http://onto-ns.com/meta/0.1/TEMImage
            label: ${temimage}
        outputs:
          - datamodel: http://onto-ns.com/meta/0.1/PrecipitateStatistics
            label: ${precipitate_statistics}
    mapping:
      mappingType: mappings
      prefixes:
       ...
      triples:
        ...

where ${temimage} and ${precipitate_statistics} are substituted from the session.

On the pros side, this would allow templated partial pipelines with improved re-usability.

On the cons side, it would add an extra layer of complexity.
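
To make the proposal concrete, here is a minimal sketch of such a substitution step using `string.Template` from the standard library. It is not part of oteapi; the function name `substitute` and the session values `image1` and `stats1` are hypothetical, and the configuration is written as a plain Python dict rather than loaded from YAML.

```python
from string import Template
from typing import Any, Mapping


def substitute(config: Any, session: Mapping[str, str]) -> Any:
    """Recursively replace ${name} placeholders in strings with session values.

    Returns a new structure; the input configuration is left untouched.
    """
    if isinstance(config, str):
        # Template.substitute() raises KeyError if a placeholder
        # has no matching key in the session.
        return Template(config).substitute(session)
    if isinstance(config, dict):
        return {key: substitute(value, session) for key, value in config.items()}
    if isinstance(config, list):
        return [substitute(item, session) for item in config]
    return config


configuration = {
    "module_name": "temdata.image_analyser",
    "function_name": "image_analyser",
    "inputs": [
        {
            "datamodel": "http://onto-ns.com/meta/0.1/TEMImage",
            "label": "${temimage}",
        },
    ],
    "outputs": [
        {
            "datamodel": "http://onto-ns.com/meta/0.1/PrecipitateStatistics",
            "label": "${precipitate_statistics}",
        },
    ],
}

# Hypothetical session content; the label values are made up.
session = {"temimage": "image1", "precipitate_statistics": "stats1"}
resolved = substitute(configuration, session)
```

Using `substitute()` rather than `safe_substitute()` makes a missing session key fail loudly instead of leaving an unresolved `${...}` in the configuration.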

Contributor

I am not completely sold on the idea of labels. Also, if we start creating declarative partial pipelines with variables, they become hard to exchange/share between instances. Why not refer to a resource directly?

Contributor Author

@jesper-friis jesper-friis Sep 21, 2023

You have a good point. I agree that a partial pipeline should document a single data source or sink and in general not contain variables. However, there are cases where fetching parameters from the configuration is useful. One case is a partial pipeline documenting an SQL database. The documentation of the database should be fixed, while the query may vary each time we execute the pipeline. Another possible use case is to specify the label a parser should use when storing a newly created instance in the collection and, correspondingly, the label a generator should use when fetching an instance from the collection. In this case the labels may be variables when documenting the partial pipelines, but must be assigned and internally consistent before executing the full pipeline.

While variables are an easy and flexible way to assign consistent labels across partial pipelines, they may also open a can of worms of potential misuse.

Furthermore, in the common case that we only have one instance of a given entity in the collection, we don't need labels, since we can refer to the instance by specifying the entity in the configuration.
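
As an illustration of that lookup rule, here is a small sketch. A plain dict mapping labels to instance descriptions stands in for the DLite collection; this is an assumption for the example, not the DLite API. The function name `resolve_instance` and the labels are hypothetical.

```python
from typing import Dict, Optional


def resolve_instance(
    instances: Dict[str, dict], datamodel: str, label: Optional[str] = None
) -> dict:
    """Return the instance with `label`, or the only instance of `datamodel`."""
    if label is not None:
        return instances[label]

    # No label given: this only works if the datamodel identifies
    # exactly one instance in the collection.
    matches = [inst for inst in instances.values() if inst["datamodel"] == datamodel]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one instance of {datamodel}, found {len(matches)}; "
            "a label is needed to disambiguate"
        )
    return matches[0]


# Stand-in collection with one TEMImage and two PrecipitateStatistics instances.
collection = {
    "image1": {"datamodel": "http://onto-ns.com/meta/0.1/TEMImage"},
    "stats1": {"datamodel": "http://onto-ns.com/meta/0.1/PrecipitateStatistics"},
    "stats2": {"datamodel": "http://onto-ns.com/meta/0.1/PrecipitateStatistics"},
}
image = resolve_instance(collection, "http://onto-ns.com/meta/0.1/TEMImage")
```

Here the TEMImage resolves without a label, while the two PrecipitateStatistics instances can only be told apart by label.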

So if we solve the issue of assigning the labels in strategies that may refer to multiple instances (like the convert strategy) without variables, it might be a good idea to avoid variables in the partial pipelines stored in the knowledge base.

However, for populating the knowledge base with partial pipelines for a set of similar data sources, I think that templates with variables would be very useful. In this case, all substitutions should be done before storing into the knowledge base. Such a template utility could then live outside oteapi.

@francescalb francescalb requested review from francescalb and removed request for francescalb September 18, 2023 10:42
Contributor

@quaat quaat left a comment

I am not convinced we need labels or variables in the declarative pipelines, but this might be a discussion for the strategy meeting. Otherwise, merge as you wish

@jesper-friis jesper-friis merged commit 813c824 into master Sep 21, 2023
8 checks passed
@jesper-friis jesper-friis deleted the convert-function-strategy branch September 21, 2023 19:03
jesper-friis added a commit to SINTEF/dlite that referenced this pull request Sep 29, 2023
# Description
Example with OTEAPI and OTELib using TEM data.

This example currently depends on a set of other PRs:

* #633 (already merged into this
branch)
* EMMC-ASBL/oteapi-core#318
* EMMC-ASBL/tripper#129
* EMMC-ASBL/oteapi-dlite#163

## Type of change
- [ ] Bug fix & code cleanup
- [ ] New feature
- [x] Documentation update
- [ ] Test update

## Checklist for the reviewer
This checklist should be used as a help for the reviewer.

- [ ] Is the change limited to one issue?
- [ ] Does this PR close the issue?
- [ ] Is the code easy to read and understand?
- [ ] Do all new feature have an accompanying new test?
- [ ] Has the documentation been updated as necessary?