Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typecode fixes on IDs for ChromatographyConfiguration, StorageProcess, and MassSpectrometryConfiguration class ids #225

Merged
merged 10 commits into from
Jul 24, 2024

Conversation

kheal
Copy link

@kheal kheal commented Jul 16, 2024

PR Information

What type of PR is this? (check all applicable)

  • Bug Fix
  • Documentation

Description

This PR aims to

  1. Shorten id typecodes to adhere to nmdc standards for two berkeley-schema specific classes (aka, no data in mongo for these).
    - ChromatographyConfiguration typecode changed from chrocon to chrcon
    - StorageProcess typecode changed from storpro to storpr
    All tests and references to these old typecodes
  2. Update CONTRIBUTING guide to reflect the typecode guidelines
  3. Remove extraneous parentheses for ids on definition of MassSpectrometryConfiguration and ChromatographyConfiguration.

Related Issues

Did you add/update any tests?

  • Yes I updated the example data that is tested

Could this schema change make it so any valid data becomes invalid?

  • Yes (A migrator is required)
    there are no existing data in mongo that are part of these classes (we discovered these issues when trying to mint IDs in order to load these classes). @eecavanna are you OK with no migrator for this change? I understand that this is not best practice, but making a migrator for theoretical records with ids that could not have been minted seems a bit....unnecessary.

If you answered "Yes", does this PR branch include that migrator?

  • No, but we've decided it it unnecessary, see conversation below

Does this PR have any downstream implications?

  • No

Copy link

github-actions bot commented Jul 16, 2024

PR Preview Action v1.4.7
🚀 Deployed preview to https://microbiomedata.github.io/berkeley-schema-fy24/pr-preview/pr-225/
on branch gh-pages at 2024-07-17 23:14 UTC

@kheal kheal changed the title 2122 typecode fix Typecode fixes on IDs for ChromatographyConfiguration, StorageProcess, and MassSpectrometryConfiguration class ids Jul 16, 2024
@kheal kheal marked this pull request as ready for review July 16, 2024 21:36
@kheal kheal requested a review from mslarae13 July 16, 2024 21:36
@kheal kheal self-assigned this Jul 16, 2024
@kheal kheal requested review from aclum and turbomam July 16, 2024 21:37
@eecavanna
Copy link

eecavanna commented Jul 16, 2024

Hi @kheal, thanks for including all that info and for tagging me in the migration part.

I have a question:

  1. Were those fancy patterns present in the legacy schema? If not, then no migrator is necessary, from my perspective. Elaboration is below.

In the Berkeley schema (at this stage, where we haven't yet merged it back into the upstream repo), I consider all schema changes to be part of a single set of "legacy-to-Berkeley" schema changes that get applied at once. So, if those fancy patterns did not exist in the legacy schema, exist in the current Berkeley schema, and will not exist in the Berkeley schema after you merge this PR in, I will effectively act as though they never existed.

Out sick this afternoon and typing this from my phone. Hope that helps a bit.

CONTRIBUTING.md Outdated
@@ -109,6 +109,9 @@ Core developers should read the material on the [LinkML site](https://linkml.io/
- Follow the naming conventions of the parent class
- Descriptions of child classes may reference parent classes in a genus-differentia definition structure (e.g. "A workflow execution activity that...")
- Inheritance should be monotonic: `slot_usage` should refine rather than override
- ID patterning and checks
- ID patterns for new classes should follow conventions found [here](https://microbiomedata.github.io/nmdc-schema/identifiers/)
- Class-linking slots (i.e. `has_input`) should have slot_usage declared that limit range to only instances of the expecting linked class (i.e. `syntax: "{id_nmdc_prefix}:chrcon-{id_shoulder}-{id_blade}$"`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're talking about the structured_patterns here, right? The range is limited by the range metaslot, and not anything else. This is also intended as a short term fix, just until the LinkML developers come up with a more general solution that doesn't require us to make repetitive assertions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is in reference to the structured_pattern use. I understand that its a temporary fix, but it's been a big pain for the re-IDing team so I wanted to document it somewhere until linkML can verify ranges.

I will update language to be more explicit.

@turbomam
Copy link
Member

turbomam commented Jul 16, 2024

thanks, good issues and good PR

@kheal
Copy link
Author

kheal commented Jul 16, 2024

@eecavanna

Re:

Were those fancy patterns present in the legacy schema? If not, then no migrator is necessary, from my perspective. Elaboration is below.

No they were not. @turbomam nicely documented that for us here: microbiomedata#2122 (comment) .

I agree re: no migrator, thanks!

@kheal kheal requested a review from mslarae13 July 17, 2024 20:23
CONTRIBUTING.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
Co-authored-by: eecavanna <134325062+eecavanna@users.noreply.github.com>
Co-authored-by: eecavanna <134325062+eecavanna@users.noreply.github.com>
@eecavanna
Copy link

eecavanna commented Jul 18, 2024

Remove extraneous parentheses for ids on...

FYI: The Minter in the Berkeley environment has been updated to (a) tolerate one layer of extraneous parentheses and (b) to use the first typecode — foo — in a (foo|bar|baz) list.

@mslarae13
Copy link
Collaborator

approve from non-technical review.
Does this relate to microbiomedata#2125 at all?

@kheal
Copy link
Author

kheal commented Jul 22, 2024

@mslarae13

approve from non-technical review. Does this relate to microbiomedata#2125 at all?

Yes it is related in that it is dealing with typecodes for NMDC ids, but it does not specifically address changes in the schema related to supporting legacy typecodes.

@kheal kheal merged commit a74d8d6 into main Jul 24, 2024
3 checks passed
@kheal kheal deleted the 2122_typecode_fix branch August 2, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants