Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix example validation issues before merging nmdc-schema/main into berkeley-schema-fy24/main #73

Merged
merged 226 commits into from
Feb 2, 2024

Conversation

turbomam
Copy link
Member

merged a local copy of the nmdc-schema into a local duplicate of berkeley-schema-fy24

now comparing against berkeley-schema-fy24's main

will need to review the example data files that have been moved into the problem sub-directory

@eecavanna who else should we invite as reviewers?

aclum and others added 30 commits November 7, 2023 16:00
Updating slots to required for class FunctionalAnnotationAggMember, adding new class and slots for the metap aggregation collection.
Add is_best_protein slot
Update slots for MetaPFunctionAggMember
Updated to use new is_best_protein slot
Update range for is_best_protein to Boolean from BooleanValue.
Updating to Boolean instead of BooleanValue.
Make slots for KEGG aggregation tables required.
Updating to failed data can have no has_output.
Also, fix underscore-ness of other pseudo-"private" variables.
That way, we don't have to instantiate the class in order to invoke them.
Removing rule to have has_output be optional if quality_control_report.status=fail
Testing removing invalid data of quality_control_report.status=pass mixed with failure_categorization.
snakecasing permissible values for FailureWhatEnum
Updating failure_what values to snakecase permissible values
Updating failure_where value to snakecase permissible value.
@eecavanna eecavanna changed the title Review merge conflict resolution (nmdc-schema/main -> berkeley-schema-fy24/main) Fix example validation issues before merging nmdc-schema/main into berkeley-schema-fy24/main Jan 20, 2024
Resolving merge conflicts, updating format of qc information to what is in the nmdc-schema.
Slot to store additional comments about laboratory or workflow output. For workflow output
it may describe the particular workflow stage that failed. (ie Failed at call-stage due to a malformed fastq file).

instrument_name:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to get removed, re-adding this is why Database-data-generation-instrument_name-invalid.yaml is now passing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turbomam not sure if you want to use the new deprecation way we talked about or if this should just be deleted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is just deleted since it's the fork & we just deleted everything else.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean remove it from nmdc-schema production (main), then deprecation protocol we decided on. I think it would be a good spot for Sierra to help (if so)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops I already deleted it! but we can reconstruct a deprecation if you want @aclum

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting is fine with me.

@aclum
Copy link
Collaborator

aclum commented Jan 22, 2024

@turbomam Database-img_mg_annotation_objects.yaml is failing b/c for most documents everything but type slot is commented out, edits in nmdc-schema for this file are from about a month ago. Recall what you were testing or can things be uncommented or deleted to resolve this?

@aclum
Copy link
Collaborator

aclum commented Jan 23, 2024

@turbomam i need help with Extraction-NEON.yaml, if I strip this down to just id, type, has_input, has_output and conver to a extraction-set record I can get data to validate properly but not as an Extraction record itself.
That is, this validates:
cat Database-extraction.test.yaml
extraction_set:

  • id: nmdc:extrp-11-00r2pk65
    type: nmdc:Extraction
    has_input:

    • nmdc:procsm-11-9gjxns61
      has_output:
    • nmdc:procsm-11-0wxpzf07

    but this does not
    cat Extraction-NEON.yaml
    id: nmdc:extrp-11-00r2pk65
    type: nmdc:Extraction
    has_input:

  • nmdc:procsm-11-9gjxns61
    has_output:

  • nmdc:procsm-11-0wxpzf07

Copy link
Collaborator

@aclum aclum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things that need to be addressed:

  • Database-Extraction-invalid-sample_mass.yaml is failing but for unexpected reasons
  • slot instrument_name needs to be removed to make Database-data-generation-instrument_name-invalid.yaml
  • I have fixed a number of the valid example data files in the problem folder, those fixes are in branch https://github.com/microbiomedata/berkeley-schema-fy24/tree/berkeley-schema-replica-feedback/src/data/valid as discussed over slack. I'll leave it to @turbomam to decide how to get these back in
  • I would like to change the permissible value names for FailureWhereEnum to match the berkeley-schema-fy24 class names, dropping 'Activity'. I'm willing to do this and update the example data but we might want to wait until after we've worked through the other issues.

@turbomam
Copy link
Member Author

turbomam commented Jan 24, 2024

I fixed all of the tests except src/data/problem/valid/Database-img_mg_annotation_objects.yaml but I haven't read all of the comments here yet.

  • Database-Extraction-invalid-sample_mass.yaml: there's no sample_mass slot any more. Thre is an input_mass slot that serves the same purpose
  • some of the unexpected behavior you were seeing in your validations may come from the fact that you we'rent using the --target-class option.
  • I did remove instrument_name

I still have to work on the permissible value names for FailureWhereEnum

@turbomam
Copy link
Member Author

turbomam commented Jan 25, 2024

I'm really not motivated to hand-fix Database-img_mg_annotation_objects.yaml. @brynnz22 I think all of the commented sections need t be uncommented, and then it needs the type-injection migration. If it needs a little manually tweaking after that, I'm happy to do it. Can we work on this together?

@brynnz22
Copy link
Collaborator

I'm really not motivated to hand-fix Database-img_mg_annotation_objects.yaml. @brynnz22 I think all of the commented sections need t be uncommented, and then it needs the type-injection migration. If it needs a little manually tweaking after that, I'm happy to do it. Can we work on this together?

@turbomam can you point to this file? I'm not sure what you are referring to

@turbomam
Copy link
Member Author

turbomam commented Jan 25, 2024

@brynnz22
Copy link
Collaborator

Thanks @turbomam that sounds good to me.

@mslarae13
Copy link
Collaborator

@turbomam @brynnz22 @aclum what is left to do here?

@turbomam
Copy link
Member Author

turbomam commented Feb 1, 2024

Somebody said that Database-img_mg_annotation_objects.yaml was all commented out in nmdc-schema, so we shouldn't worry about making it fully valid in berkeley-schema-fy24. But it isn't commented out at all in nmdc-schema. If it has unique patterns in it that aren't validated anywhere else, then I would like to fix this file eventually, even if we do leave it in src/data/problem.

@aclum Database-biosamples-infiltrations.yaml has become invalid since the last time I tried to validate this branch.

I still think we should merge now. The build and test work.

@turbomam
Copy link
Member Author

turbomam commented Feb 1, 2024

@aclum @brynnz22 Database-biosamples-infiltrations.yaml is failing the new pattern for relating a Biosample to a Study

poetry run linkml-validate 
  --schema src/schema/nmdc.yaml \
  --target-class Database src/data/problem/valid/Database-biosamples-infiltrations.yaml 
[ERROR] [src/data/problem/valid/Database-biosamples-infiltrations.yaml/0] Additional properties are not allowed ('part_of' was unexpected) in /biosample_set/0
[ERROR] [src/data/problem/valid/Database-biosamples-infiltrations.yaml/0] 'associated_studies' is a required property in /biosample_set/0
[ERROR] [src/data/problem/valid/Database-biosamples-infiltrations.yaml/0] Additional properties are not allowed ('part_of' was unexpected) in /biosample_set/1
[ERROR] [src/data/problem/valid/Database-biosamples-infiltrations.yaml/0] 'associated_studies' is a required property in /biosample_set/1
[ERROR] [src/data/problem/valid/Database-biosamples-infiltrations.yaml/0] Additional properties are not allowed ('part_of' was unexpected) in /biosample_set/2
[ERROR] [src/data/problem/valid/Database-biosamples-infiltrations.yaml/0] 'associated_studies' is a required property in /biosample_set/2

@brynnz22
Copy link
Collaborator

brynnz22 commented Feb 1, 2024

@turbomam I think we just need to change part_of in the Database-biosamples-infiltrations.yaml to associated_studies and this should validate properly.

@turbomam
Copy link
Member Author

turbomam commented Feb 2, 2024

@turbomam I think we just need to change part_of in the Database-biosamples-infiltrations.yaml to associated_studies and this should validate properly.

Thanks @brynnz22 . I was hoping that if I acted clueless then somebody else would do it for me :-)

@turbomam turbomam merged commit c673555 into main Feb 2, 2024
2 checks passed
@turbomam turbomam deleted the berkeley-main-replica branch February 2, 2024 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants