-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement migrator that removes used
slot from WorkflowExecution
(file: migrator_from_X_to_PR31.py
)
#139
Conversation
used
slot from WorkflowExecution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @anastasiyaprymolenna, thanks for making this migrator. I reviewed migrator_from_X_to_PR31.py
only (I'm not fluent in LinkML yet). The structure of the migrator looks good to me and I understand what the methods of the Migrator
class do. I have some suggestions regarding a couple variable names, which I left as review comments. Also, I have suggestions regarding referring to things as "DataGeneration" in comments while the code still refers to the omics_processing_set
collection (I put those suggestions in review comments also).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend combining these functions into one, because this check_instrument_function
does not appear to do anything. According to line 34 check_instrument_name
is being given an omics document, not a workflow_execution
doc and this function is checking to see if the slot used
exists in an omics_document (which it does not - used
exists on a WorkflowExecution
). So the first if
statement does not work for any document being passed into this function, and there is no else
statement. It is then getting an omics doc (data_generation_doc
should be omics_doc
since we are referring to the old collection) and searching for instrument_name
, which could technically return a lot of documents since an intstrument name exists in more than one omics doc. What we want instead:
- Loop through workflow execution docs and retrieve the corresponding omics doc id from the
was_informed_by
slot - use the
get_document_having_value_in_field
adapter to search theomics_processing_set
on theid
field using that value from thewas_informed_by
slot from workflow execution doc. - Check to see if the
omics_doc
["instrument_name"] matches theworkflow_execution
["used"] and if it does remove theused
slot from the workflow execution doc. - If it doesn't, log error.
My recommendation is to do this all in one function. I don't see the need to do this in two functions since you need to iterate through the workflow executions anyway to get the was_informed_by
to search for the corresponding omics doc.
I tried to make individual comments. I hope this is clear! Let me know if you need more clarity!
I added using |
Co-authored-by: eecavanna <134325062+eecavanna@users.noreply.github.com>
@eecavanna I made all of your requested changes. I changed those variable names for processed strings so they make more sense, added a doctest to that function, removed the white space. Let me know if you see anything else that needs adjusting. |
The migrator looks good to me. I'm not too familiar with the YAML files, so I only reviewed them in terms of their YAML syntax, which looks good to me. Based on that, I'm comfortable with this branch being merged into the main branch. There is one YAML file that has a bunch of lines commented out. In case you want the lines to be disabled but still be accessible to readers, another option is to delete them and then add a single-line comment with the Git commit hash; something like: |
Co-authored-by: eecavanna <134325062+eecavanna@users.noreply.github.com>
I'm not sure why those lines are commented out. I will fix it. |
This PR removes used slot from all
WorfklowExecution
subclasses. This info. should already be on the correspondingDataGeneration
instances in theinstrument_name
slot (soon to be changed toinstrument_used
so will need to come before that migration).The migrator also checks that the value in the used slot on the
WorkflowExecution
classes matches the value on theDataGeneration
(currentlyomics_processing_set
) instances in theinstrument_name
slot.Also included as a part of this PR, is a minor schema change - moving
instrument_used
slot off ofPlannedProcess
parent class and put directly onMaterialProcessing
andDataGenerationclasses
so it is no longer an option to be used onWorkflowExecution
.In testing, the
data_generation_set
inDatabase-neon_Biosample_to_DataObject_NEON.yaml
did not match any known schema. I did not see anything that would cause this, I commented this out for the time being and the tests passed, but I need a second pair of eyes on it.