Implement migrator that removes `used` slot from `WorkflowExecution` (file: `migrator_from_X_to_PR31.py`) #139

anastasiyaprymolenna · 2024-04-23T18:29:30Z

This PR removes the used slot from all WorkflowExecution subclasses. This information should already be on the corresponding DataGeneration instances in the instrument_name slot (soon to be renamed to instrument_used, so this migration will need to run before that one).

The migrator also checks that the value in the used slot on the WorkflowExecution classes matches the value on the DataGeneration (currently omics_processing_set) instances in the instrument_name slot.

Also included in this PR is a minor schema change: moving the instrument_used slot off of the PlannedProcess parent class and putting it directly on the MaterialProcessing and DataGeneration classes, so it is no longer available on WorkflowExecution.

In testing, the data_generation_set in Database-neon_Biosample_to_DataObject_NEON.yaml did not match any known schema. I did not see anything that would cause this. I commented it out for the time being and the tests passed, but I need a second pair of eyes on it.

eecavanna

Hey @anastasiyaprymolenna, thanks for making this migrator. I reviewed migrator_from_X_to_PR31.py only (I'm not fluent in LinkML yet). The structure of the migrator looks good to me and I understand what the methods of the Migrator class do. I have some suggestions regarding a couple variable names, which I left as review comments. Also, I have suggestions regarding referring to things as "DataGeneration" in comments while the code still refers to the omics_processing_set collection (I put those suggestions in review comments also).

nmdc_schema/migrators/migrator_from_X_to_PR31.py

brynnz22

I recommend combining these functions into one, because this check_instrument_function does not appear to do anything. According to line 34, check_instrument_name is being given an omics document, not a workflow_execution doc, and this function checks whether the used slot exists in an omics document (which it does not; used exists on a WorkflowExecution). So the first if statement never passes for any document passed into this function, and there is no else statement. It then gets an omics doc (data_generation_doc should be omics_doc since we are referring to the old collection) and searches for instrument_name, which could technically return a lot of documents since an instrument name exists in more than one omics doc. What we want instead:

1. Loop through the workflow execution docs and retrieve the corresponding omics doc id from the was_informed_by slot.
2. Use the get_document_having_value_in_field adapter method to search the omics_processing_set on the id field, using the value from the workflow execution doc's was_informed_by slot.
3. Check whether omics_doc["instrument_name"] matches workflow_execution["used"]; if it does, remove the used slot from the workflow execution doc.
4. If it doesn't, log an error.

My recommendation is to do this all in one function. I don't see the need to do this in two functions since you need to iterate through the workflow executions anyway to get the was_informed_by to search for the corresponding omics doc.
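The steps above can be sketched roughly as follows. This is only an illustration, not the code in the PR: `InMemoryAdapter` is a hypothetical stand-in for the real database adapter, the argument order of `get_document_having_value_in_field` is assumed, `remove_used_slot` is an invented name, and `was_informed_by` is assumed to hold a single id.

```python
import logging
from typing import Dict, List, Optional

logger = logging.getLogger(__name__)


class InMemoryAdapter:
    """Hypothetical stand-in for the migrator's database adapter."""

    def __init__(self, collections: Dict[str, List[dict]]) -> None:
        self.collections = collections

    def get_document_having_value_in_field(
        self, collection_name: str, field_name: str, value: str
    ) -> Optional[dict]:
        # Assumed signature: return the first document whose field equals the value.
        for doc in self.collections.get(collection_name, []):
            if doc.get(field_name) == value:
                return doc
        return None


def remove_used_slot(workflow_doc: dict, adapter: InMemoryAdapter) -> dict:
    """Drop `used` when it equals the linked omics doc's `instrument_name`."""
    if "used" not in workflow_doc:
        return workflow_doc

    # Look up the single omics doc referenced by was_informed_by.
    omics_doc = adapter.get_document_having_value_in_field(
        "omics_processing_set", "id", workflow_doc.get("was_informed_by")
    )
    if omics_doc is None:
        logger.error("No omics doc found for %s", workflow_doc.get("id"))
        return workflow_doc

    if omics_doc.get("instrument_name") == workflow_doc["used"]:
        workflow_doc.pop("used")
    else:
        logger.error(
            "used %r does not match instrument_name %r for %s",
            workflow_doc["used"],
            omics_doc.get("instrument_name"),
            workflow_doc.get("id"),
        )
    return workflow_doc
```

Searching on the id field returns at most one omics doc per workflow execution, which avoids the ambiguity of searching by instrument name.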

I tried to make individual comments. I hope this is clear! Let me know if you need more clarity!

nmdc_schema/migrators/migrator_from_X_to_PR31.py

brynnz22 · 2024-05-10T17:53:25Z

I added difflib's SequenceMatcher to account for used slots whose values don't exactly match the instrument_name slot in the OmicsProcessing docs, e.g. 7T-FT ICR MS and 7T_FTICR_MS. These should still be considered the same, so the migrator still drops the used slot when they match. The match threshold is set to 0.8; anything lower than this will throw an error. I've double-checked and everything lines up as it should: all the used slots are removed from the WorkflowExecution docs.
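A minimal sketch of this kind of fuzzy comparison, using the standard library's `difflib.SequenceMatcher`. The function names and the normalization step (lowercasing and dropping non-alphanumeric characters) are assumptions made for illustration; the preprocessing in the actual migrator may differ. Only the 0.8 threshold comes from the comment above.

```python
import re
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.8  # threshold mentioned in the comment above


def normalize(name: str) -> str:
    """Lowercase and drop non-alphanumeric characters (assumed preprocessing).

    >>> normalize("7T-FT ICR MS")
    '7tfticrms'
    """
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(used: str, instrument_name: str) -> float:
    """Return the SequenceMatcher ratio (0.0 to 1.0) of the normalized strings."""
    return SequenceMatcher(None, normalize(used), normalize(instrument_name)).ratio()


def is_same_instrument(used: str, instrument_name: str) -> bool:
    """Treat the two values as the same instrument at or above the threshold."""
    return similarity(used, instrument_name) >= MATCH_THRESHOLD
```

With this normalization, 7T-FT ICR MS and 7T_FTICR_MS reduce to the same string, so they compare as a match. A threshold below 1.0 still tolerates small spelling differences that normalization alone does not remove.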

brynnz22 · 2024-05-13T18:28:28Z

@eecavanna I made all of your requested changes: I renamed the variables for the processed strings so they make more sense, added a doctest to that function, and removed the white space. Let me know if you see anything else that needs adjusting.

nmdc_schema/migrators/migrator_from_X_to_PR31.py

eecavanna · 2024-05-14T04:01:50Z

The migrator looks good to me. I'm not too familiar with the YAML files, so I only reviewed them in terms of their YAML syntax, which looks good to me. Based on that, I'm comfortable with this branch being merged into the main branch.

There is one YAML file that has a bunch of lines commented out. In case you want the lines to be disabled but still be accessible to readers, another option is to delete them and add a single-line comment with the Git commit hash, something like: "The example of bla bla was removed in commit #1234abcd." People will be able to search the Git history for that commit and see the deleted lines.

brynnz22 · 2024-05-14T15:56:24Z

I'm not sure why those lines are commented out. I will fix it.

src/schema/basic_classes.yaml

create migrator and schema changes to remove used slot

3bf56ca

anastasiyaprymolenna requested review from turbomam, brynnz22 and eecavanna April 23, 2024 18:29

anastasiyaprymolenna changed the title ~~Create migrator and schema changes to remove used slot~~ Migrator to remove used slot from WorkflowExecution Apr 23, 2024

Prymolenna, Anastasiya V added 3 commits April 23, 2024 11:36

revert core.yaml and basic_classes.yaml

917bcf9

recommit moving instrument_used slot

b5faa03

Removed a modified nmdc.yaml from pull request, no change

2e1b8b5

eecavanna reviewed Apr 23, 2024

nmdc_schema/migrators/migrator_from_X_to_PR31.py Outdated

eecavanna reviewed Apr 23, 2024

nmdc_schema/migrators/migrator_from_X_to_PR31.py Outdated

eecavanna reviewed Apr 23, 2024

nmdc_schema/migrators/migrator_from_X_to_PR31.py Outdated

Prymolenna, Anastasiya V added 4 commits April 25, 2024 10:31

Merge branch 'main' into migrate-PR31

c084845

add variable name changes

4a3c40f

update doc string

d18b335

add testing sets

399ed81

brynnz22 requested changes Apr 30, 2024

brynnz22 reviewed Apr 30, 2024

nmdc_schema/migrators/migrator_from_X_to_PR31.py Outdated

brynnz22 reviewed Apr 30, 2024

nmdc_schema/migrators/migrator_from_X_to_PR31.py Outdated

brynnz22 reviewed Apr 30, 2024

nmdc_schema/migrators/migrator_from_X_to_PR31.py Outdated

brynnz22 reviewed Apr 30, 2024

nmdc_schema/migrators/migrator_from_X_to_PR31.py Outdated

Prymolenna, Anastasiya V and others added 7 commits May 6, 2024 08:45

stash changes to remote

987d0e4

most recent updates

b9afcc7

update doc strings

2c6236e

add separate function to add instrument_name slot to omics_processing_set

1af3e6e

update doc string

862660c

passing the batton

55c8a7c

finish up migrator to use difflib SequenceMatcher

fc133cd

remove doc string

3ea8d49

brynnz22 and others added 9 commits May 13, 2024 09:16

add backticks

088fa36

Update nmdc_schema/migrators/migrator_from_X_to_PR31.py

3fa1be4

Co-authored-by: eecavanna <134325062+eecavanna@users.noreply.github.com>

update variable names

558fd83

Remove white space

2d0e923

change elif to else

0e3d0cc

add doc test

fdbe33f

close paranthese;

55ae53e

remove quotes from doctest

31ba535

add quotes

c05961a

eecavanna reviewed May 14, 2024

nmdc_schema/migrators/migrator_from_X_to_PR31.py Outdated

eecavanna self-requested a review May 14, 2024 04:01

eecavanna approved these changes May 14, 2024

Update nmdc_schema/migrators/migrator_from_X_to_PR31.py

81b0e96

Co-authored-by: eecavanna <134325062+eecavanna@users.noreply.github.com>

brynnz22 added 6 commits May 14, 2024 08:58

umcomment lines

be0f002

try removing instrument

6811f0e

make instrument_used inlined:false

51d6396

remove inlined: false

6adb530

add regex pattern for instrument_used

26640cb

move instrument_used out of aliases

0661278

turbomam reviewed May 14, 2024

src/schema/basic_classes.yaml Outdated

remove instrument_used regex pattern

fa31add

brynnz22 requested a review from turbomam May 14, 2024 17:07

turbomam approved these changes May 14, 2024

brynnz22 self-requested a review May 14, 2024 18:49

brynnz22 approved these changes May 14, 2024

brynnz22 merged commit b38da6a into main May 14, 2024
2 checks passed

brynnz22 deleted the migrate-PR31 branch May 14, 2024 18:50
