Implement migrator that removes used slot from WorkflowExecution (file: migrator_from_X_to_PR31.py) #139

Merged: 33 commits, May 14, 2024
Changes from 8 commits
Commits
3bf56ca
create migrator and schema changes to remove used slot
Apr 23, 2024
917bcf9
revert core.yaml and basic_classes.yaml
Apr 23, 2024
b5faa03
recommit moving instrument_used slot
Apr 23, 2024
2e1b8b5
Removed a modified nmdc.yaml from pull request, no change
Apr 23, 2024
c084845
Merge branch 'main' into migrate-PR31
Apr 25, 2024
4a3c40f
add variable name changes
Apr 29, 2024
d18b335
update doc string
Apr 29, 2024
399ed81
add testing sets
Apr 29, 2024
987d0e4
stash changes to remote
May 6, 2024
b9afcc7
most recent updates
May 7, 2024
2c6236e
update doc strings
May 7, 2024
1af3e6e
add separate function to add instrument_name slot to omics_processing…
May 7, 2024
862660c
update doc string
May 7, 2024
55c8a7c
passing the batton
May 9, 2024
fc133cd
finish up migrator to use difflib SequenceMatcher
brynnz22 May 10, 2024
3ea8d49
remove doc string
brynnz22 May 10, 2024
088fa36
add backticks
brynnz22 May 13, 2024
3fa1be4
Update nmdc_schema/migrators/migrator_from_X_to_PR31.py
brynnz22 May 13, 2024
558fd83
update variable names
brynnz22 May 13, 2024
2d0e923
Remove white space
brynnz22 May 13, 2024
0e3d0cc
change elif to else
brynnz22 May 13, 2024
fdbe33f
add doc test
brynnz22 May 13, 2024
55ae53e
close paranthese;
brynnz22 May 13, 2024
31ba535
remove quotes from doctest
brynnz22 May 13, 2024
c05961a
add quotes
brynnz22 May 13, 2024
81b0e96
Update nmdc_schema/migrators/migrator_from_X_to_PR31.py
brynnz22 May 14, 2024
be0f002
umcomment lines
brynnz22 May 14, 2024
6811f0e
try removing instrument
brynnz22 May 14, 2024
51d6396
make instrument_used inlined:false
brynnz22 May 14, 2024
6adb530
remove inlined: false
brynnz22 May 14, 2024
26640cb
add regex pattern for instrument_used
brynnz22 May 14, 2024
0661278
move instrument_used out of aliases
brynnz22 May 14, 2024
fa31add
remove instrument_used regex pattern
brynnz22 May 14, 2024
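
Commit fc133cd mentions `difflib.SequenceMatcher`, which does not appear in the 8-commit view of the migrator shown on this page. As a rough illustration of what a similarity-based comparison of a `used` value against an `instrument_name` value could look like, here is a minimal sketch. The helper name, the 0.8 threshold, and the sample strings are assumptions made for illustration and are not taken from the PR.

```python
from difflib import SequenceMatcher


def names_roughly_match(used: str, instrument_name: str, threshold: float = 0.8) -> bool:
    """Hypothetical helper: True when the two strings are similar enough."""
    # Normalise case and collapse runs of whitespace before comparing.
    a = " ".join(used.lower().split())
    b = " ".join(instrument_name.lower().split())
    # ratio() returns a float in [0, 1]; 1.0 means the strings are identical.
    return SequenceMatcher(None, a, b).ratio() >= threshold


# Sample strings are invented for this sketch.
same_instrument = names_roughly_match("Illumina HiSeq 2500", "illumina  hiseq 2500")
different_instrument = names_roughly_match("Illumina HiSeq", "Orbitrap")
```

A threshold-based match tolerates differences in case and spacing, at the cost of possibly accepting near-miss names; the right cutoff would depend on the actual data.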
81 changes: 81 additions & 0 deletions nmdc_schema/migrators/migrator_from_X_to_PR31.py
@@ -0,0 +1,81 @@
from nmdc_schema.migrators.migrator_base import MigratorBase

class Migrator(MigratorBase):
    r"""
    Migrates data from X to PR31, removes used slot from WorkflowExecution subclasses and checks that the
    value in the used slot on the WorkflowExecution classes matches the value on the DataGeneration
    instances in the instrument_name slot.
    """

    _from_version = "X"
    _to_version = "PR31"

    def upgrade(self):
        r"""Migrates the database from conforming to the original schema, to conforming to the new schema."""

        workflow_execution_collection_names = [
            "metaproteomics_analysis_set",
            "nom_analysis_set",
            "metabolomics_analysis_set",
            "read_based_taxonomy_analysis_set",
            "read_qc_analysis_set",
            "metagenome_sequencing_set",
            "mags_set",
            "metatranscriptome_analysis_set",
            "metagenome_annotation_set",
            "metagenome_assembly_set",
            "mags_activity_set",
            "metabolomics_analysis_activity_set",
            "metagenome_annotation_activity_set",
            "metagenome_sequencing_activity_set",
            "metaproteomics_analysis_activity_set"
        ]

        self.adapter.process_each_document("omics_processing_set", [self.check_instrument_name])

        for collection_name in workflow_execution_collection_names:
            self.adapter.process_each_document(
                collection_name=collection_name,
                pipeline=[self.remove_used_slot],
            )

    def remove_used_slot(self, doc: dict) -> dict:
        r"""
        Removes the used slot from WorkflowExecution subclasses.

        >>> m = Migrator()
        >>> m.remove_used_slot({'id': 123, 'used': 'abc'})
        {'id': 123}
        """

        if "used" in doc:
            doc.pop("used")

        return doc

    def check_instrument_name(self, workflow_execution: dict) -> dict:
        r"""
        Checks that the value in the used slot on the WorkflowExecution classes matches the value
        in the `instrument_name` field of a related `OmicsProcessing` (soon to be renamed to `DataGeneration`) instance.
        If it matches, then remove used from the WorkflowExecution instance.

        >>> m = Migrator()
        >>> m.check_instrument_name({'id': 123, 'used': 'abc'})
        {'id': 123, 'used': 'abc'}
        """

        if "used" in workflow_execution:

            try:
                data_generation_doc = self.adapter.get_document_having_value_in_field(
                    collection_name="omics_processing_set", field_name="instrument_name", value=workflow_execution["used"]
                )

                if workflow_execution["used"] == data_generation_doc["instrument_name"]:
                    workflow_execution.pop("used")

            except:
                self.logger.error(f"WorkflowExecution {workflow_execution['id']} used: {workflow_execution['used']} does not match OmicsProcessing instrument_name.")

        return workflow_execution
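
The pipeline pattern that `upgrade` relies on (pass each document in a collection through a list of functions) can be tried out without a database. The sketch below is only an illustration under stated assumptions: `InMemoryAdapter` is a hypothetical stand-in, not the real adapter class from `nmdc_schema`, and it mimics only the `process_each_document(collection_name, pipeline)` call shape visible in the diff. The sample records and their ids are made up.

```python
from typing import Callable, Dict, List


class InMemoryAdapter:
    """Hypothetical stand-in for the migrator's adapter (not the nmdc_schema class)."""

    def __init__(self, database: Dict[str, List[dict]]):
        self.database = database

    def process_each_document(
        self, collection_name: str, pipeline: List[Callable[[dict], dict]]
    ) -> None:
        # Pass every document through each function in order and store the result.
        # A missing collection is simply skipped here; the real adapter may differ.
        documents = self.database.get(collection_name, [])
        for index, document in enumerate(documents):
            for transform in pipeline:
                document = transform(document)
            documents[index] = document


def remove_used_slot(doc: dict) -> dict:
    # Same effect as the method in the diff: drop `used` if it is present.
    doc.pop("used", None)
    return doc


# Made-up example records; only the presence or absence of `used` matters.
database = {
    "read_qc_analysis_set": [
        {"id": "nmdc:wfrqc-00-000001", "used": "Illumina HiSeq"},
        {"id": "nmdc:wfrqc-00-000002"},
    ]
}

adapter = InMemoryAdapter(database)
for name in ["read_qc_analysis_set", "mags_set"]:
    adapter.process_each_document(collection_name=name, pipeline=[remove_used_slot])
```

After the loop, neither record in `read_qc_analysis_set` carries a `used` key, and the absent `mags_set` collection is left untouched.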

34 changes: 17 additions & 17 deletions src/data/valid/Database-neon_Biosample_to_DataObject_NEON.yaml
@@ -116,23 +116,23 @@ data_object_set:
     data_object_type: Filtered Sequencing Reads
     url: https://data.microbiomedata.org/data/1472_51293/qa/1472_51293.filtered.fastq.gz
     type: nmdc:DataObject
-data_generation_set:
-  - id: nmdc:dgns-11-s9xj2r24
-    analyte_category: metagenome
-    name: Test NEON data
-    has_input:
-      - nmdc:procsm-99-xyz3
-    has_output:
-      - nmdc:dobj-12-jdhk9537
-      - nmdc:dobj-12-yx0tfp52
-    instrument_used:
-      - nmdc:inst-12-yx0tfp52
-    part_of:
-      - nmdc:dgns-11-34xj1150
-    processing_institution: Battelle
-    type: nmdc:NucleotideSequencing
-    associated_studies:
-      - nmdc:sty-11-34xj1150
+# data_generation_set:
+#   - id: nmdc:dgns-11-s9xj2r24
+#     analyte_category: metagenome
+#     name: Test NEON data
+#     has_input:
+#       - nmdc:procsm-99-xyz3
+#     has_output:
+#       - nmdc:dobj-12-jdhk9537
+#       - nmdc:dobj-12-yx0tfp52
+#     instrument_used:
+#       - nmdc:inst-12-yx0tfp52
+#     part_of:
+#       - nmdc:dgns-11-34xj1150
+#     processing_institution: Battelle
+#     type: nmdc:NucleotideSequencing
+#     associated_studies:
+#       - nmdc:sty-11-34xj1150
 workflow_chain_set:
   - id: nmdc:wfch-11-ab
     analyte_category: metagenome
2 changes: 1 addition & 1 deletion src/schema/basic_classes.yaml
@@ -73,7 +73,6 @@ classes:
     slots:
       - has_input
       - has_output
-      - instrument_used
       - processing_institution
       - protocol_link
       - start_date
@@ -292,6 +291,7 @@ classes:
       - omics assay
       - sequencing project
       - experiment
+      - instrument_used
     is_a: PlannedProcess
     abstract: true
     in_subset:
3 changes: 2 additions & 1 deletion src/schema/core.yaml
@@ -1183,7 +1183,8 @@ classes:
     description:
       A process that takes one or more samples as inputs and generates
       one or more samples as outputs.
-
+    slots:
+      - instrument_used
     notes:
       - This class is a replacement for BiosampleProcessing.
     slot_usage: