Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add StorageProcess #126

Merged
merged 15 commits into from
Apr 24, 2024
Merged

Add StorageProcess #126

merged 15 commits into from
Apr 24, 2024

Conversation

brynnz22
Copy link
Collaborator

@brynnz22 brynnz22 commented Apr 9, 2024

Addresses issue: microbiomedata#1908

@anastasiyaprymolenna anastasiyaprymolenna marked this pull request as ready for review April 10, 2024 15:55
Copy link
Collaborator

@anastasiyaprymolenna anastasiyaprymolenna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The StorageProcess class is in line with the rest of the MaterialProcessing classes, and will fulfill the need of capturing the process that samples that samples go through when they are stored in a solution/container before analysis. The data example shows that.

A follow up issue to this PR should include an expansion of the ContainerCategoryEnum.

@anastasiyaprymolenna
Copy link
Collaborator

move slots contained_in , temperature , duration to MaterialProcessing instead, so we don't create a separate class.

@turbomam
Copy link
Member

turbomam commented Apr 11, 2024

Thanks for working on this. I like the thought process that went into the PR. I have several comments, but I think there's a good chance we can close thsi PR with just a few minutes of discussion.

@turbomam
Copy link
Member

Does storage satisfy the spirit of the textual description?

A process that takes one or more samples as inputs and generates one or more samples as outputs.

How about the definition of the broad mapping to OBI:0000094, 'material processing'

A planned process which results in physical changes in a specified input material

@turbomam
Copy link
Member

Maybe storage should be instantiated as a PlannedProcess and not a MaterialProcessing

@turbomam
Copy link
Member

In any case, the title of the PR should be changed

@turbomam
Copy link
Member

Well, my suggestion to instantiate PlannedProcess wasn't a solution, since PlannedProcess is abstract. I'm inclined to say that MaterialProcessing should be abstract, too. But that doesn't help close this PR.

I would like to move the ownership of duration and temperature up to PlannedProcess. (That doesn't help close this PR either)

@turbomam
Copy link
Member

turbomam commented Apr 11, 2024

I'm still not really happy with the semantics of MaterialProcess contained_in something. It seems like we're trying to say that something was contained_in something else during the course of the MaterialProcess, or at the end of the MaterialProcess

@turbomam
Copy link
Member

I'm looking for instantiations of MaterialProcess or usages of contained_in in MongoDB now.

@turbomam
Copy link
Member

I have a strong urge to say that we should have a Storage class, and that it should be a subclass of PlannedProcess.

What are the use cases for this class in terms of querying or informing workflows?

@anastasiyaprymolenna
Copy link
Collaborator

Replying to the most recent comment @turbomam .

The primary use case for capturing information about sample storage comes from the derivatization protocol directly for GC-MS analysis because plastic or vinyl type containers can generate ghost peaks in the chromatograms due to solvent leaching.

Another use cause that came up in discussion was capturing time and temp because samples can degrade during storage depending on how long they are stored before analysis (though an argument could be made that that data would be low quality and we wouldn't accept it into the database anyway).

@anastasiyaprymolenna
Copy link
Collaborator

anastasiyaprymolenna commented Apr 11, 2024

I have a strong urge to say that we should have a Storage class, and that it should be a subclass of PlannedProcess.

What are the use cases for this class in terms of querying or informing workflows?

Did you happen to check out the initial commit on this PR that modeled a "StorageProcess"?
ba06bca

@anastasiyaprymolenna
Copy link
Collaborator

@mslarae13 Based on the discussion, I think it is best to revert the commit that removed the StorageProcess class.

@mslarae13
Copy link
Collaborator

"spirit"? I'm not sure what you mean here @turbomam. I think yes? You have 1 or more samples, and when you store them you're noting that it's a stored sample. Is it the generates? Cuz maybe aside from being 'later in time' there may be no changes?

Does storage satisfy the spirit of the textual description?

A process that takes one or more samples as inputs and generates one or more samples as outputs.

I don't like "results in physical changes" there's not always a physical change. You can air dry your soil and store it for years. The re-use it.

How about the definition of the broad mapping to OBI:0000094, 'material processing'

A planned process which results in physical changes in a specified input material

@mslarae13
Copy link
Collaborator

mslarae13 commented Apr 17, 2024

Maybe storage should be instantiated as a PlannedProcess and not a MaterialProcessing

Well, my suggestion to instantiate PlannedProcess wasn't a solution, since PlannedProcess is abstract. I'm inclined to say that MaterialProcessing should be abstract, too. But that doesn't help close this PR.
I would like to move the ownership of duration and temperature up to PlannedProcess. (That doesn't help close this PR either)

Why do you think that? Can you provide an example of when storage wouldn't be material and relevant for NMDC?

@mslarae13
Copy link
Collaborator

In any case, the title of the PR should be changed

To? "Modify Material Process to account for storage"?

@mslarae13
Copy link
Collaborator

I'm still not really happy with the semantics of MaterialProcess contained_in something. It seems like we're trying to say that something was contained_in something else during the course of the MaterialProcess, or at the end of the MaterialProcess

So, maybe we should put these on processed sample? but we still need something to indicate HOW we go from sample1 to sample2, which requires a process. I don't think NMDC needs the level of granularity that 'stored in' provides as a process. The use of this slot will be to identify "the data from this sample looks weird, oh because they stored it in a plastic that leeches" that's the only use I can find for this field.
So with how NMDC needs to use this metadata, what is the right choice?

@mslarae13 mslarae13 closed this Apr 17, 2024
@mslarae13 mslarae13 reopened this Apr 17, 2024
@mslarae13
Copy link
Collaborator

I still disagree with a whole class for storage, because of how NMDC will use this data... It seems like added complication. But I will defer to @cmungall , @sierra-moxon , and @turbomam .

Additional class, or provide slots on existing class.

@anastasiyaprymolenna
Copy link
Collaborator

A problem from a data retrieval standpoint in moving the slots to MaterialProcesing and not creating a separate class is instantiating an entry for a MaterialProcess to indicate storage is not explicit, all we have to go off of is a material process that used a container for a certain amount of time. Is it good data handling to store data that is specifically indicating storage as an entry of a non-distinct class?

@anastasiyaprymolenna
Copy link
Collaborator

anastasiyaprymolenna commented Apr 24, 2024

To do based on metadata meeting

  • provide a slot usage for substances_used
  • make this class a child of PlannedProcess instead of MaterialProcessing
  • use start_date instead of duration

@anastasiyaprymolenna anastasiyaprymolenna merged commit d76afa0 into main Apr 24, 2024
2 checks passed
@eecavanna eecavanna deleted the storage_process_class branch June 17, 2024 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants