This repository has been archived by the owner on Jul 12, 2024. It is now read-only.

Consider if "DNA Collection Site" is redundant and remove #156

Closed
pvangay opened this issue Feb 9, 2022 · 28 comments
Labels
backlog, documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@pvangay

pvangay commented Feb 9, 2022

Based on the JGI template, the help states: "Provide information on the specific tissue type or growth conditions."

This term feels redundant and I think this can be derived from other terms we already collect (we should reduce the repetitive information that the user has to enter). For example, for soil, we could fill this in with values from "growth facility" and "storage conditions".

If @dehays @mslarae13 agree, we could remove this from the DH interface.

pvangay added the documentation and enhancement labels on Feb 9, 2022
@ssarrafan

@pvangay, is this still an issue that needs to be resolved? I'm removing David from this, but if it should be assigned to someone else besides Montana, let me know.

@pvangay
Author

pvangay commented Jul 6, 2022

Feels like David is still probably the best person to comment on this since he's working on the mapping for JGI terms?

@mslarae13

Looking at the example for "DNA collection site", they have "untreated pond water". This seems really similar to "sample isolated from" in the context of environmental samples.
Because it's growth conditions, I think it would be very different if we were doing cultures. But overall, I agree, and unless @dehays sees an issue with treating them as the same in relation to NMDC environmental samples, we can probably map it to a different term.

@ssarrafan

Adding to the sprint based on the SubPort squad task list. FYI @mslarae13

@mslarae13

mslarae13 commented Dec 22, 2022

@turbomam curious on your thoughts here as we're adding a "Site".
Site in the context of the JGI template means "pond water" or "peat bog" a general way of describing the site.
We could probably remove this and use whatever someone puts into "environmental medium"

I honestly don't know. I understand Pajau's point that it's repetitive, but it's a GOLD/JGI slot, so I hesitate to make a change and risk not adhering to their requirements.

@emileyfadrosh & @aclum also interested in your thoughts on this. Should we just map it to another column & remove it? Or keep it & deal with the redundancy.

@mslarae13

@ssarrafan overdue, please add to January 2023 sprint

@aclum

aclum commented Jan 3, 2023

@emileyfadrosh wants to keep all JGI terms for now.

@turbomam
Member

turbomam commented Jan 4, 2023

I think part of our responsibility is to normalize 3rd-party terms into a smaller number of NMDC terms and explicitly state which 3rd-party terms ours map to.

@emileyfadrosh do you literally mean that all terms used by our partners should be kept as-is, even if they have the same semantics as some other 3rd party term, or an existing NMDC term? We don't have emsl_depth and jgi_depth terms.

This is as opposed to committing to gather all of the data types the partner needs, even if their labels don't appear anywhere on the data collection form (DH/submission portal).

@ssarrafan

@turbomam @emileyfadrosh @mslarae13 @aclum can we consider this done per Emiley's comment from Alicia?

@aclum

aclum commented Jan 17, 2023 via email

@ssarrafan

@emileyfadrosh this needs to be discussed with you. I'm adding this to the agenda for the sync on Feb 8th when you're back.
FYI @aclum
Moving to the next sprint

@mslarae13

Marking this as "in review" as it's pending a larger discussion

@mslarae13

@ssarrafan we never got to discuss this at sync. I'm meeting with Emiley on 02/13 & I'll bring it up there.

@ssarrafan

@emileyfadrosh this issue needs feedback from you and was determined to be a blocker at the squad leads meeting today. Can you please weigh in on this when you're back.

@ssarrafan

Moving this to the current sprint for review from @emileyfadrosh

@mslarae13

Emiley will check with Tjiana and Alicia whether it's required for the sample types we have. If not, it's a non-issue and we remove it. If it is required, we need to leave it as is for JGI requirements.

@emileyfadrosh

This term should be dropped as it is optional for metagenomes since there is no asterisk. The help link confirms this, the field is required for RNA or methylation but not environmental samples.

Thanks @aclum.

@turbomam @mslarae13 @ssarrafan -- can this be easily removed from the schema? Thanks.

@mslarae13

mslarae13 commented Feb 28, 2023

Thanks!! That help link is "Forbidden" for me, so I can't see it ... but anyway

@turbomam ... let's remove "dna_collect_site" & "rna_collect_site" from the schema. I believe these are in the jgi.yaml file

Based on Alicia's comment below, dna_organisms & rna_organisms can also be removed

"dna_organisms" ..."Known / Suspected Organisms" in the JGI template also does not have an * ... does that mean it can be removed @aclum ?

@aclum

aclum commented Feb 28, 2023

@mslarae13 "Known / Suspected Organisms" is not a required field for metagenomes for JGI

mslarae13 assigned turbomam and unassigned mslarae13 on Mar 1, 2023
@mslarae13

This will continue into the next sprint. Meeting with Mark and Patrick on 03/08 to hopefully get a plan to wrap this up.

@ssarrafan

@turbomam @pkalita-lbl any update on this? Still active? Move to post-GSP? Move to next sprint?

@pkalita-lbl
Collaborator

Sorry, I don't know. It hasn't been on my plate.

@ssarrafan

@mslarae13 can you determine if this is still high priority or if it can wait till post GSP? I'm removing from sprint and adding the backlog label for now.

@turbomam
Member

This is easy. I'll do it now.

@turbomam
Member

turbomam commented Mar 28, 2023

slots currently in the schema with "organism" in the name/title

  • dna_organisms/DNA expected organisms
  • rna_organisms/RNA expected organisms
  • organism_count/organism count

slots with 'site' in the name/title

  • dna_collect_site
  • rna_collect_site
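The search described above (slots whose name or title contains "organism" or "site") can be sketched as follows. This is a minimal illustration, assuming the schema has already been loaded from YAML into plain Python dicts with a LinkML-style top-level `slots` map; the `find_slots` helper and the toy schema are hypothetical, not part of nmdc-schema.

```python
from typing import Any


def find_slots(schema: dict[str, Any], term: str) -> list[str]:
    """Return slot names whose name or title contains `term` (case-insensitive)."""
    term = term.lower()
    hits = []
    for name, definition in schema.get("slots", {}).items():
        # A slot with an empty body in YAML loads as None, so guard before .get().
        title = (definition or {}).get("title", "")
        if term in name.lower() or term in title.lower():
            hits.append(name)
    return hits


# Toy schema for illustration only; titles copied from the lists above.
example_schema = {
    "slots": {
        "dna_organisms": {"title": "DNA expected organisms"},
        "rna_organisms": {"title": "RNA expected organisms"},
        "organism_count": {"title": "organism count"},
        "dna_collect_site": {},
        "rna_collect_site": {},
        "depth": {"title": "depth"},
    },
}

print(find_slots(example_schema, "organism"))  # ['dna_organisms', 'rna_organisms', 'organism_count']
print(find_slots(example_schema, "site"))      # ['dna_collect_site', 'rna_collect_site']
```

A plain substring match is deliberately loose; it is meant to surface candidates for a human to review, not to decide which slots are redundant.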

I am going to de-associate the following slots from all submission schema classes. That means they won't appear in the DH interfaces.

  • dna_organisms/DNA expected organisms
  • rna_organisms/RNA expected organisms
  • dna_collect_site
  • rna_collect_site

I will leave them defined in the schema.
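The difference between de-associating a slot and deleting it can be sketched like this. It is a minimal illustration, assuming LinkML-style dicts where each class lists its slots under `slots` and may override them under `slot_usage`; it ignores slots inherited through `is_a` or mixins, and the `deassociate_slots` helper, the `ExampleInterface` class, and the `dna_concentration` slot are hypothetical stand-ins, not the change that was actually committed.

```python
import copy
from typing import Any


def deassociate_slots(schema: dict[str, Any], to_remove: set[str]) -> dict[str, Any]:
    """Return a copy of `schema` with the named slots dropped from every class's
    `slots` list and `slot_usage` map. Top-level definitions under
    `schema["slots"]` are left untouched, so the slots stay defined."""
    new_schema = copy.deepcopy(schema)  # don't mutate the caller's schema
    for cls in new_schema.get("classes", {}).values():
        if not cls:
            continue  # a class with an empty body in YAML loads as None
        if "slots" in cls:
            cls["slots"] = [s for s in cls["slots"] if s not in to_remove]
        if "slot_usage" in cls:
            cls["slot_usage"] = {
                name: usage
                for name, usage in cls["slot_usage"].items()
                if name not in to_remove
            }
    return new_schema


# Toy schema; class and slot names other than the four above are illustrative.
schema = {
    "slots": {
        name: {}
        for name in ["dna_organisms", "rna_organisms", "dna_collect_site",
                     "rna_collect_site", "dna_concentration"]
    },
    "classes": {
        "ExampleInterface": {
            "slots": ["dna_organisms", "dna_collect_site", "dna_concentration"],
            "slot_usage": {"dna_collect_site": {"required": False}},
        },
    },
}

cleaned = deassociate_slots(
    schema, {"dna_organisms", "rna_organisms", "dna_collect_site", "rna_collect_site"}
)
print(cleaned["classes"]["ExampleInterface"]["slots"])  # ['dna_concentration']
print("dna_collect_site" in cleaned["slots"])           # True
```

Keeping the top-level definitions means existing data that uses these slots can still be interpreted, while the DH interfaces generated from the classes no longer show the columns.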

@turbomam
Member

turbomam commented Mar 28, 2023

I apologize for dragging my feet so long with this issue and others like it.

Here are some things we can all do to make it easier to act on issues requiring changes to the nmdc-schema and the submission schema, for as long as that remains a separate resource.

  1. include very focused tasks in the first post in the issue. If it requires some discussion, it's OK to edit the first post and add the tasks in later.
  2. the tasks should refer to the slots by their slot names or titles, exactly as they appear in the schema. If there's a need to refer to them by some other terms, then we should add those terms to the schema. In that case, please provide a link to a document where I can see the desired alias in situ. Once aliases are added, please refer to the slots by names, titles or aliases, exactly as they appear in the schema.
  3. please provide valid and invalid data to illustrate the change you want
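For this issue, "valid and invalid data" could be as small as one record that should still pass and one that should now be rejected. The sketch below assumes a hypothetical post-change set of allowed slots (`samp_name`, `dna_concentration`) for a submission class and flags any field the class no longer accepts; it is an illustration, not the DH or nmdc-schema validator.

```python
from typing import Any

# Hypothetical allowed slots for a submission class after the change
# (dna_collect_site and dna_organisms are no longer associated).
ALLOWED_SLOTS = {"samp_name", "dna_concentration"}


def unexpected_fields(record: dict[str, Any], allowed: set[str]) -> set[str]:
    """Return the keys in `record` that the class no longer accepts."""
    return set(record) - allowed


valid_record = {"samp_name": "pond_01", "dna_concentration": 25.0}
invalid_record = {"samp_name": "pond_01", "dna_collect_site": "untreated pond water"}

print(unexpected_fields(valid_record, ALLOWED_SLOTS))    # set()
print(unexpected_fields(invalid_record, ALLOWED_SLOTS))  # {'dna_collect_site'}
```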

I am 100% committed to helping us succeed with those policies and will help in any way I can.

@ssarrafan

@turbomam Would it make sense to create a template for schema related feature requests? Also, as requested last week I'm adding this to the Wed sync this week.

@turbomam
Member

turbomam commented Apr 4, 2023

Will add to submission schema repo and link here

Projects
Status: ✅ SubPort 1 - Done
Development

No branches or pull requests

8 participants