Determine if and how link_class_info will be used in the submission schema #678

Open
1 of 3 tasks
Tracked by #382
mslarae13 opened this issue Jan 27, 2023 · 18 comments
Labels: backlog (Issue not assigned to a sprint or not completed during a sprint. Needs to be reprioritized.), nmdc-schema-mixs-submission

Comments

@mslarae13
Contributor

mslarae13 commented Jan 27, 2023

The slot link_class_info has no guidance or examples.

criteria for completion

  • Identify how the slot has been used in the past, if at all (@turbomam, can you provide a query of NCBI?)
  • Update nmdc.yaml (or mixs.yaml?) to add more context and examples
  • Update the NMDC submission portal to match schema slot usage (Mike)
@mslarae13
Contributor Author

Leave it out? Not really used; a non-actionable DB field.

@turbomam
Member

turbomam commented May 21, 2023

link_class_info values in the BBOP SQLite version of NCBI biosample_set.xml as of 2023-05-18

| value | count | link notes |
|---|---|---|
| not collected | 432 | |
| not applicable | 415 | |
| NA | 114 | |
| http://geointa.inta.gov.ar/visor/?p=model_suelos | 108 | This site can't be reached; check if there is a typo in geointa.inta.gov.ar; DNS_PROBE_FINISHED_NXDOMAIN |
| http://doi.org/10.1002/jpln.200521814 | 71 | Chernozem—Soil of the Year 2005 |
| missing | 65 | |
| http://www.fao.org/nr/land/sols/soil/wrb-soil-maps/reference-groups | 64 | Page not found |
| https://www.fao.org/3/i3794en/I3794en.pdf | 60 | World reference base for soil resources 2014 |
| Chromic Haploxernt | 48 | |
| na | 19 | |
| doi:10.1016/j.fcr.2011.09.019 | 6 | https://www.sciencedirect.com/science/article/abs/pii/S0378429011003340, "Evidence of improved water uptake from subsoil by spring wheat following lucerne in a temperate humid climate" |
| http://esdac.jrc.ec.europa.eu/resource-type/european-soil-database-maps | 1 | live |

@turbomam
Member

turbomam commented May 21, 2023

Specification from https://github.com/GenomicsStandardsConsortium/mixs/blob/main/mixs/excel/mixs_v6.xlsx

| Environmental package | agriculture | soil |
|---|---|---|
| Structured comment name | link_class_info | link_class_info |
| Package item | link to classification information | link to classification information |
| Definition | Link to digitized soil maps or other soil classification information | Link to digitized soil maps or other soil classification information |
| Expected value | PMID, DOI or url | PMID, DOI or url |
| Value syntax | {PMID\|DOI\|URL} | {PMID}\|{DOI}\|{URL} |
| Example | | |
| Requirement | X | X |
| Preferred unit | | |
| Occurrence | 1 | 1 |
| MIXS ID | MIXS:0000329 | MIXS:0000329 |
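
To make the "Value syntax" row concrete, here is a minimal Python sketch of how a submitted value could be checked against the PMID, DOI, or URL alternatives. The regular expressions and the function name `classify_link_class_info` are assumptions made for illustration; the MIxS spreadsheet does not define these patterns.

```python
import re

# Illustrative patterns only (assumptions, not taken from MIxS).
PMID_RE = re.compile(r"^(?:PMID:\s*)?\d+$", re.IGNORECASE)
DOI_RE = re.compile(
    r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.IGNORECASE
)
URL_RE = re.compile(r"^https?://\S+$", re.IGNORECASE)


def classify_link_class_info(value: str) -> str:
    """Return 'PMID', 'DOI', 'URL', or 'invalid' for a candidate value."""
    v = value.strip()
    if PMID_RE.match(v):
        return "PMID"
    # DOIs are checked before generic URLs so doi.org links count as DOIs.
    if DOI_RE.match(v):
        return "DOI"
    if URL_RE.match(v):
        return "URL"
    return "invalid"


if __name__ == "__main__":
    for candidate in [
        "doi:10.1016/j.fcr.2011.09.019",
        "https://www.fao.org/3/i3794en/I3794en.pdf",
        "not collected",
    ]:
        print(candidate, "->", classify_link_class_info(candidate))
```

Under these assumed patterns, placeholder strings such as "not collected" and free-text soil names would be rejected, while the DOI and URL values seen in NCBI would pass.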

@turbomam
Member

It seems like link_class_info might function like one of the many x_meth fields, which state how the value for slot x was determined. But I can't tell what slot link_class_info might provide context for.

As shown above, the number of biosamples that are annotated with an informative link_class_info is in the low hundreds, out of 35 million. (I am defining informative as something other than a synonym for "not available" or a live web link.)
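
As a rough check of the "low hundreds" estimate, the counts from the table earlier in this thread can be tallied in Python. This is only a sketch: the set of strings treated as synonyms for "not available" is an assumption, and the "other" bucket still includes dead links, so it is an upper bound on informative values.

```python
# (value, count) pairs copied from the NCBI table earlier in this thread.
OBSERVED = [
    ("not collected", 432),
    ("not applicable", 415),
    ("NA", 114),
    ("http://geointa.inta.gov.ar/visor/?p=model_suelos", 108),
    ("http://doi.org/10.1002/jpln.200521814", 71),
    ("missing", 65),
    ("http://www.fao.org/nr/land/sols/soil/wrb-soil-maps/reference-groups", 64),
    ("https://www.fao.org/3/i3794en/I3794en.pdf", 60),
    ("Chromic Haploxernt", 48),
    ("na", 19),
    ("doi:10.1016/j.fcr.2011.09.019", 6),
    ("http://esdac.jrc.ec.europa.eu/resource-type/european-soil-database-maps", 1),
]

# Assumption: these strings are treated as synonyms for "not available".
PLACEHOLDERS = {"not collected", "not applicable", "na", "missing"}


def tally(rows):
    """Return (placeholder_total, other_total) biosample counts."""
    placeholder = sum(c for v, c in rows if v.strip().lower() in PLACEHOLDERS)
    other = sum(c for v, c in rows if v.strip().lower() not in PLACEHOLDERS)
    return placeholder, other


placeholder_total, other_total = tally(OBSERVED)
print(placeholder_total, other_total)  # prints: 1045 358
```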

@mslarae13 I propose that we omit this field from NMDC submission templates. A longer term action could be filing an issue at https://github.com/GenomicsStandardsConsortium/mixs/issues asking for documentation or examples of this term's use. I hope that that request wouldn't be construed as an invitation for open-ended discussion about the subject. I suppose it could also be useful to be put in touch with the person who requested that term in the first place.

@turbomam
Member

link_class_info has not been used for any biosamples in the NMDC production MongoDB yet

db.getCollection("biosample_set").find( { link_class_info : { $exists : true } } );

0
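
The same presence check can be sketched in Python. The filter mirrors the `$exists` query above, and `count_documents` is the PyMongo method for getting a count rather than a cursor. The helper name `count_with_slot`, the `FakeCollection` stand-in, and the connection details in the comments are assumptions for illustration, not NMDC code.

```python
def count_with_slot(collection, slot_name: str) -> int:
    """Count documents in which `slot_name` is present, whatever its value."""
    return collection.count_documents({slot_name: {"$exists": True}})


class FakeCollection:
    """In-memory stand-in that supports only {field: {"$exists": bool}} filters."""

    def __init__(self, docs):
        self.docs = list(docs)

    def count_documents(self, flt):
        def matches(doc):
            return all(
                (field in doc) == cond["$exists"] for field, cond in flt.items()
            )

        return sum(1 for doc in self.docs if matches(doc))


# With a real server one would pass a PyMongo collection instead, e.g.
# (hypothetical URI and database name):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["nmdc"]["biosample_set"]
demo = FakeCollection([{"id": "a"}, {"id": "b", "depth": 1}])
print(count_with_slot(demo, "link_class_info"))  # prints: 0
```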

@turbomam changed the title from "link to classification information" to "Determine if and how link_class_info will be used in the submission schema" on May 21, 2023
@ssarrafan
Collaborator

Adding to current sprint per Mark. Need feedback from @mslarae13

@mslarae13
Contributor Author

mslarae13 commented May 24, 2023

I propose that we omit this field from NMDC submission templates. A longer term action could be filing an issue at https://github.com/GenomicsStandardsConsortium/mixs/issues asking for documentation or examples of this term's use.

-- I agree with this @turbomam

@ssarrafan
Collaborator

@turbomam can this be closed now that Montana has provided feedback?

@mslarae13
Contributor Author

@ssarrafan I don't think we can close it yet. Steps for resolution are: remove from the NMDC submission portal, and submit an issue to GSC.

@ssarrafan
Collaborator

@ssarrafan I don't think we can close it yet. Steps for resolution are: remove from the NMDC submission portal, and submit an issue to GSC.

Ah ok. I thought this ticket was just to "determine if and how..."
I'll move to the next sprint

@mslarae13
Contributor Author

mslarae13 commented Jun 6, 2023

Chris Hunter suggests deprecating this term in GSC. Montana will put an issue into GSC.

@turbomam
Should we remove from NMDC now, or wait for GSC update?

@mslarae13
Contributor Author

@turbomam
Member

turbomam commented Jun 7, 2023

I'll remove it in 7.6.1

@ssarrafan added the backlog label on Jun 20, 2023
@mslarae13
Contributor Author

@turbomam did this get removed?

@mslarae13
Contributor Author

mslarae13 commented Apr 29, 2024

We are currently managing this manually. We do not ingest all MIxS slots; see https://github.com/microbiomedata/nmdc-schema/blob/main/assets/import_mixs_slots_regardless.tsv

The MIxS version that we pull into the NMDC schema is 6.0. We should pull in 6.2, but some slot changes have been made (names, presence/absence).

  • Will require migration
  • How/when with respect to berk? Does it affect Postgres?
  • There are some no-impact changes: samp_collec_meth & device

@ssarrafan
Collaborator

I'm moving this to the current sprint based on @mslarae13's last comment on 4/29.

@ssarrafan
Collaborator

@turbomam @mslarae13 should this just be in the backlog? I can remove it from the sprint.

@ssarrafan
Collaborator

Sprint over, removing from sprint.

@mslarae13 assigned mslarae13 and unassigned turbomam on Aug 30, 2024

3 participants