revisit storage and validation of temporal data #384

Open
Tracked by #587
turbomam opened this issue Aug 4, 2022 · 9 comments

Comments

@turbomam
Member

turbomam commented Aug 4, 2022

This is a component of microbiomedata/sample-annotator#90

Background

nmdc-schema has a TimestampValue class, based on the AttributeValue class.

In fact the only real data slot for TimestampValue is the very generic, inherited has_raw_value, whose range is string.

TimestampValue's description does say

A value that is a timestamp. The range should be ISO-8601

But that's not enforced anywhere in the schema.

Objective

In my understanding, NMDC submitters should be able to enter partial datetimes for things like collection_date. I.e., 2022-08 should be accepted as meaning that the sample was collected some time in August of 2022. The day-of-month is not known and should not be fudged as 2022-08-01.

Current solution

So we have configured our DH templates to validate values of 2022-08 from slots like collection_date with heavyweight regular expressions like ^[12]\d{3}(?:(?:-(?:0[1-9]|1[0-2]))(?:-(?:0[1-9]|[12]\d|3[01]))?)?$

And providing examples like "2021-04-15; 2021-04; 2021"

(BTW: GenomicsStandardsConsortium/mixs#446)

You can check those examples at regexr
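The same check can also be run locally. The following is a minimal sketch using only Python's standard `re` module; the pattern is copied from the text above, while the helper name `matches_pattern` and the extra test values are illustrative assumptions, not part of any DH template.

```python
import re

# Pattern copied verbatim from the issue: a four-digit year,
# optionally followed by -MM, optionally followed by -DD.
PARTIAL_DATE = re.compile(
    r"^[12]\d{3}(?:(?:-(?:0[1-9]|1[0-2]))(?:-(?:0[1-9]|[12]\d|3[01]))?)?$"
)


def matches_pattern(value: str) -> bool:
    """Hypothetical helper: True if value has the shape YYYY, YYYY-MM or YYYY-MM-DD."""
    return PARTIAL_DATE.match(value) is not None


# The first three values are the examples given in the issue;
# the last two are added here to probe the pattern's limits.
for value in ["2021-04-15", "2021-04", "2021", "2022-13", "2022-02-31"]:
    print(value, matches_pattern(value))
```

Note that the pattern only constrains the shape of the string: a month of 13 is rejected, but a day of 31 is accepted for any month, so 2022-02-31 still matches.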

(BTW see #385)

Proposed solution

It should be possible to at least validate these has_raw_values of TimestampValues against a proper datetime parser. Most Python datetime parsers will silently add 1s for the missing datetime parts. We don't have to use that parsed value, but it should at least parse. I think that will rule out dates that match the regular expression but don't exist, like 2022-02-31.
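As a rough illustration of that idea, here is a sketch that uses only the standard library's `datetime.strptime` rather than any of the third-party parsers under consideration. The function name `is_valid_partial_date` and the list of accepted formats are assumptions made for this example. Because `strptime` tolerates unpadded fields such as `2022-8`, the sketch keeps the regular expression as a first pass and uses the parser only to reject calendar dates that do not exist.

```python
import re
from datetime import datetime

# Shape check: same pattern as the one quoted in the issue.
PARTIAL_DATE = re.compile(
    r"^[12]\d{3}(?:(?:-(?:0[1-9]|1[0-2]))(?:-(?:0[1-9]|[12]\d|3[01]))?)?$"
)

# Assumed set of accepted precisions, most specific first.
FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def is_valid_partial_date(raw_value: str) -> bool:
    """Hypothetical validator: shape check first, then a calendar check.

    The parsed datetime is discarded; raw_value is never modified,
    so a partial date like 2022-08 is not turned into 2022-08-01.
    """
    if PARTIAL_DATE.match(raw_value) is None:
        return False
    for fmt in FORMATS:
        try:
            # strptime fills missing month/day with 1 internally.
            datetime.strptime(raw_value, fmt)
        except ValueError:
            continue
        return True
    return False


for value in ["2022-08", "2022-02-28", "2022-02-31", "2024-02-29", "2023-02-29"]:
    print(value, is_valid_partial_date(value))
```

With this two-step check, 2022-08 is accepted as a partial date while 2022-02-31 and 2023-02-29 are rejected, even though both satisfy the regular expression.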

iso8601 seems to require pretty strict templates, but I think some of these other ones don't

I'll consult with LinkML colleagues and most likely try arrow and pendulum. Will post conclusions here.

@mslarae13
Contributor

@turbomam what is the to-do on this?

@ssarrafan
Collaborator

@turbomam moving this to Sept but please let me know if you're not actively working on it for the next 2 weeks

@ssarrafan ssarrafan added this to the Sprint 18 milestone Sep 1, 2022
@ssarrafan
Collaborator

Checked in with @turbomam and moving this out of the sprint and adding the backlog label.

@ssarrafan ssarrafan removed this from the Sprint 18 milestone Sep 19, 2022
@mslarae13
Contributor

mslarae13 commented Dec 9, 2022

@ssarrafan This will start in the sprint from Dec 26th to the 6th and has a due date of Jan 20th for the submission portal squad. Can you plan to add this to those sprint boards?

@ssarrafan
Collaborator

I don't think the next sprint will start till January since LBL is closed for the holidays Dec 23-Jan 2. I can add it to that sprint. Are you planning to work the week between December 26 and January @mslarae13?

@ssarrafan ssarrafan removed the backlog label Dec 9, 2022
@mslarae13
Contributor

I don't think the next sprint will start till January since LBL is closed for the holidays Dec 23-Jan 2. I can add it to that sprint. Are you planning to work the week between December 26 and January @mslarae13?

I am working that week! PNNL doesn't close :(
So it'll be a sprint of 1 ;) but you can just put it in the sprint starting after the 2nd & it'll (hopefully) be done fast :)

@mslarae13
Contributor

@turbomam I think working on this today would be helpful, in relation to the updates I've made to the slots relevant to the soil package. Does the validation still hold, or do we need to add additional validation rules anywhere?

@ssarrafan
Collaborator

Due date is Jan 20th so moving to next sprint
@mslarae13 @turbomam

@mslarae13 mslarae13 mentioned this issue Jan 18, 2023
99 tasks
@ssarrafan
Collaborator

Looks like this is in the backlog now so I'll remove from the sprint. @mslarae13 if you plan to work on this next sprint let me know. Thanks.
