Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cal2dec not robust to leap years ! #35

Open
gmaze opened this issue Jul 8, 2020 · 4 comments
Open

cal2dec not robust to leap years ! #35

gmaze opened this issue Jul 8, 2020 · 4 comments
Labels
priority-low Low priority for next release refactoring improve the design, structure, and/or implementation

Comments

@gmaze
Copy link
Member

gmaze commented Jul 8, 2020

The cal2dec function reads:

def cal2dec(pa_month, pa_day, pa_hour=0, pa_minute=0):
    """ Converts a calendar date (month, day, hour, minute) to a decimal date (float)

        Parameters
        ----------
        pa_month: Month in the year (where 0 is Janurary and 11 is Decemeber)
        pa_day: Day in the month
        pa_hour: Hour in the day
        pa_minute: Minute in the hour

        Returns
        -------
        decimalised version of the date
    """

    ln_cumulative_months = np.cumsum([0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

    try:
        dec_date = float(
            ln_cumulative_months[pa_month] + pa_day - 1 + pa_hour / 24 + pa_minute / 60 / 24
        )
        if dec_date > 366:
            raise ValueError("Day is out of scope of the year")
        return dec_date

    except IndexError:
        raise ValueError('Month is out of scope')

and the decimal year is then computed in change_dates with:

day = year + (cal2dec(month - 1, day, hour, minute) / 365)

But I'm not sure this function is robust to leap years.
For instance here we can find this implementation:

from datetime import datetime as dt
import time

def toYearFraction(date):
    def sinceEpoch(date): # returns seconds since epoch
        return time.mktime(date.timetuple())
    s = sinceEpoch

    year = date.year
    startOfThisYear = dt(year=year, month=1, day=1)
    startOfNextYear = dt(year=year+1, month=1, day=1)

    yearElapsed = s(date) - s(startOfThisYear)
    yearDuration = s(startOfNextYear) - s(startOfThisYear)
    fraction = yearElapsed/yearDuration

    return date.year + fraction

which takes the year into account, contrary to cal2dec.

So the difference will read:

date = pd.to_datetime('20200301')
print(toYearFraction(date)) 
# 2020.1639344262296

dec = cal2dec(int(date.strftime("%m"))-1, int(date.strftime("%d")), int(date.strftime("%H")), int(date.strftime("%M")))
int(date.strftime("%Y")) + dec / 365 
# 2020.1616438356164

This may not be very significant for the calibration, but this is simply not right and we should fix this.

@gmaze gmaze added the refactoring improve the design, structure, and/or implementation label Jul 8, 2020
@edsmall-bodc
Copy link
Collaborator

I remember having this exact discussion with @kamwal before I had written any code at all! I then spoke to someone else, I cannot remember who, and was told it wasn't n issue, but I cannot remember who or why.

Perhaps we should reopen investigations, and when I have the answer I can post it here

@edsmall-bodc
Copy link
Collaborator

I couldn't find the discussions we had, but @kamwal remembers that we decided to convert the code as is initially, and then look to change this in the future. So we can keep this task on here.

@edsmall-bodc
Copy link
Collaborator

edsmall-bodc commented Jul 9, 2020

@kamwal found the discussion:

"It is true that leap years are not taken into account in the calculation of DATES (decimal year).
The date format in the reference databases is (yyyymmddhhmmss). These reference dates are then converted to decimal years when they are read by ow ( in get_region_ow.m and retr_region_ow.m routines).
Dates of the analysed float must be given in decimal years in the source.mat file.

OW uses DATES in decimal year to compute the time difference between the float date and the date of the reference data, that gives a "weight" to the reference data in the mapping procedure (The piece-wise linear fit does not uses DATES but N-CYCLE).
Obviously, the time difference will not be accurate each time one or more 29/02 are between the float date and the date of the reference data. I suppose the most significant impact would be when the reference and float data are very close in time, around February 29. A 1-day error over a few days would have a greater impact on weights than a 5-day error over 20 years. However, due to the spatial and temporal coverage of the reference data, I wonder if we would see the difference. That being said, given that you are making the effort to translate the code in python, I would recommend to be as accurate as possible."

So the above comment still stands: Not a pressing issue, but certainly something worth changing eventually.

@gmaze
Copy link
Member Author

gmaze commented Jul 9, 2020

ok, the function above can be used to replace the old one

@kamwal kamwal added the priority-low Low priority for next release label Nov 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority-low Low priority for next release refactoring improve the design, structure, and/or implementation
Projects
None yet
Development

No branches or pull requests

3 participants