I've been using this dataset to validate some hierarchical forecasting techniques. Applying them to the Labour dataset, I noticed some odd results.
The summation matrix provided for the Labour dataset is incorrect. This can be verified with two simple examples:
The second row, for Australian Capital Territory, has 1s for NSW, VIC, etc.
The diagonal component of the matrix does not have the index and columns lining up correctly. E.g. the row ['Employed part-time', 'Males', 'Western Australia'] has its 1 in the column ['Employed part-time', 'Females', 'Australian Capital Territory'].
Ex1
Ex2
Min code to replicate
from datasetsforecast.hierarchical import HierarchicalData

df, S, tags = HierarchicalData.load("datasetforecast_data_dir", 'Labour')
# Note that ACT is summed from NSW and Vic
S.iloc[:2, :5]
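The second symptom (bottom-level rows not lining up with their own columns) can also be checked programmatically. Below is a minimal sketch, assuming S is a pandas DataFrame whose columns are the bottom-level series labels and whose index contains those same labels in the same string format. The helper name `misaligned_bottom_rows` and the toy labels are hypothetical and are not part of datasetsforecast.

```python
import pandas as pd


def misaligned_bottom_rows(S: pd.DataFrame) -> list:
    """Return bottom-level labels whose row is not one-hot on its own column."""
    bad = []
    for label in S.columns:
        if label not in S.index:
            # Bottom-level series has no row of its own at all
            bad.append(label)
            continue
        row = S.loc[label]  # assumes index labels are unique
        # A correct row has a single 1, located in the column with the same label
        if row[label] != 1 or row.sum() != 1:
            bad.append(label)
    return bad


# Toy hierarchy (hypothetical labels, not the Labour data)
cols = ["A/x", "A/y", "B/x"]
good = pd.DataFrame(
    [[1, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]],
    index=["Total", "A", "B", "A/x", "A/y", "B/x"],
    columns=cols,
)

# Same matrix with two bottom-level rows swapped, mimicking the misalignment
swapped = good.copy()
swapped.loc["A/x"] = good.loc["A/y"].values
swapped.loc["A/y"] = good.loc["A/x"].values

print(misaligned_bottom_rows(good))
print(misaligned_bottom_rows(swapped))
```

Running the same helper on the S returned by HierarchicalData.load should list the affected rows, provided the index and column labels of that S follow the assumption above.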
Code to generate above matrix from forecast df (wide format)
import numpy as np
import pandas as pd

def generate_labor_S_Matrix_from_raw(df: pd.DataFrame) -> pd.DataFrame:
    # Base (bottom-level) series are the columns whose names have three comma-separated parts
    base_ts_columns = df.columns[[len(s) == 3 for s in df.columns.str.split(',')]]
    S = np.empty((len(df.columns), len(base_ts_columns)))
    for i, col_name in enumerate(df.columns.to_list()):
        # Construct arrays of summing values by searching the base time series columns
        # Works for all rows but total
        searchable_terms = col_name.strip("[]").replace("'", '').split(',')
        searchable_terms = [t.strip() for t in searchable_terms]
        search_str = ".*".join(searchable_terms)
        S[i, :] = base_ts_columns.str.contains(search_str, regex=True)
    # Finally fix up the total / first row
    S[0, :] = np.ones((len(base_ts_columns),))
    S_df = pd.DataFrame(data=S, columns=base_ts_columns, index=df.columns)
    return S_df
Given that the raw dataset is not actually part of the repo, what's the path towards updating this?
Hey @hewsond! Thanks for letting us know about this issue. We have updated the datasets to include the summing matrix you shared. We've also added a test (#18) to ensure that S is the correct summing matrix associated with Y_df. To get the latest (right) data, remove the previously downloaded files (datasetforecast_data_dir/hierarchical). :)
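For anyone who wants to run a similar consistency check locally after re-downloading, here is a minimal sketch. It is not the test added in #18. It assumes a wide table Y with one row per series (labelled like S.index) and one column per time step, which may differ from the layout of Y_df; the function name `is_consistent` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def is_consistent(S: pd.DataFrame, Y: pd.DataFrame, atol: float = 1e-8) -> bool:
    """Check that every series in Y equals S applied to the bottom-level series."""
    Y_bottom = Y.loc[S.columns].to_numpy()  # bottom-level series, in column order of S
    Y_all = Y.loc[S.index].to_numpy()       # all series, in row order of S
    return bool(np.allclose(S.to_numpy() @ Y_bottom, Y_all, atol=atol))


# Toy hierarchy: Total = A + B (hypothetical values)
S_ok = pd.DataFrame(
    [[1, 1], [1, 0], [0, 1]],
    index=["Total", "A", "B"],
    columns=["A", "B"],
)
Y = pd.DataFrame(
    [[3.0, 5.0], [1.0, 2.0], [2.0, 3.0]],
    index=["Total", "A", "B"],
    columns=["t1", "t2"],
)

# Same summing matrix with the two bottom-level rows swapped
S_wrong = pd.DataFrame(
    [[1, 1], [0, 1], [1, 0]],
    index=["Total", "A", "B"],
    columns=["A", "B"],
)

print(is_consistent(S_ok, Y))
print(is_consistent(S_wrong, Y))
```

A check of this kind catches misaligned rows because the reconstructed series S @ Y_bottom no longer match the stored aggregates.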
The correct S matrix is attached.
S_labour.csv