The data checker relies on the following libraries:
numpy, xarray, argparse, dateutil.relativedelta, datetime, json, sys, os, pathlib, re, stat, logging, typing
Install requirements with:
pip install numpy xarray python-dateutil
- Add ${checkerdir}/src to PYTHONPATH in ~/.bashrc, where ${checkerdir} is the full path to the checker directory:
export PYTHONPATH="${PYTHONPATH}:${checkerdir}/src"
- Configure the file config_lu.json, which contains the following settings:
  - directory: the directory with the files requiring checking;
  - log_path: where to save logs (relative path inside the checker directory);
  - base_path: full path to the checker directory;
  - required_file_types: for the landuse files these are "multiple-management", "multiple-states", "multiple-transitions";
  - required_variables: variables that must be present in the files (for each file type independently);
  - required_coords: coordinates that must be present in the files (for each file type independently);
  - required_attributes: general attributes that must be present in the files;
  - required_attributes_in_vars: variable-specific attributes that must be present in the files.
- Configure the file ${checkerdir}/src/variable-info.json, which contains the required variable ranges (for each file type independently).
- Run:
python run_script.py config_lu.json
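Before running the checker, it can help to confirm that config_lu.json defines every setting listed above. The following is a minimal sketch of such a check, not part of the checker itself: the setting names come from the list above, while the helper names missing_settings and load_config are hypothetical.

```python
import json

# Setting names as listed for config_lu.json; the check itself is illustrative.
REQUIRED_SETTINGS = (
    "directory",
    "log_path",
    "base_path",
    "required_file_types",
    "required_variables",
    "required_coords",
    "required_attributes",
    "required_attributes_in_vars",
)


def missing_settings(config: dict) -> list:
    """Return the names of required settings that are absent from config."""
    return [name for name in REQUIRED_SETTINGS if name not in config]


def load_config(path: str) -> dict:
    """Load a JSON config file and fail early if a setting is missing."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = missing_settings(config)
    if missing:
        raise KeyError(f"Missing settings in {path}: {', '.join(missing)}")
    return config
```

Calling load_config("config_lu.json") would then raise a KeyError naming the absent settings instead of failing later inside a checker.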
FileNameChecker: ${checkerdir}/src/checkers/checker_00_file_name.py
Check the file type ("multiple-management", "multiple-states", or "multiple-transitions") and the file name, which should match the pattern multiple-<...>_input4MIPs_landState_<...>_gn_YYYY-YYYY.nc.
It uses functions from ${checkerdir}/src/utils/misc_utils.py.
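The file name pattern can be expressed as a regular expression. This is only a sketch and not the code in checker_00_file_name.py: it assumes that the first <...> is one of the three file types, that the second <...> is free text, and that YYYY means exactly four digits. The function name parse_file_name is hypothetical.

```python
import re

# Assumed reading of multiple-<...>_input4MIPs_landState_<...>_gn_YYYY-YYYY.nc
FILENAME_RE = re.compile(
    r"^(?P<file_type>multiple-(?:management|states|transitions))"
    r"_input4MIPs_landState_(?P<source>.+)_gn_"
    r"(?P<start>\d{4})-(?P<end>\d{4})\.nc$"
)


def parse_file_name(name: str):
    """Return (file_type, start_year, end_year), or None if the name does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return match["file_type"], int(match["start"]), int(match["end"])
```

A name that does not match yields None, which a caller could report as a file name error.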
StandardComplianceChecker: ${checkerdir}/src/checkers/checker_01_standard_compliance.py
Check file permissions, dimension variables, compulsory attributes, and _FillValue.
SpatialCompletenessChecker: ${checkerdir}/src/checkers/checker_02_spatial_completeness.py
Create the reference mask based on the reference file and check the presence of missing values.
It uses functions from ${checkerdir}/src/utils/misc_utils.py.
SpatialConsistencyChecker: ${checkerdir}/src/checkers/checker_03_spatial_consistency.py
Check that the lon/lat grid points correspond to the reference file.
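As an illustration of the grid comparison, the sketch below compares one coordinate axis (lon or lat) against the reference values. It is not the code in checker_03_spatial_consistency.py: the function name grids_match and the tolerance are assumptions, and plain Python sequences stand in for the coordinate arrays.

```python
import math


def grids_match(coords, reference, abs_tol=1e-6):
    """True when both coordinate sequences have the same length and values.

    abs_tol is an assumed tolerance, not a value taken from the checker.
    """
    if len(coords) != len(reference):
        return False
    return all(
        math.isclose(a, b, abs_tol=abs_tol) for a, b in zip(coords, reference)
    )
```

The same comparison would be applied once to the longitudes and once to the latitudes.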
TemporalConsistencyChecker: ${checkerdir}/src/checkers/checker_04_temporal_consistency.py
Check timesteps for consistency.
It uses functions from ${checkerdir}/src/utils/path_utils.py.
ValidRangesChecker: ${checkerdir}/src/checkers/checker_05_valid_ranges.py
Check that data values are in the required range (defined in ${checkerdir}/src/variable-info.json).
It uses functions from ${checkerdir}/src/utils/misc_utils.py.
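The range check can be sketched as follows. The layout of variable-info.json is not described here, so the sketch takes the ranges as a plain mapping from variable name to (min, max); that mapping, the function names, and the example values are all hypothetical.

```python
def values_outside_range(values, valid_min, valid_max):
    """Return the values that fall outside [valid_min, valid_max].

    NaN compares false in both tests, so missing values are not reported here.
    """
    return [v for v in values if v < valid_min or v > valid_max]


def check_ranges(data, ranges):
    """Map each variable name to its out-of-range values (only if any exist).

    data:   variable name -> iterable of values
    ranges: variable name -> (min, max), a hypothetical stand-in for the
            contents of variable-info.json
    """
    problems = {}
    for name, (valid_min, valid_max) in ranges.items():
        bad = values_outside_range(data.get(name, []), valid_min, valid_max)
        if bad:
            problems[name] = bad
    return problems
```

An empty result means that every listed variable stays within its range.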
StatesTransitionsChecker: ${checkerdir}/src/checkers/checker_06_states_transitions.py
1. For each multiple-states_<XXX>: check that the sum of all variables is close to 1.
2. For each multiple-transitions_<XXX>: take the corresponding file multiple-states_<XXX> (with the same <XXX>) and check that the sum of the gross landuse transitions matches the difference in states between two consecutive years (except for the variables secdf, primf, secdn, primn).
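The first check, that the state variables sum to about 1, can be sketched for a single grid cell and year as below. The function name and the tolerance are assumptions; the checker itself presumably works on whole arrays rather than single values.

```python
import math


def states_sum_to_one(states, abs_tol=1e-5):
    """True when the state fractions of one grid cell add up to about 1.

    states:  variable name -> fraction of the cell in that state
    abs_tol: assumed tolerance, not a value taken from the checker
    """
    total = sum(states.values())
    return math.isclose(total, 1.0, abs_tol=abs_tol)
```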
Algorithm for (2):
- In multiple-states_<...>, we have variables 'c3ann' 'c3nfx' 'c3per' ..., so for each variable var we take its value for the year Y: var_states_Y, and its value for the year Y+1: var_states_(Y+1).
- In multiple-transitions_<...>, we have 'c3ann_to_c3nfx' 'c3ann_to_c3per' 'c3ann_to_c4ann' ..., i.e. X_to_var and var_to_X with var from multiple-states_<...>.
  We calculate (for every year Y):
  sum(X_to_var): the sum of all variables in multiple-transitions_<...> for the year Y with names of the form <X>_to_{var}, and
  sum(var_to_X): the sum of all variables in multiple-transitions_<...> for the year Y with names of the form {var}_to_<X>,
  e.g. for c3ann at the year Y:
sum(X_to_var) = sum ['c3nfx_to_c3ann', 'c3per_to_c3ann', 'c4ann_to_c3ann', 'c4per_to_c3ann', 'primf_to_c3ann', 'primn_to_c3ann', 'secdf_to_c3ann', 'secdn_to_c3ann', 'urban_to_c3ann', 'pastr_to_c3ann', 'range_to_c3ann']
sum(var_to_X) = sum ['c3ann_to_c3nfx', 'c3ann_to_c3per', 'c3ann_to_c4ann', 'c3ann_to_c4per', 'c3ann_to_secdf', 'c3ann_to_secdn', 'c3ann_to_urban', 'c3ann_to_pastr', 'c3ann_to_range']
- We want this equation to be true:
  sum(X_to_var) - sum(var_to_X) = var_states_(Y+1) - var_states_Y,
  so for each variable we calculate delta, which should be close to 0:
  delta = [ sum(X_to_var) - sum(var_to_X) ] - [ var_states_(Y+1) - var_states_Y ]
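The delta computation can be sketched for one variable, one grid cell, and one year as below. This is an illustration of the formula, not the code in checker_06_states_transitions.py: the function name transition_delta is hypothetical, and plain floats stand in for the gridded data.

```python
# Variables that the README says are exempt from this check.
EXCLUDED = {"secdf", "primf", "secdn", "primn"}


def transition_delta(var, transitions, state_now, state_next):
    """Return [sum(X_to_var) - sum(var_to_X)] - [state_next - state_now].

    transitions: transition name (e.g. 'c3per_to_c3ann') -> value for year Y
    state_now:   value of var in the states file for year Y
    state_next:  value of var in the states file for year Y+1
    """
    gained = sum(
        value for name, value in transitions.items()
        if name.endswith(f"_to_{var}")
    )
    lost = sum(
        value for name, value in transitions.items()
        if name.startswith(f"{var}_to_")
    )
    return (gained - lost) - (state_next - state_now)
```

A caller would skip the variables in EXCLUDED and flag any remaining variable whose delta is not close to 0, using a tolerance of its own choosing.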
- ${checkerdir}/run_script.py: run the "main" function;
- ${checkerdir}/src/checkers/directory_checker.py and ${checkerdir}/scripts/check_file.py: configure the parameters and run all checkers;
- ${checkerdir}/src/utils: functions which are used by checkers.
For each run, the checker creates a new logging directory (its name includes the dataset name, current date and time) in ${checkerdir}/logs (the "logs" name can be changed via "log_path" in config_lu.json).
The logging directory contains the following files:
- <...>_errors.log: only errors;
- <...>_output.log: all information about the checking.