<bety><write> tag not respected by get.trait.data() #2968

Closed
Aariq opened this issue Jul 18, 2022 · 7 comments · Fixed by #3065

Comments

@Aariq
Collaborator

Aariq commented Jul 18, 2022

Bug Description

get.trait.data() seems to write to BETY (not in documentation), but it doesn't seem to check for settings$database$bety$write.

To Reproduce

Run a PEcAn workflow with settings$database$bety$write <- FALSE, then check whether file.path(settings$database$dbfiles, "posterior", settings$pfts$pft$posteriorid) exists.
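
For readers who want to script this check, here is a minimal sketch in base R. `check_posterior_dirs()` is a hypothetical helper, not part of PEcAn; it assumes `settings` is a nested list in which each PFT entry may carry a `posteriorid`, as in the path expression above.

```r
# Hypothetical helper (not part of PEcAn): list the posterior directory
# for each PFT in `settings` and report whether it exists on disk.
check_posterior_dirs <- function(settings) {
  # Collect posterior IDs from every PFT entry; entries without one are skipped.
  ids <- unlist(lapply(settings$pfts, function(p) p$posteriorid))
  dirs <- file.path(settings$database$dbfiles, "posterior", ids)
  data.frame(dir = dirs, exists = dir.exists(dirs), stringsAsFactors = FALSE)
}
```

If the write tag were respected, every row should show `exists = FALSE` after a run with `settings$database$bety$write` set to FALSE, assuming the directories did not exist beforehand.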

Expected behavior

Nothing should be written to settings$database$dbfiles if settings$database$bety$write == FALSE

@Aariq
Collaborator Author

Aariq commented Jul 22, 2022

Just to clarify, this means that every time get.trait.data() is run, it writes new posteriors to BETY. Here's an example of running runModule.get.trait.data() twice in a row with settings$database$bety$write <- FALSE:

> # Query trait database ----------------------------------------------------
> settings <- runModule.get.trait.data(settings)
2022-07-22 13:29:02 DEBUG  [PEcAn.DB::get.trait.data] : 
   `trait.names` is NULL, so retrieving all traits that have at least one 
   prior for these PFTs. 
2022-07-22 13:29:03 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : stomatal_slope 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   Median stomatal_slope : 3.79 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : SLA 
2022-07-22 13:29:04 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:04 INFO   [query.trait.data] : Median SLA : 43 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : Vcmax 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   Median Vcmax : 24.367 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : cuticular_cond 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   Median cuticular_cond : 30546 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [query.trait.data] : quantum_efficiency 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   Median quantum_efficiency : 0.062 
2022-07-22 13:29:04 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:04 INFO   [FUN] : 
 Number of observations per trait for PFT  'SetariaWT' :
                trait  n
1     cuticular_cond 33
2 quantum_efficiency 27
3                SLA 52
4     stomatal_slope 33
5              Vcmax 33 
2022-07-22 13:29:04 INFO   [FUN] : 
 Summary of prior distributions for PFT  'SetariaWT' :
                         distn parama  paramb  n
mort2                   gamma  1.470  0.0578  0
growth_resp_factor       beta  2.630  6.5200  0
leaf_turnover_rate      gamma  2.900  0.6300 40
leaf_width              gamma  6.530  1.4900 17
nonlocal_dispersal       beta 20.300 76.1000 30
fineroot2leaf           lnorm  0.811  0.8430  0
root_turnover_rate    weibull  1.670  0.6570 66
seedling_mortality       beta  3.610  0.4330  0
stomatal_slope        weibull  3.630  3.8100  4
quantum_efficiency       norm  0.057  0.0060 56
Vcmax                   lnorm  3.750  0.3000 12
r_fract                  beta  2.000  4.0000  0
cuticular_cond          lnorm  8.400  0.9000  0
root_respiration_rate weibull  2.660  6.2900 35
Vm_low_temp              norm 10.000  1.0200  0
SLA                   weibull  5.000 50.0000  0 
2022-07-22 13:29:04 DEBUG  [FUN] : The following posterior files found in PFT outdir  ( '/data/tests/ed2_testout/pft/SetariaWT' ) will be registered in BETY  under posterior ID  9000001246 :  'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' .  The following files (if any) will not be registered because they already existed:   
2022-07-22 13:29:05 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:05 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:05 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:05 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:05 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : c2n_leaf 
2022-07-22 13:29:06 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:06 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   Median c2n_leaf : 32.877 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : SLA 
2022-07-22 13:29:06 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:06 INFO   [query.trait.data] : Median SLA : 15.713 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   leaf_respiration_rate_m2 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   Median leaf_respiration_rate_m2 : 1.015 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : Vcmax 
2022-07-22 13:29:06 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   Median Vcmax : 43.212 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [query.trait.data] : quantum_efficiency 
2022-07-22 13:29:06 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   Median quantum_efficiency : 0.052 
2022-07-22 13:29:06 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:29:06 INFO   [FUN] : 
 Number of observations per trait for PFT  'ebifarm.c3grass' :
                      trait  n
1                 c2n_leaf 57
2 leaf_respiration_rate_m2  2
3       quantum_efficiency  6
4                      SLA 19
5                    Vcmax  3 
2022-07-22 13:29:06 INFO   [FUN] : 
 Summary of prior distributions for PFT  'ebifarm.c3grass' :
                            distn parama  paramb   n
mort2                      gamma  1.470  0.0578   0
growth_resp_factor          beta  2.630  6.5200   0
fineroot2leaf              lnorm  0.811  0.8430   0
root_turnover_rate       weibull  1.670  0.6570  66
seedling_mortality          beta  3.610  0.4330   0
Vcmax                      lnorm  4.510  0.6400  19
stomatal_slope             lnorm  2.590  0.2600  11
r_fract                     beta  2.000  4.0000   0
c2n_leaf                   gamma  4.180  0.1300  95
root_respiration_rate    weibull  2.660  6.2900  35
SLA                      weibull  2.060 19.0000 125
water_conductance          lnorm -5.400  3.0000   0
quantum_efficiency       weibull  3.320  0.0800   0
leaf_respiration_rate_m2   lnorm  0.632  0.6500  32 
2022-07-22 13:29:06 DEBUG  [FUN] : The following posterior files found in PFT outdir  ( '/data/tests/ed2_testout/pft/ebifarm.c3grass' ) will be registered in BETY  under posterior ID  9000001247 :  'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' .  The following files (if any) will not be registered because they already existed:   
> # Query trait database ----------------------------------------------------
> settings <- runModule.get.trait.data(settings)
2022-07-22 13:32:31 DEBUG  [PEcAn.DB::get.trait.data] : 
   `trait.names` is NULL, so retrieving all traits that have at least one 
   prior for these PFTs. 
2022-07-22 13:32:32 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : stomatal_slope 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   Median stomatal_slope : 3.79 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : SLA 
2022-07-22 13:32:33 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:33 INFO   [query.trait.data] : Median SLA : 43 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : Vcmax 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   Median Vcmax : 24.367 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : cuticular_cond 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   Median cuticular_cond : 30546 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [query.trait.data] : quantum_efficiency 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   Median quantum_efficiency : 0.062 
2022-07-22 13:32:33 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:33 INFO   [FUN] : 
 Number of observations per trait for PFT  'SetariaWT' :
                trait  n
1     cuticular_cond 33
2 quantum_efficiency 27
3                SLA 52
4     stomatal_slope 33
5              Vcmax 33 
2022-07-22 13:32:33 INFO   [FUN] : 
 Summary of prior distributions for PFT  'SetariaWT' :
                         distn parama  paramb  n
mort2                   gamma  1.470  0.0578  0
growth_resp_factor       beta  2.630  6.5200  0
leaf_turnover_rate      gamma  2.900  0.6300 40
leaf_width              gamma  6.530  1.4900 17
nonlocal_dispersal       beta 20.300 76.1000 30
fineroot2leaf           lnorm  0.811  0.8430  0
root_turnover_rate    weibull  1.670  0.6570 66
seedling_mortality       beta  3.610  0.4330  0
stomatal_slope        weibull  3.630  3.8100  4
quantum_efficiency       norm  0.057  0.0060 56
Vcmax                   lnorm  3.750  0.3000 12
r_fract                  beta  2.000  4.0000  0
cuticular_cond          lnorm  8.400  0.9000  0
root_respiration_rate weibull  2.660  6.2900 35
Vm_low_temp              norm 10.000  1.0200  0
SLA                   weibull  5.000 50.0000  0 
2022-07-22 13:32:33 DEBUG  [FUN] : The following posterior files found in PFT outdir  ( '/data/tests/ed2_testout/pft/SetariaWT' ) will be registered in BETY  under posterior ID  9000001248 :  'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' .  The following files (if any) will not be registered because they already existed:   
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : c2n_leaf 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   Median c2n_leaf : 32.877 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : SLA 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 INFO   [query.trait.data] : Median SLA : 15.713 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   leaf_respiration_rate_m2 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   Median leaf_respiration_rate_m2 : 1.015 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : Vcmax 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   Median Vcmax : 43.212 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [query.trait.data] : quantum_efficiency 
2022-07-22 13:32:35 ERROR  [PEcAn.utils::transformstats] : 
   data contains untransformed statistics 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   Median quantum_efficiency : 0.052 
2022-07-22 13:32:35 INFO   [query.trait.data] : 
   --------------------------------------------------------- 
2022-07-22 13:32:35 INFO   [FUN] : 
 Number of observations per trait for PFT  'ebifarm.c3grass' :
                      trait  n
1                 c2n_leaf 57
2 leaf_respiration_rate_m2  2
3       quantum_efficiency  6
4                      SLA 19
5                    Vcmax  3 
2022-07-22 13:32:35 INFO   [FUN] : 
 Summary of prior distributions for PFT  'ebifarm.c3grass' :
                            distn parama  paramb   n
mort2                      gamma  1.470  0.0578   0
growth_resp_factor          beta  2.630  6.5200   0
fineroot2leaf              lnorm  0.811  0.8430   0
root_turnover_rate       weibull  1.670  0.6570  66
seedling_mortality          beta  3.610  0.4330   0
Vcmax                      lnorm  4.510  0.6400  19
stomatal_slope             lnorm  2.590  0.2600  11
r_fract                     beta  2.000  4.0000   0
c2n_leaf                   gamma  4.180  0.1300  95
root_respiration_rate    weibull  2.660  6.2900  35
SLA                      weibull  2.060 19.0000 125
water_conductance          lnorm -5.400  3.0000   0
quantum_efficiency       weibull  3.320  0.0800   0
leaf_respiration_rate_m2   lnorm  0.632  0.6500  32 
2022-07-22 13:32:35 DEBUG  [FUN] : The following posterior files found in PFT outdir  ( '/data/tests/ed2_testout/pft/ebifarm.c3grass' ) will be registered in BETY  under posterior ID  9000001249 :  'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' .  The following files (if any) will not be registered because they already existed:  

The first run registers 'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' under posterior IDs 9000001246 and 9000001247 (one for each PFT), and the second run registers the same file names under new IDs 9000001248 and 9000001249.

@Aariq
Collaborator Author

Aariq commented Jul 22, 2022

Actually, maybe the above comment is a separate bug? I'm not exactly sure what is supposed to happen here. Any insight @dlebauer?

@Aariq
Collaborator Author

Aariq commented Jul 22, 2022

Ok, tracked this down a bit more. runModule.get.trait.data() is passing settings$meta.analysis$update to the forceupdate argument of get.trait.data(). If settings$meta.analysis$update is TRUE, it will write to BETY; if it is anything else (e.g. "AUTO" or "FALSE"), it will not. It does not check settings$database$bety$write. Is this the correct behavior?
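
The decision described in this comment can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the actual PEcAn code: `should_register()` is a hypothetical name, and the second check on the write tag is the behavior this issue asks for, not what the function did at the time.

```r
# Hypothetical helper (not part of PEcAn): decide whether posterior files
# should be registered in BETY for a given settings list.
should_register <- function(settings) {
  update <- settings$meta.analysis$update
  write <- settings$database$bety$write
  # As described above: only TRUE (logical or the string "TRUE") forces an
  # update; "AUTO", "FALSE", NULL, and anything else are treated as FALSE.
  forceupdate <- isTRUE(as.logical(update))
  # Assumed extra check: also respect the <bety><write> tag.
  can_write <- isTRUE(as.logical(write))
  forceupdate && can_write
}
```

With this guard, a settings list with `update = TRUE` but `write = FALSE` would not register anything, which matches the expected behavior in the bug description.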

@mdietze
Member

mdietze commented Jul 22, 2022

So I think you've hit on a bit of code that's given us trouble for a long time. In terms of desired behavior, the trait query and MA should NOT be running every time the workflow is run. The fact that they tend to do so has resulted in a massive overproliferation of Posteriors records, huge numbers of which are virtually identical. In the early days of the project, when David had a whole team of folk populating the trait database, it made more sense to update the posteriors more frequently, but at this point it should probably only occur when the user explicitly asks for an update (i.e. the default for forceupdate should be FALSE). The AUTO mode, which aimed to only run the MA when the data had changed, never did this correctly and tended to always run.

@Aariq
Collaborator Author

Aariq commented Jul 22, 2022

Yeah, get.trait.data() doesn't do anything with "AUTO"; it converts anything other than "TRUE" to FALSE. But I'm confused about something: isn't the MA run by runModule.run.meta.analysis()? Why is get.trait.data() using settings$meta.analysis$update at all? Also, get.trait.data.pft() seems to have code to print messages that indicate the MA is getting re-run, but I don't see where there is code in that function to actually run the meta-analysis.

@Aariq
Collaborator Author

Aariq commented Jul 22, 2022

IMO a function called get_* shouldn't write anything or do any analysis. None of this behavior is documented, which is partly why this is taking me so long to figure out.

@Aariq
Collaborator Author

Aariq commented Nov 8, 2022

The short-sighted fix is to give get.trait.data.pft() a write argument and have it inherit that from runModule.get.trait.data(settings).
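
A rough sketch of what that first option could look like, using stand-in functions rather than the real ones. `get_trait_data_pft_sketch()` and `run_module_sketch()` are hypothetical names, and the `registered` field is only there to make the effect visible; the real functions do much more.

```r
# Hypothetical stand-in for get.trait.data.pft(): only "registers" files
# when the caller passes write = TRUE.
get_trait_data_pft_sketch <- function(pft, write = FALSE) {
  if (write) {
    message("Registering posterior files for PFT '", pft$name, "' in BETY")
  }
  pft$registered <- write
  pft
}

# Hypothetical stand-in for runModule.get.trait.data(): reads the write tag
# once from settings and hands it down to every per-PFT call.
run_module_sketch <- function(settings) {
  write <- isTRUE(as.logical(settings$database$bety$write))
  settings$pfts <- lapply(settings$pfts, get_trait_data_pft_sketch, write = write)
  settings
}
```

The drawback is the one implied above: every function in the call chain has to accept and forward the argument.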

A possibly better solution is to have read.settings() store the write tag as an environment variable or an option, and have all the relevant PEcAn.DB functions check that option or environment variable before doing anything.
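
The option-based idea could be sketched with base R's `options()` and `getOption()`. The option name `pecan.bety.write` and both function names are assumptions made up for this example; nothing here reflects what read.settings() or PEcAn.DB actually do.

```r
# Hypothetical: what read.settings() might do after parsing the settings.
# Stores the <bety><write> tag as a session-wide option.
store_write_flag <- function(settings) {
  flag <- isTRUE(as.logical(settings$database$bety$write))
  options(pecan.bety.write = flag)
  invisible(settings)
}

# Hypothetical guard that DB-writing functions could call before writing.
# Defaults to FALSE when the option was never set, so the safe choice wins.
bety_write_allowed <- function() {
  isTRUE(getOption("pecan.bety.write", default = FALSE))
}
```

A writing function would then start with something like `if (!bety_write_allowed()) return(invisible(NULL))`, so no argument needs to be threaded through the call chain.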

mdietze added a commit that referenced this issue Jan 11, 2023
Fix for #2968 to add `write` argument to `get.trait.data()` and friends