Error: "Error in abs(x) : non-numeric argument to mathematical function" #14

Closed
achetverikov opened this issue Aug 17, 2023 · 5 comments

Comments

@achetverikov

achetverikov commented Aug 17, 2023

Hi! Thanks for a nice package! I'm trying to use it for my data with the following specs:

── jlmer specification ─────────────────────────────────────────────── <jlmer_spec> ──
Formula: bias_c ~ 1 + distr_sd20 + target_sd20 + distr_sd20__target_sd20
Predictors:
  distr_sd: distr_sd20
  target_sd: target_sd20
  distr_sd:target_sd: distr_sd20__target_sd20
Groupings:
  Subject: prolific_pid
  Trial: trial_block
  Time: abs_td_dist
Data:
# A tibble: 9,613 × 7
  bias_c distr_sd20 target_sd20 distr_sd20__target_sd20 prolific_pid trial_block
   <dbl>      <dbl>       <dbl>                   <dbl> <chr>        <fct>      
1 -32.6           0           0                       0 293960       2.1        
2  -1.02          1           0                       0 293960       5.1        
3  16.0           0           0                       0 293960       8.1        
# ℹ 9,610 more rows
# ℹ 1 more variable: abs_td_dist <dbl>

! Sampling rate for the `time` column "abs_td_dist" is not constant - may affect interpretability of results.

I then do

empirical_statistics <- compute_timewise_statistics(model_specs)

empirical_clusters <- extract_empirical_clusters(empirical_statistics, threshold = 2.5)

but get the error "Error in abs(x) : non-numeric argument to mathematical function".

When I do:

> str(empirical_statistics)
'timewise_statistics' num [1:3, 1:8565] -Inf -Inf -Inf -Inf -Inf ...
- attr(*, "dimnames")=List of 2
..$ Predictor: chr [1:3] "distr_sd20" "target_sd20" "distr_sd20__target_sd20"
..$ Time     : chr [1:8565] "0.0199999999999988" "0.0299999999999981" "0.0399999999999975" "0.0599999999999963" ...
- attr(*, "statistic")= chr "t"
- attr(*, "term_groups")=List of 3
..$ distr_sd          : chr "distr_sd20"
..$ target_sd         : chr "target_sd20"
..$ distr_sd:target_sd: chr "distr_sd20__target_sd20"

I see that the 'time' variable is converted to a character vector for some reason.

@yjunechoe
Owner

yjunechoe commented Aug 17, 2023

Thanks for the report!

I'm not entirely sure what went wrong without looking at the data (if you'd like me to help debug this, you can send the data to jchoe001@gmail.com), but I can at least say that the problem is not Time being a character vector. That's just the label for each time point in the "dimnames" attribute of the matrix. That said, to your point, it's best practice to make Time an evenly spaced sequence of integers to avoid floating-point madness.

What's more worrying is that the timewise statistics appear to be all -Inf. Can you fit a regular lm() to the subset of the data at time point 0.02 and copy-paste the output here? Something like:

lm(bias_c ~ distr_sd20 * target_sd20, data = df[df$abs_td_dist == 0.02, ])

My guess is that the response variable is constant at that timepoint (and possibly at others) and so it's getting flagged internally for being ill-formed. To demonstrate:

x <- mtcars
x$mpg <- 1
mod <- lm(mpg ~ hp, x)
summary(mod)
#> Warning in summary.lm(mod): essentially perfect fit: summary may be unreliable

I can think about ways of handling this more gracefully, but could you first confirm whether my suspicion is correct?
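One way to check this without fitting any models is to count observations per time point. The following is a minimal sketch on simulated data: the data frame, its size, and the 0 to 90 range of `abs_td_dist` are assumptions for illustration, and only the column names are taken from the report above.

```r
# Simulated stand-in for the real data (assumption): time is sampled
# continuously, so most time values occur only once.
set.seed(1)
df <- data.frame(
  abs_td_dist = round(runif(200, 0, 90), 2),
  bias_c      = rnorm(200)
)

# Number of observations at each distinct time value
n_per_time <- table(df$abs_td_dist)

# Share of time points that have a single observation
mean(n_per_time == 1)

# Number of time points where the response takes fewer than two distinct
# values, i.e. where a timewise lm() has no variance left to explain
n_distinct_y <- tapply(df$bias_c, df$abs_td_dist,
                       function(y) length(unique(y)))
sum(n_distinct_y < 2)
```

If most time points have one observation, the per-timepoint models cannot be estimated sensibly, which would be consistent with the -Inf values seen in `str(empirical_statistics)`.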

@achetverikov
Author

Yeah, I guess your intuition is correct. My time variable is uniformly and randomly sampled within the limits, so there's almost always just one observation for a given time point. Is there a way to handle it better than just binning?

@yjunechoe
Owner

> My time variable is uniformly and randomly sampled within the limits, so there's almost always just one observation for a given time point.

As you guessed, this is likely the problem. The kind of data that CPA expects is one where the time points are known (e.g., instead of time being randomly sampled, it should follow a fixed sampling rate) and there are multiple observations within each time point. Essentially, you're fitting a linear model at each time point and collecting the t-statistics of the predictors. As a first check, these timewise linear models must make sense.
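To make the "one linear model per time point" idea concrete, here is a rough base-R sketch. It is not the package's internal implementation; the data frame, the column names `time`, `x`, `y`, and the effect size are all made up for illustration.

```r
set.seed(1)

# Hypothetical data: 5 fixed time points with 40 observations each
df <- data.frame(
  time = rep(1:5, each = 40),
  x    = rep(c(0, 1), times = 100)
)
df$y <- 0.5 * df$x * df$time + rnorm(nrow(df))

# Fit one lm() per time point and keep the t-statistic of the predictor x
t_by_time <- sapply(split(df, df$time), function(d) {
  coef(summary(lm(y ~ x, data = d)))["x", "t value"]
})

t_by_time
```

Each element of `t_by_time` is only meaningful because every time point has enough observations with variation in both `x` and `y`; with one observation per time point, none of these models could be fit.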

Without knowing the context of the study, I don't know what kind of assumptions you violate when you choose to bin the data. But indeed, binning will get you closer to the kind of data that CPA expects. In my opinion, binning is actually not a huge problem so I would give it a try - you lose some sensitivity in the temporal resolution but it doesn't bias the test any more than the choice of a threshold. For an example, see the Geller et al. 2020 case study vignette which runs a CPA on data binned by time.
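As a rough illustration of binning (not taken from the vignette), continuous time values can be mapped to integer bin indices with `floor()`. The bin width of 10 and the 0 to 90 range are assumptions; a sensible width depends on the study.

```r
set.seed(1)

# Simulated stand-in for the real data (assumption)
df <- data.frame(
  abs_td_dist = runif(500, 0, 90),
  bias_c      = rnorm(500)
)

bin_width <- 10  # hypothetical choice; trades temporal resolution for n per bin

# Integer bin index starting at 1, which also gives an evenly spaced time axis
df$time_bin <- as.integer(floor(df$abs_td_dist / bin_width)) + 1L

# Observations per bin: each bin should now hold many rows
table(df$time_bin)
```

The resulting `time_bin` column could then be used as the time grouping in place of the raw `abs_td_dist` values.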

@achetverikov
Author

Thanks! I was thinking that maybe some kind of moving-window approach could be used, or perhaps a weighted regression where the weights are based on temporal distance. Now that I think of it, it should be straightforward to implement a permutation test for this. Anyhow, thanks for your feedback! It's very helpful, and the issue can be closed.
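A very rough sketch of the weighted-regression idea, using base R's `lm()` with Gaussian kernel weights centered on each evaluation time. Everything here is an assumption for illustration: the simulated data, the helper name `kernel_t`, the bandwidth, and the evaluation grid. The resulting t-values are descriptive only, since observations are reused across neighboring time points; inference would still need something like the permutation test mentioned above.

```r
set.seed(1)

# Simulated data with continuously sampled time (assumption)
df <- data.frame(
  time = runif(300, 0, 90),
  x    = rbinom(300, 1, 0.5)
)
df$y <- 0.05 * df$x * df$time + rnorm(300)

# Hypothetical helper: t-statistic of x from a regression in which each
# observation is weighted by its temporal closeness to t0
kernel_t <- function(t0, data, bandwidth = 5) {
  w   <- dnorm(data$time, mean = t0, sd = bandwidth)
  fit <- lm(y ~ x, data = data, weights = w)
  coef(summary(fit))["x", "t value"]
}

# Evaluate on a fixed, evenly spaced grid of time points
grid    <- seq(10, 80, by = 10)
t_stats <- sapply(grid, kernel_t, data = df)

t_stats
```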

@yjunechoe
Owner

Great - cheers!
