Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple random effects in addition to subject in mixed-effect model #6

Closed
XiaoyuZeng opened this issue Jul 14, 2023 · 5 comments
Closed

Comments

@XiaoyuZeng
Copy link

XiaoyuZeng commented Jul 14, 2023

I was trying to build a mixed-effect model related to an intracranial EEG study, thus I have a more complex random effect structure, that is, recording sites nested in electrodes that nested in the subject.

Example model specification: seeg~1+data_beh+(1|chid:elecid:subjinfo)+(1|elecid:subjinfo)+(1|subjinfo)

However, I ran into an error when I tried to specify this complex LME in make_jlmer_spec:

Error in [.data.frame(df, !na_rows, names(re_bars), drop = FALSE) : An undefined column is selected

I dug into the make_jlmer_spec function, and found limited random effect specifications such as subject = NULL, trial = NULL,time = NULL.

After removing the random effect of chid and elecid and only keeping the subject random effect, it worked well to run the make_jlmer_spec and clusterpermute.
But meanwhile I got a warning:

! Grouping columns "subjinfo", "trial", and "time" do not uniquely identify rows in the data

My question is, how can I add additional random effects in make_jlmer_spec?
I think this is necessary not only for my intracranial EEG study but also for other studies involving more complex random effects, for example, subjects from different schools, which would additionally involve random effects of schools.
My other question is, how exactly did the permute_timewise_statistics shuffle data? shuffle trial-wise label? or shuffle time-points? and how did it handle random-effect labels? keeping them unchanged?

@yjunechoe
Copy link
Owner

yjunechoe commented Jul 14, 2023

Thanks for the issue! I'm not an expert in EEG so please take my advice with a grain of salt, but I'll link to some references and hopefully I can at least clear up how jlmerclusterperm works.

  1. First, to address your question about adding additional random effects - currently make_jlmer_spec() does not support nested random effects, but there's no limit to the number of random effect groups you can add. This is independent of the fact that make_jlmer_spec() has the arguments subject = NULL, trial = NULL, time = NULL - that information is relevant to the permutation procedure, not model fitting. I don't have plans to support conversion for nested groups to Julia formula syntax because I think this is seldom useful in the context of a CPA due to the constraints of the permutation algorithm, as I'll explain more in my other points. But if you really want to, you can create a column representing the nested groups before hand with something like like data$chid_elecid_subjinfo <- interaction(data$chid, data$elecid, data$subjinfo) and replace (1|chid:elecid:subjinfo) in your formula with (1|chid_elecid_subjinfo). These are equivalent for model fitting purposes. You can further consult the MixedModels.jl github repo's issue 74 (not linking to it here so I don't clutter up that issue), where they suggest:

    Yes. In tact the preferred approach is to create the interaction factor in the Frame before fitting the model because A:B in Julia doesn't mean what it does in R.

  2. The shuffling logic behind permute_timewise_statistics() depends on whether it's a within-subject or between-subject study, which it infers from the presence of the trial column specified in make_jlmer_spec(). The permute_by_predictor() function lets you see how your <jlmer_spec> would get shuffled in the CPA procedure. It's a bit difficult to explain over text - you can play around with permute_by_predictor() first and let me know if you have specific questions about how your spec shuffles.

But more importantly, I want to note that the permutation procedure is not well defined for multiple nested groups, beyond subject and trial. So frankly I'm not sure how the simulating the null should look like when you have something like Site > Electrodes > Subjects. I think that this depends a lot on your research question and what you consider to be a random group. Whatever is the case, I'd recommend that your data follows the constraints set by make_jlmer_spec(), which is: "data should be stratified by subject (and further by trial, if it's within-participants experiment), and the combination of columns for subject, trial, and time should uniquely identify each row". You may need to partition or average over your data to get there, which means you lose some precision and break some assumptions, but that's frequent in practice due to aforementioned constraints. I also highly recommend consulting the documentation of the procedure in FieldTrip, a matlab tool that implements CPA designed by the authors of the original paper introducing CPA for EEG (Maris & Oostenveld).

That being said, here's a suggestion: How about splitting the data by electrode, and within each electrode fit a CPA. So a CPA per electrode where the formula is now reduced to 1 + data_beh + (1|subjinfo)) and each row of the data is a unique combination of subject and time. Then, you can do some post-hoc analysis of the spatial dimension that lets you make statements like "Effects are observed mostly in frontal cortex and less in temporal cortex". Note that CPA itself is not a test for where in the spatial/temporal region the effect emerges - it can only detect the existence of a "difference unlikely to be due to chance" (see - https://onlinelibrary.wiley.com/doi/10.1111/psyp.13335). This of course ignores the "recording sites" layer, but how many unique recording sites do you have in your data? If it's just a few, I doubt that you can get any meaningful estimates for variance at the recording-site-level anyways.

Sorry for the wall of text. The level of complexity for your case is actually beyond what I had in mind for the package (and partly for the CPA procedure itself) but hopefully this gives you some options. Cheers!

@XiaoyuZeng
Copy link
Author

XiaoyuZeng commented Jul 15, 2023

Thank you, June, for your time and effort to clarify this problem. I highly appreciate it.

As you correctly pointed out, this problem, the nested random effect structure in intracranial EEG, is beyond the scope of your package. Indeed this is a challenging problem in the field of intracranial EEG. Although some progress has been made in the field of scalp EEG regarding combining LME and CBA, this problem is even more challenging in intracranial EEG as patients often have varying numbers of electrodes that carry varying numbers of recording sites.

Nevertheless, I first followed your first suggestion to create data$chid_elecid_subjinfo and set subject = "chid_elecid_subjinfo". However, this model specification leads to a singular fit, suggesting potential overfitting. Therefore, I simplified the model to set subject = "chid_subjinfo".

It worked now. However, I ran into another problem after successfully running the empirical_statistics.
image
The cluster-level p value could be viewed when displaying the output of calculate_clusters_pvalues. However, I failed to export the cluster p value in a csv file.

I've tried:
1.transforming the list variable the output of calculate_clusters_pvalues into a data frame/data table. However, when being transformed into data frame, the output of calculate_clusters_pvalues lost the cluster p value information.
2. unlist the the output of calculate_clusters_pvalues. Still, it lost the p value information.

I guess this problem is due to the way you stored the p value information in the output of calculate_clusters_pvalues. Any idea to export the cluster p value?
Again, many thanks for your effort to answer my questions and maintain this amazing package.

@yjunechoe
Copy link
Owner

Good to hear you've settle on an approach!

As for your question, the package provides a tidy() method to convert any non-data frame object into a data frame representation. So tidy(cluster_p) should give you a data frame that you can save to csv.

More info in the Tidying Output vignette: https://yjunechoe.github.io/jlmerclusterperm/articles/tidying-output.html

@XiaoyuZeng
Copy link
Author

Good to hear you've settle on an approach!

As for your question, the package provides a tidy() method to convert any non-data frame object into a data frame representation. So tidy(cluster_p) should give you a data frame that you can save to csv.

More info in the Tidying Output vignette: https://yjunechoe.github.io/jlmerclusterperm/articles/tidying-output.html

Thanks, June!
tidy(cluster_p) indeed solves the problem!
Many thanks for your time and effort to guide me to use this amazing package. It saves a lot of time for me!

@yjunechoe
Copy link
Owner

Great - and no problem! I'll mark this as solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants