Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filesystem paths for diffusion models #32

Closed
Lestropie opened this issue Nov 22, 2021 · 8 comments
Closed

Filesystem paths for diffusion models #32

Lestropie opened this issue Nov 22, 2021 · 8 comments

Comments

@Lestropie
Copy link
Collaborator

This Issue addresses the fundamental question of the file names (& potentially sub-directory placement) of diffusion model derivatives should be specified.

This is highly important for DWI models since the outcomes of fitting a model to the data typically does not result in the generation of a single output file, but many files, and quite often the individual files are not very interpretable in isolation.

  1. Single fixed file suffix

    This is the strategy that was initially proposed for BEP016 before I took a major hacksaw to things in [ENH] Restructuring of DWI derivatives #5, but this particular question was not exhaustively considered due to the large sweeping changes there.

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                <source_keywords>[_space-<space>][_desc-<label>]_model-<label>[_parameter-<parameter>]_diffmodel.nii[.gz]
                <source_keywords>[_space-<space>][_desc-<label>]_model-<label>_diffmodel.json
    

    Advantage: very easy for the BIDS validator to check derivative data for conformity, since there's a single suffix across all possible diffusion models.

    Disadvantage: file names do become a bit longer than for other options, since for multiple files in the directory you will have "_model-<label> ... _diffmodel" appearing.

  2. Unique file suffix per model

    This is what is currently shown as of [ENH] Restructuring of DWI derivatives #5.

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                <source_keywords>[_space-<space>][_desc-<label>]_parameter-<intparam1>_<model>.nii[.gz]
                <source_keywords>[_space-<space>][_desc-<label>]_parameter-<intparam2>_<model>.nii[.gz]
                <source_keywords>[_space-<space>][_desc-<label>]_<model>.json
    
                [<source_keywords>[_space-<space>][_desc-<label>]_parameter-<extparam1>_<model>.nii[.gz]]
                [<source_keywords>[_space-<space>][_desc-<label>]_parameter-<extparam1>_<model>.json]
                [<source_keywords>[_space-<space>][_desc-<label>]_parameter-<extparam2>_<model>.nii[.gz]]
                [<source_keywords>[_space-<space>][_desc-<label>]_parameter-<extparam2>_<model>.json]
    

    Advantage: Primary identifying data contents of files are shown at end of file basename, whereas if as per 1. above this suffix is simply "diffmodel" you then have to read back through the prior key-value fields to find the "_model-" item to find out what you're looking at; fine for a machine, slightly annoying for a human.

    Disadvantage: Slightly harder for the validator. The current text suggests that the specified model suffixes are compulsory only if it is one of those models that is used. This means that if the validator encounters a suffix it doesn't know, it can only produce a warning, not an error. This therefore has potentially very large ramifications for BIDS derivatives more generally (assuming it is intended that such be used for BIDS derivatives).

  3. Use filesystem hierarchy

    It may be possible for a BIDS derivatives dwi/ directory to become exceptionally full if one performs a lot of operations and generates a lot of derivative data. The ability to "group" those files corresponding to a specific model fit into a sub-directory of the modality directory would make things easier to navigate for users.

    Consider an example App that both performs DWI pre-processing and fits the tensor model. Currently this might look like:

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                sub-<participant_label>_desc-preproc_dwi.nii[.gz]
                sub-<participant_label>_desc-preproc_dwi.bvals
                sub-<participant_label>_desc-preproc_dwi.bvecs
                sub-<participant_label>_desc-preproc_dwi.json
                sub-<participant_label>_parameter-all_dti.nii[.gz]
                sub-<participant_label>_parameter-bzero_dti.nii[.gz]
                sub-<participant_label>_parameter-fa_dti.nii[.gz]
                sub-<participant_label>_parameter-md_dti.nii[.gz]
                sub-<participant_label>_parameter-ad_dti.nii[.gz]
                sub-<participant_label>_parameter-rd_dti.nii[.gz]
    

    The model derivatives could instead be placed into their own sub-directory. For example (noting that there are a number of possible alternatives here):

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                sub-<participant_label>_desc-preproc_dwi.nii[.gz]
                sub-<participant_label>_desc-preproc_dwi.bvals
                sub-<participant_label>_desc-preproc_dwi.bvecs
                sub-<participant_label>_desc-preproc_dwi.json
                dti/
                    sub-<participant_label>_dti.nii[.gz]
                    sub-<participant_label>_bzero.nii[.gz]
                    sub-<participant_label>_fa.nii[.gz]
                    sub-<participant_label>_md.nii[.gz]
                    sub-<participant_label>_ad.nii[.gz]
                    sub-<participant_label>_rd.nii[.gz]
    

    This might be beneficial in the longer term as derivatives become more widespread & complex, in order to not clog up a basic modality directory structure with large amounts of unrelated data.

    As far as the validator is concerned, this would mean that it would know to give freedom to file basename suffixes where a file resides in a sub-directory of the modality directory. It could also check that the names of those sub-directories conform. E.g. Consider the following where there are multiple tensor fits performed using different methods:

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                sub-<participant_label>_desc-preproc_dwi.nii[.gz]
                sub-<participant_label>_desc-preproc_dwi.bvals
                sub-<participant_label>_desc-preproc_dwi.bvecs
                sub-<participant_label>_desc-preproc_dwi.json
                desc-linear_dti/
                    sub-<participant_label>_dti.nii[.gz]
                    sub-<participant_label>_bzero.nii[.gz]
                    sub-<participant_label>_fa.nii[.gz]
                    sub-<participant_label>_md.nii[.gz]
                    sub-<participant_label>_ad.nii[.gz]
                    sub-<participant_label>_rd.nii[.gz]
                desc-loglinear_dti/
                    sub-<participant_label>_dti.nii[.gz]
                    sub-<participant_label>_bzero.nii[.gz]
                    sub-<participant_label>_fa.nii[.gz]
                    sub-<participant_label>_md.nii[.gz]
                    sub-<participant_label>_ad.nii[.gz]
                    sub-<participant_label>_rd.nii[.gz]
                desc-iwls_dti/
                    sub-<participant_label>_dti.nii[.gz]
                    sub-<participant_label>_bzero.nii[.gz]
                    sub-<participant_label>_fa.nii[.gz]
                    sub-<participant_label>_md.nii[.gz]
                    sub-<participant_label>_ad.nii[.gz]
                    sub-<participant_label>_rd.nii[.gz]
    

The sub-directories here should not I think include "sub-<participant_label>", but should at least confirm to the basic key-value structure with a final identifier (please let me know if the BIDS community has settled on nomenclature for such things so that we can speak the same language).

@Lestropie
Copy link
Collaborator Author

One possibility to consider regarding option 1 is that if suffix "_model" were used, this would potentially be applicable across a very wide range of model derivatives across many modalities (potentially even beyond MRI).

@Lestropie
Copy link
Collaborator Author

Another thought regarding option 1: With the current state of the proposal (which is option 2), there is no stringent need to separate between intrinsic and extrinsic parameters; the only difference is whether or not the data are compulsory in order to interpret the model. However if option 1 were instead adopted, one would then have to question whether it makes sense to have something like an FA parametric map stored with a file name "_model.nii[.gz]", or whether an explicit separation between intrinsic and extrinsic parameters would be necessary at the suffix.

(P.S. "Intrinsic" vs. "Extrinsic" isn't necessarily the best nomenclature and could be changed; it's maybe something more like "model parameters" vs. "model-derived parameters")

@Lestropie
Copy link
Collaborator Author

Will consider implementing a PR (following merge of #40) showing what option 1 might look like in reality.

@Lestropie
Copy link
Collaborator Author

Making an attempt at reverting back to solution 1 reminded me of one of the issues faced there.

With each unique model having a pre-defined suffix and set of parameters (i.e. option 2 & the current state of the draft), the data representation of each image is established in the specification, and so just a single JSON file can be used to store all metadata related to the model.

With option 1, with every DWI model falling under the suffix of "_model", there are two options:

  1. Dictate in the specification the names of parameters for specific models, and those parameter names would need to appear in the file names somewhere (e.g. "_param-"). The specification would then inform as to the data representation used for each of those parameters.

  2. Each parameter image has associated with it a JSON file, and within that JSON file there is a field that specifies from a dictionary the data representation of that image.

Personally I would prefer option 2. That would actually allow a specification that is agnostic to the nature of specific DWI models, and (with adequate foresight) any new DWI model could be stored in a BIDS Derivatives compliant way without ever having to modify the specification to integrate that model.

The problem however is the inheritance principle. The data representation being unique for each model parameter means that there needs to be a unique JSON for each model parameter image. There is however other information that apply to the model as a whole, rather than to specific parameters; e.g. model fitting parameters, mask image, that sort of thing. Rather than duplicate this information across all model parameter JSONs, it would be preferable to instead have one JSON for the model as a whole that provides this information, and one JSON per parameter that provides information relevant to that parameter only. But that would require multiple applicable JSONs at one level of the filesystem hierarchy, which is not permitted.

The way I see it, there are two possible solutions to this:

  1. Modify the inheritance principle. My suspicion is that the reason behind not permitting multiple JSONs to be applicable in a single directory is about forbidding a situation where there is ambiguity about either which JSONs are applicable to a given image, or in what order they should be loaded (and therefore in the presence of conflicting contents, which takes precedence). I do however think that the mandate here could be more specific to the problem, in a way that prevents such ambiguity but that permits what we would require here.

    For instance, here's a DWI model with two applicable JSONs in a single directory:

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                <source_keywords>[_space-<space>][_desc-<label>]_parameter-<param1>_model.nii[.gz]
                <source_keywords>[_space-<space>][_desc-<label>]_parameter-<param1>_model.json
                <source_keywords>[_space-<space>][_desc-<label>]_parameter-<param2>_model.nii[.gz]
                <source_keywords>[_space-<space>][_desc-<label>]_parameter-<param2>_model.json
                <source_keywords>[_space-<space>][_desc-<label>]_model.json
    

    There is no ambiguity here: for either image, <source_keywords>[_space-<space>][_desc-<label>]_model.json should be loaded first (if desired), and contains all information relating to the fitting of the model to the DWI data; <source_keywords>[_space-][_desc-]_parameter-<param#>_model.json` is loaded second and contains information relating to the derivation of that parameter from the model, and to the data representation of that image.

    A hypothetical case like this would however be ambiguous:

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                <source_keywords>_space-X_desc-Y_model.nii[.gz]
                <source_keywords>_space-X_desc-Y_model.json
                <source_keywords>_space-X_model.json
                <source_keywords>_desc-Y_model.json
    

    If interested in all JSON contents related to the image above, there is no reasonable rule regarding which of the two other JSONs should be loaded first.

    So the appropriate rule would be that, for any image, multiple JSONs within a single directory must have a strict ordering defined by subsets of the filename entities, with the load order being the reverse of such.

  2. Utilize directory structure, as per Option 3 in the first post of this thread.

    This might look something like:

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                sub-<participant_label>_desc-preproc_dwi.nii[.gz]
                sub-<participant_label>_desc-preproc_dwi.bvals
                sub-<participant_label>_desc-preproc_dwi.bvecs
                sub-<participant_label>_desc-preproc_dwi.json
                sub-<participant_label>_desc-tensor_model/
                    sub-<participant_label>_parameter-all_model.nii[.gz]
                    sub-<participant_label>_parameter-all_model.json
                    sub-<participant_label>_parameter-bzero_model.nii[.gz]
                    sub-<participant_label>_parameter-bzero_model.json
                    sub-<participant_label>_parameter-fa_mdp.nii[.gz]
                    sub-<participant_label>_parameter-fa_mdp.json
                    sub-<participant_label>_parameter-md_mdp.nii[.gz]
                    sub-<participant_label>_parameter-md_mdp.json
                    sub-<participant_label>_parameter-ad_mdp.nii[.gz]
                    sub-<participant_label>_parameter-ad_mdp.json
                    sub-<participant_label>_parameter-rd_mdp.nii[.gz]
                    sub-<participant_label>_parameter-rd_mdp.json
                sub-<participant_label>_desc-tensor_model.json
    

    Or, as per option 3, it could instead be tarballed:

    <pipeline_name>/
        sub-<participant_label>/
            dwi/
                sub-<participant_label>_desc-preproc_dwi.nii[.gz]
                sub-<participant_label>_desc-preproc_dwi.bvals
                sub-<participant_label>_desc-preproc_dwi.bvecs
                sub-<participant_label>_desc-preproc_dwi.json
                sub-<participant_label>_desc-tensor_model.tar[.gz]
                sub-<participant_label>_desc-tensor_model.json
    

@Lestropie
Copy link
Collaborator Author

I originally posted about the above issue in bids-standard/bids-specification#259.

I think what I need to do is first write a proposal PR for bids-standard/bids-specification#482 to reduce ambiguity about the current state of the inheritance principle, and then write a second PR proposing specific language for bids-standard/bids-specification#259, i.e. what the language would need to be in order to facilitate what I'm looking for here.

@francopestilli
Copy link
Collaborator

It looks like this one is a priority. If we move on with this we are left with a single PR is that right?

@Lestropie
Copy link
Collaborator Author

This is the primary one on which I want to get feedback from BIDS maintainers before proceeding much further. bids-standard/bids-specification#946 is ready for review; I then need to, on top of that, propose a solution for bids-standard/bids-specification#259 in order to facilitate option 1, which I think is my personal long-term preference. It was indicated in a comment here that maintainers may be open to doing so as a minor rather than major revision, though this would require dialogue with validator maintainers as well.

@Lestropie
Copy link
Collaborator Author

With rejection of bids-standard/bids-specification#1003 and bids-standard/bids-specification#1280 and acceptance of bids-standard/bids-specification#1602, proceeding with option 1 "single file suffix" "_dwimap" and no modification to either inheritance or directory structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants