[ENH] Inheritance principle: Relaxation allowing multiple files per directory #1003

Draft: wants to merge 9 commits into base: master
174 changes: 54 additions & 120 deletions src/02-common-principles.md
@@ -559,52 +559,54 @@ for more information.

## The Inheritance Principle

1. Any metadata file (such as `.json`, `.bvec` or `.tsv`) MAY be defined at any directory level.
In many BIDS datasets, any data file and its corresponding metadata file
will possess an identical set of entities and suffix,
differing only in their file extensions.
This implicitly associates each data file with one metadata file,
and one metadata file per data file.
More advanced usages are however permissible:
any data file may "inherit" metadata information from one or more metadata files,
provided that the location of the metadata file in the file system
makes it applicable to that data file.

Where more than one metadata file is applicable to a data file,
the contents of the file closest in filesystem position to the data file
(lowest in the directory hierarchy, most entities in the file name)
takes precedence.

Collaborator

this is already a specific rule, which is incomplete (no description for order at the same level) and somewhat ambiguous: does "takes precedence" mean that it is the only file loaded, or that it is loaded last (which is how it currently works, but is not explained here)? Thus I would just remove it from this "IP intro".

Collaborator Author (@Lestropie, Aug 9, 2022)

> this is already a specific rule

Not sure what is intended here: it is an enumerated rule in the appendix, but the text here is intended as a more reader-friendly description.

> which is incomplete (no description for order at the same level)

It is tough to strike the right balance between being reader-friendly and being accurate.

I'll try a complete alternative:
"Where more than one metadata file is applicable to the data file, all of those metadata files MUST be loaded, in a specific order dictated by the Inheritance Principle. This ordering influences the contents of sidecar information in the circumstance where the same key is defined in multiple files. That ordering is from the highest to lowest level in the filesystem hierarchy, with ordering from fewest to most entities within each directory. Therefore, if such a common key exists, it is the value in the file that is "closest" to the data file under consideration (lowest in the filesystem hierarchy, greatest number of shared entities) that takes precedence."

Collaborator

> but the text here is intended as a more reader-friendly description.

I do understand the motivation here, but IMHO it causes duplication and, because of the different wording, potential ambiguity. I do not think any rewording would help, as it would keep that duplication in place.

That is why my suggestion is to remove this "reader-friendly" version (starting from "That ordering is from the highest...") from here and just refer to a singular version in the appendix specification. The prior sentence might also need adjusting, since I do not think BIDS defines "sidecar information" (metadata? ;)).

Re "reader-friendly": a BIDS starter kit or blog posts could provide a "user-friendly" description of the IP if there is a need, leaving the "specification" to provide a condensed motivation and (ideally) a single clear version of how it works.

Just my 1c.

Collaborator Author

I think I like the idea of essentially "deferring" the reader to the IP appendix for any concept that is too difficult to convey in friendly text and/or runs the risk of imprecise duplication. I'll give this a try.

Collaborator Author

See c9c767d.

In the case of [JSON files](#key-value-files-dictionaries), this involves
loading *all* key-values from *all* applicable files;
where a key is present in multiple files,
only the value for that key in the highest precedence file will be preserved.
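
The JSON merging behaviour described above can be sketched roughly as follows. This is only an illustration, not part of the specification or of any BIDS library: the function name `merge_sidecars` and the key values are hypothetical, and the sketch assumes the applicable files have already been parsed into dictionaries and arranged from lowest to highest precedence.

```python
from typing import Any


def merge_sidecars(ordered_sidecars: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge parsed JSON sidecars given from lowest to highest precedence.

    Every key from every file is kept; when a key appears in more than one
    file, the value from the later (higher precedence) file wins.
    """
    merged: dict[str, Any] = {}
    for contents in ordered_sidecars:
        # dict.update overwrites existing keys and never removes any,
        # so a key missing from a later file is simply left as it was.
        merged.update(contents)
    return merged


# Hypothetical contents, chosen only for illustration.
top_level = {"RepetitionTime": 2.0, "EchoTime": 0.03}
lower_level = {"RepetitionTime": 3.0}

result = merge_sidecars([top_level, lower_level])
```

Because the merge only ever adds or overwrites keys, a lower-level file cannot "unset" a key defined at a higher level.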

The precise rules governing the behaviour of the Inheritance Principle
are described in detail in [Appendix XV](#99-appendices/15-inheritance-principle.md).

### Examples

Example 1: Single metadata file applicable to multiple data files

1. For a given data file, any metadata file is applicable to that data file if:
1. It is stored at the same directory level or higher;
1. The metadata and the data filenames possess the same suffix;
1. The metadata filename does not include any entity absent from the data filename.

1. A metadata file MUST NOT have a filename that would be otherwise applicable
to some data file based on rules 2.b and 2.c but is made inapplicable based on its
location in the directory structure as per rule 2.a.

1. There MUST NOT be multiple metadata files applicable to a data file at one level
of the directory hierarchy.

1. If multiple metadata files satisfy criteria 2.a-c above:

1. For [tabular files](#tabular-files) and other simple metadata files
(for instance, [`bvec` / `bval` files for diffusion MRI](#04-modality-specific-files/01-magnetic-resonance-imaging#required-gradient-orientation-information)),
accessing metadata associated with a data file MUST consider only the
applicable file that is lowest in the filesystem hierarchy.

1. For [JSON files](#key-value-files-dictionaries), key-values are loaded
from files from the top of the directory hierarchy downwards, such that
key-values from the top level are inherited by all data files at lower
levels to which it is applicable unless overridden by a value for the
same key present in another metadata file at a lower level
(though it is RECOMMENDED to minimize the extent of such overrides).

Corollaries:

1. As per rule 3, metadata files applicable only to a specific participant / session
MUST be defined in or below the directory corresponding to that participant / session;
similarly, a metadata file that is applicable to multiple participants / sessions
MUST NOT be placed within a directory corresponding to only one such participant / session.

1. It is permissible for a single metadata file to be applicable to multiple data
files at that level of the hierarchy or below. Where such metadata content is consistent
across multiple data files, it is RECOMMENDED to store metadata in this
way, rather than duplicating that metadata content across multiple metadata files.
{{ MACROS___make_filetree_example(
{
"sub-01": {
"anat": {
"sub-01_part-real_T2starw.nii.gz": "",
"sub-01_part-imag_T2starw.nii.gz": "",
"sub-01_T2starw.json": "",
}
}
}
) }}

1. Where multiple applicable JSON files are loaded as per rule 5.b, key-values can
only be overwritten by files lower in the filesystem hierarchy; the absence of
a key-value in a later file does not imply the "unsetting" of that field
(indeed removal of existing fields is not possible).
For file `sub-01_T2starw.json`, there does not exist an immediately corresponding data file
with the same basename but different file extension; for instance `sub-01_T2starw.nii.gz`.
It is however applicable to both data files `sub-01_part-real_T2starw.nii.gz` and
`sub-01_part-imag_T2starw.nii.gz`, since it possesses the same suffix "`T2starw`"
and its entities are a subset of those present in the data filename in both cases.
This storage structure is appropriate for such data given that the real and imaginary
components of complex image data would be yielded by a single execution of an MRI sequence,
with a fixed common set of acquisition parameters.
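
The filename part of this applicability test (same suffix, metadata entities a subset of the data file's entities) can be sketched as below. The helper names `parse_name` and `name_is_applicable` are hypothetical, the parsing is deliberately simplistic, and the sketch ignores the separate requirement that the metadata file sit at the same directory level as the data file or higher.

```python
def parse_name(filename: str) -> tuple[set[str], str]:
    """Split a BIDS-style filename into (set of entities, suffix)."""
    stem = filename.split(".", 1)[0]        # drop the extension(s)
    *entities, suffix = stem.split("_")     # the last chunk is the suffix
    return set(entities), suffix


def name_is_applicable(metadata_name: str, data_name: str) -> bool:
    """Check the filename criteria only; directory position is not checked."""
    meta_entities, meta_suffix = parse_name(metadata_name)
    data_entities, data_suffix = parse_name(data_name)
    # Same suffix, and no entity in the metadata name that the data name lacks.
    return meta_suffix == data_suffix and meta_entities <= data_entities


real_ok = name_is_applicable("sub-01_T2starw.json", "sub-01_part-real_T2starw.nii.gz")
imag_ok = name_is_applicable("sub-01_T2starw.json", "sub-01_part-imag_T2starw.nii.gz")
```

Under these assumptions both checks succeed, since `sub-01` is the only entity in the metadata filename and it is present in both data filenames.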

Example 1: Demonstration of inheritance principle
Example 2: Single data file with multiple applicable metadata files

<!-- This block generates a file tree.
A guide for using macros can be found at
@@ -640,84 +642,16 @@ Contents of file `sub-01/func/sub-01_task-rest_acq-longtr_bold.json`:
}
```

When reading image `sub-01/func/sub-01_task-rest_acq-default_bold.nii.gz`, only
metadata file `task-rest_bold.json` is read; file
`sub-01/func/sub-01_task-rest_acq-longtr_bold.json` is inapplicable as it contains
entity "`acq-longtr`" that is absent from the image path (rule 2.c). When reading image
`sub-01/func/sub-01_task-rest_acq-longtr_bold.nii.gz`, metadata file
`task-rest_bold.json` at the top level is read first, followed by file
`sub-01/func/sub-01_task-rest_acq-longtr_bold.json` at the bottom level (rule 5.b);
When reading image `sub-01/func/sub-01_task-rest_acq-default_bold.nii.gz`,
only metadata file `task-rest_bold.json` is read;
file `sub-01/func/sub-01_task-rest_acq-longtr_bold.json` is inapplicable as it contains
entity "`acq-longtr`" that is absent from the image path.
When reading image `sub-01/func/sub-01_task-rest_acq-longtr_bold.nii.gz`,
metadata file `task-rest_bold.json` at the top level is read first,
followed by file `sub-01/func/sub-01_task-rest_acq-longtr_bold.json` at the bottom level;
the value for field "`RepetitionTime`" is therefore overridden to the value `3.0`.
The value for field "`EchoTime`" remains applicable to that image, and is not unset by its
absence in the metadata file at the lower level (rule 5.b; corollary 3).

Example 2: Impermissible use of multiple metadata files at one directory level (rule 4)

<!-- This block generates a file tree.
A guide for using macros can be found at
https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
-->
{{ MACROS___make_filetree_example(
{
"sub-01": {
"ses-test":{
"anat": {
"sub-01_ses-test_T1w.nii.gz": "",
},
"func": {
"sub-01_ses-test_task-overtverbgeneration_run-1_bold.nii.gz": "",
"sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz": "",
"sub-01_ses-test_task-overtverbgeneration_bold.json": "",
"sub-01_ses-test_task-overtverbgeneration_run-2_bold.json": "",
}
}
}
}
) }}

Example 3: Modification of filesystem structure from Example 2 to satisfy inheritance
principle requirements

<!-- This block generates a file tree.
A guide for using macros can be found at
https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
-->
{{ MACROS___make_filetree_example(
{
"sub-01": {
"ses-test":{
"sub-01_ses-test_task-overtverbgeneration_bold.json": "",
"anat": {
"sub-01_ses-test_T1w.nii.gz": "",
},
"func": {
"sub-01_ses-test_task-overtverbgeneration_run-1_bold.nii.gz": "",
"sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz": "",
"sub-01_ses-test_task-overtverbgeneration_run-2_bold.json": "",
}
}
}
}
) }}

Example 4: Single metadata file applying to multiple data files (corollary 2)

<!-- This block generates a file tree.
A guide for using macros can be found at
https://github.com/bids-standard/bids-specification/blob/master/macros_doc.md
-->
{{ MACROS___make_filetree_example(
{
"sub-01": {
"anat": {},
"func": {
"sub-01_task-xyz_acq-test1_run-1_bold.nii.gz": "",
"sub-01_task-xyz_acq-test1_run-2_bold.nii.gz": "",
"sub-01_task-xyz_acq-test1_bold.json": "",
}
}
}
) }}
The value for field "`EchoTime`" remains applicable to this image,
and is not unset by its absence in the metadata file at the lower level.
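
The read order in this example (top-level file first, lower-level file last) can be sketched as a sort key. This is a rough illustration only: the function name `load_order_key` is hypothetical, and the tie-break on entity count within a directory follows the ordering proposed in the review discussion of this pull request rather than confirmed specification text.

```python
from pathlib import PurePosixPath


def load_order_key(metadata_path: str) -> tuple[int, int]:
    """Sort key: shallower directories first, then fewer entities first."""
    path = PurePosixPath(metadata_path)
    depth = len(path.parts) - 1                 # 0 for a file at the dataset root
    stem = path.name.split(".", 1)[0]           # filename without extension(s)
    n_entities = stem.count("_")                # chunks before the suffix
    return depth, n_entities


# Metadata files applicable to
# sub-01/func/sub-01_task-rest_acq-longtr_bold.nii.gz, in arbitrary order.
applicable = [
    "sub-01/func/sub-01_task-rest_acq-longtr_bold.json",
    "task-rest_bold.json",
]

ordered = sorted(applicable, key=load_order_key)
```

Loading the files in this order, and letting later values overwrite earlier ones, gives the behaviour described for "`RepetitionTime`" and "`EchoTime`".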

## Participant names and other labels
