Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Inheritance principle: Relaxation allowing multiple files per directory #1003

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

Lestropie
Copy link
Collaborator

@Lestropie Lestropie commented Feb 7, 2022

Closes #259.
Was raised in discussion more recently in #946, the changes contained within were intended to facilitate this new set of changes.
For anyone curious about the underlying motivation for the change, see bids-standard/bids-bep016#32.
Listing as a draft PR in search of feedback or alternative proposals.

This proposal would necessitate changes to the validator. BIDS Apps would also need to be altered to properly load metadata; though I have attempted to have the specification explain the principle in a way that can translate fairly directly to software implementations.


I have a hypothesis regarding the origins of a particular detail of the inheritance principle, being the requirement that only one file may be applicable at any given level of the filesystem hierarchy (listed as rule 4 following merge of #946). When there are multiple JSON files deemed applicable to a given data file, and some of those JSON files may contain different values for the same key, then the order in which you load those files will influence the final contents of the dictionary, since any pre-existing entries will naturally be overwritten during the sequential file loading & merging process. It is therefore necessary for the order in which those files are to be loaded to be unambiguous. Currently, the specification enforces this ordering solely through the filesystem hierarchy structure. This is simple, but prohibitive and in some cases counter-intuitive.

  • The "prohibitive" aspect may have not been of particular consequence thus far, but I have found in my pursuit of BEP016 that it completely precludes my preferred structure for the storage of diffusion MRI models.
    It also theoretically limits the scope of complexity of JSON inheritance, since one cannot have more applicable JSON files than there are sub-directories.

  • For the "counter-intuitive" point, I can demonstrate this using the current example as presented in the specification:

    Impermissible structure:

    sub-01/
        ses-test/
            anat/
                sub-01_ses-test_T1w.nii.gz
            func/
                sub-01_ses-test_task-overtverbgeneration_run-1_bold.nii.gz
                sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz
                sub-01_ses-test_task-overtverbgeneration_bold.json
                sub-01_ses-test_task-overtverbgeneration_run-2_bold.json
    

    Proposed alternative permissible structure:

        sub-01/
            ses-test/
                sub-01_ses-test_task-overtverbgeneration_bold.json
                anat/
                    sub-01_ses-test_T1w.nii.gz
                func/
                    sub-01_ses-test_task-overtverbgeneration_run-1_bold.nii.gz
                    sub-01_ses-test_task-overtverbgeneration_run-2_bold.nii.gz
                    sub-01_ses-test_task-overtverbgeneration_run-2_bold.json
    

    While the alternative is unambiguously clear regarding the order of metadata file loading, the current principle rules have necessitated moving data exclusively applicable to func/ data out of the func/ directory to do so. Therefore I personally subjectively prefer the first example, even though it's currently an invalid BIDS dataset.


This PR proposes removal of the requirement that only one metadata file be applicable at any directory level, but establishes the more general logic that must be satisfied for the order of metadata loading to be unambiguous, which I argue above was the original underlying purpose of the one-file-per-directory rule. The examples are further revised to demonstrate the consequences of the new proposed rule.

Edit: Note that under the proposed principle, the first example above becomes permissible (and indeed IMO should be preferred over the second).

Am open to feedback of any form and magnitude.

Potential points for discussion:

  1. Originally my plan was to enforce a strict ordering of all metadata basenames, where each successive file must strictly contain a superset of the entities of the previous file. However on reflection I think this would itself be unnecessarily prohibitive, specifically because when it comes to the ordering of these files, the filesystem hierarchy can be made "of higher precedence" than the relative ordering of multiple files within a single directory. As a demonstration, see Example 2, where a data file can inherit contents from both /task-rest_bold.json and /sub-01/sub-01_bold.json: when considering file basenames only, the latter is not a superset of the former, but because it resides within a sub-directory of the location of the former, their relative ordering can still be unambiguous.

  2. I have seen examples where, in the BIDS root directory, there may be a file such as task-rest_bold.json, presumably containing content that applies to all resting-state fMRI acquisitions across all subjects (& sessions). But if there are metadata key-values that apply to all fMRI acquisitions in the BIDS dataset, regardless of task, is it permissible to have a file in the root directory simply called bold.json? The number of entities in the filename is zero, but it seems to me to otherwise be adequately formed, and there's a reasonable justification for its use.

Relates to discussion in bids-standard#259, and was also raised in bids-standard#946.
Instead of enforcing a unique order of metadata file inheritance via filesystem hierarchy only, the logic behind the inheritance principle is generalised to permit multiple files per directory whilst still ensuring that the order of metadata file loading is still unique.
@Lestropie
Copy link
Collaborator Author

Lestropie commented Feb 15, 2022

Occurred to me that another potentially useful exemplar to show for inheritance would be two images split across "_part-mag" and "_part-phase" with a single JSON file being applicable to both of them.

Edit: See 233c0b4 for proposal; though had to use "_part-real" and "_part-imag" in order to keep the example simple.

New example serves only to demonstrate how one metadata file can be applicable to multiple data files; this is of a lower "complexity" of logic than what was previously the first example.
Note that "_part-(real|imag)" was used for this example instead of the more common "_part-(mag|phase)" as the latter necessitates the specification of units for only the phase component, which would therefore require more than one metadata file to exploit the inheritance principle.
@Lestropie
Copy link
Collaborator Author

Hi @bids-standard/maintainers,

For ongoing work on BEP016, it would be useful to get some indication of whether or not this proposal is supported in principle. It drastically changes the landscape of possible derivative data structures (bids-standard/bids-bep016#32), which then has further downstream consequences for derivatives specification, and so is somewhat of a block point. Even if the content of the PR requires revision, or topics such as inclusion in major vs. minor version increment or corresponding changes to validator require more discussion, basic feedback on acceptability of the concept would be much appreciated.

Thanks
Rob

@effigies
Copy link
Collaborator

I believe you are right about your determination that the cause of the restriction was an ambiguity in resolution, and that the solution aimed for simplicity. It should be noted that the inheritance principle as a whole was not universally popular and so simple to the point of prohibitive was preferred even to an unambiguous algorithm.

I am personally of two minds on this. On the one hand, I agree aesthetically with your assessment of the counter-intuitive example and think that we could implement your proposed algorithm (or another similar one) without much difficulty; I think it could be done in a backwards-compatible way so that the interpretation of a currently valid dataset would not change.

On the other, I suspect that we're going to find some resistance to the idea of any increase in complexity, and there will be a watershed in all of the packages that implement this algorithm where datasets that follow the new principle will be correctly interpreted in the new versions and incorrectly interpreted in the old.

I don't think that this is a point where the maintainers can/should be the deciding vote. I think we need two things:

  1. Community buy-in. Do people see this as clarifying, and is the clarity worth the complexity? It would be good to ping the mailing list (if you haven't already) to get feedback.
  2. Do tool developers/maintainers agree that the algorithm proposed would be readily implemented in their packages? I know that the BIDS validator and PyBIDS implement the inheritance principle; I haven't checked BIDS-MATLAB (cc @Remi-Gau) and I know that BIDSTools.jl does not implement inheritance at all.

If these criteria are met, then we should coordinate migration for the tools including possibly handling datasets differently based on BIDSVersion.

@Lestropie
Copy link
Collaborator Author

I suspect that we're going to find some resistance to the idea of any increase in complexity

This is a tough one. I agree that the policies of the principle itself are more complex. However I think that there is a certain intuition to the logic here, in terms of what applies to what and in what order; the text is complex by necessity in order to systematize it, but for most users it's the exemplars that convey the principle.

there will be a watershed in all of the packages that implement this algorithm where datasets that follow the new principle will be correctly interpreted in the new versions and incorrectly interpreted in the old.

Ideally, a dataset that exploits the capabilities of the new principle would fail validation if tested based on the old principle, since it explicitly forbids multiple applicable files per directory level. Correct diagnosis of such a failure would however be predicated on the dataset correctly specifying the BIDS version to which it conforms, and the validator checking against the maximal BIDS version for which any relevant software tool to be executed has been written. At the very least, any tool should probably issue a general-purpose warning if the BIDS version of the dataset it is executing against is greater than that for which the tool has been written / tested. But if no such appropriate checking is done, and a tool is nevertheless executed against this new data, it is wholly implementation-dependent as to what will actually happen; that is indeed a concern, and is part of why I suspected that even if accepted it may have to wait for BIDS 2.0. rather than 1.x.0.

Do people see this as clarifying, and is the clarity worth the complexity?

I doubt that many would see the revised principle as "clarifying", and indeed that's not its intention. It's primarily about facilitation of derivative data structures, which in many cases will be inherently more complex than raw data and therefore support for complexity becomes requisite. But, as above, I do think that there's a certain intuitiveness about the proposal, and indeed it can resolve counter-intuitive structures necessitated under the current principle (as per the example in the OP; note I noticed that I forgot to explicitly say that with the proposed revised principle, the first example becomes permissible).

Another factor I'm thinking about as I write this is adding some information to the proposed Example 3 (complex inheritance scenario). Currently the text only explains, for each data file, the applicable metadata files and the order in which they should be read. I could conversely define in that example, for each metadata file, the list of data files to which it applies. This may help not only to clarify the principle in the specification, but also to highlight the power provided by the proposal in defining information applicable to multiple data files only once rather than duplicating. This IMO is the fundamental intent of the inheritance principle, but its utility is handicapped by the constraint of one file per directory level.

It would be good to ping the mailing list (if you haven't already) to get feedback.

Yes, I'll write something for that audience. The consequences of the proposed revision for derivative data structures (bids-standard/bids-bep016#32) is likely too much of a read to engage many people (even maintainers); most there will "just want BIDS to work for derivative data in the future", and won't care how it works. So I'll try to tailor something specific.

Thanks for the thoughts 👍

- Previously it was shown, for each data file, the order in which the set of applicable metadata files would be loaded. This commit introduces the converse case: for each metadata file, the set of data files to which its contents are applicable.
- Further added to this example case a second subject with one series in one session, as this was necessary to provide disambiguation in applicability between files bold.json and sub-01/sub-01_bold.json.
@sappelhoff sappelhoff added opinions wanted Please read and offer your opinion on this matter discussion ongoing discussion labels Mar 25, 2022
@Remi-Gau
Copy link
Collaborator

2. Do tool developers/maintainers agree that the algorithm proposed would be readily implemented in their packages? I know that the BIDS validator and PyBIDS implement the inheritance principle; I haven't checked BIDS-MATLAB (cc @Remi-Gau) and I know that BIDSTools.jl does not implement inheritance at all.

My 2 cents on this (very quickly so there might be some subtleties I have missed):

  • The inheritance principle is partially implemented in bids matlab.

There is an old bug related to it that has been on the to do list for ages and the recent clarification in the wording of the inheritance principle (thanks again for that @Lestropie) gave me some motivation to try to fix it (though not enough motivation to ACTUALLY do it. :-) ).

@Remi-Gau
Copy link
Collaborator

While the alternative is unambiguously clear regarding the order of metadata file loading, the current principle rules have necessitated moving data exclusively applicable to func/ data out of the func/directory to do so.

My knee jeark reaction to this was "so what?" because there are some many BIDS datasets that actually make use of the inheritance principle to store their bold or T1w metadata in the root folder of the dataset anyway, so clearly out of the anat or func folder anyway.

What I would like to know is:

  • is there something that this relaxing of the inheritance principle allows us to do that we could not do with the current form of the principle?

My hunch is "no" except that the current version of the inheritance principle will potentially lead to more JSON files.

  • how human readable is this new version going to be?

I eat / drink / sleep BIDS almost 24/7 and I still get confused by some BIDS valid inheritance. Not sure this will make the life simpler for the casual user.

  • how big is the need for this (current or anticipated)? Are we in the 80% or the 20% of the pareto principle?

@Lestropie
Copy link
Collaborator Author

is there something that this relaxing of the inheritance principle allows us to do that we could not do with the current form of the principle?

Yes. But to appreciate fully, as well as what the prospective alternatives look like if this proposal is not accepted, requires even more reading: bids-standard/bids-bep016#32. I'll make an attempt at summarising / contextualising more briefly here, but I can only compress so much.

For "basic forms" of neuroimaging data, one simply has a bunch of metadata, and a single chunk of data, where those data are stored in some file format appropriate to the nature of what is being stored (e.g for intensities on a 3D Cartesian voxel grid, potentially with multiple such volumes, we use NIfTI). The inheritance principle simply facilitates spreading the metadata across multiple files, e.g. to account for cases where identical metadata contents are applicable to multiple data files, and you don't want to unnecessarily duplicate. But there is some form of independence between data files, which are all self-contained and self-sufficient (in the absence of compulsory metadata key-values), and the inheritance principle, which applies specifically to the partitioning of metadata.

In the space of derivatives, this independence breaks down. I'm working in the space of diffusion MRI models for BEP016, but I think there are concepts that could carry over to many other prospective derivatives also (whether this exceeds 20%... 🤷‍♂️).

In such cases, it is no longer guaranteed that all of the relevant data can be encapsulated into a single data file. It may instead be distributed across many files. And indeed those files could have different dimensionalities in the case of NIfTI, or they could even have different file extensions in hypothetical exotic cases. But there is a logical relationship across such in that those data files all originate from the same model. So it makes sense for there to be a metadata file that provides information relevant to the model: e.g. the data to which it was fit, what parameters were used, a URL describing the model.

But the individual data files encoding the model output may then have their own requisite metadata key-values to facilitate correct interpretation. For instance, from a single diffusion model fit, you could have one file that provides 3D intensity data corresponding to some voxel-aggregate parameter, and another file that provides a parameter for multiple crossing fibre populations within each voxel, and additionally stores values across bootstrap realisations rather than an aggregate of such. Both of these data files have requisite metadata relating to the nature of the data contained within that must be read in order to facilitate their correct interpretation (e.g. What parameter is encoded? Which image axis corresponds to bootstrap realisations? What reference axes are used when defining fibre orientations?), but this information differs between / is applicable to only a subset of the two data files.

To me, in this circumstance, it makes sense to have three JSON files: one for the model itself, and one for each of the two data files, reflecting the hierarchical nature of the contents of the model. But that would violate the current state of the inheritance principle, specifically because it forbids multiple applicable metadata files within one level of the filesystem hierarchy.

While there are multiple candidate solutions to BIDS Derivatives definitions to circumvent this limitation, e.g.:

  1. Utilise the filesystem hierarchy, with e.g. a sub-directory within dwi/ containing the individual data files and corresponding metadata files for the model, and a single metadata file within dwi/ corresponding to the sub-directory;
  2. Same as 1., but using a tarball for the model contents rather than a sub-directory;
  3. Having the specification document hard-code the requisite data files and corresponding interpretations for a finite set of specific models, rather than describing how any such model (and derivatives thereof) should encode the information requisite for their interpretation;

, in my humble subjective opinion they are either less elegant or less future-proof. If a different solution to my current preference were to be pursued, it would be because of this component of the inheritance principle, which, as I explained, I believe to have been unnecessary in the first place.

Not sure this will make the life simpler for the casual user.

This is primarily motivated by getting BIDS Derivatives out of one BIDS App and into another BIDS App. Indeed the novel capabilities afforded by the proposed change would not be exploited for raw data by all but the most savvy users.

The case of magnitude & phase data, where Units may be different between the two images but a lot of other data would be equivalent between them, is maybe the most likely. But even then most users would just rely on e.g. dcm2niix to generate the data, in which case a lot of shared metadata will just be duplicated across the two parts. So the only time the proposed change would become a factor would be if some BIDS conversion software were explicitly changed to exploit it. And even then I'd argue that there is a certain intuition to the interpretation of such data.

@Lestropie
Copy link
Collaborator Author

I also noted I missed one of @effigies's questions:

Do tool developers/maintainers agree that the algorithm proposed would be readily implemented in their packages?

Over and above BIDS validation, a BIDS App / library that supports the inheritance principle should ideally still be checking for the potential presence of multiple applicable metadata files within a single filesystem hierarchy level. The only difference between the principle as it currently stands, and this proposal, is that instead of immediately erroring out at that point, it instead defines the order in which those files should be loaded (and indeed the rule text translates fairly directly to corresponding code implementation). I'm certainly not arguing that it's trivial, but it's not the hardest programming task I've seen. And I think that most Apps would instead rely on one of a small set of API libraries to do this job, in which case their implementations could be made robust and their behaviour could be checked thoroughly (e.g. by instantiating the complex inheritance example from the specification in synthetic data and making sure that the relationships between data and metadata files are satisfied).

@sappelhoff
Copy link
Member

Thanks a lot for describing the problem so clearly!

While there are multiple candidate solutions to BIDS Derivatives definitions to circumvent this limitation

What about

4. Not using inheritance at all

Are you explicitly excluding this option for a reason, or did you just not list it because "it's obvious"?

@Remi-Gau
Copy link
Collaborator

Random thought that crossed my mind.

Could this "mutiple inheritance" be made 'easier' and more explicit by using an "intendedFor" field in some of your derivatives JSON files?

Still need to sit down to go through your answer, so this could be a completely stupid / impractical suggestion.

robertoostenveld added a commit to fieldtrip/fieldtrip that referenced this pull request Mar 30, 2022
…tional changes

triggered by bids-standard/bids-specification#1003 I checked how merging of JSON (and other) structures happens in FieldTrip. I figured that making the function names slightly more consistent would be a good step towards potential future improvements.
@Lestropie
Copy link
Collaborator Author

  1. Not using inheritance at all

I'm not sure that it ever actually occurred to me TBH.

In the case of purely informative but non-requisite metadata, there would be little harm in just duplicating those data across multiple metadata files. However there may be instances where there are two classes of compulsory metadata: those applicable to the model as a whole, and those applicable to individual files representing the model only. In this circumstance, if one were to not use any inheritance, any App reading those data would then be obliged to verify that any data corresponding to the former case is precisely identical across all metadata files. So this approach would only be deferring the "hierarchy of metadata" issue from the specification to the Apps.

@VisLab
Copy link
Member

VisLab commented Apr 1, 2022

  1. Not using inheritance at all

A nightmare of maintainability violates DRY (don't repeat yourself).

@Lestropie
Copy link
Collaborator Author

Another thought that crossed my mind in this respect, given the concerns regarding simplicity / accessibility:

The inheritance principle, following #946 and accentuated further by this proposal, is quite formulaic and prescriptive, whereas much of the surrounding text is moreso descriptive. I've tried to ensure that the text is highly accurate and precise and devoid of ambiguity, but not everyone operates so clinically.

One possibility to consider would be to create an appendix page dedicated to the inheritance principle, where the rules are enumerated and the more complex inheritance example proposed in this PR is provided. The specification text within "Common Principles" could then be simplified and modified in tone, explaining the basic concept along with an example or two, but deferring to the appendix as to precisely what is vs. is not permissible. In particular the forbidden case regarding ambiguity of sequential inheritance order does not need to be read by the vast majority of those interacting with the specification; it's only really necessary to make software implementations of such well-posed.

@sappelhoff
Copy link
Member

create an appendix page dedicated to the inheritance principle

I like your suggestion and I think it addresses the concern of "over-complicating"

It would be good to ping the mailing list (if you haven't already) to get feedback.

Yes, I'll write something for that audience. The consequences of the proposed revision for derivative data structures (bids-standard/bids-bep016#32) is likely too much of a read to engage many people (even maintainers); most there will "just want BIDS to work for derivative data in the future", and won't care how it works. So I'll try to tailor something specific.

Are you still planning to do this? I think getting some more community opinions is important now. In our last maintainers meeting we discussed that this is too big a thing to wave through (in any way) between the small group of people who have looked at this.

@Lestropie
Copy link
Collaborator Author

Are you still planning to do this?

Will do once I get my head back above water 😑

Dedicated appendix for inheritance principle, where the rules, corrollaries and complex example are presented.
For "Common principles", a more reader-friendly description of the principle is presented, along with the basic examples.
Relevant to discussion in bids-standard#1003.
@marcelzwiers
Copy link

marcelzwiers commented May 13, 2022

I vote for option 4: not using inheritance at all!

Although it may be useful when you want to build a BIDS dataset manually, it in my view is otherwise an overly complex principle that serves no meaningful purpose really. It requires continuous and error-prone querying of the folder tree, it is just a nightmare and I propose to make a to tool convert BIDS repositories to have non-inheritance (that would hardy cost disk space)

@robertoostenveld
Copy link
Collaborator

Just pinging @sjeung and @JuliusWelzel: this is the ongoing work on general BIDS principles that I mentioned in relation to BIDS-motion. Please have a look and chime in if you feel like.

@Lestropie
Copy link
Collaborator Author

For anyone who is either invested in this concept or could not understand the justification for the proposed change, please see bids-standard/bids-bep016#50.

Copy link
Collaborator

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it permissible to have a file in the root directory simply called bold.json? The number of entities in the filename is zero, but it seems to me to otherwise be adequately formed, and there's a reasonable justification for its use.

I would say "it should be permissible" . Thanks for catching that!

In my review I have touched upon relaxing ordering requirement in case of multiple files with the same number of entities.

And I think that this PR, with or without aforementioned suggestion, would also at large address the #1182 I have recently filed (Thank you @Lestropie for making Inheritance principle better and keeping an eye on related issues!!!).

Overall I would say that I would Approve such a PR if it was taken out of draft mode and after reviewing examples etc with greater attention to them.

Where more than one metadata file is applicable to a data file,
the contents of the file closest in filesystem position to the data file
(lowest in the directory hierarchy, most entities in the file name)
takes precedence.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already a specific rule, which is incomplete (no description for order at the same level) and somewhat ambiguous -- "takes precedence" as the only file loaded, or loaded last (as what it is currently, but not explained here). Thus I would just remove it from this "IP intro"

Copy link
Collaborator Author

@Lestropie Lestropie Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already a specific rule

Not sure what is intended here: it is an enumerated rule in the appendix, but the text here is intended as a more reader-friendly description.

which is incomplete (no description for order at the same level)

This is tough to get the right balance between being reader-friendly and yet accurate.

I'll try a complete alternative:
"Where more that one metadata file is applicable to the data file, all of those metadata files MUST be loaded, in a specific order dictated by the Inheritance Principle. This ordering influences the contents of sidecar information in the circumstance where the same key is defined in multiple files. That ordering is from the highest to lowest level in the filesystem hierarchy, with ordering from least to most number of entities within each directory. Therefore, if such a common key exists, it is the value in the file that is "closest" to the data file under consideration (lowest in the filesystem hierarchy, greatest number of shared entities) that takes precedence."

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the text here is intended as a more reader-friendly description.

I do understand a motivation here, but IMHO it causes duplication and due to different wording thus potential ambiguity. I do not think any rewording would help as it would keep that duplication in place.

That is why my suggestion is to remove this "reader-friendly" version (starting from "That ordering is from the highest...") from here and just refer to a singular version in the appendix specification. Prior sentence might also want to be adjusted since I do not think BIDS defines "sidecar information" (metadata? ;)) .

Re "reader-friendly": some BIDS starter or blog posts could provide "user-friendly" description for IP if there is need, while leaving "specification" to provide condensed motivation (ideally) a singular clear version of how it works.

Just my 1c.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like the idea of essentially "deferring" the reader to the IP appendix for any concept that is too difficult to convey in friendly text and/or runs the risk of imprecise duplication. I'll give this a try.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See c9c767d.

Comment on lines 34 to 35
These lists are concatenated in order of highest to lowest level in the
filesystem hierarchy.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
These lists are concatenated in order of highest to lowest level in the
filesystem hierarchy.
Levels of the filesystem hierarchy considered from the highest (top directory of the dataset) to lowest (folder with data files for a subject/session) level.
  • list provides a somewhat "implementation dependent" detail, and not even mentioned in prior sentences
  • not sure if we should try to similarly dissolve Set and just say "Applicable metadata files ..."?

Also wanted to provide explicit hint on highest/lowest, since depending how you would look for filesystem tree, e.g. root of a tree in nature is at its lowest level, and you climb up to tree to get higher.

Copy link
Member

@VisLab VisLab Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lowest to highest has always been confusing. I now understand the clarification.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I used "list" to refer to the set of applicable metadata files within a single level of the filesystem hierarchy, and there is then an ordering of those lists, but failed to define such.

I've been constantly battling to find a reader-friendly way to convey the fact that there are two "sorting mechansims": firstly down the filesystem hierarchy, and secondly within individual directories. I was trying to convey the idea that you first do a scan through to find all applicable metadata files, and then perform a sort on them prior to loading their contents, as opposed to just traversing and loading as you go.

Also wanted to provide explicit hint on highest/lowest, since depending how you would look for filesystem tree, e.g. root of a tree in nature is at its lowest level, and you climb up to tree to get higher.

  • I think I've had this discussion once before w.r.t. BIDS: The convention around here was that root = highest and one traverses down, and with my first contribution I had it the other way around.
  • I don't know the extent to which this is a globally accepted standard; it doesn't appear in "common principles", and maybe it should.
  • I'm slightly hesitant to briefly provide exemplars of highest / lowest levels given that there is some chance of the depth of the filesystem tree in BIDS increasing; see Decisions on BIDS derivatives structure bids-bep016#50.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been constantly battling to find a reader-friendly way to convey the fact that there are two "sorting mechansims": firstly down the filesystem hierarchy, and secondly within individual directories.

Just an observation which might (or not) help out: are there really multiple mechanisms or is it just manifestation of one? Due to the fact that sub- and ses- are also entities, and they are stripped off the filename as you go higher (closer to root) in the hierarchy -- sorting by number of entities generalizes across filesystem hierarchy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maaaybe. As per:

I'm slightly hesitant to briefly provide exemplars of highest / lowest levels given that there is some chance of the depth of the filesystem tree in BIDS increasing

, it would depend on whether or not such an expansion were to occur and whether all entities present at higher levels would be enforced to propagate down to all files at lower levels. It's possible that file names could start getting frustratingly long in such circumstances. So linking bids-standard/bids-bep016#50 again.

I suppose my thinking was that:

  • If existing users are familiar with the directory-based inheritance, and that is a much simpler process to understand, then it should be presented as a two-stage process, such that only "power users" wishing to perform inheritance within a directory need concern themselves with this augmentation.
  • When it comes to software implementation, one is likely to perform the metadata inheritance as currently described: order within the highest filesystem level, then step down the filesystem, ordering within each level and then concatenating. It would not make sense to put all applicable files across all directories into a single list and then sort them, even if it would produce the same result.

I'll have a go at just improving the description, and we can re-assess.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See what you think of 290de6e.

src/99-appendices/15-inheritance-principle.md Outdated Show resolved Hide resolved
@yarikoptic
Copy link
Collaborator

I vote for option 4: not using inheritance at all!

I would invite everyone interested in discussing this option to do it in bids-standard/bids-2-devel#36 . BIDS 1.x has inheritance principle and thus IMHO it must stay and only get improved within 1.x series of BIDS.

@sappelhoff
Copy link
Member

BIDS 1.x has inheritance principle and thus IMHO it must stay and only get improved within 1.x series of BIDS.

just a note: option 4 above was not about getting rid of inheritance, but only to not use inheritance where it's currently (under present rules) not feasible and/or pretty. And it was more a point that we don't always need inheritance (we never NEED it, actually. It's just convenient sometimes)

@yarikoptic
Copy link
Collaborator

yarikoptic commented Aug 9, 2022

Thank you @sappelhoff for the clarification -- I see the rationale for 4. better now. But I would like to "defend" convenience just a little bit: there is a reason why I listed it among "Principles" of http://centerforopenneuroscience.org/. "Convenience" is IMHO often under-appreciated, and I will just quote the summary I gave there:

[Convenience] Never was a buzz word (yet), but it is one of the critical drivers summarizing reliability, accessibility etc. Only when software and data are conveniently available we can talk about wide-scale adoption and reproducibility.

We work to make software and data available to the community, so that everyone could harness available resources regardless of their original complexity or size.

and I think "convenience" (as far as human agents involved in working with) is one of the driving forces behind BIDS, and often "convenience" crosses the boundary of "makes it pragmatic". IMHO "Inheritance principle" is one of those, in particular with all the .json's for _scans etc files which require presence in identical form across multiple sub-folders.

any file later in the ordering does not imply the "unsetting" of that key-value.

1. Removal of key-values present in files earlier in the ordering based on the content
of files later in the ordering is not possible.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you have any changes for this PR planned @Lestropie or may be it should be taken out of Draft mode and gain a chance to be Reviewed more closely and merged? ;-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have made some proposed changes based on comment threads above.

I'm slightly unsure about pulling this out of draft given #1195. My suspicion is that maintainers would prefer to be making a decision between this PR and #1195 up-front, rather than merging this PR and then contemplating even further revisions to the IP; especially if the validator needs to be compatible with both for different BIDS versions. At some point I'd like to make a draft PR for #1195 using the code of this PR as the base, which would highlight the specific change being proposed in #1195.

I'm also very interested to hear from anyone responsible for the validator or popular BIDS APIs, since the proposed changes are highly consequential there.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, I will then let you decide first which way to proceed (undraft this PR, or create one with #1195), since in principle I then should not even comment on "Draft" right, and there is only fractional value return from discussing something which has no potential on being merged ATM ;)

FWIW -- I consider this PR already a substantial improvement and #1195 resolution is not necessarily needed to proceed forward with it.

And FWIW - awaiting for feedback from other developers, in particular for a PR in Draft mode, might take another year.

Based on conversation in bids-standard#1003 (specifically comment bids-standard#1003 (comment)).
Details about the hierarchical nature of metadata file loading is abstracted away, instead deferring more technical details about the Principle to the corresponding Appendix.
Text describing the ordering of multiple applicable metadata files has been better systematized in an attempt to better explain the hierarchical nature of said ordering.
Relates to discussion in bids-standard#1003, specifically comment bids-standard#1003 (comment).
it within this list.

1. These lists are concatenated in order of highest to lowest level in the
filesystem hierarchy.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that 'highest' vs 'lowest' is ambiguous, please consider suggestion I gave before: #1003 (comment)

@Lestropie
Copy link
Collaborator Author

For anyone attempting to keep up with this discussion, the enhancement proposed by @yarikoptic has been attempted in Lestropie#5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion ongoing discussion inheritance opinions wanted Please read and offer your opinion on this matter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relaxation of inheritance principle?
9 participants