Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] unintuitive behavior with split files #1064

Open
eort opened this issue Sep 2, 2022 · 6 comments
Open

[BUG] unintuitive behavior with split files #1064

eort opened this issue Sep 2, 2022 · 6 comments
Labels

Comments

@eort
Copy link
Contributor

eort commented Sep 2, 2022

Description of the problem

Hi,

I think there are some problems with the way mne-bids handles split files, potentially in combination with symlinks. This report is based on this discussion on the forum.

Suppose you have split recordings and try to read them. There are following scenarios:

  1. You're filepath is a symlink to the actual file. You specify the split field (as you should), everything seems to work fine.
  2. You're filepath is a symlink to the actual file. You don't specify the split field (perhaps only a few of your recordings are large enough to be split), mne-bids, doesn't complain and initializes the read process. However, further down the road, mne-python throws an error, because it can't find the second part of the split file.
  3. You're filepath directly lead to the actual file. You specify the split field (as you should), everything seems to work fine.
  4. You're filepath directly lead to the actual file. You do not specify the split field, everything seems to work fine as well.

As you see there are two things going on here:

  1. mne-bids/mne-python finds a split file even if no split recording was specified.
  2. it fails to do that, if the file is a symlink.

The reason for the failure in 2. is that mne-bids resolves the symlink already, such that when mne-python later checks the next_fname field, it gets confused because the specified next_fname is base on the symlink and not that underlying file.

Apart from that, I have doubts whether it is desirable to silently read a file that wasn't requested. Perhaps that should be prevented in first place?

In any case, I think the current situation is a bit confusing. It shouldn't matter on whether a filename points to a file or a symlink whether the split field is required or not. I see two options:

  1. Don't resolve symlinks in mne-bids (mne-python does that anyway), making setting the split field fully optional
  2. Throw an error/warning in mne-bids whenever the split field "is forgotten"

As mentioned on the forum, it seems removing the .resolve() is sufficient for (1)

target_path = raw_path.resolve()

Steps to reproduce

# I tested it out private files. 
# If you want I can try to generate a complete MWE with the sample datasetset (next week)

# split specified, path to symlink
path = mb.BIDSPath(..., split=split)
raw = mb.read_raw_bids(path) # works

# split not specified, path to symlink
path2 = mb.BIDSPath(...)
raw2 = mb.read_raw_bids(path2) # fails

# split specified, path to actual file
path3 = mb.BIDSPath(..., split=split)
raw3 = mb.read_raw_bids(path) # works

# split not specified, path to actual file
path4 = mb.BIDSPath(...)
raw4 = mb.read_raw_bids(path2) # works

Expected results

unclear

Actual results

Actual files are read regardless of whether split is specified. Symlinks cannot be read if split is not specified.

Additional information

Running mne-bids 0.11.dev0 and mne-python 1.2.dev0

@eort eort added the bug label Sep 2, 2022
@agramfort
Copy link
Member

Yes please share a sample dataset we can use to test and better understand the issue

@eort eort changed the title [BUG] intuitive behavior with split files [BUG] unintuitive behavior with split files Sep 3, 2022
@eort
Copy link
Contributor Author

eort commented Sep 3, 2022

Here you go. Most of it has the purpose to generate the file structure necessary to reproduce it. I stumbled over this problem, because I use datalad which uses git-annex to store data. Therefore the original files have some hash-based file path, with a human readable symlink in the appropriate bids directory that points to the originals. I think the randomness of the hash path causes the problem, as finding the next split is not trivial.

So if you have datalad already installed, you can skip most of the code by simply creating a dataset from mne-sample-bids dataset and then to read the data in the different scenarios.

Let me know if anything is unclear

import mne
import mne_bids as mb
import os
import os.path as op
from mne import datasets


# first we need to create all sets of files
# one pair of split files that are real files
# and another  pair of split files that are symlinks

# load the sample data
out_path = op.join('temp', 'sub-01', 'ses-01', 'meg')

os.makedirs(out_path, exist_ok=True)
# we need to more directories where we will hide the original symlinks in later
os.makedirs('temp/foo1', exist_ok=True)
os.makedirs('temp/foo2', exist_ok=True)

raw_dir = op.join(datasets.sample.data_path(), 'MEG', 'sample')
raw = mne.io.read_raw(op.join(raw_dir, 'sample_audvis_raw.fif'))

# define the paths for the files to be written
out_file = op.join(out_path, 'sub-01_ses-01_task-real_run-01_meg.fif')
out_symlink = op.abspath(op.join(out_path, 
                                 'sub-01_ses-01_task-sym_run-01_meg.fif'))

# crop the raw and save them
raw.crop(0, 10)
raw.save(out_file, split_size="10MB", split_naming='bids')
raw.save(out_symlink, split_size="10MB", split_naming='bids')

# adjust paths to match the splits
split1_link_src = out_symlink.replace('_meg', '_split-01_meg')
split2_link_src = out_symlink.replace('_meg', '_split-02_meg')
split1_link_tgt = split1_link_src.replace('sym', 'link')
split2_link_tgt = split2_link_src.replace('sym', 'link')

# move the symlink files somewhere non-bids conforming
# I think it is important that the splits are in different directories
split1_link_src_new = split1_link_src.replace('sub-01/ses-01/meg', 'foo1')
split2_link_src_new = split2_link_src.replace('sub-01/ses-01/meg', 'foo2')
split1_link_tgt = split1_link_tgt.replace('link', 'sym')
split2_link_tgt = split2_link_tgt.replace('link', 'sym')

# mv the original files out of the meg directory
os.system(f'mv {split1_link_src} {split1_link_src_new}')
os.system(f'mv {split2_link_src} {split2_link_src_new}')

# generate symlinks
os.symlink(split1_link_src_new, split1_link_tgt)
os.symlink(split2_link_src_new, split2_link_tgt)

# the actual exercise: trying to read data
sub = '01'
ses = '01'
datatype ='meg'
run = '01'
root = 'temp'
split='01'

# path to symlink
task = 'sym'
# split not specified
path = mb.BIDSPath(root=root, subject=sub, session=ses, task=task,
                    datatype=datatype, run=run)
raw = mb.read_raw_bids(path) # fails

# split specified
path2 = mb.BIDSPath(root=root, subject=sub, session=ses, task=task,
                    datatype=datatype, run=run, split=split)
raw2 = mb.read_raw_bids(path2) # works

# path to real files
task= 'real'
# split not specified
path3 = mb.BIDSPath(root=root, subject=sub, session=ses, task=task,
                    datatype=datatype, run=run)
raw3 = mb.read_raw_bids(path3) # fails

# split specified
path4 = mb.BIDSPath(root=root, subject=sub, session=ses, task=task,
                    datatype=datatype, run=run, split=split)
raw4 = mb.read_raw_bids(path4) # fails

os.system("rm -r temp")

@agramfort
Copy link
Member

agramfort commented Sep 4, 2022 via email

@eort
Copy link
Contributor Author

eort commented Sep 5, 2022

Now I assume this was added cause users
wanted to avoid writing new files
hence keep the original pointers to the old naming in the storage server.

In this scenario, what would happen if the split field is specified in the path for reading? It seems that specifying it solves the issues with symlinks that I have encountered. If they also work for the situation you describe, one solution could be to require the split field when reading split files.

@agramfort
Copy link
Member

agramfort commented Sep 5, 2022 via email

@drammock
Copy link
Member

drammock commented Sep 5, 2022

similar issues have come up before, see mne-tools/mne-python#9221 and the fix (though apparently not enough of a fix?) in mne-tools/mne-python#9227.

cc @adswa in case she has some ideas here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants