Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better API for specifying return data of signals #376

Open
cx1111 opened this issue May 23, 2022 · 11 comments
Open

Better API for specifying return data of signals #376

cx1111 opened this issue May 23, 2022 · 11 comments

Comments

@cx1111
Copy link
Member

cx1111 commented May 23, 2022

The current rdrecord API is rather confusing when dealing with more complex records/signals. Namely: multi-frequency signals.

We should create a better API to allow users to:

  • Specify/know the exact return types/dimensions of the signals being read
  • Deal with single and multi-frequency records consistently.

In addition, the current API is also limiting when reading single-frequency records, as there is no way for users to explicitly specify whether they want 1d or 2d numpy arrays.

We current have:

def rdrecord(
where the only parameter that influences the return type is: smooth_frames

We've chatted about the issue here, where I suggested a new set of params:
#313 (comment)

What do people think of the suggestion of having two params: return_dims and smooth_frames? return_dims will apply for both single and multi-frequency signals, and smooth_frames will apply only to multi-frequency signals. Given the combination of the params, the return value of the signals will be:

(1, True): List of 1d arrays. Guaranteed to be same length.
(1, False): List of 1d arrays. Not guaranteed to be same length for multi-frequency signals.
(2, True): Single 2d array.
(2, False): Illegal combination.

@cx1111
Copy link
Member Author

cx1111 commented May 23, 2022

@tompollard @bemoody @alistairewj thoughts?

@bemoody
Copy link
Collaborator

bemoody commented May 24, 2022 via email

@cx1111
Copy link
Member Author

cx1111 commented May 24, 2022

Thanks. Some more thoughts:

  • Many users will want the 2d array to pass into convolutional neural networks or general multi-channel filter functions.
  • For multi-frequency records, the only way to avoid losing information is to return an array of 1d arrays.

I think it's worth it for our library to support both dimensions, despite the additional complexity. Regarding efficiency/performance, would either form perform a lot better?

I imagine for the ease of writing the functions, but at a large performance cost, we do have the option of always processing the signals as a list of 1d arrays, and then only reshaping into 2d before returning the value.

The function argument values can be required, and have defaults, so that we don't have to deal with the unspecified cases.

@bemoody
Copy link
Collaborator

bemoody commented May 24, 2022 via email

@cx1111
Copy link
Member Author

cx1111 commented May 25, 2022

Let's not worry about backwards compatibility here. The current API behavior is not ideal, and does not allow the user to get a predictable output.

I didn't think about the high-res option. I'm not sure how useful it would be, but we could provide it. We would then have combinations of return_dims and smooth_frames (or better name):

  • (1, None): List of 1d arrays. Not guaranteed to be same length for multi-frequency signals.
  • (1, 'average'): List of 1d arrays. Guaranteed to be same length.
  • (1, 'upsample'): List of 1d arrays. Guaranteed to be the same length. (For Record objects, what happens to fs?)
  • (2, None): Illegal combination.
  • (2, 'average'): Single 2d array.
  • (2, 'upsample'): Single 2d array.

We could also add an INFO logging statement that occurs when reading a multi-frequency record, so that it's more immediately obvious.

I realize that all of this only considers the final output array(s) (such as when calling the current rdsamp), not the intermediate steps and the object attributes used to represent the signals. Let me think about that.

@bemoody
Copy link
Collaborator

bemoody commented May 25, 2022 via email

@bemoody
Copy link
Collaborator

bemoody commented May 27, 2022

An idea I mentioned in the meeting today was to allow the p_signal (and e_p_signal, etc.) attributes to be lazily loaded, perhaps if you call rdrecord with stream=True or something like that.

A very rough sketch might look like this:

def rdrecord(record_name, sampfrom=0, sampto=None, stream=False):
    record = rdheader(record_name)
    if sampto is None:
        sampto = record.sig_len
    if stream:
        record.p_signal = Lazy2DSignalArray(record, sampfrom, sampto)
    else:
        record.p_signal = _read_2d_signals(record, sampfrom, sampto)
    return record

class Lazy2DSignalArray:
    def __init__(self, record, sampfrom, sampto):
        self._record = record
        self._base = sampfrom
        self._len = sampto - sampfrom

    def __len__(self):
        return self._len

    def __getitem__(self, key):
        if isinstance(key, slice):
            (start, stop, step) = key.indices(self._len)
            data = _read_2d_signals(self._record,
                                    start + self._base,
                                    stop + self._base)
            return data[::step]
        elif isinstance(key, tuple):
            return self[key[0]][key[1:]]
        elif hasattr(key, '__index__'):
            return self[key:key+1][0]
        else:
            raise TypeError

@tompollard
Copy link
Member

Just catching up on this!

Let's not worry about backwards compatibility here. The current API behavior is not ideal, and does not allow the user to get a predictable output.

What about deprecating rdrecord and introducing wfdb.read_record() or maybe just wfdb.read()?

What do people think of the suggestion of having two params: return_dims and smooth_frames? return_dims will apply for both single and multi-frequency signals, and smooth_frames will apply only to multi-frequency signals.

Is return_dims necessary at all? The fact that it leads to situations like (2, False): Illegal combination. isn't nice. Could we treat everything as 2d and maybe have some kind of binary flatten flag if necessary?

@cx1111
Copy link
Member Author

cx1111 commented Jun 7, 2022

What about deprecating rdrecord and introducing wfdb.read_record() or maybe just wfdb.read()?

That's the long term plan. We want a major version where old and new APIs are available, and then another one that removes the old ones.

Is return_dims necessary at all? The fact that it leads to situations like (2, False): Illegal combination. isn't nice. Could we treat everything as 2d and maybe have some kind of binary flatten flag if necessary?

flatten essentially does the same thing as return_dims and the illegal combination issue would still hold. I'd argue that it's a worse keyword to use, since flattening an array may mean different things, thus making the keyword less explicit.

@cx1111
Copy link
Member Author

cx1111 commented Jun 7, 2022

Hot off the press: #388

See the top level functions at the bottom of the module demonstrating a non-streaming and streaming (needs a lot of work) API. My biggest takeaway while playing around with this: Why does any class/object need to have the array as one of its attribute? Does it benefit the library, or the user? If we just separate the full data array from the class/object, we can remove the p_signal/d_signal/e_p_signal naming garbage.

@cx1111
Copy link
Member Author

cx1111 commented Jun 23, 2022

^^ Thinking about it more, separating the signal data from the object may bring more confusion than benefits. Imagine reading a JPEG file into an Image object, and not having the bytes as an object field. That would be very strange.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants