Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lists vs Dictionaries when using List[Model] #609

Open
bruno-f-cruz opened this issue Nov 21, 2023 · 10 comments
Open

Lists vs Dictionaries when using List[Model] #609

bruno-f-cruz opened this issue Nov 21, 2023 · 10 comments
Assignees

Comments

@bruno-f-cruz
Copy link
Collaborator

A common pattern in the current schema is to use a List of generic objects to keep track of such objects. For instance, cameras can be expressed as follows:

https://github.com/AllenNeuralDynamics/aind-data-schema/blob/c8aca0341d86efc82aa32ba0255b1f66287a4a3c/src/aind_data_schema/rig.py#L64C1

While easy to use, Lists make grabbing a specific device somewhat difficult without parsing all objects since they must be indexed by the order (which can easily change across days) or the experimenter must generate somesort of keyed collection type with an unique key (e.g. device name).

I wonder if rather than using generic unsorted lists, it would be beneficial to simply use some sort of hashed dictionary structure in the form of Dict[str, CameraAssembly] instead of List[CameraAssembly]. This would make it much easier to access the objects after deserialization.

@bruno-f-cruz
Copy link
Collaborator Author

Something along these lines:

pydantic/pydantic#726 (comment)

@bruno-f-cruz
Copy link
Collaborator Author

bruno-f-cruz commented Dec 4, 2023

Relatedly, any specific reason for the name field being optional?

https://github.com/AllenNeuralDynamics/aind-data-schema/blob/e92f039d1390b13e6a95f0b793d8fdc0a131a77e/src/aind_data_schema/device.py#L93C1-L94C1

This creates a situation where even if I wanted to create the dictionary post-hoc, using the array of objects, not only do I have to deal with potential duplicated keys (which it would still be important to solve), but I also have to deal with objects without a name field.

I guess this my be related to #620

@saskiad
Copy link
Collaborator

saskiad commented Dec 4, 2023

yep, we already have a ticket to fix this.

@bruno-f-cruz
Copy link
Collaborator Author

If you have a chance, can you also comment on the original issue? And why Lists > Dict at this point? Just want to understand if it is something we want to think about, or simply drop it. Thanks!

@saskiad
Copy link
Collaborator

saskiad commented Dec 4, 2023

we'll discuss it when we've finished the v2 update

@bruno-f-cruz
Copy link
Collaborator Author

Can we revisit this discussion now that v2 has been merged? I am at a point where having this implemented would save us a lot of headaches down the road when instantiating hardware from the schema. Thanks!

@bruno-f-cruz
Copy link
Collaborator Author

I made a quick test of what this could look like here:

#704

I used additional_devices a test case. The proposed solution is backward compatible I think

@dyf
Copy link
Member

dyf commented Jan 18, 2024

I'm uncomfortable with making the name implicit as a key, and then having to replicate it inside the field itself, but only sometimes. Is the right solution just to make name required? @bruno-f-cruz @saskiad

@bruno-f-cruz
Copy link
Collaborator Author

bruno-f-cruz commented Jan 18, 2024

To clarify, I don't think the name must be the same as the key. However, for backward compatibility, I decided to use the name as a key when constructing the dictionary so people could migrate from List->Dict painlessly. If people define it as a Key:Value pair directly, I don't see why name must be the key.
I would be happy to consider other default conversations from List->Dict

@bruno-f-cruz
Copy link
Collaborator Author

After a bit of playing around with the schema, I really think that moving to dictionaries is going to make a lot of the schema grammar much simpler.

For instance, currently in order to pass a DaqChannel to the Wheel class, one must know the index of that channel in the device. This creates a huge problem as the order of devices in the array is not stable and can easily change if a channel is deleted. Moreover, it feels a bit clunky to have to define the device name in each channel since, if a channel belongs to a device, the name of the device should be a given.

import os
import aind_data_schema.models.devices as d
from aind_data_schema.models.manufacturers import Manufacturer



wheel_encoder_channel = d.DAQChannel(
    device_name='BehaviorBoard',
    channel_name='DI0', #This should be an array
    channel_type=d.DaqChannelType.DI,
    event_based_sampling=True,
)

wheel_break_channel = d.DAQChannel(
    device_name='BehaviorBoard',
    channel_name='DO0',
    channel_type=d.DaqChannelType.DO,
    event_based_sampling=True,
)

wheel_torque_channel = d.DAQChannel(
    device_name='BehaviorBoard',
    channel_name='AI0',
    channel_type=d.DaqChannelType.AI,
    event_based_sampling=True,
)


harp_board = d.HarpDevice(
    name='BehaviorBoard',
    serial_number=None,
    # manufacturer=Manufacturer.CHAMPALIMAUD, # this crashes
    data_interface=d.DataInterface.USB,
    computer_name=os.getenv('COMPUTERNAME'),
    channels=[wheel_encoder_channel, wheel_break_channel, wheel_torque_channel],
    harp_device_type=d.HarpDeviceType.BEHAVIOR,
    is_clock_generator=False,
    port_index='COM3')


wheel = d.Wheel(
    name='ThisWheel',
    encoder=harp_board,
    encoder_output=harp_board.channels[0],
    magnetic_brake=harp_board,
    brake_output=harp_board.channels[1],
    torque_sensor=harp_board,
    torque_output=harp_board.channels[2],
    radius=0,
    width=0,
    pulse_per_revolution=0,
)

My suggestion for a new pattern includes:

  1. Make the List of channels a Dict. This circumvents two problems. a) Ensures that all entries are unique; b) Allows stable indexing by key
  2. Adding channels to the Dict is done through helper methods (add and remove). Ideally, the setter for the property would also be disabled. This ensures that users cannot simply modify the property without validation. Read more about this pattern here.
  3. Delete the reference to the device from peripherals (such as Wheel, since the DAQ channel already has this valid reference.

The aforementioned pattern would thus become:

import os
from typing import Dict, Literal, Union, List

import aind_data_schema.models.devices as d
from pydantic import Field
from aind_data_schema.models.manufacturers import Manufacturer
from aind_data_schema.models.units import SizeUnit


class NewHarpDevice(d.HarpDevice):
    channels: Dict[str, d.DAQChannel] = Field(default_factory=dict)

    def add_channel(self, channel: Union[d.DAQChannel, List[d.DAQChannel]]):
        if isinstance(channel, list):
            for c in channel:
                self.add_channel(c)
        else:
            channel = self._validate_daq_channel(channel)
            self.channels[channel.channel_name] = channel

    def remove_channel(self, channel_name: str):
        if channel_name in self.channels:
            del self.channels[channel_name]
        else:
            raise ValueError(f"Channel {channel_name} does not exist")

    def _validate_daq_channel(self, channel: d.DAQChannel) -> d.DAQChannel:
        if not channel.device_name:
            channel.device_name = self.name
        if channel.device_name != self.name:
            raise ValueError(f"Device name {channel.device_name} does not match {self.name}")
        if channel.channel_name in self.channels:
            raise ValueError(f"Channel {channel.channel_name} already exists")
        return channel


class NewWheel(d.MousePlatform):
    device_type: Literal["Wheel"] = "Wheel"
    radius: float = Field(..., title="Radius (mm)")
    width: float = Field(..., title="Width (mm)")
    size_unit: SizeUnit = Field(SizeUnit.MM, title="Size unit")
    encoder_output: d.DAQChannel = Field(None, title="Encoder DAQ channel")
    pulse_per_revolution: int = Field(..., title="Pulse per revolution")
    brake_output: d.DAQChannel = Field(None, title="Brake DAQ channel")
    torque_output: d.DAQChannel = Field(None, title="Torque DAQ channel")


wheel_encoder_channel = d.DAQChannel(
    device_name='',
    channel_name='DI0', #This should be an array
    channel_type=d.DaqChannelType.DI,
    event_based_sampling=True,
)

wheel_break_channel = d.DAQChannel(
    device_name='',
    channel_name='DO0',
    channel_type=d.DaqChannelType.DO,
    event_based_sampling=True,
)

wheel_torque_channel = d.DAQChannel(
    device_name='',
    channel_name='AI0',
    channel_type=d.DaqChannelType.AI,
    event_based_sampling=True,
)


harp_board = NewHarpDevice(
    name='BehaviorBoard',
    serial_number=None,
    # manufacturer=Manufacturer.CHAMPALIMAUD, # this crashes
    data_interface=d.DataInterface.USB,
    computer_name=os.getenv('COMPUTERNAME'),
    harp_device_type=d.HarpDeviceType.BEHAVIOR,
    is_clock_generator=False,
    port_index='COM3')

harp_board.add_channel(wheel_encoder_channel) # Notice that once this method runs, it modifies the original object with the name of the DaqDevice.
harp_board.add_channel(wheel_break_channel)
harp_board.add_channel(wheel_torque_channel)


new_wheel = NewWheel(
    name='ThisWheel',
    radius=0,
    width=0,
    pulse_per_revolution=0,
    encoder_output=wheel_encoder_channel, # just to show that the object is modified and can be used as a reference
    brake_output=harp_board.channels['DO0'],
    torque_output=harp_board.channels['AI0'],
)

Finally, I would strongly advise to adopt a name for all these objects. It is a bit weird that some have name, while others have an identical property but called channel_name or assembly_name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants