Proposing change in design : uses `class` instead of dictionaries #184

khsrali · 2024-07-17T15:10:42Z

After trying out the first syllabus of documentation (two issues #176 #179), I would like to propose a more intuitive interface:

The proposal:

wg = WorkGraph()
task_1 = wg.add_task(add, x= 1, y= 2,  label= "Note: I don't want to access variables with wg.tasks[<label>]!")
task_2 = wg.add_task(multiply, y=-1, x=task_1.future)
wg.run() # see the bottom of this post for multiple runs

As it is now:

# Create a workgraph to link the tasks.
wg = WorkGraph("test_add_multiply")
wg.add_task(add, name="add1")
wg.add_task(multiply, name="multiply1")
wg.add_link(wg.tasks["add1"].outputs["result"], wg.tasks["multiply1"].inputs["b"])
wg.run(inputs = {"add1": {"x": 1, "y": 2}, "multiply1": {"y": -1}}, wait=True)

Proof of concept: Can be achieved with the following design

import inspect

class Future_pointer:
    def __init__(self, result):
        self.result = result

class Task:

    def __init__(self, func, label=None, depends_on=None, **kwargs):
        self.func = func

        for key, value in kwargs.items():
            setattr(self, key, value)

        params = inspect.signature(func).parameters
        for name in params:
            if not hasattr(self, name):
                setattr(self, name, None)
        
        self.future = Future_pointer(None)
        self._result = None
        self.dependencies = []
        if depends_on is not None:
            self.add_dependency(depends_on)


    @property
    def result(self):
        return self.future.result
    

    def calculate(self):
        for dep in self.dependencies:
            dep.calculate()
        args = {}
        for name in inspect.signature(self.func).parameters:
            if isinstance(getattr(self, name), Future_pointer):
                value = getattr(self, name).result
            else:
                value = getattr(self, name)
            if value is None:
                self.future.result = None
                return
            args[name] = value 

        self.future.result = self.func(**args)
    
    def add_dependency(self, source):
        self.dependencies.append(source)

class WorkGraph:

    def __init__(self):
        self.tasks = []
        self.links = [] # one can keep track of links from input values... a bit more coding but straight forward

    def add_task(self, func, label=None, depends_on=None, **kwargs):
        if not depends_on:
            depends_on = None if not self.tasks else self.tasks[-1]
        
        instance = Task(func, label, depends_on=depends_on, **kwargs)
        self.tasks.append(instance)
        
        return instance

    def run(self, **kwargs):
        for task in self.tasks:
            task.calculate()

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

wg = WorkGraph()
task_1 = wg.add_task(add, x=1, y=2, label="My arbitrary label which could be very long, and I don't want to access with wg.tasks[<name>]!")
task_2 = wg.add_task(multiply, y=-1, x=task_1.future)

# task_2.calculate()
wg.run()
print(task_2.result)  # should print -3 
print(task_1.result)  # should print 3 

task_1.x = 2
wg.run()
print(task_2.result)  # should print -4

I understand this might be considered a pain for @superstar54 but to me this is intuitive and lubricates life of a user.

I would like to know other's opinion as well, @giovannipizzi

giovannipizzi · 2024-07-17T15:55:17Z

I also suggested to have a tab-completable way to specify links without dicts and string keys, so I'd support the idea, even if I'm not sure of the exact suggestion with the word "future", for instance.

Also, I didn't check the implementation in detail, but I'm really confused by the class containing classes, and then you access a class attribute of the inner class... Why can't you simply use a method of the outer class and store info in an instance attribute, with less python magic? (maybe easier to discuss live, I just wanted to write this comment)

giovannipizzi · 2024-07-17T15:56:16Z

Also, I'd try to make the two examples at the top try to do the same thing, now they are a bit different I think

khsrali · 2024-07-17T21:04:13Z

That is because I started coding from inside out 😄 hahah
Please check the update, now it's fine. Both examples do the same thing.
add_link is done automatically in code, also can be given directly as depends_on to Task. But this is not the point/problem, it can be easily re-adjusted to the former behavior.

superstar54 · 2024-07-18T02:54:13Z

Hi @khsrali , thanks for sharing your idea here. I try to explain what WorkGraph implementations, and their use cases.

About using Class

WorkGraph uses Class, and it allows user to access the task or its outputs in three ways:

dictionary-like: wg.tasks["add1"].outputs["x"]
list-like: wg.tasks["add1"].outputs[0], wg.tasks[0].outputs[0]
attribute-like: wg.tasks.["add1"].outputs.x, or even wg.tasks.add1.outputs.x. Note: Tab-completable is not supported yet, but is planned.

Therefore, the code you propose to run should already work using WorkGraph.

wg = WorkGraph()
add_task = wg.add_task(add, name="add1", x=1, y=2)
wg.add_task(multiply, y=3, x=add_task.outputs.result)
wg.run()

In most cases, attribute-like and list-like can be very useful and straightforward. However, dictionary-like is the most robust and can be used in all cases. For example, user can program their work graph in an automatic fashion. When creating many tasks inside a for loop, the user can give a meaningful full name to the tasks, e.g., f"task_{i}", so that they can access it and create complex dependencies.

"Note: I don't want to access variables with wg.tasks[]!")

Users are not required to give a name for a task. However, they are still suggested to give the task a meaningful name so that later, when loading a finished WorkGraph, they can quickly access the tasks by their names.

About the creation and execution

In your proof of concept, you also introduce the way to execute the WorkGraph, like the future concept. I would like to say that the execution of the WorkGraph is rather complex because it requires asynchronous execution, checkpoints, data provenance of the workflow logic, etc.

In aiida-workgraph, there are separate classes: WorkGraph and WorkGraphEngine handle the creation and execution. You can check the WorkGraphEngine class if you are interested.

I am open to further discussion if you'd like to delve deeper into the execution engine.

khsrali added the enhancement New feature or request label Jul 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposing change in design : uses `class` instead of dictionaries #184

Proposing change in design : uses `class` instead of dictionaries #184

khsrali commented Jul 17, 2024 •

edited

Loading

giovannipizzi commented Jul 17, 2024

giovannipizzi commented Jul 17, 2024

khsrali commented Jul 17, 2024

superstar54 commented Jul 18, 2024 •

edited

Loading

Proposing change in design : uses class instead of dictionaries #184

Proposing change in design : uses class instead of dictionaries #184

Comments

khsrali commented Jul 17, 2024 • edited Loading

Proof of concept: Can be achieved with the following design

giovannipizzi commented Jul 17, 2024

giovannipizzi commented Jul 17, 2024

khsrali commented Jul 17, 2024

superstar54 commented Jul 18, 2024 • edited Loading

About using Class

About the creation and execution

Proposing change in design : uses `class` instead of dictionaries #184

Proposing change in design : uses `class` instead of dictionaries #184

khsrali commented Jul 17, 2024 •

edited

Loading

superstar54 commented Jul 18, 2024 •

edited

Loading