Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposing change in design : uses class instead of dictionaries #184

Open
khsrali opened this issue Jul 17, 2024 · 4 comments
Open

Proposing change in design : uses class instead of dictionaries #184

khsrali opened this issue Jul 17, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@khsrali
Copy link
Contributor

khsrali commented Jul 17, 2024

After trying out the first syllabus of documentation (two issues #176 #179), I would like to propose a more intuitive interface:

The proposal:

wg = WorkGraph()
task_1 = wg.add_task(add, x= 1, y= 2,  label= "Note: I don't want to access variables with wg.tasks[<label>]!")
task_2 = wg.add_task(multiply, y=-1, x=task_1.future)
wg.run() # see the bottom of this post for multiple runs

As it is now:

# Create a workgraph to link the tasks.
wg = WorkGraph("test_add_multiply")
wg.add_task(add, name="add1")
wg.add_task(multiply, name="multiply1")
wg.add_link(wg.tasks["add1"].outputs["result"], wg.tasks["multiply1"].inputs["b"])
wg.run(inputs = {"add1": {"x": 1, "y": 2}, "multiply1": {"y": -1}}, wait=True)

Proof of concept: Can be achieved with the following design

import inspect

class Future_pointer:
    def __init__(self, result):
        self.result = result

class Task:

    def __init__(self, func, label=None, depends_on=None, **kwargs):
        self.func = func

        for key, value in kwargs.items():
            setattr(self, key, value)

        params = inspect.signature(func).parameters
        for name in params:
            if not hasattr(self, name):
                setattr(self, name, None)
        
        self.future = Future_pointer(None)
        self._result = None
        self.dependencies = []
        if depends_on is not None:
            self.add_dependency(depends_on)


    @property
    def result(self):
        return self.future.result
    

    def calculate(self):
        for dep in self.dependencies:
            dep.calculate()
        args = {}
        for name in inspect.signature(self.func).parameters:
            if isinstance(getattr(self, name), Future_pointer):
                value = getattr(self, name).result
            else:
                value = getattr(self, name)
            if value is None:
                self.future.result = None
                return
            args[name] = value 

        self.future.result = self.func(**args)
    
    def add_dependency(self, source):
        self.dependencies.append(source)

class WorkGraph:

    def __init__(self):
        self.tasks = []
        self.links = [] # one can keep track of links from input values... a bit more coding but straight forward

    def add_task(self, func, label=None, depends_on=None, **kwargs):
        if not depends_on:
            depends_on = None if not self.tasks else self.tasks[-1]
        
        instance = Task(func, label, depends_on=depends_on, **kwargs)
        self.tasks.append(instance)
        
        return instance

    def run(self, **kwargs):
        for task in self.tasks:
            task.calculate()

def add(x, y):
    return x + y

def multiply(x, y):
    return x * y

wg = WorkGraph()
task_1 = wg.add_task(add, x=1, y=2, label="My arbitrary label which could be very long, and I don't want to access with wg.tasks[<name>]!")
task_2 = wg.add_task(multiply, y=-1, x=task_1.future)

# task_2.calculate()
wg.run()
print(task_2.result)  # should print -3 
print(task_1.result)  # should print 3 

task_1.x = 2
wg.run()
print(task_2.result)  # should print -4

I understand this might be considered a pain for @superstar54 but to me this is intuitive and lubricates life of a user.

I would like to know other's opinion as well, @giovannipizzi

@khsrali khsrali added the enhancement New feature or request label Jul 17, 2024
@giovannipizzi
Copy link
Member

I also suggested to have a tab-completable way to specify links without dicts and string keys, so I'd support the idea, even if I'm not sure of the exact suggestion with the word "future", for instance.

Also, I didn't check the implementation in detail, but I'm really confused by the class containing classes, and then you access a class attribute of the inner class... Why can't you simply use a method of the outer class and store info in an instance attribute, with less python magic? (maybe easier to discuss live, I just wanted to write this comment)

@giovannipizzi
Copy link
Member

Also, I'd try to make the two examples at the top try to do the same thing, now they are a bit different I think

@khsrali
Copy link
Contributor Author

khsrali commented Jul 17, 2024

That is because I started coding from inside out 😄 hahah
Please check the update, now it's fine. Both examples do the same thing.
add_link is done automatically in code, also can be given directly as depends_on to Task. But this is not the point/problem, it can be easily re-adjusted to the former behavior.

@superstar54
Copy link
Member

superstar54 commented Jul 18, 2024

Hi @khsrali , thanks for sharing your idea here. I try to explain what WorkGraph implementations, and their use cases.

About using Class

WorkGraph uses Class, and it allows user to access the task or its outputs in three ways:

  • dictionary-like: wg.tasks["add1"].outputs["x"]
  • list-like: wg.tasks["add1"].outputs[0], wg.tasks[0].outputs[0]
  • attribute-like: wg.tasks.["add1"].outputs.x, or even wg.tasks.add1.outputs.x. Note: Tab-completable is not supported yet, but is planned.

Therefore, the code you propose to run should already work using WorkGraph.

wg = WorkGraph()
add_task = wg.add_task(add, name="add1", x=1, y=2)
wg.add_task(multiply, y=3, x=add_task.outputs.result)
wg.run()

In most cases, attribute-like and list-like can be very useful and straightforward. However, dictionary-like is the most robust and can be used in all cases. For example, user can program their work graph in an automatic fashion. When creating many tasks inside a for loop, the user can give a meaningful full name to the tasks, e.g., f"task_{i}", so that they can access it and create complex dependencies.

"Note: I don't want to access variables with wg.tasks[]!")

Users are not required to give a name for a task. However, they are still suggested to give the task a meaningful name so that later, when loading a finished WorkGraph, they can quickly access the tasks by their names.

About the creation and execution

In your proof of concept, you also introduce the way to execute the WorkGraph, like the future concept. I would like to say that the execution of the WorkGraph is rather complex because it requires asynchronous execution, checkpoints, data provenance of the workflow logic, etc.

In aiida-workgraph, there are separate classes: WorkGraph and WorkGraphEngine handle the creation and execution. You can check the WorkGraphEngine class if you are interested.

I am open to further discussion if you'd like to delve deeper into the execution engine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants