add piano graph executor #36

Merged
merged 17 commits into from
Sep 2, 2021
Changes from 9 commits
5 changes: 4 additions & 1 deletion paddle/fluid/compiler/paddle2piano/CMakeLists.txt
@@ -5,5 +5,8 @@ cc_test(piano_compile_pass_test SRCS piano_compile_pass_tester.cc DEPS piano_com
 cc_library(piano_op_registry SRCS piano_op_registry.cc DEPS framework_proto op_registry note_proto piano_data_description)
 cc_test(piano_op_registry_test SRCS piano_op_registry_test.cc DEPS piano_op_registry operator op_registry)

-cc_library(piano_op_kernel_context SRCS piano_op_kernel_context.cc DEPS piano_op_registry proto_desc note_builder)
+cc_library(piano_op_kernel_context SRCS piano_op_kernel_context.cc DEPS piano_op_registry proto_desc piano_symbolization_builder)
 cc_test(piano_op_kernel_context_test SRCS piano_op_kernel_context_test.cc DEPS piano_op_kernel_context op_registry)
+
+cc_library(piano_graph_executor SRCS piano_graph_executor.cc DEPS piano_op_kernel_context piano_symbolization_meat_op)
+cc_test(piano_graph_executor_test SRCS piano_graph_executor_test.cc DEPS piano_graph_executor node)
151 changes: 151 additions & 0 deletions paddle/fluid/compiler/paddle2piano/piano_graph_executor.cc
@@ -0,0 +1,151 @@
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/compiler/paddle2piano/piano_graph_executor.h"

#include <queue>
#include <unordered_map>
#include <unordered_set>

#include "paddle/fluid/compiler/paddle2piano/piano_op_kernel_context.h"
#include "paddle/fluid/compiler/paddle2piano/vartype2notetype.h"
#include "paddle/fluid/compiler/piano/symbolization/meta_op.h"
#include "paddle/fluid/platform/enforce.h"

namespace paddle {
namespace piano {

using framework::ir::Node;
using GraphNodeVec = PianoGraphExecutor::GraphNodeVec;

void CreateInputOperand(const GraphNodeVec& cluster_inputs, PianoScope* scope,
symbolization::NoteBuilder* builder) {
for (int64_t id = 0; id < cluster_inputs.size(); ++id) {
auto* node = cluster_inputs.at(id);
PADDLE_ENFORCE_EQ(node->IsVar(), true,
platform::errors::InvalidArgument(
"Cluster Sub-Graph Input should be var"));

const auto& var_name = node->Name();

// create operand shape
const auto& var_shape = node->Var()->GetShape();
const auto& var_type = node->Var()->GetDataType();

// convert framework vartype to piano note type
note::ElementTypeProto element_type = VarType2NoteType(var_type);
Shape operand_shape(element_type, var_shape);

// create Operand
symbolization::Operand op =
symbolization::Parameter(builder, id, operand_shape, var_name);

// store into PianoScope
scope->SetOperand(var_name, op);
}
}

void TopologicSortGraph(const GraphNodeVec& cluster,
GraphNodeVec* cluster_sorted) {
std::unordered_set<Node*> cluster_set(cluster.cbegin(), cluster.cend());

std::unordered_map<Node*, std::unordered_set<Node*>> in_ops;
std::unordered_map<Node*, std::unordered_set<Node*>> out_ops;
std::queue<Node*> topo_queue;

// ensure all op node in 'in_ops' and 'out_ops'
for (auto* n : cluster) {
PADDLE_ENFORCE_EQ(n->IsOp(), true,
platform::errors::InvalidArgument(
"Cluster Sub-Graph all should be op"));

in_ops.emplace(n, std::unordered_set<Node*>());
out_ops.emplace(n, std::unordered_set<Node*>());
Owner:

Why is the topological sort written in such a complicated way? Couldn't a single std::vector<Node*, size_t> maintain the in-degrees?

Collaborator Author:

No, that doesn't work. An op Node's inputs are var Nodes, and the inputs of different var Nodes may overlap. With std::vector<Node*, size_t>, an overlapping op Node would be counted more than once, but the later topological traversal would decrement for it only once.

Collaborator Author:

To illustrate:

op1   op2
|   /   |
var1    var2
 \    /
  op3

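With a vector, op2 is counted twice when computing op3's in-degree. But during the topological traversal op2 is visited only once, so op3's in-degree is decremented only once.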

@wzzju (Owner), Aug 31, 2021:

As I understand it, even in the case you describe, the following version is still correct.

void TopologicSortGraph(const GraphNodeVec& cluster,
                        GraphNodeVec* cluster_sorted) {

  std::unordered_set<Node*> cluster_set(cluster.cbegin(), cluster.cend());
  std::unordered_map<Node*, std::vector<Node*>> adj_list;
  std::unordered_map<Node*, size_t> in_degree;
  std::queue<Node*> queue;

  for (auto* n : cluster) {
    PADDLE_ENFORCE_EQ(n->IsOp(), true,
                  platform::errors::InvalidArgument(
                      "Cluster Sub-Graph all should be op"));
    // the op's input is var
    for (auto* in_var : n->inputs) {
      // the var's input is op
      for (auto* in_op : in_var->inputs) {
        if (cluster_set.count(in_op) != 0) {
          adj_list[in_op].emplace_back(n);
          in_degree[n]++;
        }
      }
    }
  }

  // find topology entries
  for (auto* n : cluster) {
    if (!in_degree[n]) {
      queue.push(n);
    }
  }

  // topological sorting
  while (!queue.empty()) {
    auto* cur_op = queue.front();
    queue.pop();

    cluster_sorted->emplace_back(cur_op);
    for (auto* adj : adj_list[cur_op]) {
      in_degree[adj]--;

      if (!in_degree[adj]) {
        queue.push(adj);
      }
    }
  }

  PADDLE_ENFORCE_EQ(cluster_sorted->size(), cluster.size(),
              platform::errors::PreconditionNotMet(
                  "Cluster Sub-Graph shouldn't contain cycle."));

}
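
A minimal, self-contained sketch of why the duplicate-tolerant version stays consistent, using the op1/op2/op3 graph drawn earlier in this thread. It assumes string names in place of `ir::Node*` and a hypothetical `Edge` list holding one (producer op, consumer op) pair per shared var; neither is part of the PR. Because a duplicated edge is added to both the adjacency list and the in-degree, every increment is matched by exactly one decrement.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the op-to-op edges derived from ir::Node:
// (producer op, consumer op), listed once per shared var, so the same
// pair may appear several times.
using Edge = std::pair<std::string, std::string>;

// Kahn's algorithm with an adjacency list that keeps duplicate edges.
// Returns true when every op was emitted, i.e. the graph has no cycle.
bool TopoSortWithDuplicates(const std::vector<std::string>& ops,
                            const std::vector<Edge>& edges,
                            std::vector<std::string>* sorted) {
  assert(sorted != nullptr);
  std::unordered_map<std::string, std::vector<std::string>> adj_list;
  std::unordered_map<std::string, std::size_t> in_degree;

  for (const auto& op : ops) in_degree[op] = 0;
  for (const auto& edge : edges) {
    // A duplicate edge is recorded on both sides, so the counts stay paired.
    adj_list[edge.first].push_back(edge.second);
    ++in_degree[edge.second];
  }

  // Ops without any in-cluster producer are the entry points.
  std::queue<std::string> ready;
  for (const auto& op : ops) {
    if (in_degree[op] == 0) ready.push(op);
  }

  while (!ready.empty()) {
    const std::string cur = ready.front();
    ready.pop();
    sorted->push_back(cur);
    // One decrement per stored edge, duplicates included.
    for (const auto& next : adj_list[cur]) {
      if (--in_degree[next] == 0) ready.push(next);
    }
  }
  return sorted->size() == ops.size();
}
```

For the graph in the diagram, the edges would be (op1, op3) via var1, (op2, op3) via var1, and (op2, op3) via var2: op3 starts with an in-degree of 3 and is released only after op1 and op2 have both been emitted.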

Collaborator Author:

That does work. The only catch is that adj_list will contain duplicate nodes, but when there are few duplicates it should be faster than unordered_set. I'll change it.

Owner:

Replacing std::unordered_map<Node*, std::unordered_set<Node*>> out_ops; with std::unordered_map<Node*, size_t> should also save space in most cases. Besides, duplicate nodes are unlikely to occur often, and when they do the number of repeats is usually small. I personally find the code more readable this way.

Collaborator Author:

Done.

@wzzju (Owner), Aug 31, 2021:

If you want to deduplicate, you could try the following:

void TopologicSortGraph(const GraphNodeVec& cluster,
                        GraphNodeVec* cluster_sorted) {

  std::unordered_set<Node*> cluster_set(cluster.cbegin(), cluster.cend());
  std::unordered_map<Node*, std::unordered_map<Node*, size_t>> adj_list;
  std::unordered_map<Node*, size_t> indegree;
  std::queue<Node*> queue;

  for (auto* n : cluster) {
    PADDLE_ENFORCE_EQ(n->IsOp(), true,
                  platform::errors::InvalidArgument(
                      "Cluster Sub-Graph all should be op"));
    // the op's input is var
    for (auto* in_var : n->inputs) {
      // the var's input is op
      for (auto* in_op : in_var->inputs) {
        if (cluster_set.count(in_op) != 0) {
          ++adj_list[in_op][n];
          ++indegree[n];
        }
      }
    }
  }

  // find topology entries
  for (auto* n : cluster) {
    if (!indegree[n]) {
      queue.push(n);
    }
  }

  // topological sorting
  while (!queue.empty()) {
    auto* cur_op = queue.front();
    queue.pop();

    cluster_sorted->emplace_back(cur_op);
    for(auto it = adj_list[cur_op].begin(); it != adj_list[cur_op].end(); it++){
      indegree[it->first] -= it->second;

      if (!indegree[it->first]) {
        queue.push(it->first);
      }
    }
  }

  PADDLE_ENFORCE_EQ(cluster_sorted->size(), cluster.size(),
              platform::errors::PreconditionNotMet(
                  "Cluster Sub-Graph shouldn't contain cycle."));

}
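
The deduplicated variant can be sketched the same way outside of Paddle. This is only an illustration under assumptions: string names stand in for `ir::Node*`, and the `Edge` list and function name are hypothetical. Each successor is stored once together with the number of parallel edges, and the in-degree is reduced by that count in a single step. The final size comparison also shows how a cycle is detected: ops on a cycle never reach in-degree zero, so they are never emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical (producer op, consumer op) pairs, one per shared var.
using Edge = std::pair<std::string, std::string>;

// Kahn's algorithm where each successor is stored once, together with the
// number of parallel edges leading to it.
// Returns false when the graph contains a cycle.
bool TopoSortWithEdgeCounts(const std::vector<std::string>& ops,
                            const std::vector<Edge>& edges,
                            std::vector<std::string>* sorted) {
  assert(sorted != nullptr);
  std::unordered_map<std::string,
                     std::unordered_map<std::string, std::size_t>>
      adj_list;
  std::unordered_map<std::string, std::size_t> in_degree;

  for (const auto& op : ops) in_degree[op] = 0;
  for (const auto& edge : edges) {
    ++adj_list[edge.first][edge.second];  // multiplicity of this edge
    ++in_degree[edge.second];
  }

  std::queue<std::string> ready;
  for (const auto& op : ops) {
    if (in_degree[op] == 0) ready.push(op);
  }

  while (!ready.empty()) {
    const std::string cur = ready.front();
    ready.pop();
    sorted->push_back(cur);
    for (const auto& successor : adj_list[cur]) {
      // Remove all parallel edges from cur to this successor at once.
      in_degree[successor.first] -= successor.second;
      if (in_degree[successor.first] == 0) ready.push(successor.first);
    }
  }
  // Ops on a cycle never reach in-degree zero and are never emitted.
  return sorted->size() == ops.size();
}
```

Compared with the duplicate-tolerant list, this trades a nested hash map for a single visit per distinct successor; which one is faster likely depends on how often duplicates actually occur.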

}

// record all op's input op and output op
for (auto* n : cluster) {
// the op's input is var
for (auto* in_var : n->inputs) {
// the var's input is op
for (auto* in_op : in_var->inputs) {
if (cluster_set.find(in_op) != cluster_set.end()) {
in_ops.at(n).insert(in_op);
out_ops.at(in_op).insert(n);
}
}
}
}

// find topology entrance
for (auto* n : cluster) {
if (in_ops.at(n).empty()) {
topo_queue.push(n);
}
}

// topological sorting
while (!topo_queue.empty()) {
auto* cur_op = topo_queue.front();
topo_queue.pop();

cluster_sorted->emplace_back(cur_op);
for (auto* out : out_ops.at(cur_op)) {
// decrease output op's in-degree
in_ops.at(out).erase(cur_op);

// if empty, push into queue
if (in_ops.at(out).empty()) {
topo_queue.push(out);
}
}
}
}

void RunCompile(const GraphNodeVec& cluster, PianoScope* scope,
symbolization::NoteBuilder* builder) {
for (auto* n : cluster) {
const auto& op_name = n->Name();
const auto* op_desc = n->Op();

const auto& op_kernel_map = PianoOpRegistry::AllPianoOpKernels(op_name);
// TODO(jiangcheng05): how to distinguish library's kernel, like cudnn?
op_kernel_map.at("PLAIN")(PianoOpKernelContext(op_desc, scope, builder));
}
}

note::ModuleProto PianoGraphExecutor::operator()() {
// Step1: create unique NoteBuilder
std::string builder_name = "NoteBuilderOfGraph_";
builder_name.append(std::to_string(graph_id_));

symbolization::NoteBuilder builder(builder_name);

// Step2: create graph's input operand
PianoScope scope;
CreateInputOperand(cluster_inputs_, &scope, &builder);

// Step3: topo sort graph
GraphNodeVec cluster_sorted;
TopologicSortGraph(cluster_, &cluster_sorted);

// Step4: get PianoOpKernel and run compile
RunCompile(cluster_sorted, &scope, &builder);

// Step5: build and return module
return builder.Build();
}

} // namespace piano
} // namespace paddle
81 changes: 81 additions & 0 deletions paddle/fluid/compiler/paddle2piano/piano_graph_executor.h
@@ -0,0 +1,81 @@
/* Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#pragma once

#include <vector>

#include "paddle/fluid/compiler/piano/note/note.pb.h"
#include "paddle/fluid/framework/ir/node.h"

namespace paddle {
namespace piano {

// An executor that accepts a sub-graph generated by PianoCompilePass,
// runs each op's PianoOpKernel, and finally returns the graph's ModuleProto.
//
// Parameters:
// 1. graph_id: the unique graph id, used to generate a unique NoteBuilder name
// 2. cluster: a vector containing all ops of the graph, not topologically
// sorted
// 3. cluster_inputs: a vector containing all input vars of the graph; each
// var's inputs are outside ops and its outputs are inside ops
// 4. cluster_outputs: a vector containing all output vars of the graph; each
// var's inputs are inside ops and its outputs are outside ops
// 5. cluster_internals: a vector containing all internal vars of the graph;
// each var's inputs and outputs are all inside ops
//
// Example:
// -------------------------> op3 -> var4 ->
// / /
// -> var1 -> op1 -> var2 -> op2 -> var3
//
// cluster: [op1, op2, op3]
// cluster_inputs: [var1]
// cluster_outputs: [var4]
// cluster_internals: [var2, var3]
//
// Description:
// The executor consists of the following steps:
// 1. create a NoteBuilder whose name is unique for each graph
// 2. create a PianoScope; initially the scope only contains the graph's input
// vars and their operands
// 3. topologically sort the graph
// 4. create a PianoOpKernelContext and run each op's PianoOpKernel
// 5. run NoteBuilder's Build function to generate the graph's ModuleProto
class PianoGraphExecutor {
public:
using GraphNodeVec = std::vector<framework::ir::Node*>;

PianoGraphExecutor(int64_t graph_id, const GraphNodeVec& cluster,
const GraphNodeVec& cluster_inputs,
const GraphNodeVec& cluster_outputs,
const GraphNodeVec& cluster_internals)
: graph_id_(graph_id),
cluster_(cluster),
cluster_inputs_(cluster_inputs),
cluster_outputs_(cluster_outputs),
cluster_internals_(cluster_internals) {}

note::ModuleProto operator()();

private:
int64_t graph_id_;
const GraphNodeVec& cluster_;
const GraphNodeVec& cluster_inputs_;
const GraphNodeVec& cluster_outputs_;
const GraphNodeVec& cluster_internals_;
};

} // namespace piano
} // namespace paddle