
General Layout Support #447

Merged 13 commits into dmlc:master on Apr 25, 2018

Conversation

@yzhliu yzhliu (Member) commented Apr 18, 2018

This PR allows NNVM to:

  • Replace an operator via AlterOpLayout. For x86 it is more efficient to compute convolution in the NCHW16c layout. Here is an example of how it is used: https://github.com/yzhliu/topi-intel/blob/master/e2e_general_pack/schedule_pack/avx512_conv_fwd.py#L128-L155
    Note that kernel (pre-)packing is supported as well: https://github.com/dmlc/nnvm/issues/303

  • Infer and correct layouts automatically

    • Given the input data layout, or the layouts of operators in the network (e.g., convolution, pooling, ...), the InferCorrectLayout pass can infer the layout for each operator (both inputs and outputs).
    • If the required input layout of an operator differs from what it receives, a __layout_transform__ operator is inserted.
    • Each operator registers a function FInferLayout. Once a model is imported, we run an InferCorrectLayout pass and store the original layouts for each operator. After AlterOpLayout, InferCorrectLayout runs again. This time each operator sees the original layouts it inferred before and can use them to decide whether to keep the original one. For example, softmax still produces correct results after its input layout changes, while flatten does not. So flatten claims it needs the original layout and triggers a layout transform, which makes the network produce correct results.
    • With this approach,
      • an optimized layout (e.g., NCHW16c) can flow through the network as far as possible, with no pack/unpack overhead around each (e.g., convolution) operator;
      • the model layout becomes transparent to users. Even though a convolutional neural network is trained with the NHWC layout, users can still pass NCHW input; as long as the input layout is specified, a layout transform happens automatically (see the sketch after this list).
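A minimal usage sketch of the transparency point above (not part of this PR's diff; it assumes the layout argument added to nnvm.compiler.build by this PR plus the existing nnvm symbol API, and the layer name and shapes are placeholders):

    import nnvm.symbol as sym
    import nnvm.compiler

    # Network defined in NHWC (HWIO kernels), but the user feeds NCHW-shaped
    # input and only declares its layout; the pass is expected to insert a
    # __layout_transform__ node in front of the convolution.
    data = sym.Variable("data")
    net = sym.conv2d(data, channels=8, kernel_size=(3, 3), padding=(1, 1),
                     layout="NHWC", kernel_layout="HWIO", name="conv1")
    net = sym.relu(net)

    shape = {"data": (1, 3, 224, 224)}   # NCHW shape supplied by the user
    graph, lib, params = nnvm.compiler.build(
        net, target="llvm", shape=shape, layout={"data": "NCHW"})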

Moreover, the convolution kernel layout becomes much clearer (#372): we now have kernel layouts OIHW and HWIO for NCHW and NHWC, respectively.

Will send a pull request for the corresponding TVM changes.

@yidawang @kevinthesun

@kevinthesun kevinthesun (Contributor) left a comment

Since this is an important change, I suggest that later we add a tutorial on how to use customized layouts, especially what kind of work and how much effort a developer needs to support their own layout. From there we can continue iterating on layout-related stuff.

@@ -237,7 +247,6 @@ def _upsampling(inputs, attrs):
'min_axis' : _rename('min'),
'reshape' : _reshape,
'sum_axis' : _rename('sum'),
'UpSampling' : _upsampling
Contributor:
Why is this being removed?

Member Author:
Oh, I was removing the SSD stuff but removed this by mistake...

}
};

struct MultiBoxPriorParam : public dmlc::Parameter<MultiBoxPriorParam> {
Contributor:
We can remove the SSD-related stuff from this PR. I will make a separate PR to add SSD operators and tests after the TVM SSD-related PR is merged.

@@ -204,8 +218,8 @@ def build(graph, target=None, shape=None, dtype="float32", params=None, target_h
By default, llvm is used if it is enabled,
otherwise a stackvm intepreter is used.

initialize : bool, optional
Whether to initialize variables in global dict _all_var_init.
layout : dict of str to str or str
Contributor:
Does the user need to specify opt_level=3 to enable the layout transform? Also, how does the user specify an "internal" layout if the target layout is not NCHWXc?

Member Author:
opt_level=3 is for AlterOpLayout. The param here is for the user to specify the input layout. Users do not specify internal layouts; those are inferred by operators that have layouts, e.g., conv and pool.
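To make the distinction concrete, here is a small sketch (not code from this PR; the parameter names follow the build() docstring shown in this diff, and the network is a placeholder): opt_level=3 enables AlterOpLayout, while layout only declares how the user's inputs are laid out.

    import nnvm.symbol as sym
    import nnvm.compiler

    data = sym.Variable("data")
    net = sym.conv2d(data, channels=16, kernel_size=(3, 3), padding=(1, 1),
                     layout="NCHW", kernel_layout="OIHW", name="conv1")

    # opt_level=3 turns on AlterOpLayout, which may rewrite conv2d into a packed
    # layout such as NCHW16c; `layout` only states the layout of the user input.
    with nnvm.compiler.build_config(opt_level=3):
        graph, lib, params = nnvm.compiler.build(
            net, target="llvm", shape={"data": (1, 3, 224, 224)},
            layout={"data": "NCHW"})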

Layout in, last_in, out;
deduce(&in, in_layouts, in_size, "input");
deduce(&last_in, last_in_layouts, in_size, "input (last infer pass)");
deduce(&out, out_layouts, out_size, "output");
Contributor:
Assume an elemwise_add op has two inputs with layouts "NCHW16c" and "NCHW4c". Will they be transformed back to "NCHW", or will one of them be transformed to the other?

Member Author:
The right one is transformed to the left one's layout. It's defined in ElemwiseBinaryKeepLeftLayout.
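To illustrate, the keep-left rule restated as plain Python (just a sketch of the behavior described above; the actual logic is the C++ ElemwiseBinaryKeepLeftLayout helper, and keep_left_layout below is a hypothetical stand-in):

    # Hypothetical stand-in: request the left input's layout for both sides when
    # it is defined, otherwise fall back to the right input's layout.
    def keep_left_layout(lhs_layout, rhs_layout):
        if lhs_layout:
            return lhs_layout, lhs_layout
        return rhs_layout, rhs_layout

    # elemwise_add with inputs "NCHW16c" and "NCHW4c": the right input gets a
    # __layout_transform__ to "NCHW16c" instead of both falling back to "NCHW".
    print(keep_left_layout("NCHW16c", "NCHW4c"))   # ('NCHW16c', 'NCHW16c')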


template<>
struct extension_class_info<nnvm::compiler::SymbolArray> {
static const int code = 19;
Member:
It seems to be quite dangerous to expose a vector of pointers here.

std::vector<TLayoutInfo> *olayouts)>;
using FTVMAlterOpLayout = std::function<
Symbol(const NodeAttrs& attrs,
const SymbolArray& inputs,
Member:
do we really need the input symbols, or is tinfo enough for now?

Member:
ignore this comment, I now get what is going on

std::vector<TLayoutInfo> *ilayouts,
std::vector<TLayoutInfo> *olayouts)>;
using FTVMAlterOpLayout = std::function<
Symbol(const NodeAttrs& attrs,
Member:
Can we just return the resulting output layouts as a vector of strings?

Member:
ignore this

* \brief A simple layout infer pass that will
* insert layout transform nodes automatically.
*/
nnvm::Graph InferCorrectLayout(nnvm::Graph src) {
Member:
Maybe we can just call it CorrectLayout

Member Author:
ok. will change.

_sym_arr_get = tvm.get_global_func("nnvm.compiler._symbol_array_get")
_sym_arr_size = tvm.get_global_func("nnvm.compiler._symbol_array_size")

class SymbolArray(object):
Member:
We can just use symbol for this, because we can group multiple nodes into a single Symbol
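For reference, a tiny sketch of that suggestion (assuming the existing nnvm.symbol.Group helper; the output names in the comment are illustrative): several symbols can already be packed into one Symbol and read back as separate outputs, so a dedicated SymbolArray type may not be needed.

    import nnvm.symbol as sym

    a = sym.Variable("a")
    b = sym.relu(sym.Variable("b"))
    grouped = sym.Group([a, b])            # one Symbol carrying both outputs
    print(grouped.list_output_names())     # e.g. ['a', 'relu0_output']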


class Layout {
public:
using LayoutDim = char;
Member Author:

@tqchen
Do you want me to change the name 'dim' to 'axis', as you suggested in https://discuss.tvmlang.org/t/datalayout-structure/80/3?

And to my understanding, we don't have to wait for that DataLayout to be implemented in TVM. We can merge this PR first (so that we can move forward to a significant improvement, at least a 50% speedup, for CNNs) and switch to the TVM one later, right?

@tqchen tqchen (Member) Apr 24, 2018:

Yes, we can merge this in first; please fix the problems I mentioned, as well as the tests.

@tqchen tqchen (Member) commented Apr 24, 2018

@merrymercy Can you also do a review when you have time?

@yzhliu yzhliu (Member Author) commented Apr 24, 2018

@tqchen The segfault in CI looks weird; I cannot reproduce it. It happened in PR #435 as well: http://mode-gpu.cs.washington.edu:8080/blue/organizations/jenkins/dmlc%2Fnnvm/detail/PR-435/13/pipeline

const Layout& produce = produce_ilayouts[i];
const Layout& request = request_ilayouts[i];
if (produce != request && produce.defined()) {
LOG(INFO) << "Insert layout transformer for " << i << "-th input of "
Member:
This seems to produce a lot of messages; maybe avoid printing this, since it is the expected behavior of the pass.

@tqchen tqchen merged commit 8994059 into dmlc:master Apr 25, 2018
@tqchen tqchen (Member) commented Apr 25, 2018

Thanks for improving the code through the review! This is now merged.
