
[RFC] Support Tensorflow Op Bridge #3059

Closed
kovasb opened this issue Apr 20, 2019 · 7 comments


kovasb commented Apr 20, 2019

This is a feature request.

Can we make this kind of integration easy to achieve for end users?
https://tvm.ai/2018/03/23/nmt-transformer-optimize.html

whereby TVM is used to create a kernel that is then exported into the TF runtime. After searching, I was unable to find open-source code for this kind of integration. For someone who knows both systems at the appropriate layers this is probably easy, but for someone who doesn't, it's a lot of friction to figure out all the details.

There are two main scenarios where this is beneficial for TF users:

  1. When it is not practical to adopt TVM wholesale for model execution, because of unsupported ops, or because of preexisting investments in TF infra.
  2. For initial exploration and assessment of TVM, it would be a lighter-weight, lower-risk on-ramp to be able to move ops à la carte to TVM.

To achieve this, there are 2 forms of support that would be useful:

  1. Tutorials / example templates that demonstrate the necessary glue that can simply be copied and lightly modified (changing paths, changing the TF op definition/registration code)
  2. Fully automated system that generates all the necessary C++ wrappers & support functions from your TVM function specification

Support level 1 would be sufficient for folks like me who are very interested in the results reported by Alibaba, but for whom there is too much friction, or who lack the experience, to easily try this out and see if it's worth investing in further.

Support level 2 would be useful for the median TF model developer: folks who don't want to touch or be aware of the C++ level but do want to try out optimizations.
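Support level 2 is essentially code generation. As a rough illustration (the op spec, template, and all names here are hypothetical, not an existing tool), a generator could stamp out the TF C++ registration boilerplate for a TVM-compiled kernel from a small description of the op:

```python
# Hypothetical sketch: render TF custom-op C++ boilerplate for a
# TVM-compiled kernel from a tiny op specification. The template only
# hints at the real glue code; it is not a working TF op by itself.

CPP_TEMPLATE = """\
REGISTER_OP("{op_name}")
{inputs}
    .Output("output: {dtype}");

class {op_name}Op : public OpKernel {{
  // Load the TVM module ({module_path}) and call the packed
  // function "{entry_func}" in Compute().
}};

REGISTER_KERNEL_BUILDER(Name("{op_name}").Device(DEVICE_CPU), {op_name}Op);
"""

def generate_tf_wrapper(op_name, input_names, dtype, module_path, entry_func):
    """Render the C++ wrapper source for one TVM-backed TF op."""
    inputs = "\n".join(
        '    .Input("{}: {}")'.format(name, dtype) for name in input_names
    )
    return CPP_TEMPLATE.format(
        op_name=op_name,
        inputs=inputs,
        dtype=dtype,
        module_path=module_path,
        entry_func=entry_func,
    )

src = generate_tf_wrapper(
    op_name="TvmDense",
    input_names=["x", "w"],
    dtype="float",
    module_path="tvm_dense.so",
    entry_func="dense",
)
print(src)
```

A real generator would also have to emit shape functions, device placement, and the DLTensor packing code, which is exactly the glue a tutorial (support level 1) would pin down.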

Further background:

Currently there is no way to create TF ops besides C++ programming. The closest contender, XLA, only allows compiling existing ops; it does not allow creating new ops from novel combinations of the XLA primitives. Furthermore, XLA is highly restricted in the primitives it supports; for example, convolution is only supported for floating point.

Eventually it will be possible to create TF ops in Swift and other MLIR-targeting systems, but this will likely take years, whereas the TVM infra is ready to go today. Therefore TVM is uniquely positioned to fill a significant gap in the TF ecosystem.

Thank you for your consideration.

@tqchen tqchen changed the title Support exporting standalone Tensorflow ops [RFC] Support exporting standalone Tensorflow ops Apr 20, 2019

tqchen commented Apr 20, 2019

This is somewhat related to DLPack support (tensorflow/tensorflow#24453). If we can get DLPack support into TensorFlow, it will be natural to add such support. If someone is interested in adding DLPack support to TensorFlow, we can move on from there.

This is already possible for frameworks that support DLPack, such as PyTorch and MXNet; see
https://tvm.ai/2018/08/10/DLPack-Bridge.html
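As a minimal illustration of what DLPack buys (zero-copy tensor handoff between any producer and consumer that speak the protocol), NumPy's DLPack support can stand in for the framework side; a real bridge would pass the same capsule between TF/PyTorch and TVM. This sketch assumes NumPy ≥ 1.23:

```python
import numpy as np

# Producer side: any DLPack-capable framework exposes __dlpack__().
x = np.arange(4, dtype=np.float32)

# Consumer side: np.from_dlpack imports the buffer without copying,
# analogous to what a TVM-side from_dlpack would do.
y = np.from_dlpack(x)

# Zero-copy: mutating the producer's buffer is visible to the consumer.
x[0] = 42.0
print(y[0])
```

The point of the proposal is that once TF tensors implement this protocol, the TVM side of the bridge needs no TF-specific code at all.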

@tqchen tqchen changed the title [RFC] Support exporting standalone Tensorflow ops [RFC] Support for Tensorflow Bridge Apr 20, 2019
@tqchen tqchen changed the title [RFC] Support for Tensorflow Bridge [RFC] Support Tensorflow Op Bridge Apr 20, 2019

kovasb commented Apr 20, 2019

Thanks for taking a look at this! And btw huge fan of all your other work :D

I would love to have DLPack support directly in TF and agree that would make this ask easier.

However I'd like to point out an alternative path that has some advantages from a project management / open source collaboration POV.

Getting DLPack support merged into TF is likely 10x more work than support level 1 described above. In fact, just getting a minimal example working would be a useful step toward the more general solution, both by making the issues more explicit and by providing raw material for those more expert in TF to pick up and polish further.

So DLPack-in-TF is not strictly a hard requirement, and it might be useful to try to do this the other way around.

Thanks!


tqchen commented Apr 20, 2019

OK, I agree. Let us know if you are interested in exploring level 1 and contributing a tutorial. TVM-compiled functions can be used directly in C++ programs; take a look at the deploy example in apps. The only question is how to hook that into TF's data structures. Contributions are welcome.

@yangjunpro

@kovasb Nice to see your interest in our TVM & TF NMT article. :)

We have also had some internal discussions about adding a non-TF DL compiler backend into TF as a complement to XLA, and TVM is absolutely one of the great choices.

There are some principles I think we might need to follow to ensure the smooth integration:

  1. TVM-related support should live in a standalone GitHub repository, to keep TF and TVM loosely coupled.
  2. The concrete way to achieve this loose coupling is to leverage TF's graph-optimization registration mechanism, which is invoked by the TF runtime.
  3. A new graph pass can be added on top of the TF graph-optimization framework (just as TF XLA adds MarkForCompilation, EncapsulateSubGraph, and BuildXLALaunchOp) that recognizes the portions of the TF graph likely to benefit from the TVM backend, clusters those TF operations into a TF2TVMCompilation (or some other name) subgraph, and finally replaces the clustered ops with a TF2TVMBridgeOp macro op.
  4. During the initial run of TF2TVMBridgeOp, the underlying TF ops are compiled into backend executables through the TVM infrastructure. To smooth the compilation phase, an extra IR layer may be necessary on top of TVM's own IR architecture; this should be open for design discussion.
  5. On subsequent runs of TF2TVMBridgeOp, the compiled executable can be invoked directly. Another round of compilation may be necessary when the input data shape of the TF2TVMBridgeOp changes (although TVM provides native support for dynamic shapes, we may want to push the performance boundary using static shape information).
  6. The place where I personally think TVM can best complement TF and XLA initially is its native support for compute-intensive operations such as GEMM/Conv, which might be a good starting point. For non-compute-intensive operations (such as add/mul/reduce/transpose), XLA already provides good mechanism support, and we could follow XLA's infrastructure to optimize those directly.
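The clustering step (3) is easy to mimic in plain Python, ignoring all TF machinery. In this toy sketch the names (TF2TVMBridgeOp, the supported-op set) follow the proposal above and are hypothetical; a real pass would work on a dataflow graph rather than a linear op list:

```python
# Toy model of step 3: walk a linear op sequence, cluster maximal runs of
# TVM-supported compute-intensive ops, and replace each run with a single
# bridge macro op.

TVM_SUPPORTED = {"MatMul", "Conv2D"}  # compute-intensive ops to offload

def cluster_for_tvm(ops):
    out, run = [], []
    for op in ops:
        if op in TVM_SUPPORTED:
            run.append(op)
        else:
            if run:
                out.append(("TF2TVMBridgeOp", tuple(run)))
                run = []
            out.append(op)
    if run:
        out.append(("TF2TVMBridgeOp", tuple(run)))
    return out

graph = ["Conv2D", "MatMul", "Add", "Conv2D", "Transpose"]
print(cluster_for_tvm(graph))
# [('TF2TVMBridgeOp', ('Conv2D', 'MatMul')), 'Add',
#  ('TF2TVMBridgeOp', ('Conv2D',)), 'Transpose']
```

Per point 6, the unsupported ops ("Add", "Transpose" here) stay on the TF/XLA side, while each clustered run becomes one macro op whose body TVM compiles.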

There are some scenarios we estimate to be suitable for this feature, and we have already started the design and refinement work. If you are interested, we would highly appreciate your providing a concrete use case or jumping into the design discussion directly.

Thanks


wweic commented Apr 22, 2019

@yangjunpro +1 on your work. Since Relay is also planning to support dynamic shapes (#3042), we might not need to handle step 5 directly (Relay does the JIT/bucketing under the hood). We are also wondering whether it is reasonable to do the opposite: make the TVM runtime the main runtime and fall back to TensorFlow for unsupported ops. Would that yield a solution with a lower memory footprint? Maybe we can open a discussion thread to discuss both directions further, since we don't really have an action item yet. cc @yongwww @zhiics @icemelon9
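Step 5 of the proposal above (recompile when the bridge op sees a new input shape) is essentially a shape-keyed JIT cache, which is what Relay's bucketing would absorb. A minimal stdlib-only sketch, with _compile_for_shape standing in for the actual TVM compilation (all names hypothetical):

```python
# Toy model of a shape-specialized bridge op: compile once per distinct
# input shape, then reuse the cached executable on later calls.

class BridgeOp:
    def __init__(self):
        self.cache = {}          # shape tuple -> compiled callable
        self.compilations = 0    # how many times we "compiled"

    def _compile_for_shape(self, shape):
        # Stand-in for TVM compilation specialized to a static shape.
        self.compilations += 1
        n = 1
        for d in shape:
            n *= d
        return lambda xs: [v * 2 for v in xs[:n]]  # pretend kernel

    def __call__(self, shape, xs):
        fn = self.cache.get(shape)
        if fn is None:                      # first run for this shape
            fn = self._compile_for_shape(shape)
            self.cache[shape] = fn
        return fn(xs)

op = BridgeOp()
op((2, 2), [1, 2, 3, 4])   # first run for (2, 2): compiles
op((2, 2), [5, 6, 7, 8])   # cache hit, no recompile
op((1, 3), [1, 2, 3])      # new shape -> compiles again
print(op.compilations)      # 2
```

The trade-off named in step 5 is visible here: static-shape specialization keeps each compiled kernel fast, at the cost of one compilation per distinct shape seen at runtime.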


tqchen commented Dec 19, 2019

Move to #4464

@tqchen tqchen closed this as completed Dec 19, 2019