Change target string to Target object in the TE compiler and interpreter #8835

electriclilies · 2021-08-24T17:30:11Z

In this PR, I change target string to Target object in parts of the TE compiler and interpreter. This is follow-up work from #8802.

cc @junrushao1994 @Mousius @mbs-octoml @mikepapadim

# This is the 1st commit message: Initial changes # This is the commit message #2: Ftarget string -> Target object works!

src/relay/backend/build_module.cc

Mousius · 2021-08-24T17:46:05Z

This is amazing @electriclilies, this is a much cleaner representation of the Targets, thanks for doing this 😸

Mousius · 2021-08-24T17:49:41Z

src/relay/backend/build_module.cc

+    if (lowered_funcs.find(Target("ext_dev")) != lowered_funcs.end()) {
+      lowered_funcs.Set(Target("ext_dev"), IRModule());


My suggestion got squashed because you're too fast 😸 I was suggesting moving the duplicated instantiation out:

Suggested change

-    if (lowered_funcs.find(Target("ext_dev")) != lowered_funcs.end()) {
-      lowered_funcs.Set(Target("ext_dev"), IRModule());
+    Target ext_dev("ext_dev");
+    if (lowered_funcs.find(ext_dev) != lowered_funcs.end()) {
+      lowered_funcs.Set(ext_dev, IRModule());

Oh, my bad, I just didn't commit the file 😅

Ahh, classic 😅

junrushao

🙏 Thank you for doing this! Really nice!

Mousius

Took a look at this locally, looks good to me 😸

mbs-octoml · 2021-08-24T19:27:52Z

LGTM

Mousius · 2021-08-25T09:51:24Z

Hmm, something weird is occurring here, unsure why those tests didn't run for me locally - but debugging test_any.py a bit I ran tvm::Dump(per_target_module_) and got (removed function body for brevity):

{
llvm -keys=cpu -link-params=0: IRModule({GlobalVar(intrp_fused_add): PrimFunc([placeholder, placeholder, T_add]) attrs={"from_legacy_te_schedule": (bool)1, "global_symbol": "intrp_fused_add", "tir.noalias": (bool)1} {}
}), 
llvm -keys=cpu -link-params=0: IRModule({GlobalVar(shape_func_add): PrimFunc([placeholder, placeholder, _broadcast_shape_func]) attrs={"from_legacy_te_schedule": (bool)1, "global_symbol": "shape_func_add", "tir.noalias": (bool)1} {}
})}

This should be gated in GetLoweredFunctions, with these lines (these are run twice, once for the normal cache and once for the shape cache):

      if (!lowered_functions.count(target)) {
        lowered_functions.Set(target, IRModule(Map<GlobalVar, BaseFunc>({})));
      }

      lowered_functions[target]->Update(lowered_func->cached_func->funcs);

That should create only a single IRModule per target. The dump indicates the Map is instead creating multiple keys even for the single Target it was passed in LowerTE (i.e. the same Target object, I assume copied). I'm unsure why this isn't breaking more.

This also means I was wrong about the Target("ext_dev") comparison, my apologies 😿. I believe this is due to ObjectHash used by Map only being able to do String or ObjectPtr comparisons?

@jroesch / @junrushao1994 any ideas? It would appear Map is behaving a little bit weirdly?
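To illustrate the behaviour described above, here is a minimal sketch that does not use TVM at all. `FakeTarget`, `PtrHash` and `PtrEqual` are hypothetical stand-ins, under the assumption that the map hashes and compares keys by the address of the underlying object. With that assumption, two separately constructed targets that print identically become two keys, while copies of one handle share a single key.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a reference-counted Target handle (not TVM's class).
struct FakeTarget {
  std::shared_ptr<const std::string> node;
  explicit FakeTarget(const std::string& s)
      : node(std::make_shared<const std::string>(s)) {}
  const std::string& str() const { return *node; }
};

// Assumption: keys are hashed and compared by pointer identity only.
struct PtrHash {
  std::size_t operator()(const FakeTarget& t) const {
    return std::hash<const void*>()(t.node.get());
  }
};
struct PtrEqual {
  bool operator()(const FakeTarget& a, const FakeTarget& b) const {
    return a.node == b.node;
  }
};

using PtrKeyedMap = std::unordered_map<FakeTarget, int, PtrHash, PtrEqual>;

// Insert n freshly constructed targets that all print as the same string.
std::size_t CountKeysForFreshTargets(const std::string& s, int n) {
  PtrKeyedMap m;
  for (int i = 0; i < n; ++i) {
    m.emplace(FakeTarget(s), i);  // a new allocation, hence a new key, each time
  }
  return m.size();
}

// Insert n copies of one handle; every copy shares the same underlying node.
std::size_t CountKeysForCopies(const std::string& s, int n) {
  PtrKeyedMap m;
  FakeTarget original(s);
  for (int i = 0; i < n; ++i) {
    FakeTarget copy = original;
    m.emplace(copy, i);  // same node address, so only the first insert succeeds
  }
  return m.size();
}
```

Under this model, copying a handle would not produce duplicate keys; only constructing a second, structurally identical target would.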

mbs-octoml · 2021-08-25T14:37:21Z

Are the hash and equality on Targets legit?
Also, possibly relevant, there's a bit of hackery to force shape functions to end up on the 'cpu' target which is hard coded, so it could be there are two 'llvm' targets in action that look the same but are distinct objects as far as hash/equality is concerned. I'd like to fix that by making device planning responsible for spelling out everything so there are no special cases left here.
Note ideally the code would be:
lowered_functions[target]->Add(...)
i.e. let the default ctor do its thing if the key is not present.
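For comparison, this is what the two forms look like with a plain `std::unordered_map`, whose `operator[]` default-constructs a missing value. It is only a sketch of the idiom: `FakeModule`, `AddGated` and `AddDirect` are hypothetical names, a string stands in for the Target key, and nothing here claims that `tvm::Map` or `IRModule` behave this way.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: a string key instead of a Target, and a list of
// function names instead of an IRModule.
using FakeModule = std::vector<std::string>;
using PerTargetMap = std::unordered_map<std::string, FakeModule>;

// Gated form: create the entry explicitly when it is missing, then update it.
void AddGated(PerTargetMap* m, const std::string& target, const std::string& func) {
  if (!m->count(target)) {
    m->emplace(target, FakeModule());
  }
  m->at(target).push_back(func);
}

// Direct form: operator[] default-constructs the value when the key is absent.
void AddDirect(PerTargetMap* m, const std::string& target, const std::string& func) {
  (*m)[target].push_back(func);
}
```

Both functions leave the map in the same state; the direct form just relies on the container to create the empty entry.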

mbs-octoml · 2021-08-25T14:52:19Z

(3oz of coffee later)
It will be the on-the-fly Target creation for the shape functions. By switching from String to Target we no longer implicitly identify distinct Target objects which have the same structure. To make this work we'll need to create all the Targets we need once and pass them around in the targets map. When lowering shape functions the target will need to be retrieved for the 'default cpu'. Again, eventually I'd like device planning to just spell this out, and for shape functions to have a well-defined place on which their appropriate Target annotation can be attached. As a stepping stone just passing in the 'default_cpu_target' or something would do?

electriclilies · 2021-08-27T22:49:20Z

@mbs-octoml @junrushao1994 @Mousius I implemented what Junru suggested: using std::unordered_map<Target, IRModule, TargetStrHash, TargetStrEqual> instead of tvm::Map<Target, IRModule> in a few places.
It looks like this will go green soon, can you take another look so we can get this merged? Thanks!
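A rough sketch of the idea, not the code from this PR: the functors below hash and compare a target by its string form, so structurally identical targets collapse to one key. `FakeTarget`, `FakeModule`, `StrHashSketch`, `StrEqualSketch` and `MergeByTargetString` are all hypothetical names, and the assumption that the real `TargetStrHash` and `TargetStrEqual` work on `str()` is mine.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted Target handle (not TVM's class).
struct FakeTarget {
  std::shared_ptr<const std::string> node;
  explicit FakeTarget(const std::string& s)
      : node(std::make_shared<const std::string>(s)) {}
  const std::string& str() const { return *node; }
};

// Assumed behaviour: hash and compare by the printed target string.
struct StrHashSketch {
  std::size_t operator()(const FakeTarget& t) const {
    return std::hash<std::string>()(t.str());
  }
};
struct StrEqualSketch {
  bool operator()(const FakeTarget& a, const FakeTarget& b) const {
    return a.str() == b.str();
  }
};

// A list of function names stands in for an IRModule.
using FakeModule = std::vector<std::string>;
using PerTargetMap =
    std::unordered_map<FakeTarget, FakeModule, StrHashSketch, StrEqualSketch>;

// Fold (target, module) pairs into one module per distinct target string.
PerTargetMap MergeByTargetString(
    const std::vector<std::pair<FakeTarget, FakeModule>>& entries) {
  PerTargetMap out;
  for (const auto& kv : entries) {
    FakeModule& mod = out[kv.first];  // creates an empty module on first sight
    mod.insert(mod.end(), kv.second.begin(), kv.second.end());
  }
  return out;
}
```

Fed the two entries from the dump earlier in this thread (both printing as `llvm -keys=cpu -link-params=0`), this yields a single key whose module holds both functions.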

junrushao

It looks good to me. Only a nitpick

junrushao · 2021-08-30T01:11:05Z

include/tvm/target/target.h

@@ -203,5 +204,59 @@ void CheckAndUpdateHostConsistency(Map<Integer, Target>* target, Target* host);
 * \param host The Target typed object for target host to be updated
 */
 void CheckAndUpdateHostConsistency(Map<Target, IRModule>* target, Target* host);
+
+// TODO(@electriclilies): Move to somewhere in backend and add note about appropriate use


Hey, what about moving these methods temporarily to src/relay/backend/utils.h instead? Given these are only used in the relay backend right now, I think it would help discourage future developers from using them :-)

moved them!

mbs-octoml · 2021-08-30T15:56:15Z

src/relay/backend/interpreter.cc

@@ -382,8 +386,10 @@ class Interpreter : public ExprFunctor<ObjectRef(const Expr& n)>,

    // Project out just the function(s) we need.
    IRModule lowered_projected_mod;
-    auto mod_itr = per_target_module_.find(target->str());
-    ICHECK(mod_itr != per_target_module_.end())
+    std::unordered_map<Target, IRModule, TargetStrHash, TargetStrEqual> per_target_module_std_map_ =


nit: don't append a _ for local vars since the convention is it indicates a member var.

mbs-octoml · 2021-08-30T16:01:52Z

LGTM. It will be a happy day when we nuke these maps.

Mousius

Minor nit, but otherwise excited to see this merged 😸

Mousius · 2021-08-31T08:48:40Z

include/tvm/target/target.h

@@ -31,6 +31,7 @@
 #include <tvm/target/target_kind.h>

 #include <string>
+#include <unordered_map>


Should probably remove this as it's not used in this file.

We will send a follow-up PR that does this, just for the sake of forward progress. Thanks!

electriclilies · 2021-08-31T20:06:43Z

Thanks all! I'll stick that change in a follow up PR @Mousius

…ter (apache#8835) * # This is a combination of 2 commits. # This is the 1st commit message: Initial changes # This is the commit message #2: Ftarget string -> Target object works! * Fix remaining target strings * fix bad rebase * Fix typo * 1 more bad rebase fix * Lint * typo * Forgot to commit this * Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn * Passing most tests, yay * remove some comments * lint * target-str-to-target-object * Respond to change requests Co-authored-by: Jared Roesch <roeschinc@gmail.com>

@mdw-octoml

* nll loss v1 * add converter * decode strings in byte form * decode variable length inputs * make shapes correct * unsqueeze * proper weight handling * simplify if statement * fix tests * add comment about tests * delete extra file * lint * so cool * Update CI Lint Image Version (#8841) * Update CI Lint Image Version * trigger * [BUG] ToBasicBlockNormalForm immutability (#8778) * ToBasicBlockNormalForm immutability * better comment on ToBasicBlock * refine comment of ToBasicBlockForm * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (#8807) * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm This new benchmarking function is just a convenience function for calling time_evaluator on the underlying module. Hopefully this should make it easier for users to get good benchmarks of their code. * formatting * import order * more test, more comments, more precision * fix tests * add seconds descriptions to doc * Apply CPPLint to CRT Tests (#8844) This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to passing linting. * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (#8584) * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. Added initial tunable autotvm templates for depthwise conv2d with NHWC layout for Mali and Bifrost. * [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost. - Fix assert for Bifrost. - Set reasonable default axis splits to avoid using tophub for NHWC. - Fixed typo: arm cpu -> Mali. * [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost. 
* Support for CMSIS-NN in Corstone300 Makefile (#8831) Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a * [microtvm][Zephyr] Increase timeout to fix flaky tests (#8846) * increase timeout * trigger * [AMP] Bump up tolerance on flaky test (#8850) * bumpy up tol * bumped tolerance up even more * jostle ci * [Hexagon] Rework tvm.target.hexagon() interface (#8823) * [Hexagon] Rework tvm.target.hexagon() interface Make the tvm.target.hexagon() function take most options as keyword parameters. This will allow adding additional parameters without changing the interface. No changes are required to existing code, except for changing positional parameters following the CPU version to keyword parameters, and updating the names of the keyword parameters: sim_args -> sim_options, llvm_args -> llvm_options, although the old names will be accepted for the time being. * formatting * change ' to " * Rename 'args' to 'config' for clarity * Use 'strip' instad of 'replace' * Restart build * [Pattern matching] Add an option to rewrite the graph only once (#8843) * [Pattern matching] Add an option to rewrite the graph only once If the graph returned from the callback consists of the original pattern, the rewriter will run in the loop, which is not always desired. So this patch proposes an option to run the rewriter only once. Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06 * Make rewrite_once default to false Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0 * update gpu and cpu (#8853) * VTA cmake change to include Verilator header for building tsim library (#8797) * VTA cmake file require Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI * Refactor Verialtor check conditions * Build TSIM only for CPU target. CPU target don't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target. 
* remove build/libvta_tsim.so from non tsim targeting builds * Revert to enable TSIM build i386. Revert to -Werror in CPU config. Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings * [FIX] Bug fix for a floormod rewrite simplify rule (#8852) * Update rewrite_simplify.cc * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * move rust lint script (#8726) * [AMP] Disallow fp16 conversion for summation-like ops (#8810) * [AMP] Disallow fp16 conversion for summation-like ops * test only structural equality * [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (#8605) * [topi] add spconv2d_3x3 nhwc * [relay] sparse_conv2d: add kernel_size attr * [relay] add strategy for spconv2d_3x3 nhwc * [relay] pass to convert spconv2d with const args * [relay] convert sparse conv2d pass fixes * use array for sparse conv2d attr * fixup 1x1 tests; new 3x3 tests * extend repeat_interleave op for relay.Expr (#8839) Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> * Change AOT from ExprVisitor to MixedModeVisitor (#8856) This should allow better scale-ability for AOT when targeting larger networks. 
* Add a PaddlePaddle Frontend (#8645) * fix some problems for matmul * fix some problems for matmul * add alpha parameter for matmul * remove unnecessary condition * add TranslatedLayer which support model loaded by jit.load * add mul operator support * Add padding mode support for conv/pool2d * support 4 two-tuples * add paddle test case * add paddle conv2d case * update test_forward.py * fix paddle convert_matmul * add paddle multiply and matmul op test case * add test case and fix bug * delete import pandas * add paddlepaddle tests * modify the variable name of convert_reshape * formatting * formatting * use black to format python code * pylint check * Remove fluid api * black format Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> * [Runtime] add set_output_zero_copy (#8497) * Update graph_executor.h * Update graph_executor.cc * modify zero copy UT add set input zero copy * modify C style * add runtime test * realy build generatr the json Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> * [Hexagon] Change declaration order of unique_ptr objects to fix crash (#8859) A crash occurs when automatically deleting an instance of CodeGenHexagon because the LLVMContext object has already been freed. Objects of both types are created using unique_ptr, but the object managed by the LLVMContext unique_ptr is passed to CodeGenHexagon object (not as a unique_ptr). This crash is fixed by moving the declaration of the LLVMContext object before the CodeGenHexagon object. I'm not sure if this is the best way to fix this, but it does fix the crash. Also, in other files, the LLVMContext object is always created first. 
Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> * [Graph Executor, VM] Add end to end benchmarking of models (#8858) Add benchmarking that includes ovearhead of transfering inputs and outputs to and from the device. This should give an accurate measurement of the runtime a user would see when using the model. This is accomplished by adding functions that run from inputs to return values into the graph executor and the VM. * [UnitTests] Expose TVM pytest helpers as plugin (#8532) * [UnitTests] Expose TVM pytest helpers as plugin Previously, pytest helper utilities such as automatic parametrization of `target`/`dev`, or `tvm.testing.parameter` were only available for tests within the `${TVM_HOME}/tests` directory. This PR extracts the helper utilities into an importable plugin, which can be used in external tests (e.g. one-off debugging). * [UnitTests] Refactor the plugin-specific logic out into plugin.py. * [UnitTests] Moved marker definition out to global variable. * Remove AOT Executor header from Arduino project (#8857) * [Community] @mdw-octoml -> Reviewer (#8868) * [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (#8855) * init * fix * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * address Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * [Autoscheduler] Configurable workload keys (#8862) * change workload keys * remove binary string comparison * append the tuple not every integer * clean up * lint * dump workload keys to dags * fix things * change some strings * misc fixes, add tests * jostle ci * [Tutorial][Executor] Fix the usage of executors in tutorials (#8586) * fix: executor usage for keras tutorial * fix: executor usage for onnx tutorial * 
[Tutorial][Executor] Fix executors in tutorials * [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (#8867) * Simplify onnx input since name accesses are no longer supported. * move Celu importer. * [TIR] GetBlockReadWriteRegion (#8875) * [TIR] GetBlockReadWriteRegion * Fix black issue * Use constant reference for the interface * Fix lint issue * [RISCV] Add support for llvm parameter -mabi (-target-abi) (#8860) * [Community] @manupa-arm -> Committer (#8870) * adding Manupa to the contributors list * re-trigger CI * [RPC] Fix ios_rpc build (#8864) * [Vulkan][Target] Added the driver name to the vulkan target string. (#8882) Driver name (e.g. "NVIDIA", "radv", "AMD open-source driver") is read from the `driverName` property in [VkPhysicalDeviceDriverProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceDriverProperties.html), or is left as `"unknown_driver_name"` if the driver does not support querying the driver name. * [ONNX][TOPI] Support select_last_index for argmin/max (#8816) * support select_last_index for argmin/max * reverse conditions which made on accident * forward args in reduce.py * make proper nodes for reduction ops * remove complicated nested lambdas * fix lambda capture for conversion * forward more arguments * forward more args * enable onnx tests * wrapping casts to remove ambiguity * revert changes extraneous * correct incorrect attrs being used for ops * change attributes * remove old impl * register new attribute node * clean up test * reformat * reformat * coolio * stable comparison * casts to avoid ambiguity * casting more * correct arg passing * support select_last_index for argmin/max * reverse conditions which made on accident * forward args in reduce.py * make proper nodes for reduction ops * remove complicated nested lambdas * fix lambda capture for conversion * forward more arguments * forward more args * enable onnx tests * wrapping casts to remove ambiguity * revert 
changes extraneous * correct incorrect attrs being used for ops * change attributes * remove old impl * register new attribute node * clean up test * reformat * reformat * coolio * stable comparison * casts to avoid ambiguity * casting more * correct arg passing * fix broken input * OneElementReduceAttrs-->ArgReduceAttrs" * reduce boilerplate * change names * remove log statement * jostle ci Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> * refactor optimize GEMM on CPU tutorial (#8825) * refactor optimize GEMM on CPU tutorial * fix lint errors * fix more lint errors * fix typo * fix problem with redefinition of `k` add TODO and comments around loop unrolling clarify note on the array packing figure * reword general description of array packing * grap kaxis from compute definition * remove duplicate comments on unrolling * Change target string to Target object in the TE compiler and interpreter (#8835) * # This is a combination of 2 commits. # This is the 1st commit message: Initial changes # This is the commit message #2: Ftarget string -> Target object works! * Fix remaining target strings * fix bad rebase * Fix typo * 1 more bad rebase fix * Lint * typo * Forgot to commit this * Add TargetStrHash and Map<Target... to std::unordered_map<Target... 
conversion fn * Passing most tests, yay * remove some comments * lint * target-str-to-target-object * Respond to change requests Co-authored-by: Jared Roesch <roeschinc@gmail.com> * [TensorIR][M2a] CacheRead/Write (#8863) Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> * [CI] make pre-commit hooks to run on every push instead of every commit (#8888) * [TVMScript] Fix printing ForNode annotations (#8891) * [1/10] CMSIS-NN graph partitioner for softmax (#8653) * cmsis graph partitioner for softmax Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83 * Updated docstring in the partioning function Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055 * [microTVM][RVM] Add Arduino RVM (#8748) * Functioning Arduino Vagrant VM Begin building Arduino Vagrant VM Mostly working Vagrant VM Changes for debugging Add ignored json file Fix venv path * Generalize parts of RVM for multiple platforms cwd hack Add unit tests from apps directory to task_python_microtvm.sh Generalize parts of RVM for multiple platforms * Add Vagrantfile lint exceptions * Address PR comments Address Mehrdad's PR comments More PR comments Documentation tweaks Add dialout group to user * Rerun tests * Spresense fix * Rerun CI tests * Rerun tests * sce loss example * add comments, remove other tests * lint * lint * jostle * lint up * jostle * uncomment some tests * proper return * clean up * lint * minor merge errors Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Tristan Konolige <tkonolige@octoml.ai> Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com> Co-authored-by: Anastasia 
Stulova <38433336+AnastasiaStulova@users.noreply.github.com> Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com> Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com> Co-authored-by: Elen Kalda <elen.kalda@arm.com> Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com> Co-authored-by: Chenfan <jcf94@outlook.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Tantalus13A98B5F <jsl_713@live.com> Co-authored-by: Valery Chernov <black.chervi@gmail.com> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: Jason <928090362@qq.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Swift.Sun <sunjiwei@yeah.net> Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> Co-authored-by: Lunderberg <Lunderberg@users.noreply.github.com> Co-authored-by: Yizhi Liu <liuyizhi@apache.org> Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Josh Fromm <jwfromm@octoml.ai> Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai> Co-authored-by: Egor Churaev <egor.churaev@gmail.com> Co-authored-by: Adam Straw <astraw@octoml.ai> Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com> Co-authored-by: Jared Roesch <roeschinc@gmail.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Michalis Papadimitriou <mikepapadim@users.noreply.github.com> Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com>

up * lint * minor merge errors Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Tristan Konolige <tkonolige@octoml.ai> Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com> Co-authored-by: Anastasia Stulova <38433336+AnastasiaStulova@users.noreply.github.com> Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com> Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com> Co-authored-by: Elen Kalda <elen.kalda@arm.com> Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com> Co-authored-by: Chenfan <jcf94@outlook.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Tantalus13A98B5F <jsl_713@live.com> Co-authored-by: Valery Chernov <black.chervi@gmail.com> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: Jason <928090362@qq.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Swift.Sun <sunjiwei@yeah.net> Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> Co-authored-by: Lunderberg <Lunderberg@users.noreply.github.com> Co-authored-by: Yizhi Liu <liuyizhi@apache.org> Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Josh Fromm <jwfromm@octoml.ai> Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai> Co-authored-by: Egor Churaev <egor.churaev@gmail.com> Co-authored-by: Adam Straw <astraw@octoml.ai> Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com> Co-authored-by: Jared Roesch <roeschinc@gmail.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Wuwei Lin 
<wuwei@apache.org> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Michalis Papadimitriou <mikepapadim@users.noreply.github.com> Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com>

…ter (apache#8835) * # This is a combination of 2 commits. # This is the 1st commit message: Initial changes # This is the commit message apache#2: Ftarget string -> Target object works! * Fix remaining target strings * fix bad rebase * Fix typo * 1 more bad rebase fix * Lint * typo * Forgot to commit this * Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn * Passing most tests, yay * remove some comments * lint * target-str-to-target-object * Respond to change requests Co-authored-by: Jared Roesch <roeschinc@gmail.com>

@mdw-octoml


… only to `/docs` (#9031) * Add script to look for changed in doc dir * Modify Jenkinsfile * Minor changes in scripts * Working Jenkinsfile on selective stages on docs * Pass groovy formater on Jenkinsfile * Implementation of relay_to_tir target hook (#8423) This the first new hook proposed in the Additional Target Hooks RFC, longer term the compilation should move to using `Target` proper but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc` Ideally the host target would be annotated onto the `IRModule` so as this `Pass` could use it instead of defaulting to C but this is fine for now. * [CUDA] Fix dense tensorcore legalize type error when units is specified (#9030) * Fix dense tensorcore legalize type error when units is specified * revert black change due to different version from CI * [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op (#9017) * [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op * Fix linter error for variable name and else after return * Separate quantized avg_pool impl and add TODO for global_avg_pool * Fix comment typo * Fix line break in `setup.py` (#9029) * [Onnx] Add SoftmaxCrossEntropyLoss (#8906) * nll loss v1 * add converter * decode strings in byte form * decode variable length inputs * make shapes correct * unsqueeze * proper weight handling * simplify if statement * fix tests * add comment about tests * delete extra file * lint * so cool * Update CI Lint Image Version (#8841) * Update CI Lint Image Version * trigger * [BUG] ToBasicBlockNormalForm immutability (#8778) * ToBasicBlockNormalForm immutability * better comment on ToBasicBlock * refine comment of ToBasicBlockForm * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (#8807) * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm This new benchmarking function is just a convenience function for calling 
time_evaluator on the underlying module. Hopefully this should make it easier for users to get good benchmarks of their code. * formatting * import order * more test, more comments, more precision * fix tests * add seconds descriptions to doc * Apply CPPLint to CRT Tests (#8844) This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to passing linting. * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (#8584) * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. Added initial tunable autotvm templates for depthwise conv2d with NHWC layout for Mali and Bifrost. * [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost. - Fix assert for Bifrost. - Set reasonable default axis splits to avoid using tophub for NHWC. - Fixed typo: arm cpu -> Mali. * [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost. * Support for CMSIS-NN in Corstone300 Makefile (#8831) Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a * [microtvm][Zephyr] Increase timeout to fix flaky tests (#8846) * increase timeout * trigger * [AMP] Bump up tolerance on flaky test (#8850) * bumpy up tol * bumped tolerance up even more * jostle ci * [Hexagon] Rework tvm.target.hexagon() interface (#8823) * [Hexagon] Rework tvm.target.hexagon() interface Make the tvm.target.hexagon() function take most options as keyword parameters. This will allow adding additional parameters without changing the interface. No changes are required to existing code, except for changing positional parameters following the CPU version to keyword parameters, and updating the names of the keyword parameters: sim_args -> sim_options, llvm_args -> llvm_options, although the old names will be accepted for the time being. 
* formatting * change ' to " * Rename 'args' to 'config' for clarity * Use 'strip' instad of 'replace' * Restart build * [Pattern matching] Add an option to rewrite the graph only once (#8843) * [Pattern matching] Add an option to rewrite the graph only once If the graph returned from the callback consists of the original pattern, the rewriter will run in the loop, which is not always desired. So this patch proposes an option to run the rewriter only once. Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06 * Make rewrite_once default to false Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0 * update gpu and cpu (#8853) * VTA cmake change to include Verilator header for building tsim library (#8797) * VTA cmake file require Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI * Refactor Verialtor check conditions * Build TSIM only for CPU target. CPU target don't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target. * remove build/libvta_tsim.so from non tsim targeting builds * Revert to enable TSIM build i386. Revert to -Werror in CPU config. 
Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings * [FIX] Bug fix for a floormod rewrite simplify rule (#8852) * Update rewrite_simplify.cc * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * move rust lint script (#8726) * [AMP] Disallow fp16 conversion for summation-like ops (#8810) * [AMP] Disallow fp16 conversion for summation-like ops * test only structural equality * [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (#8605) * [topi] add spconv2d_3x3 nhwc * [relay] sparse_conv2d: add kernel_size attr * [relay] add strategy for spconv2d_3x3 nhwc * [relay] pass to convert spconv2d with const args * [relay] convert sparse conv2d pass fixes * use array for sparse conv2d attr * fixup 1x1 tests; new 3x3 tests * extend repeat_interleave op for relay.Expr (#8839) Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> * Change AOT from ExprVisitor to MixedModeVisitor (#8856) This should allow better scale-ability for AOT when targeting larger networks. 
* Add a PaddlePaddle Frontend (#8645) * fix some problems for matmul * fix some problems for matmul * add alpha parameter for matmul * remove unnecessary condition * add TranslatedLayer which support model loaded by jit.load * add mul operator support * Add padding mode support for conv/pool2d * support 4 two-tuples * add paddle test case * add paddle conv2d case * update test_forward.py * fix paddle convert_matmul * add paddle multiply and matmul op test case * add test case and fix bug * delete import pandas * add paddlepaddle tests * modify the variable name of convert_reshape * formatting * formatting * use black to format python code * pylint check * Remove fluid api * black format Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> * [Runtime] add set_output_zero_copy (#8497) * Update graph_executor.h * Update graph_executor.cc * modify zero copy UT add set input zero copy * modify C style * add runtime test * realy build generatr the json Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> * [Hexagon] Change declaration order of unique_ptr objects to fix crash (#8859) A crash occurs when automatically deleting an instance of CodeGenHexagon because the LLVMContext object has already been freed. Objects of both types are created using unique_ptr, but the object managed by the LLVMContext unique_ptr is passed to CodeGenHexagon object (not as a unique_ptr). This crash is fixed by moving the declaration of the LLVMContext object before the CodeGenHexagon object. I'm not sure if this is the best way to fix this, but it does fix the crash. Also, in other files, the LLVMContext object is always created first. 
Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> * [Graph Executor, VM] Add end to end benchmarking of models (#8858) Add benchmarking that includes ovearhead of transfering inputs and outputs to and from the device. This should give an accurate measurement of the runtime a user would see when using the model. This is accomplished by adding functions that run from inputs to return values into the graph executor and the VM. * [UnitTests] Expose TVM pytest helpers as plugin (#8532) * [UnitTests] Expose TVM pytest helpers as plugin Previously, pytest helper utilities such as automatic parametrization of `target`/`dev`, or `tvm.testing.parameter` were only available for tests within the `${TVM_HOME}/tests` directory. This PR extracts the helper utilities into an importable plugin, which can be used in external tests (e.g. one-off debugging). * [UnitTests] Refactor the plugin-specific logic out into plugin.py. * [UnitTests] Moved marker definition out to global variable. * Remove AOT Executor header from Arduino project (#8857) * [Community] @mdw-octoml -> Reviewer (#8868) * [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (#8855) * init * fix * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * address Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * [Autoscheduler] Configurable workload keys (#8862) * change workload keys * remove binary string comparison * append the tuple not every integer * clean up * lint * dump workload keys to dags * fix things * change some strings * misc fixes, add tests * jostle ci * [Tutorial][Executor] Fix the usage of executors in tutorials (#8586) * fix: executor usage for keras tutorial * fix: executor usage for onnx tutorial * 
[Tutorial][Executor] Fix executors in tutorials * [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (#8867) * Simplify onnx input since name accesses are no longer supported. * move Celu importer. * [TIR] GetBlockReadWriteRegion (#8875) * [TIR] GetBlockReadWriteRegion * Fix black issue * Use constant reference for the interface * Fix lint issue * [RISCV] Add support for llvm parameter -mabi (-target-abi) (#8860) * [Community] @manupa-arm -> Committer (#8870) * adding Manupa to the contributors list * re-trigger CI * [RPC] Fix ios_rpc build (#8864) * [Vulkan][Target] Added the driver name to the vulkan target string. (#8882) Driver name (e.g. "NVIDIA", "radv", "AMD open-source driver") is read from the `driverName` property in [VkPhysicalDeviceDriverProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceDriverProperties.html), or is left as `"unknown_driver_name"` if the driver does not support querying the driver name. * [ONNX][TOPI] Support select_last_index for argmin/max (#8816) * support select_last_index for argmin/max * reverse conditions which made on accident * forward args in reduce.py * make proper nodes for reduction ops * remove complicated nested lambdas * fix lambda capture for conversion * forward more arguments * forward more args * enable onnx tests * wrapping casts to remove ambiguity * revert changes extraneous * correct incorrect attrs being used for ops * change attributes * remove old impl * register new attribute node * clean up test * reformat * reformat * coolio * stable comparison * casts to avoid ambiguity * casting more * correct arg passing * support select_last_index for argmin/max * reverse conditions which made on accident * forward args in reduce.py * make proper nodes for reduction ops * remove complicated nested lambdas * fix lambda capture for conversion * forward more arguments * forward more args * enable onnx tests * wrapping casts to remove ambiguity * revert 
changes extraneous * correct incorrect attrs being used for ops * change attributes * remove old impl * register new attribute node * clean up test * reformat * reformat * coolio * stable comparison * casts to avoid ambiguity * casting more * correct arg passing * fix broken input * OneElementReduceAttrs-->ArgReduceAttrs" * reduce boilerplate * change names * remove log statement * jostle ci Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> * refactor optimize GEMM on CPU tutorial (#8825) * refactor optimize GEMM on CPU tutorial * fix lint errors * fix more lint errors * fix typo * fix problem with redefinition of `k` add TODO and comments around loop unrolling clarify note on the array packing figure * reword general description of array packing * grap kaxis from compute definition * remove duplicate comments on unrolling * Change target string to Target object in the TE compiler and interpreter (#8835) * # This is a combination of 2 commits. # This is the 1st commit message: Initial changes # This is the commit message #2: Ftarget string -> Target object works! * Fix remaining target strings * fix bad rebase * Fix typo * 1 more bad rebase fix * Lint * typo * Forgot to commit this * Add TargetStrHash and Map<Target... to std::unordered_map<Target... 
conversion fn * Passing most tests, yay * remove some comments * lint * target-str-to-target-object * Respond to change requests Co-authored-by: Jared Roesch <roeschinc@gmail.com> * [TensorIR][M2a] CacheRead/Write (#8863) Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> * [CI] make pre-commit hooks to run on every push instead of every commit (#8888) * [TVMScript] Fix printing ForNode annotations (#8891) * [1/10] CMSIS-NN graph partitioner for softmax (#8653) * cmsis graph partitioner for softmax Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83 * Updated docstring in the partioning function Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055 * [microTVM][RVM] Add Arduino RVM (#8748) * Functioning Arduino Vagrant VM Begin building Arduino Vagrant VM Mostly working Vagrant VM Changes for debugging Add ignored json file Fix venv path * Generalize parts of RVM for multiple platforms cwd hack Add unit tests from apps directory to task_python_microtvm.sh Generalize parts of RVM for multiple platforms * Add Vagrantfile lint exceptions * Address PR comments Address Mehrdad's PR comments More PR comments Documentation tweaks Add dialout group to user * Rerun tests * Spresense fix * Rerun CI tests * Rerun tests * sce loss example * add comments, remove other tests * lint * lint * jostle * lint up * jostle * uncomment some tests * proper return * clean up * lint * minor merge errors Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Tristan Konolige <tkonolige@octoml.ai> Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com> Co-authored-by: Anastasia 
Stulova <38433336+AnastasiaStulova@users.noreply.github.com> Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com> Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com> Co-authored-by: Elen Kalda <elen.kalda@arm.com> Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com> Co-authored-by: Chenfan <jcf94@outlook.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Tantalus13A98B5F <jsl_713@live.com> Co-authored-by: Valery Chernov <black.chervi@gmail.com> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: Jason <928090362@qq.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Swift.Sun <sunjiwei@yeah.net> Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> Co-authored-by: Lunderberg <Lunderberg@users.noreply.github.com> Co-authored-by: Yizhi Liu <liuyizhi@apache.org> Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Josh Fromm <jwfromm@octoml.ai> Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai> Co-authored-by: Egor Churaev <egor.churaev@gmail.com> Co-authored-by: Adam Straw <astraw@octoml.ai> Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com> Co-authored-by: Jared Roesch <roeschinc@gmail.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Michalis Papadimitriou <mikepapadim@users.noreply.github.com> Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com> * [Hexagon] Don't use {} initialization with FastRPC structures (#9033) The 
data members in FastRPC structures aren't guaranteed to remain in the same order. Replace aggregate initialization with direct, member-by-member initialization. * Test * Minor checkstyle issue * Test * Test file * Revert changed in unit tests * Change script name * Test * Revert format on groovy file * Remove test file * Minor change in script * Minor formating changes * Revert logic in conditions for changed files Co-authored-by: Christopher Sidebottom <christopher.sidebottom@arm.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Anirudh Sundar <quic_sanirudh@quicinc.com> Co-authored-by: Leandro Nunes <leandro.nunes@arm.com> Co-authored-by: AndrewZhaoLuo <andrew.zhao.luo@gmail.com> Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Tristan Konolige <tkonolige@octoml.ai> Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com> Co-authored-by: Anastasia Stulova <38433336+AnastasiaStulova@users.noreply.github.com> Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com> Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com> Co-authored-by: Elen Kalda <elen.kalda@arm.com> Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com> Co-authored-by: Chenfan <jcf94@outlook.com> Co-authored-by: Tantalus13A98B5F <jsl_713@live.com> Co-authored-by: Valery Chernov <black.chervi@gmail.com> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: Jason <928090362@qq.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Swift.Sun <sunjiwei@yeah.net> Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> Co-authored-by: Lunderberg 
<Lunderberg@users.noreply.github.com> Co-authored-by: Yizhi Liu <liuyizhi@apache.org> Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Josh Fromm <jwfromm@octoml.ai> Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai> Co-authored-by: Egor Churaev <egor.churaev@gmail.com> Co-authored-by: Adam Straw <astraw@octoml.ai> Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com> Co-authored-by: Jared Roesch <roeschinc@gmail.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com>
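
The Hexagon entry in the commit message above (#8859) attributes the crash to the order in which two `unique_ptr`-managed objects are destroyed. A minimal sketch of that C++ rule follows, assuming hypothetical stand-in types `Context` and `CodeGen` (these are not the real `LLVMContext` or `CodeGenHexagon` classes): objects in the same scope are destroyed in reverse order of declaration, so declaring the context first means the code generator holding a raw pointer to it is torn down while the context is still alive.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records destructor calls so the order can be inspected safely.
static std::vector<std::string> g_destruction_log;

// Hypothetical stand-in for the context object (not the real LLVMContext).
struct Context {
  ~Context() { g_destruction_log.push_back("~Context"); }
};

// Hypothetical stand-in for the code generator (not the real CodeGenHexagon).
// It keeps a non-owning pointer to the context, as described in the commit message.
struct CodeGen {
  explicit CodeGen(Context* ctx) : ctx_(ctx) {}
  ~CodeGen() { g_destruction_log.push_back("~CodeGen"); }
  Context* ctx_;
};

// Declares the context BEFORE the code generator and returns the order
// in which the two destructors ran when the scope ended.
std::vector<std::string> DestructionOrderContextFirst() {
  g_destruction_log.clear();
  {
    std::unique_ptr<Context> ctx(new Context());
    std::unique_ptr<CodeGen> cg(new CodeGen(ctx.get()));
  }  // cg is destroyed first, then ctx (reverse declaration order)
  return g_destruction_log;
}
```

Calling `DestructionOrderContextFirst()` yields `~CodeGen` followed by `~Context`. Swapping the two declarations would reverse that order, leaving the code generator with a dangling pointer during its own destruction, which matches the failure mode the commit message describes.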

@mdw-octoml

… only to `/docs` (apache#9031) * Add script to look for changed in doc dir * Modify Jenkinsfile * Minor changes in scripts * Working Jenkinsfile on selective stages on docs * Pass groovy formater on Jenkinsfile * Implementation of relay_to_tir target hook (apache#8423) This the first new hook proposed in the Additional Target Hooks RFC, longer term the compilation should move to using `Target` proper but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc` Ideally the host target would be annotated onto the `IRModule` so as this `Pass` could use it instead of defaulting to C but this is fine for now. * [CUDA] Fix dense tensorcore legalize type error when units is specified (apache#9030) * Fix dense tensorcore legalize type error when units is specified * revert black change due to different version from CI * [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op (apache#9017) * [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op * Fix linter error for variable name and else after return * Separate quantized avg_pool impl and add TODO for global_avg_pool * Fix comment typo * Fix line break in `setup.py` (apache#9029) * [Onnx] Add SoftmaxCrossEntropyLoss (apache#8906) * nll loss v1 * add converter * decode strings in byte form * decode variable length inputs * make shapes correct * unsqueeze * proper weight handling * simplify if statement * fix tests * add comment about tests * delete extra file * lint * so cool * Update CI Lint Image Version (apache#8841) * Update CI Lint Image Version * trigger * [BUG] ToBasicBlockNormalForm immutability (apache#8778) * ToBasicBlockNormalForm immutability * better comment on ToBasicBlock * refine comment of ToBasicBlockForm * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (apache#8807) * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm This new benchmarking 
function is just a convenience function for calling time_evaluator on the underlying module. Hopefully this should make it easier for users to get good benchmarks of their code. * formatting * import order * more test, more comments, more precision * fix tests * add seconds descriptions to doc * Apply CPPLint to CRT Tests (apache#8844) This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to passing linting. * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (apache#8584) * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. Added initial tunable autotvm templates for depthwise conv2d with NHWC layout for Mali and Bifrost. * [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost. - Fix assert for Bifrost. - Set reasonable default axis splits to avoid using tophub for NHWC. - Fixed typo: arm cpu -> Mali. * [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost. * Support for CMSIS-NN in Corstone300 Makefile (apache#8831) Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a * [microtvm][Zephyr] Increase timeout to fix flaky tests (apache#8846) * increase timeout * trigger * [AMP] Bump up tolerance on flaky test (apache#8850) * bumpy up tol * bumped tolerance up even more * jostle ci * [Hexagon] Rework tvm.target.hexagon() interface (apache#8823) * [Hexagon] Rework tvm.target.hexagon() interface Make the tvm.target.hexagon() function take most options as keyword parameters. This will allow adding additional parameters without changing the interface. No changes are required to existing code, except for changing positional parameters following the CPU version to keyword parameters, and updating the names of the keyword parameters: sim_args -> sim_options, llvm_args -> llvm_options, although the old names will be accepted for the time being. 
* formatting * change ' to " * Rename 'args' to 'config' for clarity * Use 'strip' instad of 'replace' * Restart build * [Pattern matching] Add an option to rewrite the graph only once (apache#8843) * [Pattern matching] Add an option to rewrite the graph only once If the graph returned from the callback consists of the original pattern, the rewriter will run in the loop, which is not always desired. So this patch proposes an option to run the rewriter only once. Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06 * Make rewrite_once default to false Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0 * update gpu and cpu (apache#8853) * VTA cmake change to include Verilator header for building tsim library (apache#8797) * VTA cmake file require Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI * Refactor Verialtor check conditions * Build TSIM only for CPU target. CPU target don't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target. * remove build/libvta_tsim.so from non tsim targeting builds * Revert to enable TSIM build i386. Revert to -Werror in CPU config. 
Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings * [FIX] Bug fix for a floormod rewrite simplify rule (apache#8852) * Update rewrite_simplify.cc * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * move rust lint script (apache#8726) * [AMP] Disallow fp16 conversion for summation-like ops (apache#8810) * [AMP] Disallow fp16 conversion for summation-like ops * test only structural equality * [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (apache#8605) * [topi] add spconv2d_3x3 nhwc * [relay] sparse_conv2d: add kernel_size attr * [relay] add strategy for spconv2d_3x3 nhwc * [relay] pass to convert spconv2d with const args * [relay] convert sparse conv2d pass fixes * use array for sparse conv2d attr * fixup 1x1 tests; new 3x3 tests * extend repeat_interleave op for relay.Expr (apache#8839) Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> * Change AOT from ExprVisitor to MixedModeVisitor (apache#8856) This should allow better scale-ability for AOT when targeting larger networks. 

…ter (apache#8835) * # This is a combination of 2 commits. # This is the 1st commit message: Initial changes # This is the commit message #2: Ftarget string -> Target object works! * Fix remaining target strings * fix bad rebase * Fix typo * 1 more bad rebase fix * Lint * typo * Forgot to commit this * Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn * Passing most tests, yay * remove some comments * lint * target-str-to-target-object * Respond to change requests Co-authored-by: Jared Roesch <roeschinc@gmail.com>
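
The commit list above mentions adding `TargetStrHash` and a `Map<Target...` to `std::unordered_map<Target...` conversion function, without showing either. A rough sketch of the general idea, hashing and comparing keys by their string form so that distinct objects with identical strings share one map entry, might look like the following. Everything here is an assumption for illustration: `FakeTarget`, `StrHash`, `StrEqual`, and `GroupByTargetString` are hypothetical names, not TVM's actual types or implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a target object; only its string form matters here.
struct FakeTarget {
  std::string str;
};
using TargetPtr = std::shared_ptr<FakeTarget>;

// Hash a target by its string form rather than by object identity.
struct StrHash {
  std::size_t operator()(const TargetPtr& t) const {
    return std::hash<std::string>()(t->str);
  }
};

// Two targets are equal when their string forms are equal.
struct StrEqual {
  bool operator()(const TargetPtr& a, const TargetPtr& b) const {
    return a->str == b->str;
  }
};

using FuncsByTarget =
    std::unordered_map<TargetPtr, std::vector<std::string>, StrHash, StrEqual>;

// Groups function names under one key per distinct target string.
FuncsByTarget GroupByTargetString(
    const std::vector<std::pair<TargetPtr, std::string>>& entries) {
  FuncsByTarget grouped;
  for (const auto& entry : entries) {
    grouped[entry.first].push_back(entry.second);
  }
  return grouped;
}

// Two separately constructed targets with the same string end up in one group.
std::size_t DemoGroupCount() {
  auto a = std::make_shared<FakeTarget>(FakeTarget{"llvm -keys=cpu -link-params=0"});
  auto b = std::make_shared<FakeTarget>(FakeTarget{"llvm -keys=cpu -link-params=0"});
  std::vector<std::pair<TargetPtr, std::string>> entries;
  entries.emplace_back(a, "intrp_fused_add");
  entries.emplace_back(b, "shape_func_add");
  return GroupByTargetString(entries).size();
}
```

With the string-based hash and equality, `DemoGroupCount()` returns 1. With the default pointer-based hashing of `std::shared_ptr`, the same two entries would land under two separate keys even though the targets print identically.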


* Add a PaddlePaddle Frontend (apache#8645) * fix some problems for matmul * fix some problems for matmul * add alpha parameter for matmul * remove unnecessary condition * add TranslatedLayer which support model loaded by jit.load * add mul operator support * Add padding mode support for conv/pool2d * support 4 two-tuples * add paddle test case * add paddle conv2d case * update test_forward.py * fix paddle convert_matmul * add paddle multiply and matmul op test case * add test case and fix bug * delete import pandas * add paddlepaddle tests * modify the variable name of convert_reshape * formatting * formatting * use black to format python code * pylint check * Remove fluid api * black format Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> * [Runtime] add set_output_zero_copy (apache#8497) * Update graph_executor.h * Update graph_executor.cc * modify zero copy UT add set input zero copy * modify C style * add runtime test * realy build generatr the json Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> * [Hexagon] Change declaration order of unique_ptr objects to fix crash (apache#8859) A crash occurs when automatically deleting an instance of CodeGenHexagon because the LLVMContext object has already been freed. Objects of both types are created using unique_ptr, but the object managed by the LLVMContext unique_ptr is passed to CodeGenHexagon object (not as a unique_ptr). This crash is fixed by moving the declaration of the LLVMContext object before the CodeGenHexagon object. I'm not sure if this is the best way to fix this, but it does fix the crash. Also, in other files, the LLVMContext object is always created first. 
Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> * [Graph Executor, VM] Add end to end benchmarking of models (apache#8858) Add benchmarking that includes ovearhead of transfering inputs and outputs to and from the device. This should give an accurate measurement of the runtime a user would see when using the model. This is accomplished by adding functions that run from inputs to return values into the graph executor and the VM. * [UnitTests] Expose TVM pytest helpers as plugin (apache#8532) * [UnitTests] Expose TVM pytest helpers as plugin Previously, pytest helper utilities such as automatic parametrization of `target`/`dev`, or `tvm.testing.parameter` were only available for tests within the `${TVM_HOME}/tests` directory. This PR extracts the helper utilities into an importable plugin, which can be used in external tests (e.g. one-off debugging). * [UnitTests] Refactor the plugin-specific logic out into plugin.py. * [UnitTests] Moved marker definition out to global variable. * Remove AOT Executor header from Arduino project (apache#8857) * [Community] @mdw-octoml -> Reviewer (apache#8868) * [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (apache#8855) * init * fix * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * address Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * [Autoscheduler] Configurable workload keys (apache#8862) * change workload keys * remove binary string comparison * append the tuple not every integer * clean up * lint * dump workload keys to dags * fix things * change some strings * misc fixes, add tests * jostle ci * [Tutorial][Executor] Fix the usage of executors in tutorials (apache#8586) * fix: executor usage for keras tutorial * fix: 
executor usage for onnx tutorial * [Tutorial][Executor] Fix executors in tutorials * [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (apache#8867) * Simplify onnx input since name accesses are no longer supported. * move Celu importer. * [TIR] GetBlockReadWriteRegion (apache#8875) * [TIR] GetBlockReadWriteRegion * Fix black issue * Use constant reference for the interface * Fix lint issue * [RISCV] Add support for llvm parameter -mabi (-target-abi) (apache#8860) * [Community] @manupa-arm -> Committer (apache#8870) * adding Manupa to the contributors list * re-trigger CI * [RPC] Fix ios_rpc build (apache#8864) * [Vulkan][Target] Added the driver name to the vulkan target string. (apache#8882) Driver name (e.g. "NVIDIA", "radv", "AMD open-source driver") is read from the `driverName` property in [VkPhysicalDeviceDriverProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceDriverProperties.html), or is left as `"unknown_driver_name"` if the driver does not support querying the driver name. 
* [ONNX][TOPI] Support select_last_index for argmin/max (apache#8816) * support select_last_index for argmin/max * reverse conditions which made on accident * forward args in reduce.py * make proper nodes for reduction ops * remove complicated nested lambdas * fix lambda capture for conversion * forward more arguments * forward more args * enable onnx tests * wrapping casts to remove ambiguity * revert changes extraneous * correct incorrect attrs being used for ops * change attributes * remove old impl * register new attribute node * clean up test * reformat * reformat * coolio * stable comparison * casts to avoid ambiguity * casting more * correct arg passing * support select_last_index for argmin/max * reverse conditions which made on accident * forward args in reduce.py * make proper nodes for reduction ops * remove complicated nested lambdas * fix lambda capture for conversion * forward more arguments * forward more args * enable onnx tests * wrapping casts to remove ambiguity * revert changes extraneous * correct incorrect attrs being used for ops * change attributes * remove old impl * register new attribute node * clean up test * reformat * reformat * coolio * stable comparison * casts to avoid ambiguity * casting more * correct arg passing * fix broken input * OneElementReduceAttrs-->ArgReduceAttrs" * reduce boilerplate * change names * remove log statement * jostle ci Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> * refactor optimize GEMM on CPU tutorial (apache#8825) * refactor optimize GEMM on CPU tutorial * fix lint errors * fix more lint errors * fix typo * fix problem with redefinition of `k` add TODO and comments around loop unrolling clarify note on the array packing figure * reword general description of array packing * grap kaxis from compute definition * remove duplicate comments on unrolling * Change target string to Target object in the TE compiler and interpreter (apache#8835) * # This is a combination of 2 commits. 
# This is the 1st commit message: Initial changes # This is the commit message apache#2: Ftarget string -> Target object works! * Fix remaining target strings * fix bad rebase * Fix typo * 1 more bad rebase fix * Lint * typo * Forgot to commit this * Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn * Passing most tests, yay * remove some comments * lint * target-str-to-target-object * Respond to change requests Co-authored-by: Jared Roesch <roeschinc@gmail.com> * [TensorIR][M2a] CacheRead/Write (apache#8863) Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> * [CI] make pre-commit hooks to run on every push instead of every commit (apache#8888) * [TVMScript] Fix printing ForNode annotations (apache#8891) * [1/10] CMSIS-NN graph partitioner for softmax (apache#8653) * cmsis graph partitioner for softmax Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83 * Updated docstring in the partioning function Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055 * [microTVM][RVM] Add Arduino RVM (apache#8748) * Functioning Arduino Vagrant VM Begin building Arduino Vagrant VM Mostly working Vagrant VM Changes for debugging Add ignored json file Fix venv path * Generalize parts of RVM for multiple platforms cwd hack Add unit tests from apps directory to task_python_microtvm.sh Generalize parts of RVM for multiple platforms * Add Vagrantfile lint exceptions * Address PR comments Address Mehrdad's PR comments More PR comments Documentation tweaks Add dialout group to user * Rerun tests * Spresense fix * Rerun CI tests * Rerun tests * sce loss example * add comments, remove other tests * lint * lint * jostle * lint up * jostle * uncomment some tests * proper return * clean 
up * lint * minor merge errors Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Tristan Konolige <tkonolige@octoml.ai> Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com> Co-authored-by: Anastasia Stulova <38433336+AnastasiaStulova@users.noreply.github.com> Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com> Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com> Co-authored-by: Elen Kalda <elen.kalda@arm.com> Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com> Co-authored-by: Chenfan <jcf94@outlook.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Tantalus13A98B5F <jsl_713@live.com> Co-authored-by: Valery Chernov <black.chervi@gmail.com> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: Jason <928090362@qq.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Swift.Sun <sunjiwei@yeah.net> Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> Co-authored-by: Lunderberg <Lunderberg@users.noreply.github.com> Co-authored-by: Yizhi Liu <liuyizhi@apache.org> Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Josh Fromm <jwfromm@octoml.ai> Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai> Co-authored-by: Egor Churaev <egor.churaev@gmail.com> Co-authored-by: Adam Straw <astraw@octoml.ai> Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com> Co-authored-by: Jared Roesch <roeschinc@gmail.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Wuwei Lin 
<wuwei@apache.org> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Michalis Papadimitriou <mikepapadim@users.noreply.github.com> Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com>

@mdw-octoml

… only to `/docs` (apache#9031) * Add script to look for changed in doc dir * Modify Jenkinsfile * Minor changes in scripts * Working Jenkinsfile on selective stages on docs * Pass groovy formater on Jenkinsfile * Implementation of relay_to_tir target hook (apache#8423) This the first new hook proposed in the Additional Target Hooks RFC, longer term the compilation should move to using `Target` proper but this unblocks our current work whilst illustrating the eventual interface via `Target` in `src/relay/backend/contrib/example_target_hooks/relay_to_tir.cc` Ideally the host target would be annotated onto the `IRModule` so as this `Pass` could use it instead of defaulting to C but this is fine for now. * [CUDA] Fix dense tensorcore legalize type error when units is specified (apache#9030) * Fix dense tensorcore legalize type error when units is specified * revert black change due to different version from CI * [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op (apache#9017) * [ONNX] QLinearAveragePool and QLinearGlobalAveragePool contrib op * Fix linter error for variable name and else after return * Separate quantized avg_pool impl and add TODO for global_avg_pool * Fix comment typo * Fix line break in `setup.py` (apache#9029) * [Onnx] Add SoftmaxCrossEntropyLoss (apache#8906) * nll loss v1 * add converter * decode strings in byte form * decode variable length inputs * make shapes correct * unsqueeze * proper weight handling * simplify if statement * fix tests * add comment about tests * delete extra file * lint * so cool * Update CI Lint Image Version (apache#8841) * Update CI Lint Image Version * trigger * [BUG] ToBasicBlockNormalForm immutability (apache#8778) * ToBasicBlockNormalForm immutability * better comment on ToBasicBlock * refine comment of ToBasicBlockForm * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm (apache#8807) * [GRAPH EXECUTOR,VM] Add benchmarking function to graph executor and vm This new benchmarking 
function is just a convenience function for calling time_evaluator on the underlying module. Hopefully this should make it easier for users to get good benchmarks of their code. * formatting * import order * more test, more comments, more precision * fix tests * add seconds descriptions to doc * Apply CPPLint to CRT Tests (apache#8844) This one was a bit trickier as there was more usage of dynamic arrays and less safe casts. I've tried to minimise the changes to just those required to passing linting. * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. (apache#8584) * [Relay][TOPI] Support of depthwise conv2d NHWC for Mali/Bifrost. Added initial tunable autotvm templates for depthwise conv2d with NHWC layout for Mali and Bifrost. * [Relay][TOPI] Misc fixes for depthwise conv2d Mali/Bifrost. - Fix assert for Bifrost. - Set reasonable default axis splits to avoid using tophub for NHWC. - Fixed typo: arm cpu -> Mali. * [Relay][TOPI] Fixed formatting in depthwise conv2d Mali/Bifrost. * Support for CMSIS-NN in Corstone300 Makefile (apache#8831) Change-Id: Ifc2305db4e11d1d15d45407287f8f0bea469100a * [microtvm][Zephyr] Increase timeout to fix flaky tests (apache#8846) * increase timeout * trigger * [AMP] Bump up tolerance on flaky test (apache#8850) * bumpy up tol * bumped tolerance up even more * jostle ci * [Hexagon] Rework tvm.target.hexagon() interface (apache#8823) * [Hexagon] Rework tvm.target.hexagon() interface Make the tvm.target.hexagon() function take most options as keyword parameters. This will allow adding additional parameters without changing the interface. No changes are required to existing code, except for changing positional parameters following the CPU version to keyword parameters, and updating the names of the keyword parameters: sim_args -> sim_options, llvm_args -> llvm_options, although the old names will be accepted for the time being. 
* formatting * change ' to " * Rename 'args' to 'config' for clarity * Use 'strip' instad of 'replace' * Restart build * [Pattern matching] Add an option to rewrite the graph only once (apache#8843) * [Pattern matching] Add an option to rewrite the graph only once If the graph returned from the callback consists of the original pattern, the rewriter will run in the loop, which is not always desired. So this patch proposes an option to run the rewriter only once. Change-Id: I85cf0a055b8961d52394f21c1e4d7aad0a7e1d06 * Make rewrite_once default to false Change-Id: Idf6f01f254c403158883681e75c2a5978efbd2d0 * update gpu and cpu (apache#8853) * VTA cmake change to include Verilator header for building tsim library (apache#8797) * VTA cmake file require Verilator include for tsim target. VTA module.cc uses svOpenArrayHandle to send wide data through DPI * Refactor Verialtor check conditions * Build TSIM only for CPU target. CPU target don't use -Werror to compile with Verilator. Jenkinsfile to have tvm_multilib_tsim defined for CPU build target. * remove build/libvta_tsim.so from non tsim targeting builds * Revert to enable TSIM build i386. Revert to -Werror in CPU config. 
Remove verilator CPP objects from cmake config for tsim and put them as include into vta module.cc to avoid Verilator compilation warnings * [FIX] Bug fix for a floormod rewrite simplify rule (apache#8852) * Update rewrite_simplify.cc * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * Update test_arith_rewrite_simplify.py * move rust lint script (apache#8726) * [AMP] Disallow fp16 conversion for summation-like ops (apache#8810) * [AMP] Disallow fp16 conversion for summation-like ops * test only structural equality * [TOPI] [Relay] Sparse Conv2d Implementation for 3x3 kernels (apache#8605) * [topi] add spconv2d_3x3 nhwc * [relay] sparse_conv2d: add kernel_size attr * [relay] add strategy for spconv2d_3x3 nhwc * [relay] pass to convert spconv2d with const args * [relay] convert sparse conv2d pass fixes * use array for sparse conv2d attr * fixup 1x1 tests; new 3x3 tests * extend repeat_interleave op for relay.Expr (apache#8839) Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> * Change AOT from ExprVisitor to MixedModeVisitor (apache#8856) This should allow better scale-ability for AOT when targeting larger networks. 
* Add a PaddlePaddle Frontend (apache#8645) * fix some problems for matmul * fix some problems for matmul * add alpha parameter for matmul * remove unnecessary condition * add TranslatedLayer which support model loaded by jit.load * add mul operator support * Add padding mode support for conv/pool2d * support 4 two-tuples * add paddle test case * add paddle conv2d case * update test_forward.py * fix paddle convert_matmul * add paddle multiply and matmul op test case * add test case and fix bug * delete import pandas * add paddlepaddle tests * modify the variable name of convert_reshape * formatting * formatting * use black to format python code * pylint check * Remove fluid api * black format Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> * [Runtime] add set_output_zero_copy (apache#8497) * Update graph_executor.h * Update graph_executor.cc * modify zero copy UT add set input zero copy * modify C style * add runtime test * realy build generatr the json Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> * [Hexagon] Change declaration order of unique_ptr objects to fix crash (apache#8859) A crash occurs when automatically deleting an instance of CodeGenHexagon because the LLVMContext object has already been freed. Objects of both types are created using unique_ptr, but the object managed by the LLVMContext unique_ptr is passed to CodeGenHexagon object (not as a unique_ptr). This crash is fixed by moving the declaration of the LLVMContext object before the CodeGenHexagon object. I'm not sure if this is the best way to fix this, but it does fix the crash. Also, in other files, the LLVMContext object is always created first. 
Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> * [Graph Executor, VM] Add end to end benchmarking of models (apache#8858) Add benchmarking that includes ovearhead of transfering inputs and outputs to and from the device. This should give an accurate measurement of the runtime a user would see when using the model. This is accomplished by adding functions that run from inputs to return values into the graph executor and the VM. * [UnitTests] Expose TVM pytest helpers as plugin (apache#8532) * [UnitTests] Expose TVM pytest helpers as plugin Previously, pytest helper utilities such as automatic parametrization of `target`/`dev`, or `tvm.testing.parameter` were only available for tests within the `${TVM_HOME}/tests` directory. This PR extracts the helper utilities into an importable plugin, which can be used in external tests (e.g. one-off debugging). * [UnitTests] Refactor the plugin-specific logic out into plugin.py. * [UnitTests] Moved marker definition out to global variable. * Remove AOT Executor header from Arduino project (apache#8857) * [Community] @mdw-octoml -> Reviewer (apache#8868) * [TIR] Fix opaque access in buffer locator pass and match_buffer in region detector (apache#8855) * init * fix * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * Update src/tir/transforms/plan_update_buffer_allocation_location.cc Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * address Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> * [Autoscheduler] Configurable workload keys (apache#8862) * change workload keys * remove binary string comparison * append the tuple not every integer * clean up * lint * dump workload keys to dags * fix things * change some strings * misc fixes, add tests * jostle ci * [Tutorial][Executor] Fix the usage of executors in tutorials (apache#8586) * fix: executor usage for keras tutorial * fix: 
executor usage for onnx tutorial * [Tutorial][Executor] Fix executors in tutorials * [Frontend][Onnx] Simplify onnx input since name accesses are not reliable. (apache#8867) * Simplify onnx input since name accesses are no longer supported. * move Celu importer. * [TIR] GetBlockReadWriteRegion (apache#8875) * [TIR] GetBlockReadWriteRegion * Fix black issue * Use constant reference for the interface * Fix lint issue * [RISCV] Add support for llvm parameter -mabi (-target-abi) (apache#8860) * [Community] @manupa-arm -> Committer (apache#8870) * adding Manupa to the contributors list * re-trigger CI * [RPC] Fix ios_rpc build (apache#8864) * [Vulkan][Target] Added the driver name to the vulkan target string. (apache#8882) Driver name (e.g. "NVIDIA", "radv", "AMD open-source driver") is read from the `driverName` property in [VkPhysicalDeviceDriverProperties](https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/VkPhysicalDeviceDriverProperties.html), or is left as `"unknown_driver_name"` if the driver does not support querying the driver name. 
* [ONNX][TOPI] Support select_last_index for argmin/max (apache#8816) * support select_last_index for argmin/max * reverse conditions which made on accident * forward args in reduce.py * make proper nodes for reduction ops * remove complicated nested lambdas * fix lambda capture for conversion * forward more arguments * forward more args * enable onnx tests * wrapping casts to remove ambiguity * revert changes extraneous * correct incorrect attrs being used for ops * change attributes * remove old impl * register new attribute node * clean up test * reformat * reformat * coolio * stable comparison * casts to avoid ambiguity * casting more * correct arg passing * support select_last_index for argmin/max * reverse conditions which made on accident * forward args in reduce.py * make proper nodes for reduction ops * remove complicated nested lambdas * fix lambda capture for conversion * forward more arguments * forward more args * enable onnx tests * wrapping casts to remove ambiguity * revert changes extraneous * correct incorrect attrs being used for ops * change attributes * remove old impl * register new attribute node * clean up test * reformat * reformat * coolio * stable comparison * casts to avoid ambiguity * casting more * correct arg passing * fix broken input * OneElementReduceAttrs-->ArgReduceAttrs" * reduce boilerplate * change names * remove log statement * jostle ci Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> * refactor optimize GEMM on CPU tutorial (apache#8825) * refactor optimize GEMM on CPU tutorial * fix lint errors * fix more lint errors * fix typo * fix problem with redefinition of `k` add TODO and comments around loop unrolling clarify note on the array packing figure * reword general description of array packing * grap kaxis from compute definition * remove duplicate comments on unrolling * Change target string to Target object in the TE compiler and interpreter (apache#8835) * # This is a combination of 2 commits. 
Initial changes Ftarget string -> Target object works! * Fix remaining target strings * fix bad rebase * Fix typo * 1 more bad rebase fix * Lint * typo * Forgot to commit this * Add TargetStrHash and Map<Target... to std::unordered_map<Target... conversion fn * Passing most tests, yay * remove some comments * lint * target-str-to-target-object * Respond to change requests Co-authored-by: Jared Roesch <roeschinc@gmail.com> * [TensorIR][M2a] CacheRead/Write (apache#8863) Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> * [CI] make pre-commit hooks to run on every push instead of every commit (apache#8888) * [TVMScript] Fix printing ForNode annotations (apache#8891) * [1/10] CMSIS-NN graph partitioner for softmax (apache#8653) * cmsis graph partitioner for softmax Change-Id: I80ecd7bc5351f241b4674ef53b36e4398c8adb83 * Updated docstring in the partioning function Change-Id: Ieb4b623e5929cfdb6aa0235db64c825fac8d7055 * [microTVM][RVM] Add Arduino RVM (apache#8748) * Functioning Arduino Vagrant VM Begin building Arduino Vagrant VM Mostly working Vagrant VM Changes for debugging Add ignored json file Fix venv path * Generalize parts of RVM for multiple platforms cwd hack Add unit tests from apps directory to task_python_microtvm.sh Generalize parts of RVM for multiple platforms * Add Vagrantfile lint exceptions * Address PR comments Address Mehrdad's PR comments More PR comments Documentation tweaks Add dialout group to user * Rerun tests * Spresense fix * Rerun CI tests * Rerun tests * sce loss example * add comments, remove other tests * lint * lint * jostle * lint up * jostle * uncomment some tests * proper return * clean up * lint * minor merge errors Co-authored-by: Andrew Zhao Luo 
<andrewzhaoluo@system76-pc.localdomain> Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Tristan Konolige <tkonolige@octoml.ai> Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com> Co-authored-by: Anastasia Stulova <38433336+AnastasiaStulova@users.noreply.github.com> Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com> Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com> Co-authored-by: Elen Kalda <elen.kalda@arm.com> Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com> Co-authored-by: Chenfan <jcf94@outlook.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Tantalus13A98B5F <jsl_713@live.com> Co-authored-by: Valery Chernov <black.chervi@gmail.com> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: Jason <928090362@qq.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 <wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Swift.Sun <sunjiwei@yeah.net> Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> Co-authored-by: Lunderberg <Lunderberg@users.noreply.github.com> Co-authored-by: Yizhi Liu <liuyizhi@apache.org> Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Josh Fromm <jwfromm@octoml.ai> Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai> Co-authored-by: Egor Churaev <egor.churaev@gmail.com> Co-authored-by: Adam Straw <astraw@octoml.ai> Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com> Co-authored-by: Jared Roesch <roeschinc@gmail.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Hongyi Jin <3231950289@qq.com> 
Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Michalis Papadimitriou <mikepapadim@users.noreply.github.com> Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com> * [Hexagon] Don't use {} initialization with FastRPC structures (apache#9033) The data members in FastRPC structures aren't guaranteed to remain in the same order. Replace aggregate initialization with direct, member-by-member initialization. * Test * Minor checkstyle issue * Test * Test file * Revert changed in unit tests * Change script name * Test * Revert format on groovy file * Remove test file * Minor change in script * Minor formating changes * Revert logic in conditions for changed files Co-authored-by: Christopher Sidebottom <christopher.sidebottom@arm.com> Co-authored-by: masahi <masahi129@gmail.com> Co-authored-by: Anirudh Sundar <quic_sanirudh@quicinc.com> Co-authored-by: Leandro Nunes <leandro.nunes@arm.com> Co-authored-by: AndrewZhaoLuo <andrew.zhao.luo@gmail.com> Co-authored-by: Andrew Zhao Luo <andrewzhaoluo@system76-pc.localdomain> Co-authored-by: Mehrdad Hessar <mhessar@octoml.ai> Co-authored-by: Jiawei Liu <jaway.liu@gmail.com> Co-authored-by: Tristan Konolige <tkonolige@octoml.ai> Co-authored-by: Christopher Sidebottom <chris.sidebottom@arm.com> Co-authored-by: Anastasia Stulova <38433336+AnastasiaStulova@users.noreply.github.com> Co-authored-by: Ashutosh Parkhi <86472128+ashutosh-arm@users.noreply.github.com> Co-authored-by: Krzysztof Parzyszek <kparzysz@quicinc.com> Co-authored-by: Elen Kalda <elen.kalda@arm.com> Co-authored-by: Anton Sorokin <anton.a.sorokin@intel.com> Co-authored-by: Chenfan <jcf94@outlook.com> Co-authored-by: Tantalus13A98B5F <jsl_713@live.com> Co-authored-by: Valery Chernov <black.chervi@gmail.com> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: Jason <928090362@qq.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming3.bjyz.baidu.com> Co-authored-by: wjj19950828 
<wjjisloser@163.com> Co-authored-by: heliqi <1101791222@qq.com> Co-authored-by: Junru Shao <junrushao1994@gmail.com> Co-authored-by: Swift.Sun <sunjiwei@yeah.net> Co-authored-by: hwstaff <hwstaff@hwstaffdeMacBook-Pro.local> Co-authored-by: Cahoon, Brendon <bcahoon@quicinc.com> Co-authored-by: Lunderberg <Lunderberg@users.noreply.github.com> Co-authored-by: Yizhi Liu <liuyizhi@apache.org> Co-authored-by: Siyuan Feng <Hzfengsy@vip.qq.com> Co-authored-by: Ruihang Lai <lairuihangdongdong@qq.com> Co-authored-by: Josh Fromm <jwfromm@octoml.ai> Co-authored-by: Alexander Pivovarov <pivovaa@amazon.com> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai> Co-authored-by: Egor Churaev <egor.churaev@gmail.com> Co-authored-by: Adam Straw <astraw@octoml.ai> Co-authored-by: Lily Orth-Smith <lilyorthsmith@gmail.com> Co-authored-by: Jared Roesch <roeschinc@gmail.com> Co-authored-by: Siyuan Feng <Hzfengsy@sjtu.edu.cn> Co-authored-by: Wuwei Lin <wuwei@apache.org> Co-authored-by: Hongyi Jin <3231950289@qq.com> Co-authored-by: Bohan Hou <32121147+spectrometerHBH@users.noreply.github.com> Co-authored-by: Gavin Uberti <guberti@users.noreply.github.com>

jroesch and others added 5 commits August 23, 2021 23:15

# This is a combination of 2 commits.

c6447b6

# This is the 1st commit message: Initial changes # This is the commit message #2: Ftarget string -> Target object works!

Fix remaining target strings

3f19fca

fix bad rebase

8aaee0c

Fix typo

eb288ac

1 more bad rebase fix

76257bb

electriclilies requested review from anijain2305, jroesch, junrushao, jwfromm, MarisaKirisame, mbrookhart, slyubomirsky, vinx13, wweic, yzhliu, zhiics and ZihengJiang as code owners August 24, 2021 17:30

electriclilies commented Aug 24, 2021

src/relay/backend/build_module.cc (outdated, resolved)

Lint

d67e885

Mousius requested changes Aug 24, 2021

src/relay/backend/build_module.cc (outdated, resolved)

typo

ee7881e

Mousius requested changes Aug 24, 2021

Forgot to commit this

e3ca300

junrushao approved these changes Aug 24, 2021

Mousius approved these changes Aug 24, 2021

Passing most tests, yay

8da2c54

electriclilies requested review from areusch, comaniac, kparzysz-quic, masahi, merrymercy and tqchen as code owners August 27, 2021 18:34

electriclilies added 3 commits August 27, 2021 11:40

remove some comments

1ebe623

lint

4a65400

target-str-to-target-object

4205389

junrushao approved these changes Aug 30, 2021

mbs-octoml reviewed Aug 30, 2021

Respond to change requests

29f802c

electriclilies mentioned this pull request Aug 31, 2021

Remove LoweredModule #8886

Merged

Mousius approved these changes Aug 31, 2021

jroesch merged commit 7b91e62 into apache:main Aug 31, 2021

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed

Suggested change (src/relay/backend/build_module.cc):

-    if (lowered_funcs.find(Target("ext_dev")) != lowered_funcs.end()) {
-      lowered_funcs.Set(Target("ext_dev"), IRModule());
+    Target ext_dev("ext_dev");
+    if (lowered_funcs.find(ext_dev) != lowered_funcs.end()) {
+      lowered_funcs.Set(ext_dev, IRModule());
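
The suggestion above constructs the key once and reuses it for both the lookup and the update. Below is a minimal standalone sketch of that pattern. It does not use TVM: `FakeTarget`, `FakeTargetHash`, `FakeModule`, `LoweredFuncs` and `ResetExtDev` are hypothetical stand-ins invented for illustration, and `std::unordered_map` replaces TVM's `Map`. The stand-in compares keys by their kind string, which is an assumption and may not match how TVM compares `Target` keys.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for tvm::Target: it only wraps the target kind name.
struct FakeTarget {
  std::string kind;
  explicit FakeTarget(std::string k) : kind(std::move(k)) {}
  bool operator==(const FakeTarget& other) const { return kind == other.kind; }
};

// Hash on the kind string so FakeTarget can be used as a map key (assumption).
struct FakeTargetHash {
  std::size_t operator()(const FakeTarget& t) const {
    return std::hash<std::string>()(t.kind);
  }
};

// Hypothetical stand-in for IRModule.
struct FakeModule {
  int num_functions = 0;
};

using LoweredFuncs = std::unordered_map<FakeTarget, FakeModule, FakeTargetHash>;

// If an "ext_dev" entry exists, replace it with an empty module.
// The key is built once and reused, mirroring the reviewer's suggestion.
// Returns true when an entry was found and reset.
bool ResetExtDev(LoweredFuncs& lowered_funcs) {
  const FakeTarget ext_dev("ext_dev");
  auto it = lowered_funcs.find(ext_dev);
  if (it == lowered_funcs.end()) {
    return false;
  }
  it->second = FakeModule();
  return true;
}
```

Hoisting the key into a named local avoids parsing the target string twice and keeps the lookup and the update tied to the same object.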
