[Ansor][AutoTVM v2.0] Part 0: Ansor minimum system for auto schedule generating (#5962)

* Code migration Start (#1)

* Init commit: Code migration Start

* Add loop_state.cc/h

* Add ComputeDAG basic test

* Split transform_step out & Update more UTs (#3)

* Split transform_step out

* Update GetProducers & GetConsumers

* Update UTs

* Add UT for CacheReadWrite & Some bug fix

* Add search_task, measure and serialization (#4)

* Add FollowSplit & FollowFusedSplit tests

* Update dag.InferBound & its UT

* Add search_task, measure and serialization

* Update Serialization UT

* Add MetaTileRewritePolicy (#5)

* Add feature

* Add cost_model, meta_tile_rewrite_policy

* Add MetaTileRewritePolicy basic UT

* Basic Python API for State (#6)

* Add Basic Python API for State

* Add UTs for State

* Add Python API: Measure & Task (#7)

* Update the return value of state operation

* Add task

* Copy measure.py & utils.py

* Fix LocalBuilder

* Fix LocalRunner

* Add ansor.auto_schedule() API; First AutoSchedule working version (#8)

* Add basic Python support for ansor.auto_schedule

* Update AutoSchedule API

* Bug fix for getting the attach point of a fused iter

* Update UT after infer bug fix

* Bug fix & Add python serialization API (#10)

* Delete C++ UT hack since Python is ready

* Add ndarray.non_empty

* Update Serialization python API

* Improve code style, python wrapper and test cases (#11)

* Update c++ code style and unit test

* Update python State wrapper and test cases

* fix unit tests

* Add RPCRunner & OpenCL/CUDA test (#12)

* Add RPCRunner & OpenCL search test

* Add CUDA search test

* Add RPCRunner test

* rebase to upstream/master

* Add Ansor basic tutorial (#13)

* Add basic tutorial

* migrate feature extraction (#14)

* Add XGBModel & RPCRunnerWarpper (#15)

* Add XGBModel & RPCRunnerWarpper

* Revert "Add Parallel Granularity Mutation"

* Migrate workload_registry.py (#16)

* add workload registry

* update

* update

* add task scheduler (#17)

* Add conv2d cuda tutorial with workload registry (#18)

* add tune_test.py (the old tune_wkl.py) (#19)

* add tune_test.py (the old tune_wkl.py)

* update

* fix measure

* fix for gpu

* Code refine for tune_test.py & Add a pre load callback (#20)

* Bug fix for tutorials

* Add PreLoadMeasuredStates

* Add search_callback support for task tuner

* Code refine for tune_test.py

* Update

* Update

* Update

* Update

* Bug fix

* Add python custom sketch rule (#21)

* Add custom sketch rule

* Bug fix

* Ansor Relay Integration (without layout rewrite) (#22)

* relay integration

* Add tune_op_subgraph.py & Some code clean for tune_network.py (#23)

* Add single op tune scripts

* Add tune subgraph support

* Merge all op & all subgraph to one file

* Rename file

* add explicit_unroll_max_extent (#25)

* Add Index simplification & API update (#26)

* Add vectorized cooperative_fetching test

* Update math simplify for vectorized CF

* File rename

* Update tune_network

* API update

* Update PreLoadMeasuredStates & Some bug fix (#27)

* Add a threading wrapper to fix the test bug

* Set default TVM_USE_AUTO_SCHEDULER to false

* Update PreLoadMeasuredStates callback

* Add tensorize step for loop_state (#31)

* Add tensorize step

* State python api update (#33)

* Start to update api

* Add compute_dag to state

* API update

* kernel layout rewrite (#28)

* kernel layout rewrite

* remove some hacks

* add defuse_ops pass and move kernel_layout_rewrite pass after fuse_ops pass

* set TVM_RELAY_DISABLE_BUILD_CACHE for task extraction and prepare_layout_rewrite

* [cache flush] port cache flush to ansor (#32)

* Improve relay integration (#34)

* tmp checkpoint

* Improve relay integration

* Improve relay integration

* Fix xgb error & Simplify dispatcher (#35)

* Rename "MetaTileRewritePolicy" to "SketchPolicy". (#36)

* Rename "MetaTileRewritePolicy" to "SketchPolicy".

* Add a new class for auto_unroll_max_step, storage_offset in StageNode

* fix tune_op_subgraph.py

* rebase

* Migrate all node::make to noderef's construct function (#37)

* Start to move xxxnode::make to noderef()

* Update

* Update

* Finish transform_step

* Finish compute dag & auto schedule

* Update

* Update

* Update

* Update

* Update

* Code refine

* Code refine

* Code refine

* Update

* Update

* Some lint fix & Recover the double constructor of tvm::PrimExpr (#39)

* lint fix

* clang-format-fix

* pylint fix

* Update

* Recover the double constructor of tvm::PrimExpr

* Fix pylint

* pylint fix

* pylint fix

* Add MutateComputeLocation and MutateParallel in evolutionary search (#40)

* Add MutateComputeLocation and MutateParallel in evolutionary search

* fix lint

* Improve loop state python API (stage_tensors -> stage_ops) (#41)

* improve loop state python API (stage_tensors -> stage_ops)

* fix

* ComputeDAG bug fix & Add Custom TensorCore Matmul Example (#42)

* Bug Fix

* Sample example of Custom TensorCore Matmul

* Revert Commits, Start to build minimum Ansor system

* Code clean for minimum Ansor system

* Bug fix & Delete AccessAnalyzer

* Delete attachmap & Code clean

* Doc update

Update statenode::stages from vector to Array

* Headfile update & Python doc update

* clang-format fix

* pylint fix

* Update

* Doc update

* Update

* Bug fix after code merge to the new master

* clang-format fix

* Update

* Update

* Update std::vector to Array; Update verbosity setting; Some comments addressed

* std::vector->Array & std::string->String

* Add init_state to ComputeDAG

* Update

* Update some unordered_map to Map

* clang-format fix

* Comments addressed
Delete ReplayAndInferBound
Delete ReplaySteps & InferBoundCommon

* Lint fix

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Update

* Rename ansor namespace to auto_schedule

* Update

* Rename ThreadPool to ParallelFor

* Add parallel_for

* Remove ThreadPool

* Update python/tvm/auto_schedule/auto_schedule.py

* trigger CI

Co-authored-by: Lianmin Zheng <lianminzheng@gmail.com>
Co-authored-by: Minmin Sun (孙敏敏) <minmin.smm@alibaba-inc.com>
Co-authored-by: Zhao Wu <zhaowu@apache.org>
4 people committed Jul 15, 2020
1 parent a23592c commit 456c58d
Showing 35 changed files with 6,266 additions and 0 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -185,6 +185,7 @@ assign_source_group("Include" ${GROUP_INCLUDE})

# Source file lists
file(GLOB_RECURSE COMPILER_SRCS
src/auto_schedule/*.cc
src/node/*.cc
src/ir/*.cc
src/arith/*.cc
34 changes: 34 additions & 0 deletions python/tvm/auto_schedule/__init__.py
@@ -0,0 +1,34 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# pylint: disable=unused-import, redefined-builtin
""" Namespace for TVM Auto-scheduler. """

from . import compute_dag
from . import measure
from . import measure_record
from . import loop_state
from . import utils
from . import workload_registry

# Shortcut
from .compute_dag import ComputeDAG
from .auto_schedule import SearchTask, TuningOptions, HardwareParams, \
auto_schedule, EmptyPolicy
from .measure import MeasureInput, LocalBuilder, LocalRunner
from .measure_record import RecordToFile, RecordReader, load_best, \
load_records, save_records
from .workload_registry import register_workload, make_workload_key
22 changes: 22 additions & 0 deletions python/tvm/auto_schedule/_ffi_api.py
@@ -0,0 +1,22 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

""" Register FFI APIs from C++ for the namespace tvm.auto_schedule. """
import tvm._ffi


tvm._ffi._init_api("auto_schedule", __name__)
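For context: `tvm._ffi._init_api("auto_schedule", __name__)` looks up every global function the C++ side has registered under the "auto_schedule." prefix and attaches each one as an attribute of this module. A minimal sketch of the effect, using the `HardwareParams` constructor wrapped later in this patch (the argument values are illustrative):

# After _init_api runs, the C++ global function registered as
# "auto_schedule.HardwareParams" is callable as a module attribute.
from tvm.auto_schedule import _ffi_api

hw_params = _ffi_api.HardwareParams(4, 64, 64)  # num_cores, vector_unit_bytes, cache_line_bytes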
194 changes: 194 additions & 0 deletions python/tvm/auto_schedule/auto_schedule.py
@@ -0,0 +1,194 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""
User interface for TVM Auto-scheduler.
The basic schedule search process for TVM Auto-scheduler is designed to be:
`Program sampling` -> `Performance Tuning`.
In `Program sampling`, we use predefined precise or heuristic rules to generate several
initial schedules. Starting from these initial points, we perform `Performance Tuning`, which
uses a cost-model-based evolutionary search to select the schedules with the best performance.
Candidate schedules are measured against the specific hardware target.
"""

import tvm._ffi
from tvm.runtime import Object
from .measure import LocalBuilder, LocalRunner
from . import _ffi_api


@tvm._ffi.register_object("auto_schedule.HardwareParams")
class HardwareParams(Object):
""" The parameters of target hardware used to guide the search policy
TODO(jcf94): This is considered to be merged with the new Target specification:
https://discuss.tvm.ai/t/rfc-tvm-target-specification/6844
Parameters
----------
num_cores : int
The number of device cores.
vector_unit_bytes : int
The width of vector units in bytes.
cache_line_bytes : int
The size of cache line in bytes.
"""
def __init__(self, num_cores, vector_unit_bytes, cache_line_bytes):
self.__init_handle_by_constructor__(_ffi_api.HardwareParams, num_cores,
vector_unit_bytes, cache_line_bytes)


@tvm._ffi.register_object("auto_schedule.SearchTask")
class SearchTask(Object):
""" The computation information and hardware parameters for a specific schedule search task.
Parameters
----------
dag : ComputeDAG
The ComputeDAG for the corresponding compute declaration.
workload_key : str
The workload key for the corresponding compute declaration.
target : tvm.target.Target
The target device of this search task.
target_host : Optional[tvm.target.Target]
The target host device of this search task.
hardware_params : Optional[HardwareParams]
Hardware parameters used in this search task.
"""
def __init__(self, dag, workload_key, target, target_host=None,
hardware_params=None):
self.__init_handle_by_constructor__(_ffi_api.SearchTask, dag,
workload_key, target, target_host,
hardware_params)


@tvm._ffi.register_object("auto_schedule.SearchPolicy")
class SearchPolicy(Object):
""" The base class of search policies. """


@tvm._ffi.register_object("auto_schedule.EmptyPolicy")
class EmptyPolicy(SearchPolicy):
""" This is an example empty search policy which will always generate
the init state of ComputeDAG.
"""
def __init__(self):
self.__init_handle_by_constructor__(_ffi_api.EmptyPolicy)


@tvm._ffi.register_object("auto_schedule.TuningOptions")
class TuningOptions(Object):
""" This controls the options of performance tuning.
Parameters
----------
num_measure_trials: int = 0
The number of measurement trials.
The search policy measures `num_measure_trials` schedules in total and returns the best one
among them.
With `num_measure_trials` == 0, the policy will do the schedule search but won't involve
measurement. This can be used to get a runnable schedule quickly without auto-tuning.
early_stopping: Optional[int]
Stop the tuning early if no improvement is found after n measurements.
num_measures_per_round: int = 64
The number of schedules to be measured at each search round.
The whole schedule search process will try a total of `num_measure_trials` schedules in several
rounds.
verbose: int = 1
Verbosity level. 0 for silent, 1 to output information during schedule search.
builder: Union[ProgramBuilder, str] = 'local'
ProgramBuilder which builds the program.
runner: Union[ProgramRunner, str] = 'local'
ProgramRunner which runs the program and measures time costs.
measure_callbacks: Optional[List[MeasureCallback]]
Callback functions called after each measurement.
Candidates:
- auto_schedule.RecordToFile
pre_search_callbacks: Optional[List[SearchCallback]]
Callback functions called before the search process.
Candidates:
- auto_schedule.PreloadMeasuredStates
- auto_schedule.PreloadCustomSketchRule
TODO(jcf94): Add these implementation in later PRs.
"""
def __init__(self, num_measure_trials=0, early_stopping=None, num_measures_per_round=64,
verbose=1, builder='local', runner='local', measure_callbacks=None,
pre_search_callbacks=None):
if isinstance(builder, str):
if builder == 'local':
builder = LocalBuilder()
else:
raise ValueError("Invalid builder: " + builder)
elif not isinstance(builder, tvm.auto_schedule.measure.ProgramBuilder):
raise ValueError("Invalid builder: " + builder +
" . TuningOptions expects a ProgramBuilder or string.")

if isinstance(runner, str):
if runner == 'local':
runner = LocalRunner()
else:
raise ValueError("Invalid runner: " + runner)
elif not isinstance(runner, tvm.auto_schedule.measure.ProgramRunner):
raise ValueError("Invalid runner: " + runner +
" . TuningOptions expects a ProgramRunner or string.")

self.__init_handle_by_constructor__(
_ffi_api.TuningOptions, num_measure_trials, early_stopping if early_stopping else -1,
num_measures_per_round, verbose, builder, runner, measure_callbacks,
pre_search_callbacks)


def auto_schedule(task, search_policy='default', tuning_options=None):
""" Do auto scheduling for a computation declaration.
The task parameter must be a `SearchTask`; passing a workload_key string
directly is not supported by this minimum system yet.
Parameters
----------
task : SearchTask
The SearchTask for the computation declaration.
search_policy : Union[SearchPolicy, str] = 'default'
The search policy to be used for schedule search.
tuning_options : Optional[TuningOptions]
Tuning and measurement options.
Returns
-------
A `te.Schedule` and a list of `te.Tensor` to be used in `tvm.lower` or `tvm.build`.
"""
if not isinstance(task, SearchTask):
raise ValueError("Invalid task: " + task +
" . `auto_schedule.auto_schedule` expects a SearchTask.")

if isinstance(search_policy, str):
if search_policy == 'default':
# TODO(jcf94): This is an example policy for the minimum system, which will be
# upgraded to a formal search policy later.
search_policy = EmptyPolicy()
else:
raise ValueError("Invalid search policy: " + search_policy)
elif not isinstance(search_policy, SearchPolicy):
raise ValueError("Invalid search policy: " + search_policy +
" . `auto_schedule.auto_schedule` expects a SearchPolicy or a string.")

sch, tensors = _ffi_api.AutoSchedule(task, search_policy,
tuning_options if tuning_options else TuningOptions())
return sch, tensors
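Putting the pieces of auto_schedule.py together, here is a minimal end-to-end sketch of the intended usage. The matmul workload and its sizes are illustrative; the sketch assumes `register_workload` can decorate a plain compute function, `make_workload_key` takes the function plus an argument tuple, and `ComputeDAG` accepts a workload key string, matching the names exported from __init__.py above:

import tvm
from tvm import te
from tvm import auto_schedule

@auto_schedule.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

# Build a search task for the registered workload on a CPU target.
workload_key = auto_schedule.make_workload_key(matmul, (128, 128, 128))
dag = auto_schedule.ComputeDAG(workload_key)
task = auto_schedule.SearchTask(dag, workload_key, tvm.target.create("llvm"))

# num_measure_trials=0 searches without measurement, so this returns a runnable
# (if untuned) schedule; the default EmptyPolicy simply yields the init state.
tuning_options = auto_schedule.TuningOptions(num_measure_trials=0)
sch, tensors = auto_schedule.auto_schedule(task, tuning_options=tuning_options)
print(tvm.lower(sch, tensors, simple_mode=True))
func = tvm.build(sch, tensors)

Since the default policy in this minimum system is `EmptyPolicy`, the resulting schedule is just the naive one from the ComputeDAG's init state; per the TODO above and the commit history, the formal sketch-based search policy ("SketchPolicy") lands in later PRs.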