
Commit 51f38c5

Merge e5a2a26 into 2a386ec

sekyondaMeta committed Aug 24, 2023
2 parents 2a386ec + e5a2a26 commit 51f38c5
Showing 3 changed files with 34 additions and 6 deletions.
23 changes: 19 additions & 4 deletions docs/FAQs.md
@@ -1,6 +1,7 @@
# FAQs
Contents of this document.
* [General](#general)
* [Performance](#performance)
* [Deployment and config](#deployment-and-config)
* [API](#api)
* [Handler](#handler)
@@ -34,9 +35,23 @@ No. As of now, only Python-based models are supported.
TorchServe is derived from Multi-Model-Server. However, TorchServe is specifically tuned for PyTorch models. It also has new features like snapshots and model versioning.

### How to decode international language in inference response on client side?
By default, TorchServe uses utf-8 to encode when the inference response is a string, so the client can use utf-8 to decode it.

If a model converts an international language string to bytes, the client needs to use the codec mechanism specified by the model, such as in https://github.com/pytorch/serve/blob/master/examples/nmt_transformer/model_handler_generalized.py
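
As a hedged illustration, the sketch below assumes a local TorchServe instance serving a hypothetical model named `my_model`; the final `iconv` step applies only when the handler emits bytes in a non-utf-8 codec:

```bash
# Ask a hypothetical model for a prediction; by default TorchServe
# encodes string responses as utf-8, so the bytes below are utf-8 text.
curl -s http://localhost:8080/predictions/my_model \
  -d "data=bonjour le monde" -o response.bin
cat response.bin

# If the handler instead returns bytes in another codec, convert with
# that codec explicitly, e.g. assuming utf-16:
iconv -f UTF-16 -t UTF-8 response.bin
```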

## Performance

Relevant documents.
- [Performance Guide](performance_guide.md)

### How do I improve TorchServe performance on CPU?
CPU performance is heavily influenced by launcher core pinning. We recommend setting the following properties in your `config.properties`:

```bash
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
```
More background on improving CPU performance can be found in this [blog post](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles).
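
For context, a minimal launch sketch follows; the model archive `my_model.mar` and the `model_store` directory are assumptions, while `--ts-config` points TorchServe at the file above:

```bash
# Start TorchServe with the config.properties shown above.
# The model store path and archive name are illustrative.
torchserve --start \
  --model-store model_store \
  --models my_model.mar \
  --ts-config config.properties
```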

## Deployment and config
Relevant documents.
@@ -97,7 +112,7 @@ TorchServe looks for the config.properties file according to the order listed in t

- [models](configuration.md): Defines a list of models' configurations in `config.properties`. A model's configuration can be overridden by the [management API](management_api.md). It does not decide which models will be loaded during TorchServe start. There is no relationship between "models" and "load_models" (i.e., the TorchServe command-line option [--models](configuration.md)); see the sketch below.
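
A minimal `config.properties` sketch of the distinction, with hypothetical model names and worker counts (following the JSON form documented in [configuration](configuration.md)):

```bash
# "load_models" decides what is loaded at TorchServe start (illustrative name):
load_models=mnist.mar

# "models" only holds per-model configuration; it does not load anything:
models={\
  "mnist": {\
    "1.0": {\
        "marName": "mnist.mar",\
        "minWorkers": 1,\
        "maxWorkers": 4\
    }\
  }\
}
```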

###

## API
Relevant documents.
@@ -133,7 +148,7 @@ Refer to [default handlers](default_handlers.md) for more details.

### Is it possible to deploy Hugging Face models?
Yes, you can deploy Hugging Face models using a custom handler.
Refer to [HuggingFace_Transformers](https://github.com/pytorch/serve/blob/master/examples/Huggingface_Transformers/README.md#huggingface-transformers) for an example.

## Model-archiver
Relevant documents.
7 changes: 7 additions & 0 deletions docs/index.rst
@@ -56,6 +56,13 @@ What's going on in TorchServe?
:link: performance_guide.html
:tags: Performance,Troubleshooting

.. customcarditem::
:header: Large Model Inference
:card_description: Serving Large Models with TorchServe
:image: https://raw.githubusercontent.com/pytorch/serve/master/docs/images/ts-lmi-internal.png
:link: large_model_inference.html
:tags: Large-Models,Performance

.. customcarditem::
:header: Troubleshooting
   :card_description: Various updates on TorchServe and use cases.
10 changes: 8 additions & 2 deletions docs/performance_guide.md
@@ -44,11 +44,17 @@ TorchServe exposes configurations that allow the user to configure the number of

<h4>TorchServe on CPU</h4>

If working with TorchServe on a CPU, you can improve performance by setting the following in your `config.properties`:

```bash
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
```
These settings improve performance significantly through launcher core pinning.
The theory behind this improvement is discussed in [this blog](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles), which can be summarized as follows:
* In a hyperthreading enabled system, avoid logical cores by setting thread affinity to physical cores only via core pinning.
* In a multi-socket system with NUMA, avoid cross-socket remote memory access by setting thread affinity to a specific socket via core pinning.

These principles can be applied automatically via an easy-to-use launch script that has already been integrated into TorchServe. For more information, take a look at this [case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles), which dives into these points further with examples and explanations from first principles.
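
As a rough manual illustration of the two bullet points above (TorchServe's integrated launch script handles this for you), core pinning with `numactl` might look like the following; the core IDs, socket layout, and script name are assumptions about the machine:

```bash
# Inspect the physical core and socket topology first.
lscpu

# Pin the process to the physical cores of socket 0 and to socket-local
# memory, skipping logical (hyperthread) siblings and avoiding cross-socket
# remote memory access. Core IDs 0-27 and the script name are placeholders.
numactl --physcpubind=0-27 --membind=0 python run_inference.py
```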

<h4>TorchServe on GPU</h4>

Expand Down
