
Commit 51f38c5

Merge e5a2a26 into 2a386ec

sekyondaMeta committed Aug 24, 2023
2 parents 2a386ec + e5a2a26 commit 51f38c5
Showing 3 changed files with 34 additions and 6 deletions.
23 changes: 19 additions & 4 deletions docs/FAQs.md
@@ -1,6 +1,7 @@
# FAQs
Contents of this document.
* [General](#general)
* [Performance](#performance)
* [Deployment and config](#deployment-and-config)
* [API](#api)
* [Handler](#handler)
@@ -34,9 +35,23 @@ No. As of now, only Python-based models are supported.
TorchServe is derived from Multi-Model-Server. However, TorchServe is specifically tuned for PyTorch models. It also has new features like snapshots and model versioning.

### How to decode international language in inference response on client side?
By default, TorchServe uses utf-8 to encode when the inference response is a string, so the client can use utf-8 to decode it.

If a model converts an international language string to bytes, the client needs to use the codec mechanism specified by the model, such as in https://github.com/pytorch/serve/blob/master/examples/nmt_transformer/model_handler_generalized.py
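
As a hedged illustration, the sketch below assumes a local TorchServe instance serving a hypothetical model named `my_model`; the final `iconv` step applies only when the handler emits bytes in a non-utf-8 codec:

```bash
# Ask a hypothetical model for a prediction; by default TorchServe
# encodes string responses as utf-8, so the bytes below are utf-8 text.
curl -s http://localhost:8080/predictions/my_model \
  -d "data=bonjour le monde" -o response.bin
cat response.bin

# If the handler instead returns bytes in another codec, convert with
# that codec explicitly, e.g. assuming utf-16:
iconv -f UTF-16 -t UTF-8 response.bin
```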

## Performance

Relevant documents.
- [Performance Guide](performance_guide.md)

### How do I improve TorchServe performance on CPU?
CPU performance is heavily influenced by launcher core pinning. We recommend setting the following properties in your `config.properties`:

```bash
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
```
More background on improving CPU performance can be found in this [blog post](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles).
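
For context, a minimal launch sketch follows; the model archive `my_model.mar` and the `model_store` directory are assumptions, while `--ts-config` points TorchServe at the file above:

```bash
# Start TorchServe with the config.properties shown above.
# The model store path and archive name are illustrative.
torchserve --start \
  --model-store model_store \
  --models my_model.mar \
  --ts-config config.properties
```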

## Deployment and config
Relevant documents.
@@ -97,7 +112,7 @@ TorchServe looks for the config.properties file according to the order listed in t

- [models](configuration.md): Defines a list of models' configurations in `config.properties`. A model's configuration can be overridden by the [management API](management_api.md). It does not decide which models will be loaded during TorchServe start. There is no relationship between "models" and "load_models" (i.e., the TorchServe command-line option [--models](configuration.md)); see the sketch below.
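
A minimal `config.properties` sketch of the distinction, with hypothetical model names and worker counts (following the JSON form documented in [configuration](configuration.md)):

```bash
# "load_models" decides what is loaded at TorchServe start (illustrative name):
load_models=mnist.mar

# "models" only holds per-model configuration; it does not load anything:
models={\
  "mnist": {\
    "1.0": {\
        "marName": "mnist.mar",\
        "minWorkers": 1,\
        "maxWorkers": 4\
    }\
  }\
}
```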

###

## API
Relevant documents.
@@ -133,7 +148,7 @@ Refer to [default handlers](default_handlers.md) for more details.

### Is it possible to deploy Hugging Face models?
Yes, you can deploy Hugging Face models using a custom handler.
Refer to [HuggingFace_Transformers](https://github.com/pytorch/serve/blob/master/examples/Huggingface_Transformers/README.md#huggingface-transformers) for an example.

## Model-archiver
Relevant documents.
7 changes: 7 additions & 0 deletions docs/index.rst
@@ -56,6 +56,13 @@ What's going on in TorchServe?
:link: performance_guide.html
:tags: Performance,Troubleshooting

.. customcarditem::
:header: Large Model Inference
:card_description: Serving Large Models with TorchServe
:image: https://raw.githubusercontent.com/pytorch/serve/master/docs/images/ts-lmi-internal.png
:link: large_model_inference.html
:tags: Large-Models,Performance

.. customcarditem::
:header: Troubleshooting
   :card_description: Various updates on TorchServe and use cases.
10 changes: 8 additions & 2 deletions docs/performance_guide.md
@@ -44,11 +44,17 @@ TorchServe exposes configurations that allow the user to configure the number of

<h4>TorchServe on CPU</h4>

If working with TorchServe on a CPU, you can improve performance by setting the following in your `config.properties`:

```bash
cpu_launcher_enable=true
cpu_launcher_args=--use_logical_core
```
These settings improve performance significantly through launcher core pinning.
The theory behind this improvement is discussed in [this blog](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles), which can be summarized as follows:
* In a hyperthreading enabled system, avoid logical cores by setting thread affinity to physical cores only via core pinning.
* In a multi-socket system with NUMA, avoid cross-socket remote memory access by setting thread affinity to a specific socket via core pinning.

These principles can be applied automatically via an easy-to-use launch script that has already been integrated into TorchServe. For more information, take a look at this [case study](https://pytorch.org/tutorials/intermediate/torchserve_with_ipex#grokking-pytorch-intel-cpu-performance-from-first-principles), which dives into these points further with examples and explanations from first principles.
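
As a rough manual illustration of the two bullet points above (TorchServe's integrated launch script handles this for you), core pinning with `numactl` might look like the following; the core IDs, socket layout, and script name are assumptions about the machine:

```bash
# Inspect the physical core and socket topology first.
lscpu

# Pin the process to the physical cores of socket 0 and to socket-local
# memory, skipping logical (hyperthread) siblings and avoiding cross-socket
# remote memory access. Core IDs 0-27 and the script name are placeholders.
numactl --physcpubind=0-27 --membind=0 python run_inference.py
```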

<h4>TorchServe on GPU</h4>

Expand Down
