
Add support for quantization and custom audio context size to OpenVino #2184

Open · wants to merge 2 commits into master
Conversation

dscripka
Contributor

Compiling with OpenVino on supported platforms can significantly improve performance, but currently it lacks two other features that can also significantly improve performance:

  • Using quantized models (10%+ performance increase)
  • Supporting custom audio context sizes (~3x performance increase for short audio files)

This PR adds both of these as optional arguments to the OpenVino model conversion script, allowing the performance benefits of quantization, a custom audio context size, and OpenVino to stack.
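
Roughly, the new options sit alongside `--model` on the conversion script. The sketch below is illustrative only: the flag names follow the example command shown later in this PR, while the long option names, defaults, and help text are assumptions.

```python
# Hypothetical sketch of the added CLI options; defaults and help text are assumptions.
import argparse

parser = argparse.ArgumentParser(description="Convert a Whisper encoder to OpenVINO IR")
parser.add_argument("--model", type=str, default="base.en", help="Whisper model name")
parser.add_argument("-ac", "--audio-context", type=int, default=0,
                    help="fixed audio context size to bake into the encoder (0 = default 1500)")
parser.add_argument("-qb", "--quantize-bits", type=int, default=0, choices=[0, 4, 8],
                    help="quantize encoder weights with nncf to 4 or 8 bits (0 = disabled)")
args = parser.parse_args()
```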

There are a few important caveats, however:

  • The OpenVino encoder (to my knowledge) only supports a fixed audio context size, so the converted model is somewhat more restricted
  • This does require monkey patching a method from the openai-whisper library in convert-whisper-to-openvino.py (see the sketch after this list), which isn't ideal
  • Quantization is done with nncf 2.7.0, which currently only supports 4 and 8 bit quantization
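
For reference, the monkey patch involved might look roughly like the following; this is a sketch rather than the exact code in this PR. openai-whisper's AudioEncoder.forward asserts that the input matches the full 1500-frame positional embedding, so a patched forward that drops the assert and slices the embedding to the input length lets the encoder be traced with a shorter, fixed audio context.

```python
# Hypothetical sketch of the monkey patch; the actual patch in
# convert-whisper-to-openvino.py may differ in detail.
import torch
import torch.nn.functional as F
from whisper.model import AudioEncoder


def patched_forward(self, x: torch.Tensor):
    # Same as the upstream AudioEncoder.forward, except the shape assert is
    # removed and the positional embedding is sliced to the input length,
    # so the encoder can be traced with a fixed audio_ctx < 1500.
    x = F.gelu(self.conv1(x))
    x = F.gelu(self.conv2(x))
    x = x.permute(0, 2, 1)
    x = (x + self.positional_embedding[: x.shape[1]]).to(x.dtype)
    for block in self.blocks:
        x = block(x)
    x = self.ln_post(x)
    return x


AudioEncoder.forward = patched_forward  # apply before tracing the encoder
```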

Despite these caveats, the performance improvement can be substantial enough for certain use cases that it may be worth it. For example, for the ~10 second jfk.wav file on an Intel(R) Xeon(R) W-2123 CPU:

| Threads | Quant | Model | Encoder Time (s) | Arguments | Build |
| --- | --- | --- | --- | --- | --- |
| 1 | 8 bit | small-en | 8 | -bs 1 -ac 1500 | BLAS = 1 |
| 1 | 8 bit | small-en | 0.8 | -bs 1 -ac 550 | OPENVINO = 1 |

Command to produce the OpenVino model for this PR: `python convert-whisper-to-openvino.py --model small.en -ac 550 -qb 8`
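
For context, 8-bit weight-only quantization of a converted encoder with nncf could be done along these lines. This is a sketch under assumptions: the IR file names are placeholders, and the exact nncf call and settings used by the script may differ.

```python
# Hypothetical sketch of post-conversion weight quantization with NNCF.
import openvino as ov
import nncf

core = ov.Core()
encoder = core.read_model("ggml-small.en-encoder-openvino.xml")  # placeholder path

# compress_weights defaults to 8-bit weight compression; NNCF also offers
# 4-bit modes (e.g. nncf.CompressWeightsMode.INT4_SYM) in recent releases.
quantized = nncf.compress_weights(encoder)

ov.save_model(quantized, "ggml-small.en-encoder-openvino-quant.xml",  # placeholder path
              compress_to_fp16=False)
```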

@dscripka changed the title from "Openvino audio ctx quantize" to "Add support for quantization and custom audio context size to OpenVino" on May 25, 2024
@jason-ni

Is there a way to improve the OpenVino encoder to support dynamic audio_ctx? That would be much more useful.

@dscripka
Contributor Author

I agree that would be ideal, but it would require more substantial modifications to how the encoder is converted and to the actual OpenVino runtime implementation. Possible, perhaps, but difficult.

@jason-ni

But the CPU and GPU backends all support dynamic audio_ctx with a fixed model encoder size. If we allow converting an OpenVino model with a different encoding size, it could confuse model users.
