
Performance improvement #9

Closed
ogmkp opened this issue Aug 25, 2023 · 20 comments

Comments

@ogmkp

ogmkp commented Aug 25, 2023

Hi, I'm testing on Debian 12 with OBS 29.1.3 using the preset parameters. My 4-thread CPU grinds away, and I get a seemingly random sentence with a huge delay.
I've looked at Whisper.cpp, but I can't map its parameters to this plugin's settings.
Do you have any recommended settings for fast, resource-efficient transcription?

Thanks a lot!
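For a quick baseline outside OBS, a minimal speed-oriented whisper.cpp run looks like this (a sketch only; the model and audio paths are placeholders, and -t should roughly match the number of physical cores):

    # Hypothetical paths; the tiny model plus a modest thread count is
    # usually the fastest pure-CPU configuration.
    ./main -m models/ggml-tiny.en.bin -f samples/jfk.wav -t 4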

@Destroy666x
Contributor

Destroy666x commented Sep 8, 2023

Confirming here: even on a clean launch (after a laptop restart), CPU usage jumps from around 8% to 40-50% whenever I speak. Then sentences get generated very late and often not too accurately. This is with default settings. With bigger models it's far worse in latency and CPU usage, but of course better in accuracy.

Intel i7-7820 CPU, Windows 10, OBS 29.1.3

@Destroy666x
Contributor

Destroy666x commented Sep 8, 2023

Would it be possible to allow GPU usage instead? In general my GPU has more headroom, as I'm mainly streaming games, which lean more on the CPU. I see Whisper can run on a GPU. This also shows better performance with a GPU, if I understand correctly: https://github.com/MiscellaneousStuff/openai-whisper-cpu#results

@royshil
Collaborator

royshil commented Sep 9, 2023

Yes, I'm working on acceleration for the Whisper.cpp build, and I'll open a pull request as soon as I get it working on my PC.

There are several options, but the general goal of GGML is to enable running on CPUs and their inherent acceleration, e.g. SIMD.

I'm still unpacking this, but it's important to get it right.
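For reference, whisper.cpp gates its BLAS backends behind build flags; a sketch using the Makefile flags documented in the whisper.cpp README (assuming a POSIX shell and OpenBLAS installed):

    # Plain build: GGML auto-detects CPU SIMD (AVX/AVX2 on x86, NEON on ARM).
    make clean && make -j

    # OpenBLAS-accelerated build:
    make clean && WHISPER_OPENBLAS=1 make -j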

@royshil
Collaborator

royshil commented Sep 11, 2023

@Destroy666x can you try the build in https://github.com/royshil/obs-localvocal/actions/runs/6142210185#artifacts ?

It should be much faster and more efficient.

@Destroy666x
Contributor

For me, CPU usage still seems rather high with that build, maybe a few percent lower on average.

@royshil
Collaborator

royshil commented Sep 11, 2023

@Destroy666x so there is an improvement! That's a good thing.
For me it improves by 100%, i.e. 2x faster.

Were you able to benchmark whisper.cpp separately?

I think I will merge this in anyway, since it's an improvement.

@Destroy666x
Contributor

Destroy666x commented Sep 11, 2023

Well, I think it is, but I don't quite know how to verify that consistently, as the runs happened under different conditions. Similar, but different, since Windows definitely had different random processes like indexers and whatnot running. But yeah, usage was around 35-45% compared to the 40-50% reported by OBS previously.

As for separately, do you mean checking Whisper's different options outside of this plugin? I can look into that when I have time.

@Destroy666x
Contributor

Destroy666x commented Sep 11, 2023

I see there's bench.exe.

  • a single run without BLAS with the tiny model takes around 1-1.5 seconds
  • a single run with BLAS with the tiny model takes around 1-1.5 seconds
  • a single run without BLAS with the small model takes around 11-13 seconds
  • a single run with BLAS with the small model takes around 10-12 seconds

I haven't found how to do multiple runs for a consistent test; a simple loop (sketched below) would probably do.

According to this, it should work better in OBS with the tiny model at least, since I also had a bigger delay with that one.

And, interestingly, after increasing threads from 4 to 8, the small model went up to ~15 seconds 🤔
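A minimal loop for repeated runs (a POSIX-shell sketch; the model path is a placeholder, and the same idea works in PowerShell for bench.exe):

    # Repeat the benchmark a few times so background-process noise averages out.
    for i in 1 2 3 4 5; do
      ./bench -m models/ggml-small.bin -t 4
    done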

@royshil
Collaborator

royshil commented Sep 12, 2023

Thanks for this research, @Destroy666x.
I'm looking into CLBlast acceleration next. It should be supported on many platforms and will be able to use the GPU.

@royshil
Collaborator

royshil commented Sep 12, 2023

Here are some timings I get consistently.

No BLAS

whisper_print_timings:     load time =   137.18 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time =  1551.64 ms /     1 runs ( 1551.64 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  1724.93 ms

OpenBLAS

whisper_print_timings:     load time =   145.05 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time =  1107.12 ms /     1 runs ( 1107.12 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  1287.19 ms

CLBlast

whisper_print_timings:     load time =  1163.69 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time =  2474.72 ms /     1 runs ( 2474.72 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  3670.20 ms

I conclude OpenBLAS brings the best performance on my PC.

@Destroy666x
Contributor

With what model and CPU/GPU, out of curiosity?

@royshil
Collaborator

royshil commented Sep 12, 2023

This is with an Intel i7-8700T.
It has an Nvidia GPU, but that's not being used.
The Intel GPU is a UHD Graphics 630, which is what CLBlast uses, but as you can see it doesn't bring any performance boost.
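As an aside, GGML's CLBlast backend normally grabs the first OpenCL device it finds; it can be pointed at a specific GPU through environment variables (a sketch, assuming this build exposes GGML's standard selection variables):

    # List platforms/devices first, e.g. with clinfo, then pin the backend
    # to the Nvidia platform by name substring (or by index).
    GGML_OPENCL_PLATFORM=NVIDIA GGML_OPENCL_DEVICE=0 ./bench -m models/ggml-tiny.en.bin -t 4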

@Destroy666x
Contributor

And the tiny model, I assume? Weird that it doesn't use the "real" GPU.

@royshil
Collaborator

royshil commented Sep 12, 2023

Yes, this is the tiny model.
The Nvidia/CUDA GPU is not being used since this Whisper build wasn't compiled to use it.
I'm trying Whisper with CUDA now to see if it makes a difference...

@Destroy666x
Contributor

Oh, so there's also cuBLAS, specifically for CUDA; I see that now: ggerganov/whisper.cpp#834. I'll test it on my machine too, assuming compilation is as easy as shown there.

@royshil
Collaborator

royshil commented Sep 12, 2023

This is the timing for Whisper with CUDA.

whisper_print_timings:     load time =  1227.32 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     0.00 ms
whisper_print_timings:   sample time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   encode time =   728.45 ms /     1 runs (  728.45 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  1991.31 ms

It is faster than the rest, but not a huge gain over OpenBLAS.

The downside with CUDA is that the runtime is so big there's no hope of shipping it with the plugin. And the compatibility is horrendous: e.g. if I compile against CUDA v12.2 and the client has v11.1, it doesn't work.
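For anyone who wants to try it anyway, the cuBLAS build flag from the whisper.cpp README (a sketch; assumes the CUDA toolkit is installed, and the resulting binary is tied to that toolkit's version):

    # cuBLAS-accelerated build; links against the locally installed CUDA toolkit.
    make clean && WHISPER_CUBLAS=1 make -j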

@Destroy666x
Contributor

Destroy666x commented Sep 12, 2023

For me it was over 1.5x faster on an Nvidia GeForce 1080.

Perhaps the executable could optionally be provided through a path setting, since there are so many different options? They're compatible with your code, right? Additional options, like downloading CUDA and compiling that version, could then be described in the documentation.

@royshil
Collaborator

royshil commented Sep 12, 2023

@Destroy666x OK, I've added CUDA build instructions.

As soon as this clears, I'm going to merge, since I'd like to release a new version.

@royshil
Collaborator

royshil commented Sep 12, 2023

#12 has landed and introduces performance improvements.
Closing for now; we can reopen for further discussion and requests.

@royshil royshil closed this as completed Sep 12, 2023
@ogmkp
Author

ogmkp commented Sep 12, 2023

Hey, I opened this issue because the plugin is extremely slow and CPU-hungry on Linux. Please keep it open!
