Faster-whisper
For reference, here are the time and memory figures required to transcribe 13 minutes of audio using different implementations:
One feature of Whisper I think people underuse is the ability to prompt the model to influence the output tokens. I have some examples from my terminal history, although I seem to have trouble getting the context to persist across hundreds of tokens: tokens that are corrected may revert back to the model's underlying guesses if they aren't repeated often enough. We need a better solution. It would be much better if there were an easy way to fine-tune Whisper to learn new vocabulary. Why can Whisper not just reuse the prompt for every 30-second window?
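For context, faster-whisper exposes this as the initial_prompt argument to transcribe. A minimal sketch; the model size, audio file name, and prompt text here are placeholders:

```python
# Bias the decoder toward domain vocabulary via the prompt. The prompt only
# seeds the first window, so its influence can fade over a long file.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, _ = model.transcribe(
    "meeting.wav",  # placeholder audio file
    initial_prompt="Glossary: Kubernetes, CTranslate2, Wyoming, faster-whisper.",
)
print(" ".join(segment.text for segment in segments))
```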
Faster-whisper
Faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models. This container provides a Wyoming protocol server for faster-whisper.

We utilise the docker manifest for multi-platform awareness. More information is available from Docker here and in our announcement here. Simply pulling lscr.io/linuxserver/faster-whisper should retrieve the correct image for your architecture. This image provides various versions that are available via tags. Please read the descriptions carefully and exercise caution when using unstable or development tags.

When using the gpu tag with Nvidia GPUs, make sure you set the container to use the nvidia runtime, that you have the Nvidia Container Toolkit installed on the host, and that you run the container with the correct GPU(s) exposed. See the Nvidia Container Toolkit docs for more details. For more information, see the faster-whisper docs.

To help you get started creating a container from this image, you can use either docker-compose or the docker cli. Containers are configured using parameters passed at runtime, such as those above. For example, -p 8080:80 would expose port 80 from inside the container to be accessible from the host's IP on port 8080 outside the container. Keep in mind that umask is not chmod: it subtracts from permissions based on its value; it does not add.
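As a starting point, a docker cli invocation in the usual linuxserver.io style might look like the following sketch. The WHISPER_MODEL value and host path are assumptions to adapt from the image docs, and 10300 is the conventional Wyoming protocol port:

```
# WHISPER_MODEL value and the /path/to/config host path are assumptions;
# check the image documentation for the supported values.
docker run -d \
  --name=faster-whisper \
  -e PUID=1000 \
  -e PGID=1000 \
  -e TZ=Etc/UTC \
  -e WHISPER_MODEL=tiny-int8 \
  -p 10300:10300 \
  -v /path/to/config:/config \
  --restart unless-stopped \
  lscr.io/linuxserver/faster-whisper:latest
```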
The best graphics cards aren't just for gaming, especially now that AI-based algorithms are all the rage. One of those algorithms, OpenAI's Whisper, is our subject today, and it can provide substantially faster than real-time transcription of audio via your GPU, with the entire process running locally for free. You can also run it on your CPU, though the speed drops precipitously.

Note also that Whisper can be used in real time to do speech recognition, similar to what you can get through Windows or Dragon NaturallySpeaking. We did not attempt to use it in that fashion, as we were more interested in checking performance. Real-time speech recognition only needs to keep up with normal speaking rates, maybe a bit more if someone is a fast talker. We wanted to let the various GPUs stretch their legs a bit and show just how fast they can go.

There are a few options for running Whisper, on Windows or otherwise. Getting WhisperDesktop running proved very easy, assuming you're willing to download and run someone's unsigned executable.
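To get a feel for the GPU-versus-CPU gap the article describes, here is a rough timing sketch using faster-whisper; the model size and audio file are placeholders, and a CUDA-capable GPU is assumed for the first configuration:

```python
# Compare GPU and CPU transcription speed on the same file.
import time

from faster_whisper import WhisperModel

for device, compute_type in [("cuda", "float16"), ("cpu", "int8")]:
    model = WhisperModel("small", device=device, compute_type=compute_type)
    start = time.perf_counter()
    segments, info = model.transcribe("audio.mp3")  # placeholder file
    text = " ".join(s.text for s in segments)  # segments is lazy; consume it
    elapsed = time.perf_counter() - start
    print(f"{device}: {elapsed:.1f}s for {info.duration:.0f}s of audio")
```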
Faster-whisper
Real-time transcription using faster-whisper: the project accepts audio input from a microphone via the sounddevice library and converts it to text with faster-whisper. If the sentences are well separated, transcription takes less than a second. It can also save and load previous settings.
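A minimal sketch of that capture-and-transcribe loop, assuming the sounddevice and faster-whisper packages; the chunk length, model size, and language are placeholder choices, and real projects add voice-activity detection to split on silence:

```python
import queue

import numpy as np
import sounddevice as sd
from faster_whisper import WhisperModel

SAMPLE_RATE = 16000  # faster-whisper expects 16 kHz mono when given raw arrays

model = WhisperModel("small", device="cpu", compute_type="int8")
audio_q = queue.Queue()

def callback(indata, frames, time_info, status):
    # Called from the audio thread; copy the mono float32 samples out.
    audio_q.put(indata[:, 0].copy())

with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32",
                    callback=callback):
    while True:
        # Collect roughly three seconds of audio, then transcribe the chunk.
        chunks = [audio_q.get()]
        while sum(len(c) for c in chunks) < SAMPLE_RATE * 3:
            chunks.append(audio_q.get())
        audio = np.concatenate(chunks)
        segments, _ = model.transcribe(audio, language="en")
        for segment in segments:
            print(segment.text, flush=True)
```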
Whisper is really good at transcribing Greek, but it has no diarization support, which makes it less than ideal for most use cases. By analogy to other machine learning models, I would expect a lightweight LoRA approach would also work.

Use Docker. If you are comparing the performance against other Whisper implementations, you should make sure to run the comparison with similar settings.

MobiusHorizons 3 months ago: We need a better solution. Someone mentioned Alexa-style home assistants, which would have audio snippets short enough that the initial prompt would actually be useful.

I'd be interested in running this over a home camera system, but it would need to handle long stretches with no one talking. Will this be any faster than running those by themselves?

Ensure any volume directories on the host are owned by the same user you specify, and any permissions issues will vanish like magic.

Founder of Replicate here.
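On the point about comparing implementations with similar settings: decoding parameters dominate throughput, so a fair benchmark pins them explicitly. A sketch, where beam size 5 mirrors the OpenAI reference default and the file and model names are placeholders:

```python
# Pin decoding settings so comparisons across implementations are like-for-like.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "benchmark.mp3",   # placeholder file
    beam_size=5,       # same beam size as the reference implementation
    temperature=0.0,   # disable temperature-fallback randomness
    language="en",     # skip language detection so runs are comparable
)
for s in segments:
    print(f"[{s.start:.2f} -> {s.end:.2f}] {s.text}")
```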
Support the distil-whisper model ("Robust knowledge distillation of the Whisper model via large-scale pseudo-labelling"). Upgrade the ctranslate2 version to 4.
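A sketch of what using that support might look like; the "distil-large-v3" identifier and the choice to disable cross-window conditioning are assumptions drawn from how distilled checkpoints are usually run:

```python
# Distilled checkpoints are trained without cross-window context, so
# condition_on_previous_text is typically turned off for them (assumption).
from faster_whisper import WhisperModel

model = WhisperModel("distil-large-v3", device="cuda", compute_type="float16")
segments, _ = model.transcribe("audio.mp3", condition_on_previous_text=False)
for segment in segments:
    print(segment.text)
```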
I'm not sure why you're so dismissive when real-time transcription is an important use case that falls under that bucket of "quick snippets". Real-time transcription is not necessarily short snippets, though.

I've been playing around a lot with Whisper. I'm curious: how did you know about this thread here?

The models could be the original OpenAI models or user fine-tuned models.

MaximilianEmel 3 months ago: But it will influence the initial text generated, which influences the subsequent text as well.
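On the point that the models could be original or user fine-tuned: faster-whisper can load a local CTranslate2 conversion by path instead of a built-in size name. A sketch; the directory below is a placeholder for a checkpoint converted with the ct2-transformers-converter tool:

```python
# Load a local CTranslate2 conversion of an original or fine-tuned checkpoint.
from faster_whisper import WhisperModel

model = WhisperModel(
    "/models/whisper-small-finetuned-ct2",  # placeholder path to a conversion
    device="cpu",
    compute_type="int8",
)
segments, _ = model.transcribe("audio.mp3")
print(" ".join(s.text for s in segments))
```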