Speech to text from wav/mp3 file

I am very new to KNIME, coming from Alteryx. I am investigating the speech-to-text capabilities of KNIME and have found many example workflows in the KNIME forums; here is an example. I know the Bing and Watson APIs are not working, but I cannot even get CMU Sphinx to work. It works with the sample workflow, but I cannot figure out how to make it work with my own file. Can anyone give me a push in the right direction?

Hi @onp1ldy,

In said example, you should be able to delete the “Explorer Browser” node (it is only needed to handle the relative path of the audio file, which should not be relevant in your case) and configure the “List Audio Files” node so that it reads the file you want to transcribe. To do that, either drag in a new “List Audio Files” node or remove the flow variable setting in the “Flow Variables” tab of the existing node (see here or here for background on flow variables).

Other than that, could you please provide some more information? Could you describe where and how your procedure is failing? If you can share an audio snippet, I’m happy to help further.

I hope that helps already!

Best regards,
Lukas

Thanks @LukasS. I tried what you suggested and selected an audio clip of myself speaking. It is about a 45-second clip of me talking about dead air in an audio file and how long I will stop speaking. What I get from CMUSphinx is the transcription in the photo.

Hi @onp1ldy,

I think the culprit is the sampling rate: the underlying model is trained with a 16 kHz sampling rate (see the node description of the CMUSphinx4 SR node), and most recording software uses a higher sampling rate. You can check the sampling rate with the “Audio Data Extractor” node; it is 44.1 kHz in my case. In order to transcribe this, you’d need to downsample your audio file to 16 kHz.
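
If you prefer to check the sampling rate outside of KNIME, a minimal Python sketch could look like the following. It uses only the standard-library wave module, so it only covers uncompressed WAV files (not mp3). The function names wav_sample_rate and needs_resampling are made up for this example.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in the header of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_hz=16000):
    """True if the file's sample rate differs from the target rate.

    The 16000 Hz default mirrors the 16 kHz rate mentioned in this thread.
    """
    return wav_sample_rate(path) != target_hz
```

Calling wav_sample_rate on a typical recording would be expected to return 44100 or 48000, in which case needs_resampling reports that a conversion is required.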

Maybe your recording tool already allows you to set the sample rate? Otherwise I’d suggest ffmpeg (in my case: ffmpeg -i .\WhatIsKNIME.wav -ar 16000 .\WhatIsKNIME_16kHz.wav from the command line) or the like to do the downsampling. See also the FAQ of CMUSphinx.
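
If you have several files to convert, the same ffmpeg call can be scripted. The following is only a sketch: it assumes ffmpeg is installed and on your PATH, and the helper names build_ffmpeg_command and downsample are invented for this example. The -i and -ar options are the ones from the command above.

```python
import subprocess


def build_ffmpeg_command(src, dst, rate_hz=16000):
    """Build the argument list for: ffmpeg -i <src> -ar <rate_hz> <dst>."""
    return ["ffmpeg", "-i", str(src), "-ar", str(rate_hz), str(dst)]


def downsample(src, dst, rate_hz=16000):
    """Run ffmpeg to resample src into dst; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(src, dst, rate_hz), check=True)


# Example call (requires ffmpeg and an existing input file):
# downsample("WhatIsKNIME.wav", "WhatIsKNIME_16kHz.wav")
```

Passing the arguments as a list rather than one shell string avoids quoting problems with file names that contain spaces.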

(I tried reading the first paragraph of Wikipedia on KNIME, and with the 16 kHz sampling rate the results are indeed no more useful. Let’s see what your results are; hopefully this is only an issue on my side. I’ll keep you posted if I have more insights.)

Best Regards,
Lukas

Worked like a charm!!! I used PyDub to convert the frame rate and all set now. Thank you!!
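
For anyone finding this later, a PyDub conversion along these lines might look roughly like the sketch below. It is not necessarily the exact script used here. It assumes pydub is installed (pip install pydub) and that ffmpeg is available for non-WAV input such as mp3. The names output_path_for and convert_frame_rate and the "_16kHz" file suffix are made up for illustration.

```python
from pathlib import Path


def output_path_for(src, rate_hz=16000):
    """Hypothetical naming helper: clip.mp3 -> clip_16kHz.wav (same folder)."""
    src = Path(src)
    return src.with_name(f"{src.stem}_{rate_hz // 1000}kHz.wav")


def convert_frame_rate(src, rate_hz=16000):
    """Load an audio file with pydub, resample it, and export it as WAV."""
    # Imported here so the naming helper above works without pydub installed.
    from pydub import AudioSegment

    audio = AudioSegment.from_file(str(src))
    resampled = audio.set_frame_rate(rate_hz)
    dst = output_path_for(src, rate_hz)
    resampled.export(str(dst), format="wav")
    return dst
```

The resulting WAV file can then be picked up by the “List Audio Files” node in place of the original recording.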

No problem, I’m glad it worked! I guess in my case I have to work on my articulation 😄
