Text Processing - Splitting Words from Transcript

Below are two images of the issue I am having. I have input a transcript that was auto-generated from a YouTube video, and I am trying to get a count of every word mentioned in that video. However, some of the words are run together in the transcript. Is there any way to separate them without doing it manually?

Screen Shot 2022-05-13 at 11.50.52 AM

Hi @brendencampana,

Again a very interesting problem :smiling_face:
Could you please post the transcript as it comes from YouTube? It may help us find an easier solution if we can work from the source of the problem.

Thanks & best regards,
Ael


Transcript.xlsx (51.4 KB)

It is a very messy file, but this is pulled directly from the HTML using Selenium. I filter out most of the <> tags, which strips the markup and leaves just the English sentences.

@brendencampana you should replace end-of-line characters with spaces before any other manipulation. Use the String Replacer node with a single space in the Replacement text box.
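For anyone doing the same thing outside KNIME, here is a minimal Python sketch of the same idea. It assumes the transcript is already available as one string; `count_words` and the sample text are hypothetical names and data made up for illustration, not part of the original workflow.

```python
import re
from collections import Counter


def count_words(transcript: str) -> Counter:
    """Count words after turning every line break into a space."""
    # Replace runs of \r and \n with a single space, so the last word of
    # one line is not glued to the first word of the next line.
    flattened = re.sub(r"[\r\n]+", " ", transcript)
    # Lowercase, then extract runs of letters, digits, and apostrophes.
    words = re.findall(r"[a-z0-9']+", flattened.lower())
    return Counter(words)


# Hypothetical sample with mixed line endings.
sample = "welcome back to the\nchannel today we\r\nare back"

# Deleting line breaks outright reproduces the problem: words get fused.
fused = sample.replace("\r", "").replace("\n", "")

# Replacing them with spaces keeps the words separate.
counts = count_words(sample)
```

With this sample, `fused` contains "thechannel" and "weare", while `counts` treats "the", "channel", "we", and "are" as separate words. The order matters for the same reason as in the workflow: the replacement has to happen before any step that strips or collapses whitespace.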


That was it, thank you!
