Rule-based row splitter

Hi all

I’m new to KNIME and it’s a great tool, but I find it hard to find “how to” guides. Right now I have a large data set from Facebook that I’m analysing. I’ve cleaned the data, and the next thing I want to do is create word clouds. However, I’m having issues with that, so I hope some of you out there can help.
The data is in multiple languages, e.g. English, French, Italian. I want to translate all the text to English so that I can create a relevant word cloud; however, from what I can tell, that’s not possible.

  1. Is it in any way possible to translate all data to English?
  2. I haven’t been able to find any tool that can solve 1), so I’m now trying to extract just the English data (comments) from the data set and work with that. My workflow is: cleaned data set → Sentence Extractor → Tika Language Detector → Rule-based Row Splitter. But I don’t get any output, even though the node appears to have executed (green). Can anyone help? I’ve followed the guide from ScottF, “Separation of multiple languages in one document”, but I don’t get the grey box outcomes with e.g. English.

Thank you very much in advance

Best,
Siv


Hello @Siv and welcome to the forum.

For the first translation question, within KNIME you could use the Amazon Translate node - but note this requires an Amazon account and is not free. You might also try it with a Python Script node that uses the translate package as described here: https://www.tutorialspoint.com/python_text_processing/python_text_translation.htm
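As a rough illustration of the Python route, here is a minimal sketch of how the translation step could be organised. It assumes each row already has a detected language code next to its text; `translate_rows` and `fake_translate` are hypothetical names made up for this example, and the translator is passed in as a function so the sketch runs without a network connection or an account. The commented-out lines show how I would expect the translate package to be plugged in, but please check that against its documentation rather than taking it as given.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def translate_rows(
    rows: Iterable[Tuple[str, str]],
    translate_fn: Callable[[str, str], str],
    target_lang: str = "en",
) -> List[str]:
    """Translate (language, text) rows, leaving target-language rows untouched.

    Repeated (language, text) pairs are translated only once, which matters
    when the translation service is rate-limited or billed per request.
    """
    cache: Dict[Tuple[str, str], str] = {}
    translated: List[str] = []
    for lang, text in rows:
        # Rows already in the target language, or empty rows, are passed through.
        if lang == target_lang or not text.strip():
            translated.append(text)
            continue
        key = (lang, text)
        if key not in cache:
            cache[key] = translate_fn(text, lang)
        translated.append(cache[key])
    return translated


# Stand-in translator for demonstration only (hypothetical, not a real service).
_DEMO_DICTIONARY = {("fr", "bonjour"): "hello", ("it", "grazie"): "thank you"}


def fake_translate(text: str, lang: str) -> str:
    return _DEMO_DICTIONARY.get((lang, text), text)


# Assumed wiring for the translate package (untested here, needs network access):
# from translate import Translator
# def real_translate(text, lang):
#     return Translator(from_lang=lang, to_lang="en").translate(text)

comments = [("en", "great post"), ("fr", "bonjour"), ("it", "grazie"), ("fr", "bonjour")]
print(translate_rows(comments, fake_translate))
```

Passing the translator in as a function keeps the row handling separate from whichever service you end up using, so swapping the stand-in for a real translator should not require touching the loop.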

For your second question, if you upload your current workflow along with a toy dataset, I can try to figure out why you’re not getting the results you expect.
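In the meantime, one thing that may be worth checking is whether the rule in the splitter compares against the exact value the language detector writes into its output column (for instance a short code rather than the word "English", or a value with different capitalisation). I can't confirm that is the cause without seeing the workflow. The sketch below shows the same split in plain Python so the logic is easy to inspect; the column name `Language`, the value `en`, and the function names are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Row = Dict[str, Optional[str]]


def normalise(value: Optional[str]) -> str:
    """Lower-case and trim a language value; missing values become ''."""
    return str(value or "").strip().lower()


def language_counts(rows: Iterable[Row], lang_key: str = "Language") -> Counter:
    """Count the distinct language values, to see what the detector produced."""
    return Counter(normalise(row.get(lang_key)) for row in rows)


def split_by_language(
    rows: Iterable[Row], wanted: str = "en", lang_key: str = "Language"
) -> Tuple[List[Row], List[Row]]:
    """Return (matching rows, all other rows), like a two-port row splitter."""
    matched: List[Row] = []
    rest: List[Row] = []
    for row in rows:
        if normalise(row.get(lang_key)) == wanted:
            matched.append(row)
        else:
            rest.append(row)
    return matched, rest


sample = [
    {"Sentence": "Nice picture", "Language": "en"},
    {"Sentence": "Très bien", "Language": "fr"},
    {"Sentence": "Love it", "Language": " EN "},
    {"Sentence": "???", "Language": None},
]
print(language_counts(sample))
english, others = split_by_language(sample)
print(len(english), len(others))
```

Looking at the counts first shows which values are actually present in the language column, which is usually the quickest way to see why a rule matches nothing.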

