I am currently writing my master's thesis and using KNIME for the first time. My task is to aggregate roughly 200,000 survey responses and group them into a number of higher-level topics. The topics are not defined beforehand; they should be derived from the responses. Each response consists of either a few keywords or at most one sentence. So far, I have all the responses in one column and have reduced them by removing duplicate entries.
Now my problem: I still have many similar words or responses that mean nearly the same thing (e.g. culture, cultural offerings, theatres, etc.), which I would like to group together. I tried several things, such as the topic extractor, which did not give me reasonable topics. I also tried working with the dictionary nodes and added .txt files so that KNIME could recognize synonyms, but that did not give reasonable results either.
Could someone please give me some hints on how to do this? My survey responses are in German, so I cannot use WordNet, which I thought would be a good fit. Is it possible to handle this problem within KNIME? What about the term co-occurrence, clustering, and naive Bayes predictor nodes? I thought about using them, but I did not manage to achieve what I want with them.
For my research, it is not necessary to know how often terms were used, and I am not supposed to formulate topics beforehand. Is it possible to do this grouping without specifying topics in advance? Do I have to integrate dictionary files and synonym lists into KNIME myself, or does KNIME have its own way of accessing dictionaries? Can I link to a dictionary website via a URL, or do I have to download a word list and load it into KNIME? How do I set this up so that KNIME works with the word list and recognizes the synonyms and word groups in my responses? Is it necessary to build my own tables of synonyms and word groups so that KNIME learns them? In that case, KNIME would not be very helpful, as my dataset is too large to create synonym tables by hand. Lastly, I would like to ask how I can fix spelling errors in the responses so that they are assigned to the right categories/topics.
It would be a big help if you could answer my questions. Thank you in advance!