We are trying to analyze a corpus of news articles related to specific keywords.
So far I have computed the most frequent terms (TF, IDF), checked the most relevant terms per document, and created a tag cloud as well as a network of connected terms using a co-occurrence counter.
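For readers outside KNIME, the steps above (TF-IDF weighting and a document-level co-occurrence count) can be sketched roughly as follows. This is a minimal stdlib-only sketch, assuming a naive whitespace tokenizer and the plain `tf * log(N / df)` weighting; the function names are hypothetical, and KNIME's own TF/IDF nodes may use different normalizations.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace only.
    # A real pipeline would also strip punctuation, remove stop words and stem.
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens)
        # Relative term frequency times inverse document frequency.
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores


def co_occurrences(docs):
    """Count in how many documents each unordered pair of terms appears together."""
    pairs = Counter()
    for tokens in (tokenize(d) for d in docs):
        # Sorting makes each pair key order-independent, e.g. ("quality", "service").
        for a, b in combinations(sorted(set(tokens)), 2):
            pairs[(a, b)] += 1
    return pairs
```

The pair counts from `co_occurrences` are the edge weights of a term network like the one described above; terms that appear in every document get an IDF of zero and drop out of the relevance ranking.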
Now we would like to detect the main topics discussed in these news articles (e.g. "quality of service", "warranty"), where each document can cover several topics (actual topics, not just individual terms and keywords).
We regularly gather new documents in this context and would like to follow how intensively the detected topics are discussed.
We would like to tag new documents with these topics to see trends in the known topics.
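As a point of comparison, the tagging-and-trend part can be sketched with a very simple baseline. This assumes each topic is described by a hand-picked set of seed terms (the `TOPICS` dictionary and both functions are hypothetical illustrations, not a KNIME feature); a trained multi-label classifier or a topic model would replace `tag_document` in a more serious setup.

```python
from collections import Counter, defaultdict

# Hypothetical topic definitions: each topic is described by a few seed terms.
TOPICS = {
    "quality of service": {"service", "quality", "support"},
    "warranty": {"warranty", "guarantee", "repair"},
}


def tag_document(text, topics=TOPICS, min_hits=1):
    """Return every topic whose seed terms occur at least min_hits times in text.

    A document can receive several topics (multi-label tagging).
    """
    tokens = Counter(text.lower().split())
    tags = set()
    for topic, seeds in topics.items():
        hits = sum(tokens[s] for s in seeds)  # Counter returns 0 for absent terms
        if hits >= min_hits:
            tags.add(topic)
    return tags


def topic_trend(dated_docs, topics=TOPICS):
    """Count tagged documents per (period, topic) to follow topic intensity over time."""
    trend = defaultdict(Counter)
    for period, text in dated_docs:
        for topic in tag_document(text, topics):
            trend[period][topic] += 1
    return trend
```

For example, `topic_trend([("2015-01", "Warranty repair took weeks"), ("2015-02", "service and warranty both fine")])` counts one "warranty" document in each period and one "quality of service" document in the second. Documents that match no topic at all could be collected separately as candidates for new topics.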
Ideally, we could also detect new topics as they emerge over time.
For this I tried document classification (training a model, then predicting), but could not work out the best way to set it up. I am also not sure whether KNIME is built to support this kind of task.
I would appreciate comments, tips, or examples of how this could be solved.