Quick question (hopefully). I have around 7,000 support emails from our CRM. I extracted all of the emails and loaded them into KNIME. I created the documents and cleaned out special characters, signatures, disclaimers, etc.
So far so good. Now I want to be able to group the words by the way they occur next to each other. For example, I know that a lot of the emails that have printer in them also have slow, color, empty, sh** and old, while emails regarding the phone system have noisy, loud, busy and dirty.
I can't wrap my head around how to approach that. It's probably some kind of distance calculation or vector? Once I have the data, I would love to visualize it in a network graph.
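To make it concrete, here is a rough Python sketch of the kind of result I'm after, not something I have working in KNIME. It counts how many emails each pair of words shares, and keeps the pairs above a threshold as weighted edges for a graph. The sample emails, the `cooccurrence_edges` function and the `min_count` parameter are all made up for illustration, and plain co-occurrence counting is only my guess at the right technique.

```python
from collections import Counter
from itertools import combinations

# Made-up example emails, already cleaned and split into words.
emails = [
    ["printer", "slow", "color"],
    ["printer", "empty", "old"],
    ["phone", "noisy", "loud"],
    ["phone", "busy", "loud"],
    ["printer", "slow", "old"],
]

def cooccurrence_edges(docs, min_count=1):
    """Count how many documents each pair of words appears in together."""
    pair_counts = Counter()
    for tokens in docs:
        # Use the unique words so a pair counts at most once per email;
        # sorting makes (a, b) and (b, a) land on the same key.
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    # Keep only pairs that occur together in at least min_count emails.
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

# Each surviving pair is an edge; the count is the edge weight.
edges = cooccurrence_edges(emails, min_count=2)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting word pairs and weights are what I would then want to feed into a network graph, with words as nodes and the counts as edge weights.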
Any ideas/samples on how to approach that? I could tag emails by topic, but I also don't want to limit my ability to discover information.
Any help would be appreciated.