Distance or commonality to Network Graph

Hello all,

Quick question (hopefully). I have around 7000 support emails from our CRM. I extracted all of the emails and loaded them into KNIME. I created the documents and cleaned out special characters, signatures, disclaimers, etc.

So far so good. Now I want to be able to group the words by the way they occur next to each other. For example, I know that a lot of the emails that have printer in them also have slow, color, empty, sh** and old, while emails regarding the phone system have noisy, loud, busy and dirty.

I can't wrap my head around how to approach that. It's probably some distance calculation or vector? And then, once I have the data, I would love to visualize it in a network graph.

Any ideas/samples on how to approach that? I could tag emails by topic, but I also don't want to limit my ability to discover information.

Any help would be appreciated.

Thank you


To discover content in a text you can use the Topic Extractor (Parallel LDA) node (unsupervised), the Keygraph keyword extractor node, or the Chi-square keyword extractor node.

For the graph representation you can check the example workflow in the EXAMPLES server under 08_Other_Analytics_Types/05_Network_Mining.

If you are interested in the frequency of co-occurring words, you can try the Term Co-occurrence counter node.
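To make the idea of co-occurrence counting concrete, here is a minimal sketch in plain Python (outside KNIME). It is only an illustration of the general technique, not how the KNIME node is implemented. The sample emails, the function names and the `min_count` threshold are all made up for the example. It counts, for each pair of terms, how many documents contain both, and keeps the frequent pairs as weighted edges that a network graph could be drawn from.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered term pair appears in together."""
    pair_counts = Counter()
    for tokens in documents:
        # Use the unique terms so a pair is counted once per document;
        # sorting makes (a, b) and (b, a) map to the same key.
        unique_terms = sorted(set(tokens))
        pair_counts.update(combinations(unique_terms, 2))
    return pair_counts


def to_edge_list(pair_counts, min_count=2):
    """Keep pairs seen in at least min_count documents as weighted edges."""
    return sorted(
        (a, b, n) for (a, b), n in pair_counts.items() if n >= min_count
    )


# Hypothetical, already cleaned and tokenized emails (not real data).
emails = [
    ["printer", "slow", "color"],
    ["printer", "slow", "old"],
    ["phone", "noisy", "loud"],
    ["phone", "noisy", "busy"],
    ["printer", "empty", "color"],
]

edges = to_edge_list(cooccurrence_counts(emails))
for source, target, weight in edges:
    print(source, target, weight)
```

Each resulting triple (term, term, count) is one weighted edge, so terms that often appear in the same emails end up connected, which is the structure a network graph view needs. Raising or lowering the threshold controls how dense the graph gets.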

Does this help?

-- Rosaria