Email Data Mining

Hi,

I am working on the Enron Email dataset for an NLP project in KNIME.

[Image: Capture]

Could you help me extract the information in this dataset from this format into a cleaned version containing only words, and then apply supervised and unsupervised learning to it in KNIME?

Link to the CSV on Kaggle:
The Enron Email Dataset | Kaggle

You’ll need to do some parsing if you want to get nice text data out of this. The mbox nodes I built a while ago might help with the first steps of parsing the raw mail data:

The “TIKA” node might also help, as it supposedly can digest eml files:

Don’t expect a “one click” solution though, this will still require quite some effort 🙂
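
To give a sense of what that parsing involves, here is a minimal Python sketch using only the standard library. It is not what the mbox or TIKA nodes do internally; it just illustrates the two steps of splitting a raw message into headers and body, and then reducing the body to plain lowercase words. The function names `parse_raw_mail` and `only_words` and the sample message are made up for illustration, and the word rule (runs of the letters a to z) is a simplifying assumption you would probably want to refine.

```python
import re
from email import message_from_string


def parse_raw_mail(raw: str) -> dict:
    """Split one raw mail message (headers, blank line, body) into fields."""
    msg = message_from_string(raw)
    if msg.is_multipart():
        # Keep only the plain-text parts of multipart messages.
        parts = [
            part.get_payload()
            for part in msg.walk()
            if part.get_content_type() == "text/plain"
        ]
        body = "\n".join(parts)
    else:
        body = msg.get_payload()
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        "body": body,
    }


def only_words(text: str) -> list:
    """Lowercase the text and keep runs of letters; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


# Hypothetical sample message, not taken from the dataset.
sample = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Meeting notes\n"
    "\n"
    "Here is our forecast for 2001.\n"
    "We should talk at 3pm!\n"
)

parsed = parse_raw_mail(sample)
words = only_words(parsed["body"])
print(parsed["subject"])
print(words)
```

To run this over the CSV, you would read the file row by row (for example with `csv.DictReader`) and pass the raw message text of each row to `parse_raw_mail`; check the file's header for the actual column names. Forwarded mails, quoted replies and signatures will still need extra cleanup.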

then do a supervised and unsupervised learning in KNIME.

Actually, what do you want to “learn”?
