Email Data Mining

augustinejoseph · November 14, 2021, 4:30pm

Hi,

I am working on the Enron Email dataset for an NLP project in KNIME.

Kindly help me extract the information in this dataset from its raw format into a cleaned one containing only words, and then apply supervised and unsupervised learning in KNIME.

Link to the CSV on Kaggle:
The Enron Email Dataset | Kaggle

qqilihq · November 14, 2021, 6:31pm

You’ll need to do some parsing if you want to get nice text data from this. The mbox nodes, which I built a while ago, might partly help you with the first steps of parsing the raw mail data.
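
To give an idea of what that first parsing step involves, here is a minimal sketch in Python using only the standard-library `email` module (it could run, for example, in a KNIME Python Script node). It assumes each row of the CSV holds one raw message as a single string; the sample message and the `parse_raw_mail` helper are made up for illustration and are not part of the mbox nodes.

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample in the style of a raw mail: headers, blank line, body.
RAW = """From: alice@example.com
To: bob@example.com
Subject: Meeting

Here is our forecast.
See you at 10.
"""

def parse_raw_mail(raw: str) -> dict:
    """Split one raw message into a few header fields and the plain-text body."""
    msg = message_from_string(raw, policy=default)
    # Prefer the text/plain part; a single-part plain message is its own body.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "from": str(msg["From"] or ""),
        "to": str(msg["To"] or ""),
        "subject": str(msg["Subject"] or ""),
        "body": body.strip(),
    }

parsed = parse_raw_mail(RAW)
print(parsed["subject"])
print(parsed["body"])
```

Real messages in the dataset will be messier than this sample (forwarded chains, signatures, odd encodings), so treat this as a starting point only.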

The “TIKA” node might also help, as it can supposedly digest eml files.

Don’t expect a “one click” solution though; this will still require quite some effort.
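
As a rough idea of the "only words" cleaning step once the body text has been extracted, here is a small sketch in plain Python. The `clean_to_words` helper and the tiny stop-word list are assumptions for illustration; in KNIME the text processing nodes would typically cover the same ground.

```python
import re

# Assumed, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "at", "you"}

def clean_to_words(text: str) -> list:
    """Lowercase the text and keep only alphabetic tokens that are not stop words."""
    # Drop e-mail addresses and URLs before tokenizing.
    text = re.sub(r"\S+@\S+|https?://\S+", " ", text)
    tokens = re.findall(r"[a-z]+", text.lower())
    # Discard single letters (e.g. the "q" left over from "Q3") and stop words.
    return [t for t in tokens if len(t) > 1 and t not in STOP_WORDS]

words = clean_to_words("Please forward the Q3 forecast to bob@example.com by 10 AM.")
print(words)  # ['please', 'forward', 'forecast', 'by', 'am']
```

Whether to also stem, lemmatize, or drop numbers depends on what you want to learn from the text afterwards.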

then do a supervised and unsupervised learning in KNIME.

Actually, what do you want to “learn”?
