I am working on the Enron Email dataset for an NLP project in KNIME.
Please help me extract the information in this dataset from its current format into a cleaned one containing only words, and then do supervised and unsupervised learning in KNIME.
Link to the CSV on Kaggle:
The Enron Email Dataset | Kaggle
You’ll need to do some parsing if you want to get clean text data out of this. The mbox nodes I built a while ago might help with the first steps of parsing the raw mail data:
The “TIKA” node might also help, as it supposedly can digest
Don’t expect a “one click” solution though; this will still require quite some effort.
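To give an idea of what the parsing step involves, here is a minimal sketch in plain Python (standard library only), which could for example be run from a Python Script node. It is not what the mbox or Tika nodes do internally. The function names `extract_body` and `clean_words` are made up for this illustration, and the tokenization rule (lowercase, letters only, at least two characters) is just one possible definition of "only words".

```python
import re
from email import message_from_string


def extract_body(raw: str) -> str:
    """Return the plain-text body of one raw RFC 822 message, without headers."""
    msg = message_from_string(raw)
    parts = []
    # walk() also yields the message itself, so single-part mails are covered.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload()
            if isinstance(payload, str):
                parts.append(payload)
    return "\n".join(parts)


def clean_words(raw: str) -> list:
    """Lowercase the body and keep only alphabetic tokens of length >= 2.

    Assumption: digits, punctuation and single letters count as noise.
    """
    body = extract_body(raw).lower()
    return re.findall(r"[a-z]{2,}", body)


# Hypothetical sample message, not taken from the dataset.
sample = (
    "Message-ID: <1@example>\n"
    "From: a@example.com\n"
    "Subject: Hi\n"
    "\n"
    "Please review the Q3 numbers, thanks!\n"
)

print(clean_words(sample))
```

If the Kaggle CSV stores each raw mail as a string in one column (I believe it is called `message`, but please check), `clean_words` could be applied row by row to get a token list per mail.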
then do a supervised and unsupervised learning in KNIME.
Actually, what do you want to “learn”?