E-Mail as source for text mining

Hello everyone, hello Kilian,

I finally got around to taking a closer look at the huge text mining improvements. Great work, and great tutorials.

Unless someone has already raised the topic, I wonder whether it would be possible to have:

1) email as a document source (i.e. a node that connects to an email client like Thunderbird, Lotus Notes or Outlook as a source of documents)

2) a tutorial showing an analysis, like the ones done on tweets or other social media, run on the email server of a typical company

Any thoughts?

Best

Andrea Z

 

 

Hi Andrea,

So far there is no dedicated node to read or parse emails from email clients. To get the email data into KNIME you need to export it, e.g. as CSV-formatted plain text. For Thunderbird there is an add-on (ImportExportTools, https://addons.mozilla.org/de/thunderbird/addon/importexporttools/) which can do this for you. Then read the data using the File Reader node and convert it into documents using the Strings To Document node.

There are white papers on social media analysis with KNIME available (http://tech.knime.org/examples). The papers use a Slashdot data set. However, there are no papers or tutorials available for Twitter or email data analytics.

Cheers, Kilian

Thanks for the update, Kilian.

I will use the addon to show the capabilities.

As a portfolio manager in a generic company I would like to bring this aspect of BI to the discussion table.

It just strikes me that business intelligence "à la mode" refers only to social media, when we (and our companies' customers too) are sitting on zillions of texts which are constantly neglected.

Best+thx

Andrea

 

I would like to import email directly from mail servers and store the content, headers, and attachments in a database. I tried the Thunderbird method suggested above, but the .csv export file is limited. There is an email package in Python for working with email:

https://docs.python.org/3.5/library/email.html

I have not worked with Python in KNIME. Would using the email package and its functions there be involved?

Thanks,

Gilbert
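
For anyone who wants to try the email package mentioned above, here is a minimal sketch of pulling headers, body text, and attachments out of one raw message. It assumes a reasonably recent Python 3 and uses only the standard library. The sample message and the parse_email helper are made up for illustration; how the result is handed on to KNIME is left open.

```python
from email import message_from_bytes, policy

# Hypothetical sample message, for illustration only.
RAW = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Quarterly report\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n"
    b"Please find the report attached.\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/csv\r\n"
    b'Content-Disposition: attachment; filename="report.csv"\r\n'
    b"\r\n"
    b"a,b\r\n1,2\r\n"
    b"--XYZ--\r\n"
)


def parse_email(raw_bytes):
    """Split one raw message into headers, body text, and attachments."""
    # policy.default yields the modern EmailMessage API (get_body, iter_attachments).
    msg = message_from_bytes(raw_bytes, policy=policy.default)

    # Prefer the plain-text body, fall back to HTML if that is all there is.
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""

    # Each attachment becomes a (filename, content) pair.
    attachments = [
        (part.get_filename(), part.get_content())
        for part in msg.iter_attachments()
    ]

    return {
        "from": str(msg["From"] or ""),
        "to": str(msg["To"] or ""),
        "subject": str(msg["Subject"] or ""),
        "body": body,
        "attachments": attachments,
    }


record = parse_email(RAW)
print(record["subject"], [name for name, _ in record["attachments"]])
```

The resulting dictionary is flat enough to write into a database table or a CSV row per message; attachments would need their own table or folder.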

@gcarmich Possibly. However, here's a Stack Overflow question with some code that looked legit enough to start testing. From a flow perspective, if you can get the IMAP connection authenticated and open, getting the other data you're looking for should be pretty simple.

http://stackoverflow.com/questions/2230037/how-to-fetch-an-email-body-using-imaplib-in-python
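
A rough sketch of that flow with the standard library's imaplib, not taken from the linked question: open and authenticate the connection, then fetch and parse the messages. The function names, the host, and the credentials are placeholders, and real servers may need app passwords or OAuth instead of a plain login.

```python
import imaplib
from email import message_from_bytes, policy


def connect(host, user, password):
    """Open an SSL IMAP connection and log in (placeholder credentials)."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn


def fetch_messages(conn, mailbox="INBOX", limit=10):
    """Return up to `limit` of the most recent messages, parsed."""
    # readonly=True avoids marking messages as seen.
    conn.select(mailbox, readonly=True)

    status, data = conn.search(None, "ALL")
    if status != "OK":
        return []

    # data[0] is a space-separated list of message numbers as bytes.
    ids = data[0].split()[-limit:]

    messages = []
    for msg_id in ids:
        status, parts = conn.fetch(msg_id, "(RFC822)")
        if status != "OK":
            continue
        # The first element is an (envelope, raw_bytes) tuple.
        raw = parts[0][1]
        messages.append(message_from_bytes(raw, policy=policy.default))
    return messages


# Example use, assuming a reachable server (hypothetical host and login):
# conn = connect("imap.example.com", "user@example.com", "secret")
# for msg in fetch_messages(conn, limit=5):
#     print(msg["Subject"])
# conn.logout()
```

Once the messages are parsed, headers, body, and attachments can be read from each one with the email package.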

Thanks. I'll see what I can do with the information at the link provided. I recently posted a question on the forum but have not received any responses:

https://tech.knime.org/node/55539/view

The method I have been trying recently uses the Context.io service, with a script for authentication. The script outputs the email messages in JSON format, and the service can also transfer attachments. I've been having problems getting Context.io to run since my Python skills are limited.

Hello everyone, I would like to take up this topic again and ask whether such a node now exists.

There are nodes for parsing .mbox files:

Does this help?

Philipp
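
If the nodes are not an option, .mbox files can also be read outside KNIME. Below is a minimal sketch, assuming Python's standard mailbox module, that flattens an .mbox file into a CSV which the File Reader node could then load. The mbox_to_csv helper and the column names are my own choice, and it only handles messages with a decodable text body.

```python
import csv
import mailbox
from email import message_from_bytes, policy


def mbox_to_csv(mbox_path, csv_path):
    """Write one CSV row per message in the mbox file; return the row count."""
    # create=False: fail instead of silently creating an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    count = 0
    try:
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["from", "to", "date", "subject", "body"])
            for key in box.iterkeys():
                # Re-parse the raw bytes to get the modern EmailMessage API.
                msg = message_from_bytes(box.get_bytes(key), policy=policy.default)
                body_part = msg.get_body(preferencelist=("plain",))
                body = body_part.get_content().strip() if body_part is not None else ""
                writer.writerow([
                    str(msg["From"] or ""),
                    str(msg["To"] or ""),
                    str(msg["Date"] or ""),
                    str(msg["Subject"] or ""),
                    body,
                ])
                count += 1
    finally:
        box.close()
    return count
```

The body column can then be converted into documents in KNIME in the same way as with the Thunderbird export.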

Hello Philipp,

thank you.

I'll check this, but I cannot install the nodes.

Uwe

Hi Uwe,

you could try out the Tika Parser node as well.

Best,
Julian

Hi Julian,

thank you.

I will check it.

Thx

Uwe

Hi Uwe,

Why not? What’s the error you see? Can you give some more details?

–Philipp

Hi Philipp,

I see “nodepit” under Available Software, but I can’t find or download the nodes anywhere.

Sorry for my questions, but I am a beginner :slight_smile:
regards Uwe

Worry not:

Please follow these instructions to install the NodePit plugin; you can then easily install the mbox nodes through the NodePit plugin.

In case of further questions, let me know!

–Philipp

KNIME says: All items are installed (Available Software)

But I can't find, e.g., the mbox node in the Node Repository.

-Uwe

Did you follow the instructions linked in my previous post?

–Philipp

I tried it again, and yeah :v: the nodes are now available!!

Thank you for the support.

best regards

Uwe

Happy to hear :slight_smile: :+1:

Hope they work for you!

Philipp

Updating this thread to let everyone know the new Email Reader (Labs) and associated nodes were added in the 5.2 release. :slight_smile:
