I am looking for a way to process emails from Thunderbird (on Linux), either directly from the Thunderbird files (read-only) or from files exported with the “ImportExportTools NG” add-on, as either a single .mbox file or multiple EML files.
I need the complete header information of each email extracted, including the ‘Received: from’ and ‘X-Mozilla-…’ headers and any other information in the headers.
So far I have read the recommendations at Extract Text From Email and tried the “Tika Parser” as well as the “mbox Reader”.
The “Tika Parser” didn’t return the header info, only the Author, Date/time, Title, and Text.
The “mbox Reader” failed with an “Execute failed: Malformed input!” error. This might be because the node expects an mbox file, while the exported format seems to be mboxcl2 (but I might be wrong).
Does anyone have any suggestions on how I can get the required data into a table (or document)?
Is it possible to see the “Tika Parser” code, so that I could eventually extend the extracted fields?
Or how can the “mbox Reader” be made to work and deliver the required data?
Thanks.
PS: I don’t need solutions for retrieving data from any kind of server.
Hi @kludikovsky,
I think there is no ready-made node, but eml is text-based (as long as you do not need to deal with attachments) and so I have created a component that may help: Parse EML (Local) – KNIME Community Hub.
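To illustrate why the text-based format helps, here is a minimal Python sketch that pulls every header out of a single EML message using only the standard library. It is not how the component is implemented; the function name `extract_headers` and the sample message are made up for the example.

```python
from email import policy
from email.parser import BytesParser


def extract_headers(raw: bytes) -> list[tuple[str, str]]:
    """Return every header as a (name, value) pair, in file order.

    Repeated headers such as 'Received' are kept as separate rows.
    """
    # headersonly=True stops parsing after the header block.
    msg = BytesParser(policy=policy.default).parsebytes(raw, headersonly=True)
    return [(name, str(value)) for name, value in msg.items()]


# Hypothetical sample message, only for demonstration.
SAMPLE_EML = (
    b"Received: from mail.example.org by mx.example.com; Sun, 22 Aug 2021 22:43:14 +0000\r\n"
    b"Received: from localhost by mail.example.org; Sun, 22 Aug 2021 22:43:10 +0000\r\n"
    b"X-Mozilla-Status: 0001\r\n"
    b"From: alice@example.org\r\n"
    b"Subject: Test\r\n"
    b"\r\n"
    b"Body text\r\n"
)

rows = extract_headers(SAMPLE_EML)
for name, value in rows:
    print(f"{name}: {value}")
```

To process a real export, you would read each .eml file with `Path(...).read_bytes()` and pass the bytes to the function; the resulting pairs map naturally onto a two-column table.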
Let me know if this is useful!
Alexander
Thank you for your message on NodePit and your issue report regarding the Thunderbird files. I’m answering here, as it will benefit the entire community.
On NodePit there’s an updated version of the mbox Nodes (v2.1.0) which allows you to change the parsing behavior, so that you should be able to process your Thunderbird exports properly. To use it, open your existing workflow and replace the now “deprecated” version of the mbox Reader with the new version from the node repository. Open the configuration and change the “From” setting from the default to the second entry.
Some background: By default, the parser expects an @ character in lines starting with From. Thunderbird’s export, however, also produces From lines that look like this: From - Sun Aug 22 22:43:14 2021.
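The difference between the two behaviors can be sketched in a few lines of Python. This is only an illustration under assumptions, not the node's actual code: the two patterns `STRICT_FROM` and `RELAXED_FROM`, the helper `split_mbox`, and the sample data are all hypothetical.

```python
import re

# Hypothetical "strict" separator: the From line must contain an address with @.
STRICT_FROM = re.compile(r"^From \S+@\S+")
# Hypothetical "relaxed" separator: any line starting with "From " counts.
RELAXED_FROM = re.compile(r"^From ")


def split_mbox(text: str, separator: re.Pattern) -> list[str]:
    """Split mbox text into messages at lines matching `separator`.

    The separator line itself is dropped. Note that in this sketch a body
    line starting with 'From ' would also be treated as a separator when
    the relaxed pattern is used.
    """
    messages: list[str] = []
    current = None  # lines of the message being collected, if any
    for line in text.splitlines(keepends=True):
        if separator.match(line):
            if current is not None:
                messages.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


# Made-up sample using Thunderbird-style From lines.
SAMPLE_MBOX = (
    "From - Sun Aug 22 22:43:14 2021\n"
    "X-Mozilla-Status: 0001\n"
    "Subject: one\n"
    "\n"
    "First body\n"
    "From - Sun Aug 22 22:45:00 2021\n"
    "X-Mozilla-Status: 0001\n"
    "Subject: two\n"
    "\n"
    "Second body\n"
)

print(len(split_mbox(SAMPLE_MBOX, STRICT_FROM)))   # no separator is recognized
print(len(split_mbox(SAMPLE_MBOX, RELAXED_FROM)))  # both messages are found
```

With the strict pattern no message boundary is found in a Thunderbird export at all, which is consistent with the reader failing on such files.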
Fingers crossed, and again thank you for your support in helping us to improve these nodes!