Update of Apache Tika library

Hey,

there is a small problem with the TikaParser. KNIME currently uses version 1.14. If you process large amounts of data, e.g. separating emails from their attachments, the error “Too many open files” appears after some time. Unfortunately I don’t have the link at hand where this bug is discussed, but it was marked as fixed in later versions.

As a workaround I have now downloaded Tika version 1.18 and overwritten the existing library with it. Would it be possible to do the same in the next KNIME version?

Best regards

Hi @mheitzhausen,

thanks for pointing this out. Currently we are using Tika version 1.14. I will open a ticket for the lib to be upgraded to the latest version.

Cheers,
Kilian

Hi @mheitzhausen,

We’ve updated Tika to version 1.18 in our nightly build. As we were not able to reproduce the issue, it would be great if you could try it with the nightly build and give us some feedback.

Cheers,
Marten

Hi @Marten_Pfannenschmidt,

thank you for your answer. I’ve just tested it with the current nightly build and it’s running without errors.

Will the new version of the text processing extension only be available with 3.7, or will it also be released as an update for 3.6?

Cheers

I’m happy to hear that it works with the updated Tika library. Unfortunately, this change will only be available with 3.7 and not as part of the upcoming 3.6.1 release.