Are there PDF size limits for the TIKA Parser?

Hi,
I have used the TIKA Parser to extract JPG files embedded in a PDF. Each extracted JPG is around 17 MB, and everything works fine with an 84 MB file: the second output port is filled with data about the files, and the files end up in the correct location.
Settings used:
Search recursively: unchecked
Ignore hidden files: checked
File extension: pdf
Extract attachments and embedded files: checked
Extract inline images from PDFs: checked
Output directory: already created

However, I have two other, bigger PDFs (120 MB and 189.4 MB), and for those there is no output, using the same settings as for the smallest one. No error is reported.

All PDF files are from the same vendor.

Grateful for any suggestions!

Batjesen

Computer: Asus, AMD 64-bit, 16 cores, 64 GB RAM, 8 GB allocated to KNIME


Hi @Batjesen, thanks for your question.

There are no size limits for our TIKA Parser nodes. How many GB are you currently allocating to KNIME Analytics Platform? If you’re allocating, say, 4 GB, that would likely be too little for many text processing applications. If you’d like, please post your input files here and we can try to test them.
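
To double-check how much memory a Java process such as KNIME actually gets, a minimal sketch along these lines could help. It only uses the standard `Runtime.maxMemory()` call; the class name `HeapCheck` and the helper `toMiB` are hypothetical names made up for this illustration, and the snippet is not part of KNIME or Tika. To be meaningful it would need to run with the same `-Xmx` value that KNIME is started with.

```java
// Hypothetical helper (not part of KNIME or Tika): reports the JVM's maximum heap.
public class HeapCheck {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum amount of memory the JVM will attempt to use, in MiB.
    // Runtime.maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM (MiB): " + maxHeapMiB());
    }
}
```

If the reported value is far below what you expect, the `-Xmx` line in `knime.ini` (for example `-Xmx8g`) is the usual place to look.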

