Possible bug in Tika Parser URL Input

Hello,

I came across this issue that could potentially be a bug.

When I use the Tika Parser URL Input and pass a link that doesn’t end with “.pdf”, the node doesn’t work. When I pass the same link with everything after the file extension removed, it works as expected and reads the PDF.

Example:

Here’s the message in the error column: “File doesn’t match any selected extension(s)”

I tried to find other links for further testing but couldn’t find any that didn’t end with the extension name (sorry).

Is this expected behaviour, or is it a bug?

Have a nice evening,
Raffaello Barri

Hi @lelloba

I had a quick look at roughly what the code does. When you use the “File extension” option, the node takes the file extension of each path/URL and checks whether it is among the selected extensions; for example, for /folder/folder_1/file.csv the extension would be “.csv”. For the URL you are using, I would simply suggest using the MIME-Type option instead.
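To illustrate why a link with something after “.pdf” can fail an extension check like the one described above, here is a rough sketch in Python. This is not the node’s actual code; the function names and the example.com URLs are made up purely for illustration, and the real check may differ in detail.

```python
from urllib.parse import urlparse
import posixpath


def url_extension(url: str) -> str:
    """Return the lowercased extension of the last path segment of a URL.

    Hypothetical helper: it only mimics the idea of an extension-based check.
    """
    path = urlparse(url).path
    # splitext only looks at the final path segment, so a trailing
    # segment after "report.pdf" hides the ".pdf" part.
    return posixpath.splitext(path)[1].lower()


def matches_selected(url: str, selected=(".pdf",)) -> bool:
    """True if the URL's extension is one of the selected extensions."""
    return url_extension(url) in selected


# Hypothetical URLs, not the ones from the original post.
plain = "https://example.com/docs/report.pdf"
trailing = "https://example.com/docs/report.pdf/download"

print(matches_selected(plain))     # extension is ".pdf"
print(matches_selected(trailing))  # extension is "", so no match
```

In this sketch the second link yields an empty extension, so it would be rejected even though the server returns a PDF. A MIME-type check looks at the content type instead of the link text, which is why it is not affected by how the URL ends.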

I hope this sheds some light :slight_smile:

Let me know if this works for you!

Best regards
Lars
