document grabber outputs Selected directory: is not empty (even though it is)

Hello,
I hope the following is not a duplicate. I tried to replicate the text processing workflow (https://www.knime.com/knime-text-processing). However, I fail at the first step: the Document Grabber node reports an error stating that “the selected directory is not empty”, even though the directory is empty. I tried updating KNIME and selecting another directory. I also created a completely new one… nothing helps.
Any help would be appreciated.
Anne Catherine

Hi there @annecatherine and welcome to the forum.

I tried to reproduce your problem, but was unable to. It’s not clear to me which workflow you’re using as a starting point, but I can say as a general rule the part of our website that you’ve linked to is fairly old.

Have you tried searching for the Document Grabber node on the KNIME Hub, and looking for related workflows there?

Hi ScottF,
thank you for your reply.
Yes, I searched the internet and the forum, but could not find any related problem…
I still get the error message (in the meantime, I tried uninstalling and reinstalling everything), and downstream nodes are not executed either (unsurprisingly, but still worth a try). Do you have any idea what may produce the error other than a folder that already contains files? As I understand it, there have not been similar problems with the Document Grabber so far. I also tried to read the code on GitHub, but I am just a beginner in coding and could not find anything unusual.
I really appreciate your help.
Anne Catherine

Are you able to upload a small workflow that reproduces the problem here?

(And were you able to check out any of the workflows on the KNIME Hub that use the Document Grabber to see if they perform any differently?)

Another thing I thought of: is it possible you are trying to write to a directory that - even if empty - you don’t have permissions for? Can you create a brand new empty directory and point the node to that?
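
If it helps, here is a minimal Python sketch for checking a directory outside of KNIME. It lists every entry in the folder (including hidden files, which a file browser may not show) and tries to create a temporary file to test write permission. The function name `inspect_directory` is made up for illustration, and this is not how the Document Grabber itself performs its check; that hidden files could be involved is only a guess.

```python
import os
import tempfile
from pathlib import Path


def inspect_directory(path):
    """Report whether `path` is empty and writable (hypothetical helper)."""
    directory = Path(path)
    # os.listdir also returns hidden entries such as .DS_Store or Thumbs.db
    entries = sorted(os.listdir(directory))
    try:
        # Creating a real temporary file is a more direct write test
        # than only inspecting permission bits.
        with tempfile.TemporaryFile(dir=directory):
            writable = True
    except OSError:
        writable = False
    return {"entries": entries, "is_empty": not entries, "writable": writable}
```

Calling `inspect_directory` on the folder you selected in the node should show `is_empty` as true and `writable` as true if the folder is really empty and you are allowed to write to it.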

Dear Scott,
thanks for your reply and sorry for my late reply.
Thank you for suggesting that I create a brand new directory. Alas, it did not work. But I tested a new workflow with the same old directory (document_grabber_test), and there the Document Grabber does not give any error or warning. Regardless of the keywords, however, it only loads up to 10 abstracts. I am sure there is something I missed, but maybe the main problem is solved.

I still uploaded two workflows, one that reproduces the error and the other one that doesn’t:

  • knime_project_export is the workflow that gave the error. The Document Grabber executes and claims at the end that the directory is not empty, though it is. The error appears regardless of whether the Document Grabber is connected to subsequent nodes or not.
  • document_grabber_test is a workflow where I tested whether the Document Grabber would work and whether subsequent nodes could be the reason. The subsequent nodes work, with the exception of the keyword extractor.

I have also uploaded the console output in case it is of some help.
Thanks in advance
Anne Catherine
console_output_document_grabber_test.txt (856 Bytes) console_output_knime_project.txt (214 Bytes)
document_grabber_test.knwf (12.4 KB) KNIME_project_export.knwf (15.8 KB)

Hi @annecatherine -

Thanks for the workflows and logs. First off, I’ll say that when we’re looking at the execution of the Document Grabber, subsequent nodes don’t affect its performance - so we don’t have to worry about that.

For both of your workflows, if I point the node to an empty directory on my machine, it executes successfully and returns 1000 documents. I did notice an error in your second log:

ERROR Document Grabber 0:1 Failed to swap to disc while freeing memory

So perhaps the memory allocated to KNIME is an issue? You can increase this by adjusting the -Xmx parameter in knime.ini as described in https://www.knime.com/faq#q4_2. My desktop was struggling with memory allocation on your second workflow (downstream in the preprocessing nodes), so maybe a combination of increasing memory and decreasing the number of documents produced by the grabber node might help.
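
As a rough sketch of what that change amounts to: knime.ini is a plain text file with one option per line, and the line starting with -Xmx sets the maximum Java heap size. The Python snippet below rewrites that line in a string holding the file contents. The function name `set_max_heap`, the sample contents, and the value `4g` are only illustrative assumptions; pick a value that fits your machine's RAM, and keep a backup of the original file.

```python
def set_max_heap(ini_text, new_value):
    """Return ini_text with its -Xmx line set to new_value, e.g. "4g"."""
    lines = ini_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith("-Xmx"):
            lines[i] = "-Xmx" + new_value
            replaced = True
    if not replaced:
        # Assumes a -vmargs line is already present earlier in the file,
        # so a JVM option appended at the end is still picked up.
        lines.append("-Xmx" + new_value)
    return "\n".join(lines) + "\n"


# Hypothetical file contents, for illustration only
sample = "-vmargs\n-Xmx2048m\n"
updated = set_max_heap(sample, "4g")
```

Editing the line by hand in a text editor works just as well. Either way, KNIME has to be restarted before the new setting takes effect.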

Sorry that I was unable to reproduce your original problem. My best guess remains that it might be some kind of directory permissions problem.
