Hello everyone,
I have been working on some workflows that require parsing PDF files. I am having difficulties with the Text Processing nodes (I am still new to KNIME, my bad), so I looked for alternatives to convert PDF to plain text and found a utility from Silvercoder called DocToText, which is actually pretty good.
What I am stuck on now is how to build a workflow that calls the executable, converts the original PDF file, and creates the new TXT file. I tried the CmdwInput node, which I thought I could use to call the executable from the command line, passing the parameters, the input file, and the output file. However, I am receiving errors and haven't been able to find examples of how to use this node correctly, or of alternatives (such as External Tool).
To convert the PDF to TXT, you have to run the following command from the command line:
"<PATH-TO>\doctotext.exe" --pdf "<PATH-TO>\originalFile.pdf" >> "<PATH-TO>\convertedFile.txt"
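For comparison, here is a minimal Python sketch of the same conversion done without a shell, where the program's standard output is appended to a file by the caller instead of by `>>`. The function names `build_doctotext_command` and `run_and_append` are hypothetical helpers made up for this illustration, and the only doctotext option used is the `--pdf` flag from the command above.

```python
import subprocess


def build_doctotext_command(exe_path, pdf_path):
    """Return the argument list for the conversion, without any shell redirection."""
    # "--pdf" is the flag used in the command line above.
    return [str(exe_path), "--pdf", str(pdf_path)]


def run_and_append(command, output_path):
    """Run `command` without a shell and append its standard output to `output_path`."""
    # Opening in "ab" mode mirrors the append behaviour of ">>" in CMD.
    with open(output_path, "ab") as out_file:
        completed = subprocess.run(command, stdout=out_file, stderr=subprocess.PIPE)
    # Return the exit code and whatever the program wrote to STDERR.
    return completed.returncode, completed.stderr.decode(errors="replace")


# Hypothetical usage, with the placeholder paths from the command above:
# code, err = run_and_append(
#     build_doctotext_command(r"<PATH-TO>\doctotext.exe", r"<PATH-TO>\originalFile.pdf"),
#     r"<PATH-TO>\convertedFile.txt",
# )
```

The point of the sketch is that the output file is never passed to the executable as an argument. It only receives `--pdf` and the input path, and the caller decides where the text goes.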
I have already run the doctotext.exe command above from a CMD window and it works fine, and calling it from a BAT file also gives the expected result. However, I can't execute the workflow in KNIME; it returns the following error message:
ERROR CmdwInput 2:60 Execute failed: STDERR message: Using PDF parser. Cant open C:\doctotext\doctotext\TXT\outputFile.txt for reading It is possible that wrong parser was selected. Trying different parsers. Trying to detect document format by its content. Error opening file C:\doctotext\doctotext\TXT\outputFile.txt. Error processing file C:\doctotext\doctotext\TXT\outputFile.txt. |
I don't know what I am doing wrong: whether my understanding of the node is incorrect, or whether it is something related to permissions/access.
NOTE:
I'm a Windows user (KNIME version 3.5.1), so I have to deal with CMD and Windows file paths.
I have attached the workflow I am working on for reference.
Any ideas or suggestions would be much appreciated.
aledezma003