I'm trying to gain some experience in KNIME by creating a workflow that loops through XML files and logs the instances where an XML file isn't valid or there was an error.
I get the list of XML files from an Excel sheet and convert the rows of the table into variables, then use the fileLocation variable as the XML Reader parameter.
Before I tried to include the error handling I would encounter the error
Execution failed in Try-Catch block: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 9493; The processing instruction target matching "[xX][mM][lL]" is not allowed.
But I don't think I have the error handling configured correctly. If I have the XML Reader node contributing the 2 inputs to the Catch Errors node, I get "Active Scope End node in inactive branch not allowed."
Could anyone advise the correct configuration of nodes needed so the valid xml files are read but the invalid ones are logged or notified to the operator in some way?
The problem is that when you pass to the XML Reader node the name of a non-existent file through your workflow variable, the node actually fails on configuration, not on execution. Try / Catch nodes work by trapping execution errors, so they won't help you in this case.
The solution has been discussed before: https://tech.knime.org/forum/knime-general/problem-getting-error-trapping-to-work-to-trap-file-not-found-for-filecsv-reader
It consists of using a String to URI node to check for the existence of your file before trying to read it. The node needs to be set with the option Fail if file does not exist (only applies to local files). Given your use case, you don't even need to attempt reading the XML file to know it does not exist.
If the String to URI node fails, you can use the alternative path of the Try/Catch nodes (Data Ports) to log the error to the output table, otherwise just log success and keep looping (btw, you need to use a different loop type for this solution to work -- a Chunk Loop will do).
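To make the idea concrete outside KNIME, here is a minimal Python sketch of the same check-first logic: test each path for existence without opening it, and keep the missing ones in a separate list for logging. The function name split_by_existence is made up for illustration and is not a KNIME API.

```python
from pathlib import Path


def split_by_existence(paths):
    """Return (existing, missing) lists without opening any file."""
    existing, missing = [], []
    for p in paths:
        # is_file() only checks the file system; nothing is read or parsed.
        if Path(p).is_file():
            existing.append(p)
        else:
            missing.append(p)
    return existing, missing
```

Only the paths in the first list would then be handed to the reader; the second list is what you would log.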
I'm not sure I really follow. I don't want to substitute data into the flow, just to fail gracefully.
How would the correct sequence differ from my image below?
My reading is that the table of data feeds into the Chunk Loop, then String to URI checks the file to ensure it exists before the table row is converted to variables, and the xmlpath column is then supplied to the XML Reader to parse an XML file we know exists.
Also, I tried the workflow from this forum post: https://tech.knime.org/forum/knime-developers/reading-xml-files-flat-file-document-parser-problems
When I allow it to index the folder of XML files it creates a list of XML files, but when I pass that list to the Iterate List of Files operator I get the error: ERROR XML Reader 5:10:4 Execute failed: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 9493; The processing instruction target matching "[xX][mM][lL]" is not allowed.
Sorry, I didn't mean invalid, I meant not well-formed, i.e. "broken". I'm not sure you want to handle such cases (silently) in KNIME. You'd better repair the broken files or the program that breaks them.
If you really want to ignore such files, you can use another try-catch block around the XML Reader.
While this seems to stop the XML error, it isn't looping through the rows. It seems to get the first row and read the XML file, and the XML Reader node provides an output, but there is no output from the Catch Errors node on either the failure port or the output port.
Have you any recommendation for a suitable node when the desire is to log xml parsing errors and not substitute data into the process?
To add to the discussion: I think a parsing error would be similar to
Execute failed: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 8; XML document structures must start and end within the same entity
I'm still not sure what the error I originally received could mean:
ERROR XML Reader 5:10:4 Execute failed: org.xml.sax.SAXParseException; lineNumber: 2; columnNumber: 9493; The processing instruction target matching "[xX][mM][lL]" is not allowed
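For what it's worth, as far as I know the Java parser behind this message reports it when an <?xml ...?> declaration shows up anywhere other than the very start of the file, e.g. when there is content before it or when two documents were written into one file. The Python sketch below reproduces both situations with made-up sample strings. Python's standard parser words the error differently, so the sketch only shows that parsing fails, not the exact message.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample contents, not taken from the real files.
well_formed = '<?xml version="1.0"?>\n<a/>'
leading_junk = ' <?xml version="1.0"?>\n<a/>'  # a space before the declaration
doubled = '<?xml version="1.0"?>\n<a/><?xml version="1.0"?>\n<a/>'  # two documents in one


def parse_error(text):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)


for name, text in [("well_formed", well_formed),
                   ("leading_junk", leading_junk),
                   ("doubled", doubled)]:
    print(name, "->", parse_error(text))
```

A reported column as large as 9493 on line 2 would be consistent with a second declaration appearing after a long first line, but that is a guess without seeing the file.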
If this is not properly structured XML, you can read it with the Line Reader. However, this means you cannot use the XPath nodes to extract information from the document.
Hi all, I was hoping I would be able to design a process that would identify a corrupt/broken file and notify the operator in some way while continuing to process the other XML files. The cost of re-generating a corrupt file is low, so I'm not in need of alternative ways of reading the file at the moment. I'd prefer to get a list of the files that need re-creating, so I know which ones are broken and can act accordingly.
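Outside KNIME, the behaviour described here (keep going, but collect the files that need re-creating) could look roughly like the Python sketch below. It is only an illustration of the logic, not a KNIME node configuration, and read_all is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def read_all(paths):
    """Parse every file; return (parsed, failures) and never stop on a bad file."""
    parsed, failures = {}, []
    for path in paths:
        try:
            parsed[path] = ET.parse(path).getroot()
        except (ET.ParseError, OSError) as exc:
            # Record which file failed and why, then move on to the next one.
            failures.append((path, str(exc)))
    return parsed, failures
```

The failures list is the "files to re-create" report: each entry pairs a path with the parser or file-system message, while the well-formed files are still processed.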