I have encountered a problem with the XPath node. I am working with the NCBI Gene Reports for genome-wide analysis and data mining. When I query certain attributes or values, the node returns missing values where there should not be any. Even worse, the missing values occur randomly, which means that executing the same node 100 times gives 100 different results.
This issue can be reproduced with all versions of KNIME I have tried (at least from 2.9.4 to 2.11.0). I have created an example workflow with two identically configured and already executed nodes, which nevertheless have two different output tables. Unfortunately, the workflow exceeds the upload limit of this forum by more than 350 MB.
Do you have any ideas how to fix this promptly?
As a workaround (with KNIME 2.11) you can try converting the XML to JSON (XML to JSON node) and querying it with JSONPath. If you need the results as XML, you can convert them back with JSON to XML. JSONPath is not a one-to-one match for XPath, but it has quite similar features.
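Outside KNIME, the same idea can be sketched in Python: convert the XML tree to nested dictionaries and walk it along a key path. This is only an illustration under assumptions. `xml_to_dict` and `get_path` are hypothetical helpers, not the KNIME XML to JSON or JSONPath nodes, and the sample document is made up rather than taken from an NCBI Gene Report.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_dict(elem):
    """Convert an element to a JSON-like structure (hypothetical helper).

    Attributes are ignored; repeated child tags become lists.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = xml_to_dict(child)
        if child.tag in result:
            # Second occurrence of a tag: promote the entry to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def get_path(data, path):
    """Follow a dotted key path such as 'report.gene.name' (hypothetical helper)."""
    for key in path.split("."):
        data = data[key]
    return data


# Made-up sample document, used only to exercise the helpers.
SAMPLE = "<report><gene><name>GENE1</name><id>123</id></gene></report>"
doc = {"report": xml_to_dict(ET.fromstring(SAMPLE))}
print(json.dumps(doc))
print(get_path(doc, "report.gene.name"))
```

The dotted path is a much smaller language than JSONPath or XPath; it only shows that the lookup happens on an already parsed tree.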
Okay, I will try that.
Does your answer mean that this is a known problem?
No. I am not a regular user of the XML nodes. I just thought this might be a workaround: JSON and XML both represent trees and are probably processed similarly, but since the JSON nodes are a different implementation, they might not show the same problem.
I will send you a link where you can upload your workflow. This sounds like a strange problem.
I just uploaded the workflow.
To solve the mystery: the XML documents contain references to an external DTD. Occasionally this DTD could not be downloaded properly, which led to parsing errors and ultimately to the missing values. We will try to add a cache for external DTDs.
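The caching idea can be sketched roughly as follows. This is not the KNIME implementation; `DtdCache`, its `fetch` parameter, and the example URL are hypothetical and only illustrate downloading each external DTD once and reusing the stored copy afterwards.

```python
import urllib.request


def download(url):
    """Fetch a resource over the network; raises if the download fails."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


class DtdCache:
    """Keep one copy of each external DTD, keyed by URL (hypothetical)."""

    def __init__(self, fetch=download):
        self._fetch = fetch
        self._store = {}

    def get(self, url):
        # Only go to the network on the first request for a URL.
        # A failed fetch raises and stores nothing, so it can be retried.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


# Usage with a stand-in fetcher so the example runs offline.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"<!ELEMENT report ANY>"


cache = DtdCache(fetch=fake_fetch)
for _ in range(100):
    cache.get("http://example.org/report.dtd")
print(len(calls))  # number of actual fetches for 100 lookups
```

With a cache like this, a flaky download can affect at most the first lookup of a DTD instead of every parsed document.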
In 2.11.1 all external DTDs and XSDs will be cached.
I just want to let you know that the problem is still present in 2.11.1. Please see the screenshot attached to this post.
Updating KNIME to 2.11.1 (not just the XML extension) solved the problem! :-)