I am trying to process an XML file, but some XPath processing does not work as expected. Sometimes, the result of the very same process of one XPath node is good in places, in other it is not.
I’m not sure if it is a bug, but I cannot see that the behaviour is desirable, and I cannot see a way to avoid what is happening using just a single XPath node.
What it looks like is that all of the various attributes are returned , and then they effectively get shifted to the top of the table rather than returning “blanks”.
If, for example, I restrict the XPath to just acting on row 4 of your input data, it returns this:
What is evident to me is that all non-null values have been shifted upwards regardless of which row they should be associated with!
The only way I see at the moment of returning attributes associated with their “Columns” (and I mean the “Column” elements in your xml data) is to first using XPath to return a collection of column and ungroup these. Then do your further XPath on these returned elements:
Note in the following, this is your original xpath but now at the /column/ level rather than /table/column/ level, as it is processing the “column” xml generated by the previous xpath node:
ahh, XML, that brings back some nice memories. What you experienced Thiemo caused me quite some pain until I understood what needs to be done.
Takbb already pointed towards the solution:
I cannot see a way to avoid what is happening using just a single XPath node.
You need to follow the “Divide et impera” principle or in other words, try not to do everything t once. Use one XPath node to extract the first axis / elements as nodes, then another XPath to get their data or again XML-data.
That way you can also parallelize the extraction and ensure data coherency. Another trick would be to extract the data as a collection, always ensuring return missing cell on empty string is checked.
To add a bit more context. If an XPath does not match, it does not give back a result. Hence, causing data disjoints. Here is an example of the issue you experienced: