Xpath creating incorrect data

ThomasRobsonPG · September 6, 2022, 8:33am

Hello,

I am trying to read a large XML file of customer data. Each customer record contains around 12 fields. KNIME reads these as individual columns and simply places them next to each other, whereas I need each set of fields to be read as one customer.

The problem is that if one piece of data is blank, KNIME does not leave the cell blank; it pulls the next value up, so the data goes out of sync. For example, the data should look like this:

Customer
|Customer|Area|ID|
|---|---|---|
|Thomas|Sunderland|789|
|David| |123|
|John|Newcastle|485|

The data instead looks like this:
Customer
|Customer|Area|ID|
|---|---|---|
|Thomas|Sunderland|789|
|David|Newcastle|123|
|John| |485|

Can anyone help me resolve this issue?

ArjenEX · September 6, 2022, 7:35pm

Hi @ThomasRobsonPG

I have slight trouble visualizing what your actual situation is. You mention having an XML, but your current and expected output do not really reflect that. Also, how is your XPath node configured? I ask this also with regard to the output being in separate columns, as you mention.

Theoretically, you should be able to play around with the empty cell option in the Xpath node to control this.

Could you please enrich your post with a sample (anonymized) xml, current workflow, used settings etc.

Like the honorable @bruno29a always says, the more accurate you are in your information, the more accurate the solution will be

ThomasRobsonPG · September 7, 2022, 8:36am

Hello,

Thank you for your reply.

So I have a very large XML file that I am reading in. I have attached a small sample showing the structure of the XML. In it you can see that in some cases values are missing: not blank values, but the elements themselves are not there. In this case the value will move up.

So the first customer doesn’t have a tax field. In KNIME I would expect this field to be left blank, but instead the tax field from the customer below moves up one row. This is causing my data to be out of sync.

I have tried the “return missing cell on empty string” option and set the node to “multi row”, but this doesn’t resolve my issue, because technically the cell isn’t blank; there just isn’t an element there at all for it to read.
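To illustrate the shift outside KNIME, here is a minimal Python sketch. The XML is a hypothetical stand-in for the attached sample (the element names `customers`, `customer`, `name`, `tax` and `id` are assumptions, not the real file), and the standard-library `xml.etree.ElementTree` is used instead of the KNIME node. It queries each field across the whole document and then pairs the results by position, which is roughly what seems to be happening:

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Hypothetical sample: the first customer has no <tax> element at all.
XML = """
<customers>
  <customer><name>Thomas</name><id>789</id></customer>
  <customer><name>David</name><tax>20</tax><id>123</id></customer>
  <customer><name>John</name><tax>5</tax><id>485</id></customer>
</customers>
"""

root = ET.fromstring(XML.strip())

# Query each field across the whole document: one independent list per column.
names = [e.text for e in root.findall("./customer/name")]
taxes = [e.text for e in root.findall("./customer/tax")]
ids = [e.text for e in root.findall("./customer/id")]

# Pair the values by position; shorter columns are padded with None at the end.
shifted_rows = list(zip_longest(names, taxes, ids))

for row in shifted_rows:
    print(row)
```

Because the tax list only has two entries, David's tax value ends up on Thomas's row and the gap appears on the last row instead of the first.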

exampledata.xml (1.3 KB)

ArjenEX · September 7, 2022, 9:08am

Hi @ThomasRobsonPG

You can solve this by first separating each customer into an individual section/XML. From there, you can query the fields as desired.

For this, use XPath node 1 to extract all the customers with the data type set to Node cell. This creates an XML row for each customer.

Next, add another XPath node to query the customer-specific fields, which you probably already have in a similar form. Note: the null checkbox is not required.
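The same two-step idea can be sketched in plain Python, in case it helps to see the logic outside the workflow. This is only an illustration under assumptions: the XML and its element names (`customers`, `customer`, `name`, `tax`, `id`) are hypothetical, the helper `field` is made up for this example, and `xml.etree.ElementTree` stands in for the two XPath nodes:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first customer has no <tax> element at all.
XML = """
<customers>
  <customer><name>Thomas</name><id>789</id></customer>
  <customer><name>David</name><tax>20</tax><id>123</id></customer>
  <customer><name>John</name><tax>5</tax><id>485</id></customer>
</customers>
"""

root = ET.fromstring(XML.strip())

# Step 1: extract one XML node per customer (one "row" per customer).
customers = root.findall("./customer")


def field(node, tag):
    """Return the text of a child element, or None if the element is absent."""
    return node.findtext(tag)


# Step 2: query the fields relative to each customer node.
rows = [(field(c, "name"), field(c, "tax"), field(c, "id")) for c in customers]

for row in rows:
    print(row)
```

Since every field is looked up inside its own customer node, a missing element simply yields an empty value on that customer's row and nothing from the next customer can move up.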

This will now generate the correct output without the shift taking place:

See WF:
Xpath creating incorrect data.knwf (17.7 KB)

Hope this helps!

ThomasRobsonPG · September 7, 2022, 10:09am

Thank you so much! This is very helpful!

ScottF · September 7, 2022, 2:34pm

ArjenEx: XPath Master!

system · September 14, 2022, 2:35pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.