XML namespace problems

Hello

I hope your day where you are is as sunny as mine!

I would like to pick your brains to come up with the best solution to a problem I have just been given.

About three months ago I created a workflow that reads an XML document (we’ll call this XML_Orig), analyses it and acts on it. It is quite a bulky workflow (it took me two weeks to complete), and it happily processes hundreds of XML files every day.

I have now been asked to add another source of XMLs (we’ll call this XML_New) to be processed in the same way as XML_Orig. Both XMLs contain the same data in the same structure, but they use different namespaces.

These are the options I can think of:

  1. Duplicate the workflow, then go into the new workflow and change every XPath node and any other node required.
  2. Change the namespace of XML_New to match XML_Orig before the workflow.
  3. Strip the namespaces out of both XMLs and change all the XPath nodes etc.

These are my thoughts on the different options:

  1. This is an easy fix, but then any time something has to be changed it needs to be changed twice. It feels like ‘future errors’ is written all over it, and am I creating more work for myself (and my workmates) in the long run?
  2. I haven’t been able to do this in KNIME, and anyway, it just doesn’t seem like a good solution.
  3. I was taught that stripping namespaces out of XML should not be done. But this seems like the solution I should go for. I can foresee that in x months’ time I will get another XML file with a different namespace; if that happens, a) I already have the component to strip namespaces and b) I don’t have to change the workflow. But, once again, despite searching for how to do this, I am still unable to.
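Outside KNIME, options 2 and 3 boil down to rewriting element tags. A minimal sketch with Python’s standard-library ElementTree, assuming the namespace URIs below are hypothetical placeholders (the real ones are in the two files) and that only element names, not attributes, need changing:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for XML_New; the real namespace URI is in the attached file.
SAMPLE_NEW = (
    '<Order xmlns="urn:example:new">'
    "<ActualPrice><GrossPriceAmount>9.99</GrossPriceAmount></ActualPrice>"
    "</Order>"
)


def remap_namespace(root, old_uri, new_uri):
    """Option 2: move every element from old_uri into new_uri, keeping local names."""
    old_prefix = "{" + old_uri + "}"
    for elem in root.iter():
        # Comments/processing instructions have a non-string tag; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith(old_prefix):
            elem.tag = "{" + new_uri + "}" + elem.tag[len(old_prefix):]
    return root


def strip_namespaces(root):
    """Option 3: drop the '{uri}' part of every element tag (attributes untouched)."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


remapped = remap_namespace(ET.fromstring(SAMPLE_NEW), "urn:example:new", "urn:example:orig")
stripped = strip_namespaces(ET.fromstring(SAMPLE_NEW))
print(ET.tostring(remapped, encoding="unicode"))
print(ET.tostring(stripped, encoding="unicode"))
```

Remapping keeps the namespaces (so existing XPath nodes would keep working), while stripping makes every path namespace-free; which one is preferable depends on whether more namespaces are expected later.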

Please can you comment on the options I have given, and if you have other solutions, please, please, please let me know.

Frank

Hi @FrankColumbo

Do you happen to have an anonymized/dummy/snippet of both types of files?

I have an idea about extracting the namespace from the file itself dynamically and then passing it along as a flow variable string list to the XPath node. Theoretically, this should always take care of it for you, regardless of how many different namespaces you get as input.
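The same idea can be sketched in Python with ElementTree: read the namespace from the root element at run time and bind it to a fixed placeholder prefix, so one query serves every file. The `ns:` prefix, the sample documents and the assumption that all elements share the root’s namespace are illustrative only:

```python
import xml.etree.ElementTree as ET


def root_namespace(root):
    """Return the namespace URI of the root element, or '' when it has none."""
    return root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""


def find_texts(xml_text, path):
    """Run `path`, written with a placeholder 'ns:' prefix, in the file's own namespace."""
    root = ET.fromstring(xml_text)
    uri = root_namespace(root)
    if not uri:
        # No namespace declared: drop the placeholder prefix so plain names match.
        return [e.text for e in root.findall(path.replace("ns:", ""))]
    # Assumption: every element lives in the same namespace as the root.
    return [e.text for e in root.findall(path, {"ns": uri})]


# Two hypothetical files: same structure, different namespaces.
ORIG = ('<Order xmlns="urn:example:orig"><ActualPrice>'
        "<GrossPriceAmount>12.50</GrossPriceAmount></ActualPrice></Order>")
NEW = ('<Order xmlns="urn:example:new"><ActualPrice>'
       "<GrossPriceAmount>9.99</GrossPriceAmount></ActualPrice></Order>")

PATH = "ns:ActualPrice/ns:GrossPriceAmount"
for doc in (ORIG, NEW):
    print(find_texts(doc, PATH))
```

In KNIME terms, the extracted URI would play the role of the flow variable feeding the XPath node’s namespace table.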


It’s a bit easier to test this theory with a proper reference :wink:

hi @ArjenEX
thanks for your quick reply. I have attached two xml documents which are stripped of identifiable data.
XML_New.xml (4.0 KB)
xml_Orig.xml (10.9 KB)

Frank

Hello @FrankColumbo

Been trying for about two hours now but I have to give up :roll_eyes:

I’m getting close though with my test bench:


Steps:

  • Import the original XML.
  • Clean the namespace tags. I do this by building a list of namespace tags, grouping them and applying regexReplace() in the String Manipulation node. This creates a clean XML.
  • Convert the string back to XML so it can be queried.
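The regex clean-up step can be approximated in Python as below. This is a rough sketch, not the exact expressions used in the workflow: it removes `xmlns` declarations, then element and attribute prefixes. Regular expressions do not parse XML, so prefixes inside CDATA or comments could be mangled; the sample document is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# xmlns="..." or xmlns:prefix="..." declarations (single or double quotes).
XMLNS_DECL = re.compile(r"\s+xmlns(?::[A-Za-z_][\w.-]*)?\s*=\s*(\"[^\"]*\"|'[^']*')")
# A prefix directly after '<' or '</', e.g. '<ns0:' or '</ns0:'.
TAG_PREFIX = re.compile(r"(</?)[A-Za-z_][\w.-]*:")
# A prefix on an attribute name, e.g. ' xsi:type="..."'.
ATTR_PREFIX = re.compile(r"(\s)[A-Za-z_][\w.-]*:(?=[A-Za-z_][\w.-]*\s*=\s*[\"'])")


def clean_namespaces(xml_text):
    """Return xml_text with namespace declarations and prefixes removed."""
    text = XMLNS_DECL.sub("", xml_text)  # must run first, or xmlns:x= becomes x=
    text = TAG_PREFIX.sub(r"\1", text)
    return ATTR_PREFIX.sub(r"\1", text)


SAMPLE = (
    '<?xml version="1.0"?>'
    '<ns0:Order xmlns:ns0="urn:example:new" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="x">'
    "<ns0:ActualPrice><ns0:GrossPriceAmount>9.99</ns0:GrossPriceAmount>"
    "</ns0:ActualPrice></ns0:Order>"
)

root = ET.fromstring(clean_namespaces(SAMPLE))
print(root.find("ActualPrice/GrossPriceAmount").text)
```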

I was hoping this would allow a wildcard XPath query, along the lines of /*/ActualPrice/GrossPriceAmount, that could be applied to both. For the ‘old’ equivalent, the same field is actually a lot more levels down; it can be reached through /*/*/*/*/*/*/ActualPrice/GrossPriceAmount

The number of wildcards has to match the original path exactly, otherwise it won’t work, and that is where I’m stuck: writing it in such a way that both files are covered. Maybe an XPath expert can jump on the case.
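One possible way around the fixed depth is the descendant axis `//`, which matches at any level. In plain XPath 1.0, something like `//*[local-name()='ActualPrice']/*[local-name()='GrossPriceAmount']` ignores both depth and namespace and might be worth trying in the XPath node. The Python sketch below shows the same idea with ElementTree, using two hypothetical documents whose depths mirror the two paths above; the `{*}` namespace wildcard needs Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

# Hypothetical shallow file: the field sits at /*/ActualPrice/GrossPriceAmount.
SHALLOW = ('<Order xmlns="urn:example:new"><ActualPrice>'
           "<GrossPriceAmount>9.99</GrossPriceAmount></ActualPrice></Order>")
# Hypothetical deep file: the field sits at /*/*/*/*/*/*/ActualPrice/GrossPriceAmount.
DEEP = ('<a:Msg xmlns:a="urn:example:orig"><a:B><a:C><a:D><a:E><a:F>'
        "<a:ActualPrice><a:GrossPriceAmount>12.50</a:GrossPriceAmount></a:ActualPrice>"
        "</a:F></a:E></a:D></a:C></a:B></a:Msg>")

# './/' searches every depth below the root, so the number of levels no longer
# matters; '{*}' matches the local name in any namespace (or none).
PRICE_PATH = ".//{*}ActualPrice/{*}GrossPriceAmount"


def gross_prices(xml_text):
    """Return the text of every GrossPriceAmount under an ActualPrice, at any depth."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(PRICE_PATH)]


for doc in (SHALLOW, DEEP):
    print(gross_prices(doc))
```

The trade-off is that `//` also matches an ActualPrice that appears elsewhere in the tree, so it suits documents where the element name is unique.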

Hopefully this still helps in any way.

WF:
wildcard xpath ns.knwf (58.1 KB)

Regards,
Arjen


@ArjenEX - thank you.

I am working on your flow now.

Frank

@ArjenEX - you did it!

Thank you again

Frank


@FrankColumbo Great to hear!

What was the final missing piece to the puzzle?
