Comparing XML

Good Morning KNIMErs

Could you please help me out with this task I have been given - I am stumped!

I need to compare files from the output of one process to files from the output of a replacement process, to ensure the replacement works correctly. The output files of these processes are XML. I have no control over how the output XML is generated (nor did I design it!).

The basic structure of the XML is this:

<root>
    <UUID>5478a9d5-d480-4c4b-8833-b86c03027654</UUID>
    <Created>2024-04-08T14:25:48.233918700</Created>
    <DocumentName>doc55678</DocumentName>
    <Reference>abcd1234</Reference>
    <FileType>66</FileType>
    <KnimeVersion>TESTING</KnimeVersion>
    <Source>acme inc</Source>
    <RuleSet>
        <item>
            <Properties>
                <NumberOfNames>2</NumberOfNames>
            </Properties>
            <ImportScript>SCR-NAMES</ImportScript>
            <ObjectName>Names</ObjectName>
        </item>
        <item>
            <ImportScript>SCR-SERIALNUMBER</ImportScript>
            <ObjectName>SerialNumber</ObjectName>
            <item>
                <NumberOfOtherSerialNumbersFound>1</NumberOfOtherSerialNumbersFound>
                <OCENumberSerialNumbers>2</OCENumberSerialNumbers>
                <Properties>
                    <item>
                        <item>ZYX123</item>
                    </item>
                </Properties>
            </item>
        </item>
        <item>
            <Properties>
                <IsItBlue>false</IsItBlue>
                <PurchaseDate>2017-09-08</PurchaseDate>
                <IsItSquare>true</IsItSquare>
                <IsItHot>false</IsItHot>
                <MoreThan25kg>false</MoreThan25kg>
            </Properties>
            <ImportScript>SCR-PARTICULARS</ImportScript>
            <ObjectName>Particulars</ObjectName>
        </item>
    </RuleSet>
    <Inputs>
        <item>
            <Properties>
                <Colour>purple</Colour>
                <Antenna>triangle</Antenna>
                <Notes>carries a red bag</Notes>
                <Size>tallest</Size>
            </Properties>
            <ImportScript>SCR-TINKYWINKY</ImportScript>
            <ObjectName>TinkyWinky</ObjectName>
        </item>
        <item>
            <Properties>
                <Colour>green</Colour>
                <Antenna>dipstick</Antenna>
                <Notes>sometimes wears a hat</Notes>
                <Size>2nd tallest</Size>
            </Properties>
            <ImportScript>SCR-DIPSY</ImportScript>
            <ObjectName>Dipsy</ObjectName>
        </item>
        <item>
            <items>
                <item>
                    <Name>Aramis</Name>
                    <RealName>René d'Herblay</RealName>
                    <Muskerhound>true</Muskerhound>
                </item>
                <item>
                    <Name>Athos</Name>
                    <RealName>Count de la Fère</RealName>
                    <Muskerhound>true</Muskerhound>
                </item>
                <item>
                    <Name>Porthos</Name>
                    <RealName>Baron du Vallon de Bracieux de Pierrefonds</RealName>
                    <Muskerhound>true</Muskerhound>
                </item>
                <item>
                    <Name>Dogtanian</Name>
                    <RealName>Charles de Batz de Castelmore</RealName>
                    <Muskerhound>false</Muskerhound>
                </item>
            </items>
            <ImportScript>Muskerhounds</ImportScript>
            <ObjectName>SCR-MUSKERHOUNDS</ObjectName>
        </item>
    </Inputs>
</root>

The top part will always have the same tag names. The RuleSet object should always have the same three item objects, but they are normally in a different order. The Inputs object will include 3 to 25 item objects, and these will also be in a different order. There are only two different Inputs objects.

I can collect the files from the different sources, concatenate the list and use a Group Loop Start node. What I am struggling with is how to compare each item object with the equivalent object in the other file.

Many thanks in advance

Frank

Hi @FrankColumbo , so just to confirm, if the sub-elements within an item are in a different order, but have the same values in both files, are they considered to be the same?

Also, do you need to know what the differences are, or simply know that they are different? (clearly knowing is “ideal”, but is it necessary for your use case?)

hello @takbb

thank you for helping.

I can confirm:
The order of the sub-elements does not matter.
I just need to know that they are different, not what the differences are.

I did some work on this yesterday and have so far come up with this:

It's not great: it's bloated, not finished and doesn't quite work yet. It's for your (and others') reference.

Many thanks again

Frank

Hi @FrankColumbo ,

I’ve approached it with a piece of Java to reduce the node workload, and then added some nodes on top.

I hit upon the thought that two pieces of XML are the same if you can break both down into all their different “groups” of elements, sort the elements within each group by name and value, and then take all those sub-groups and sort them again. If what you end up with is the same string for both, then they are materially the same XML…

Ok that’s a bit of a mouthful and not particularly well explained, so…let me know if this gets close…
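
For anyone reading along, here is a rough sketch of that sort-and-compare idea in plain Java. It is not the code inside the component, just an illustration under some assumptions: the class name XmlCanonicalizer and the method canonical are made up for this example, attributes are ignored (the sample has none), and text is only kept for elements that have no child elements. Each element is turned into a string, the strings of its children are sorted, and the process repeats up to the root, so two documents that differ only in element order end up as the same string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not taken from the KNIME component.
final class XmlCanonicalizer {

    // Parses the XML and returns a string that does not depend on element order.
    static String canonical(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return canonical(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static String canonical(Element element) {
        List<String> children = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        NodeList nodes = element.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node instanceof Element) {
                // Canonicalize each child first, so nested groups are sorted too.
                children.add(canonical((Element) node));
            } else if (node instanceof Text) {
                text.append(node.getNodeValue());
            }
        }
        // Sorting the already-canonical child strings removes any order dependence.
        Collections.sort(children);
        // Assumption: leaf elements carry the values; whitespace between tags is dropped.
        String body = children.isEmpty() ? text.toString().trim() : String.join("", children);
        String tag = element.getTagName();
        return "<" + tag + ">" + body + "</" + tag + ">";
    }

    public static void main(String[] args) {
        String a = "<r><item><A>1</A><B>2</B></item><item><C>3</C></item></r>";
        String b = "<r><item><C>3</C></item><item><B>2</B><A>1</A></item></r>";
        System.out.println(canonical(a).equals(canonical(b))); // prints "true"
    }
}
```

Comparing the two canonical strings only tells you whether the files differ, not where, which matches what Frank said he needs.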

A word of caution: it’s quite possible that I’ve not considered a scenario, and that this will miss it, so this should be considered work-in-progress until fully tested!

hi @takbb

many thanks for this.

I have taken / stolen / used your component and adapted it to do what I need to do. You took me 90-95% of the way.

Frank
