I am currently building a way in KNIME to compare two XML files with each other. This already works quite well. Unfortunately, I still have a sorting problem. I have two Collection Cells and would like to sort their contents so that I can compare them with each other.
Thank you for the ideas. Unfortunately, it doesn’t quite fit yet. Duplicates can also occur in the collections, which must also remain. Therefore, it is unfortunately not an option to convert this into a set. I have also not yet found a way to simply sort the list. With arraySort() I only get the message that it cannot be sorted because it is not an array. Are there any other ideas?
I have another question. I am working with the “Table Difference Finder” node in the second step. Unfortunately, this only compares on a row basis and not on the basis of an ID. There are a lot of product IDs in my XML files. However, it is possible that not all product IDs are present in both XML files.
In this case, the Table Difference Finder would incorrectly compare the product IDs 789 and 678, because they sit in the same row position. Instead, it should compare 678 with nothing and only then compare 789 with 789. Which step would I have to add beforehand so that this works? Or would a completely different node be a better choice than the “Table Difference Finder”?
I hope this was understandable and I am grateful for any help or ideas.
Hi Ael, thank you for the answer. I tried it and unfortunately it doesn’t quite work. When I do this, it combines all the rows again. But I lose all the other columns that I also have in the evaluation. How can I set this up correctly?
Then you probably don’t have a Collection cell type but rather a String cell. Can you check that?
Use an inner join in the Joiner node on the product ID before the Table Difference Finder. The comparison is then correct as long as the product IDs are sorted in the same order in both XMLs. The Joiner node can also output the unmatched left and right rows as separate tables, which gives you the product IDs that differ between the XMLs.
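To make the idea concrete, here is a minimal sketch in plain Python of the three results mentioned above: matched rows, left-only rows, and right-only rows. It is not how the Joiner node is implemented. The function name `join_on_id`, the column names `product_id` and `price`, and the sample values are assumptions for illustration, and each product ID is assumed to occur at most once per table.

```python
def join_on_id(left_rows, right_rows, key="product_id"):
    """Split two tables into matched pairs, left-only rows and right-only rows.

    Assumes each ID occurs at most once per table (hypothetical simplification).
    """
    left_by_id = {row[key]: row for row in left_rows}
    right_by_id = {row[key]: row for row in right_rows}

    # Inner join: IDs present on both sides, paired row with row.
    matched = [
        (left_by_id[i], right_by_id[i])
        for i in sorted(left_by_id.keys() & right_by_id.keys())
    ]
    # Unmatched rows: IDs that exist on one side only.
    left_only = [left_by_id[i] for i in sorted(left_by_id.keys() - right_by_id.keys())]
    right_only = [right_by_id[i] for i in sorted(right_by_id.keys() - left_by_id.keys())]
    return matched, left_only, right_only


# Hypothetical sample data: product 678 exists only in the first file.
xml_a = [
    {"product_id": 456, "price": 10},
    {"product_id": 678, "price": 20},
    {"product_id": 789, "price": 30},
]
xml_b = [
    {"product_id": 456, "price": 10},
    {"product_id": 789, "price": 35},
]

matched, only_in_a, only_in_b = join_on_id(xml_a, xml_b)
```

With this data, 789 is paired with 789 and 678 ends up in the left-only result instead of shifting the comparison by one row.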
I’m not sure what the issue is with arraySort() in the Column Expressions node, since your configuration suggests the column is a collection type. Can you check that it really is a collection column before you feed it to Column Expressions? You can see the column types in the Spec tab of the node output.
Alternatively, you can share the workflow with (dummy) data so that someone can check it out.
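For reference, the requirement itself (sort a list but keep its duplicates) looks like this in plain Python. This is only a sketch of the logic, not the `arraySort()` function of the Column Expressions node; the helper names and the sample values are assumptions.

```python
from collections import Counter


def sort_keep_duplicates(values):
    # sorted() returns a new list and never drops repeated values.
    return sorted(values)


def same_elements(a, b):
    # Counter records how often each value occurs, so the order is ignored
    # but duplicates still count.
    return Counter(a) == Counter(b)


# Hypothetical collection cell contents with a duplicate entry.
cell_a = ["red", "blue", "red"]
cell_b = ["blue", "red", "red"]

sorted_a = sort_keep_duplicates(cell_a)
sorted_b = sort_keep_duplicates(cell_b)
```

After sorting, the two lists can be compared element by element, and a list that has lost a duplicate no longer counts as equal.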
Ok, I think I have solved the sorting problem. Now I only have the problem with the shifted product IDs. @ipazin: You suggested placing a Joiner node before the Table Difference Finder, but that doesn’t work, because then I have only one table and the Table Difference Finder expects two tables to compare with each other. Do you have any other ideas?
Thanks a lot!
Hi @aworker, thanks for that. That solved the sorting problem in the end. I ended up splitting it into two paths and then joining the two tables together. Now I have the values as I need them; unfortunately, only the offset in the product IDs remains…
Glad to hear you managed to solve your sorting issue.
You can use a Splitter node to split your table into two tables again and then do the comparison. As an alternative to the Joiner node, you can use the Reference Row Filter node twice, based on your Product ID column: the first time, table 1 is your data table and table 2 is your reference table, and the second time vice versa.
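The two-pass filtering can be sketched in plain Python as follows. This only mirrors the logic and is not the Reference Row Filter node itself; the function name `filter_by_reference`, the column names and the sample values are assumptions, and each product ID is assumed to occur at most once per table.

```python
def filter_by_reference(data_rows, reference_rows, key="product_id"):
    # Keep only the data rows whose ID also appears in the reference table.
    reference_ids = {row[key] for row in reference_rows}
    return [row for row in data_rows if row[key] in reference_ids]


# Hypothetical sample data: product 678 is missing from the second table.
table_1 = [
    {"product_id": 456, "price": 10},
    {"product_id": 678, "price": 20},
    {"product_id": 789, "price": 30},
]
table_2 = [
    {"product_id": 789, "price": 35},
    {"product_id": 456, "price": 10},
]

# First pass: table 1 is the data table, table 2 the reference. Then vice versa.
common_1 = sorted(filter_by_reference(table_1, table_2), key=lambda r: r["product_id"])
common_2 = sorted(filter_by_reference(table_2, table_1), key=lambda r: r["product_id"])

# Both tables now hold the same IDs in the same order, so a row-by-row
# comparison pairs each product with itself.
differences = [(a, b) for a, b in zip(common_1, common_2) if a != b]
```

This keeps two separate tables, which is what a row-based comparison needs, while the rows with IDs found in only one file are left out of the comparison.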