Sort Content of a Collection Cell

Hi all,

I am currently building a way in KNIME to compare two XML files with each other. This already works quite well. Unfortunately, I still have a sorting problem. I have two Collection Cells and would like to sort their contents so that I can compare them with each other.

Example:
Collection Cell XML1 Row 1: C,D,T,Z,Q
Collection Cell XML2 Row 1: T,Z,Q,C,D

The comparison with the Table Difference Finder fails because the order is different, but the content is the same.

I am looking for a way to make them look like this after sorting:
Collection Cell XML1 Row 1: C,D,Q,T,Z
Collection Cell XML2 Row 1: C,D,Q,T,Z

What is the best way to do this? I have already tried many things but have not yet found the right solution. I hope someone here can help me.

Hello @jgawin,

and welcome to KNIME Community!

I can think of a few ways to deal with this:

  • Use a Set column instead of a List/Collection column. This removes duplicates, so it might not work in your case, but [A,B,C] and [B,A,C] then compare as equal.
  • A couple of nodes offer a List (sorted) aggregation method, but this depends on how you obtain your collection columns (I assume it’s not with any of those nodes).
  • Use the arraySort() function in the Column Expressions node to sort your collection columns.
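
To illustrate the difference between the Set option and the sorting option, here is a minimal sketch in plain Python. It is not KNIME code: the lists only stand in for collection cells, and the function names are made up for illustration. The point is that a set comparison ignores order but also ignores duplicates, while comparing sorted lists ignores order and keeps duplicates.

```python
# Illustrative only: plain Python lists stand in for KNIME collection cells.
xml1 = ["C", "D", "T", "Z", "Q"]
xml2 = ["T", "Z", "Q", "C", "D"]


def same_as_set(a, b):
    """Order-insensitive comparison that also ignores duplicates."""
    return set(a) == set(b)


def same_when_sorted(a, b):
    """Order-insensitive comparison that still counts duplicates."""
    return sorted(a) == sorted(b)


print(same_as_set(xml1, xml2))       # both collections hold the same values
print(same_when_sorted(xml1, xml2))  # and the same number of each value
```

If duplicates matter, only the sorted comparison tells ["A", "A", "B"] apart from ["A", "B", "B"].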

Hope this helps!

Br,
Ivan


Hello Ivan,
Thank you for the ideas. Unfortunately, it doesn’t quite fit yet. Duplicates can also occur in the collections, which must also remain. Therefore, it is unfortunately not an option to convert this into a set. I have also not yet found a way to simply sort the list. With arraySort() I only get the message that it cannot be sorted because it is not an array. Are there any other ideas?

I have another question. I am working with the “Table Difference Finder” node in the second step. Unfortunately, this only compares on a row basis and not on the basis of an ID. There are a lot of product IDs in my XML files. However, it is possible that not all product IDs are present in both XML files.

Example:

XML1:
123
234
456
789

XML2:
123
234
456
678
789

In this case, the Table Difference Finder would incorrectly compare the Product IDs: 789 and 678, because it found this in the matching row. However, it should then compare the 678 with nothing and only then compare the 789 with the 789 again. Which step would I have to implement beforehand so that this works? Or is a completely different node instead of the “Table Difference Finder” the better choice to achieve this?

I hope this was understandable and I am grateful for any help or ideas.

Many greetings
Julia

Hi @jgawin & welcome to the KNIME community,

Would the following old post in the KNIME forum help answer your question?

Best
Ael


Hi Ael, thank you for the answer. I tried it and unfortunately it doesn’t quite work. When I do this, it combines all the rows again. But I lose all the other columns that I also have in the evaluation. How can I set this up correctly?

Hello @jgawin,

Then you probably don’t have a Collection cell type but rather a String cell. Can you check that?

Use an inner join in the Joiner node, based on product ID, before using the Table Difference Finder. This way you get a correct comparison as long as the product IDs are sorted in the same order in both XMLs. The Joiner node can also output the left/right unmatched rows in separate tables, which gives you the product ID differences between the XMLs.
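
As a rough sketch of what the join does with the product IDs from the example above (plain Python, not KNIME; `join_on_id` is a hypothetical helper, and IDs are assumed to be unique within each file):

```python
# Product IDs taken from the example earlier in the thread.
xml1_ids = ["123", "234", "456", "789"]
xml2_ids = ["123", "234", "456", "678", "789"]


def join_on_id(left, right):
    """Return (matched, left_only, right_only), keeping input order."""
    left_set, right_set = set(left), set(right)
    matched = [pid for pid in left if pid in right_set]
    left_only = [pid for pid in left if pid not in right_set]
    right_only = [pid for pid in right if pid not in left_set]
    return matched, left_only, right_only


matched, left_only, right_only = join_on_id(xml1_ids, xml2_ids)
print(matched)     # ['123', '234', '456', '789']
print(left_only)   # []
print(right_only)  # ['678']
```

Only the matched IDs are compared against each other, so 789 is no longer paired with 678, and 678 shows up separately as an ID that exists only in XML2.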

Br,
Ivan


Hi Julia,

As stated by @ipazin, the problem most probably comes from the way the data were aggregated, using “Concatenation” instead of “List (Sorted)”.
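
To make the difference concrete, here is a small plain-Python sketch (not KNIME code). It assumes the concatenated cell is a single string with a comma as separator; `sort_concatenated` is a made-up name. A string like this has to be split into a list before its elements can be sorted.

```python
def sort_concatenated(cell, sep=","):
    """Split a concatenated string into items and return them sorted.

    Assumes `sep` never occurs inside an item. Duplicates are kept.
    """
    items = [part.strip() for part in cell.split(sep)]
    return sorted(items)


print(sort_concatenated("T,Z,Q,C,D"))  # ['C', 'D', 'Q', 'T', 'Z']
```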

Please find below a workflow with a possible solution to your question:

20230105 Sort Content of a Collection Cell.knwf (49.2 KB)

Hope it helps.

Many greetings!
Ael


I read the XML files with XPath and have selected “Collection Cell” for Multiple tag options and “String” for XPath Data Type. Is this the problem?

Thank you, I will try this

Hello @jgawin,

not sure what’s the issue regarding arraySort() in Column Expressions node as it seems your column is a collection column type from your configuration. Can you check it’s really a collection column before you feed it to Column Expressions? You can check column types in Spec tab of node output.

Alternatively you can share workflow with (dummy) data for someone to check it out.

Br,
Ivan


Ok, I think I have solved the sorting problem. Now I only have the problem with the shifting of the product IDs. @ipazin : You suggested that I set a joiner before the Table Difference Finder, but that doesn’t work because then I only have one table and the Table Difference Finder expects two in order to be able to compare both tables with each other. Do you have any other ideas?
Thanks a lot!


Hi @aworker, thanks for that. That solved the problem of sorting in the end. I just ended up splitting it into two paths and then joined the two tables together. Now I have the values as I need them, only unfortunately the offset in the product IDs…

Hello @jgawin,

glad to hear you managed to solve your sorting issue.

You can use a splitter node to split your table into two tables again and then do the comparison. As an alternative to the Joiner node, you can use the Reference Row Filter node twice, based on your Product ID column: the first time, table 1 is your data table and table 2 is your reference table; the second time, vice versa.
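
The double filter can be sketched in plain Python as follows. This is only a model of the idea, not KNIME code: rows are dicts, and the column names `product_id` and `value` as well as the sample values are invented for illustration.

```python
def reference_row_filter(data_rows, reference_rows, key="product_id"):
    """Keep only the data rows whose key also occurs in the reference rows."""
    reference_ids = {row[key] for row in reference_rows}
    return [row for row in data_rows if row[key] in reference_ids]


# Hypothetical sample tables; 678 exists only in table2.
table1 = [{"product_id": pid, "value": "a"} for pid in ["123", "234", "456", "789"]]
table2 = [{"product_id": pid, "value": "b"} for pid in ["123", "234", "456", "678", "789"]]

# First pass: table1 is the data table, table2 the reference. Then vice versa.
filtered1 = reference_row_filter(table1, table2)
filtered2 = reference_row_filter(table2, table1)

print([row["product_id"] for row in filtered1])
print([row["product_id"] for row in filtered2])
```

After both passes the two tables contain the same product IDs, so a row-by-row comparison lines up, provided the IDs are unique and sorted the same way in both tables.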

Br,
Ivan


Many thanks to @ipazin and @aworker. This helped me find a solution to my problem. I now have a solution I can work with.


Thanks, Julia (@jgawin), for your feedback and for validating the solution. Glad the posts from @ipazin and me helped.

Best wishes with your KNIME development,
Ael

