removing duplicate values within a row

Sian_Evans · April 5, 2024, 1:50pm

Hi,

I’m convinced there is a simple solution to this, but I can’t work it out and I can’t find any topic which quite covers it.

I have data as follows:

A 1 2 3 3
B 4 2 2 5
C 6 3 2 6

I would like to keep the format the same but remove all the duplicate values within a row, so it would look like this:

A 1 2 3
B 4 2 5
C 6 3 2

So my plan was to split columns 2-4 on to different rows, remove the duplicate rows then put it back together again. However I can’t figure out a node or group of nodes which will split columns 2-4 out on to different rows grouped by the letter e.g.

A 1
A 2
A 3
A 3…etc

Perhaps there’s a better way of doing this? If anyone can help I would be very grateful.

Thanks,
Sian

takbb · April 5, 2024, 3:06pm

Hi @Sian_Evans , you could use Column Aggregator to aggregate all of your value columns (cols 2 to 4), using “Unique Concatenate” and telling it to “Remove the aggregation columns”, but keep “Retained columns”. Specify a delimiter that won’t appear in your data.

Then use a column splitter to split this aggregation column into new columns using the same delimiter used in the Column Aggregator. Tell it it “Remove the input column”.

After that, rename the columns back to what they were on the original table, or whatever you now want them to be called. Column Renamer (Regex) may be a good choice here, or you can possibly use the Column Renamer (Dictionary) to rename them based on the original table, if the columns are in the same order.

If your data is numeric, then the Column Auto Type Cast node is one way of setting the correct data types, or if the column names are changed back to those on the original table, you can use the Table Validator (Reference) to set the data types for you: Switch off the Check Column Existence (since some columns may no longer be present), and switch on the “Check Data type” and set it to “Try to convert”.

Edit: The column splitter may well get the types correct, if the (default) option to “Guess size and column types” is selected, so you can probably ignore that last bit!

btw, welcome to the KNIME community!

Daniel_Weikert · April 5, 2024, 3:17pm

Alternatively column aggregator with set aggregation and then split collection column could work too
br

takbb · April 5, 2024, 3:21pm

Very true @Daniel_Weikert , that’s probably the better option. For some reason I always forget about the Split Collection Column node. @Sian_Evans, if it answers your question, I think Daniel’s answer should be awarded the solution!

Sian_Evans · April 5, 2024, 4:12pm

Thank you so much @Daniel_Weikert and @takbb - I’ve been splicing and dicing the data with every node I could find for ages…and it turns out I just needed to know about the ‘unique concatenate’ option on the column aggregator node.

I couldn’t get the split collection column node to work as the data type wasn’t correct, and I couldn’t change it from string, due to the delimiter still being present. Perhaps I misunderstood this solution? Anyway, @takbb’s solution worked a treat, so all good.

Thank you so much!!

system · April 12, 2024, 4:12pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.