Remove occurrences of a text from later rows once it has already appeared 3 times

I_T · September 15, 2024, 3:54pm

Hi, I’m really confused about how loops work here, or whether I even need a loop to get the output I need.

I have a column containing semicolon-delimited IDs, for example:
row IDs
1 one1;two2;three3;four4;five5
2 one1;three3;four4;five5;six6
3 one1;seven7;four4;eight8;nine9
4 one1;two2;three3;nine9
5 one1;two2;four4;ten10

I want my output to be:
row IDs
1 one1;two2;three3;four4;five5
2 one1;three3;four4;five5;six6
3 one1;seven7;four4;eight8;nine9
4 two2;three3;nine9 (removed one1)
5 two2;ten10 (removed one1 & four4)

My goal is to have at most 3 occurrences of each ID across all rows of the data table, keeping the first 3 from the top and removing any later ones.

Help me please!

#loops #ruleengine

JPollet · September 16, 2024, 2:56pm

Hi @I_T, and welcome!
Here is a very crude workflow (maybe someone will provide a more elegant solution). The main idea is to convert the IDs into a list and apply an exclusive-or between each row's initial list and the IDs to be removed from that row.
Hope this is useful!
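
For readers without KNIME at hand, the list and exclusive-or idea can be sketched in plain Python. This is only an illustration of the idea as described, not the contents of the attached workflow. The function name `drop_by_xor` and the `limit` parameter are hypothetical, and the sketch assumes an ID appears at most once within a single row.

```python
from collections import Counter

def drop_by_xor(rows, limit=3):
    """Return the rows with every ID removed once it was already kept `limit` times."""
    seen = Counter()  # how often each ID has been kept in earlier rows
    cleaned = []
    for row in rows:
        ids = row.split(";")
        # IDs of this row that already reached the limit in the rows above
        to_remove = {i for i in ids if seen[i] >= limit}
        # to_remove is a subset of ids, so the symmetric difference (XOR)
        # leaves exactly the IDs that should stay
        keep = set(ids) ^ to_remove
        kept = [i for i in ids if i in keep]  # restore the original order
        seen.update(kept)
        cleaned.append(";".join(kept))
    return cleaned

sample = [
    "one1;two2;three3;four4;five5",
    "one1;three3;four4;five5;six6",
    "one1;seven7;four4;eight8;nine9",
    "one1;two2;three3;nine9",
    "one1;two2;four4;ten10",
]
for line in drop_by_xor(sample):
    print(line)
```

On the sample from the question, rows 1 to 3 come back unchanged, row 4 becomes `two2;three3;nine9` and row 5 becomes `two2;ten10`.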

data_cleaning_v0.knwf (44.6 KB)

Best,
Joel

JPollet · September 16, 2024, 3:23pm

Oops…
Here is a much more direct proposal!
See the second branch of the workflow.

The Rank node does all the work.
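
The ranking idea can also be sketched in plain Python for comparison. The exact nodes around the Rank node are not described in the post, so the ungroup, rank, filter, regroup sequence below is an assumption, and the function name `drop_by_rank` is hypothetical.

```python
from collections import defaultdict

def drop_by_rank(rows, limit=3):
    """Keep only the first `limit` occurrences of each ID, counted from the top row."""
    # 1. Ungroup: one (row index, ID) pair per ID, in row order
    pairs = [(r, i) for r, row in enumerate(rows) for i in row.split(";")]

    # 2. Rank: number the occurrences of each ID (1 = first time it is seen)
    counts = defaultdict(int)
    ranked = []
    for r, i in pairs:
        counts[i] += 1
        ranked.append((r, i, counts[i]))

    # 3. Filter on rank, then regroup the surviving IDs by their row
    kept = [[] for _ in rows]
    for r, i, rank in ranked:
        if rank <= limit:
            kept[r].append(i)
    return [";".join(ids) for ids in kept]

table = [
    "one1;two2;three3;four4;five5",
    "one1;three3;four4;five5;six6",
    "one1;seven7;four4;eight8;nine9",
    "one1;two2;three3;nine9",
    "one1;two2;four4;ten10",
]
for line in drop_by_rank(table):
    print(line)
```

No explicit loop over "IDs to remove" is needed here: an occurrence is dropped simply because its rank is above 3, which is why a single ranking step can do the whole job.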

data_cleaning_v1.knwf (60.9 KB)

Joel

I_T · September 17, 2024, 7:14am

Thanks Joel! I’ll try this approach.