Replacing duplicate values on rows

Hi Community,

I have a dataset that looks like this:

ID|Prod1|Prod2|Prod3|Prod4|Prod4|UniqueFlag

1|24|25|26|27|28|Unique
2|24|25|26|27|27|NotUnique
3|24|25|26|26|26|NotUnique

I would like to find a way to replace duplicate Pod1-5 values where the row indicates ‘Not Unique’ in the UniqueFlag, effectively trying to make all row have the flag ‘Unique’ as a final output. Conceptually, I have thought of something like this from a flow perspective:

  1. Filter out all rows that are ‘Not Unique’
  2. Create a reference table of some sort to list say 50 product ID that could we used as a replacement if the row is not unique
  3. Lastly, and this is the tricky bit I think, somehow insert product id’s from the reference list where there is a duplicate value in the original table row (as indicted by the ‘UniqueFlag’).
  4. Ideally I would like to create some sort of loop that replaces values until there are no more flagged ‘None Unique’ rows left

Any ideas on how I can implement this practically? Thanks in advance for your help with this.

Hi,

A few question:
Do you want to replace the entire row when it’s “Not Unique” or just the duplicate values?
And how exactly do you want to replace it?
Please provide more explanation and a sample input and output.

Best,
Armin

Hi @armingrudd ,

Do you want to replace the entire row when it’s “Not Unique” or just the duplicate values?
Just the duplicate values

And how exactly do you want to replace it?
By utilising the values in the Reference File sequentially

Please provide more explanation and a sample input and output.
I have attached a sample file here with the current input, the reference table and the desired output (the values in red indicated that they have been replaced by the process). I hope this is enough explanation but otherwise let me.

Prod_Unique.xlsx (11.0 KB)

Thanks again for your continued help with this.

Hi there!

not the coolest solution using 4 Rule Engine nodes but it seems to work well. Actually there is a good chance it has a mistake somewhere so check it twice :smiley:

Couple of observations:

  • You do not need 5 replacement values because if all 5 products are the same you will only replace 4 of them :wink:
  • You got couple of mistakes in your desired output :slight_smile:
  • This will work only if you do not have replacement values already in Prod1, Prod2…
  • Column UniqueFlag is not necessary as you can run this rules against all rows
  • If you will have additional columns rules will get pretty ugly

Here is workflow:
2019_01_30_Replace_Duplicate_Values_Within_Row.knwf (48.9 KB)

Good luck!

Br,
Ivan

1 Like

Hi,
I am so sorry for replying too late. :sweat:
It seems I have missed this topic.
Many thanks to @ipazin for his solution. Here I have a solution as well:
dup1

To solve this issue, I created a counter column (to count duplicate replacements in the loop) and then pushed data into a recursive loop (5 loops and collecting data from last iteration only).
The column expressions node does all the job. How? I’m pretty sure you’d rather not ask :face_with_head_bandage: as this is very confusing even to me :sweat_smile::
2 expressions, the first one to replace duplicates and the second one to modify counter value.
The first expression:

The second expression:

Please check the workflow and let me know if it works fine:
replaceDuplicate.knwf (33.1 KB)

Best,
Armin

1 Like

Hi @ipazin,

Thanks for a great solution, this works really well on my larger data set.

A quick follow-up questions, a minority of rows in my table does not have all Prod-columns populated and because of this there might be rows that have missing values in either Prod1, Prod2, Prod3, Prod4 and/or Prod5. I would now like to populate the missing values with products from the reference table and again check that they are all unique. What would be the best way to do this?

Thanks again.

Hi @armingrudd,

Your solution also worked perfectly, thank you so much for send this over.

I have updated my solution to replace duplicates and missing values by adding 1 new condition (column(variable(“column name”)) == null) at the end of each 2nd level “if” statement.

Best,
Armin

Hi!

This complicates things a bit. The best would be to check both uniqueness and missings at the same time. So you would need to add 1 Rule Engine node for column Prod1 to check if it has a missing and then in each rule engine node add additional rules and modify existing following the logic that is already in place.

Good luck :wink:

Br,
Ivan

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.