I would like to find a way to replace duplicate Pod1-5 values where the row indicates ‘Not Unique’ in the UniqueFlag, effectively trying to make all row have the flag ‘Unique’ as a final output. Conceptually, I have thought of something like this from a flow perspective:
Filter out all rows that are ‘Not Unique’
Create a reference table of some sort to list say 50 product ID that could we used as a replacement if the row is not unique
Lastly, and this is the tricky bit I think, somehow insert product id’s from the reference list where there is a duplicate value in the original table row (as indicted by the ‘UniqueFlag’).
Ideally I would like to create some sort of loop that replaces values until there are no more flagged ‘None Unique’ rows left
Any ideas on how I can implement this practically? Thanks in advance for your help with this.
A few question:
Do you want to replace the entire row when it’s “Not Unique” or just the duplicate values?
And how exactly do you want to replace it?
Please provide more explanation and a sample input and output.
Do you want to replace the entire row when it’s “Not Unique” or just the duplicate values? Just the duplicate values
And how exactly do you want to replace it? By utilising the values in the Reference File sequentially
Please provide more explanation and a sample input and output. I have attached a sample file here with the current input, the reference table and the desired output (the values in red indicated that they have been replaced by the process). I hope this is enough explanation but otherwise let me.
not the coolest solution using 4 Rule Engine nodes but it seems to work well. Actually there is a good chance it has a mistake somewhere so check it twice
Hi,
I am so sorry for replying too late.
It seems I have missed this topic.
Many thanks to @ipazin for his solution. Here I have a solution as well:
To solve this issue, I created a counter column (to count duplicate replacements in the loop) and then pushed data into a recursive loop (5 loops and collecting data from last iteration only).
The column expressions node does all the job. How? I’m pretty sure you’d rather not ask as this is very confusing even to me :
2 expressions, the first one to replace duplicates and the second one to modify counter value.
The first expression:
Thanks for a great solution, this works really well on my larger data set.
A quick follow-up questions, a minority of rows in my table does not have all Prod-columns populated and because of this there might be rows that have missing values in either Prod1, Prod2, Prod3, Prod4 and/or Prod5. I would now like to populate the missing values with products from the reference table and again check that they are all unique. What would be the best way to do this?
I have updated my solution to replace duplicates and missing values by adding 1 new condition (column(variable(“column name”)) == null) at the end of each 2nd level “if” statement.
This complicates things a bit. The best would be to check both uniqueness and missings at the same time. So you would need to add 1 Rule Engine node for column Prod1 to check if it has a missing and then in each rule engine node add additional rules and modify existing following the logic that is already in place.