This is hard to explain, and I cannot easily share any data because of personal content, but I will do my best here.
I have two tables of data. The first is a list of duplicate records, each identified by a number. These are small groups of records that users have identified as duplicates.
DuplicateGroup  ObjectReference
Group1          34
Group1          44
Group1          65
Group2          100
Group2          433
Group3          34
Group3          433
The second table contains pairs of records that have been checked and identified as not being the same. Object References can appear in here multiple times.
KnownDifRef  ObjectReference
KTBD1        34
KTBD1        65
KTBD2        34
KTBD2        433
What I need to do is remove from the first list every entry whose Object Reference has been identified as different from all the other members of its group. For example, Group3 should be marked for exclusion because both of its entries are in KTBD2. Group1 should still show: although 34 is known to be different from 65, both still need to be compared to 44. I can deal with cases like Group3, where there are just two values, but what I am struggling with is a group with more than two members and multiple checks, for example if the following are added to the second table:
KnownDifRef  ObjectReference
KTBD3        34
KTBD3        44
KTBD4        65
KTBD4        44
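To make the rule concrete, here is a rough sketch of the logic I think I need, written in plain Python rather than as KNIME nodes. It is only an illustration under my own assumptions: the function name `flag_exclusions` and the list-of-tuples layout are made up for the example, each KnownDifRef is assumed to hold exactly two Object References, and an entry is flagged only when it is known to be different from every other member of its group.

```python
from collections import defaultdict

# Sample rows copied from the tables above: (DuplicateGroup, ObjectReference)
duplicates = [
    ("Group1", 34), ("Group1", 44), ("Group1", 65),
    ("Group2", 100), ("Group2", 433),
    ("Group3", 34), ("Group3", 433),
]

# (KnownDifRef, ObjectReference); assumed: two rows per KnownDifRef
known_different = [
    ("KTBD1", 34), ("KTBD1", 65),
    ("KTBD2", 34), ("KTBD2", 433),
    ("KTBD3", 34), ("KTBD3", 44),
    ("KTBD4", 65), ("KTBD4", 44),
]


def flag_exclusions(duplicates, known_different):
    """Return (group, reference, exclude) for every row of `duplicates`.

    Hypothetical helper: `exclude` is True when the reference is known to be
    different from every other member of its group.
    """
    # Gather the references that belong to each known-different check.
    refs_by_check = defaultdict(list)
    for check, ref in known_different:
        refs_by_check[check].append(ref)

    # Store each checked pair as a frozenset so the order does not matter
    # and each lookup is a single hash probe.
    different_pairs = set()
    for refs in refs_by_check.values():
        if len(set(refs)) == 2:
            different_pairs.add(frozenset(refs))

    # Gather the members of each duplicate group.
    members = defaultdict(list)
    for group, ref in duplicates:
        members[group].append(ref)

    flagged = []
    for group, ref in duplicates:
        others = [o for o in members[group] if o != ref]
        # Assumption: a group with a single member is flagged too, because
        # all() over an empty list is True and nothing is left to compare.
        exclude = all(frozenset((ref, o)) in different_pairs for o in others)
        flagged.append((group, ref, exclude))
    return flagged


for row in flag_exclusions(duplicates, known_different):
    print(row)
```

With only KTBD1 and KTBD2 this flags just the two Group3 rows; once KTBD3 and KTBD4 are added, every Group1 row is flagged as well, while Group2 is left alone. The work per group grows with the square of the group size, which should be acceptable as long as the groups stay small.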
Now the added complication: my data has already reached 1.4 million potential duplicates and 60,000 Known Difference pairs. I know KNIME can handle this sort of question, but I am just not sure how and, more importantly, how to do it efficiently, since the numbers are only going to go up, at least for a while.
I know that is probably not very clear, but any suggestions anyone can offer would be really helpful. At the moment I am reporting more duplicates than there really are, which makes people discount the reporting altogether rather than accepting that they have an issue with the 1.4 million records!