State if it's a duplicate based on a row

Hello all!
I’m fairly new to Knime, hope you can help me identify below situation, grouping by DCN’s, sometimes there are rows that are duplicate (same information on all columns but different reference) and that should be flagged as a duplicate (first line was the payment and second line was another payment made by mistake), this was already solved using Duplicate Row Filter.

Some other times, there’s an additional line with an H on the credit column, which means that the additional payment amount was returned, I’m having issues trying to find a way to validate that line with the H is being a counter-payment for one of the S rows.

Thanks in advance! :slight_smile:

Hi @carlos_amaya maybe a solution could be to change the columns you are using for duplicate detection in the -Duplicate Row Filter- node?

I have used Dummy data and only two columns but for example, if you use only the DCN column for detection you would get:


If you use DCN and the Debit/Credit column for detection, the result would be:


Does the reference column need to be the same to be a duplicate? Or this column can be different?


1 Like

Hi @carlos_amaya,

I see that I may have overcomplicated the requirement, and the solution may be more simple but since I had worked on this already, I will post it anyway!

If my understanding is correct, you are saying that:

  1. a “duplicate S” row is no longer considered a duplicate if there is a valid H row that reverses it, and
  2. an H row is considered valid if it exactly reverses ONE S row
  3. a duplicate S row that is not reversed is still considered invalid
  4. an H row that has no corresponding S row is considered invalid

At the end you want to flag all S rows that are duplicates but not reversed, as well as all H rows that are incorrect reversals.

The solution presumably needs to take into account that there could be multiple H and S rows for any given DCN and that it must take into account the idea that a single H row can only be considered a reversal for one of the S rows. This adds some additional complication with ensuring that we count reversal only once.

Extending your example data set to cover a number of issues, I looked at the following:

DCN Reference DETAIL-FIELD-A DETAIL-FIELD-B Debit/Credit comment
10349347 513477894 312 GAL S valid
10349347 11816177 312 GAL S valid because revered
10349347 413343494 -312 GAL H valid reversal
123 27953295688 200 ABC S valid
123 45093121452 200 ABC S valid because reversed
123 33481082957 200 ABC H invalid reversal
456 67057236481 30 XYZ S valid
457 57581440870 -30 XYZ H invalid reversal (wrong DCN!)
123 432646278 200 ABC S invalid because duplicated and not reversed
123 54543643 -200 ABC H valid reversal

I have added a comment to the right showing what I believe should be the outcome. Where there are reversals, it is arbitrary which is actually considered the valid remaining S row, as there is no stated rule for that.

I think that the attached workflow will achieve this, and perhaps can be adapted to your data. The numeric data in this workflow are integers. If your real data is non-integer you will need to adjust it accordingly; in particular the math formula node that negates the result is specifically configured to return an integer and would need to be modified or else the joins will fail.

Validating rows for duplicates and reversals.knwf (52.6 KB)

The end result is this:


My Flag column is currently better described as a “Comment” column, but you can adapt this to simply return OK/FAIL or whatever you require. I left it like this as it helps with testing.

You will note that the choice of lines that are considered reversed differs to my “test” table listed earlier, because it arbitrarily matches the S and H lines on a first-match basis, but the net result should be the same.

If it is possible for a single H to reverse more than one S, or a combination of H lines reverse a combination of S lines, then the solution would unfortunately become a great deal more complex. (I have done something like that before but I had to use some java to simplify the process, and it took me over a week to write! I hope therefore that this “simplified” solution helps.


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.