Duplicate string within a string


I am testing invoice payments for duplicates and was wondering whether it is possible to test for parts of a string that also appear in other rows.

For example, I have the following two invoice numbers (Invoice 1 is actually 3 invoices logged as one, Invoice 2 is 2 invoices logged as one):

Invoice 1: 9587637 9680086 9680085
Invoice 2: 9672145 9680085

Per the above, 9680085 is a duplicate; however, the traditional duplicate testing node would not identify it as such.

Is there a way of identifying this as a duplicate?


Hi Chris,


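  1. Split each string into its individual invoice numbers, with more or less sophistication, e.g. a simple split on the space or a regex extraction.
  2. Ungroup the resulting collection so that each invoice number gets its own row.
  3. Group the rows by the individual invoice number and count the occurrences. Numbers with a count >= 2 are duplicates.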
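
If it helps to see the logic outside the workflow, the three steps above can be sketched in plain Python. This is only a minimal illustration, not the node configuration itself: the `rows` dictionary and the `find_duplicates` helper are made-up names, the data is the two sample rows from the question, and it assumes that a number repeated inside a single row should be counted only once.

```python
import re
from collections import defaultdict

# Sample rows from the question: row label -> combined invoice-number string.
rows = {
    "Invoice 1": "9587637 9680086 9680085",
    "Invoice 2": "9672145 9680085",
}

def find_duplicates(rows):
    """Return {invoice_number: [row labels]} for numbers seen in 2+ rows."""
    seen = defaultdict(list)
    for label, combined in rows.items():
        # Steps 1 and 2: extract the numbers and handle them one at a time.
        # set() keeps a number repeated inside one row from counting twice
        # (an assumption of this sketch, not something stated in the thread).
        for number in sorted(set(re.findall(r"\d+", combined))):
            seen[number].append(label)
    # Step 3: keep only numbers that occur in at least two rows.
    return {n: labels for n, labels in seen.items() if len(labels) >= 2}

duplicates = find_duplicates(rows)
print(duplicates)  # {'9680085': ['Invoice 1', 'Invoice 2']}
```

Keeping the row labels next to each number means you can see which payments to review, not just that a duplicate exists.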

Hope that helps!

