I have a column of invoices where duplicates are created with abbreviations A, B, and sometimes A2.
I want Knime to help create a second column where it identifies the original invoice.
For example:
Column 1:
ABC12345
ABC12345A
ABC12345B
EFG12345
EFG12345A
EFG12345A2
If the stem of the invoice numbers is always the same length (apart from the A/B/A2, so in your examples here, the length is 8) then you can use the substring function in the String Manipulation node to extract the leftmost 8 characters and leave behind the A/B/A2.
Hi @Rokank
If your invoice ids are of a fixed length (excluding the duplicate suffix) or follow a pattern like a set of letters followed by a set of numbers, optionally followed by the duplicate suffix (A/B/A2…), then it could be possible with a regex.
If this is not the case, it can be difficult to clearly define what is a duplicate.
if you are aware of all suffixes that can happen you can use Regex Split node with following syntax that could use improvement as I’m far from regex expert