I have a column of invoices where duplicates are created with abbreviations A, B, and sometimes A2.
I want Knime to help create a second column where it identifies the original invoice.
Column 1: Column 2
ABC12345 : ABC12345
ABC12345A : ABC12345
ABC12345B : ABC12345
EFG12345 : EFG12345
EFG12345A : EFG12345
EFG12345A2 : EFG12345
Thanks in advance!
If the stem of the invoice numbers is always the same length (apart from the A/B/A2, so in your examples here, the length is 8) then you can use the substring function in the String Manipulation node to extract the leftmost 8 characters and leave behind the A/B/A2.
If your invoice ids are of a fixed length (excluding the duplicate suffix) or follow a pattern like a set of letters followed by a set of numbers, optionally followed by the duplicate suffix (A/B/A2…), then it could be possible with a regex.
If this is not the case, it can be difficult to clearly define what is a duplicate.
The Invoice IDs are not fixed length. They vary depending on the vendor.
There could be “JKL1234567” and “JKL1234567A”.
However, there would not be “JKL12345”
Hi there @Rokank,
if you are aware of all suffixes that can happen you can use Regex Split node with following syntax that could use improvement as I’m far from regex expert
This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.