Duplicate Invoices (similarities in the invoice amount)

DA2023 · November 7, 2023, 6:14pm

Good evening everybody,

I’m working on a workflow that helps identifying duplicate invoices/payments.

Basically, there has to be a check for every invoice (each row in one column) in an excel file. The transaction should be considered as a “duplicate”, if: the “Vendor ID”, “Invoice Number” and “Invoice Date” are identical (exact match) and the “Invoice Amount in EUR” is “similar”.

Here, “similar” describes three possible scenarios (a kind of fuzzy logic):

The “Invoice Amount in EUR” differs ± 3% from each other (another invoice),
The “Invoice Amount in EUR” is exactly twice as high as the other (e.g. 50.15 and 100.30).
The “Inoice Amount in EUR” begins with the same first four digits (123.45 and 1,234.5).

It would be great, if the output could be similar to the “duplicate row filter”, with the duplicate classifier (chosen and duplucate) and duplicate identifier (e.g. row 12), so that the duplicates can be extracted and displayed seperately (e.g. for internal audit purposes).

Unfortunately, I have no idea on how to realize the “similar” criteria, as I don’t know how to do the check for each individual row in the column (check for all invoices, from invoice to invoice) and hope that someone can help.

Many Thanks in advance & best regards!

Kazimierz · November 8, 2023, 7:42am

Hello @DA2023

I have no ready-to-use solution; just two thoughts:

I would use ‘Duplicate Row Filter’ to identify invoices with identical ‘Vendor ID’, ‘Invoice Number’ and ‘Invoice Date’ columns.
Then, I would:

Make sure amount in ‘Invoice Amount in EUR’ is a number.
Calculate 3% variation.
Change numbers to strings.
Remove special characters like commas, dots, or any others, but keep the original values.
Change strings back to numbers.
Check if values meet conditions you mentioned: doubled values or with identical beginning.

Please treat this as inspiration for next trials.
Good luck,
Kaz

DA2023 · November 8, 2023, 3:34pm

Hello @Kazimierz

many Thanks for your help!

My problem is that I don’r really know, how to define a check from invoice to invoice. The invoices have to be compared to each other. The problem is that I don’t know, how to do that in KNIME.

Best regards,
DA

rfeigel · November 8, 2023, 4:15pm

Can you share a data set?

DA2023 · November 8, 2023, 4:33pm

Hi @rfeigel

unfortunately, I am not allowed to use the real data. But I have created an excel table. Here, I have tried to implement the patterns, I would like to search for. Hope that helps.

Best regards
DA
Test Data.xlsx (11.5 KB)

hmfa · November 8, 2023, 7:18pm

Hi @DA2023.

I produced this workflow for you.

Duplicate Invoices.knwf (94.9 KB)

Hope it helps.
Br

DA2023 · November 9, 2023, 3:40pm

Hi @hmfa

it works perfectly well, many Thanks for your support!

Just for my understanding: with the “Counter Generation” & “Joiner” nodes you have duplicatet the input in order make it comparible to each other, right? Can you tell, where I can find additional info regarding coding in the “Column Espressions” node?

Best regards
DA

hmfa · November 9, 2023, 4:14pm

Hi @DA2023

Yes, you are right.
The counter generation serves to uniquely identify each invoice.
The joiner relates the table to itself, to compare each invoice to all others from the same supplier and date, including itself.
Finally, it excludes lines where an invoice compares to itself because it is no exception (same counter).

The column expressions node is very powerful.

You can find more information about JAVA script rules here.

Br
Hugo

DA2023 · November 10, 2023, 12:24pm

Hi @hmfa ,

many Thanks for the Infio & your support!

Have a nice day & enjoy your weekend.

Best regards,
DA

system · November 17, 2023, 12:25pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.