Today we’re back with a finance puzzle proposed by KNINJA @MartinDDDD. You are a data analyst supporting various finance sub-departments in a big company, and your main goal is to upgrade their processes with the latest tools and technology. It’s January 2nd and all of the previous quarter’s supplier invoices need to be processed. Can you assist the finance department and find ways to extract relevant data from XML invoices, create a management reporting package, among other tasks?
Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with tag JKISeason4-9 .
Need help with tags? To add tag JKISeason4-9 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!
Too many downloads for what’s supposed to be just a Just KNIME It challenge…
At this point, are we still talking casual enthusiasts, or is this turning into a full-time job disguised as a community sprint?
Honestly, it’s starting to feel more like Formula 1 than data science!
Haha, I was about to share the same thing—but you beat me to it with a perfect undercut!
Important note: We wanted to preserve @MartinDDDD’s challenge as much as possible, so we kept the dataset he used. Please let us know if it is becoming a problem to download it and we will work on a sample that is easier to play with!
Thank You @MartinDDDD with your attachment I was able to download it as well!
Just one question: In the third task where we can find the purchase order date? (or the invoice date, as it’s IssueDateTime in the xml). I just found this Date, and the third task asks to compare the invoice date to the purchase order date:
< ram:IssueDateTime>
<udt:DateTimeString format=“102”>20240203</udt:DateTimeString>
</ram:IssueDateTime>
If it’s the part of the challenge to find that date, then I will work on it!
The third task:
“3. Assist with performing internal controls – i.e., taking into consideration two datasets provided by the Procurement department you should identify any invoices that (a) do not have a PO Number, (b) do not match a PO Number issued by the company, and (c) have an invoice date which is before the corresponding Purchase Order Date;”
for the invoices your are on the right path - for the POs - they should be in the PO Header or Item data set… which I could not find in the folder so attaching them here for reference…
Thought it might be useful to provide some general context on this challenge:
Invoices are in XML format as since January 2025 it is mandatory to be able to send/receive e-invoices in a standardised format (with transitionary periods until 2027)
The XML format reflects a somewhat simplified standard e-invoice, but data included in them is synthetic (thanks gpt-4o-mini…)
In a corporate governance context, typically approval is required before spending company money, once spend has been approved a Purchase Order is issued to the supplier. Suppliers are required to reference Purchase Order number on their invoices
This allows for various checks to be performed between purchase order and invoice: Has the spent be approved before services were rendered or items delivered? Has the supplier invoiced what was ordered? Are there any differences between Purchase Order and Invoice (Prices, Quantities), is there a matching Purchase Order to the invoice?
Data regarding open purchase orders is also used in Financial Planning and Analysis: when will we spent how much money? Who are the top suppliers we are spending money with?
As you can imagine a lot of the above stuff, to this date, happens in Excel… so I thought it might be fun for everyone to change that
I have the feeling that it might be a very big project as you can extract so many interesting insights from these datasets.
I just spent some time on the goals defined within the challenge - not much more.
But I enjoyed using the generic e-chart node (along with the K-AI to generate the code) to be able to show some “big number” in the report.
I know I could extract more information, but here what I show in that workflow:
Number of invoices and total amount without PO number
Number of PO and total amount not invoiced
Number of invoices with a date before the PO’s date and total amount
Graph showing the amount to be invoiced in the future
Top 3 suppliers with remaining amoung
2 other tables with details on invoices with no PO and invoices with a date prior to PO
Note: many other graphs and numbers might be extracted. The graph could be summarized per week for example, to get more insights, instead of per single day.
This is sleak and Swift … Esp the BANs … text formatting been really a challenge esp with colors and highlights… For reports Its vital … thanks for adding the flavour.
This week’s Just KNIME It! challenge hit me right in the accounting brain I don’t have finance terms everywhere and me just trying to survive with inner peace and KNIME nodes
Even though I don’t have a strong finance background, I gave it my best shot and completed the challenge in under 24 hours after it was published!
It wasn’t easy (hello Level: Hard ), but learning never is, so let’s keep going, even when our brains scream “please no more XMLs!”
Keep learning, keep KNIME’ing, and don’t forget to laugh through the chaos!
Phew, that was real real interesting. In my opinion the best challenge in this season yet. It felt like real life. Messy, you do not exactly know what you’re seeing, what to watch, what to do… I think the inspiration came from real events @MartinDDDD, if that’s the case, it can be felt
So my steps was:
Reading the xml and extracting the necessary information
Internal controls
2.1 Checked if PO is missing
2.2 Checked if the two PO number is the same in the xml (all of them were the same)
2.3 Checked if the Invoice date is earlier than the POs date
Supporting Financial Planning & Analysis’ forecast
3.1 Checked if the product is the same in the PO as in the invoice (all of them were the same)
3.2 Checked if the total number in the PO and in the invoice for the line item are the same (there was a lot problem here :D)
3.3 I shifted the dates with 14 months to invoice the leftover amounts (it’s suspicious how much greater the amounts in the forecast, maybe I used wrong field to calculate (Total - Charged), but this was what the data told me )
I’m really open to suggestions to improve my solution. I felt really involved with this challenge, and it was real great, thank you @MartinDDDD! And if it was really inspired by real challenges I’m really curious how you solved it