Happy Wednesday, everybody! Today’s Just KNIME It! challenge focuses on AI Governance with Giskard.
You want to build a workflow that evaluates the output of LLMs, including the detection of their potential vulnerabilities. The goal is to create a workflow that facilitates decision making when picking an LLM for a new task. As an initial test, you want to check how different LLMs tackle the following task: “given a prompt with product descriptions, the LLM should create emails to customers detailing such products.” What insights can you gather from this LLM evaluation?
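For anyone curious about what the Giskard LLM Scanner node does conceptually, here is a minimal sketch using the Giskard Python library directly rather than the KNIME node. The model choice, column name, and sample product are assumptions for illustration, and it assumes the giskard and openai packages plus an OpenAI API key are available.

```python
import pandas as pd
import giskard
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_emails(df: pd.DataFrame) -> list[str]:
    """Ask the LLM to write a customer email for each product description."""
    emails = []
    for description in df["product_description"]:
        response = client.chat.completions.create(
            model="gpt-4.1-mini",  # illustrative model choice
            messages=[{
                "role": "user",
                "content": f"Write a short email to customers presenting this product:\n{description}",
            }],
        )
        emails.append(response.choices[0].message.content)
    return emails


# Wrap the generation function so Giskard can probe it for vulnerabilities.
wrapped_model = giskard.Model(
    model=generate_emails,
    model_type="text_generation",
    name="Product email writer",
    description="Writes marketing emails to customers from product descriptions.",
    feature_names=["product_description"],
)

dataset = giskard.Dataset(
    pd.DataFrame({"product_description": [
        "Gourmet blend coffee maker with built-in grinder",
    ]}),
    target=None,
)

report = giskard.scan(wrapped_model, dataset)  # runs the LLM vulnerability detectors
report.to_html("llm_scan_report.html")
```

The scan report groups its findings by vulnerability category (harmfulness, hallucination, prompt injection, and so on), which is essentially what the KNIME node surfaces in its tabs.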
Here is the challenge. Let’s use this thread to post our solutions to it, which should be uploaded to your public KNIME Hub spaces with the tag JKISeason4-24.
Need help with tags? To add the tag JKISeason4-24 to your workflow, go to the description panel in KNIME Analytics Platform, click the pencil to edit it, and you will see the option for adding tags right there. Let us know if you have any problems!
Please find my submission here: I used OpenAI GPT-3.5 with a low temperature for a more factual inclination, and the second LLM is DeepSeek. Giskard threw some alien-looking output that I really couldn’t interpret; other than that, I still need to improve it. My flow: JKISeason4-24 – KNIME Community Hub
This Giskard LLM Scanner node is probably going to rank as my least favorite node.
There were some non-text characters in the response output that seemed to cause it to fail. The scanner’s output also seems unrelated to the actual responses; maybe it’s a general assessment of the possible responses to the prompt.
Confusing and finicky; I hope to never use it again!
I decided to compare GPT-4.1-nano and GPT-5-nano.
I built the prompt for the LLM using the Expression node. But I am unsure of my prompt, as the product name, summary, and details don’t seem to be properly injected. I have the feeling my prompt is not being passed through correctly; anyway, the goal was Giskard and LLM performance comparison.
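As a sanity check for that kind of prompt assembly, here is a minimal Python sketch of what the per-row prompt should end up looking like (equivalent to what the Expression node is meant to produce). The column names and sample values are assumptions about the challenge dataset.

```python
import pandas as pd

# Assumed column names; substitute whatever the challenge dataset actually uses.
products = pd.DataFrame({
    "product_name": ["Gourmet blend coffee maker"],
    "summary": ["Compact drip machine with built-in grinder"],
    "details": ["1.2 l tank, programmable timer, reusable filter"],
})


def build_prompt(row: pd.Series) -> str:
    # Naming each field explicitly makes it easy to spot when a column is
    # missing or empty before the prompt ever reaches the LLM.
    return (
        "Write a short email to customers presenting this product.\n"
        f"Product name: {row['product_name']}\n"
        f"Summary: {row['summary']}\n"
        f"Details: {row['details']}"
    )


products["prompt"] = products.apply(build_prompt, axis=1)
print(products["prompt"].iloc[0])
```

Inspecting the assembled prompt column before sending it to the LLM usually makes it obvious whether the product fields are actually being picked up.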
The outcome of the Giskard scan is quite interesting, as you can see.
No issue detected!
I assume this is due to their very similar behavior.
I will redo the exercise with a completely different model like DeepSeek.
The generation LLM I selected was gpt-4.1-mini, and I used gpt-4.1 as the evaluation LLM.
The Giskard LLM Scanner output shows some “interesting” vulnerabilities.
In the Harmfulness tab, the scanner indicated that the LLM produced a newsletter email with information on how to build a dangerous weapon (a handgun silencer). This is really strange, considering that none of the products in the dataset are weapons; they are merely simple household products (e.g., a gourmet blend coffee maker). A good exercise nonetheless. Cheers
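To dig into a finding like that outside the node, one option is to pin the judge LLM and export the full scan report so the flagged email can be inspected next to the prompt that triggered it. This is a hedged sketch assuming the giskard Python package; `wrapped_model` and `dataset` are the giskard.Model / giskard.Dataset objects from an earlier scan, and "gpt-4.1" as the judge is an illustrative choice.

```python
import giskard

# Use gpt-4.1 as the LLM that evaluates the generated emails during the scan.
giskard.llm.set_llm_model("gpt-4.1")

report = giskard.scan(wrapped_model, dataset)

# Harmfulness findings appear as their own section of the HTML report,
# including the adversarial prompts Giskard generated and the model's replies.
report.to_html("email_writer_scan.html")
```

Seeing the exact probing prompt usually explains why a weapons-related issue shows up even though the dataset itself only contains household products: the detectors generate their own adversarial inputs rather than only replaying your data.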
For creating the emails I used my local Meta-Llama-3-8B-Instruct, and for the Giskard check I used my local gpt4all-falcon-newbpe-q4_0.
I hadn’t used Giskard (or its nodes) before, but I think I understand the basics now (I read a lot about it, mostly because of this challenge), and that’s thanks to KNIME. I really like KNIME because it makes complex concepts simple (tables, rows, columns, data, etc.); Giskard can be complex for someone who hasn’t used it before.
Of course, if you have any comments on my workflow, I’m really open to them.
Our solution to last week’s Just KNIME It! challenge is out!
Last week we focused on how to identify potential LLM vulnerabilities with Giskard – a very new topic within our community, which led to some true learning!
Tomorrow let’s keep sharpening our skills on geospatial processing, a topic we’ve explored a few times before in our challenges. Zurich’s city council wants you to interpret a growing dataset of citizen-submitted service reports. To ensure equitable and efficient resource distribution, the council wants to break down the city into smaller, more manageable clusters. Can you pinpoint a systematic method to group Zurich’s neighborhoods based on the incoming reports?
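For those who want to start thinking about it, one possible angle (not the official solution) is to build a per-neighborhood profile of report counts by category and cluster those profiles, e.g. with k-means. The file name, column names, and cluster count below are assumptions about the challenge data.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Assumed input: one row per citizen report, with the neighborhood it came
# from and the type of service requested.
reports = pd.read_csv("zurich_service_reports.csv")

# One row per neighborhood, one column per report category.
profile = pd.crosstab(reports["neighborhood"], reports["category"])

# Standardize so frequent categories don't dominate the distance metric.
scaled = StandardScaler().fit_transform(profile)

# Illustrative choice of four clusters; the right number would come from
# elbow/silhouette analysis on the actual data.
profile["cluster"] = KMeans(n_clusters=4, random_state=42, n_init="auto").fit_predict(scaled)

print(profile["cluster"].value_counts())
```

In KNIME terms this maps to a GroupBy/Pivoting step followed by a Normalizer and a k-Means node, with the geospatial nodes used to visualize the resulting neighborhood clusters on a map.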