Hi,
I’d like to propose the option to persistently mark the sensitivity level of data. That would make it possible to prevent sensitive data from being passed un-anonymized to AI agents, and would provide the ability to put safety measures and automations in place to anonymize data.
Best
Mike
Adding to this (I have discussed this with Mike and also some KNIMErs at DataHop in Stuttgart):
I feel this would be a great feature - especially in the context of working with AI Agents.
The overall story of having a data layer that allows the user to control what data (if any) gets fed to the LLM is great already. In my use case I still face the challenge that any data required to actually call a tool currently has to go through the LLM.
E.g. if my agent should create a customer record in my CRM, say for customer “MartinDDDD”, then:
- MartinDDDD is typed into chat interface
- It is then sent to the LLM
- The LLM generates the payload required to call the tool (which will include the parameter, e.g. customerName: MartinDDDD)
- The tool creates the record
I have thought about alternatives on how to implement this:
- Option 1: Create a custom agent UI (i.e. not using the Agent Chat view etc.) and sanitise user input before it goes to the LLM (e.g. using the Presidio extension) => this works for the input, but given that the tool call happens before the response gets back, a record would be created for the sanitised name (and not using the great chat views / widgets seems like a waste)
- Option 2 (and this is just conceptual, probably impossible): Somehow funnel the customer name into the data layer. This would probably mean the user needs a form which saves the name, e.g. in .table format, in a temp location (which I understand is tricky right now, as relative paths are not supported between the agent-level workflow and the tools), then have the agent use a tool which fetches the data and, in the same tool, creates the record. This seems super complicated and impractical
- Option 3: Ignore this challenge in KNIME and e.g. use a self-hosted LLM (renting GPUs…), which will be pricey and likely not an option for most users
Here is how I am thinking this could be solved in KNIME (possibly with a trade-off in terms of increased latency):
- MartinDDDD is typed into chat interface
- The Agent Chat View / Prompter / Chat Widget contains a setting to trigger anonymisation, e.g. using Presidio under the hood => if active:
- Text from the interface / table is anonymised, and a temporary mapping (e.g. a Presidio model) is stored - e.g.:
- Customer_A: MartinDDDD
- It is then sent to the LLM
- The LLM generates the payload required to call the tool (which will include the anonymised parameter values, e.g. customerName: Customer_A)
- Before the tools are invoked, the temp Presidio model is used to de-anonymise, turning customerName: Customer_A back into customerName: MartinDDDD
- The tool creates the record
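The round trip above could be sketched roughly like this. To keep it self-contained, a regex stands in for Presidio's entity detection, and the function names (anonymise / deanonymise) are hypothetical, not existing KNIME or Presidio APIs - it only illustrates the reversible-mapping idea:

```python
import re

# Stand-in for PII detection; in practice presidio-analyzer would find entities.
CUSTOMER_PATTERN = re.compile(r"\bMartinDDDD\b")

def anonymise(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected entities with placeholders; keep a temp reverse mapping."""
    mapping: dict[str, str] = {}

    def repl(match: re.Match) -> str:
        placeholder = f"Customer_{chr(ord('A') + len(mapping))}"
        mapping[placeholder] = match.group(0)
        return placeholder

    return CUSTOMER_PATTERN.sub(repl, text), mapping

def deanonymise(payload: dict, mapping: dict[str, str]) -> dict:
    """Just before the tool call, swap placeholders back to the real values."""
    return {key: mapping.get(value, value) for key, value in payload.items()}

# 1. user input is anonymised before it reaches the LLM
safe_text, mapping = anonymise("Create a CRM record for MartinDDDD")
# safe_text == "Create a CRM record for Customer_A"

# 2. the LLM only ever sees the placeholder and echoes it into the tool payload
llm_payload = {"customerName": "Customer_A"}

# 3. the placeholder is resolved before the tool is invoked
tool_payload = deanonymise(llm_payload, mapping)
# tool_payload == {"customerName": "MartinDDDD"}
```

So the LLM never sees the real name, while the tool still receives it - at the cost of one extra anonymise/de-anonymise hop per message.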
I hope the above reasoning and example make sense. Happy to explain my views in more detail.