Feedback Wanted: Help Us Improve K-AI's Workflow-Building Capabilities

Hi!

Do you use K-AI, the KNIME AI assistant, to help with building workflows? Or maybe you tried, but something didn’t quite work? We would love to hear about your experience!

If you have a few minutes to spare, please leave your feedback in this short survey: K-AI Workflow-Building Capabilities Feedback

Many thanks on behalf of the AI team @ KNIME :slightly_smiling_face:

-Ivan

3 Likes

Hi,

I think that KAI is an amazing feature and that you should keep building on it. At the same time, it needs to find a safer niche position.

KAI is inevitably compared with less specialised general-purpose LLMs on the web, and this comparison is not going to go away, particularly now that the free tier has been limited - and economically I totally understand why there has to be a limited free tier. (As a matter of fact, I’d even be fine if KAI were an exclusive feature of the paid tiers.) The fact is that a general-purpose LLM with access to the internet can already answer generic, non-confidential questions about KNIME to a pretty inspiring degree, albeit not with KAI’s workflow-building capabilities, granted.

How is the workflow-building capability justified in light of the fact that rookie users can obtain working Python or R code for any kind of workflow via other specialised LLMs and simply paste it into the relevant KNIME node? What’s more, looking at (currently GPU-hungry) Python packages such as skrub, what is going to be KAI’s approach to AutoML?
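To make that concrete, this is roughly what such a pasted snippet looks like inside a Python Script node - only the knime.scripting.io wrapper is KNIME-specific (assuming the current scripting API), and the filtering step in the middle is just a made-up placeholder:

```python
import knime.scripting.io as knio  # current KNIME Python Script node API (assumption)
import pandas as pd

# Read the node's first input table as a pandas DataFrame
df = knio.input_tables[0].to_pandas()

# Placeholder logic an LLM might generate:
# keep rows where a numeric column exceeds its mean
df = df[df["value"] > df["value"].mean()]

# Hand the result back to KNIME as the node's first output table
knio.output_tables[0] = knio.Table.from_pandas(df)
```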

Personally, I steer away from KAI because of data protection and confidentiality concerns, more specifically pertaining to data sharing. It is not clear to me what data is shared, how it is shared, and what is done with it beyond the privacy policy promises. There is no way for a user to limit or configure what is being shared (e.g. only the node names and the node graph, the nodes plus their configuration, the data structure, the data, etc.) and for which purpose. Proton AG is trying to build an LLM, named Lumo, that is both privacy- and EU-focused and, while these efforts are also still very early stage and far from the stated goal, I think that it could be a differentiating benefit for KNIME to build a more privacy-aware and more user-configurable KAI pipeline. For anything private, the only solutions for now are a locally deployed LLM, which is reserved for the few with access to GPU power, or not feeding any private data into any LLM at all.

Finally, it would be great if there was a way to have KAI switched off by default without any questions asked, even on first installation.

Kind regards

Geo

2 Likes

The best way to improve workflow building would be to improve the basic nodes and functionalities in the first place.

see e.g.

  • String Manipulation (Multi Col) still has no column selection by type
  • Math Operation (Multi Col) still has no column selection by type
  • String to Date&Time still has no column selection by type
  • The new Row Filter can’t wildcard filter Path
  • There is no easy way to combine a Date column and a Time column into a Datetime except by concatenating cast Strings (see the Python workaround sketch after this list)
  • Lag column can only lag forward (1,2,3,…) but not backward (-1,-2,-3,…)
  • Most of the new nodes (e.g. Value Lookup) cannot use the RowID, requiring you to cast it into the dataset before using those nodes
  • We had a Date&Time based Row Filter for ages but no Splitter - always requiring a subsequent Reference Row Splitter
  • There is no Number to Duration node. Column Expressions is now officially legacy, but the Expression node does not cover this either
  • No way to combine two columns of type Duration into one without casting
  • There is a Parallel Chunk Loop extension but no Concurrent Loop option
  • Parquet files cannot be read from \\Server\DriveLetter$ paths. Further, encrypted .parquet files cannot be read (or written).

Add 'append' option to the SharePoint Online List Writer (which will hopefully be resolved after 2 years and 9 months).
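For reference, the Date+Time and Duration combinations above can be worked around today in a Python Script node - which is exactly the kind of scripting detour these node requests are meant to avoid. A rough sketch (column names are hypothetical, and it assumes the current scripting API with the columns arriving as Python date/time objects):

```python
import knime.scripting.io as knio  # assumes the current KNIME Python scripting API
import pandas as pd
from datetime import datetime

df = knio.input_tables[0].to_pandas()

# Combine a Date column and a Time column into one Datetime column
# ("date" and "time" are hypothetical column names)
df["datetime"] = pd.to_datetime(
    [datetime.combine(d, t) for d, t in zip(df["date"], df["time"])]
)

# Two Duration columns add up directly once they are timedeltas
df["total_duration"] = df["duration_a"] + df["duration_b"]

knio.output_tables[0] = knio.Table.from_pandas(df)
```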

5 Likes

and adding

2 Likes

Hey @fe145f9fb2a1f6b,

I totally agree, and we will focus on improving our nodes as soon as we have migrated them to the Modern UI. We had hoped to do it in one go, touching up the nodes while migrating them, but it turns out this is not that easy, which is why we will first migrate them to the Modern UI and rework them afterwards. Our internal list of things we would love to improve is already growing with every node, but it is even more valuable to us to see the pain points of our users. So please keep it coming! I noted down all of your points and hope we can improve them soon.

Greetings,

Daniel

2 Likes

Hi Geo, many thanks for the feedback and thoughts!

You raise a few really good points, let me contribute with some thoughts of my own.

Regarding general-purpose LLMs on the web being able to answer KNIME-related questions: yes, true to a degree, but that degree likely seems much higher than it actually is. From how you talk about LLMs, you probably know quite a bit about the not-so-inspiring reality of huge portions of LLM output being false. While that is literally the way they are meant to work (hallucinating the answer), you can, of course, steer those hallucinations towards correctness by providing concrete bits of relevant information. As you said, the chatbots online are pretty good at browsing the web to retrieve some relevant information, but usually that’s simply not enough. I can confidently say that because even K-AI, which has access to vector stores with carefully curated KNIME-specific information, can and does make mistakes in its answers. From this perspective, K-AI does provide quite a bit of value in terms of onboarding and learning, it’s just difficult to make this case when answers from ChatGPT always look so convincing.

Regarding K-AI’s workflow-building capabilities compared to LLM-generated code in a scripting node: I totally understand your point, but this is more of a question about visual programming vs. high-code programming. To me the value of LLM-generated visual workflows is very apparent compared to LLM-generated scripts - clear representation of the flow of data, clear abstraction of each transformation done to that data, presumably well-documented via node and workflow annotations. Sure, you can compress all those steps into a single node with a Python script inside, but you lose all the benefits that come from programming such workflows visually. That said, you do have access to K-AI inside scripting nodes, and you can still benefit from the rest of KNIME’s ecosystem even with a single scripting node in your workflow (deploy, schedule, etc.) :slightly_smiling_face:

Regarding privacy: we recently rolled out a note on this in our documentation - have a look: KNIME Analytics Platform User Guide. But yes, the two super important points you mention here are on our minds as well. We’re planning to let users have control over what’s accessible to K-AI and what isn’t (e.g. only table specs, or also node configurations, or also perhaps samples from actual data, and so on). And letting K-AI’s backend LLM be configurable only makes sense, that’s also definitely something we want to do.

Regarding turning all AI features off if you’d rather not have them - you can already do that right away: Preferences → KNIME → KNIME Modern UI → AI Assistant

Really enjoyed reading your feedback, many thanks again.

-Ivan

3 Likes

Thank you for reading through my feedback and providing your reflections, Ivan! Let me please complete my reflection with a more challenging benchmark than ChatGPT: Brave’s AI, which is embedded into their search engine and ships with access to the worldwide web, is pretty good because it systematically cross-references its sources for every single sentence. This makes it very easy to verify the AI’s output.

Kind regards

Geo

1 Like

Hi Ivan,

do you have statistics on the situations in which K-AI is used?
In my case it is almost 90% the “Python scripting node” and the “generic ECharts view”. In both cases the general-purpose LLMs work pretty well, though especially for the Python part they often produce outdated syntax.
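For what it’s worth, the outdated syntax usually shows up in the table I/O boilerplate - a minimal sketch of what I mean (assuming I remember the legacy module name correctly):

```python
# What general-purpose LLMs often still suggest (legacy Python Script (Labs) API):
#   import knime_io as knio
#   df = knio.input_tables[0].to_pandas()

# Current Python Script node API:
import knime.scripting.io as knio

df = knio.input_tables[0].to_pandas()
knio.output_tables[0] = knio.Table.from_pandas(df)
```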

For me the usage itself is quite annoying, as I have to confirm and accept the disclaimer every time I open the Python scripting node and want to use K-AI.

2 Likes

I can second @ActionAndi - primarily using K-AI in Python Script / eCharts / Expression nodes.

I have tried Build mode here and there, more for experimental purposes, and “the basics” seem to be working quite well. That said, I have not given it a try for medium- to high-complexity tasks.

If I compare how I use K-AI to how I use LLMs when e.g. working in VS Code: in my case, I use K-AI for anything that requires typing in supported nodes, whereas I take care of most of the workflow building myself. Oftentimes the workflows I build are hard to “summarise” in a prompt, given that I am trying to find a solution, which requires reviewing table outputs and thinking about the next step, rather than doing simpler things like “aggregate by X and group by Y,Z”. That said, I use KNIME every day and that is my specialisation.

I’d be really keen to hear how K-AI helps new KNIME users get started.

3 Likes

Just wanted to add support for a few enhancements.

KAI settings options (per workflow and global default) - Rather than a 2-step disclaimer, I would have a KAI workflow settings reminder visible when in use. Settings for what is accessible to KAI would help user confidence and allow us to dial in privacy vs. effectiveness for a given workflow’s data.
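Purely as a hypothetical illustration of the granularity I have in mind (none of these settings exist today, the keys are made up):

```python
# Hypothetical per-workflow K-AI sharing settings - all keys are made up
kai_workflow_settings = {
    "share_node_graph": True,     # node names and connections only
    "share_node_configs": False,  # dialog settings of each node
    "share_table_specs": True,    # column names and types, no values
    "share_data_samples": False,  # never send actual rows
}
```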

Local KAI option - Ideally KNIME would share the underlying LLM and vector data package within the existing install structure options so that we can run and further customize a locally run KAI, and have a setting that allows us to run KAI in local LLM mode universally (or per workflow). That way, when workflows contain data with privacy issues, we can still utilize it with confidence.
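To sketch the local-mode idea (purely illustrative, not an existing K-AI feature; it assumes a locally hosted, OpenAI-compatible endpoint such as the one Ollama exposes on localhost):

```python
import requests

# Illustrative only: query a locally hosted, OpenAI-compatible chat endpoint
# so no workflow data ever leaves the machine.
resp = requests.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "llama3",  # hypothetical local model name
        "messages": [
            {"role": "system", "content": "You are a KNIME workflow assistant."},
            {"role": "user", "content": "Which node joins two tables on a key column?"},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```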

Iterative prompt and response - Even with advanced prompt construction techniques and a myriad of details in an initial prompt, I find “AI” tools to be borderline useless without the ability to dial in toward a solution via iterative follow-up interactions with access to the conversation history. No amount of fine-tuning KAI will allow it to hit home runs based on a first prompt only. This is not only a limitation of the current technology, but a general rule of effective communication. I recognize that this challenge has security implications as well, which is why it would likely have to come hand in hand with the above recommendations.
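To spell out the iterative part, this is the kind of conversational state I would want K-AI to keep - each follow-up is sent together with the full history so corrections refine the previous answer; ask_kai is a hypothetical stand-in for whatever endpoint K-AI actually uses:

```python
# Sketch only: conversation history for iterative refinement
history = [
    {"role": "user", "content": "Build a workflow that reads sales.csv and aggregates revenue by month."},
    {"role": "assistant", "content": "<first workflow proposal>"},
    {"role": "user", "content": "Good, but also group by region and add a bar chart view."},
]

def ask_kai(messages):
    """Hypothetical helper - stands in for whatever endpoint K-AI uses."""
    ...

refined_proposal = ask_kai(history)  # the model sees all prior turns, not just the last prompt
history.append({"role": "assistant", "content": refined_proposal})
```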

3 Likes