Do you use K-AI, the KNIME AI assistant, to help with building workflows? Or maybe you tried, but something didn’t quite work? We would love to hear about your experience!
I think KAI is an amazing feature and that you should keep building on it. At the same time, it needs to find a more defensible niche.
KAI is inevitably compared with less specialised, general-purpose LLMs on the web, and this comparison is not going to go away, in particular now that the free tier has been limited - and economically I fully understand why there has to be a limited free tier. (As a matter of fact, I'd even be fine if KAI were an exclusive feature of the paid tiers.) The fact is that a general-purpose LLM with access to the internet can already answer generic, non-confidential questions about KNIME to a pretty impressive degree, albeit without KAI's workflow-building capabilities, granted.
How is the workflow-building capability justified in light of the fact that rookie users can obtain working Python or R code for any kind of workflow from other, specialised LLMs and simply paste it into the relevant KNIME scripting node? What's more, looking at (currently GPU-hungry) Python packages such as skrub, what is going to be KAI's approach to AutoML?
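To make that concrete, here is the kind of snippet an external LLM will happily hand a beginner to drop into a Python Script node - a minimal sketch, assuming the knime.scripting.io API of the bundled Python Script node; the column names "Region" and "Sales" are made up for illustration:

```python
# Pasted into a KNIME Python Script node - code like this is exactly what a
# general-purpose LLM will generate on request.
# Assumes the knime.scripting.io API of the bundled Python Script node;
# the column names "Region" and "Sales" are hypothetical.
import knime.scripting.io as knio

df = knio.input_tables[0].to_pandas()

# Example transformation: aggregate sales per region
out = df.groupby("Region", as_index=False)["Sales"].sum()

knio.output_tables[0] = knio.Table.from_pandas(out)
```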
Personally, I steer away from KAI because of data protection and confidentiality concerns, more specifically around data sharing. It is not clear to me what data is shared, how, and what is done with it beyond the promises of the privacy policy. There is no way for a user to limit or configure what is being shared (e.g. only the node names and the node graph, the nodes plus their configuration, the data structure, the data itself, etc.) and for which purpose. Proton AG is trying to build an LLM, named Lumo, that is both privacy- and EU-focused, and while these efforts are also still very early stage and far from the stated goal, I think it could be a differentiating benefit for KNIME to build a more privacy-aware and more user-configurable KAI pipeline. For anything private, the only options for now are a locally deployed LLM, which is reserved for the few with access to GPU power, or not feeding any private data into any LLM at all.
Finally, it would be great if there were a way to have KAI switched off by default, no questions asked, even on first installation.
The best way to improve workflow building would be to improve the basic nodes and functionality in the first place.
See e.g.:
String Manipulation (Multi Col) still has no column selection by type
Math Operation (Multi Col) still has no column selection by type
String to Date&Time still has no column selection by type
The new Row Filter can't wildcard-filter Path columns
There is no easy way to combine a Date column and a Time column into a Datetime except concatenating cast Strings (see the sketch after this list)
Lag Column can only lag forward (1, 2, 3, …) but not backward (-1, -2, -3, …)
Most of the new nodes (e.g. Value Lookup) cannot use the RowID, requiring you to extract it into a column before using those nodes
We have had a Date&Time-based Row Filter for ages but no Splitter, always requiring a subsequent Reference Row Splitter
There is no Number to Duration node. Column Expressions is now officially legacy, but the Expression node does not cover that either
There is no way to combine two columns of type Duration into one without casting
There is a Parallel Chunk Loop extension but no Concurrent Loop option
Parquet files cannot be read from \\Server\DriveLetter$ (UNC) paths. Further, encrypted .parquet files cannot be read (or written).
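For the Date + Time point above, the current workaround via a Python Script node looks roughly like this - a sketch, assuming the knime.scripting.io API; the column names "Date", "Time" and "Datetime" are placeholders:

```python
# Inside a KNIME Python Script node: combine a Date column and a Time column
# into a single Datetime column, instead of concatenating cast Strings in the
# workflow itself. Column names "Date", "Time" and "Datetime" are hypothetical.
import knime.scripting.io as knio
import pandas as pd

df = knio.input_tables[0].to_pandas()

# Concatenate the string representations once and parse into a datetime.
df["Datetime"] = pd.to_datetime(
    df["Date"].astype(str) + " " + df["Time"].astype(str)
)

knio.output_tables[0] = knio.Table.from_pandas(df)
```

It works, but having to reach for a scripting node for something this basic is exactly the kind of gap a dedicated node should close.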