Sometimes, Knime is Hard. (Just a discussion)

Felipereis50 · March 24, 2024, 3:22pm

This post is not a request, it’s just a discussion comment aimed at seeking improvement.

I’ve been using Knime for about 2.5 years now, and it’s an incredible tool. I’ve learned a lot and have created workflows that have helped improve my work and my monthly salary.

There are still many nodes for me to explore. So, I always strive to improve and step away from the most commonly used nodes to try and find better workflows from other users, preventing myself from getting ‘stuck’ in a situation that I could easily resolve with a specific node or component.

Today, I learned about the ‘Lag’ node. Before I knew about it, I was applying the same concept with a formula (OFFSET) in the ‘Column Expressions’ node, with the help of some friends on the forum.
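For anyone who has not met the concept, here is a minimal sketch of what a lag does, written in plain Python rather than KNIME. The function name `lag` and the sample numbers are made up for illustration; this is not how the node itself is implemented.

```python
def lag(values, offset=1, fill=None):
    """Return a copy of `values` shifted down by `offset` rows.

    The first `offset` positions have no earlier row to look at,
    so they are filled with `fill` (a missing value by default).
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    values = list(values)
    kept = values[: max(len(values) - offset, 0)]
    return [fill] * min(offset, len(values)) + kept


# Hypothetical sample column.
sales = [10, 12, 15, 11]
previous = lag(sales)  # each row now sees the value of the row before it

# A typical use: the change compared with the previous row.
change = [None if p is None else s - p for s, p in zip(sales, previous)]
```

With the lagged column sitting next to the original one, "current row minus previous row" becomes an ordinary row-wise calculation.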

To learn more, whenever I can, I like to study the examples from ‘Just Knime It,’ which is a wonderful learning initiative and a series that shouldn’t end. But also, there are community-created components that can teach a lot.

With this, new nodes to be discovered always emerge, and then the study of these nodes begins and becomes part of the cycle of learning (examples, videos, etc.).

I always think that to better understand each node, the ideal is to search the internet or YouTube (in video format) for examples that are simple enough to teach the concepts.

And it’s at this point that I encounter complex workflows, ones that I honestly don’t understand at all, and I realize that I have a lot to learn.

For example, today I was analyzing this video on YouTube, ‘KNIME Live Demo - Cleaning Data Columns without Coding,’ and at the end, an ‘Interactive Column Filter’ component is presented.

I found it interesting and downloaded the component to analyze step by step. I came across a component with a LOT of metanodes inside, loops, etc. It was something I thought would be simple, but in fact, it’s a complex component, which I imagine has been created to handle various situations without ‘breaking.’

I tried to understand it from the beginning and gave up. There comes a point where I understand nothing about the logic behind it.

That’s why I relate learning Knime to learning DAX (Power BI). The learning curve is very fast in the first 6 months. Then, it seems like the step is bigger, and to move beyond this point, it requires an extra effort that didn’t exist in the first 6 months.

In summary: learning Knime is easy to a certain point or until encountering complex workflows that scare me.

[Image: level of difficulty (DAX example)]

takbb · March 24, 2024, 6:09pm

A good discussion point and I agree with you @Felipereis50 that sometimes KNIME is hard. Sometimes it blows me away at how a node or a couple of nodes can make life so simple and yet at other times something I think ought to be easy takes a lot of effort.

Your experience with components and their complexity is spot on. I write quite a few components to try to make something that is complex (but may need to be repeated in future) much simpler but the result is inevitably a component that contains far greater complexity than the original workflow complexity it seeks to avoid! And the main reason for that is exactly as you described…

… The component has to be far more flexible and generic than it would be if it were just a simple workflow written for a single use case. It seeks to handle a lot of different eventualities and possible error conditions that (might) arise and from the outside observer’s perspective sometimes the sequence of nodes used appears nonsensical.

Sometimes, even just adding the configuration nodes requires additional complexity.

For example… “Variable to Table Row” followed by “Column Rename (Regex)” followed by “Column Selection Configuration” from the outside might have people scratching their heads as to their overall purpose, but something like that is a pattern I use just to allow a user to choose a Path Variable in the component’s configuration.

Unless I annotate the nodes or the component, somebody who has never done that might be wondering what those nodes are doing, and can only find out by working the individual steps, and trying to imagine what the component developer was thinking!

Other times the purpose of a sequence of nodes is even less clear, even when worked through, as they simply protect against an eventuality that may not occur when executed: why put a Column Splitter before a String Manipulation (multi-column) node and then put a Column Appender afterwards just to bring all those columns back together again?

That might only make sense if you’ve personally experienced what happens if the said String Manipulation (multi) node is passed anything other than String, Integer or Double columns, and you want to protect against column types that weren’t present when the component was written.
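The split, transform, append-back pattern can be sketched outside KNIME too. This is a rough Python illustration only: the function name, the sample rows, and the use of plain dictionaries as rows are all assumptions, and it does not reproduce the actual behaviour of the KNIME nodes.

```python
from datetime import date

# Assumption: these stand in for the String, Integer and Double column types.
SUPPORTED = (str, int, float)


def transform_supported_columns(rows, func, supported=SUPPORTED):
    """Apply `func` only to columns whose values all have a supported type."""
    if not rows:
        return []
    columns = list(rows[0])
    # Split: a column is "safe" only if every value passes the isinstance check.
    safe = [c for c in columns if all(isinstance(r[c], supported) for r in rows)]
    other = [c for c in columns if c not in safe]
    result = []
    for r in rows:
        transformed = {c: func(r[c]) for c in safe}    # the transformation step
        passthrough = {c: r[c] for c in other}         # columns left untouched
        result.append({**transformed, **passthrough})  # bring them back together
    return result


# Hypothetical input with a date column the transformation was never written for.
rows = [
    {"name": " Ana ", "qty": 3, "when": date(2024, 3, 24)},
    {"name": "Bob  ", "qty": 5, "when": date(2024, 3, 25)},
]
cleaned = transform_supported_columns(rows, lambda v: str(v).strip())
```

The date column never reaches the transformation, so a column type that did not exist when the logic was written cannot break it. From the outside the extra split and append look pointless, which is exactly the problem described above.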

When I’m trying to work through a workflow, one of the things that I would prefer KNIME did differently is being able to access the configuration of many of the nodes without first executing the workflow. Some nodes let you see inside regardless. Others point blank refuse unless they have data. I would very much like to always be able to open a node and see the last available config for it, even if it cannot be changed. Even if there were just an option on every node that displayed the value for each setting in text form, it would be better than it is now.

It isn’t always practical or even possible to actually execute a workflow. Take a workflow that updates a database. I don’t want to execute it just to check that something is configured correctly!

Another example is that I have some workflows I wrote a year or so ago that access a Web API. Now I’d like to check how I did some things because I want to reuse some of the ideas on a new project I’m working on. The trouble is the old web API is no longer available. So I cannot now run the old workflow, and those ideas I had then are now unavailable to me. I cannot now work out what parts of my old workflow did, or more to the point how I configured it! That was a painful discovery!
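One possible workaround is to read the saved settings straight from the workflow folder. The sketch below is in Python and rests on an assumption: that each node directory holds a `settings.xml` made of nested `config` elements and `entry` elements with `key` and `value` attributes. The sample XML is hand-written for illustration, not copied from a real workflow, and the real layout may differ between KNIME versions.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the element and attribute names are an assumption
# about the file layout, and the URL is a placeholder.
SAMPLE = """<config xmlns="http://www.knime.org/2008/09/XMLConfig" key="settings.xml">
  <entry key="name" type="xstring" value="Example Node"/>
  <config key="model">
    <entry key="url" type="xstring" value="https://example.invalid/api"/>
    <entry key="timeout" type="xint" value="30"/>
  </config>
</config>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten(element, prefix=""):
    """Yield (path, value) pairs for every entry below `element`."""
    for child in element:
        key = child.get("key", "")
        path = f"{prefix}/{key}" if prefix else key
        if local_name(child.tag) == "config":
            yield from flatten(child, path)  # descend into nested settings
        elif local_name(child.tag) == "entry":
            yield path, child.get("value")


settings = dict(flatten(ET.fromstring(SAMPLE)))
for path, value in settings.items():
    print(f"{path} = {value}")
```

For a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`. Nothing is executed, so this gives a read-only text view of how a node was configured even when the original data source is gone.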

So yes, KNIME is an exceptional tool, but sometimes it is hard even with your own workflows, and especially with working out other people’s! Often it is not so much the understanding of what the individual nodes are doing, but understanding what was going on in the mind of the person who put those nodes in that particular sequence… it is the understanding of the recipe as a whole that can elude all of us especially if there is no annotation (and I am certainly guilty of often not putting sufficient annotation inside my own components!)

Felipereis50 · March 24, 2024, 6:47pm

Hello takbb,

Everything you said is absolutely true. When I analyze a component, there are so many nodes within it that for some reason make no sense at all. However, the creator had to test and test and test, perhaps, to find all possible errors, and then made several updates so that the final result was “perfect.” That’s why I think studying Knime based on components isn’t the best way. It will only confuse you more.

But then again, the purpose of the component is to streamline your work, and it doesn’t matter how it was created; what matters is that it works and the mission is to save you time. On the other hand, there’s the feeling that you’re not learning anything by using a component, and in my case, I feel like I need to understand all the steps to one day be able to adjust my own project.

Moreover, perhaps many of these difficulties in understanding some flows or components stem from the lack of annotations. Annotations are essential. The problem is that making annotations “tires” you out, disrupting the flow of creation, and we always think: when I finish the flow, I’ll make the annotations. But when you finish, you feel too lazy to annotate all the points, so you need to annotate right away. Without detailed annotations, a year from now it won’t be the flow itself that helps you understand; it will be the annotations that you took the time to write. Sometimes annotations can be extensive, with the sole purpose of explaining just one concept behind the small strategy of selecting that node.

In your sentence: Another example is that I have some workflows I wrote a year or so ago that access a Web API. Now I’d like to check how I did some things because I want to reuse some of the ideas on a new project I’m working on. The trouble is the old web API is no longer available. So I cannot now run the old workflow and those ideas I had then are now unavailable to me. I cannot now work out what parts of my old workflow did, or more to the point how I configured it! That was a painful discovery!

… I understand your frustration exactly, and that’s why it will always be important to save a part of the original database to use later. In your case, it was a web link, a bit trickier to save. But these are some details we forget.

I really liked your story.

What I feel, though, is that I’m starting to enter the peak of difficulty while still being at the beginning, so there is still room for more “simple” learning.

The important thing is that we have people like you to help us, and I know you are one of the most influential people in the community. I continue in the phase of asking for help. Later on, I may be able to help others too.

A big hug. Brazil

takbb · March 24, 2024, 7:47pm

@Felipereis50, nice of you to say, and a hug back from the UK! Although I don’t think I’m influential lol, but hopefully half of what I write is helpful… Although I know I am guilty of writing too much.

Think on this though… In February 2021, just over three years ago, I had not even heard of KNIME.

A number of the regulars who provide the majority of solutions have been members of the forum for even less time than I have so it shows that you don’t have to have been here for many years to make a difference!

In March 2021 I started answering questions on the forum. I had just downloaded KNIME and this was my way of teaching myself how it worked: I researched other people’s new questions by finding old posts on the forum and using those to work out solutions for these questions.

In the process I learned by reading, modifying, and doing. I didn’t always come up with the best solutions (I still don’t, btw) and others sometimes pulled a node out that I’d never come across or they just had a more efficient way of doing something. But I learned each time.

I’ve learned about a lot of new nodes or techniques from great people like @ipazin and @mlauber71, and from reading their posts. Maybe (hopefully) in some cases I’ve managed to return the favour by looking at something from a different angle. The fact is though we’re all continually learning, and every one of the regulars here who answer the majority of questions on the forum has clear specialities. We all bring different skills and nobody knows everything. It’s a great collaboration.

There are a huge number of nodes about which I know nothing… my primary skillset is in data transformations and databases, with some Java for good measure. But ask me a question about actual data science, machine learning and predictive modeling and I’ll probably give you a blank stare.

I keep reading and keep learning. I think I learn more by reading forum solutions than I do by reading the actual workflows, although sometimes I do download a solution to see how it works.

Discovering and remembering that something is achievable is key for me. Six months after reading a post, somebody will ask a question and I’ll think… I’m sure somebody had an answer for something like that… and I’ll go searching, and then I’ll see if that old solution can be adapted. That’s how I learn…

iCFO · March 24, 2024, 10:01pm

This is kind of the double-edged sword of having a blank slate, open approach to problem solving. There are so many ways to solve problems and individual users come from so many different skill sets and disciplines. Each of them leans into their strengths when building workflows and components, so the resulting workflow methodology varies wildly.

Components require dynamic approaches everywhere and trial and error adjustments, which dramatically complicates outside review. It is more like system design than workflow building. They will always be messy to internalize.

One thing that I have started doing in my workflows and components is making a dummy data hard copy of basic data inputs and placing it in the workflow unconnected. That way the workflow can be manually connected to the sample data later, and the solving approach can be reviewed or leveraged in the future.

The auto annotate node approach is also promising for simplifying some basic annotations with no effort.

mlauber71 · March 25, 2024, 5:47pm

@Felipereis50 thank you for your thoughts on KNIME and Components.

I think that, like all more complex code, components can be difficult to understand. But that would be true for every complex coding enterprise that is not well documented, and even if it is, it can be challenging. KNIME nodes add an additional layer, since they have functions and settings of their own (with underlying code), and then there is the structure of the component itself.

But there is also good news. Through the workflow structure there is always a clear path to follow, whereas pure code might reference parts that are somewhere else, and you would then have to figure that out.

Complex coding projects are seldom meant to be understood by themselves; you will always depend on documentation. KNIME has the benefit that you can have the documentation (necessarily) step by step, and you can also organise it in Metanodes/Sub-Components and (colourful) annotations.

But in the end you always depend on the skill and care of the programmer, and on whether all options and edge cases have been taken into account. Which is why there is the concept in KNIME of:

https://www.knime.com/verified-components

… where you can be sure that the KNIME team has tested them. With other components there always is the question of quality and maintenance - but that is true for all open source projects. At least it is transparent and you can check what is going on. If you release a component you should make sure other people might be able to understand it. I try to add some links and text to the ones I publish. Also I try to give ‘complete’ examples including sample data so a workflow from my repository can run instantly.

@takbb and I had the honour of having the most downloads of examples from the hub in the last year - so maybe we are doing something right after all. But if someone suggests improvements we will surely listen carefully.

takbb · March 26, 2024, 7:00am

Thanks @mlauber71 for sharing Rosaria’s article. That one had somehow passed me by. I had no idea! I am indeed honoured. Note to self… Improve my annotations, lol! (And congratulations on the workflow downloads, btw)

Felipereis50 · March 26, 2024, 9:40am

Congratulations to both of you and to all contributors.
