Why Do Experts Favor Python/RStudio Over KNIME/<insert any other visual ETL tool here> for ETL and Data Preprocessing?

I personally prefer KNIME for all my ETL and data preprocessing—it’s visual, intuitive, and gets the job done. But I’ve noticed that most experts seem to default to Python or RStudio for these tasks, even for things KNIME can handle.

Why do you think Python/RStudio is so popular among experts for ETL and transformations?
Are there specific limitations in KNIME that make scripting tools a better choice for advanced use cases?
For those who use both, do you still lean on Python/R for certain tasks, even if KNIME can technically do them?
Any tips on when to stick with KNIME vs. switching to scripting?

I’m trying to understand if I’m missing something or if it’s just a matter of preference.

Thanks!

@aryaman_sharma this is a good question and I have attached two threads (with additional articles) and a link to a presentation by the KNIME CEO about the ‘philosophy’ behind KNIME.

The basic question is: how does one want to structure their data analytics processes, and who should be involved? If you are Google and have 10k highly trained people on standby to develop your tasks in every programming language available, then maybe you will not use KNIME. But if you are a mid-size company with limited resources and want all your departments to have easy access to such tools, and maybe even want them to work together and share ideas and solutions, then KNIME is for you.

The same applies if you are a large company with a broad community and want all your people to speak the same language and work on a common platform: then KNIME is also for you.

With people who are heavily into coding, I sometimes have the impression they do not want to use KNIME because it might make their work look too simple. Instead of pages and pages of (hopefully) ‘magic’ code, you have a workflow that more or less documents itself, that you can comment, and that other people can understand. Or it looks really AI-heavy if you spend hours managing Python dependencies and YAML configurations instead of just loading an extension. And sometimes the management level is not fully aware of these challenges and just wants something with code.

In this regard KNIME sometimes is either stuck in the middle or in the perfect place - depending on how you see it.

With the advance of LLMs, coding skills might spread and people might be less inclined to use a low-code tool. I just hope they will still be happy once they have to maintain the code.

I personally like to mix KNIME, for ease of workflow, automation and access, with R and Python for some very specific tasks.

If it is just for you: use what you like. If it is for a company: you should have a strategy for how to really include all stakeholders, give them the right tool, and make sure they can all use it, so as not to lose the information, collaboration and expertise that are key to the success of data analytics, machine learning and the use of AI.



“The greatest data science books ever” by KNIME CEO Michael Berthold

(starting 07:00) “One hippo, all alone, calls two hippos on the phone.”


Yeah, you will see a lot of complaints about low-code tools, e.g. in the data engineering and data science subreddits. If you’re familiar with KNIME, it quickly becomes apparent that most of those posters have a very limited understanding of the tool or just repeat what they’ve heard (after a colleague dabbled in KNIME for an hour). In my view the best argument against KNIME is its limited adoption at enterprise scale (although thankfully that’s changing fast!), especially for advanced use cases. I can understand why data professionals want to learn the tools used at the companies they apply to, where Python/SQL/R plus cloud platforms undoubtedly dominate.


I’m using both KNIME and Python. And surprisingly both have their pros/cons.

In my case I use KNIME mostly to grab my data from all the different sources, prepare tables and do some math. Quite often I stay in KNIME to do some plots with built-in or Python nodes. If the results should be shared with a broader audience, I use Tableau for dashboards. The combination of KNIME/KNIME Server/Tableau Server is just perfect :slight_smile:
If the data gets large and I have to use iterative methods like row-by-row calculations or other numerical stuff, I switch to Python. Mostly I start with a Python Script node and end up with a full Python script, which is then used locally or on Databricks.
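As an illustration of what a row-by-row calculation can look like, here is a minimal sketch. The example (a running level clamped between zero and a capacity) and all names in it are hypothetical and not taken from an actual workflow. The point is only that each row needs the result of the previous row, which is awkward to express as a plain column operation.

```python
def clamped_running_level(inflows, capacity, start=0.0):
    """Running level where each row depends on the previous row's result.

    Hypothetical example: a tank level that cannot drop below 0
    or rise above `capacity`.
    """
    levels = []
    level = start
    for flow in inflows:
        # The new level needs the previous level, so rows cannot be
        # processed independently of each other.
        level = min(max(level + flow, 0.0), capacity)
        levels.append(level)
    return levels


levels = clamped_running_level([5.0, 4.0, -12.0, 3.0], capacity=8.0)
print(levels)
```

A plain cumulative sum would not give the same result here, because the clamping at each step changes what the following rows start from.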

When I look around me, many of my colleagues work in Python only, as that is what they are taught when they start their “data scientist” story. But the coding is quite basic and takes a while. In my opinion they would be faster in KNIME today. BUT as LLMs like ChatGPT and these amazing IDEs like Windsurf or Cursor become more widely used, it gets better and better.


Hi. Most of my KNIME work is ETL into SQL Server, either from files or from different databases. I set up my ETL to limit data transfers between KNIME Business Hub and SQL Server to just the necessary ones:

  • If I can do all the transformations with the source data only, then the whole process is in KNIME and a nice and pretty data set lands in the final table.
  • If my ETL needs additional dictionaries, joins, updates of large fact tables etc., then I split responsibility between KNIME and a SQL stored procedure. KNIME does the initial cleansing, loads the data to an L0 table and triggers the stored procedure, which finishes the job and populates the final table safely.
    Why? Mostly because I found out this method is fast - in my environment.
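The split described in the second bullet can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` with an in-memory database instead of SQL Server, SQLite has no stored procedures, so `finish_load` is a plain function standing in for one, and all table and column names (`l0_sales`, `dim_store`, `sales_final`) are made up.

```python
import sqlite3


def cleanse(rows):
    """Initial cleansing: trim keys, drop rows with no key or a non-numeric amount."""
    cleaned = []
    for key, amount in rows:
        key = (key or "").strip()
        try:
            value = float(amount)
        except (TypeError, ValueError):
            continue  # skip rows whose amount cannot be parsed
        if key:
            cleaned.append((key, value))
    return cleaned


def finish_load(conn):
    """Stand-in for the stored procedure: join L0 with a dictionary table
    and repopulate the final table inside a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales_final")
        conn.execute(
            """
            INSERT INTO sales_final (region, total)
            SELECT d.region, SUM(s.amount)
            FROM l0_sales AS s
            JOIN dim_store AS d ON d.store_code = s.store_code
            GROUP BY d.region
            """
        )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE l0_sales (store_code TEXT, amount REAL);
    CREATE TABLE dim_store (store_code TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE sales_final (region TEXT, total REAL);
    INSERT INTO dim_store VALUES ('A1', 'North'), ('B2', 'South');
    """
)

# Hypothetical raw input with some dirty rows.
raw = [(" A1 ", "10.5"), ("B2", "4"), (None, "3"), ("A1", "n/a"), ("A1", "2")]

# Step 1 (the KNIME side in the post): cleanse and load the staging table.
with conn:
    conn.executemany("INSERT INTO l0_sales VALUES (?, ?)", cleanse(raw))

# Step 2 (the stored-procedure side): finish the job in the database.
finish_load(conn)

result = conn.execute(
    "SELECT region, total FROM sales_final ORDER BY region"
).fetchall()
print(result)
```

Only the cleansed rows cross over into the database; the join against the dictionary and the aggregation happen where the data already lives, which is the transfer-limiting idea from the post.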

Now, on other ETL apps. Before KNIME I used Alteryx to do the same, but I switched with no regrets. I prefer KNIME nodes: each seems to do less than its Alteryx counterpart, but this allows for greater flexibility. But mostly I prefer KNIME Business Hub over Alteryx Gallery.


Thanks for this helpful reply @ActionAndi . Are you able to say a bit more about your iterative row-by-row methods that you switch to Python for? How does Python help in those scenarios? I’m trying to get a feel for what people might use it for. Thanks

I think it’s mostly just prejudice against low/no-code.

Hi,

As an engineer, one of my primary responsibilities is working with measurement data, often in the form of time series. These time series can be quite noisy, requiring filtering or additional mathematical processing. While these tasks can be accomplished in KNIME, I prefer using Python due to its numerous convenient and efficient methods.
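To give a feel for this kind of filtering, here is a minimal sketch of a centered rolling median, which suppresses isolated spikes in a noisy series. The choice of filter, the function name `rolling_median` and the sample values are assumptions for illustration only. It uses just the standard library; in practice one would more likely reach for something like pandas' rolling windows or SciPy's signal-processing functions.

```python
from statistics import median


def rolling_median(values, window=3):
    """Centered rolling median; the window shrinks at both edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [
        # Take the neighbours on both sides of position i (fewer at the edges).
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Hypothetical measurement series with one spike at index 2.
noisy = [1.0, 1.1, 9.0, 1.2, 1.3, 1.2]
smoothed = rolling_median(noisy, window=3)
print(smoothed)
```

A median is used here rather than a mean because a single outlier inside the window does not pull the result towards it.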
