Impact of different OS on Knime workflows

Hello,

I am considering changing OS from Windows to Linux but would like to understand the impact on my workflows first.

Can a workflow created with Knime Analytics Platform on a Windows machine be used with Knime Analytics Platform installed on a Linux machine?
Are there any limitations at the distribution, platform, or node level?

Thanks

Nicolas

In my experience it is no problem. I can run workflows created on my Windows machine on my Ubuntu machine, and the other way around. I haven’t encountered any problems so far.
If you run Python nodes, be sure you have the same Python packages on both machines.
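
One way to check this is to dump the installed package versions on each machine and compare the two dumps. The following is only a minimal sketch using the standard library; the helper names `installed_packages` and `diff_packages` are made up for illustration, and it assumes you run it with the same Python environment that the Knime Python nodes are configured to use.

```python
import json
from importlib import metadata


def installed_packages():
    """Return {package_name: version} for the current Python environment."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that report no name
            result[name.lower()] = dist.version
    return result


def diff_packages(a, b):
    """Return {name: (version_in_a, version_in_b)} for every mismatch.

    A version of None means the package is missing on that side.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    # Run this on each machine, save the output to a file,
    # then load both files and pass them to diff_packages().
    print(json.dumps(installed_packages(), indent=2, sort_keys=True))
```

An empty result from `diff_packages` means both environments report the same packages and versions.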

gr. Hans

Hi there @nba ,

I was using Windows for development while production was on Linux and everything was working fine as long as I was careful with paths :)

Br,
Ivan

Hans, ipazin,

Thanks. Nicolas

I use Windows for workflow development and deploy to Knime Server running on Linux. As @ipazin mentioned, you need to be careful with paths (both the actual path, i.e. C:\foo\bah versus /foo/bah, and particularly the fact that the Linux filesystem is case-sensitive whereas Windows isn’t), but other than that, everything pretty much just works.
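
To illustrate the two path pitfalls, here is a small Python sketch (for example for a Python script node or a pre-deployment check). The function names `to_posix` and `find_case_mismatch` are hypothetical helpers, not part of Knime. The first converts Windows separators to forward slashes; the second looks for a file whose name matches only when case is ignored, which works on Windows but fails on Linux.

```python
from pathlib import Path, PureWindowsPath


def to_posix(windows_path):
    """Rewrite a Windows-style path with forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


def find_case_mismatch(directory, filename):
    """Return the real file name in `directory` that equals `filename`
    only when case is ignored. Return None if the exact name exists
    or if nothing similar is found."""
    names = [p.name for p in Path(directory).iterdir()]
    if filename in names:
        return None  # exact match, safe on a case-sensitive filesystem
    for name in names:
        if name.lower() == filename.lower():
            return name  # would open on Windows, but not on Linux
    return None
```

Note that `to_posix` only changes separators; a drive letter such as `C:` still has to be replaced by a location that exists on the Linux side, so relative paths are usually the safer choice.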

Steve

Hi @nba,

For the most part I agree with the previous comments. However, having worked on Windows, Mac OS X and Amazon AWS WorkSpace for several years now, I do perceive a difference. Whilst performance is mainly driven by the machine specs, the main differences I noticed are related to the Knime UI. Please be aware that this depends heavily on each individual’s perception.

Windows establishes the most comfortable baseline, followed by OS X (which is based on BSD) and then the Linux AWS WorkSpace. In past years Knime crashed or hung more frequently on OS X and Linux than on Windows. The auto-save feature proved to cause most of the issues when Knime was processing vast amounts of data. I ended up deactivating it years ago in favor of stability and for the sake of workflows finishing quicker.

The most noticeable UI / UX difference shows up when connecting nodes. The AWS WorkSpace pretty much sucks here. Point, click, drag and drop of connections and drag & drop replacement of nodes are really not as enjoyable as on Windows or OS X. Very likely this is due to latency added by session streaming.

Overall, I got used to all three environments and can conclude that each has its small pros and cons, but they perform very comparably.

In terms of hardware configuration, Windows offers a very important benefit compared to OS X (closed system). OS X, though, is better tailored to get more performance out of the hardware. Not to mention some additional topics which are not the subject of this discussion. Linux offers a vast variety of UIs, so the mentioned usability issues might vanish depending on your choice and config.

If you require a machine for heavy data lifting, hardware configurability is favored, so Windows or Linux. If overall system stability and recovery comfort are crucial, combined with less system maintenance, I can recommend OS X. Again, most often it comes down to personal perception.

Kind regards
Mike

Hi Mike,

Many thanks for your detailed answer.

I am wondering if, as a user of AWS Workspace, you have noticed any gap in performance between Knime on AWS and on a local machine?

Some time ago I posted on the forum (AWS Cloud Performance) about the dismal performance I was experiencing on AWS cloud using fairly large machines (z1x12 large, c5.12xlarge, m4x16 large) compared to my own computer. After a long search, I realized that performance problems on virtual machines configured with more vCPUs than actual cores are well documented, which could be one of the causes of the performance problems. But I am interested in feedback from experienced AWS users.

Thanks, Nicolas

You are welcome. I did notice a difference, but it is negligible. I compared my MacBook Pro (16 GB memory, 1 TB SSD, quad-core i7 at 2.6 GHz) vs. an AWS WorkSpace of the type “Power” (4 vCPU, 16 GiB memory).

The workflow I compared is a little complex, with more than one thousand nodes, and processes files of different sizes, each with 60k to 300k rows and 20 to ca. 1.5k columns. So a vast spectrum.

The “penalty” is around 20 to 30 %. Keep in mind that IOPS and overall throughput are crucial. AWS WorkSpaces use EBS volumes with a significantly lower throughput than an SSD in the machine in front of you.

EBS volumes of the type gp2 have a max. throughput of 250 MB/s per volume. The SSD in my machine is claimed to be at least two times faster.
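
As a rough back-of-the-envelope illustration of what that gap means, the sketch below computes how long purely sequential I/O would take at each rate. The 10 GB data size is a made-up example value, 500 MB/s simply stands for "two times faster", and real workflows mix reads, writes and CPU work, so treat the result as a lower bound on I/O time rather than a prediction.

```python
def transfer_seconds(size_mb, throughput_mb_per_s):
    """Time in seconds to move `size_mb` at a sustained throughput."""
    return size_mb / throughput_mb_per_s


size_mb = 10_000  # hypothetical: 10 GB of intermediate table data

ebs_gp2 = transfer_seconds(size_mb, 250)    # gp2 ceiling quoted above
local_ssd = transfer_seconds(size_mb, 500)  # assumed "two times faster"

print(f"EBS gp2:   {ebs_gp2:.0f} s")
print(f"Local SSD: {local_ssd:.0f} s")
```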

Parallelism, especially when overdoing it, can easily drain IOPS / throughput. A fairly old study (2007), don’t take it as a reference, suggested not running more than 4 threads.

My personal experience tells me that it’s about the balance between the cores available (which determine the parallel threads), the memory required during execution and IOPS / throughput. In 90 % of all situations it’s about proper workflow design. Always challenge yourself whether something can be done differently, or ask “what does this or that do to the system?”.

Hardware-wise I recommend having one external SSD for data storage, and a second fast SSD to save the workspace on so it’s separated from the OS. 16 GB of memory allows you to run even Chrome alongside Knime (I do so on my MBP!).

The C5 instances are nice because you don’t have to wait for CPU credits to build up or do some fancy burst mode calculation to keep expenses in check. The bottleneck, though, is, as pointed out before, the EBS volume. It may be worth scaling down and testing with provisioned IOPS. The number of cores is not that important; eight are plenty.

Update: Recently discovered something strange: Knime core files consuming a lot of space

Hi Mike, Thanks for your feedback.

It is interesting that you observe a “penalty” between local and AWS machines. I agree with you that EBS volumes are notably slower than any newer (M.2 type) SSD disks, which makes EBS volumes the “natural suspect” to explain performance degradation.

However, in my case, I had difficulties explaining the fact that half of the vCPUs were sitting nearly idle. Moreover, the threads sitting idle were always coupled with a thread that was “active”, thus pointing to the fact that vCPUs belonging to the same core might be competing for shared core resources. Have you ever noticed that on your AWS WorkSpace?

This is what led me to the previously mentioned fact that in virtual machines, the number of vCPUs must be chosen carefully with regard to the number of cores. It also led me to the understanding that Knime might natively be optimized for NUMA architectures with multiple CPUs. I have posted on that subject but did not get much feedback.
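
For anyone who wants to check the vCPU-to-core ratio on a Linux instance, here is a minimal sketch that counts logical CPUs and distinct physical cores from `/proc/cpuinfo`. The function name `count_cpus` is made up for illustration. It assumes the x86 layout with `physical id` and `core id` fields (some architectures do not report them, in which case the core count comes out as 0), and inside a VM it can only show the topology the hypervisor chooses to expose.

```python
import os


def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1          # one entry per logical CPU (vCPU)
            physical_id = None    # reset for the new entry
        elif key == "physical id":
            physical_id = value   # socket this logical CPU belongs to
        elif key == "core id":
            # Two logical CPUs with the same (socket, core) pair
            # are hyperthreads sharing one physical core.
            cores.add((physical_id, value))
    return logical, len(cores)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    print(f"logical CPUs: {logical}, physical cores: {physical}")
```

If the first number is twice the second, each core is exposed as two hyperthreads, which would be consistent with the pairs of busy and idle vCPUs described above.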

I have planned some additional tests for the next two weeks to validate these hypotheses. I will keep your suggestions in mind.

Regarding your update, I did experience the same problem but did not look into the matter. My solution was always to increase the volume size.

Nicolas

So far, whenever I checked CPU consumption, all cores were used, regardless of the OS. Did you adjust the “maximum working threads for all nodes” setting?

There are also some tweaks, though unrelated to parallelism, you can make to the knime.ini. Check this article.

At the very least you may use the Parallel Chunk Start / End nodes.

Increasing the volume size comes with higher throughput when using EBS gp2.
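
The reason is that gp2 performance scales with provisioned size. As a rough sketch, and assuming the figures AWS has published for gp2 (3 IOPS per GiB, a floor of 100 IOPS and a ceiling of 16,000 IOPS, which may change and should be checked against the current EBS documentation), the baseline IOPS can be estimated like this. The function name `gp2_baseline_iops` is made up for illustration.

```python
def gp2_baseline_iops(size_gib):
    """Estimate baseline IOPS of a gp2 volume from its size.

    Assumed figures (verify against current AWS docs):
    3 IOPS per GiB, minimum 100, maximum 16,000.
    """
    return max(100, min(3 * size_gib, 16_000))


for size in (20, 100, 500, 1000):
    print(f"{size:>5} GiB -> {gp2_baseline_iops(size):>6} IOPS baseline")
```

Smaller volumes can burst above their baseline for a while, so a short test may look better than sustained workflow execution.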

Happy testing ;)
