We’re looking at buying some higher end workstations to support our KNIME developers.
Sadly a server is not in the cards for the time being.
I’m trying to get a sense of relative priority for certain features, and make sure that we don’t purchase higher specifications than KNIME can make use of in a Windows installation.
We will be stuck with this solution for at least 4 years, so we're trying to be forward-looking.
Key Questions re: Desktop Analytics Platform on Windows:
Is there a maximum number of Cores KNIME Desktop Analytics can take advantage of?
Is there a maximum number of Threads KNIME can take advantage of?
Is any guidance available on Core Frequency vs. Core Count?
– I ask because we're vendor/model locked (sigh), so this matters for our Xeon considerations, as there is quite a price difference between a slower 12C and a faster 12C (for example).
Is there any guidance on specific processor instruction sets? I.e. does KNIME benefit from AVX 512 or other specialized instructions?
Is there any specific guidance on maximum RAM that can be taken advantage of? 128 or 256 or greater?
Any specific guidance storage setup / performance requirements - intending to go full NVME for local work.
I will say I’m pretty knowledgeable about hardware, and you raise very good questions. The answers, however, depend on the workload, or the mix of workloads.
Cores / Threads and Frequency
Depends on what nodes and extensions you use. Many nodes (algorithms) are simply single-threaded, so more cores won’t help you, and in fact server CPUs can be a bad choice as they tend to clock a lot lower.
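To make the single-threaded point concrete, here is a generic Amdahl's-law sketch (an illustration, not anything KNIME-specific): if a meaningful fraction of a workflow runs serially, piling on cores gives rapidly diminishing returns, which is why a higher-clocked part with fewer cores can win.

```python
# Amdahl's law: overall speedup from n cores when only a fraction p
# of the workload can run in parallel.
def speedup(parallel_fraction: float, cores: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# If only half of a workflow's nodes parallelize, 64 cores barely
# beat 12, so a faster-clocked 12C part is often the better buy.
for n in (12, 64):
    print(n, round(speedup(0.5, n), 2))  # prints: 12 1.85 / 64 1.97
```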
Note that certain Windows versions limit the number of cores, and with AMD Threadripper CPUs you can definitely run into these limits. That said, Threadripper is what I would recommend: AMD Threadripper (up to 64 cores, 128 threads).
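A quick way to sanity-check what the OS actually exposes (and therefore what the JVM underneath KNIME can schedule on) is to query the logical processor count. This is a generic Python check, not a KNIME API, and on Windows systems with more than 64 logical processors the processor-group behavior of individual applications can still differ from what is reported here:

```python
import os

# Number of logical processors the OS exposes to applications.
# If a Windows edition's core limits apply, this can be lower than
# the CPU's advertised thread count (e.g. 128 on a 64C Threadripper).
logical = os.cpu_count()
print(f"Logical processors visible to the OS: {logical}")
```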
The number of cores depends on usage as well. I’m not sure if you plan to give each developer a separate workstation or a single machine with shared access? In the latter case you will definitely need more cores and especially more RAM (how many people can actually use a Windows machine at the same time?)
Unless you have very specific software in mind that you can use from KNIME, that you use in most workflows, and that gives huge speed-ups, forget about AVX-512. You may have to check prices again, but about a year ago you got roughly double the number of cores buying AMD vs. Intel. Meaning in almost all cases the AMD part will be a lot faster overall, and not that far behind even in AVX-512 software.
(If you are using KNIME on Windows I doubt you will compile third-party or your own software on the machine to target specific instructions. And even for binaries like Python packages you need to be careful about where they come from, e.g. conda and specifically the anaconda channel; otherwise they are usually not built against Intel MKL and not AVX-512 enabled.)
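If you want to check what a given Python environment was actually built against, NumPy's own `show_config()` reports the BLAS/LAPACK backend (MKL vs. OpenBLAS). A defensive sketch, assuming NumPy may or may not be installed in the environment you point KNIME's Python integration at:

```python
import io
from contextlib import redirect_stdout

def blas_backend_report() -> str:
    """Return NumPy's build-config report, or a note if NumPy is absent."""
    try:
        import numpy as np
    except ImportError:
        return "numpy not installed"
    buf = io.StringIO()
    with redirect_stdout(buf):
        np.show_config()  # prints the blas/lapack build information
    return buf.getvalue()

report = blas_backend_report()
print("MKL build" if "mkl" in report.lower() else "non-MKL (or no) NumPy")
```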
RAM depends on your workload (amount of data) and how many of your KNIME developers will be using the machine at the same time. If you work with millions of rows and have 5 people connected, then even 256 GB might be on the low side (note: also check the maximum supported by the specific CPU/mainboard). If it is tens of thousands of rows and 1 or 2 people at the same time, 64 GB would be enough (or even less, really).
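For rough sizing, a back-of-the-envelope estimate of a table's in-memory footprint is useful (the column count and 8 bytes/cell here are illustrative assumptions; real usage varies a lot by column type, and KNIME can also cache tables to disk):

```python
def table_footprint_gb(rows: int, cols: int, bytes_per_cell: int = 8) -> float:
    """Rough in-memory size of a table, ignoring overhead and compression."""
    return rows * cols * bytes_per_cell / 1024**3

# 100 million rows x 50 numeric columns: ~37 GB before any
# intermediate copies, and that is per concurrent user.
print(round(table_footprint_gb(100_000_000, 50), 1))  # -> 37.3
```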
For storage I agree, NVMe all the way. But don’t cheap out on the drives; take the more expensive versions. There are cheap NVMe drives with QLC or TLC cells and no DRAM buffer, and they can be slower than a SATA SSD. I think SSD performance is very important for KNIME, especially for the “data cleaning” steps.
Addressable RAM per core will be more a question of hardware limitations and which version of Windows you have. Why? Because workstation hardware is cheaper than server hardware but has certain artificial limits imposed, like how much RAM it can support. The same goes for Windows versions.
I’m strictly in the “PC ecosystem” so I can’t comment too much on Apple, especially not the software side; I only know what I have read, not from actual usage myself.
M1, and I would assume M2 as well, is superior in terms of efficiency compared to x86. I don’t know the latest status, but there were definitely compatibility issues with software, most notably Python libraries. In essence you were still better off (or forced) to use the x86 version via Rosetta, with a corresponding performance reduction. I would call these typical early-adopter issues. I personally would not want to deal with such issues at work.
For KNIME/Java I don’t really know if M1/M2 have their own Java VM implementation, use a generic ARM one, or run x86 + Rosetta. The SSD performance is very good, which will certainly help with KNIME.
Apple, with their more and more closed-down ecosystem, is simply a no-go for me. We all complained about MS, but even on their worst day Windows was, on some level, pretty open: you could install whatever you wanted from wherever you wanted. With Apple you also have less choice, at a higher price.
The only aspect I would add to this is cooling and power. If you are going to run your computers hard, then you may want to consider how you are going to power and cool everything to prevent thermal throttling of components. I’ve added a large power supply and water cooling to my workstation and it keeps things stable and quiet; there is nothing worse than a noisy computer.
In addition you may also want to look at networking: you can store data locally on the file system, but it is better to have network storage and databases. In that case you may want low-latency GbE with a chipset that offloads processing from the CPU.
You also need to think about GPU - are you going multi-monitor/4K? With the way that KNIME is going with Python nodes, it may be worth considering CUDA support if you want to benefit from GPU accelerated algorithms implemented in Python, but integrated into KNIME.
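If you go down the CUDA route, a defensive check like the following tells you whether a Python environment can actually see a GPU. PyTorch is just one example framework here (other libraries have equivalent checks), and it may well not be installed:

```python
def cuda_available() -> bool:
    """True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
        return torch.cuda.is_available()
    except ImportError:
        return False

print("CUDA GPU usable:", cuda_available())
```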
In reality, the specification of the machine is as diverse as the workload. Different workloads drive the hardware in different ways. Different algorithms have different impacts on the hardware - some algorithms can work on chunks of data at a time, some are single thread only, others can distribute their workload around all the cores and use memory efficiently. Sometimes it is better to perform operations directly on the database, other times on the local machine.
My apologies if this is a bit vague in answering the question directly, but it really depends upon what you want to do.
Have you considered using Amazon Web Service or Azure to rent compute and storage capacity for a couple of months to determine what your needs are? It is something we have done previously, where we rented compute capacity accessed using remote desktop and fine tuned the specification as we went. It will save you having to commit to a capital purchase before you know what you need.
The reminder about most nodes being single-threaded is really helpful.
That helps me focus on higher-frequency, lower-core-count options at the same price point.
Our organization is vendor locked so I’m stuck with Intel.
I may sadly even be Workstation model locked, but am trying to fight that.
I’m stuck with whatever we buy for at least 4 years.
I agree current uses are limited, but the recent AMD investor day announcements indicated AMD is also adopting AVX-512 in their next generation, so I would like to ensure I’m a bit future-proofed as adoption increases.
This is my workstation, and likely same specs for any subsequent staff.
We’re already at hundreds of millions of rows.
I just wanted to make sure if I went for 128/256 that KNIME can actually use it.
Looking forward to that dev answer.
Agreed, it will be enterprise grade.
Will of course use appropriate Windows license, but it’s a good reminder for others who may not be aware of the issue.
re: Cooling and Power
Totally agree and this is an excellent reminder, but sadly out of my hands.
I’m vendor locked so I have to use a pre-built workstation.
I have worked in a previous environment where they underspec’d the PSU and cooling, and their Threadripper would randomly shut off during training, but their IT was convinced they knew what they were doing. Sigh.
Agreed, pretty sure I’m locked to 1 GbE, but I will see if I can go 2.5 or 10.
That’ll unfortunately be outside the budget of this workstation, but may be possible in a subsequent year. I’m spec’ing to ensure the PSU is sufficient.
re: Workload and Server Side vs. Local Processing
This is another good flag.
We server side process where we can, but frankly the organizational capabilities are a bit limited in this space at the moment.
Most things are sadly going to end up processed locally beyond simple aggregations and counts.
No worries, it is an extremely broad use case.
re: AWS / Azure / GCP
This is another good suggestion, one I also thought of, but I’m sadly organizationally and jurisdictionally prevented from pursuing it right now. My industry also has extra sensitivities in that space.
Setting aside organizational privacy/security requirements, none of our data is in the cloud. So even being able to perform the analysis and data integration I’d want would be a significant challenge architecturally.
I’m hoping this will change in the future.
At that point I will probably actually push for a KNIME Server to be stood up in the cloud alongside our data, but we’re just not there… yet.
Even with Intel I’d argue that I would actually prefer the workstation (HEDT) cores, as they tend to clock higher than server parts. On the other hand, Intel hasn’t updated that line for quite a while now. I’m not sure if they are even still available, or if you are locked into Xeons. The Core i9-10980XE would be an option if still available (it clocks much higher than newer Xeon parts). If you need to go with a Xeon, it will likely need to be a Xeon Gold if you want to reach 4 GHz or higher (and that will likely cost about 4x as much as the i9-10980XE).
Regarding RAM, the blog post below might be interesting:
In essence, yes, you should be able to use all that RAM, but it will likely require adjusting the defaults.
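For reference, the JVM heap KNIME is allowed to use is set by the `-Xmx` line in `knime.ini` in the installation directory, and the out-of-the-box value is modest. On a 256 GB machine you might raise it to something like the value below (leaving headroom for the OS and any Python processes; the exact number is just an illustrative choice):

```
-Xmx200g
```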