Hi @rkehrli! This might be one of my favorite threads here for an optimization nerd like myself! I have a few observations after reading the thread:
I can see a few issues with the setup. Throwing more hardware at the problem is definitely not ideal and will only get you so far. I see a few architectural flaws that can be addressed with nearly zero cost (provided you have some flexibility over that in your org).
1/ CPU desktop rendering: you correctly noted the server you’re using does not have an iGPU, so the CPU is used for desktop rendering. The performance will be disastrous, ESPECIALLY on Windows (which I’m guessing you’re running on the guest VMs, since RDWeb is used). I previously used a remote headless Linux VM via xrdp with great success. If it is possible at all to change the OS on the remote server’s guest VMs, I’d opt for Linux with a lightweight desktop environment like Xfce & a lower resolution (e.g. 720p) – KNIME AP worked like a charm for me that way.
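For reference, the Linux + xrdp + Xfce setup is only a handful of commands. This is a sketch assuming a Debian/Ubuntu guest; package names and service management will differ on other distros:

```shell
# Install Xfce and xrdp (assumed Debian/Ubuntu; adjust for your distro)
sudo apt-get update
sudo apt-get install -y xfce4 xfce4-goodies xrdp

# Make xrdp sessions launch Xfce for this user
echo "startxfce4" > ~/.xsession

# Start xrdp now and on boot
sudo systemctl enable --now xrdp
```

Then connect with any RDP client at a modest resolution (e.g. 1280x720) to keep CPU rendering cheap.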
2/ Scheduled workflows & resource optimization: sounds like you need an orchestrator that will manage the concurrency & dependency of your scheduled workflows. You need to be able to prevent resource over-allocation across tasks & have much better observability of your server’s resources, task failures etc. I’d highly recommend running AP workflows via NodePit Batch https://nodepit.com/product/batch & orchestrating them via Cronicle: https://cronicle.net/. This setup is free & incredibly easy to pick up for less technical users (as opposed to something more elaborate like Apache Airflow). Cronicle can even scale to multiple machines, so adding a 2nd, 3rd, 4th server & orchestrating all tasks across them would not be a problem (you can even have it pick the least-loaded server per job & schedule tasks to minimize RAM usage / maximize parallelism & speed etc).
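To make the "orchestrator runs headless workflows" idea concrete: the job an orchestrator like Cronicle actually schedules is just a command line. NodePit Batch ships its own runner, so treat this as a sketch using KNIME's stock batch application instead; the binary path and workflow directory are placeholders:

```python
import shlex

def knime_batch_cmd(knime_bin: str, workflow_dir: str) -> list[str]:
    """Build a headless KNIME batch-mode invocation that an orchestrator
    (Cronicle, cron, etc.) can schedule as a shell job."""
    return [
        knime_bin,
        "-nosplash",
        "-consoleLog",          # log to console so the orchestrator captures it
        "-reset",               # reset the workflow before execution
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
    ]

# Hypothetical paths for illustration only
cmd = knime_batch_cmd("/opt/knime/knime", "/srv/workflows/daily_etl")
print(shlex.join(cmd))
```

Each scheduled workflow becomes one such command, and the orchestrator handles concurrency limits, retries, and per-job logs around it.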
3/ Resource optimization at workflow design level: a few users in the thread pointed out that doing most of the transformations within the DB is the way to go. I understand your users are not SQL pros, but how comfortable are they using the DB nodes? How much of your workflow logic can be pushed to a DB vs KNIME-native nodes? I would really recommend using a state-of-the-art OLAP engine like DuckDB, which integrates very nicely with KNIME via its JDBC driver. DuckDB can also directly read from MySQL/PostgreSQL/SQLite.
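To illustrate the pushdown idea: the whole transformation runs inside the engine, and only the (tiny) result crosses back into the workflow, instead of materializing every intermediate row in AP. A minimal sketch using Python's stdlib sqlite3 so it runs anywhere; the same SQL works unchanged in DuckDB through KNIME's DB nodes (table and column names are made up):

```python
import sqlite3

# In-memory DB standing in for your warehouse; swap in DuckDB in practice
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('EU', 120.0), ('EU', 80.0), ('US', 200.0);
""")

# Aggregation happens in the database engine, not node by node in AP
rows = con.execute("""
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('EU', 200.0), ('US', 200.0)]
```

In KNIME, the equivalent is chaining DB nodes (or a DB Query node with that SQL) and only reading the aggregated result back into the workflow.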
KNIME-native nodes, although super useful for rapid development/experimentation, have some serious drawbacks when running in production (no parallelism, plus eager execution, i.e. materializing every transformation step).
Buying 2x the hardware will cost you a ton & you’ll get at most a 2x performance boost (resource scaling often isn’t linear, so gains would probably be something like 1.5x). By using DuckDB you can achieve 50x speedups for free, with much lower resource usage & much better memory management (aka no crashes). I’d really recommend this approach for anything “greenfield”. If you can additionally afford some time to refactor a few resource-intensive KNIME-node workflows, this will give you an additional boost.
If you already have a lot of workflows built entirely with KNIME nodes, you can achieve some serious performance improvements by moving to Spark with the Spark Workflow Executor, with minimal workflow changes. This will give you both parallelism and lazy evaluation, although it's still likely 5-10x slower than DuckDB.
https://hub.knime.com/knime/extensions/com.knime.features.bigdata.knosp/latest
https://www.youtube.com/watch?v=LTEMNEluHAo In this video made by KNIME you can see how to set it up (around the 1:11:35 mark). This can be used with the Local Big Data Environment, but the AP version would need to be kept below 5.3 (as per my post here: https://forum.knime.com/t/local-big-data-environment-workflow-executor-not-compatible-after-ap-version-5-3/89972/5)
Hope this helps, and let me know what you think about the above!