AWS Cloud Performance

Hi @nba,

I recently had an exchange in "Impact of different OS on Knime workflows" with another Knimer about this. It might be worth reading through.

As a brief recap: overdoing parallelism creates a lot of bottlenecks. It is less about cores, threads, and memory, and more about IOPS and throughput.

Each Knime node caches its data in the workflow directory, which can easily grow to several dozen GB. Using "Don't save Start / End" nodes might help. You might also configure Knime in the ini-file to better leverage memory and to save data uncompressed.
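To illustrate, a knime.ini fragment along these lines is what I mean (a sketch, assuming a machine with plenty of RAM; `-Dknime.compress.io=false` is the setting I recall for disabling table compression, so double-check it against your Knime version's documentation before relying on it):

```
-Xmx16g
-Dknime.compress.io=false
```

The first line raises the Java heap; the second trades disk space for faster reads and writes of cached node tables.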

Dividing very complex workflows into smaller ones and calling each in combination with a garbage collector, or making use of streaming execution, are other good ways to manage system resources.
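Outside of Knime, the same idea can be sketched in plain Python: process the data in pieces and reclaim memory between them. `process_chunk` is a hypothetical stand-in for what one sub-workflow would do.

```python
import gc

def process_chunk(rows):
    # Hypothetical stand-in for one sub-workflow's work.
    return [r * 2 for r in rows]

def run_in_pieces(data, chunk_size=1000):
    """Run the work chunk by chunk, collecting garbage between pieces."""
    results = []
    for start in range(0, len(data), chunk_size):
        results.extend(process_chunk(data[start:start + chunk_size]))
        gc.collect()  # free memory between pieces, like a garbage-collector node
    return results

print(sum(run_in_pieces(list(range(10)), chunk_size=4)))  # → 90
```

The point is not the doubling, it is that each piece's intermediate data can be released before the next piece starts.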

My personal remark: optimize your workflows. I frequently run through five or more iterations to arrive at new, interesting, and novel approaches. E.g., pivoting, joining, or multi-rule evaluations are computationally very demanding; unpivoting greatly decreased complexity. Always ask yourself, "How can I divide the problem into even smaller pieces?"
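As a small analogy for why unpivoting helps (a pandas sketch, not a Knime node): in a wide, pivoted table, a rule has to inspect every month column; after unpivoting to long form, one simple rule can scan a single value column.

```python
import pandas as pd

# Wide (pivoted) table: one column per month.
wide = pd.DataFrame({
    "product": ["A", "B"],
    "jan": [10, 20],
    "feb": [30, 40],
})

# Unpivot to long form: one row per (product, month) pair.
long = wide.melt(id_vars="product", var_name="month", value_name="sales")

# A single rule now covers all months at once.
flagged = long[long["sales"] > 25]
print(flagged)
```

In Knime terms, this is what the Unpivoting node does before a Rule Engine node runs.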

Cheers
Mike