I need to convince someone that distributed processing is better than using just one executor.
So, I’m wondering what the benefits of Distributed Executors are.
Let me show an example.
Assuming I have a 24-core license for KNIME Server Large, what is the difference in speed for 3 workflow jobs between running them on one Executor with 24 cores and running them on three Executors with 8 cores each?
As far as I know, even a single Executor will run all the jobs at the same time, because it treats the workflow jobs as ‘parallel’.
So I don’t think there is much difference between running multiple jobs on one Executor and running them with distributed processing. If that is the case, do I need distributed processing at all in terms of performance?
I would say the big difference lies more in being able to manage scaling and complex software environments than in raw performance, given the same number of license tokens.
For example, if you have a massive executor in the cloud that you only need periodically, you can start it up when you need to be running a job and shut it down afterwards. This can greatly reduce your hosting costs.
Thank you for your reply.
Your point about reducing hosting costs is a nice benefit in terms of management.
But I need some information about the benefits in terms of performance.
Are there any other differences you can think of?
In addition to what Aaron just said, distributed executors also ensure that resources are distributed evenly across jobs. If you only have a single executor, you could potentially run a very large job that consumes all available resources, leaving nothing for other jobs that you might want to run at the same time. With multiple executors, a job can at most occupy the resources of one executor, while the other executors will still accept new requests.
Roland has a good point there.
It may also be worth considering that while the isolation is (very) nice to have, the total performance of the system will likely be lower in your specific case than with a single larger instance. This is for two reasons:
The extra network trip between KNIME Master node and Executor introduces some small but occasionally noticeable lag.
If the jobs are not balanced in terms of runtime, the executors that finish first can’t contribute compute power to the other remaining jobs, increasing the total runtime compared to a single instance.
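To put rough numbers on the second point, here is a minimal Python sketch. It is a toy model, not anything KNIME itself computes: it assumes each job is perfectly parallel (runtime = work / cores), ignores the network overhead, and uses made-up workloads measured in "core-minutes". The function names are hypothetical.

```python
def makespan_single(work, total_cores):
    """One executor: all jobs share one pool of cores.

    Under the perfect-scaling assumption, cores freed by a finished
    job are immediately reused, so total time = total work / cores.
    """
    return sum(work) / total_cores


def makespan_distributed(work, cores_per_executor):
    """One job per executor: each job is capped at its executor's cores.

    The system is done only when the slowest job finishes.
    """
    return max(w / cores_per_executor for w in work)


# Hypothetical workloads in core-minutes (illustrative values only).
unbalanced = [80, 160, 480]
balanced = [240, 240, 240]

for name, jobs in (("unbalanced", unbalanced), ("balanced", balanced)):
    single = makespan_single(jobs, total_cores=24)
    spread = makespan_distributed(jobs, cores_per_executor=8)
    print(f"{name}: 1 x 24 cores = {single} min, 3 x 8 cores = {spread} min")
```

In this model the balanced jobs take 30 minutes either way, while the unbalanced jobs take 30 minutes on the single 24-core executor but 60 minutes on three 8-core executors, because the two executors that finish early sit idle. Real workflows rarely scale perfectly across cores, so treat the gap as an upper bound on the effect rather than a prediction.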
Hi @Aaron_Hart and @RolandBurger,
Thanks for your opinions.
I didn’t know that, because I haven’t worked with large amounts of data yet.
I’m sure that when the jobs get larger, the Distributed Executors will be able to show their strengths, as you both said.