Using GPGPU for Parallel Execution


I most probably missed something hidden in the extensions, but I'm looking for a way to harness GPGPU for my KNIME workflows. Is there a CUDA or OpenCL support node for parallel execution? Or has anyone tried anything like that before?




For parallel execution I doubt there will be a generic solution, as that would require reimplementing all algorithms in CUDA or OpenCL. There might be enthusiasts (or professionals/students) who have implemented some specific algorithms. May I ask which algorithm(s) you would like to use?

Cheers, gabor

I did not have particular algorithms in mind when posing this question. Having used the CPU parallel execution nodes, I encountered their shortfalls but also learned to love the speed. In my simple world, I could easily swap the CPU parallel execution start and end nodes for a GPGPU version that uses the GPU, while of course still having to think about which parts of my workflow make sense for massive parallelization at all.

We did quite a few experiments with GPGPU about two years ago and the results were disappointing. The problem (back then) was that from Java you first need to copy (and convert) the data structures into the C world and then copy them onto the GPU. And back again. This caused so much overhead that it wasn't faster at all. Maybe this has changed by now with newer libraries, but then again normal CPUs are also getting more and more cores, and it's much easier and more efficient to use those.

What kind of experiments were run back then? GPGPU computing is useful for tasks where there is a lot of iterative processing on the data, such that the processing time on CPUs would far exceed the data conversion and transfer time.
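To make that trade-off concrete, here is a minimal break-even sketch. It is only a back-of-the-envelope model, not a measurement: the function names, the `kernel_speedup` factor, the bandwidth figure, and the `fixed_overhead_s` term are all hypothetical assumptions for illustration. It simply compares the CPU time against GPU time plus the cost of copying the data to the device and back.

```python
def gpu_total_seconds(n_bytes, bandwidth_bytes_per_s, cpu_seconds,
                      kernel_speedup, fixed_overhead_s=0.0):
    """Estimated wall time when offloading to a GPU (hypothetical model).

    n_bytes               -- size of the data that has to be moved
    bandwidth_bytes_per_s -- assumed effective host<->device bandwidth,
                             including any Java-to-C conversion cost
    cpu_seconds           -- time the same work takes on the CPU
    kernel_speedup        -- assumed speedup of the pure computation on the GPU
    fixed_overhead_s      -- assumed constant setup cost (launch, allocation)
    """
    # Data travels to the device and the results travel back.
    transfer = 2 * n_bytes / bandwidth_bytes_per_s
    return fixed_overhead_s + transfer + cpu_seconds / kernel_speedup


def gpu_is_worthwhile(n_bytes, bandwidth_bytes_per_s, cpu_seconds,
                      kernel_speedup, fixed_overhead_s=0.0):
    """True if the modelled GPU time beats the CPU time."""
    gpu = gpu_total_seconds(n_bytes, bandwidth_bytes_per_s, cpu_seconds,
                            kernel_speedup, fixed_overhead_s)
    return gpu < cpu_seconds


if __name__ == "__main__":
    # Made-up numbers: 1 GB of data, 10 GB/s effective bandwidth, 20x kernel speedup.
    for cpu_s in (0.1, 10.0):
        print(cpu_s, gpu_is_worthwhile(1e9, 1e10, cpu_s, 20.0))
```

With these made-up numbers, a cheap single pass (0.1 s on the CPU) loses to the transfer cost alone, while an iterative job that takes 10 s on the CPU comes out well ahead.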

We should target only certain operators, such as the neural network, SVM, and perhaps the K-Means as the low hanging fruit.

In the meantime though, we can always use Python nodes in KNIME to invoke GPGPU frameworks.
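As a rough idea of what such a script could look like, here is a sketch of the assignment step of k-means that runs on the GPU through CuPy when it is available and falls back to NumPy on the CPU otherwise. This is an assumption-laden illustration, not a KNIME API: the function names are made up, CuPy is just one possible framework, and how the table data gets into and out of the script depends on the Python node you use.

```python
import numpy as np

try:
    # Optional: assumes CuPy is installed and a CUDA-capable GPU is present.
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable device is found
    xp = cp
except Exception:
    xp = np  # CPU fallback, same array API for the operations used below


def pairwise_sq_dists(points, centers):
    """Squared Euclidean distance from every point to every center."""
    p = xp.asarray(points, dtype=xp.float32)   # copies to the GPU if xp is CuPy
    c = xp.asarray(centers, dtype=xp.float32)
    diff = p[:, None, :] - c[None, :, :]       # shape: (n_points, n_centers, n_dims)
    return (diff * diff).sum(axis=2)


def assign_clusters(points, centers):
    """Index of the nearest center for each point, returned as a NumPy array."""
    labels = pairwise_sq_dists(points, centers).argmin(axis=1)
    if xp is not np:
        labels = xp.asnumpy(labels)            # copy the result back to the host
    return labels


if __name__ == "__main__":
    pts = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0]]
    ctr = [[0.0, 0.0], [10.0, 10.0]]
    print(assign_clusters(pts, ctr))
```

Note that the two copies (to the device in `xp.asarray`, back in `xp.asnumpy`) are exactly the overhead discussed above, so this only pays off when the table is large or the step is repeated many times.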

That does sound tempting, although it sort of defeats the idea of simplification through a UI like KNIME if I have to move the vital parts of my workflow out into code.

Yet I wonder how even modern CPU architectures with about 4 to 6 physical cores can possibly match the hundreds or thousands of shaders in consumer GPUs, despite all the inefficiency of translation, at least for the specific algorithms mentioned that profit heavily from massive parallelization?

Yeah, I think it would be great if KNIME used some GPGPU stuff. IMHO it must be OpenCL (and not CUDA) so that it is more generally applicable, especially since most modern client PC processors have an iGPU in them and would benefit.

As thor said, yeah, there will always be a ton of overhead. Still, for time-consuming functions I'm pretty sure it will be worth it. Sadly, that often means the node developers and the software they use need to enable GPGPU first. As for the KNIME base nodes, I don't know how much they would profit.

Also, many KNIME nodes are not parallel and could first be parallelized for the CPU. Another issue is that Java has no support for many modern CPU extensions, e.g. any form of vectorization like SSE or AVX. You need to do that in C/C++ as well. The same goes for Intel's TXT extensions and so forth. Here it might also be nice if some critical parts could be coded in C/C++ and vectorized.

As usual, software is at least 10 years behind the hardware.

With KNIME 3.2 I realized that the now available Deeplearning4J library is the first to offer CUDA-supported (has to be activated!) deep learning algorithms!

This is what I was looking for...gonna be testing a lot now. :)

Thanks to the KNIME team! (Missed you on the large ecosystem picture.)

While it's a step in the right direction, I would greatly favor open standards (e.g. OpenCL) over proprietary, closed-source ones. OpenCL would also run on the Intel iGPU that almost every client PC has anyway.