I am a beginner with big data processing and cloud concepts, and I see various options in KNIME. But I am not sure which ones are free to use in academia, or when they give better performance than ordinary tables in a workflow. I have hundreds of thousands of samples.
Hi @zizoo -
The good news is that the Big Data functionality in KNIME is free for everyone regardless of whether you are in the corporate world, in academia, or just an individual user! All that is required is to download appropriate extensions for what you’re trying to do (Spark, file handling, local big data, etc). The KNIME Hub is super helpful for identifying the extensions you need.
As far as when Big Data algorithms will give better performance… that depends on the data and the use case. Usually when folks refer to big data, they are referring to datasets on the order of millions of rows at least - sometimes much larger.
KNIME’s Create Local Big Data Environment node gives you the ability to set up a local cluster on your laptop very easily. Using it, you can try all sorts of database & Spark processing on a toy dataset. Then, when it comes time to try things out on your actual data, you just replace the LBDE node with separate nodes that point to, for example, an Amazon instance where your big data lives.
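Outside of KNIME, the same prototype-locally-then-swap pattern can be sketched in plain Python. This is a hypothetical illustration, not how the LBDE node is implemented: the query logic is written once, developed against a small in-memory SQLite database standing in for the local environment, and could later be pointed at a connection to the remote system holding the real data. The `samples` table and its columns are made up for the example.

```python
import sqlite3

def summarize(conn):
    """Run the same aggregation regardless of where the data lives."""
    cur = conn.execute(
        "SELECT label, COUNT(*) AS n, AVG(value) AS mean_value "
        "FROM samples GROUP BY label ORDER BY label"
    )
    return cur.fetchall()

# Local "toy" environment: an in-memory database seeded with a few rows,
# playing the role the Local Big Data Environment plays in a workflow.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (label TEXT, value REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 2.0)],
)

print(summarize(conn))  # [('a', 2, 2.0), ('b', 1, 2.0)]
```

Once the logic works on the toy data, only the connection object changes; `summarize` itself stays the same, which mirrors swapping the LBDE node for connector nodes pointing at your real cluster.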
If you have more details about what you’re trying to do, I’m sure folks would be happy to share ideas about how to approach analyzing your dataset.
KNIME works fine for millions of rows, assuming you don’t have thousands of columns and these aren’t images or other “high content” data. Big data tooling is built for large orgs like Google or MS, and most people don’t really need it.
Also, you will need to have a cluster set up to connect to. If you only run a local environment, it won’t help with performance, as you don’t get any more resources.